HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Datamonkey Server >> Datamonkey feedback >> How to confirm GARD result?
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1228187857

Message started by Sundy on Dec 1st, 2008 at 7:17pm

Title: How to confirm GARD result?
Post by Sundy on Dec 1st, 2008 at 7:17pm
Hi!
I am using GARD to check recombination for my data. After running GARD on Datamonkey, two breakpoints were found.
My question is: How to verify they are real recombination or not?

The three inferred segment topologies are really different, but I can not figure out which sequence was recombinated from which. So I don't know whether these breakpoints are real recombination sites.

I checked some references, some said Shimodaira and Hasegawa test was used. but they did not say how to perform this test.
Could anyone tell me how to perform SH test? Is there any program to run this test?

I don't have too much knowledge about statistics. It is difficult for me to understand the orginial paper about SH test (Shimodaira, H., and M. Hasegawa. 1999. Multiple comparisons of loglikelihoods with applications to phylogenetic inference. Mol. Biol. Evol. 16:1114–1116.)

Thanks!

Sundy

Title: Re: How to confirm GARD result?
Post by Sergei on Dec 2nd, 2008 at 7:39am
Dear Sundy,

HYPHY includes a GARDProcessor.bf module (see e.g.
Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login). It takes GARD output files and performs a number of tests on the partitions, including pairwise KH (= SH in this setting) tests.

Cheers,
Sergei

Title: Re: How to confirm GARD result?
Post by Sundy on Dec 2nd, 2008 at 9:05am
Dear Sergei,
Thank you very much for your so quick reply.

Title: Re: How to confirm GARD result?
Post by Sundy on Dec 2nd, 2008 at 11:41am
Dear Sergei,

Sorry to trouble you again.
I am trying to use the HyPhy
Standard Analyses>Recombination>GARDProcessor.bf>
"Please load a nucleotide data file"

After I loaded my sequences data. It required me
"please load a GA partition analysis result file"

My question is "How can I get the GA partition analysis result file"?

Sorry, I am a new learner of HyPhy.
Thanks.

Sundy

Title: Re: How to confirm GARD result?
Post by Sergei on Dec 2nd, 2008 at 1:12pm
Dear Sundy,

It's a file produced by GARD. If you run the online version, it is the one accessible via the [Raw] link, otherwise it is one of the files (ending in splits) output by GARD.bf

Cheers,
Sergei

Title: Re: How to confirm GARD result?
Post by Sundy on Dec 2nd, 2008 at 8:22pm
Dear Sergei,

Thank you so much for your quick reply.
I think I finally get some idea about HyPhy ^-^. Really appreciate you!

But I need your help again.
Because I am not sure how to read the results.
I got the following results, after running HyPhy using the GA partition gotten from [Raw] link.

++++++++++++++++
HYPHY 1.0020080508beta(MP) for Windows (Win32) console saved on Tue Dec  2 23:05:11 2008

Loaded 22 model templates from E:\Software\HYPHY_Win32(1)\HYPHY_Win32\SubstitutionModels
Loaded 12 genetic code tables from E:\Software\HYPHY_Win32(1)\HYPHY_Win32\GeneticCodes

2 Intel x86 architecture  processors detected.
You can change the number of processors utilized by HyPhy via the HyPhy settings dialog.

Please cite S.L. Kosakovsky Pond, S. D. W. Frost and S.V. Muse. (2005) HyPhy: hypothesis testing using phylogenies. Bioinformatics 21: 676-679 if you use HyPhy in a publication
If you are a new HyPhy user, the tutorial located at Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login may be a good starting point.

Loaded 2 partitions

Sequences :8
Sites     :6462
Variable  :818


f(A) = 0.289326
f(C) = 0.194309
f(G) = 0.206747
f(T) = 0.309618


c-AIC = 25997
Log Likelihood = -12977.4383434101;
Shared Parameters:
GT=0.160283=0.160283
CT=2.43=2.43
CG=0.0611013=0.0611013
AT=0.402894=0.402894
AC=0.158056=0.158056
alpha=0.010005=0.010005
betaQ=1.59101=1.59101
betaP=0.681693=0.681693

Tree givenTree=((((124_Bat_SARS_273:0.0502577,125_Bat_SARS_279:0.0502527)Node3:0.0610553,130_PC03_SZ3:0.000585082)Node2:0.000196157,3_HP03E_GZ02:0.000312407)Node1:0,(15_HP03M_BJ02:0.000620001,31_HP03L_Tor2:0.000150991)Node8:0.000150991,(106_HP04_GZ0402:0.000620001,PC04_PC4_136_AY613949:0.000312407)Node11:0.00108391);

Mean divergence : 5.25787%


Fitting a single-tree, multiple partition model
Log Likelihood = -10325.9738761955;
Shared Parameters:
S_1=1.40178
betaP=0.681693=0.681693
betaQ=1.59101=1.59101
alpha=0.010005=0.010005
AC=0.158056=0.158056
AT=0.402894=0.402894
CG=0.0611013=0.0611013
CT=2.43=2.43
GT=0.160283=0.160283

Tree tree_0=((((124_Bat_SARS_273:0.0448527,125_Bat_SARS_279:0.0353642)Node3:0.0525925,130_PC03_SZ3:0.000655097)Node2:0.00020522,3_HP03E_GZ02:0.000171432)Node1:0,(15_HP03M_BJ02:0.000343505,31_HP03L_Tor2:0.000171517)Node8:0,(106_HP04_GZ0402:0.000516532,PC04_PC4_136_AY613949:0.000343924)Node11:0.000688349);
Tree tree_1=((((124_Bat_SARS_273:0.0628736,125_Bat_SARS_279:0.0495728)Node3:0.073723,130_PC03_SZ3:0.000918301)Node2:0.000287673,3_HP03E_GZ02:0.00024031)Node1:0,(15_HP03M_BJ02:0.000481519,31_HP03L_Tor2:0.000240429)Node8:0,(106_HP04_GZ0402:0.000724064,PC04_PC4_136_AY613949:0.000482105)Node11:0.000964913);


Fitting a mutilple tree, multiple partition model
Log Likelihood = -10315.2583699075;
Shared Parameters:
betaP=0.681693=0.681693
betaQ=1.59101=1.59101
alpha=0.010005=0.010005
AC=0.158056=0.158056
AT=0.402894=0.402894
CG=0.0611013=0.0611013
CT=2.43=2.43
GT=0.160283=0.160283

Tree tree_0=(((((124_Bat_SARS_273:0.0502045,125_Bat_SARS_279:0.0345148)Node4:0.0483357,130_PC03_SZ3:0.000949729)Node3:9.48274e-05,(106_HP04_GZ0402:0.000783531,PC04_PC4_136_AY613949:0)Node8:0.000782496)Node2:0,3_HP03E_GZ02:0)Node1:0,15_HP03M_BJ02:0.000260537,31_HP03L_Tor2:0);
Tree tree_1=((((124_Bat_SARS_273:0.0457419,125_Bat_SARS_279:0.0533774)Node3:0.0870497,130_PC03_SZ3:0)Node2:0.000706954,31_HP03L_Tor2:0.000707831)Node1:0,(3_HP03E_GZ02:0.000706903,15_HP03M_BJ02:0.000707295)Node8:0,(106_HP04_GZ0402:0,PC04_PC4_136_AY613949:0.00142015)Node11:0.000707243);


Versus the single partition model: c-AIC = 20698.9
Delta AIC = 5298.13



Versus the single tree/multiple partition model: Delta AIC = -2.78213

Partition 1 : 3837 sites
Partition 2 : 1422 sites


Fitting tree 1 to partition 1
Log Likelihood = -7368.17401533097;
Shared Parameters:
betaP=0.681693=0.681693
betaQ=1.59101=1.59101
alpha=0.010005=0.010005
AC=0.158056=0.158056
AT=0.402894=0.402894
CG=0.0611013=0.0611013
CT=2.43=2.43
GT=0.160283=0.160283

Tree aTree=(((((124_Bat_SARS_273:0.0502044,125_Bat_SARS_279:0.0345152)Node4:0.0483353,130_PC03_SZ3:0.00094992)Node3:9.48549e-05,(106_HP04_GZ0402:0.000783767,PC04_PC4_136_AY613949:0)Node8:0.000782828)Node2:0,3_HP03E_GZ02:0)Node1:0,15_HP03M_BJ02:0.000260704,31_HP03L_Tor2:0);


Fitting tree 2 to partition 1
Log Likelihood = -7368.1740159421;
Shared Parameters:
betaP=0.681693=0.681693
betaQ=1.59101=1.59101
alpha=0.010005=0.010005
AC=0.158056=0.158056
AT=0.402894=0.402894
CG=0.0611013=0.0611013
CT=2.43=2.43
GT=0.160283=0.160283

Tree aTree=((((124_Bat_SARS_273:0.0502049,125_Bat_SARS_279:0.034515)Node3:0.0483355,130_PC03_SZ3:0.000949661)Node2:9.47293e-05,31_HP03L_Tor2:0)Node1:0,(3_HP03E_GZ02:0,15_HP03M_BJ02:0.000260946)Node8:0,(106_HP04_GZ0402:0.000783819,PC04_PC4_136_AY613949:0)Node11:0.000782895);

KH Testing partition 1
Tree 2 base LRT = 1.22224e-06. p-value = 0.4817


Fitting tree 1 to partition 2
Log Likelihood = -2947.08435430648;
Shared Parameters:
betaP=0.681693=0.681693
betaQ=1.59101=1.59101
alpha=0.010005=0.010005
AC=0.158056=0.158056
AT=0.402894=0.402894
CG=0.0611013=0.0611013
CT=2.43=2.43
GT=0.160283=0.160283

Tree aTree=(((((124_Bat_SARS_273:0.0457418,125_Bat_SARS_279:0.0533773)Node4:0.0870492,130_PC03_SZ3:0)Node3:0.000707349,(106_HP04_GZ0402:0,PC04_PC4_136_AY613949:0.00141987)Node8:0.000707103)Node2:0,3_HP03E_GZ02:0.000706771)Node1:0,15_HP03M_BJ02:0.000707307,31_HP03L_Tor2:0.000707425);


Fitting tree 2 to partition 2
Log Likelihood = -2947.08435419031;
Shared Parameters:
betaP=0.681693=0.681693
betaQ=1.59101=1.59101
alpha=0.010005=0.010005
AC=0.158056=0.158056
AT=0.402894=0.402894
CG=0.0611013=0.0611013
CT=2.43=2.43
GT=0.160283=0.160283

Tree aTree=((((124_Bat_SARS_273:0.0457417,125_Bat_SARS_279:0.0533769)Node3:0.0870492,130_PC03_SZ3:0)Node2:0.000706985,31_HP03L_Tor2:0.000707755)Node1:0,(3_HP03E_GZ02:0.00070661,15_HP03M_BJ02:0.00070704)Node8:0,(106_HP04_GZ0402:0,PC04_PC4_136_AY613949:0.00141997)Node11:0.000707026);

KH Testing partition 2
Tree 1 base LRT = 2.32327e-07. p-value = 0.4839
A total of 0/1 significant couplings
Mean splits identify: 1

+++++++++++++++++

Do we just see these parts?

-------------------
KH Testing partition 1
Tree 2 base LRT = 1.22224e-06. p-value = 0.4817

KH Testing partition 2
Tree 1 base LRT = 2.32327e-07. p-value = 0.4839
A total of 0/1 significant couplings
Mean splits identify: 1
----------------------

Does these mean that the KH test indicates there is no significant evidence (P>0.05) of recombination? The breakpoints are not real recombination sites?

Thank you very much!!
Best regards

Sundy

Title: Re: How to confirm GARD result?
Post by Sergei on Dec 2nd, 2008 at 10:21pm
Dear Sundy,

You are looking at the correct output for KH tests.

The trees actually appear to be identical (up to some really short internal branches). If GARD returned a significant result, this is due to different branch lengths between partitions, not different topologies. This could be due to ancient recombination events, or due to spatial rate variation.

Cheers,
Sergei

Title: Re: How to confirm GARD result?
Post by Sundy on Dec 4th, 2008 at 1:28pm
Dear Sergei,

Thank you very much for your reply and interpretation.
I analyzed another data set of mine. The datamonkey analysis indicated there are 16 breakpoints. Then I run HyPhy using the spilts file. And I got the results as attached file.

Last part as following:
+++++++++
KH Testing partition 15
Tree 14 base LRT = 0.875094. p-value = 0.0001
A total of 14/14 significant couplings
Mean splits identify: 0.0571429
+++++++++++++++++++++++++++++++

I found that all the p-value = 0.0001 (KH test for 15 partition)

Does this mean that all the breakpoints may be real recombination sites? If it is right, there are too many recombination events.

Or do I need check some other parameters to verify which breakpoints are really significant?

Thank you very mcuh!
Best regards,

Sundy

http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=KH_test.zip (3 KB | )

Title: Re: How to confirm GARD result?
Post by Sergei on Dec 7th, 2008 at 6:06pm
Dear Sundy,

Based on the information you provide, all breakpoints are indeed significant (in the KH sense at least). Not sure what else you can do easily - why do you think there are too many breakpoints?

Cheers,
Sergei

Title: Re: How to confirm GARD result?
Post by Sundy on Dec 10th, 2008 at 11:13am
Dear Sergei,

Thank you very much for your replay.
I am just thinking 16 breakpoints means there are 16 times of recombination. It is strange.

I have another question:
If GARD identified only one breakpoint, could KH test verify the significant by using HYPHY?

I have a dataset as attached file, I run GARD and GARDProcessor.bf, and got the results as attached file "30 SARS-Spike HYPHY.txt"

The last part is as following:
KH Testing partition 1
A total of 0/0 significant couplings
Mean splits identify: -nan

I am confused with the "0/0 significant couplings" Is it significant or not? thanks,
and was the meaning of "Mean splits identify: -nan" I know sometime it is a number such as Mean splits identify: 0.116667"

Best regards,

Sundy

Title: Re: How to confirm GARD result?
Post by Sergei on Dec 10th, 2008 at 11:15am
Dear Sundy,

I think that GARDProcessor.bf may be mishandling the case of one breakpoint at reporting stage. Can you send me your files so that I can confirm and fix the issue?

Cheers,
Sergei

Title: Re: How to confirm GARD result?
Post by Sundy on Dec 10th, 2008 at 11:15am
I forgot the attachment
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=30S.zip (6 KB | )

Title: Re: How to confirm GARD result?
Post by Sundy on Dec 10th, 2008 at 11:25am
Dear Sergei,

Thank you very much.
Actually, I have three datasets as the attached file. I am trying to detect the recombination for them.

Could you pleas help me to check? I would greatly appreciate!

Best regards,

Sundy
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=For_recombination_test.zip (21 KB | )

Title: Re: How to confirm GARD result?
Post by Sundy on Dec 10th, 2008 at 11:34am
Because the sequences of Bat-SARS are quite difference with the other sequences, I tried to performed GARD for 4 BAT-SARS sequences and other sequences separately.
I got lots of breakpoints for 4 BAT-SARS sequences in each dataset, and only 1 or 0 breakpoint for the other sequences.

Thank you very much!

Sundy

Title: Re: How to confirm GARD result?
Post by Sergei on Dec 11th, 2008 at 8:19am
Dear Sundy,

The reason you were getting 'nan' with GARDProcessor.bf was because it expects the _splits file to start with two blank likes (for no other reason as to be compatible with an old formatting quirk in GARD) and if they are not there - it will not read the data correctly. E.g. here is the _splits file for your G4-SARS-59S file.


Quote:
0-1490
(((((((((((((PC03_SZ13_AY304487_0305:0,PC03_SZ1_AY304489_0305:0):0.000638888,PC03_SZ3_AY304486_0305:0.000639661):0.00201
583,((Bat_SARS_273_DQ648856_S:0.285893,((Bat_SARS_279_DQ648857_S:0.123077,Bat_SARS_Rp3_DQ071615_S:0.0872019):0.030949,Ba
t_SARS_HKU3_DQ022305_S:0.0962264):0):1.6738,((((HP03E_GZ02_AY390556:0,HP03L_GD01_AY278489:0):0.00123832,((HP03E_HGZ8L1_A
_AY394981_CG:0,HP03E_ZS_A_AY394997:0):0.000614304,HP03M_JMD_AY394988_CG:0):0.000614595):0,HP03M_BJ02_AY278487:0.00123514
):0.000623404,(((((HP03E_HGZ8L1_B_AY394982_CG:0,(((HP03M_BJ04_AY279354:0.000613459,HP03M_HZS2_Fb_AY394987:0):0,((((((((H
P03L_Tor2_NC_004718:0,(HP03L_Sin3765V_AY559084:0,HP03L_Sin845_AY559093:0.000612755):0.000612974):0,HP03L_AS_AY427439:0):
0,HP03L_Frank_AY291315:0):0,HP03L_GZ_B_AY394978:0):0,HP03L_SH_QXC1_AY463059:0):0,HP03L_GZ_C_AY394979:0.00061234):0,HP03L
_CUHK_LC2_AY394999:0.00061388):0,HP03L_Sino1_11_AY485277:0.000613116):0.00061234):0.00061234,(HP03M_GZ50_AY304495:0,HP03
M_GZ_A_AY394977_CG:0.00061234):0.000612479):0):0,HP03E_HSZ_Bb_AY394985:0):0,HP03E_HSZ_Cb_AY394986:0):0,HP03M_HZS2_C_AY39
4992:0):0,HP03M_BJ03_AY278490:0.000613006):0.00124613):0.00198581):0):0.00329353,(((PC04_A021_AY687356_S:0,PC04_HC_SZ_DM
1_AY545915:0):0.000644733,PC04_B024_AY687360_S:0):0.00128397,(((PC04_A022_AY686863:0,PC04_B029_AY687361_S:0):0,PC04_cive
t007_AY572034:0):0.000630181,(PC04_C025_AY687370_S:0,PC04_HC_SZ_61_AY515512:0.000627921):0):0):0.00130383):0,(PC04_A001_
AY687354_S:0,PC04_C013_AY687365_S:0.00061531):0):0.000616592,(PC04_PC4_13_AY613948:0.000636761,PC04_HC_SZ_79_AY545914:0.
000618928):0.000618842):0,(PC04_C029_AY687372_S:0.000596247,CFB04_SZ_AY545919:0.00126154):0.000636761):0,((((PC04_PC4_11
5_AY627044_S:0,(HP04_GZ0401_AY568539:0.0012382,HP04_GZ0402_AY613947:0.000612695):0):0,PC04_PC4_241_AY627048_S:0):0,PC04_
B012_AY687359_S:0):0.000623871,PC04_PC4_137_AY627045_S:0):0.00061709):0,PC04_C018_AY687368_S:0.00061521):0,PC04_PC4_136_
AY613949:0):0,PC04_PC4_199_AY627047:0):0,PC04_A013_AY687355_S:0):0,(PC04_PC4_205_AY613952_S:0.000615257,PC04_C028_AY6873
71_S:0):0,PC04_B033_AY687362_S:0.000614959)
1491-3764
(((((((((((((((((PC03_SZ13_AY304487_0305:0,PC03_SZ1_AY304489_0305:0.0019):0.000970853,PC03_SZ3_AY304486_0305:0):0.001926
45,(Bat_SARS_273_DQ648856_S:0.171533,((Bat_SARS_279_DQ648857_S:0.0436005,Bat_SARS_Rp3_DQ071615_S:0.0409013):0.0288179,Ba
t_SARS_HKU3_DQ022305_S:0.133101):0.0316166):0.126907):0.000946349,((((HP03E_HGZ8L1_A_AY394981_CG:0.000471595,HP03L_GD01_
AY278489:0.000948031):0,HP03E_ZS_A_AY394997:0):0.000474592,(((((((((HP03E_HGZ8L1_B_AY394982_CG:0,(((HP03M_BJ03_AY278490:
0.000959516,((HP03M_BJ04_AY279354:0.000477548,((HP03M_HZS2_C_AY394992:0,HP03L_Frank_AY291315:0):0.000477139,(HP03L_Tor2_
NC_004718:0,HP03L_SH_QXC1_AY463059:0.00144432):0.000476845):0):0,HP03L_GZ_B_AY394978:0.00047722):0):0,HP03L_GZ_C_AY39497
9:0.000959516):0,HP03L_CUHK_LC2_AY394999:0.000477047):0):0,HP03M_BJ02_AY278487:0):0,HP03M_HZS2_Fb_AY394987:0):0,HP03L_AS
_AY427439:0):0,HP03L_Sino1_11_AY485277:0):0,(HP03L_Sin3765V_AY559084:0.000481251,HP03L_Sin845_AY559093:0.000482511):0.00
0958175):0.000477356,HP03M_GZ50_AY304495:0):0,HP03M_GZ_A_AY394977_CG:0):0,HP03M_JMD_AY394988_CG:0):0):0.000477349,(HP03E
_HSZ_Bb_AY394985:0,HP03E_HSZ_Cb_AY394986:0.000477285):0):0.00047524):0,HP03E_GZ02_AY390556:0):0.0014118,HP04_GZ0402_AY61
3947:0.00194111):0.000990094,PC04_PC4_137_AY627045_S:0.000481253):0.000472576,PC04_PC4_199_AY627047:0):0.000465501,(PC04
_PC4_241_AY627048_S:0.000472816,PC04_B012_AY687359_S:0.000943099):0.000475402):0,(PC04_B024_AY687360_S:0.000465766,PC04_
C018_AY687368_S:0):0.000465963):0,(PC04_A021_AY687356_S:0.000481484,(((PC04_A022_AY686863:0.000465646,PC04_civet007_AY57
2034:0):0,PC04_HC_SZ_61_AY515512:0):0,PC04_HC_SZ_79_AY545914:0):0):0.000465906):0,PC04_PC4_136_AY613949:0.000465727):0,P
C04_PC4_13_AY613948:0):0,PC04_B029_AY687361_S:0):0,PC04_HC_SZ_DM1_AY545915:0):0,((HP04_GZ0401_AY568539:0,PC04_A013_AY687
355_S:0.000465665):0,PC04_C029_AY687372_S:0):0):0,((((PC04_PC4_115_AY627044_S:0,PC04_C028_AY687371_S:0.000935622):0,PC04
_PC4_205_AY613952_S:0):0,PC04_B033_AY687362_S:0):0,CFB04_SZ_AY545919:0):0,((PC04_A001_AY687354_S:0,PC04_C025_AY687370_S:
0.000465657):0,PC04_C013_AY687365_S:0):0)


It is actually quite common to lose power in breakpoint detection as you increase the number of sequences. Think of it as signal dilution: 1/4 recombinant sequences in a 4-taxon analysis is more obvious than 1/50 recombinant sequences in a 50-taxon analysis (of course it depends on the sequences involved, and how strong the recombination signal is). Also, while it is possible to infer a 4-taxon tree from a short sequence (e.g. 100bp) somewhat accurately, doing so for a 50-taxon tree is less likely - hence you lose breakpoints in the noise.

Cheers,
Sergei
HTH,
Sergei

Title: Re: How to confirm GARD result?
Post by Sundy on Dec 11th, 2008 at 9:46am
Dear Sergei,
Thank you very much for your interpretation.
Yes, I see. I think I should delete some of the very similar sequence in my datasets, then run GARD. Thanks.

So "it expects the _splits file to start with two blank likes"
Does this means that if there is only one breakpoint, GARDProcessor.bf cannot verify the significant of the breakpoint?

And another question,
After running GARD, the result web-page has a such sentence: "To process RAW files you need to download this HyPhy batchfile, run it locally in HyPhy and input the raw file when prompted."
Could you please tell how to use the "HyPhy batchfile".
Do I need use the "HyPhy batchfile" when I run GARDProcessor.bf?

Thank you!
Best regards,

Sundy

Title: Re: How to confirm GARD result?
Post by Sergei on Dec 11th, 2008 at 3:21pm
Dear Sundy,

You can run GARDProcessor.bf on the output with a single breakpoint: just make sure the _splits file starts with two blanks lines. GARDProcessor.bf is the same as the batchfile that Datamonkey instructs you do download.

Cheers,
Sergei

Title: Re: How to confirm GARD result?
Post by Sundy on Dec 11th, 2008 at 6:19pm
Dear Sergei,

Thank you so much!!
I miss understood your meaning and I get the idea now. :)

Best regards,

Sundy

Title: Re: How to confirm GARD result?
Post by Sundy on Dec 11th, 2008 at 6:47pm
Dear Sergei,

I got another problem (as enclosed jpg), when I run a data as attached file with GARDProcessor.bf.

Could you please help me check it?
Thank you!!

Sundy
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=6Spike.zip (58 KB | )

Title: Re: How to confirm GARD result?
Post by kevintz on Jan 12th, 2009 at 3:21pm
This site is really helpful. THanks.
I have two question. 18 sequences were analyzed by GARD for recombination breakpoints. 6 breakpoints were found. Then I use the raw file, splits file, and recombinationprocessor for KH testing in HyPhy. When testing the last partition 7, the P-value is 0.1809, while other 6 partitions p-value for KH testing all <0.001, but results showed "A total of 4/6 significant couplings". One partition is gone, 6 left, should it be 5 breakpoints?

From websites, it is said that AICc>10 is significant, but I found breakpoints with an improvement of AIC of 2.7 could be significant, because p-value of either side of sequences for KH testing is less than 0.001. So, I want to know how I can make sure the results? See AICc or just see P-value?

Thanks. I am a fresh on this statistic field. Thanks in advance.



Title: Re: How to confirm GARD result?
Post by Sergei on Jan 12th, 2009 at 3:39pm
Dear kevintz,

A AIC-c improvement of 10 is only a guideline (and probably a rather conservative one), hence it's no surprise that a KH test shows topological discordance for a smaller AIC-c. You can have a major AIC-c improvement but no topological difference due to differing branch lengths (imagine that a part of the alignment evolves 10 times faster than the rest of the alignment - for example a hypervariable region in a gene), hence the need to confirm discordance using the KH test. A strong c-AIC difference coupled with low KH p-values is probably the most reliable indicator that there indeed is recombination in the sample.

As far as your example goes, there should indeed be 5/6 couplings if no other KH test is insignificant. Can you perhaps paste in the entire output of the processor file for me to look at?

Sergei



Cheers,
Sergei

Title: Re: How to confirm GARD result?
Post by kevintz on Jan 13th, 2009 at 1:42am
Thanks for your reply.
the following is the file.
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=KHtest_of_GARD.bf (26 KB | )

Title: Re: How to confirm GARD result?
Post by kevintz on Jan 13th, 2009 at 6:55am
Another question, GARD for breakpoints showed 2 breakpoints with AICc value of 45.53, and 63.1. KH test indicates good evidence of the two breakpoints. I submitted the GARD results directly to datamonkey by press the link below the results, but the datamonkey gave just one partition. What happened? I am confused.

Thanks.

Best regards

Title: Re: How to confirm GARD result?
Post by Sergei on Jan 13th, 2009 at 7:34am
Dear kevinz,

I am afraid I don't understand the question. How did you obtain the two breakpoints originally before submitting them to Datamonkey?

Sergei

Title: Re: How to confirm GARD result?
Post by kevintz on Jan 13th, 2009 at 7:59am
Sorry for confusion. I use another dataset for GARD analysis, and get two breakpoints.

Title: Re: How to confirm GARD result?
Post by kevintz on Jan 13th, 2009 at 7:59am
Here is the link.Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login

And the KH test results
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=KH_test-2.bf (11 KB | )

Title: Re: How to confirm GARD result?
Post by Sergei on Jan 13th, 2009 at 8:07am

kevintz wrote on Jan 13th, 2009 at 1:42am:
Thanks for your reply.
the following is the file.


Another p-value (partition 5, 0.0014) becomes greater than the threshold of 0.01 following a Bonferroni correction for multiple tests; hence that partition is judged insignificantly supported.

Cheers,
Sergei

Title: Re: How to confirm GARD result?
Post by Sergei on Jan 13th, 2009 at 8:10am

kevintz wrote on Jan 13th, 2009 at 7:59am:
Sorry for confusion. I use another dataset for GARD analysis, and get two breakpoints.


This happened because Datamonkey removed a duplicate sequence from the alignment and the trees provided by GARD now had too many sequences. I meant to add some code for Datamonkey to handle these cases automatically and trim removed species from the tree but never got around to it. Please resubmit the analysis to GARD with the reduced alignment and all should work well.

Cheers,
Sergei

Title: Re: How to confirm GARD result?
Post by kevintz on Jan 13th, 2009 at 8:24am

Sergei wrote on Jan 13th, 2009 at 8:10am:

kevintz wrote on Jan 13th, 2009 at 7:59am:
Sorry for confusion. I use another dataset for GARD analysis, and get two breakpoints.


This happened because Datamonkey removed a duplicate sequence from the alignment and the trees provided by GARD now had too many sequences. I meant to add some code for Datamonkey to handle these cases automatically and trim removed species from the tree but never got around to it. Please resubmit the analysis to GARD with the reduced alignment and all should work well.

Cheers,
Sergei


Does that mean I should remove the duplicate sequence, and then use left sequences for GARD analysis and followed datamonkey selection analysis?

Title: Re: How to confirm GARD result?
Post by Sergei on Jan 13th, 2009 at 12:50pm
Dear kevintz,

You're correct. Please filter the duplicate sequences. You can use HyPhy to do it for you (see 2.3.3 of Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login)

Cheers,
Sergei

Title: Re: How to confirm GARD result?
Post by kevintz on Jan 13th, 2009 at 2:32pm
Dear Sergi,
I get it. The program runs well.
Really thanks.

One more basic question, does the AICc value refer to that from GARD result profile, or from the KH test result file? I find both of them include a AICc, but with different value. Which one is better?

Best regards,
kevintz

Title: Re: How to confirm GARD result?
Post by Sergei on Jan 14th, 2009 at 12:35pm
Dear kevintz,

KH test file should always have a somewhat better AICc than the GARD file, because GARD only estimates several model parameters (e.g. nucleotide biases and site-to-site rate variation parameters) only once (assuming no recombination) to speed up the run and the KH analysis estimates those parameters assuming recombination. The difference in AICc should not be very large however.

Sergei

Title: Re: How to confirm GARD result?
Post by kevintz on Jan 14th, 2009 at 11:01pm
THanks

Title: Re: How to confirm GARD result?
Post by kevintz on Jan 18th, 2009 at 10:41am

Sergei wrote on Jan 13th, 2009 at 8:07am:

kevintz wrote on Jan 13th, 2009 at 1:42am:
Thanks for your reply.
the following is the file.


Another p-value (partition 5, 0.0014) becomes greater than the threshold of 0.01 following a Bonferroni correction for multiple tests; hence that partition is judged insignificantly supported.

Cheers,
Sergei


Dear Sergei,
Sorry to bother you again. :) :)
I read your article on journal Gene 397 (2007) 38-50, entitled "Evolution of the interferon alpha gene family in eutherian mammals", and found Table 1 refered to KH tesing with all the P-value <0.01. While in my data set, the KH testing resluts show the p-value much smaller as far as 0.001. As you said, Bonferroni correction might be performed. So, my question is does the KH test doing Bonferroni correction automatically? if not, how? if yes, why not to refer in the article?

Thanks a million.

Best regards,
Tu Zeng

Title: Re: How to confirm GARD result?
Post by Sergei on Jan 18th, 2009 at 5:33pm
Dear Tu Zeng,

The numbers in the paper are quoted following a Bonferroni correction. The KH test provides raw numbers for the p-values, but includes a Bonferroni correction when counting the number of significant partitions.

HTH,
Sergei

Title: Re: How to confirm GARD result?
Post by kevintz on Jan 20th, 2009 at 2:38am

Sergei wrote on Jan 18th, 2009 at 5:33pm:
Dear Tu Zeng,

The numbers in the paper are quoted following a Bonferroni correction. The KH test provides raw numbers for the p-values, but includes a Bonferroni correction when counting the number of significant partitions.

HTH,
Sergei


Thanks it helps me a lot.
And How I can know the value of the Bonferroni correction through Hyphy or KH test? whether it set to 0.05 or 0.01 after KH test?  
Thanks a lot.
Best,
kevintz.

PS: tuzeng is my name, sorry for confusion.

Title: Re: How to confirm GARD result?
Post by Sergei on Jan 20th, 2009 at 9:27am
Dear tuzeng,

The processor file considers a Bonferroni-corrected p<=0.01 to be significant.

Cheers,
Sergei

Title: Re: How to confirm GARD result?
Post by Sundy on Feb 12th, 2009 at 7:48am

kevintz wrote on Jan 18th, 2009 at 10:41am:
Dear Sergei,
Sorry to bother you again. :) :)
I read your article on journal Gene 397 (2007) 38-50, entitled "Evolution of the interferon alpha gene family in eutherian mammals", and found Table 1 refered to KH tesing with all the P-value <0.01. While in my data set, the KH testing resluts show the p-value much smaller as far as 0.001. As you said, Bonferroni correction might be performed. So, my question is does the KH test doing Bonferroni correction automatically? if not, how? if yes, why not to refer in the article?

Thanks a million.

Best regards,
Tu Zeng


Dear Sergei,

I am very interested in the table 1 of above paper, and I also several basice questions aobut it.

1. Where can I find the AICc improvement in my KH test result file? for example from my attached file. Is it Delta AIC = 280.92?
2. Where can I find the LHS vs. RHS and RHS vs LHS. For example from my attached file.

Thank you very much!!

Sundy


Due to the character limitation, I deleted the "Tree aTree=[...]" and attached the entire file.


****************
Loaded 5 partitions

Sequences :10
Sites     :3666
Variable  :539


f(A) = 0.285291
f(C) = 0.227325
f(G) = 0.205476
f(T) = 0.281908


c-AIC = 17279.5
Log Likelihood = -8614.56089283971;
Shared Parameters:
GT=0.221445=0.221445
CT=1.90112=1.90112
CG=0.10546=0.10546
AT=0.471935=0.471935
AC=0.281374=0.281374
alpha=0.0102205=0.0102205
betaQ=3.72107=3.72107
betaP=4.31789=4.31789

Tree givenTree=(((Bat_SARS_273_DQ648856:0.0496989,((Bat_SARS_279_DQ648857:0.0282901,Bat_SARS_Rp_DQ071615:0.0244375)Node5:0.00575514,Bat_SARS_HKU3_DQ022305:0.0359314)Node4:0.0267917)Node2:0.0374558,3_HP03E_GZ02:0.000232858)Node1:0.000864494,(130_PC03_SZ13:0.000540745,(110_PC04_PC4_136:0.00192447,106_HP04_GZ0402:0.00137285)Node12:0.0016568)Node10:0.00110442,(15_HP03M_BJ02:0.000823405,31_HP03L_Tor2:0.000270703)Node15:0.000270703);
Mean divergence : 6.49111%


Fitting a single-tree, multiple partition model
Log Likelihood = -8578.32862962121;
Shared Parameters:
S_1=0.92014
S_2=0.382852
S_3=0.560115
S_4=0.375399
betaP=4.31789=4.31789
betaQ=3.72107=3.72107
alpha=0.0102205=0.0102205
AC=0.281374=0.281374
AT=0.471935=0.471935
CG=0.10546=0.10546
CT=1.90112=1.90112
GT=0.221445=0.221445

Tree tree_0=



Fitting a mutilple tree, multiple partition model
Log Likelihood = -8403.83189116324;
Shared Parameters:
betaP=4.31789=4.31789
betaQ=3.72107=3.72107
alpha=0.0102205=0.0102205
AC=0.281374=0.281374
AT=0.471935=0.471935
CG=0.10546=0.10546
CT=1.90112=1.90112
GT=0.221445=0.221445

Tree tree_0=


Versus the single partition model: c-AIC = 16998.6
Delta AIC = 280.92



Versus the single tree/multiple partition model: Delta AIC = 216.577

Partition 1 : 381 sites
Partition 2 : 343 sites
Partition 3 : 1080 sites
Partition 4 : 722 sites
Partition 5 : 1140 sites


Fitting tree 1 to partition 1
Log Likelihood = -1181.64638188419;
Shared Parameters:
betaP=4.31789=4.31789
betaQ=3.72107=3.72107
alpha=0.0102205=0.0102205
AC=0.281374=0.281374
AT=0.471935=0.471935
CG=0.10546=0.10546
CT=1.90112=1.90112
GT=0.221445=0.221445

Tree aTree=


Fitting tree 2 to partition 1
Log Likelihood = -1195.56564111684;
Shared Parameters:
betaP=4.31789=4.31789
betaQ=3.72107=3.72107
alpha=0.0102205=0.0102205
AC=0.281374=0.281374
AT=0.471935=0.471935
CG=0.10546=0.10546
CT=1.90112=1.90112
GT=0.221445=0.221445

Tree aTree=

KH Testing partition 1
Tree 2 base LRT = 27.8385. p-value = 0.025


Fitting tree 1 to partition 2
Log Likelihood = -977.448251979626;
Shared Parameters:
betaP=4.31789=4.31789
betaQ=3.72107=3.72107
alpha=0.0102205=0.0102205
AC=0.281374=0.281374
AT=0.471935=0.471935
CG=0.10546=0.10546
CT=1.90112=1.90112
GT=0.221445=0.221445

Tree aTree=


Fitting tree 2 to partition 2
Log Likelihood = -968.982471135205;
Shared Parameters:
betaP=4.31789=4.31789
betaQ=3.72107=3.72107
alpha=0.0102205=0.0102205
AC=0.281374=0.281374
AT=0.471935=0.471935
CG=0.10546=0.10546
CT=1.90112=1.90112
GT=0.221445=0.221445

Tree aTree=


Fitting tree 3 to partition 2
Log Likelihood = -972.312843728842;
Shared Parameters:
betaP=4.31789=4.31789
betaQ=3.72107=3.72107
alpha=0.0102205=0.0102205
AC=0.281374=0.281374
AT=0.471935=0.471935
CG=0.10546=0.10546
CT=1.90112=1.90112
GT=0.221445=0.221445

Tree aTree=

KH Testing partition 2
Tree 1 base LRT = 16.9316. p-value = 0.0001
Tree 3 base LRT = 6.66075. p-value = 0.0001


Fitting tree 2 to partition 3
Log Likelihood = -2258.19915763715;
Shared Parameters:
betaP=4.31789=4.31789
betaQ=3.72107=3.72107
alpha=0.0102205=0.0102205
AC=0.281374=0.281374
AT=0.471935=0.471935
CG=0.10546=0.10546
CT=1.90112=1.90112
GT=0.221445=0.221445

Tree aTree=


Fitting tree 3 to partition 3
Log Likelihood = -2244.90399830231;
Shared Parameters:
betaP=4.31789=4.31789
betaQ=3.72107=3.72107
alpha=0.0102205=0.0102205
AC=0.281374=0.281374
AT=0.471935=0.471935
CG=0.10546=0.10546
CT=1.90112=1.90112
GT=0.221445=0.221445

Tree aTree=


Fitting tree 4 to partition 3
Log Likelihood = -2285.99836857422;
Shared Parameters:
betaP=4.31789=4.31789
betaQ=3.72107=3.72107
alpha=0.0102205=0.0102205
AC=0.281374=0.281374
AT=0.471935=0.471935
CG=0.10546=0.10546
CT=1.90112=1.90112
GT=0.221445=0.221445

Tree aTree=

KH Testing partition 3
Tree 2 base LRT = 26.5903. p-value = 0.0001
Tree 4 base LRT = 82.1887. p-value = 0.0001


Fitting tree 3 to partition 4
Log Likelihood = -1640.47340028827;
Shared Parameters:
betaP=4.31789=4.31789
betaQ=3.72107=3.72107
alpha=0.0102205=0.0102205
AC=0.281374=0.281374
AT=0.471935=0.471935
CG=0.10546=0.10546
CT=1.90112=1.90112
GT=0.221445=0.221445

Tree aTree=


Fitting tree 4 to partition 4
Log Likelihood = -1632.78789828693;
Shared Parameters:
betaP=4.31789=4.31789
betaQ=3.72107=3.72107
alpha=0.0102205=0.0102205
AC=0.281374=0.281374
AT=0.471935=0.471935
CG=0.10546=0.10546
CT=1.90112=1.90112
GT=0.221445=0.221445

Tree aTree=

Fitting tree 5 to partition 4
Log Likelihood = -1639.94300504171;
Shared Parameters:
betaP=4.31789=4.31789
betaQ=3.72107=3.72107
alpha=0.0102205=0.0102205
AC=0.281374=0.281374
AT=0.471935=0.471935
CG=0.10546=0.10546
CT=1.90112=1.90112
GT=0.221445=0.221445

Tree aTree=

KH Testing partition 4
Tree 3 base LRT = 15.371. p-value = 0.0001
Tree 5 base LRT = 14.3102. p-value = 0.0001


Fitting tree 4 to partition 5
Log Likelihood = -2375.98828232171;
Shared Parameters:
betaP=4.31789=4.31789
betaQ=3.72107=3.72107
alpha=0.0102205=0.0102205
AC=0.281374=0.281374
AT=0.471935=0.471935
CG=0.10546=0.10546
CT=1.90112=1.90112
GT=0.221445=0.221445

Tree aTree=


Fitting tree 5 to partition 5
Log Likelihood = -2375.51119382618;
Shared Parameters:
betaP=4.31789=4.31789
betaQ=3.72107=3.72107
alpha=0.0102205=0.0102205
AC=0.281374=0.281374
AT=0.471935=0.471935
CG=0.10546=0.10546
CT=1.90112=1.90112
GT=0.221445=0.221445

Tree aTree=

KH Testing partition 5
Tree 4 base LRT = 0.954177. p-value = 0.0001
A total of 3/4 significant couplings
Mean splits identify: 0.26

***********************



http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=KH_test_001.zip (2 KB | )

Title: Re: How to confirm GARD result?
Post by Sergei on Feb 12th, 2009 at 2:29pm
Dear Sundy,

The first Delta AIC is the one you want (the second tests a slightly different model, and it is not mentioned in the paper).

The LHS vs RHS values are reported in fragments like this:


Code (]
KH Testing partition 1
Tree 2 base LRT = 27.8385. p-value = 0.025
[/code):



This specific output states that the KH p-value for the LHS (partition 1) vs the tree from the RHS (partition 2) is 0.025. Hence this would be the LHS v RHS value from the paper. Looking further down, you will find

[code]
KH Testing partition 2
Tree 1 base LRT = 16.9316. p-value = 0.0001


which states that the RHS (partition 2) has KH p-value of 0.0001 vs the tree from the partition 1 (LHS) -- this is the RHS vs LHS value.

If you have multiple breakpoints, a pair of p-values like this will be reported for each breakpoint.

HTH,
Sergei


Title: Re: How to confirm GARD result?
Post by Sundy on Feb 12th, 2009 at 2:59pm
Dear Sergei,

Thank you so much!
I got it
Cheers,

Sundy

HyPhy message board » Powered by YaBB 2.5.2!
YaBB Forum Software © 2000-2024. All Rights Reserved.