HyPhy message board | |
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Datamonkey Server >> Datamonkey feedback >> How to confirm GARD result? http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1228187857 Message started by Sundy on Dec 1st, 2008 at 7:17pm |
Title: How to confirm GARD result? Post by Sundy on Dec 1st, 2008 at 7:17pm
Hi!
I am using GARD to check recombination for my data. After running GARD on Datamonkey, two breakpoints were found. My question is: How to verify they are real recombination or not? The three inferred segment topologies are really different, but I can not figure out which sequence was recombinated from which. So I don't know whether these breakpoints are real recombination sites. I checked some references, some said Shimodaira and Hasegawa test was used. but they did not say how to perform this test. Could anyone tell me how to perform SH test? Is there any program to run this test? I don't have too much knowledge about statistics. It is difficult for me to understand the orginial paper about SH test (Shimodaira, H., and M. Hasegawa. 1999. Multiple comparisons of loglikelihoods with applications to phylogenetic inference. Mol. Biol. Evol. 16:1114–1116.) Thanks! Sundy |
Title: Re: How to confirm GARD result? Post by Sundy on Dec 2nd, 2008 at 9:05am
Dear Sergei,
Thank you very much for your so quick reply. |
Title: Re: How to confirm GARD result? Post by Sundy on Dec 2nd, 2008 at 11:41am
Dear Sergei,
Sorry to trouble you again. I am trying to use the HyPhy Standard Analyses>Recombination>GARDProcessor.bf> "Please load a nucleotide data file" After I loaded my sequences data. It required me "please load a GA partition analysis result file" My question is "How can I get the GA partition analysis result file"? Sorry, I am a new learner of HyPhy. Thanks. Sundy |
Title: Re: How to confirm GARD result? Post by Sergei on Dec 2nd, 2008 at 1:12pm
Dear Sundy,
It's a file produced by GARD. If you run the online version, it is the one accessible via the [Raw] link, otherwise it is one of the files (ending in splits) output by GARD.bf Cheers, Sergei |
Title: Re: How to confirm GARD result? Post by Sergei on Dec 2nd, 2008 at 10:21pm
Dear Sundy,
You are looking at the correct output for KH tests. The trees actually appear to be identical (up to some really short internal branches). If GARD returned a significant result, this is due to different branch lengths between partitions, not different topologies. This could be due to ancient recombination events, or due to spatial rate variation. Cheers, Sergei |
Title: Re: How to confirm GARD result? Post by Sundy on Dec 4th, 2008 at 1:28pm
Dear Sergei,
Thank you very much for your reply and interpretation. I analyzed another data set of mine. The datamonkey analysis indicated there are 16 breakpoints. Then I run HyPhy using the spilts file. And I got the results as attached file. Last part as following: +++++++++ KH Testing partition 15 Tree 14 base LRT = 0.875094. p-value = 0.0001 A total of 14/14 significant couplings Mean splits identify: 0.0571429 +++++++++++++++++++++++++++++++ I found that all the p-value = 0.0001 (KH test for 15 partition) Does this mean that all the breakpoints may be real recombination sites? If it is right, there are too many recombination events. Or do I need check some other parameters to verify which breakpoints are really significant? Thank you very mcuh! Best regards, Sundy http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=KH_test.zip (3 KB | )
|
Title: Re: How to confirm GARD result? Post by Sergei on Dec 7th, 2008 at 6:06pm
Dear Sundy,
Based on the information you provide, all breakpoints are indeed significant (in the KH sense at least). Not sure what else you can do easily - why do you think there are too many breakpoints? Cheers, Sergei |
Title: Re: How to confirm GARD result? Post by Sundy on Dec 10th, 2008 at 11:13am
Dear Sergei,
Thank you very much for your replay. I am just thinking 16 breakpoints means there are 16 times of recombination. It is strange. I have another question: If GARD identified only one breakpoint, could KH test verify the significant by using HYPHY? I have a dataset as attached file, I run GARD and GARDProcessor.bf, and got the results as attached file "30 SARS-Spike HYPHY.txt" The last part is as following: KH Testing partition 1 A total of 0/0 significant couplings Mean splits identify: -nan I am confused with the "0/0 significant couplings" Is it significant or not? thanks, and was the meaning of "Mean splits identify: -nan" I know sometime it is a number such as Mean splits identify: 0.116667" Best regards, Sundy |
Title: Re: How to confirm GARD result? Post by Sergei on Dec 10th, 2008 at 11:15am
Dear Sundy,
I think that GARDProcessor.bf may be mishandling the case of one breakpoint at reporting stage. Can you send me your files so that I can confirm and fix the issue? Cheers, Sergei |
Title: Re: How to confirm GARD result? Post by Sundy on Dec 10th, 2008 at 11:15am
I forgot the attachment
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=30S.zip (6 KB | )
|
Title: Re: How to confirm GARD result? Post by Sundy on Dec 10th, 2008 at 11:25am
Dear Sergei,
Thank you very much. Actually, I have three datasets as the attached file. I am trying to detect the recombination for them. Could you pleas help me to check? I would greatly appreciate! Best regards, Sundy http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=For_recombination_test.zip (21 KB | )
|
Title: Re: How to confirm GARD result? Post by Sundy on Dec 10th, 2008 at 11:34am
Because the sequences of Bat-SARS are quite difference with the other sequences, I tried to performed GARD for 4 BAT-SARS sequences and other sequences separately.
I got lots of breakpoints for 4 BAT-SARS sequences in each dataset, and only 1 or 0 breakpoint for the other sequences. Thank you very much! Sundy |
Title: Re: How to confirm GARD result? Post by Sergei on Dec 11th, 2008 at 8:19am
Dear Sundy,
The reason you were getting 'nan' with GARDProcessor.bf was because it expects the _splits file to start with two blank likes (for no other reason as to be compatible with an old formatting quirk in GARD) and if they are not there - it will not read the data correctly. E.g. here is the _splits file for your G4-SARS-59S file. Quote:
It is actually quite common to lose power in breakpoint detection as you increase the number of sequences. Think of it as signal dilution: 1/4 recombinant sequences in a 4-taxon analysis is more obvious than 1/50 recombinant sequences in a 50-taxon analysis (of course it depends on the sequences involved, and how strong the recombination signal is). Also, while it is possible to infer a 4-taxon tree from a short sequence (e.g. 100bp) somewhat accurately, doing so for a 50-taxon tree is less likely - hence you lose breakpoints in the noise. Cheers, Sergei HTH, Sergei |
Title: Re: How to confirm GARD result? Post by Sundy on Dec 11th, 2008 at 9:46am
Dear Sergei,
Thank you very much for your interpretation. Yes, I see. I think I should delete some of the very similar sequence in my datasets, then run GARD. Thanks. So "it expects the _splits file to start with two blank likes" Does this means that if there is only one breakpoint, GARDProcessor.bf cannot verify the significant of the breakpoint? And another question, After running GARD, the result web-page has a such sentence: "To process RAW files you need to download this HyPhy batchfile, run it locally in HyPhy and input the raw file when prompted." Could you please tell how to use the "HyPhy batchfile". Do I need use the "HyPhy batchfile" when I run GARDProcessor.bf? Thank you! Best regards, Sundy |
Title: Re: How to confirm GARD result? Post by Sergei on Dec 11th, 2008 at 3:21pm
Dear Sundy,
You can run GARDProcessor.bf on the output with a single breakpoint: just make sure the _splits file starts with two blanks lines. GARDProcessor.bf is the same as the batchfile that Datamonkey instructs you do download. Cheers, Sergei |
Title: Re: How to confirm GARD result? Post by Sundy on Dec 11th, 2008 at 6:19pm
Dear Sergei,
Thank you so much!! I miss understood your meaning and I get the idea now. :) Best regards, Sundy |
Title: Re: How to confirm GARD result? Post by Sundy on Dec 11th, 2008 at 6:47pm
Dear Sergei,
I got another problem (as enclosed jpg), when I run a data as attached file with GARDProcessor.bf. Could you please help me check it? Thank you!! Sundy http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=6Spike.zip (58 KB | )
|
Title: Re: How to confirm GARD result? Post by kevintz on Jan 12th, 2009 at 3:21pm
This site is really helpful. THanks.
I have two question. 18 sequences were analyzed by GARD for recombination breakpoints. 6 breakpoints were found. Then I use the raw file, splits file, and recombinationprocessor for KH testing in HyPhy. When testing the last partition 7, the P-value is 0.1809, while other 6 partitions p-value for KH testing all <0.001, but results showed "A total of 4/6 significant couplings". One partition is gone, 6 left, should it be 5 breakpoints? From websites, it is said that AICc>10 is significant, but I found breakpoints with an improvement of AIC of 2.7 could be significant, because p-value of either side of sequences for KH testing is less than 0.001. So, I want to know how I can make sure the results? See AICc or just see P-value? Thanks. I am a fresh on this statistic field. Thanks in advance. |
Title: Re: How to confirm GARD result? Post by Sergei on Jan 12th, 2009 at 3:39pm
Dear kevintz,
A AIC-c improvement of 10 is only a guideline (and probably a rather conservative one), hence it's no surprise that a KH test shows topological discordance for a smaller AIC-c. You can have a major AIC-c improvement but no topological difference due to differing branch lengths (imagine that a part of the alignment evolves 10 times faster than the rest of the alignment - for example a hypervariable region in a gene), hence the need to confirm discordance using the KH test. A strong c-AIC difference coupled with low KH p-values is probably the most reliable indicator that there indeed is recombination in the sample. As far as your example goes, there should indeed be 5/6 couplings if no other KH test is insignificant. Can you perhaps paste in the entire output of the processor file for me to look at? Sergei Cheers, Sergei |
Title: Re: How to confirm GARD result? Post by kevintz on Jan 13th, 2009 at 1:42am
Thanks for your reply.
the following is the file. http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=KHtest_of_GARD.bf (26 KB | )
|
Title: Re: How to confirm GARD result? Post by kevintz on Jan 13th, 2009 at 6:55am
Another question, GARD for breakpoints showed 2 breakpoints with AICc value of 45.53, and 63.1. KH test indicates good evidence of the two breakpoints. I submitted the GARD results directly to datamonkey by press the link below the results, but the datamonkey gave just one partition. What happened? I am confused.
Thanks. Best regards |
Title: Re: How to confirm GARD result? Post by Sergei on Jan 13th, 2009 at 7:34am
Dear kevinz,
I am afraid I don't understand the question. How did you obtain the two breakpoints originally before submitting them to Datamonkey? Sergei |
Title: Re: How to confirm GARD result? Post by kevintz on Jan 13th, 2009 at 7:59am
Sorry for confusion. I use another dataset for GARD analysis, and get two breakpoints.
|
Title: Re: How to confirm GARD result? Post by kevintz on Jan 13th, 2009 at 7:59am |
Title: Re: How to confirm GARD result? Post by Sergei on Jan 13th, 2009 at 8:07am kevintz wrote on Jan 13th, 2009 at 1:42am:
Another p-value (partition 5, 0.0014) becomes greater than the threshold of 0.01 following a Bonferroni correction for multiple tests; hence that partition is judged insignificantly supported. Cheers, Sergei |
Title: Re: How to confirm GARD result? Post by Sergei on Jan 13th, 2009 at 8:10am kevintz wrote on Jan 13th, 2009 at 7:59am:
This happened because Datamonkey removed a duplicate sequence from the alignment and the trees provided by GARD now had too many sequences. I meant to add some code for Datamonkey to handle these cases automatically and trim removed species from the tree but never got around to it. Please resubmit the analysis to GARD with the reduced alignment and all should work well. Cheers, Sergei |
Title: Re: How to confirm GARD result? Post by kevintz on Jan 13th, 2009 at 8:24am Sergei wrote on Jan 13th, 2009 at 8:10am:
Does that mean I should remove the duplicate sequence, and then use left sequences for GARD analysis and followed datamonkey selection analysis? |
Title: Re: How to confirm GARD result? Post by Sergei on Jan 13th, 2009 at 12:50pm |
Title: Re: How to confirm GARD result? Post by kevintz on Jan 13th, 2009 at 2:32pm
Dear Sergi,
I get it. The program runs well. Really thanks. One more basic question, does the AICc value refer to that from GARD result profile, or from the KH test result file? I find both of them include a AICc, but with different value. Which one is better? Best regards, kevintz |
Title: Re: How to confirm GARD result? Post by Sergei on Jan 14th, 2009 at 12:35pm
Dear kevintz,
KH test file should always have a somewhat better AICc than the GARD file, because GARD only estimates several model parameters (e.g. nucleotide biases and site-to-site rate variation parameters) only once (assuming no recombination) to speed up the run and the KH analysis estimates those parameters assuming recombination. The difference in AICc should not be very large however. Sergei |
Title: Re: How to confirm GARD result? Post by kevintz on Jan 14th, 2009 at 11:01pm
THanks
|
Title: Re: How to confirm GARD result? Post by kevintz on Jan 18th, 2009 at 10:41am Sergei wrote on Jan 13th, 2009 at 8:07am:
Dear Sergei, Sorry to bother you again. :) :) I read your article on journal Gene 397 (2007) 38-50, entitled "Evolution of the interferon alpha gene family in eutherian mammals", and found Table 1 refered to KH tesing with all the P-value <0.01. While in my data set, the KH testing resluts show the p-value much smaller as far as 0.001. As you said, Bonferroni correction might be performed. So, my question is does the KH test doing Bonferroni correction automatically? if not, how? if yes, why not to refer in the article? Thanks a million. Best regards, Tu Zeng |
Title: Re: How to confirm GARD result? Post by Sergei on Jan 18th, 2009 at 5:33pm
Dear Tu Zeng,
The numbers in the paper are quoted following a Bonferroni correction. The KH test provides raw numbers for the p-values, but includes a Bonferroni correction when counting the number of significant partitions. HTH, Sergei |
Title: Re: How to confirm GARD result? Post by kevintz on Jan 20th, 2009 at 2:38am Sergei wrote on Jan 18th, 2009 at 5:33pm:
Thanks it helps me a lot. And How I can know the value of the Bonferroni correction through Hyphy or KH test? whether it set to 0.05 or 0.01 after KH test? Thanks a lot. Best, kevintz. PS: tuzeng is my name, sorry for confusion. |
Title: Re: How to confirm GARD result? Post by Sergei on Jan 20th, 2009 at 9:27am
Dear tuzeng,
The processor file considers a Bonferroni-corrected p<=0.01 to be significant. Cheers, Sergei |
Title: Re: How to confirm GARD result? Post by Sundy on Feb 12th, 2009 at 7:48am kevintz wrote on Jan 18th, 2009 at 10:41am:
Dear Sergei, I am very interested in the table 1 of above paper, and I also several basice questions aobut it. 1. Where can I find the AICc improvement in my KH test result file? for example from my attached file. Is it Delta AIC = 280.92? 2. Where can I find the LHS vs. RHS and RHS vs LHS. For example from my attached file. Thank you very much!! Sundy Due to the character limitation, I deleted the "Tree aTree=[...]" and attached the entire file. **************** Loaded 5 partitions Sequences :10 Sites :3666 Variable :539 f(A) = 0.285291 f(C) = 0.227325 f(G) = 0.205476 f(T) = 0.281908 c-AIC = 17279.5 Log Likelihood = -8614.56089283971; Shared Parameters: GT=0.221445=0.221445 CT=1.90112=1.90112 CG=0.10546=0.10546 AT=0.471935=0.471935 AC=0.281374=0.281374 alpha=0.0102205=0.0102205 betaQ=3.72107=3.72107 betaP=4.31789=4.31789 Tree givenTree=(((Bat_SARS_273_DQ648856:0.0496989,((Bat_SARS_279_DQ648857:0.0282901,Bat_SARS_Rp_DQ071615:0.0244375)Node5:0.00575514,Bat_SARS_HKU3_DQ022305:0.0359314)Node4:0.0267917)Node2:0.0374558,3_HP03E_GZ02:0.000232858)Node1:0.000864494,(130_PC03_SZ13:0.000540745,(110_PC04_PC4_136:0.00192447,106_HP04_GZ0402:0.00137285)Node12:0.0016568)Node10:0.00110442,(15_HP03M_BJ02:0.000823405,31_HP03L_Tor2:0.000270703)Node15:0.000270703); Mean divergence : 6.49111% Fitting a single-tree, multiple partition model Log Likelihood = -8578.32862962121; Shared Parameters: S_1=0.92014 S_2=0.382852 S_3=0.560115 S_4=0.375399 betaP=4.31789=4.31789 betaQ=3.72107=3.72107 alpha=0.0102205=0.0102205 AC=0.281374=0.281374 AT=0.471935=0.471935 CG=0.10546=0.10546 CT=1.90112=1.90112 GT=0.221445=0.221445 Tree tree_0= Fitting a mutilple tree, multiple partition model Log Likelihood = -8403.83189116324; Shared Parameters: betaP=4.31789=4.31789 betaQ=3.72107=3.72107 alpha=0.0102205=0.0102205 AC=0.281374=0.281374 AT=0.471935=0.471935 CG=0.10546=0.10546 CT=1.90112=1.90112 GT=0.221445=0.221445 Tree tree_0= Versus the single partition model: c-AIC = 16998.6 Delta AIC = 280.92 Versus the single tree/multiple partition model: Delta AIC = 216.577 Partition 1 : 381 sites Partition 2 : 343 sites Partition 3 : 1080 sites Partition 4 : 722 sites Partition 5 : 1140 sites Fitting tree 1 to partition 1 Log Likelihood = -1181.64638188419; Shared Parameters: betaP=4.31789=4.31789 betaQ=3.72107=3.72107 alpha=0.0102205=0.0102205 AC=0.281374=0.281374 AT=0.471935=0.471935 CG=0.10546=0.10546 CT=1.90112=1.90112 GT=0.221445=0.221445 Tree aTree= Fitting tree 2 to partition 1 Log Likelihood = -1195.56564111684; Shared Parameters: betaP=4.31789=4.31789 betaQ=3.72107=3.72107 alpha=0.0102205=0.0102205 AC=0.281374=0.281374 AT=0.471935=0.471935 CG=0.10546=0.10546 CT=1.90112=1.90112 GT=0.221445=0.221445 Tree aTree= KH Testing partition 1 Tree 2 base LRT = 27.8385. p-value = 0.025 Fitting tree 1 to partition 2 Log Likelihood = -977.448251979626; Shared Parameters: betaP=4.31789=4.31789 betaQ=3.72107=3.72107 alpha=0.0102205=0.0102205 AC=0.281374=0.281374 AT=0.471935=0.471935 CG=0.10546=0.10546 CT=1.90112=1.90112 GT=0.221445=0.221445 Tree aTree= Fitting tree 2 to partition 2 Log Likelihood = -968.982471135205; Shared Parameters: betaP=4.31789=4.31789 betaQ=3.72107=3.72107 alpha=0.0102205=0.0102205 AC=0.281374=0.281374 AT=0.471935=0.471935 CG=0.10546=0.10546 CT=1.90112=1.90112 GT=0.221445=0.221445 Tree aTree= Fitting tree 3 to partition 2 Log Likelihood = -972.312843728842; Shared Parameters: betaP=4.31789=4.31789 betaQ=3.72107=3.72107 alpha=0.0102205=0.0102205 AC=0.281374=0.281374 AT=0.471935=0.471935 CG=0.10546=0.10546 CT=1.90112=1.90112 GT=0.221445=0.221445 Tree aTree= KH Testing partition 2 Tree 1 base LRT = 16.9316. p-value = 0.0001 Tree 3 base LRT = 6.66075. p-value = 0.0001 Fitting tree 2 to partition 3 Log Likelihood = -2258.19915763715; Shared Parameters: betaP=4.31789=4.31789 betaQ=3.72107=3.72107 alpha=0.0102205=0.0102205 AC=0.281374=0.281374 AT=0.471935=0.471935 CG=0.10546=0.10546 CT=1.90112=1.90112 GT=0.221445=0.221445 Tree aTree= Fitting tree 3 to partition 3 Log Likelihood = -2244.90399830231; Shared Parameters: betaP=4.31789=4.31789 betaQ=3.72107=3.72107 alpha=0.0102205=0.0102205 AC=0.281374=0.281374 AT=0.471935=0.471935 CG=0.10546=0.10546 CT=1.90112=1.90112 GT=0.221445=0.221445 Tree aTree= Fitting tree 4 to partition 3 Log Likelihood = -2285.99836857422; Shared Parameters: betaP=4.31789=4.31789 betaQ=3.72107=3.72107 alpha=0.0102205=0.0102205 AC=0.281374=0.281374 AT=0.471935=0.471935 CG=0.10546=0.10546 CT=1.90112=1.90112 GT=0.221445=0.221445 Tree aTree= KH Testing partition 3 Tree 2 base LRT = 26.5903. p-value = 0.0001 Tree 4 base LRT = 82.1887. p-value = 0.0001 Fitting tree 3 to partition 4 Log Likelihood = -1640.47340028827; Shared Parameters: betaP=4.31789=4.31789 betaQ=3.72107=3.72107 alpha=0.0102205=0.0102205 AC=0.281374=0.281374 AT=0.471935=0.471935 CG=0.10546=0.10546 CT=1.90112=1.90112 GT=0.221445=0.221445 Tree aTree= Fitting tree 4 to partition 4 Log Likelihood = -1632.78789828693; Shared Parameters: betaP=4.31789=4.31789 betaQ=3.72107=3.72107 alpha=0.0102205=0.0102205 AC=0.281374=0.281374 AT=0.471935=0.471935 CG=0.10546=0.10546 CT=1.90112=1.90112 GT=0.221445=0.221445 Tree aTree= Fitting tree 5 to partition 4 Log Likelihood = -1639.94300504171; Shared Parameters: betaP=4.31789=4.31789 betaQ=3.72107=3.72107 alpha=0.0102205=0.0102205 AC=0.281374=0.281374 AT=0.471935=0.471935 CG=0.10546=0.10546 CT=1.90112=1.90112 GT=0.221445=0.221445 Tree aTree= KH Testing partition 4 Tree 3 base LRT = 15.371. p-value = 0.0001 Tree 5 base LRT = 14.3102. p-value = 0.0001 Fitting tree 4 to partition 5 Log Likelihood = -2375.98828232171; Shared Parameters: betaP=4.31789=4.31789 betaQ=3.72107=3.72107 alpha=0.0102205=0.0102205 AC=0.281374=0.281374 AT=0.471935=0.471935 CG=0.10546=0.10546 CT=1.90112=1.90112 GT=0.221445=0.221445 Tree aTree= Fitting tree 5 to partition 5 Log Likelihood = -2375.51119382618; Shared Parameters: betaP=4.31789=4.31789 betaQ=3.72107=3.72107 alpha=0.0102205=0.0102205 AC=0.281374=0.281374 AT=0.471935=0.471935 CG=0.10546=0.10546 CT=1.90112=1.90112 GT=0.221445=0.221445 Tree aTree= KH Testing partition 5 Tree 4 base LRT = 0.954177. p-value = 0.0001 A total of 3/4 significant couplings Mean splits identify: 0.26 *********************** http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=KH_test_001.zip (2 KB | )
|
Title: Re: How to confirm GARD result? Post by Sergei on Feb 12th, 2009 at 2:29pm
Dear Sundy,
The first Delta AIC is the one you want (the second tests a slightly different model, and it is not mentioned in the paper). The LHS vs RHS values are reported in fragments like this: Code (] KH Testing partition 1 Tree 2 base LRT = 27.8385. p-value = 0.025 [/code):
which states that the RHS (partition 2) has KH p-value of 0.0001 vs the tree from the partition 1 (LHS) -- this is the RHS vs LHS value. If you have multiple breakpoints, a pair of p-values like this will be reported for each breakpoint. HTH, Sergei |
Title: Re: How to confirm GARD result? Post by Sundy on Feb 12th, 2009 at 2:59pm
Dear Sergei,
Thank you so much! I got it Cheers, Sundy |
HyPhy message board » Powered by YaBB 2.5.2! YaBB Forum Software © 2000-2024. All Rights Reserved. |