HyPhy message board

Dear Datamonkey population,
I am using GARD (and SCUEAL, but this post is about GARD) to analyze a dataset of ~130 HIV subtype B env sequences from individuals around the globe. When I run the entire dataset through GARD, there are no significantly supported breakpoints, BUT when I look at a distinct subpopulation (~35 taxa from the same geographic location), there are 4-6 significantly supported breakpoints (depending on the p-value threshold I use). In contrast, when I look at the remaining sequences (~95 taxa), there are 0 statistically supported breakpoints. Am I correct in thinking that, when the entire dataset is analyzed, the lack of breakpoints in the 95-taxon dataset "muffles" the signal I see when looking only at the distinct subpopulation? Or could there be an ascertainment bias where breakpoints are more likely to be found in a smaller dataset? Thank you!

-Crystal 8-)

GARD: whole vs partial dataset
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1308177463
Message started by CrystalH on Jun 15th, 2011 at 3:37pm