HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
positive selection analysis: http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1327996407
Message started by lun on Jan 30th, 2012 at 11:53pm
Title: positive selection analysis Post by lun on Jan 30th, 2012 at 11:53pm
Dear Sergei,
I have an alignment of 9 sequences of a gene isolated from 9 virus strains sampled in different years:
4 sequences from 2005-2006
3 sequences from 2007-2008
2 sequences from 2010-2011

All of these sequences share over 90% nucleotide identity, and the 2 sequences from 2010-2011 show more substitutions. I used SLAC, FEL and REL to look for sites in this gene that came under positive selection over these years. Only REL detected any: around 60 positively selected sites (posterior probability > 90%) when all 9 strains were tested. (REL showed no positively selected sites when the 2 strains from 2010-2011 were removed.) So it seemed logical to me to test the sequences further with branch-site REL, which found that the branch with the 2 sequences from 2010-2011 is under episodic diversifying selection at p <= 0.05.

My questions are:
1. Can I conclude that the gene in the 2010-2011 isolates is under some episodic selection, for example related to constraints on the function of the encoded protein? As this test is newly implemented in Datamonkey, I have found few references on how to report and interpret its results.
2. Is it true that REL is more powerful when testing a small number of sequences, or sequences of low divergence? Should I also report the results of SLAC and FEL in my manuscript, even though neither test found any positively selected sites?

Thanks for answering.
lun
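For readers unfamiliar with the "posterior probability > 90%" cutoff: REL-type methods report, for each site, an empirical Bayes posterior probability that the site belongs to a rate class with ω > 1, and a site is called when that probability exceeds the cutoff. The sketch below is a minimal illustration assuming a model with a few discrete ω classes whose proportions and per-site log-likelihoods have already been estimated; the function name and all numbers are hypothetical and are not HyPhy output.

```python
import math


def posterior_positive(omegas, weights, site_loglik):
    """Empirical Bayes posterior P(omega > 1 | site data) for one site.

    Hypothetical inputs (assumed already estimated by a REL-style fit):
    omegas      -- omega value of each rate class
    weights     -- prior proportion of sites in each class (positive, sum to 1)
    site_loglik -- log-likelihood of this site's alignment column under each class
    """
    # log(prior * likelihood) for each class
    terms = [math.log(w) + ll for w, ll in zip(weights, site_loglik)]
    # Log-sum-exp normalisation avoids underflow from very small likelihoods
    m = max(terms)
    total = sum(math.exp(t - m) for t in terms)
    posteriors = [math.exp(t - m) / total for t in terms]
    # Sum the posterior mass of classes under positive selection (omega > 1)
    return sum(p for p, om in zip(posteriors, omegas) if om > 1.0)


# Hypothetical three-class model and one site's per-class log-likelihoods
p = posterior_positive([0.1, 1.0, 3.0], [0.6, 0.3, 0.1], [-20.0, -19.0, -16.0])
print(round(p, 3), p > 0.9)
```

Here the posterior is about 0.79, so this hypothetical site would not pass a 90% cutoff even though the ω > 1 class fits it best; the prior weights pull it down.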
Title: Re: positive selection analysis Post by lun on Jan 31st, 2012 at 12:19am
Dear Sergei,
One more question: I would like to perform a recombination analysis of these 9 virus isolates together with other related viruses using GARD. Do you think it is better to test this using the multiple alignment or in a pairwise manner?
Thanks.
Best regards,
lun
Title: Re: positive selection analysis Post by konrad on Jan 31st, 2012 at 11:46am
Dear Lun,
Thanks for your questions:

> 1. Can I conclude that the gene of the virus isolates in 2010-2011 is under some episodic selection

Yes.

> 2. Is it true that REL is more powerful on testing on small number of sequences and on low divergence of sequences?

No. Detection of individual sites under positive selection using REL is unstable in the presence of heterotachy (selection pressure at a site that varies across branches of the tree): when heterotachy is present, adding sequences may either increase or decrease the number of hits. We currently have a method under submission that corrects for this using the same idea as branch-site REL (we could email you a draft manuscript if you're interested). In the meantime, we recommend just using branch-site REL.

> Should I also report the results of SLAC and FEL in my manuscript though both test found no positively selected sites?

Since you know that heterotachy is present, we recommend using only branch-site REL.

> I would like to perform recombination analysis among these 9 isolates of virus and other related virus using GARD. Do you think it is better to test it using the multiple alignment or in pairwise manner?

GARD is intended to be used with multiple alignments.

Hope this helps,
Konrad Scheffler
Title: Re: positive selection analysis Post by lun on Feb 1st, 2012 at 12:37am
Dear Dr. Konrad Scheffler,
Thank you for your helpful advice. Regarding the REL question, I have read in the Methods section of the paper "Differential stepwise evolution of SARS coronavirus functional proteins in different host species" the claim that the "REL method is often the only method that can infer selection from small (5-15 sequences) or low divergence alignments and tends to be the most powerful of the three tests.", but unfortunately it gives no supporting reference. In fact, I am also analysing the spike gene, and the positively selected sites predicted by REL are mostly located within the N-terminal region. I think this is an interesting finding, although the selection test might not be reliable given the limited number of sequences? As the spike gene encodes a protein that interacts with the host, and the region containing the predicted positively selected sites is involved in binding, could this case be an exception with respect to heterotachy?

To clarify the GARD question: I have an alignment of 9 sequences derived from the same species (species A). If I want to test for recombination with other closely related species, say species B, C and D, should I run the analysis on a single alignment containing species A, B, C and D together, or three times separately, once for each species?

Thank you very much for answering my questions.
Best regards,
Lun
Title: Re: positive selection analysis Post by lun on Feb 1st, 2012 at 7:16pm
Many thanks to Konrad and Sergei.
It seems there is a problem with the model selection process on Datamonkey. I ran the automatic model selection tool and found that 010012 is the best model for my alignment, so I modified the matrix accordingly as: AC 1 AC AC 1 GT. But it turns out that the model fitted in phase 1 is 010015 when I run SLAC or MEME. As far as I can tell, this problem did not occur a few weeks ago. Is it a bug, or did I do something wrong in the analysis?
Best regards,
Alan
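One way to compare the two model strings is to decode which nucleotide rates share a parameter. The sketch below assumes the common convention in which the six digits label, in order, the AC, AG, AT, CG, CT and GT rates (under which 010010 is HKY85 and 012345 is the general reversible model), and that equal digits mean a shared rate. The helper names are hypothetical. Under this assumption 010012 and 010015 group the rates identically; the sketch does not establish how Datamonkey itself reports or compares the strings.

```python
# Assumption: the six digits label, in this order, the AC, AG, AT, CG, CT and GT rates;
# positions with the same digit share one rate parameter.
RATE_ORDER = ("AC", "AG", "AT", "CG", "CT", "GT")


def rate_groups(model: str) -> list:
    """Hypothetical helper: list the groups of rates that share a parameter."""
    if len(model) != 6 or not model.isdigit():
        raise ValueError("expected a six-digit model string such as '010010'")
    groups = {}
    for rate, label in zip(RATE_ORDER, model):
        groups.setdefault(label, []).append(rate)
    # dicts keep insertion order, so groups appear in order of first use
    return list(groups.values())


def canonical(model: str) -> str:
    """Hypothetical helper: relabel digits by first appearance, e.g. '010015' -> '010012'."""
    mapping = {}
    for ch in model:
        if ch not in mapping:
            mapping[ch] = str(len(mapping))
    return "".join(mapping[ch] for ch in model)


for m in ("010012", "010015"):
    print(m, canonical(m), rate_groups(m))
```

Both strings decode to the groups {AC, AT, CG}, {AG, CT} and {GT}, which matches the matrix entries AC 1 AC AC 1 GT given above.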
Title: Re: positive selection analysis Post by lun on Feb 2nd, 2012 at 2:44am
Dear Sergei,
Can I run the branch-site REL and MEME analyses with a user-defined ML tree on Datamonkey? If not, I might follow the earlier workshop material to run BSR. For MEME, would you mind briefly describing how to run it in HyPhy? Which batch file should I use?
Thanks.
Best regards,
Lun
Title: Re: positive selection analysis Post by Sergei on Feb 2nd, 2012 at 8:17am
Hi Lun,
Both methods are available on DataMonkey.
Sergei
Title: Re: positive selection analysis Post by Sergei on Feb 2nd, 2012 at 8:18am
Title: Re: positive selection analysis Post by lun on Feb 3rd, 2012 at 8:54pm
Dear Sergei,
I think Datamonkey only allows users to use an NJ tree? I ran into a problem when I added 3 additional sequences (v11-13, the same virus species as v1-10 but from a different host). The results with the NJ and ML trees are different, although both show two branches under positive selection (attached file). Both results also show a much stronger ω+ on the branch of the 2010-2011 viruses (attached file). Could the other branch detected as under positive selection be a false positive caused by the influence of the 2010-2011 viruses? Because of this difference between NJ and ML, I would also like to try MEME with the ML tree in HyPhy. Which batch file should I use?
Best regards,
Alan
Attachment: BSR.pdf (141 KB) http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=BSR.pdf