HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Methodology Questions >> How to >> positive selection analysis
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1327996407

Message started by lun on Jan 30th, 2012 at 11:53pm

Title: positive selection analysis
Post by lun on Jan 30th, 2012 at 11:53pm
Dear Sergei,

I have an alignment of 9 sequences of a gene isolated from 9 virus strains sampled in different years:
4 sequences in 2005-2006
3 sequences in 2007-2008
2 sequences in 2010-2011

All of these sequences share over 90% nucleotide identity, and the 2 sequences from 2010-2011 show more substitutions. I used SLAC, FEL and REL to look for positively selected sites that appeared in this gene over these years. Only REL reported positively selected sites, around 60 (posterior probability > 90%), when all 9 strains were tested. (REL reported no positively selected sites when the 2 strains from 2010-2011 were removed.)

So it seemed logical to me to test the sequences further using Branch-site REL, and I found that the branch leading to the 2 sequences from 2010-2011 is under episodic diversifying selection at p <= 0.05.


My questions are:

1. Can I conclude that the gene of the virus isolates in 2010-2011 is under some episodic selection, for example related to functional constraints on the encoded protein? As this test was only recently implemented in Datamonkey, I found few references on reporting and interpreting the results.

2. Is it true that REL is more powerful when testing small numbers of sequences and low-divergence sequences? Should I also report the SLAC and FEL results in my manuscript, even though neither test found any positively selected sites?

Thanks for answering.

lun

Title: Re: positive selection analysis
Post by lun on Jan 31st, 2012 at 12:19am
Dear Sergei,

One more question: I would like to perform a recombination analysis on these 9 virus isolates and other related viruses using GARD. Do you think it is better to run it on the multiple alignment or in a pairwise manner?

Thanks.

Best regards,
lun

Title: Re: positive selection analysis
Post by konrad on Jan 31st, 2012 at 11:46am
Dear Lun,

Thanks for your questions:

> 1. Can I conclude that the gene of the virus isolates in 2010-2011 is under some episodic selection

Yes.

> 2. Is it true that REL is more powerful when testing small numbers of sequences and low-divergence sequences?

No. Detection of individual sites under positive selection using REL is unstable in the presence of heterotachy - when heterotachy is present, adding sequences may either increase or decrease the number of hits. We currently have a method under submission which corrects for this using the same idea as branch-site REL (we could email you a draft manuscript if you're interested). In the meantime, we recommend just using branch-site REL.

> Should I also report the SLAC and FEL results in my manuscript, even though neither test found any positively selected sites?

Since you know that heterotachy is present, we recommend only using branch-site REL.

> I would like to perform a recombination analysis on these 9 virus isolates and other related viruses using GARD. Do you think it is better to run it on the multiple alignment or in a pairwise manner?

GARD is intended to be used with multiple alignments.

Hope this helps,
Konrad Scheffler

Title: Re: positive selection analysis
Post by lun on Feb 1st, 2012 at 12:37am
Dear Dr. Konrad Scheffler,

Thank you for your helpful advice.

Regarding the REL question, I read the following in the Methods section of the paper "Differential stepwise evolution of SARS coronavirus functional proteins in different host species".

It claims that "REL method is often the only method that can infer selection from small (5-15 sequences) or low divergence alignments and tends to be the most powerful of the three tests.", but sadly without a supporting reference.

Actually, I am also analysing the spike gene, and the positively selected sites (predicted by REL) are mostly located within the N-terminal region. I think this is an interesting finding, although the selection test might not be reliable because of the limited number of sequences. As the spike gene encodes a protein that interacts with the host, and the region with the predicted positively selected sites is involved in binding, could the presence of heterotachy be an exception in this case?

To clarify the GARD question: I have an alignment of 9 sequences derived from the same species (species A). If I want to test for recombination with other closely related species, say species B, C and D, should I run the analysis on an alignment containing species A, B, C and D together? Or should I run it three times, once for each of the other species?

Thank you very much for answering my questions.

Best regards,
Lun



Title: Re: positive selection analysis
Post by Sergei on Feb 1st, 2012 at 12:19pm
Hi Lun,

For 9 sequences regardless of the method there is generally very little power to say anything about any particular sites. You can use BranchSiteREL to test whether or not a proportion of sites at a given lineage have been subjected to selection, but you won't have any confidence that any particular site is selected. You can run MEME on datamonkey.org if you are determined to try to find specific sites under selection: this method extends FEL to deal with heterotachy (as suggested by Konrad). Generally, I would advise you AGAINST reporting any specific sites under selection when you have 9 closely related sequences -- you should try to make statements about proportions of sites (BranchSiteREL or REL), e.g. there is evidence that X% of sites are evolving with dN/dS > 1. You can also supply a plot of site-wise dN/dS estimates (which datamonkey outputs), e.g. by REL, to indicate trends of spatial localization.
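A minimal sketch of the "proportion of sites" statement, assuming the fitted REL-style model is available as a list of discrete rate classes, each with a dN value, a dS value and a weight (the estimated fraction of sites in that class). The `RateClass` type, the function name and the numbers are hypothetical placeholders for illustration, not output from Datamonkey.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RateClass:
    """One discrete rate class: dN, dS and the fraction of sites in the class."""
    dn: float
    ds: float
    weight: float


def proportion_under_selection(classes):
    """Sum the weights of all classes with dN/dS > 1.

    dN > dS is compared directly instead of dividing, so a class with
    dS == 0 does not cause a division by zero.
    """
    total = sum(c.weight for c in classes)
    if abs(total - 1.0) > 1e-6:
        raise ValueError(f"class weights must sum to 1, got {total}")
    return sum(c.weight for c in classes if c.dn > c.ds)


# Hypothetical rate classes, for illustration only.
example_classes = [
    RateClass(dn=0.1, ds=1.0, weight=0.70),  # strong purifying selection
    RateClass(dn=0.8, ds=1.0, weight=0.25),  # weak purifying selection
    RateClass(dn=3.0, ds=1.0, weight=0.05),  # dN/dS > 1
]
share = proportion_under_selection(example_classes)
print(f"{share:.0%} of sites are estimated to evolve with dN/dS > 1")
```

The result is a statement about the alignment as a whole; it says nothing about which individual sites belong to the selected class.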

The quote from the SARS paper can be a bit misleading -- REL can pool signal from multiple sites to boost power to detect selection somewhere in the gene, but you still won't get great precision/power at any given site (see [link not shown], page 26):

> The REL approach can be used to test whether positive selection operated on a proportion of sites in an alignment

As far as GARD is concerned, please perform the analysis with all 4 species, but don't include too many sequences per species. If you think there is no INTRA-species recombination, I would include one sequence each from A, B, C and D. See [link not shown], page 8.

Sergei


Title: Re: positive selection analysis
Post by lun on Feb 1st, 2012 at 7:16pm
Many thanks to Konrad and Sergei.

There seems to be a problem with the model selection process on Datamonkey. I ran the automatic model selection tool and found that 010012 is the best model for my alignment, so I modified the matrix accordingly:

AC  1  AC
     AC  1
         GT

But it turns out that the model fitted in phase 1 is 010015 when I run SLAC or MEME. It seems to me that this problem did not occur a few weeks ago. Is it a bug, or did I do something wrong in the analysis?
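The mapping from a six-digit model string to the rate matrix above can be sketched as follows. This assumes the digits refer to the rates in the order AC, AG, AT, CG, CT, GT, and that the class containing AG is fixed to 1 as the reference rate, which is consistent with the matrix shown in this post. The function names are hypothetical.

```python
NUCLEOTIDES = "ACGT"
# Assumed order of the six substitution rates in the model string.
RATE_ORDER = ["AC", "AG", "AT", "CG", "CT", "GT"]


def rate_labels(model):
    """Map each nucleotide pair to the parameter label implied by the model string."""
    if len(model) != 6 or not model.isdigit():
        raise ValueError("model must be a string of six digits")
    reference_class = model[1]  # class of AG, assumed to be fixed to 1
    first_in_class = {}
    labels = {}
    for pair, digit in zip(RATE_ORDER, model):
        # Name every class after the first rate that belongs to it.
        first_in_class.setdefault(digit, pair)
        labels[pair] = "1" if digit == reference_class else first_in_class[digit]
    return labels


def upper_triangle(model):
    """Return the upper triangle of the rate matrix, one row per nucleotide A, C, G."""
    labels = rate_labels(model)
    return [
        [labels[a + b] for b in NUCLEOTIDES[i + 1:]]
        for i, a in enumerate(NUCLEOTIDES[:-1])
    ]


for row in upper_triangle("010012"):
    print(" ".join(row))
```

Under these assumptions, `upper_triangle("010012")` and `upper_triangle("010015")` give the same rows, because only the grouping of the digits matters and not the digits themselves.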

Best regards,
Alan

Title: Re: positive selection analysis
Post by lun on Feb 2nd, 2012 at 2:44am
Dear Sergei,

Can I run the Branch-site REL and MEME analyses with a user-defined ML tree on Datamonkey? If not, I might follow the previous workshop material to run BSR. But for MEME, would you mind briefly describing how to run it in HyPhy? Which batch file should I use? Thanks.

Best regards,
Lun

Title: Re: positive selection analysis
Post by Sergei on Feb 2nd, 2012 at 8:17am
Hi Lun,

Both methods are available on DataMonkey.

Sergei



Title: Re: positive selection analysis
Post by Sergei on Feb 2nd, 2012 at 8:18am
Hi Alan,

010012 and 010015 specify the same model (the 6-digit string is not unique); see the box on page 16 of [link not shown].
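One way to check that two model strings describe the same model is to relabel the digits in order of first appearance and compare the results. This is a minimal sketch; `canonical_model` is a hypothetical helper, not a HyPhy or Datamonkey function.

```python
def canonical_model(model):
    """Relabel the digits of a six-digit model string by order of first appearance."""
    if len(model) != 6 or not model.isdigit():
        raise ValueError("model must be a string of six digits")
    relabel = {}
    canonical = []
    for digit in model:
        if digit not in relabel:
            # The next unused label is the number of classes seen so far.
            relabel[digit] = str(len(relabel))
        canonical.append(relabel[digit])
    return "".join(canonical)


def same_model(first, second):
    """Two strings specify the same model if they group the six rates identically."""
    return canonical_model(first) == canonical_model(second)


print(canonical_model("010015"))
print(same_model("010012", "010015"))
```

Both strings put the first, third and fourth rates in one class, the second and fifth in another, and the sixth in a class of its own, so they reduce to the same canonical form.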

Sergei

Title: Re: positive selection analysis
Post by lun on Feb 3rd, 2012 at 8:54pm
Dear Sergei,

I think Datamonkey only allows the user to use an NJ tree? I ran into a problem when I added 3 additional sequences (v11-13, the same virus species as v1-10, but from a different host). The results from the NJ and ML trees differ, though both show that two branches are under positive selection (attached file).

Both results also show a much stronger w+ on the branch of the virus from 2010-2011 (attached file). Could the other branch detected as being under positive selection be a false positive caused by the influence of the virus from 2010-2011?

Since I see this difference between NJ and ML, I would also like to try MEME with the ML tree in HyPhy. But which batch file should I use?

Best regards,
Alan




http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=BSR.pdf (141 KB)

Title: Re: positive selection analysis
Post by Sergei on Feb 3rd, 2012 at 9:37pm
Hi Alan,

See [link not shown] for the HyPhy instructions.

You should be able to supply the ML tree to Datamonkey as well. Just include it as a part of the alignment file, e.g. as shown in the examples on pages [link not shown].
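A rough sketch of preparing such a file, assuming the alignment is in FASTA format and that the tree can be supplied as a Newick string on a line after the last sequence. The exact layout Datamonkey expects should be checked against the linked examples; the function names here are hypothetical. The sketch also checks that the leaf names in the tree match the sequence names, since a mismatch is an easy mistake to make when the tree is built in another program.

```python
import re


def fasta_names(fasta_text):
    """Return the sequence names from the '>' header lines of a FASTA string."""
    return [line[1:].strip() for line in fasta_text.splitlines() if line.startswith(">")]


def newick_leaf_names(newick):
    """Return leaf labels: names that directly follow '(' or ',' in the Newick string.

    Simplification: quoted labels containing spaces or punctuation are not handled.
    """
    return re.findall(r"[(,]\s*([^(),:;\s]+)", newick)


def alignment_with_tree(fasta_text, newick):
    """Append a Newick tree to a FASTA alignment after checking that the names agree."""
    sequences = set(fasta_names(fasta_text))
    leaves = set(newick_leaf_names(newick))
    if sequences != leaves:
        mismatched = sorted(sequences ^ leaves)
        raise ValueError(f"names differ between alignment and tree: {mismatched}")
    tree = newick.strip()
    if not tree.endswith(";"):
        tree += ";"
    return fasta_text.rstrip("\n") + "\n" + tree + "\n"


# Hypothetical toy alignment and tree, for illustration only.
toy_fasta = ">v1\nATGAAA\n>v2\nATGAAG\n>v3\nATGCAA\n"
toy_tree = "((v1:0.1,v2:0.2):0.05,v3:0.3)"
print(alignment_with_tree(toy_fasta, toy_tree))
```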

Sergei
