positive selection analysis (Read 5970 times)
lun
YaBB Newbies
*
Offline


Curious HyPhy user

Posts: 11
positive selection analysis
Jan 30th, 2012 at 11:53pm
 
Dear Sergei,

I have an alignment of 9 sequences of a gene isolated from 9 virus strains sampled in different years:
4 sequences in 2005-2006
3 sequences in 2007-2008
2 sequences in 2010-2011

All of these sequences share over 90% nucleotide identity, and the 2 sequences from 2010-2011 show more substitutions than the others. I used SLAC, FEL and REL to look for sites on this gene that came under positive selection over these years. Only REL reported positively selected sites: around 60 (posterior probability > 90%) when all 9 strains were tested. (REL reported no positively selected sites when the 2 strains from 2010-2011 were removed.)

So it seemed logical to me to test the sequences further using Branch-site REL, which found that the branch leading to the 2 sequences from 2010-2011 is under episodic diversifying selection at p <= 0.05.


My question is:

1. Can I conclude that the gene of the virus isolates from 2010-2011 is under some episodic selection, for example due to functional constraints on the encoded protein? As this test was only recently implemented in datamonkey, I have found few references on how to report and interpret the results.

2. Is it true that REL is more powerful for small numbers of sequences and for low-divergence sequences? Should I also report the SLAC and FEL results in my manuscript, even though neither test found any positively selected sites?

Thanks for answering.

lun
 
lun
Re: positive selection analysis
Reply #1 - Jan 31st, 2012 at 12:19am
 
Dear Sergei,

One more question: I would like to run a recombination analysis with GARD on these 9 virus isolates together with other related viruses. Do you think it is better to test this using the multiple alignment or in a pairwise manner?

Thanks.

Best regards,
lun
 
konrad
Junior Member
**
Offline


I love YaBB 1G - SP1!

Posts: 53
Re: positive selection analysis
Reply #2 - Jan 31st, 2012 at 11:46am
 
Dear Lun,

Thanks for your questions:

> 1. Can I conclude that the gene of the virus isolates in 2010-2011 is under some episodic selection

Yes.

> 2. Is it true that REL is more powerful for small numbers of sequences and for low-divergence sequences?

No. Detection of individual sites under positive selection using REL is unstable in the presence of heterotachy - when heterotachy is present, adding sequences may either increase or decrease the number of hits. We currently have a method under submission which corrects for this using the same idea as branch-site REL (we could email you a draft manuscript if you're interested). In the meantime, we recommend just using branch-site REL.

> Should I also report the SLAC and FEL results in my manuscript, even though neither test found any positively selected sites?

Since you know that heterotachy is present, we recommend only using branch-site REL.

> I would like to run a recombination analysis with GARD on these 9 virus isolates together with other related viruses. Do you think it is better to test this using the multiple alignment or in a pairwise manner?

GARD is intended to be used with multiple alignments.

Hope this helps,
Konrad Scheffler
 
lun
Re: positive selection analysis
Reply #3 - Feb 1st, 2012 at 12:37am
 
Dear Dr. Konrad Scheffler,

Thank you for your helpful advice.

Regarding the REL question, I have been reading the Methods section of the paper "Differential stepwise evolution of SARS coronavirus functional proteins in different host species."

It claims that "REL method is often the only method that can infer selection from small (5-15 sequences) or low divergence alignments and tends to be the most powerful of the three tests.", but sadly without a supporting reference.

Actually, I am also analysing the spike gene, and the positively selected sites (predicted by REL) are mostly found within the N-terminal region. I think this is an interesting finding, although the selection test might not be reliable given the limited number of sequences. As the spike gene encodes a protein that interacts with the host, and the region containing the predicted positively selected sites is involved in binding, could the presence of heterotachy be an exception in this case?

To clarify the GARD question: I have an alignment of 9 sequences from the same species (species A). If I want to test for recombination with other closely related species, say species B, C and D, should I run the analysis on a single alignment containing species A, B, C and D together? Or should I run it three times, once for each of the other species?

Thank you very much for answering my questions.

Best regards,
Lun


 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: positive selection analysis
Reply #4 - Feb 1st, 2012 at 12:19pm
 
Hi Lun,

For 9 sequences regardless of the method there is generally very little power to say anything about any particular sites. You can use BranchSiteREL to test whether or not a proportion of sites at a given lineage have been subjected to selection, but you won't have any confidence that any particular site is selected. You can run MEME on datamonkey.org if you are determined to try to find specific sites under selection: this method extends FEL to deal with heterotachy (as suggested by Konrad). Generally, I would advise you AGAINST reporting any specific sites under selection when you have 9 closely related sequences -- you should try to make statements about proportions of sites (BranchSiteREL or REL), e.g. there is evidence that X% of sites are evolving with dN/dS > 1. You can also supply a plot of site-wise dN/dS estimates (which datamonkey outputs), e.g. by REL, to indicate trends of spatial localization.
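The idea of showing trends of spatial localization from site-wise dN/dS estimates can be sketched in Python as a sliding-window average. This is only an illustration under stated assumptions: the `site_dnds` values are made up, the window size is arbitrary, and the function name `sliding_window_mean` is hypothetical. Nothing here reproduces Datamonkey's own output format or plots.

```python
from statistics import mean

def sliding_window_mean(values, window=5):
    """Return (centre_site, mean) pairs for each full window of `window` sites.

    Sites are numbered from 1, as codon positions usually are.
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    half = window // 2
    out = []
    for start in range(len(values) - window + 1):
        centre = start + half + 1  # 1-based index of the window's centre site
        out.append((centre, mean(values[start:start + window])))
    return out

# Hypothetical site-wise dN/dS estimates (NOT real data)
site_dnds = [0.2, 0.1, 1.8, 2.5, 2.1, 0.3, 0.2, 0.1, 0.4, 0.2]
smoothed = sliding_window_mean(site_dnds, window=3)

# Window with the highest average dN/dS
peak_site, peak_value = max(smoothed, key=lambda pair: pair[1])
```

A peak in the smoothed values only hints at where elevated dN/dS clusters along the gene; with so few sequences it should not be read as evidence about any individual site.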

The quote from the SARS paper can be a bit misleading -- REL can pool signal from multiple sites to boost power to detect selection somewhere in the gene, but you still won't get great precision/power at any given site (see [link omitted], page 26):

The REL approach can be used to test whether positive selection operated on a proportion of sites in an alignment

As far as GARD is concerned, please perform the analysis with all 4 species, but don't include too many sequences per species. If you think there is no INTRA-species recombination, I would include one sequence from each of A, B, C and D. See [link omitted], page 8.
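Picking one sequence per species before a GARD run could look roughly like the Python sketch below. It assumes FASTA input and a made-up header convention of `species|isolate`; the helper names `parse_fasta` and `one_per_species` are hypothetical, and the sequences are placeholders rather than real data.

```python
def parse_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def one_per_species(records, species_of):
    """Keep the first record seen for each species, preserving input order."""
    seen, kept = set(), []
    for header, seq in records:
        species = species_of(header)
        if species not in seen:
            seen.add(species)
            kept.append((header, seq))
    return kept

# Hypothetical alignment; headers use an assumed "species|isolate" convention
fasta = """>A|v1
ATGGCC
>A|v2
ATGGCA
>B|x1
ATGGTC
>C|y1
ATGACC
>D|z1
ATGGCT
"""
subset = one_per_species(parse_fasta(fasta), lambda h: h.split("|")[0])
subset_headers = [h for h, _ in subset]
```

Taking the first sequence per species is an arbitrary choice made for the sketch; any representative would do if there is no intra-species recombination.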

Sergei


Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
 
lun
Re: positive selection analysis
Reply #5 - Feb 1st, 2012 at 7:16pm
 
Many thanks to Konrad and Sergei.

There seems to be a problem with the model selection process on datamonkey. I ran the automatic model selection tool and found that 010012 is the best model for my alignment, so I modified the matrix accordingly:

AC  1  AC
     AC  1
         GT

But it turns out that the model fitted in phase 1 is 010015 when I run SLAC or MEME. It seems to me that this problem did not occur a few weeks ago. Is it a bug, or did I do something wrong in the analysis?

Best regards,
Alan
 
lun
Re: positive selection analysis
Reply #6 - Feb 2nd, 2012 at 2:44am
 
Dear Sergei,

Can I run the Branch-site REL and MEME analyses with a user-defined ML tree on datamonkey? If not, I might follow the previous workshop material to run BSR. For MEME, would you mind briefly describing how to run it in HyPhy? Which batch file should I use? Thanks.

Best regards,
Lun
 
Sergei
Re: positive selection analysis
Reply #7 - Feb 2nd, 2012 at 8:17am
 
Hi Lun,

Both methods are available on DataMonkey.

Sergei

 
Sergei
Re: positive selection analysis
Reply #8 - Feb 2nd, 2012 at 8:18am
 
Hi Alan,

010012 and 010015 specify the same model (the 6-digit string is not unique); see the box on page 16 of [link omitted].
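One way to see why the two strings describe the same model is to relabel each string by the order in which its rate classes first appear. The Python sketch below does that; the function name `canonical_model` is hypothetical, and the position order AC, AG, AT, CG, CT, GT is an assumption that is consistent with the matrix posted earlier in this thread.

```python
def canonical_model(spec):
    """Relabel a 6-character rate-class string by order of first appearance.

    Positions are assumed to follow the order AC, AG, AT, CG, CT, GT.
    Two strings that group the six rates identically get the same result.
    """
    if len(spec) != 6:
        raise ValueError("expected a 6-character model string")
    labels = {}
    out = []
    for ch in spec:
        if ch not in labels:
            # First time this class is seen: give it the next free label
            labels[ch] = str(len(labels))
        out.append(labels[ch])
    return "".join(out)

# Both strings put AC, AT, CG in one class, AG, CT in a second, GT in a third
same = canonical_model("010012") == canonical_model("010015")
```

Only the grouping of the six rates matters, not the digit used to name each group, which is why the last digit can be 2 or 5 without changing the model.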

Sergei
 
lun
Re: positive selection analysis
Reply #9 - Feb 3rd, 2012 at 8:54pm
 
Dear Sergei,

I think datamonkey only allows the user to use an NJ tree? I ran into a problem when I added 3 additional sequences (v11-13, the same virus species as v1-10, but from a different host). The results from the NJ and ML trees differ, though both show that two branches are under positive selection (attached file).

Both results also show a much stronger w+ on the branch of the viruses from 2010-2011 (attached file). Could the other branch detected as being under positive selection be a false positive caused by the influence of the viruses from 2010-2011?

Having seen this difference between NJ and ML, I would also like to try MEME with the ML tree in HyPhy. But which batch file should I use?

Best regards,
Alan


[Attachment: 141 KB; link omitted]
 
Sergei
Re: positive selection analysis
Reply #10 - Feb 3rd, 2012 at 9:37pm
 
Hi Alan,

See [link omitted] for the HyPhy instructions.

You should be able to supply the ML tree to Datamonkey as well. Just include it as a part of the alignment file, e.g. as shown in the examples on pages [link omitted].
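Including the tree in the alignment file might be prepared along the lines of the Python sketch below. It rests on assumptions not confirmed in this thread: FASTA input, and a Newick tree placed on its own line after the sequences, so the exact layout should be checked against the examples referred to above. The helper names are hypothetical and the data are placeholders.

```python
import re

def newick_leaf_names(newick):
    """Extract leaf names from a simple Newick string (no quoted names)."""
    # A leaf name directly follows "(" or "," and runs up to ":", ",", ")" or ";"
    return re.findall(r"[(,]\s*([^\s():,;]+)", newick)

def alignment_with_tree(fasta_text, newick):
    """Return FASTA text followed by the Newick tree on its own line.

    Raises ValueError if the tree leaves and sequence names do not match.
    """
    seq_names = [line[1:].strip() for line in fasta_text.splitlines()
                 if line.startswith(">")]
    leaves = newick_leaf_names(newick)
    if sorted(seq_names) != sorted(leaves):
        raise ValueError("tree leaves do not match sequence names")
    return fasta_text.rstrip("\n") + "\n" + newick.strip() + "\n"

# Hypothetical three-sequence alignment and ML tree (NOT real data)
fasta = ">v1\nATGGCC\n>v2\nATGGCA\n>v3\nATGACC\n"
tree = "((v1:0.01,v2:0.02):0.03,v3:0.05);"
combined = alignment_with_tree(fasta, tree)
```

The name check is there because a mismatch between tree leaves and sequence names is an easy mistake to make when the ML tree was built in a separate program.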

Sergei