 
FEL vs MEME and sampling strategy
streicker
YaBB Newbies
Feed your monkey!
Posts: 4
FEL vs MEME and sampling strategy
Sep 29th, 2011 at 7:34am
 
Hi, I'm working on a virus dataset that is characterized by many historical host shifts among many different host species, which now each maintain mostly independent epidemiological cycles that are subject to purifying selection. I have datasets for 3 genes and I am interested in identifying sites under positive selection which might be signals of host adaptation. There are multiple virus sequences for each host-specific virus lineage for each gene.

If I analyze these datasets under FEL (or REL or SLAC), it seems that including multiple representatives per viral lineage would overestimate dS, since any selection would already have happened on the branches leading to each clade. MEME seems like a good solution here, because selection would be expected to be episodic over time. My first question: does that sound like a reasonable approach, or would it be better to reduce each gene's dataset to a single representative sequence per clade (limiting the purifying selection happening within host-associated clades) and then use site-only methods?
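A minimal sketch of the down-sampling option described above, assuming each sequence has already been assigned to a host-associated clade. The function name `one_per_clade`, the sequence IDs, and the clade labels are all hypothetical. It simply keeps the first sequence listed for each clade; in practice a more deliberate choice (for example, the most complete sequence) may be preferable.

```python
from typing import Dict, List


def one_per_clade(clade_of: Dict[str, str]) -> List[str]:
    """Return one sequence ID per clade (hypothetical helper).

    clade_of maps each sequence ID to its host-associated clade label.
    Dicts keep insertion order (Python 3.7+), so the first ID listed
    for a clade is the one retained.
    """
    chosen: Dict[str, str] = {}
    for seq_id, clade in clade_of.items():
        # setdefault records only the first sequence seen for each clade
        chosen.setdefault(clade, seq_id)
    return list(chosen.values())


# Hypothetical IDs and clade labels, for illustration only
assignments = {
    "seq01": "clade_A",
    "seq02": "clade_A",
    "seq03": "clade_B",
    "seq04": "clade_C",
    "seq05": "clade_C",
}
print(one_per_clade(assignments))  # ['seq01', 'seq03', 'seq04']
```

The reduced ID list would then be used to subset the alignment and tree for each gene before rerunning the site-level analyses.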

Second, I have run both FEL and MEME on the full dataset, and MEME identifies more sites under selection than FEL, which makes sense to me. However, a few sites are identified as being under positive selection (p < 0.1) by FEL but not by MEME. How could a site be under selection when averaged across all branches, but not episodically when only a subset of branches is considered? If there is a way to reconcile these results, I would appreciate any advice.

Thanks,
Daniel

Sergei
YaBB Administrator
Datamonkeys are forever...
Posts: 1658
UCSD
Gender: male
Re: FEL vs MEME and sampling strategy
Reply #1 - Sep 30th, 2011 at 12:12pm
 
Hi Daniel,

The cases where FEL finds selection and MEME does not should not happen; I suspect this could be an optimization bug. Could you link to Datamonkey result pages where this behavior has occurred?

MEME should be able to pick up episodic selection regardless of how many clade representatives you include; it is not very good at detecting selective sweeps (i.e. a single substitution followed by fixation), but this should not be a problem in your case.

Sergei

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
 
streicker
Re: FEL vs MEME and sampling strategy
Reply #2 - Sep 30th, 2011 at 2:14pm
 
Hi Sergei,

Thanks for your quick reply. I am not allowed to post links, but the upload number for both the FEL and MEME jobs is: 21233339386635.1

However, I took a closer look at the runs for this dataset and for another dataset with apparently conflicting results. In each case, the FEL result was just barely significant (p < 0.099) and the MEME result just barely non-significant (p = 0.10; codon 357). So maybe the two methods are not so different after all?

More generally, this raises the question of p-value thresholds for these analyses. Are there guidelines, based on sample size, for what threshold is appropriate?

Thanks,
Daniel
 
Sergei
Re: FEL vs MEME and sampling strategy
Reply #3 - Sep 30th, 2011 at 2:24pm
 
Hi Daniel,

Indeed, for codon 357 MEME is doing essentially the same thing as FEL (the MEME mixture distribution collapses to a single rate class), so this behavior is normal.
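The collapse can be sketched as follows. The notation is ours, an assumption rather than something stated in the thread: α is the site's synonymous rate, β⁻ and β⁺ are MEME's two nonsynonymous rate classes, and p⁺ is the weight of the β⁺ class, drawn independently for each branch b.

```latex
% MEME, per site (our notation): each branch b draws its nonsynonymous rate
\beta_b =
\begin{cases}
\beta^{-} \le \alpha, & \text{with probability } 1 - p^{+},\\
\beta^{+}, & \text{with probability } p^{+},
\end{cases}
\qquad \text{independently for each branch } b

% Single-class collapse: p^{+} \in \{0, 1\} \ \text{or} \ \beta^{+} = \beta^{-}
\beta_b = \beta \quad \text{for every branch } b
```

With every branch sharing one β, the site is described by a single (α, β) pair, which is the parameterization FEL fits.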

For MEME, p-values closely track Type I error rates (we are almost done with the manuscript where this is shown), whereas FEL is very conservative, i.e., its Type I error rates are often much lower than the nominal p-values.
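To make "conservative" concrete: on simulated sites with no positive selection, a well-calibrated test called at threshold alpha should flag roughly a fraction alpha of them, while a conservative test flags fewer. The sketch below computes that empirical rate. The function name is hypothetical, and the p-values are made up for illustration; they are not FEL or MEME output.

```python
from typing import Sequence


def empirical_type_i_error(null_pvalues: Sequence[float], alpha: float) -> float:
    """Fraction of null sites (no positive selection simulated) called significant at alpha."""
    if not null_pvalues:
        raise ValueError("need at least one p-value")
    hits = sum(1 for p in null_pvalues if p <= alpha)
    return hits / len(null_pvalues)


# Made-up p-values from 10 hypothetical null simulations, NOT real FEL/MEME results
calibrated = [0.03, 0.12, 0.25, 0.31, 0.44, 0.52, 0.67, 0.74, 0.88, 0.95]
conservative = [0.21, 0.35, 0.48, 0.57, 0.66, 0.72, 0.81, 0.88, 0.93, 0.99]

print(empirical_type_i_error(calibrated, 0.1))    # 0.1: matches the nominal level
print(empirical_type_i_error(conservative, 0.1))  # 0.0: fewer false positives than nominal
```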

I would recommend a significance threshold of p = 0.05 for MEME and p = 0.1 for FEL.
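A small sketch of applying these method-specific cut-offs. The thresholds come from the recommendation above. The helper name, the "p at or below the cut-off" convention, and the p-values are assumptions, chosen only to resemble the borderline FEL/MEME case reported earlier in the thread.

```python
# Cut-offs recommended in the thread; a judgment call, not software defaults
THRESHOLDS = {"MEME": 0.05, "FEL": 0.1}


def is_significant(method: str, p_value: float) -> bool:
    """True if p_value is at or below the cut-off for this method (hypothetical helper)."""
    return p_value <= THRESHOLDS[method]


# Hypothetical p-values for a single codon, resembling the borderline case above
site_pvalues = {"FEL": 0.098, "MEME": 0.10}
calls = {method: is_significant(method, p) for method, p in site_pvalues.items()}
print(calls)  # {'FEL': True, 'MEME': False}
```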

Sergei
 
streicker
Re: FEL vs MEME and sampling strategy
Reply #4 - Sep 30th, 2011 at 2:25pm
 
Great. Thanks for clearing that up.

Daniel
 
streicker
Re: FEL vs MEME and sampling strategy
Reply #5 - Jan 6th, 2012 at 10:52am
 
Hi again Sergei,

I was wondering whether the MEME paper is at a stage where it could be shared.

Specifically, I am having some doubts about sites that were found to be evolving under significant negative selection by FEL but significant positive selection by MEME. I'm not quite sure which to believe and can't find any papers that have used the MEME approach.

Thanks,
Daniel
 
Sergei
Re: FEL vs MEME and sampling strategy
Reply #6 - Jan 6th, 2012 at 11:04am
 
Hi Daniel,

Sure; please send me an e-mail and I can send you what we have currently submitted. What you see is perfectly normal, though: bursts of positive selection are often followed by extensive conservation. Averaged over all branches (FEL), this yields dN/dS < 1, while MEME correctly assigns a small proportion of branches to a class with dN > dS.
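A crude numerical illustration of this point. It uses a toy branch-weighted average of dN/dS, which is NOT how FEL actually estimates rates, and the proportions and rates are hypothetical.

```python
def branch_averaged_omega(p_plus: float, omega_plus: float, omega_minus: float) -> float:
    """Toy branch-weighted mean dN/dS at one site; NOT FEL's actual estimator."""
    if not 0.0 <= p_plus <= 1.0:
        raise ValueError("p_plus must be a proportion between 0 and 1")
    # A fraction p_plus of branches sit in the burst class; the rest are conserved
    return p_plus * omega_plus + (1.0 - p_plus) * omega_minus


# Hypothetical site: 5% of branches with dN/dS = 10, the other 95% with dN/dS = 0.2
avg = branch_averaged_omega(p_plus=0.05, omega_plus=10.0, omega_minus=0.2)
print(round(avg, 3))  # 0.69: below 1 even though some branches have dN/dS = 10
```

FEL estimates a single dN and dS per site jointly over the whole tree, so the real calculation also depends on branch lengths. The intuition is similar, though: a short burst on a few branches is diluted by conservation on the rest.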

Sergei