HyPhy message board - Print Page

Hi, I'm working on a virus dataset that is characterized by many historical host shifts among many different host species, which now each maintain mostly independent epidemiological cycles that are subject to purifying selection. I have datasets for 3 genes and I am interested in identifying sites under positive selection which might be signals of host adaptation. There are multiple virus sequences for each host-specific virus lineage for each gene.

If I analyze these datasets under FEL (or REL or SLAC), it seems like I would overestimate dS by including multiple representatives per viral lineage (since any selection would have already happened on the branches leading to that clade). It seems like MEME would be a good solution here since selection would be expected to be episodic over time. My first question is whether that sounds like a reasonable approach or would it be better to reduce the dataset to have only 1 representative sequence per clade per gene to reduce the purifying selection happening within host-associated clades and use site-only methods?

Second, I have tried both FEL and MEME on the full dataset, and the latter identifies more sites under selection than FEL, which makes sense to me. However, there are a few sites that are identified as being under positive selection (p<0.1) by FEL, but not by MEME. Is there an explanation for how a site could be under selection when averaging across all branches, but not episodically when only a subset of branches are considered? If there is a way to come to some consensus on this, I would appreciate any advice.

Thanks,
Daniel

Hi Daniel,

The cases where FEL finds selection and MEME does not should not happen; I suspect this could be an optimization bug. Could you link to Datamonkey result pages where this behavior has occurred?

MEME should be able to pick up episodic selection regardless of how many clade representatives you include; it is not very good at detecting selective sweeps (i.e. a single substitution followed by fixation), but this should not be a problem in your case.

Sergei

Hi Sergei,

Thanks for your quick reply. I am not allowed to post links, but the upload number for both the FEL and MEME jobs is: 21233339386635.1

However, I had a closer look at the results for each run for this dataset and another that had apparently conflicting results and in each case the FEL result was just barely significant (p<0.099) and the MEME result was just barely non-significant (p=0.10; codon 357). So maybe not so different after all?

More generally, this brings up the issue of p values for these analyses. Are there guidelines based on sample size for what is appropriate?

Thanks,
Daniel

Hi Daniel,

Indeed, for codon 357 MEME is basically doing exactly the same thing as FEL (the mixture distribution for MEME collapses to a single rate class), so the behavior is normal.
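To illustrate what "collapses to a single rate class" means, here is a minimal sketch. It assumes the usual description of MEME as a per-branch mixture of two nonsynonymous rate classes (beta- and beta+) with weights q- and 1 - q-. The function name and the likelihood values are hypothetical and are not taken from HyPhy or from the Datamonkey job.

```python
def branch_mixture_likelihood(q_minus, lik_beta_minus, lik_beta_plus):
    """Likelihood of one branch under a two-class mixture.

    q_minus        -- weight of the beta- class (hypothetical name)
    lik_beta_minus -- branch likelihood if the branch evolves at beta-
    lik_beta_plus  -- branch likelihood if the branch evolves at beta+
    """
    if not 0.0 <= q_minus <= 1.0:
        raise ValueError("q_minus must lie in [0, 1]")
    return q_minus * lik_beta_minus + (1.0 - q_minus) * lik_beta_plus


# Illustrative numbers only (assumptions, not real output).
lik_minus = 0.020
lik_plus = 0.035

# Case 1: all weight on one class, so the second class is never used.
collapsed = branch_mixture_likelihood(1.0, lik_minus, lik_plus)

# Case 2: both classes share the same rate, so the weight no longer matters.
same_rate = branch_mixture_likelihood(0.4, lik_minus, lik_minus)

print(collapsed, same_rate)
```

In both cases the mixture reduces to a single-class likelihood, which is the situation in which MEME and FEL are fitting essentially the same model at a site.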

For MEME, p-values closely track Type I errors (we are almost done with the manuscript where this is shown), but FEL is very conservative, i.e. its Type I errors are often much lower than the p-values.

I would recommend p = 0.05 for MEME and p = 0.1 for FEL.
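As a rough sketch of how these method-specific cutoffs could be applied to per-site results: the codon 357 values are the ones quoted earlier in this thread, while the other codons, the dictionary layout, and the function name are made up for illustration. Treating p equal to the cutoff as significant is also an assumption.

```python
# Cutoffs recommended in the post above.
THRESHOLDS = {"MEME": 0.05, "FEL": 0.1}


def significant_sites(method, p_values):
    """Return the sorted codon numbers whose p-value passes the method's cutoff.

    p_values maps codon number -> p-value (hypothetical layout).
    """
    cutoff = THRESHOLDS[method]
    # Assumption: a p-value exactly at the cutoff counts as significant.
    return sorted(site for site, p in p_values.items() if p <= cutoff)


# Codon 357 uses the values from this thread; codons 12 and 88 are invented.
fel_p = {12: 0.30, 88: 0.04, 357: 0.099}
meme_p = {12: 0.01, 88: 0.03, 357: 0.10}

print("FEL :", significant_sites("FEL", fel_p))
print("MEME:", significant_sites("MEME", meme_p))
```

With these inputs codon 357 passes the FEL cutoff but not the stricter MEME cutoff, which is the kind of borderline disagreement discussed above.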

Sergei

Great. Thanks for clearing that up.

Daniel

Hi again Sergei,

I was curious whether the paper on MEME was at a stage where it could be shared?

Specifically, I am having some doubts about sites that were found to be evolving under significant negative selection by FEL but significant positive selection by MEME. I'm not quite sure which to believe and can't find any papers that have used the MEME approach.

Thanks,
Daniel

Hi Daniel,

Sure; please send me an e-mail and I can send you what we currently have submitted. What you see is perfectly normal, though: bursts of positive selection are often followed by extensive conservation. Averaged over all branches (FEL), this will yield dN/dS < 1, while with MEME a small proportion of branches will be correctly assigned to a class with dN > dS.
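A toy calculation can make the averaging argument concrete. This is only arithmetic on made-up numbers: FEL actually fits one dN and one dS per site by maximum likelihood rather than averaging per-branch ratios, and the proportions, omega values, and function name below are assumptions chosen for illustration.

```python
def average_omega(classes):
    """Proportion-weighted mean dN/dS over branch classes.

    classes -- list of (proportion_of_branches, omega) pairs summing to 1.
    """
    total = sum(proportion for proportion, _ in classes)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    return sum(proportion * omega for proportion, omega in classes)


# Hypothetical site: a short burst of adaptation on 5% of branches
# (omega = 5), followed by strong conservation on the remaining 95%
# (omega = 0.1).
site = [(0.05, 5.0), (0.95, 0.1)]

avg = average_omega(site)                 # 0.05 * 5.0 + 0.95 * 0.1
peak = max(omega for _, omega in site)    # omega of the episodic class

print(f"average omega = {avg:.3f}, episodic class omega = {peak}")
```

The averaged value is below 1, so a site-wide test reads the site as conserved, while a branch-mixture view still contains a class with dN > dS.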

Sergei

streicker wrote on Jan 6th, 2012 at 10:52am:

HyPhy message board: http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Thread: FEL vs MEME and sampling strategy (Datamonkey Server >> Datamonkey feedback)
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1317306868
Message started by streicker on Sep 29th, 2011 at 7:34am