HyPhy message board - PARRIS versus SLAC/FEL/REL for very small datasets

Dan Fulop

YaBB Newbies

Offline

Feed your monkey!

Posts: 29

PARRIS versus SLAC/FEL/REL for very small datasets
Jan 7th, 2011 at 2:49pm

Hi,

I am wondering which method is best to look for positive selection in a comparative transcriptome dataset with very few species and very low sequence divergence.

I have a set of codon alignments for several thousand genes from 4 ingroup species, one of which is represented by two different varieties, for a total of 5 ingroup terminals. We have also added a relatively distant outgroup to an equivalent set of codon alignments, so those alignments have 6 terminals.

I'm trying to decide whether to run PARRIS (without partitioning), because of the very small size and low sequence divergence in the data, or SLAC and FEL with a high nominal alpha level (say 0.25 or 0.3). Maybe all of the above is best?

Given time limitations on this study (i.e. we need to get it done within the next month) I was not planning to use REL. Currently we have ~14,000 alignments, although I may filter these down by using more stringent 1-to-1 orthology assignments and by dropping all alignments with fewer than 5 or 10 SNPs.
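A minimal sketch of the SNP-count filter mentioned above, assuming that "SNPs" means alignment columns with more than one distinct base and that each alignment is already loaded as a list of equal-length strings. The function names and the set of characters treated as missing data are hypothetical, not part of HyPhy.

```python
from typing import Dict, List

# Assumption: gaps, '?' and 'N' are treated as missing data, not as variants.
MISSING = set("-?N")


def count_variable_sites(seqs: List[str]) -> int:
    """Count columns that contain more than one distinct, non-missing base."""
    if not seqs:
        return 0
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    variable = 0
    for column in zip(*seqs):
        bases = {b.upper() for b in column} - MISSING
        if len(bases) > 1:
            variable += 1
    return variable


def filter_alignments(
    alignments: Dict[str, List[str]], min_sites: int = 5
) -> Dict[str, List[str]]:
    """Keep only alignments with at least `min_sites` variable columns."""
    return {
        name: seqs
        for name, seqs in alignments.items()
        if count_variable_sites(seqs) >= min_sites
    }
```

Calling `filter_alignments(alignments, min_sites=10)` would apply the stricter of the two thresholds; the right cutoff is a judgment call for the dataset at hand.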

Any feedback would be greatly appreciated.
Thanks,
Dan.

Sergei

YaBB Administrator

Offline

Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male

Re: PARRIS versus SLAC/FEL/REL for very small datasets
Reply #1 - Jan 7th, 2011 at 5:06pm

Hi Dan,

In my experience, there is very little power to detect selection at ANY GIVEN site with 6 taxa. I would run PARRIS (which is actually just a differently parameterized REL) to test for evidence of selection SOMEWHERE in the gene (i.e. at > 0 sites). I would also run the entire collection of genes through the evolutionary fingerprinting analysis and then cluster genes by their evolutionary process to see if you find anything interesting.

Sergei

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego


Dan Fulop

Re: PARRIS versus SLAC/FEL/REL for very small datasets
Reply #2 - Jan 7th, 2011 at 5:21pm

Hi Sergei,

Thanks again for the fast response. I'll read the fingerprinting paper. One last question: would PARRIS and the fingerprinting run faster in MP or MPI?

Thanks,
Dan.

Sergei

Re: PARRIS versus SLAC/FEL/REL for very small datasets
Reply #3 - Jan 7th, 2011 at 5:23pm

Hi Dan,

In your case you should try to run the whole thing in MPI -- I need to check if genomefitters are MPI enabled -- because you simply spin off each gene onto a separate processor.

Sergei

Dan Fulop

Re: PARRIS versus SLAC/FEL/REL for very small datasets
Reply #4 - Jan 7th, 2011 at 5:30pm

Hi Sergei,

If there's an issue I could code it without using genomefitters, but it does seem like that pipeline is well-suited for my needs. If you can, please let me know if it's MPI enabled.

Thanks,
Dan.

Sergei

Re: PARRIS versus SLAC/FEL/REL for very small datasets
Reply #5 - Jan 9th, 2011 at 4:49pm

Hi Dan,

The pipeline is not MPI enabled. It may be simplest to split your input genes into groups (e.g. 1/8 of the genes in each) and then run 8 instances of HYPHYMP, one per group, with 2 processors each. Alternatively, if you are really pressed for time, it may be possible to run the jobs on our UCSD cluster (with ~420 cores). Let me know if you are interested in the latter option.
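The splitting step above can be sketched as follows, assuming one alignment file per gene. The function name `split_into_groups` and the file names in the usage note are hypothetical; launching HYPHYMP on each group is deliberately left out, because the batch file and options depend on the local installation.

```python
from typing import List, Sequence


def split_into_groups(paths: Sequence[str], n_groups: int = 8) -> List[List[str]]:
    """Deal alignment paths round-robin into n_groups lists of near-equal size."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    groups: List[List[str]] = [[] for _ in range(n_groups)]
    # Sorting first makes the assignment reproducible between runs.
    for index, path in enumerate(sorted(paths)):
        groups[index % n_groups].append(path)
    return groups
```

For example, `split_into_groups(["gene_%05d.fas" % i for i in range(14000)])` yields 8 lists of 1,750 paths each. Round-robin assignment keeps the group sizes within one file of each other, although run time per gene may still vary.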

Sergei


Dan Fulop

Re: PARRIS versus SLAC/FEL/REL for very small datasets
Reply #6 - Jan 9th, 2011 at 10:29pm

Hi Sergei,

YES, we are really strapped for time (it's a long story...) and we would greatly appreciate being able to run the analyses on your cluster.

By the way, I was really blown away by the fingerprinting paper! I really liked the efficiency of parameterization of the GB models, as well as the heuristics for optimizing the number of rate classes and approximating the ESD posterior using SIR. It made me wonder if it's worth it to recode a bunch of "standard" REL-type dN/dS methods with GB models; my naive interpretation is that, apart from other potential benefits, these may have better power with small datasets due to the lower number of parameters.

I'm really eager to apply the fingerprinting method to our data. As a matter of fact, we were already going to look for correlations between gene expression clustering (we have species and tissue specific data) and positive selection. The fingerprinting gives us a more meaningful evolutionary measure for comparisons! It would be great to see if genes with similar tissue or species variation have or don't have similar evolutionary fingerprints.

We already have codon alignments for our ingroup taxa, and we will have the alignments with the outgroup (i.e. 6th) sequence very soon. It may be worth running the jobs with the ingroup-only data in the meantime and redoing them later with all 6 taxa, in part because we'll be throwing out a bunch of genes that don't have a clear 1:1 ortholog in the outgroup taxon. I know, there are perils to running these analyses with one fewer sequence still.

Thanks so much for your offer regarding the cluster!
-Dan.


Sergei

Re: PARRIS versus SLAC/FEL/REL for very small datasets
Reply #7 - Jan 10th, 2011 at 10:20am

Hi Dan,

Drop me an e-mail and we can continue from there. The evolutionary fingerprinting pipeline is set up on the cluster (I'll send you a link for the documentation); it can make use of MPI. The best way to prepare your data is to put individual alignments into separate files.

Sergei
