PARRIS versus SLAC/FEL/REL for very small datasets
Dan Fulop
Jan 7th, 2011 at 2:49pm
 
Hi,

I am wondering which method is best to look for positive selection in a comparative transcriptome dataset with very few species and very low sequence divergence.

I have a set of codon alignments for several thousand genes from 4 ingroup species, one of which is represented by two different varieties for a total of 5 ingroup terminals.  There is a relatively far outgroup that we've added to an equivalent set of codon alignments, so in those alignments we have 6 terminals.

I'm trying to decide whether to run PARRIS (without partitioning), because of the very small size and low sequence divergence in the data, or SLAC and FEL with a high nominal alpha level (say 0.25 or 0.3).  Maybe all of the above is best?

Given the time limitations on this study (i.e. we need to get it done within the next month), I was not planning to use REL.  Currently we have ~14,000 alignments, although I may filter these down by using more stringent 1-to-1 orthology assignments and by dropping all alignments with fewer than 5 or 10 SNPs.
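For illustration, the SNP-count filter mentioned above could be sketched roughly as follows. This is only a sketch under assumptions: "SNP" is taken to mean any alignment column with more than one distinct nucleotide, gaps and ambiguous characters are ignored, and the function names and the default threshold are hypothetical rather than part of any existing pipeline.

```python
def count_variable_sites(sequences, ignore="-?Nn"):
    """Count alignment columns holding more than one distinct character.

    Assumption: characters listed in `ignore` (gaps, missing data, N)
    do not count as a state when deciding whether a column varies.
    """
    if not sequences:
        return 0
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must all have the same aligned length")
    variable = 0
    for column in zip(*sequences):
        states = {c.upper() for c in column if c not in ignore}
        if len(states) > 1:
            variable += 1
    return variable


def keep_alignment(sequences, min_variable_sites=5):
    """Return True if the alignment has at least `min_variable_sites` variable columns."""
    return count_variable_sites(sequences) >= min_variable_sites


# Toy alignment with 4 terminals (hypothetical data).
example = ["ATGAAACCC", "ATGAAGCCC", "ATGAAACCT", "ATG---CCC"]
n_variable = count_variable_sites(example)
```

Whether a threshold of 5 or 10 is sensible would depend on how much signal is left in the remaining genes; the code only makes the cutoff explicit.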

Any feedback would be greatly appreciated.
Thanks,
Dan.
 
Sergei
Reply #1 - Jan 7th, 2011 at 5:06pm
 
Hi Dan,

In my experience, there is very little power to detect selection at ANY GIVEN site with 6 taxa. I would run PARRIS (which is actually just a differently parameterized REL) to test for evidence of selection SOMEWHERE in the gene (i.e. at > 0 sites). I would also run the entire collection of genes through the evolutionary fingerprinting analysis and then cluster genes by their evolutionary process to see if you find anything interesting.

Sergei

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
 
Dan Fulop
Reply #2 - Jan 7th, 2011 at 5:21pm
 
Hi Sergei,

Thanks again for the fast response.  I'll read the fingerprinting paper.

One last question: would PARRIS and the fingerprinting run faster in MP or MPI?

Thanks,
Dan.
 
Sergei
Reply #3 - Jan 7th, 2011 at 5:23pm
 
Hi Dan,

In your case you should try to run the whole thing in MPI -- I need to check if genomefitters are MPI enabled -- because you simply spin off each gene onto a separate processor.

Sergei
 
Dan Fulop
Reply #4 - Jan 7th, 2011 at 5:30pm
 
Hi Sergei,

If there's an issue I could code it without using genomefitters, but it does seem like that pipeline is well-suited for my needs.

If you can, please let me know if it's MPI enabled.

Thanks,
Dan.
 
Sergei
Reply #5 - Jan 9th, 2011 at 4:49pm
 
Hi Dan,

The pipeline is not MPI enabled. It may be simplest to split your input genes into groups (e.g. 1/8 into each) and then run 8 instances of HYPHYMP, one per group, with 2 processors each. Alternatively, if you are really pressed for time, it may be possible to run the jobs on our UCSD cluster (with ~420 cores). Let me know if you are interested in the latter option.
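The splitting step can be sketched roughly like this. It is a minimal illustration, not part of the pipeline: the file names are hypothetical, the round-robin assignment is just one reasonable way to balance the groups, and how each HYPHYMP instance is actually launched on a group is left out because it depends on the local installation.

```python
def split_into_groups(items, n_groups=8):
    """Distribute items round-robin into n_groups lists of near-equal size."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    groups = [[] for _ in range(n_groups)]
    # Sorting first makes the assignment reproducible between runs.
    for index, item in enumerate(sorted(items)):
        groups[index % n_groups].append(item)
    return groups


# Hypothetical alignment file names, one file per gene.
alignment_files = [f"gene_{i:05d}.fas" for i in range(20)]
groups = split_into_groups(alignment_files, n_groups=8)
group_sizes = [len(g) for g in groups]
```

Each resulting list would then be handed to its own HYPHYMP instance. Round-robin assignment keeps group sizes within one file of each other, although run time per gene can still vary, so the groups will not necessarily finish together.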

Sergei
 
Dan Fulop
Reply #6 - Jan 9th, 2011 at 10:29pm
 
Hi Sergei,

YES, we are really strapped for time (it's a long story...) and we would greatly appreciate being able to run the analyses on your cluster.

By the way, I was really blown away by the fingerprinting paper!  I really liked the efficiency of parameterization of the GB models, as well as the heuristics for optimizing the number of rate classes and approximating the ESD posterior using SIR.  It made me wonder if it's worth it to recode a bunch of "standard" REL-type dN/dS methods with GB models; my naive interpretation is that, apart from other potential benefits, these may have better power with small datasets due to the lower number of parameters.

I'm really eager to apply the fingerprinting method to our data.  As a matter of fact, we were already going to look for correlations between gene expression clustering (we have species and tissue specific data) and positive selection.  The fingerprinting gives us a more meaningful evolutionary measure for comparisons!  It would be great to see if genes with similar tissue or species variation have or don't have similar evolutionary fingerprints.

We already have codon alignments for our ingroup taxa, and we will have the alignments with the outgroup (i.e. 6th) sequence very soon.  It may be worth running the jobs with the ingroup-only data in the meantime and redoing them later with all 6 taxa, in part because we'll be throwing out a bunch of genes that don't have a clear 1:1 ortholog in the outgroup taxon.  I know, there are perils to running these analyses with one fewer sequence.

Thanks so much for your offer regarding the cluster!
-Dan.
 
Sergei
Reply #7 - Jan 10th, 2011 at 10:20am
 
Hi Dan,

Drop me an e-mail and we can continue from there. The evolutionary fingerprinting pipeline is set up on the cluster (I'll send you a link for the documentation); it can make use of MPI. The best way to prepare your data is to put individual alignments into separate files.

Sergei