Odd results for low divergence data sets with REL
Sergei
Jul 14th, 2005 at 6:32pm
 
This is in response to an e-mail query.

Symptoms:

1) A data set of low divergence (e.g. a mean of 0.005 substitutions/nucleotide/unit time) is analyzed with SLAC, FEL and REL.

2) SLAC and FEL produce sensible results, but REL does not (e.g. it calls sites with very little or no observed variation selected).

Workarounds:

First try the basics outlined in [link not preserved].

For datasets with low divergence (especially if the sequences are also short), REL is going to infer rate parameters with large estimation errors (these can be quantified if needed), so the empirical Bayes inference that depends on those estimates may be flawed.

To reduce type I error, classify a site as positively selected only if the MEAN posterior ratio dN/dS > 1 (or the MEAN difference dN-dS > 0) AND the posterior probability for the site is high. Posterior means can be computed using Standard Analyses->CodonSelectionAnalyses->dNdSResultProcessor.bf with the Posterior ratio option.
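The combined rule above can be sketched in a few lines of Python. This is only an illustration: the per-site numbers are made up, the function name is hypothetical, and the 0.95 cutoff is an assumption (the post only says the posterior probability should be "high").

```python
def is_positively_selected(mean_ratio, posterior_prob, prob_cutoff=0.95):
    """Call a site selected only if BOTH criteria hold:
    posterior mean dN/dS > 1 and a high posterior probability of dN > dS.
    The default cutoff of 0.95 is an assumption, not a value from the post."""
    return mean_ratio > 1.0 and posterior_prob >= prob_cutoff


# Hypothetical per-site values:
# (site index, posterior mean dN/dS, posterior probability of dN > dS)
sites = [
    (12, 3.4, 0.99),  # passes both criteria
    (47, 0.8, 0.97),  # high probability, but mean ratio below 1
    (90, 2.1, 0.60),  # mean ratio above 1, but low probability
]

selected = [site for site, ratio, prob in sites if is_positively_selected(ratio, prob)]
print(selected)  # [12]
```

The same filter works with the mean difference dN-dS > 0 in place of the ratio, which avoids dividing by dS when it is estimated as 0.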

Another option is to use conservative Empirical Bayes analyses. Consider an example in which a 2x2 (dS, dN) Dual model was fitted to the data and the following rates were inferred:

dS_1 = 0.0   (P=0.67)
dS_2 = 3.05 (P=0.33)

dN_1 = 0.12 (P=0.83)
dN_2 = 2.05 (P=0.17)

In this distribution there are 2 rate classes (out of 4) for which dN>dS, namely
dN = 2.05, dS = 0.0 and dN = 0.12, dS = 0.0. Note that sites with a high posterior probability of belonging to the second class (dN = 0.12, dS = 0.0) are technically going to be classified as positively selected, but are most likely false positives. Indeed, because of large parameter estimation errors, we can't be sure that 0.12 (dN) and 0.0 (dS) are really different.
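To make the four rate classes explicit, here is a small Python sketch that enumerates them from the values above. It assumes that the dS and dN categories are independent, so that the weight of a joint class is the product of the two marginal weights; that is an assumption made for illustration, not something stated in this post.

```python
from itertools import product

# (rate, weight) pairs taken from the example above
ds_classes = [(0.0, 0.67), (3.05, 0.33)]
dn_classes = [(0.12, 0.83), (2.05, 0.17)]

classes = []
for (ds, p_ds), (dn, p_dn) in product(ds_classes, dn_classes):
    classes.append({
        "dS": ds,
        "dN": dn,
        "weight": p_ds * p_dn,   # assumes independent dS and dN categories
        "positive": dn > ds,     # class counts toward "dN > dS"
    })

for c in classes:
    print(c)

# The classes that a standard analysis would treat as positively selected
positive = [(c["dN"], c["dS"]) for c in classes if c["positive"]]
```

Under that assumption, the suspicious class (dN = 0.12, dS = 0.0) carries by far the largest weight, which is why sites with little or no variation are easily drawn into it.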

In this example, one could reduce the error by computing the posterior probability of dN>dS using only the class with dN = 2.05 and dS = 0.0.
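A minimal sketch of that conservative calculation, in Python, is shown here. The per-class site likelihoods are invented for illustration (in practice they would come from the fitted model), the class weights again assume independent dS and dN categories, and the function and variable names are hypothetical.

```python
def class_posteriors(weights, likelihoods):
    """Empirical Bayes: posterior of each rate class at one site,
    proportional to (class weight) x (site likelihood given that class)."""
    joint = [w * l for w, l in zip(weights, likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]


# Rate classes as (dS, dN), with weights assuming independent categories
rate_classes = [(0.0, 0.12), (0.0, 2.05), (3.05, 0.12), (3.05, 2.05)]
weights = [0.67 * 0.83, 0.67 * 0.17, 0.33 * 0.83, 0.33 * 0.17]

# Hypothetical likelihoods of one nearly invariable site under each class
site_likelihoods = [2.0e-4, 1.5e-4, 1.0e-6, 1.0e-6]

# Only this class is trusted as evidence of dN > dS
trusted = {(0.0, 2.05)}

post = class_posteriors(weights, site_likelihoods)

# Standard: sum over every class with dN > dS
standard = sum(p for p, (ds, dn) in zip(post, rate_classes) if dn > ds)

# Conservative: sum over the trusted class only
conservative = sum(p for p, rc in zip(post, rate_classes) if rc in trusted)

print(f"standard P(dN>dS) = {standard:.3f}, conservative = {conservative:.3f}")
```

With these made-up inputs the standard posterior is close to 1 while the conservative one is small, so the site would no longer be called positively selected.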

Instructions on how to do a similar task (reloading the .fit files written out by the dNdSRateAnalysis) through the GUI can be found in the HyPhy tutorial on pages 26-31.

We will soon add a module to automatically assess parameter estimation errors and reduce the need for this type of manual verification.

Sergei
Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego