HyPhy message board - Odd results for low divergence data sets with REL


Sergei
Jul 14th, 2005 at 6:32pm

This is in response to an e-mail query.

Symptoms:

1) A data set of low divergence (e.g. 0.005 mean substitutions per nucleotide per unit time) is analyzed with SLAC, FEL and REL.

2) SLAC and FEL produce sensible results, but REL does not (e.g. it calls sites with little or no observed variation positively selected).

Workarounds:

First, try the basics outlined in [link].

For data sets with low divergence (especially if the sequences are also short), REL will infer parameters with large errors (these can be quantified if needed), and empirical Bayes inference may be unreliable.

To reduce the type I error rate, classify a site as positively selected only if its MEAN posterior ratio dN/dS > 1 (or its MEAN posterior difference dN-dS > 0) AND its posterior probability of dN > dS is high. Posterior means can be computed with Standard Analyses->CodonSelectionAnalyses->dNdSResultProcessor.bf using the Posterior ratio option.
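A minimal Python sketch of this two-part rule, assuming you already have, for one site, the posterior probability of each (dS, dN) rate class. The class values reuse the Dual model example in this post; the site posterior weights and the 0.95 cutoff are hypothetical placeholders, not HyPhy output. The sketch uses the dN-dS variant because a class with dS = 0 makes the ratio dN/dS infinite.

```python
# Hypothetical (dS, dN) rate classes and one site's posterior probabilities
# over them; in a real analysis these come from the fitted REL model.
classes = [(0.0, 0.12), (0.0, 2.05), (3.05, 0.12), (3.05, 2.05)]
site_posterior = [0.30, 0.60, 0.05, 0.05]


def posterior_summary(classes, posterior):
    """Return (posterior mean of dN - dS, posterior probability of dN > dS)."""
    mean_diff = sum(p * (dn - ds) for (ds, dn), p in zip(classes, posterior))
    prob_pos = sum(p for (ds, dn), p in zip(classes, posterior) if dn > ds)
    return mean_diff, prob_pos


def is_positively_selected(classes, posterior, cutoff=0.95):
    # Require BOTH a positive posterior mean of dN - dS AND a high
    # posterior probability of dN > dS (the cutoff is a hypothetical choice).
    mean_diff, prob_pos = posterior_summary(classes, posterior)
    return mean_diff > 0 and prob_pos >= cutoff


mean_diff, prob_pos = posterior_summary(classes, site_posterior)
print(f"E[dN - dS] = {mean_diff:.3f}, P(dN > dS) = {prob_pos:.2f}")
print("selected:", is_positively_selected(classes, site_posterior))
```

Here the posterior mean of dN-dS is positive, but P(dN > dS) is about 0.90, so the site is not called selected under the 0.95 cutoff.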

Another option is a conservative empirical Bayes analysis. Consider an example in which a 2x2 (dS, dN) Dual model was fitted to the data and the following rates were inferred:

dS_1 = 0.0 (P=0.67)
dS_2 = 3.05 (P=0.33)

dN_1 = 0.12 (P=0.83)
dN_2 = 2.05 (P=0.17)
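The four joint rate classes can be enumerated with a short Python sketch. It assumes the dS and dN distributions are independent, so each joint class weight is the product of the two marginal probabilities listed above; that independence is an assumption of this sketch.

```python
from itertools import product

# Marginal rate distributions from the 2x2 Dual model example:
# (rate value, probability) pairs.
dS_dist = [(0.0, 0.67), (3.05, 0.33)]
dN_dist = [(0.12, 0.83), (2.05, 0.17)]

# Assumption: dS and dN classes are independent, so the weight of each
# joint (dS, dN) class is the product of the marginal probabilities.
joint = [((ds, dn), p_s * p_n)
         for (ds, p_s), (dn, p_n) in product(dS_dist, dN_dist)]

for (ds, dn), w in joint:
    label = "dN > dS" if dn > ds else "dN <= dS"
    print(f"dS={ds:<5} dN={dn:<5} weight={w:.4f}  {label}")

# Keep only the classes in which dN exceeds dS.
positive = [c for c, _ in joint if c[1] > c[0]]
print("classes with dN > dS:", positive)
```

The last line lists `[(0.0, 0.12), (0.0, 2.05)]`, i.e. the two classes with dN > dS.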

In this distribution, 2 of the 4 rate classes have dN > dS: (dN = 2.05, dS = 0.0) and (dN = 0.12, dS = 0.0). Sites with a high posterior probability of belonging to the second of these classes will technically be classified as positively selected, but they are most likely false positives: because of large parameter estimation errors, we cannot be sure that 0.12 (dN) and 0.0 (dS) are really different.

In this example, one could reduce false positives by computing the posterior probability of dN > dS using only the class with dN = 2.05 and dS = 0.0.
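A minimal Python sketch comparing the standard and conservative posterior probabilities for one site. It assumes independent dS and dN distributions (joint weights are products of the marginals above), and the per-class site likelihoods are invented for illustration; in a real analysis they would come from the fitted model, not from this code.

```python
from itertools import product

dS_dist = [(0.0, 0.67), (3.05, 0.33)]
dN_dist = [(0.12, 0.83), (2.05, 0.17)]
# Prior weight of each joint (dS, dN) class, assuming independence.
prior = {(ds, dn): p_s * p_n
         for (ds, p_s), (dn, p_n) in product(dS_dist, dN_dist)}

# Hypothetical likelihoods of one site's data under each rate class.
site_lik = {(0.0, 0.12): 0.020, (0.0, 2.05): 0.004,
            (3.05, 0.12): 0.001, (3.05, 2.05): 0.001}


def site_posterior(prior, lik):
    """Empirical Bayes: P(class | site) is proportional to prior * likelihood."""
    unnorm = {c: prior[c] * lik[c] for c in prior}
    total = sum(unnorm.values())
    return {c: v / total for c, v in unnorm.items()}


post = site_posterior(prior, site_lik)
# Standard rule: sum over every class with dN > dS.
p_standard = sum(p for (ds, dn), p in post.items() if dn > ds)
# Conservative rule: count only the clearly selected class.
p_conservative = post[(0.0, 2.05)]
print(f"standard P(dN>dS) = {p_standard:.3f}, "
      f"conservative = {p_conservative:.3f}")
```

With these made-up numbers the site looks strongly selected under the standard rule (about 0.97) but not under the conservative one (about 0.04), because most of its posterior mass sits in the (dS = 0.0, dN = 0.12) class.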

Instructions on how to do a similar task through the GUI (reloading the .fit files written out by the dNdSRateAnalysis) can be found on pages 26-31 of the HyPhy tutorial.

We will soon add a module to automatically assess parameter estimation errors and reduce the need for this type of manual verification.

Sergei

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
