Bayes Factor per site from dNdSRateAnalysis.bf
mbendall
Jul 28th, 2009 at 2:22pm
 
I have been using HyPhy to run REL analysis on multiple datasets.  However, there are two things that have been giving me difficulty.

1.  In several datasets, we get a Bayes Factor (BF) of infinity for one or all codons.
2.  When using the same data, the BF we get from HyPhy is not concordant with the BF from Datamonkey.

In the first case, this may be a division by zero problem.  How would I interpret or fix this?
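For reference, a REL-style Bayes factor at a site is usually defined as the posterior odds that the site is under positive selection (dN > dS) divided by the prior odds. The sketch below is plain Python with a hypothetical helper name (`site_bayes_factor`), not HyPhy's own code, and it assumes that definition. It shows two ways an infinite or undefined value can arise: a posterior of exactly 1 (which floating point produces once the posterior is within about 1e-16 of 1) makes the posterior odds divide by zero, and a prior weight of 0 or 1 on the dN > dS classes makes the prior odds degenerate.

```python
import math

def site_bayes_factor(posterior_pos: float, prior_pos: float) -> float:
    """Hypothetical helper: BF for dN > dS at one codon.

    BF = [P(pos | data) / (1 - P(pos | data))] / [P(pos) / (1 - P(pos))]
    """
    if not 0.0 < prior_pos < 1.0:
        # Prior odds are 0 or infinite, so the ratio is not defined.
        return math.nan
    prior_odds = prior_pos / (1.0 - prior_pos)
    if posterior_pos >= 1.0:
        # Posterior odds divide by zero; report this as infinity.
        return math.inf
    posterior_odds = posterior_pos / (1.0 - posterior_pos)
    return posterior_odds / prior_odds

# A posterior this close to 1 rounds to exactly 1.0 in double precision,
# so the BF comes out as infinity.
print(site_bayes_factor(1 - 1e-17, 0.2))  # inf
print(site_bayes_factor(0.5, 0.2))        # 4.0 (posterior odds 1 / prior odds 0.25)
```

Under this definition an infinite BF is a numerical consequence of a posterior that saturates at 1, not necessarily a bug in the analysis.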

This is our method:
1.  Run dNdSRateAnalysis.bf.  The settings we used are MG94xREV, Dual, Independent Discrete and Random starting values.  For branch lengths, we have experimented with optimizing them, estimating them with the nucleotide model, and fixing them to the values in the tree.
2.  Run dNdSResultProcessor.bf.  We choose to find positive selection using the Bayes factor under the dual rate variation model.

I assumed that the output from dNdSResultProcessor would be the BF at each codon, however these values disagree with the values from Datamonkey (with the user-supplied tree and GTR).  This leads me to wonder whether the dNdSResultProcessor is outputting the BF or some other test statistic.

Here is an example from one dataset.  Each pair gives the BF found at the same codon by (HyPhy, Datamonkey):
(20.2221,199.879)
(21.6825,907.541)
(21.4324,714.943)
(21.4532,728.953)
(21.5728,829.18)

Any help on this issue would be greatly appreciated!
 
wayne
Reply #1 - Jul 29th, 2009 at 11:04pm
 
Hi mbendall, we are surprised to see such large differences in Bayes factors between the two implementations. Small differences are expected if the number of dN and dS rate classes differs between the two implementations. Currently, this number is chosen by the user in the dNdSRateAnalysis.bf batch file, and determined by the size of the alignment on Datamonkey. Can you provide more info about the size of your dataset?

Could you also either provide your upload IDs or email us your data so we can identify the problem?

cheers ./w

Assistant Project Scientist
Antiviral Research Center
Department of Pathology
University of California, San Diego
 
mbendall
Reply #2 - Jul 30th, 2009 at 11:47am
 
Thanks for your reply.  Here are the Datamonkey run IDs:

SIM_314 dataset.  23 taxa, 99 codons.  DM upload:  975228257827604.1
SIM_417 dataset.  9 taxa, 99 codons.  DM upload:  811155365827277.1

I have attached the dNdSRateAnalysis -> dNdSResultProcessor output that I received.  There is a README file describing the contents of this folder.

Regarding the number of rate classes, does Datamonkey always assume 9 rate classes?  I tried running dNdSRateAnalysis using the same number of rate classes as determined by Datamonkey (4 dS, 5 dN for SIM_314; 2 dS, 7 dN for SIM_417) with similar results to the other HyPhy analyses (low BFs).

Thanks again for your help.
Matthew
 
Sergei
Reply #3 - Jul 30th, 2009 at 12:18pm
 
Dear Matthew,

The Datamonkey implementation of REL uses 3x3 rates (3 dS by 3 dN rate classes, i.e. 9 joint classes).
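To make "3x3 rates" concrete: under an independent discrete (dual) model, the dS and dN distributions are crossed, so 3 dS classes and 3 dN classes give 9 joint classes, each weighted by the product of its two marginal weights. The prior probability of positive selection is then the total weight of the joint classes with dN > dS. The rates, weights and helper names below are made up for illustration and are not Datamonkey's fitted values or code.

```python
from itertools import product

def joint_classes(ds_rates, ds_weights, dn_rates, dn_weights):
    """Hypothetical helper: cross independent dS and dN distributions.

    Returns (alpha, beta, weight) triples, where weight = p_i * q_j.
    """
    return [
        (alpha, beta, p * q)
        for (alpha, p), (beta, q) in product(zip(ds_rates, ds_weights),
                                             zip(dn_rates, dn_weights))
    ]

def prior_positive(classes):
    """Prior P(dN > dS): summed weight of joint classes with beta > alpha."""
    return sum(w for alpha, beta, w in classes if beta > alpha)

# Made-up 3 x 3 example (illustrative values only)
ds_rates, ds_weights = [0.5, 1.0, 1.5], [0.3, 0.4, 0.3]
dn_rates, dn_weights = [0.1, 0.6, 2.0], [0.5, 0.3, 0.2]

grid = joint_classes(ds_rates, ds_weights, dn_rates, dn_weights)
print(len(grid))                       # 9
print(round(prior_positive(grid), 4))  # 0.29
```

With the class counts tried in the HyPhy runs above, the same crossing gives 4 x 5 = 20 joint classes for SIM_314 and 2 x 7 = 14 for SIM_417.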

Sergei

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
 
wayne
Reply #4 - Jul 30th, 2009 at 3:12pm
 
Hi Matthew, the answer to your question is "no data, no happiness". No, seriously: the second dataset (SIM_417) has only 11 substitutions, 9 of which are non-synonymous. I ran this dataset through another procedure, which uses a stepwise approach to identify the number of rate classes supported by the data (BivariateCodonRateAnalysis.bf), and it returned only 1 rate class with dN/dS = 2 (CI = 0.98 to 4).

The first dataset (SIM_314) supports only two rate classes using the same stepwise procedure. These rate classes correspond to purifying and positive selection (dN/dS = 0, weight = 0.77; dN/dS = 3.8, weight = 0.23). Again, the low numbers of synonymous substitutions (inferred using SLAC) suggest that it is difficult to infer synonymous rates accurately. This is particularly evident because several classes in the Datamonkey analysis have zero weight. The different values observed with the HyPhy script are the result of the starting values being randomised, whereas Datamonkey uses default starting values. In general, there are not enough data to infer the 3x3 rate classes accurately, which makes both analyses unstable.
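As a rough sketch of how fitted class weights feed into site-level inference, here is the empirical Bayes step for the two SIM_314 classes reported above (dN/dS = 0 with weight 0.77; dN/dS = 3.8 with weight 0.23). The per-codon likelihoods under each class are invented for illustration; in a real analysis they come from the fitted codon model. The sketch assumes the BF is the posterior odds of the dN/dS > 1 class divided by its prior odds, and the helper name is hypothetical.

```python
def class_posteriors(weights, site_likelihoods):
    """Hypothetical helper: empirical Bayes posterior of each class at one site."""
    joint = [w * lik for w, lik in zip(weights, site_likelihoods)]
    total = sum(joint)
    return [j / total for j in joint]

weights = [0.77, 0.23]   # class weights from the stepwise fit on SIM_314
omegas = [0.0, 3.8]      # dN/dS of each class
site_lik = [1e-6, 4e-6]  # invented likelihoods of one codon column under each class

post = class_posteriors(weights, site_lik)
prior_pos = sum(w for w, om in zip(weights, omegas) if om > 1)
post_pos = sum(p for p, om in zip(post, omegas) if om > 1)

# Posterior odds over prior odds of the dN/dS > 1 class
bf = (post_pos / (1 - post_pos)) / (prior_pos / (1 - prior_pos))
print(round(post_pos, 3), round(bf, 3))  # 0.544 4.0
```

With only two classes the weights cancel and the BF reduces to L(site | dN/dS = 3.8) / L(site | dN/dS = 0), here 4. With more classes, as in the 3x3 model, the BF also depends on how weight is spread among the classes, which is one reason estimates can shift considerably when several classes are poorly supported.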

As an aside, we will be replacing the REL analysis with the BivariateCodonRateAnalysis.bf procedure used above (soon ;)). This is a better approach for approximating dN/dS distributions across sites; REL performs poorly when there are insufficient data. In the meantime, you may want to consider using the latest "unreleased" version of HyPhy, which includes BivariateCodonRateAnalysis.bf (the download link was visible to registered forum members only).

cheers ./w