HyPhy message board - Bayes Factor per site from dNdSRateAnalysis.bf


(Moderators: Sergei, Simon)


Bayes Factor per site from dNdSRateAnalysis.bf (Read 3591 times)

mbendall

Bayes Factor per site from dNdSRateAnalysis.bf
Jul 28th, 2009 at 2:22pm

I have been using HyPhy to run REL analysis on multiple datasets. However, there are two things that have been giving me difficulty.

1. In several datasets, we get a Bayes Factor (BF) of infinity for one or all codons.
2. When using the same data, the BF we get from HyPhy is not concordant with the BF from Datamonkey.

In the first case, this may be a division by zero problem. How would I interpret or fix this?
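For context, a per-site Bayes factor for positive selection is commonly defined as the posterior odds divided by the prior odds. The sketch below is a minimal illustration under that assumption; it is not taken from dNdSResultProcessor.bf, and the function name `bayes_factor` and the `eps` cap are hypothetical. It shows how a posterior probability that is numerically equal to 1 yields an infinite BF, and one possible way to keep the value finite for reporting.

```python
import math

def bayes_factor(prior, posterior, eps=None):
    """Posterior odds / prior odds for 'site is under positive selection'.

    prior, posterior: probabilities in [0, 1].
    eps: optional cap (hypothetical convention); if given, both
         probabilities are clamped to [eps, 1 - eps] so the result is finite.
    """
    if eps is not None:
        prior = min(max(prior, eps), 1.0 - eps)
        posterior = min(max(posterior, eps), 1.0 - eps)
    if prior in (0.0, 1.0):
        return math.nan  # prior odds are 0 or undefined; the BF is not meaningful
    if posterior == 1.0:
        return math.inf  # posterior odds diverge: the "infinite BF" case
    prior_odds = prior / (1.0 - prior)
    posterior_odds = posterior / (1.0 - posterior)
    return posterior_odds / prior_odds
```

Under this definition, `bayes_factor(0.2, 1.0)` is infinite, while `bayes_factor(0.2, 1.0, eps=1e-6)` returns a large finite value. Read this way, an infinite BF indicates a posterior probability indistinguishable from 1 at machine precision.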

This is our method:
1. Run dNdSRateAnalysis.bf. The settings we used are MG94xREV, Dual, Independent Discrete and Random starting values. For branch lengths, we have experimented with optimized branch lengths, lengths estimated with a nucleotide model, and fixed branch lengths from the tree.
2. Run dNdSResultProcessor.bf. We choose to find positive selection using the Bayes factor under the dual rate variation model.

I assumed that the output from dNdSResultProcessor would be the BF at each codon; however, these values disagree with the values from Datamonkey (with the user-supplied tree and GTR). This leads me to wonder whether dNdSResultProcessor is outputting the BF or some other test statistic.

Here is an example from one dataset. Each pair gives the BF found at the same codon by (HyPhy, Datamonkey):
(20.2221,199.879)
(21.6825,907.541)
(21.4324,714.943)
(21.4532,728.953)
(21.5728,829.18)

Any help on this issue would be greatly appreciated!

wayne

Re: Bayes Factor per site from dNdSRateAnalysis.bf
Reply #1 - Jul 29th, 2009 at 11:04pm

Hi mbendall, we are surprised to see such large differences in Bayes factors between the two implementations. Small differences are expected if the number of dN and dS rate classes differs between them. Currently, this number is chosen by the user in the dNdSRateAnalysis.bf batch file and determined by the size of the alignment on Datamonkey. Can you provide more information about the size of your dataset?

Could you also provide either your upload IDs or an email with your data included, so we can identify the problem?

cheers ./w

Assistant Project Scientist, Antiviral Research Center, Department of Pathology, University of California, San Diego

mbendall

Re: Bayes Factor per site from dNdSRateAnalysis.bf
Reply #2 - Jul 30th, 2009 at 11:47am

Thanks for your reply. Here are the Datamonkey run IDs:

SIM_314 dataset. 23 taxa, 99 codons. DM upload: 975228257827604.1
SIM_417 dataset. 9 taxa, 99 codons. DM upload: 811155365827277.1

I have attached the dNdSRateAnalysis -> dNdSResultProcessor output that I received. There is a README file describing the contents of this folder.

Regarding the number of rate classes, does Datamonkey always assume 9 rate classes? I tried running dNdSRateAnalysis using the same number of rate classes as determined by Datamonkey (4 dS, 5 dN for SIM_314; 2 dS, 7 dN for SIM_417) with similar results to the other HyPhy analyses (low BFs).

Thanks again for your help.
Matthew

Sergei

Re: Bayes Factor per site from dNdSRateAnalysis.bf
Reply #3 - Jul 30th, 2009 at 12:18pm

Dear Matthew,

Datamonkey implementation of REL uses 3x3 rates.

Sergei

Associate Professor, Division of Infectious Diseases, Division of Biomedical Informatics, School of Medicine, University of California San Diego

wayne

Re: Bayes Factor per site from dNdSRateAnalysis.bf
Reply #4 - Jul 30th, 2009 at 3:12pm

Hi Matthew, the answer to your question is "no data, no happiness". No, seriously: the second dataset (417) has only 11 substitutions, 9 of which are non-synonymous. I ran this dataset through another procedure (BivariateCodonRateAnalysis.bf), which uses a stepwise approach to identify the number of rate classes supported by the data, and it returned only 1 rate class with dN/dS = 2 (CI = 0.98 to 4).

The first dataset (314) supports only two rate classes under the same stepwise procedure. These rate classes correspond to purifying and positive selection (dN/dS = 0, weight = 0.77; dN/dS = 3.8, weight = 0.23). Again, the low number of synonymous substitutions (inferred using SLAC) suggests that it is difficult to infer synonymous rates accurately. This is particularly evident because multiple classes in the Datamonkey analysis have zero weight. The different values observed for the HyPhy script are the result of the starting values being randomised, whereas Datamonkey uses default starting values. In general, there is not sufficient data to accurately infer the 3x3 rate classes, which makes both analyses unstable.
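To make the zero-weight point concrete, the sketch below builds a hypothetical 3x3 grid of (dS, dN) rate classes and computes the prior probability that dN > dS by summing class weights. The dS rates and the layout of the grid are invented for illustration; only the two non-zero weights echo the two-class fit quoted above, and none of this is the actual output of either analysis.

```python
def prior_positive_selection(ds_rates, dn_rates, weights):
    """Sum the weights of all (dS, dN) classes with dN > dS.

    weights[i][j] is the weight of the class pairing ds_rates[i]
    with dn_rates[j]; the weights are expected to sum to 1.
    """
    total = sum(sum(row) for row in weights)
    if abs(total - 1.0) > 1e-9:
        raise ValueError("class weights must sum to 1")
    return sum(
        weights[i][j]
        for i, ds in enumerate(ds_rates)
        for j, dn in enumerate(dn_rates)
        if dn > ds
    )

# Hypothetical 3x3 grid in which seven of the nine classes carry no weight.
ds_rates = [0.5, 1.0, 2.0]
dn_rates = [0.0, 1.0, 3.8]
weights = [
    [0.00, 0.00, 0.00],
    [0.77, 0.00, 0.23],
    [0.00, 0.00, 0.00],
]
prior = prior_positive_selection(ds_rates, dn_rates, weights)
```

In this toy grid the rates attached to the zero-weight classes do not affect the prior at all, so the data cannot pin them down. That is consistent with different starting values producing different fitted grids and, in turn, different per-site Bayes factors.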

As an aside, we will be replacing the REL analysis with the BivariateCodonRateAnalysis.bf procedure used above (soon ;-) ). This is a better approach for approximating dN/dS distributions across sites. REL performs poorly when there is insufficient data. In the meantime, you may want to consider using the latest "unreleased" version of HyPhy, which includes BivariateCodonRateAnalysis.bf. Available @ [link hidden for unregistered users]

Assistant Project Scientist, Antiviral Research Center, Department of Pathology, University of California, San Diego
