HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1361452551

Message started by Kim Roelants on Feb 21st, 2013 at 5:15am

Title: Detailed HYphy settings of the Datamonkey selection analyses
Post by Kim Roelants on Feb 21st, 2013 at 5:15am
Dear Sergei,

I have conducted a series of selection analyses using the SLAC, FEL and REL methods implemented on the Datamonkey server. For the present study, these 'stock' analyses are more than adequate to make my point. However, in the future, I would like to be able to use and interpret the more versatile settings implemented in the downloadable version of HyPhy 2.1. Following your helpful manual, I have tried to reproduce my Datamonkey REL results with HyPhy. The results, however, are different: instead of 3 positively selected sites (Bayes factor > 50), HyPhy finds none, using the very same input files. So I guess there must have been a difference in one of the other analysis settings. Datamonkey also identified 17 negatively selected sites, but I don't know how to see the p-values for negative selection using HyPhy.

So I have the following questions:
- What are the detailed HyPhy settings of the REL, FEL and SLAC analyses conducted by Datamonkey (apart from the input files, the DNA substitution model, and the significance level)?
- I guess that Datamonkey uses an MG94 codon model? But does it also include the 3x4 codon frequency setting?
- Does Datamonkey use codon- or nucleotide-based branch length optimisation?
- I guess Datamonkey also assumes rate variation across synonymous substitutions (i.e. the dual setting). If so, how is this rate heterogeneity modelled (gamma or GDD)?
- And how is the number of rate classes determined (in HyPhy, I have to predefine this number, but in Datamonkey the number of rate classes seems to vary across analyses of different input trees)?
- Finally, how can I use HyPhy's dNdSresultsprocessor.bf to obtain a summary table of site-specific results similar to those produced by Datamonkey? That is, including the sites that evolved under significant purifying selection?

Thanks in advance for your help,

Kim

Title: Re: Detailed HYphy settings of the Datamonkey selection analyses
Post by Sergei on Feb 25th, 2013 at 2:34pm
Hi Kim,

The fact that you are getting noticeably different results suggests that the analysis is unduly sensitive to the settings, and should probably not be used for the file at hand.
For instance, in the HyPhy run, were the Bayes factors for the 3 selected sites (from the datamonkey.org analysis) just below the significance level, or a lot below it (e.g. 15 vs 20, or 2 vs 20)?
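
To make the "just below vs. a lot below" comparison concrete, here is a minimal Python sketch. It assumes the usual empirical Bayes definition of a site's Bayes factor (posterior odds of dN > dS divided by the prior odds); the function names and the borderline_ratio parameter are hypothetical and are not part of HyPhy or Datamonkey.

```python
def bayes_factor(prior, posterior):
    """Posterior odds of dN > dS at a site divided by the prior odds.

    Assumes both probabilities lie strictly between 0 and 1.
    """
    for name, p in (("prior", prior), ("posterior", posterior)):
        if not 0.0 < p < 1.0:
            raise ValueError(f"{name} must lie strictly between 0 and 1")
    return (posterior / (1.0 - posterior)) / (prior / (1.0 - prior))


def distance_from_cutoff(bf, cutoff, borderline_ratio=0.5):
    """Label a Bayes factor relative to a cutoff.

    borderline_ratio is a hypothetical knob: a value below the cutoff
    but at least borderline_ratio * cutoff is called "borderline".
    """
    if bf >= cutoff:
        return "selected"
    if bf >= borderline_ratio * cutoff:
        return "borderline"
    return "far below"


if __name__ == "__main__":
    # The two example values from the post, against a cutoff of 20.
    for bf in (15.0, 2.0):
        print(bf, distance_from_cutoff(bf, cutoff=20.0))
```

With the cutoff of 20 used in the example, a Bayes factor of 15 is labelled "borderline" and a Bayes factor of 2 is labelled "far below". Only the first case would point to a result that flips with small changes in the settings.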

To answer your specific questions:

1). Datamonkey uses the 3x4 frequency parameterization under the MG94x(abcdef) model, where abcdef depends on the user's selection on the analysis setup page.
2). Datamonkey uses nucleotide-based branch lengths.
3,4). Datamonkey uses 3x3 GDD rate variation. Sometimes a few of these rate classes have 0 weights (especially for smaller datasets).
5). dNdSresultsprocessor.bf cannot generate the same tables as what you see in Datamonkey, sorry.
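
As an illustration of point 3,4, the sketch below builds a 3x3 grid of rate classes from two three-bin general discrete distributions (GDD), one for synonymous rates and one for non-synonymous rates. It assumes the two distributions are independent, so each joint weight is a product; all rates and weights are made-up numbers, and the function names are hypothetical rather than HyPhy identifiers.

```python
from itertools import product


def joint_rate_classes(syn, nonsyn):
    """Combine two discrete distributions into (alpha, beta, weight) classes.

    syn and nonsyn are lists of (rate, weight) pairs. Treating them as
    independent is an assumption of this sketch.
    """
    for dist in (syn, nonsyn):
        if abs(sum(w for _, w in dist) - 1.0) > 1e-9:
            raise ValueError("weights must sum to 1")
    return [(a, b, wa * wb) for (a, wa), (b, wb) in product(syn, nonsyn)]


def effective_classes(classes, tol=1e-8):
    """Keep only the classes whose weight is not (numerically) zero."""
    return [c for c in classes if c[2] > tol]


def prior_positive(classes):
    """Total prior weight of classes with beta > alpha (dN > dS)."""
    return sum(w for a, b, w in classes if b > a)


# Illustrative values only: the third synonymous class has weight 0.
syn = [(0.5, 0.5), (1.0, 0.5), (2.0, 0.0)]
nonsyn = [(0.1, 0.6), (0.8, 0.3), (3.0, 0.1)]

grid = joint_rate_classes(syn, nonsyn)
print(len(grid), len(effective_classes(grid)))
```

In this example the grid has 9 joint classes, but only 6 carry non-zero weight because one synonymous bin has weight 0. That would be consistent with the number of rate classes appearing to differ between datasets even though the grid itself is always 3x3.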
Finally, I would strongly encourage you to use the new FUBAR method in place of REL.

Sergei

