Detailed HyPhy settings of the Datamonkey selection analyses
Kim Roelants
Feb 21st, 2013 at 5:15am
 
Dear Sergei,

I have conducted a series of selection analyses using the SLAC, FEL and REL methods implemented on the Datamonkey server. For the present study, these 'stock' analyses are more than adequate to make my point. In the future, however, I would like to be able to use and interpret the more versatile settings available in the downloadable version of HyPhy 2.1. Following your helpful manual, I have tried to reproduce my Datamonkey REL results with HyPhy. The results, however, are different: instead of 3 positively selected sites (Bayes factor > 50), HyPhy finds none, using the very same input files. So I guess one of the other analysis settings must have differed. Datamonkey also identified 17 negatively selected sites, but I don't know how to view the p-values for negative selection in HyPhy.

So I have the following questions:
- What are the detailed HyPhy settings of the REL, FEL and SLAC analyses conducted by Datamonkey (apart from the input files, the DNA substitution model, and the significance level)?
- I guess that Datamonkey uses an MG94 codon model? But does it also include the 3x4 codon frequency setting?
- Does Datamonkey use codon- or nucleotide-based branch length optimisation?
- I guess Datamonkey also assumes rate variation in synonymous substitutions (i.e. the dual setting). If so, how is this rate heterogeneity modelled (gamma or GDD)?
- And how is the number of rate classes determined? In HyPhy, I have to predefine this number, but in Datamonkey the number of rate classes seems to vary across analyses of different input trees.
- Finally, how can I use HyPhy's dNdSresultsprocessor.bf to obtain a summary table of site-specific results similar to those produced by Datamonkey, that is, including the sites that evolved under significant purifying selection?

Thanks in advance for your help,

Kim
Sergei
Reply #1 - Feb 25th, 2013 at 2:34pm
 
Hi Kim,

The fact that you are getting noticeably different results suggests that the analysis is unduly sensitive to the settings, and should probably not be used for the file at hand.
For instance, in the HyPhy run, were the Bayes factors for the 3 selected sites (from the datamonkey.org analysis) just below the significance level, or a lot below it (e.g. 15 vs 20, or 2 vs 20)?
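To make the "just below vs. a lot below" comparison concrete, here is a minimal Python sketch. It assumes the usual definition of a Bayes factor as posterior odds divided by prior odds for the event dN > dS at a site. The function names, the `near_fraction` cutoff and the example numbers are all hypothetical and chosen for illustration; none of them come from HyPhy or Datamonkey.

```python
def bayes_factor(prior: float, posterior: float) -> float:
    """Posterior odds divided by prior odds for an event (e.g. dN > dS at a site)."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if not 0.0 <= posterior < 1.0:
        raise ValueError("posterior must be in [0, 1)")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = posterior / (1.0 - posterior)
    return posterior_odds / prior_odds


def compare_to_threshold(bf: float, threshold: float = 50.0,
                         near_fraction: float = 0.5) -> str:
    """Label a Bayes factor relative to a cutoff.

    near_fraction is an arbitrary illustrative choice: a value within
    that fraction of the threshold counts as 'just below'.
    """
    if bf >= threshold:
        return "above"
    if bf >= near_fraction * threshold:
        return "just below"
    return "far below"


# Hypothetical site: prior P(dN > dS) = 0.1, posterior = 0.9.
example_bf = bayes_factor(prior=0.1, posterior=0.9)
example_label = compare_to_threshold(example_bf)
```

A site that moves from "above" to "just below" between two runs points to a borderline call, whereas a move to "far below" suggests the two runs fitted substantively different models.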

To answer your specific questions:

1). Datamonkey uses the 3x4 frequency parameterization under the MG94x(abcdef) model, where abcdef depends on the user's selection on the analysis setup page.
2). Datamonkey uses nucleotide-based branch lengths.
3,4). Datamonkey uses 3x3 GDD (general discrete distribution) rate variation. Sometimes a few of these rate classes have zero weight (especially for smaller datasets), which is why the number of rate classes appears to vary.
5). Sorry, dNdSresultsprocessor.bf cannot generate the same tables as the ones you see in Datamonkey.
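As a rough illustration of point 3,4, the Python sketch below builds a 3x3 grid of rate classes and shows how zero-weight classes reduce the number that effectively remain. It assumes the synonymous and nonsynonymous distributions are independent, so each joint weight is a product; that independence, the function names, the tolerance and all rates and weights are assumptions made up for this example, not Datamonkey output or HyPhy code.

```python
from itertools import product


def joint_grid(syn_rates, syn_weights, nonsyn_rates, nonsyn_weights):
    """Return (alpha, beta, weight) for every synonymous/nonsynonymous pair.

    Assumes the two discrete distributions are independent, so the
    joint weight is the product of the marginal weights.
    """
    syn = zip(syn_rates, syn_weights)
    nonsyn = zip(nonsyn_rates, nonsyn_weights)
    return [(a, b, wa * wb) for (a, wa), (b, wb) in product(syn, nonsyn)]


def effective_classes(grid, tol=1e-8):
    """Keep only the classes whose weight is not numerically zero."""
    return [cls for cls in grid if cls[2] > tol]


def prior_positive_selection(grid):
    """Total weight on classes with beta > alpha, i.e. prior P(dN > dS)."""
    return sum(w for a, b, w in grid if b > a)


# Hypothetical estimates: the third class of each distribution has zero weight.
grid = joint_grid(
    syn_rates=[0.5, 1.0, 2.0], syn_weights=[0.6, 0.4, 0.0],
    nonsyn_rates=[0.1, 0.8, 3.0], nonsyn_weights=[0.7, 0.3, 0.0],
)
kept = effective_classes(grid)
prior = prior_positive_selection(grid)
```

With these made-up inputs the grid always has nine entries, but only the classes built from non-zero marginal weights survive the filter, which is the kind of collapse described for smaller datasets.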
Finally, I would strongly encourage you to use the new FUBAR method in place of REL.

Sergei

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego