Codon relative rate test questions (Read 1424 times)
Sergei
YaBB Administrator


Datamonkeys are forever...

UCSD
Codon relative rate test questions
Jun 27th, 2005 at 11:39am
 
I received this set of questions via e-mail:

Quote:
Thanks for your help.  I took the time to read the MG94 MBE paper, and indeed
the LRS, LRN, and LRB tests are exactly what I'm looking for.  However, it is
not entirely clear to me how one conducts these tests using the command-line
version of HyPhy, and I didn't find it documented anywhere.



I am acutely aware of the need to update the documentation (which is effectively still the 2000 version). Sadly, there is simply not enough time to do this well. These discussion forums are hopefully an adequate stop-gap until we are able to address the documentation issue.

Quote:
I found a bug or
two along the way, as well.  

Here's what I'm doing in HYPHY, along with my bug reports, questions, etc.

./HYPHY

      (11) Relative Rate

       (1) Use relative rate test on three species and a variety of standard models

       (2):Codon (several available genetic codes).

       (1):Universal code. (Genebank transl_table=1).

input codon data, choose outgroup

""
       (MG94):Muse-Gaut 94 codon model. Local or global parameters. Possible
Rate heterogeneity (and HM spatial correlation).
       (1):All model parameters are estimated independently for each branch.

-> I assume this conducts the LRB test in MG94 MBE?  This works fine.



This is indeed the exact model from the original paper.
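For readers less familiar with MG94, here is a minimal Python sketch of how its instantaneous codon-to-codon rates are usually described: a change involving more than one nucleotide gets rate 0, a single-nucleotide synonymous change gets the synonymous rate (alpha) times the frequency of the target nucleotide, and a single-nucleotide non-synonymous change gets the non-synonymous rate (beta) times that frequency. The function names and the single, position-independent set of nucleotide frequencies are simplifying assumptions for illustration; this is not HyPhy's own code.

```python
# Universal genetic code (NCBI transl_table=1); codons ordered by bases T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"


def translate(codon: str) -> str:
    """Return the one-letter amino acid ('*' for stop) encoded by a codon."""
    i, j, k = (BASES.index(b) for b in codon.upper())
    return AMINO_ACIDS[16 * i + 4 * j + k]


def mg94_rate(src: str, dst: str, alpha: float, beta: float,
              freqs: dict) -> float:
    """Off-diagonal rate src -> dst under an MG94-style model (sketch).

    alpha: synonymous rate; beta: non-synonymous rate; freqs: nucleotide
    frequencies, ASSUMED position-independent here for simplicity.
    Diagonal entries (rows summing to zero) are not computed here.
    """
    if translate(src) == "*" or translate(dst) == "*":
        raise ValueError("stop codons are not part of the state space")
    diffs = [(a, b) for a, b in zip(src.upper(), dst.upper()) if a != b]
    if len(diffs) != 1:
        return 0.0  # only single-nucleotide changes happen instantaneously
    target = diffs[0][1]  # the nucleotide being substituted in
    rate = alpha if translate(src) == translate(dst) else beta
    return rate * freqs[target]


freqs = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
print(mg94_rate("TTT", "TTC", 1.0, 0.5, freqs))  # Phe -> Phe, synonymous: 0.25
print(mg94_rate("TTT", "TCT", 1.0, 0.5, freqs))  # Phe -> Ser, non-synonymous: 0.125
print(mg94_rate("TTT", "CCT", 1.0, 0.5, freqs))  # two changes: 0.0
```

With the local option, each branch gets its own alpha and beta (synRate and nonSynRate in the dialog shown further down), and those per-branch values are what the relative rate tests compare.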

Quote:
.OR.
""
       (MG94X2):Muse-Gaut 94+custom nucleotide biases codon model. ...

-> Somehow, I presume this permits LRS and LRN tests?  Note that selection of
Distribution Options 1 or 2 (gamma rate dists.) crashes the program, as the
relative location of the gamma1.def file is set incorrectly.

       (3):Independent General Discrete Distributions (Recommended setting)

Number of synonymous rate classes (>=2):3

Number of non-synonymous rate classes (>=2):3



Actually, this model (undocumented, of course :P) simply extends the MG94 model to allow for nucleotide bias corrections and for site-to-site variation in both synonymous and non-synonymous rates. Given that you only have 3 sequences for the test, there is very little data to fit a rich distribution of site-to-site rates. I would recommend using 2x2 rate classes at most, or the MG94custom model (with local options) to allow for nucleotide biases.
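To see why a 3x3 setting is rich for three sequences, here is a rough Python sketch of the bookkeeping. It assumes each general discrete distribution has class weights summing to 1 and class rates normalized to mean 1; that is an assumption about the parameterization, not something taken from the HyPhy source. With independent distributions, every site's likelihood is a mixture over all (synonymous class, non-synonymous class) pairs, so 3x3 means 9 categories per site, and the per-site likelihood work grows roughly in proportion. All function names are hypothetical.

```python
def discrete_params(n_classes: int) -> int:
    """Free parameters of one general discrete rate distribution.

    ASSUMPTION: weights sum to 1 and rates are normalized to mean 1,
    so weights and rates each lose one degree of freedom.
    """
    if n_classes < 2:
        raise ValueError("need at least 2 rate classes")
    return (n_classes - 1) + (n_classes - 1)


def independent_syn_nonsyn(n_syn: int, n_nonsyn: int) -> tuple:
    """(joint site categories, extra free parameters) for independent
    synonymous and non-synonymous discrete distributions."""
    categories = n_syn * n_nonsyn  # every (syn class, non-syn class) pair
    params = discrete_params(n_syn) + discrete_params(n_nonsyn)
    return categories, params


def site_likelihood(syn_weights, nonsyn_weights, cond_lik) -> float:
    """Mixture likelihood of one site: sum_i sum_j p_i * q_j * L[i][j],
    where L[i][j] is the site likelihood given syn class i, non-syn class j."""
    return sum(p * q * cond_lik[i][j]
               for i, p in enumerate(syn_weights)
               for j, q in enumerate(nonsyn_weights))


for n in (3, 2):
    print(f"{n}x{n}:", independent_syn_nonsyn(n, n))
# 3x3: (9, 8)
# 2x2: (4, 4)
```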

Quote:
Please enter a 6 character model designation (e.g:010010 defines HKY85):010010

 -> Okay, here is another question: where are the 6 character model
designations defined?  I searched the documentation, grep'ed the code, etc., and
couldn't find it.  I believe I'd like to just use something simpler/quicker,
e.g. K2P, here.



The notation is semi-standard in model selection circles (e.g. see this posting: [link unavailable]). I agree that names like K2P, F81 or TN93 would be more convenient for people who know the nomenclature, but the string notation allows one to specify any of the 203 sub-models of GTR (every way of grouping the 6 substitution rates into classes of equal rates), and to read off explicitly which rates are shared (once you get past the notation!).
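As an illustration, here is a minimal Python sketch of how such strings can be read, assuming the common convention that the six positions stand for the AC, AG, AT, CG, CT and GT rates in that order and that positions sharing a digit share a rate. Under that assumption, 010010 puts the two transitions (AG, CT) in one class and the four transversions in another (HKY85, or K2P with equal frequencies), 010020 gives TN93, 000000 gives F81 and 012345 gives full GTR. Enumerating the canonical strings (first digit 0, each later digit at most one more than the largest seen so far) recovers the count of 203, the number of ways to partition 6 items. The helper names are hypothetical.

```python
PAIRS = ("AC", "AG", "AT", "CG", "CT", "GT")  # ASSUMED position order


def rate_groups(model: str) -> dict:
    """Map each digit of a 6-character model string to the pairs sharing that rate."""
    if len(model) != 6 or not model.isdigit():
        raise ValueError("expected 6 digits, e.g. '010010'")
    groups = {}
    for pair, label in zip(PAIRS, model):
        groups.setdefault(label, []).append(pair)
    return groups


def all_model_strings(n: int = 6) -> list:
    """All canonical model strings of length n (restricted growth strings)."""
    out = []

    def extend(prefix: str, largest: int) -> None:
        if len(prefix) == n:
            out.append(prefix)
            return
        for d in range(largest + 2):  # reuse an existing class or open a new one
            extend(prefix + str(d), max(largest, d))

    extend("0", 0)
    return out


print(rate_groups("010010"))  # {'0': ['AC', 'AT', 'CG', 'GT'], '1': ['AG', 'CT']}
print(len(all_model_strings()))  # 203
```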

Quote:
-> So, I let the model run for a long time (in truth, I have about 1400 genes I
need to test, and I have to modify the .bf, .c, etc. code to automate the
process, which I understand how to do; any suggestions regarding a possible way
to speed up the calculations, though (different nt sub model?), would be very
appreciated).


I would suggest you use the MG94custom model with local options and perhaps the 010010 (HKY85) model string. In all honesty, there is very little power to infer any site-to-site rate variation from 3 sequences (apart from separating the sites that change from the ones that don't), so a constant-rate model should do fine.

Quote:
I couldn't find the output description documentation anywhere (BTW, the
[link unavailable] link found on [page unavailable] is broken, which I
was hoping might have an explanation), and it's not intuitively obvious to me
what all the results are, either.  I see a single P-value; what test would that
correspond to?  Is it LRB?  How to get P-values for LRS, LRN?


When you choose a model with multiple parameters per branch (e.g. a 'local' codon model), you will see a dialog prompting you to constrain some or all of the parameters. Your choice here determines which test is run and what the p-value means. For example, when using the MG94custom local model, you should see this dialog:
Code:
                        +--------------------------+
                        |Parameter(s) to constrain:|
                        +--------------------------+


        (1):All local model parameters are constrained
        (2):Constrain parameter nonSynRate
        (3):Constrain parameter synRate

 Please choose an option (or press q to cancel selection):

 


If you choose 'All', you will be running LRB (both rates), i.e. the test on branch lengths (the overall amount of evolution); choosing synRate runs LRS (synonymous rates only), and choosing nonSynRate runs LRN (non-synonymous rates only).
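Assuming the reported p-value comes from a standard likelihood ratio test (twice the log-likelihood difference between the unconstrained and constrained fits, compared to a chi-square distribution), its degrees of freedom would equal the number of constraints: one for LRS or LRN, and two for LRB, where both synRate and nonSynRate are set equal on the two ingroup branches. The Python sketch below computes that asymptotic p-value using only the standard library; the log-likelihoods are made-up numbers, and `lrt_pvalue` and `TEST_DF` are hypothetical names.

```python
import math


def lrt_pvalue(lnl_null: float, lnl_alt: float, df: int) -> float:
    """Asymptotic chi-square p-value for a likelihood ratio test (sketch).

    lnl_null: log-likelihood with the constraint (equal rates on the two
    ingroup branches); lnl_alt: unconstrained log-likelihood;
    df: number of constrained parameters.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))  # LR statistic, clipped at 0
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))  # chi-square(1) upper tail
    if df == 2:
        return math.exp(-stat / 2.0)  # chi-square(2) upper tail
    raise ValueError("this sketch only handles df = 1 or 2")


# ASSUMED degrees of freedom: one constraint per constrained rate parameter.
TEST_DF = {"LRB": 2, "LRS": 1, "LRN": 1}

for test, df in TEST_DF.items():
    print(test, round(lrt_pvalue(-1000.0, -997.0, df), 4))
```

Because each test constrains a different set of parameters, running the analysis three times (once per dialog choice) is what yields separate p-values for LRB, LRS and LRN.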

The easiest way to script out-of-the-box analyses is probably as suggested here: [link unavailable].

Finally, thanks for the bug reports; I will try to fix pathnames and broken URLs ASAP.

HTH,
Sergei
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego