Review methods for dN and dS on trees
Sarah
Guest


Review methods for dN and dS on trees
Sep 6th, 2006 at 10:15am
 
I'm new to HyPhy and have been trying to understand the array of different methods for inferring dN and dS on trees. I've looked at the static HyPhy documentation, messages here, and the batch files themselves, and I'm still unclear.

In particular, I was wondering if you could explain the differences between:
  • dNdSRateAnalysis.bf: What exactly is being done here? What does it mean to test "all available models"? What are the default initial value options? How long should this analysis take? I've been running it for MG94-HKY85 on 253 sequences (984 nucs each) with a 3x3 gamma dist for syn and nonsyn on my Intel iMac; three models have been tested in 90 h.
  • dNdSpost.bf
  • post_sns.bf
  • Loading a partitioned codon file, assigning a tree MG94xHKY85..._Rates, building a likelihood function, and optimizing the function as described on p. 27 of the "HyPhy: Hypothesis Testing Using Phylogenies" documentation.
  • Because of my calculations I can't see the Standard Analyses menu now, but I believe there are similar-looking dNdS calculators under the Codon and Positive Selection submenus. A few lines on each (or references to where they are described) would be very appreciated.

Please feel free to direct me to explanatory sources I might have missed. Apologies for the broad questions--I can be more specific about what I'm trying to calculate, but I thought others might benefit from an overview too.

Thanks,
Sarah
 
Art Poon
Global Moderator
Re: Review methods for dN and dS on trees
Reply #1 - Sep 6th, 2006 at 12:49pm
 
Dear Sarah,

Hi, I'm covering for Sergei this week while he's abroad.

Regarding the batch file dNdSRateAnalysis.bf:
• HyPhy attempts to fit one of several available codon substitution rate models to your data.  The parameters of the codon rate matrix are estimated as either global variables, or local to each branch of your tree. 
• By selecting "Run all available models" at the "Rate Variation Options" window, HyPhy will iterate through all five possible models of rate variation across sites (starting at line 530) so that you can compare their likelihoods.
• The option to select between default or randomized initial values for the rate distribution parameters (line 407) actually appears to be an unused option -- possibly left-over from earlier versions of the batch file.
• Given the number of sequences that you are trying to fit, and the complexity of the model that you're fitting, the amount of time elapsed doesn't seem unreasonable for your computer.  (There are a lot of parameters being fit to a lot of data!)  You might want to keep an eye on the messages.log file for error messages, just to be sure that nothing's gone awry.
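
As a rough illustration of what comparing the fitted models involves: each fit yields a maximized log-likelihood and a count of free parameters, and AIC = 2k - 2 ln L trades one off against the other (lower is better). The sketch is in Python rather than HyPhy batch language, and the model labels, log-likelihoods and parameter counts are made-up placeholders, not output from dNdSRateAnalysis.bf.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion; lower values indicate a better trade-off."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical fits: (label, maximized log-likelihood, free parameters).
fits = [
    ("model_A", -10250.0, 510),
    ("model_B", -10190.0, 512),
    ("model_C", -10185.0, 516),
]

scores = {label: aic(lnl, k) for label, lnl, k in fits}
best = min(scores, key=scores.get)

# Report models from best to worst, with the AIC difference to the best one.
for label in sorted(scores, key=scores.get):
    delta = scores[label] - scores[best]
    print(f"{label}: AIC = {scores[label]:.1f}, delta = {delta:.1f}")
```

A difference of only a couple of AIC units between two models, as between model_B and model_C here, is usually read as weak evidence for preferring one over the other.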

I can't find the batch file dNdSpost.bf anywhere on my computer!  Is this from an earlier version of HyPhy?

As far as I can tell, post_sns.bf is a post-processing batch file that generates and displays trees based specifically on either synonymous or non-synonymous rates of substitution that were estimated from fitting a codon model.  In other words, one would use this after executing something like dNdSRateAnalysis.bf.

The step-by-step procedure for fitting a codon model to a codon partition is, as you say, in the documentation.  This is basically equivalent to dNdSRateAnalysis.bf, except executed through the HyPhy GUI.

Under the "Codon Selection" submenu, there is a batch file dNdSBivariateRateAnalysis.bf which applies the site-specific estimation of synonymous substitution rates described in Kosakovsky Pond and Muse (2005) MBE 22(12):2375.

Under the "Positive Selection" submenu, the batch file QuickSelectionDetection.bf applies methods described in Kosakovsky Pond et al. (2005) MBE 22(5): 1208.  This first fits a codon model to the data in a similar manner to dNdSRateAnalysis.bf before reconstructing ancestral states for inferring the number of NS and S substitutions per site.

Okay!  I'm sure Sergei could provide a much more informed overview, but I hope this helps you out a bit.  Lemme know if you need more detailed explanations on stuff.

- Art.

 
Sarah
Guest


Re: Review methods for dN and dS on trees
Reply #2 - Sep 7th, 2006 at 11:38am
 
Thanks, Art! I'm still trying to learn the HyPhy batch language (and some of the techniques in this area) and really appreciate your help.

On dNdSRateAnalysis.bf:
  • Though dNdSRateAnalysis.bf compares AIC for different models of nonsyn and syn substitution rates, I still had to choose between MG94xREV v. MG94 v. MG94xHKY85, etc. I could/should have used CodonModelCompare.bf to choose the best instantaneous rate matrix model for the dNdSRateAnalysis.bf, right? (In other words, dNdSRateAnalysis.bf does not consider the appropriateness of my rate matrix.)
  • To obtain bootstraps and variance for dN and dS values on branches, one uses bootstrap.bf?
  • I also wanted to check that it is tractable to create a partition and recalculate dN/dS for every branch with respect to that partition.

On dNdSBivariateRateAnalysis.bf:
  • It seems to me that the last two models ("Dual" and "Lineage Dual") of dNdSRateAnalysis are similar, if not identical, to the bivariate rate analyses described in Kosakovsky Pond & Muse (2005) MBE 22(12):2375. It's confusing that there's a separate batch file for the bivariate analyses. Does dNdSBivariateRateAnalysis.bf do anything different from the last two models of dNdSRateAnalysis? Does the bivariate version automatically use discrete distributions instead of gamma distributions? I don't recall being prompted to choose which to use.

dNdSpost.bf was referenced here: [link not preserved]

My goal is to compare dN/dS rates across the branches of a tree, comparing results from partitioned and nonpartitioned datasets. I hope to do this with the bivariate analyses and bootstrap the confidence intervals. There are other files that also look like they could assess significant differences between dN/dS on branches: BranchClassDNDS.bf, SelectionLRT.bf, TestBranchDNDS.bf, MRPositiveSelection.bf, SubtreeSelectionComparison.bf. I don't quite see their purpose if I can show differences in dNdSBivariateRateAnalysis.bf. Do they differ hugely in assumptions or power? Are they documented anywhere other than the message boards and the HyPhy user manuals?

I have also run FEL on the data to obtain selected codons. Is there an easy way to map inferred substitutions on the tree? Is there a way to map the (dis)appearance of certain motifs on the tree?

Thanks again so much for your help.

Sarah

p.s. Computer froze after 98 h of dNdSRateAnalysis.bf, so I restarted with dNdSBivariateRateAnalysis.bf 15 h ago. Two odd things: first, the program timer isn't running, though the "LF Optimization. Value X and Y evals/sec" changes every hour or two, and there's occasionally movement on the progress bar. Second, the status light remains yellow. I encountered similar issues with CodonModelCompare.bf, except there I had no signs in the first 20 min or so that anything was running after the first min. Is this a problem with my computer (2 GHz Intel iMac), the latest Universal Binary version of HyPhy, or am I misinterpreting something?
 
Sarah
Guest


Re: Review methods for dN and dS on trees
Reply #3 - Sep 7th, 2006 at 11:56am
 
Regarding the difference between dNdSRateAnalysis.bf and dNdSBivariateRateAnalysis.bf, one comment that concerns me is

Quote:
Dear Albert,

I don't think dSdNtree will work properly with the general bivariate analysis, because this analysis uses a fundamentally different way to set up the likelihood function...; I'll take a look into the cause of the crash and get back to you.

Cheers,
Sergei

(from [link not preserved])

This suggests there are major differences between the two implementations, but I can't find where they are described.

Thanks again,
Sarah
 
Art Poon
Global Moderator
Re: Review methods for dN and dS on trees
Reply #4 - Sep 7th, 2006 at 2:21pm
 
Dear Sarah,

• Sure, it seems reasonable to me to apply CodonModelCompare.bf before running dNdSRateAnalysis.bf.

• I don't think bootstrap.bf will yield what you're looking for.  Instead, try simpleBootstrap.bf (please refer to [link not preserved]).

• Again, a partition is handled like any other data set filter in HyPhy, so there's no reason why it shouldn't work with any batch file.
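
The general idea behind a nonparametric bootstrap of a codon alignment can be sketched as follows: codon columns are resampled with replacement, the model is refitted to each replicate, and the spread of the refitted estimates gives the variance. This is Python, not HyPhy batch language, and it is not the code in simpleBootstrap.bf; the function names and the toy alignment are assumptions made for illustration, and the refitting step is left out.

```python
import random


def codon_columns(alignment):
    """Split equal-length, in-frame sequences into per-codon columns."""
    length = len(next(iter(alignment.values())))
    if length % 3 != 0 or any(len(s) != length for s in alignment.values()):
        raise ValueError("sequences must share a length that is a multiple of 3")
    return [
        {name: seq[i:i + 3] for name, seq in alignment.items()}
        for i in range(0, length, 3)
    ]


def bootstrap_replicate(alignment, rng):
    """Resample codon columns with replacement into a same-size alignment."""
    columns = codon_columns(alignment)
    sampled = [rng.choice(columns) for _ in columns]
    return {name: "".join(col[name] for col in sampled) for name in alignment}


# Toy alignment (hypothetical data): three sequences, three codons each.
toy = {"seq1": "ATGGCTAAA", "seq2": "ATGGCCAAG", "seq3": "ATGTCTAAA"}
rng = random.Random(1)  # fixed seed so the replicate is reproducible
replicate = bootstrap_replicate(toy, rng)
print(replicate)
```

Resampling whole codons rather than single nucleotides keeps the reading frame intact, which is what a codon model needs.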

More later, =)
- Art.
 
Simon
Global Moderator
San Diego, CA
Gender: male
Re: Review methods for dN and dS on trees
Reply #5 - Sep 7th, 2006 at 2:23pm
 
Dear Sarah,

Phew! That's a lot of questions...I thought that I would chip in too...

Quote:
My goal is to compare dN/dS rates across the branches of a tree, comparing results from partitioned and nonpartitioned datasets. I hope to do this with the bivariate analyses and bootstrap the confidence intervals. There are other files that also look like they could assess significant differences between dN/dS on branches: BranchClassDNDS.bf, SelectionLRT.bf, TestBranchDNDS.bf, MRPositiveSelection.bf, SubtreeSelectionComparison.bf. I don't quite see their purpose if I can show differences in dNdSBivariateRateAnalysis.bf. Do they differ hugely in assumptions or power? Are they documented anywhere other than the message boards and the HyPhy user manuals?


We had a paper looking at selection varying across both branches and sites [link not preserved], although unlike the 'branch-sites' sorts of models, we assumed that the distributions across branches and sites were independent. Is this what you want?

MFPositiveSelection is an old batch file used to estimate dN/dS jointly across multiple datasets (like HIV sequences from different infected individuals). The different branch batch files really depend on which branches you want to look at; in the tips of the tree, an isolated branch, a subtree, or some more complicated setup.

Best
Simon

Simon D.W. Frost, D.Phil., Senior Lecturer, Department of Veterinary Medicine, University of Cambridge
 
Simon
Global Moderator
Re: Review methods for dN and dS on trees
Reply #6 - Sep 7th, 2006 at 2:46pm
 
Dear Sarah,

It's probably a reasonable approximation to estimate the nucleotide biases using a nucleotide model (NucModelCompare), which will be much faster than using the codon model (CodonModelCompare). I'd be interested to hear if they came up with different answers for the best fitting nucleotide model.

Rather than estimate confidence intervals using bootstrap, I would use profile likelihood confidence intervals, which is going to be much faster.
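
To make the idea concrete: a profile likelihood interval for one parameter collects every value at which the log-likelihood, re-maximized over all the other parameters, falls no more than about 1.92 units (half the 95% chi-square cutoff with 1 d.f.) below its maximum. The Python sketch below shows this on a deliberately simple stand-in, a normal model where the mean is the parameter of interest and the variance is the nuisance parameter. It is not HyPhy's implementation, and the data values are made up.

```python
import math

CHI2_95_1DF = 3.841458820694124  # 95% quantile of chi-square with 1 d.f.


def profile_loglik(mu, data):
    """Normal log-likelihood at mean mu, with the variance (nuisance
    parameter) set to its best value given mu. Needs non-constant data."""
    n = len(data)
    var_hat = sum((x - mu) ** 2 for x in data) / n
    return -0.5 * n * (math.log(2 * math.pi * var_hat) + 1)


def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo_positive = f(lo) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == f_lo_positive:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def profile_ci(data):
    """Return (estimate, lower, upper) for the mean at the 95% level."""
    mle = sum(data) / len(data)
    best = profile_loglik(mle, data)

    def gap(mu):
        # Positive inside the interval, negative outside it.
        return CHI2_95_1DF - 2 * (best - profile_loglik(mu, data))

    span = max(data) - min(data)  # wide enough bracket for 3+ data points
    return mle, bisect(gap, mle - span, mle), bisect(gap, mle, mle + span)


data = [0.42, 0.55, 0.38, 0.61, 0.47, 0.52, 0.44, 0.58]  # hypothetical values
estimate, lower, upper = profile_ci(data)
print(f"estimate {estimate:.3f}, 95% interval ({lower:.3f}, {upper:.3f})")
```

Each interval costs a handful of constrained re-optimizations of one fitted model, whereas a bootstrap refits the whole model once per replicate, which is why the profile approach tends to be much cheaper.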

If you know which branch(es) to look at, try using TestBranchDNDS, which will allow you to specify dual rate variation (using two independent beta-gamma distributions), a custom nucleotide bias, and to specify one or more branches, although I would start assuming no site-to-site rate variation first, if that is just a nuisance term. Following the analysis, you can then plot dN and dS trees (Analyses>Results>Syn and nonsyn trees), obtain profile likelihood confidence intervals (under Analyses>Results>Variance Estimates) etc.

Best
Simon
 
Sarah
YaBB Newbies
Re: Review methods for dN and dS on trees
Reply #7 - Sep 7th, 2006 at 4:44pm
 
Hi, Art & Simon,

First, your GA approach is impressive and ideally what I would do, but I don't have the skills/time at the moment. If these preliminary analyses are suggestive, I would like to pursue it. (I assume it's not implemented anywhere in HyPhy.)

TestBranchDNDS.bf could be bad here: I don't want to specify branches a priori because I expect a lot of variation in selection across the tree. I was hoping that dNdSBivariateRateAnalysis.bf (or the last model in dNdSRateAnalysis.bf) would provide a way around this problem. Lineage Dual is obviously not as nice as pulling from discrete categories of omegas, but it's a start... though I'm not sure if it's what I'm running now... or if anything's running... or exactly what form the output will take. It would definitely be good to run independent branch and site models and compare them (I suspect I'll need both branch and site variation). Will later try the free ratio model in PAML.

I'll test NucModelCompare.bf and CodonModelCompare.bf once dNdSBivariateRateAnalysis.bf has finished.

Thanks again for your help; I wish I weren't so new to this. Rather than peppering you with more questions, I will focus on understanding the batch code and linking the literature to the options in the software.

Thanks again,
Sarah
 
Simon
Global Moderator
Re: Review methods for dN and dS on trees
Reply #8 - Sep 7th, 2006 at 5:02pm
 
Dear Sarah,

The GA branch analyses are available for download separately [link not preserved]. This approach is only feasible for a small number of sequences (I'd say 20 or fewer, in which case you could run the analyses on our cluster - it's very easy), and I'd use no site-to-site rate variation to begin with.

Best
Simon
 
Simon
Global Moderator
Re: Review methods for dN and dS on trees
Reply #9 - Sep 7th, 2006 at 5:27pm
 
Dear Sarah,

One more thing; you can fit models allowing different dN/dS classes for each branch in HyPhy by specifying 'local' models rather than 'global' models. This is easy to do in the graphical user interface - if the model you want isn't there, you can make your own up (see [link not preserved]). You can fit a 'local' model, with separate dN/dS categories per branch in much the same way as described in one of Sergei's excellent tutorials ([link not preserved]).
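
To give a sense of scale for 'local' versus 'global': an unrooted, fully bifurcating tree with n tips has 2n - 3 branches, so a model with one dN/dS per branch carries that many ratio parameters, against a single shared one in a global model. The Python sketch below only does this count; the function names are illustrative, it assumes a fully resolved unrooted tree, and it ignores every other parameter in the model.

```python
def branch_count(n_tips):
    """Branches in an unrooted, fully bifurcating tree with n_tips leaves."""
    if n_tips < 3:
        raise ValueError("an unrooted bifurcating tree needs at least 3 tips")
    return 2 * n_tips - 3


def ratio_parameters(n_tips, local):
    """Number of dN/dS parameters: one per branch if local, else one shared."""
    return branch_count(n_tips) if local else 1


for n_tips in (13, 253):  # alignment sizes mentioned in this thread
    print(n_tips, "tips:",
          ratio_parameters(n_tips, local=False), "global vs",
          ratio_parameters(n_tips, local=True), "local dN/dS parameters")
```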

Have fun learning about HyPhy!
Simon
 
Sarah
YaBB Newbies
Re: Review methods for dN and dS on trees
Reply #10 - Sep 13th, 2006 at 12:43pm
 
Thanks again for all the help. I've been doing a lot of rereading and my earlier confusion is slowly resolving.

Is the GA then your only "branch-site" model, i.e., model that infers which codons have been selected on particular branches?

Is there a reference for dNdSBivariateRateAnalysis.bf?

I've a single Athlon processor chugging away on CodonModelCompare.bf. I'll let you know the results when they're in.

Sarah

p.s. As an aside, I started a 13 sequence, 984 bp job running on your GA server on Friday. Between Friday and Tuesday, I never got beyond "This page will update every 15 minutes until your program starts running" page, even though it was clearly running when I checked the job queue, and I refreshed my browser's cache regularly. Today the same page yields "Not found" and the job is no longer running. Did I lose the results? (Did they ever exist?)
 
Sergei
YaBB Administrator
UCSD
Gender: male
Re: Review methods for dN and dS on trees
Reply #11 - Sep 13th, 2006 at 1:12pm
 
Dear Sarah,

GA-Branch is not a branch-site method; it looks for signatures of alignment-wide selection at the level of a single branch. You can include site-by-site rate variation as well, but it will be independent of the branch-by-branch rate variation (i.e. a 'slow' site will be slow in all branches of the tree). I am not sure why your analysis died - would you please try it again and let me know if the problem persists?

I did implement a PAML style Branch-Site model upon request (search the message boards way back and you should find the reference), in case you want to try that.

There is no reference for dNdSBivariateRateAnalysis.bf because it is an as-yet unpublished method. It is very similar in spirit to dNdSRateAnalysis.bf (which is described in the paper we wrote with Spencer Muse in MBE), except that dN and dS are no longer assumed to come from independent distributions, but rather from a general discrete bivariate distribution. The key difference is that dN and dS can (and in general will) co-vary in the dNdSBivariateRateAnalysis.bf analysis. Another thing about this analysis is that selection strength does not vary along the tree.
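
The distinction can be shown numerically. If dS and dN come from two independent discrete distributions, the joint distribution is their product (a 3x3 setup gives 9 rate classes) and the covariance between dS and dN is zero by construction. A general discrete bivariate distribution places freely chosen weights on (dS, dN) pairs, so the two can co-vary. The Python sketch below uses made-up rates and weights and is not how the batch files set up the likelihood function.

```python
def covariance(points):
    """Covariance of dS and dN under a discrete distribution given as
    (dS, dN, weight) triples whose weights sum to 1."""
    mean_s = sum(w * s for s, n, w in points)
    mean_n = sum(w * n for s, n, w in points)
    return sum(w * (s - mean_s) * (n - mean_n) for s, n, w in points)


def independent_grid(syn, nonsyn):
    """Joint distribution formed as the product of two separate discrete
    distributions, each a list of (rate, weight) pairs."""
    return [(s, n, ws * wn) for s, ws in syn for n, wn in nonsyn]


# Hypothetical 3x3 setup: three synonymous and three nonsynonymous classes.
syn = [(0.5, 0.3), (1.0, 0.4), (2.0, 0.3)]
nonsyn = [(0.1, 0.5), (0.5, 0.3), (1.5, 0.2)]
product = independent_grid(syn, nonsyn)

# Hypothetical general bivariate distribution: three (dS, dN) points.
bivariate = [(0.5, 0.1, 0.5), (1.0, 0.5, 0.3), (2.0, 1.5, 0.2)]

print(len(product), "classes, covariance", round(covariance(product), 12))
print(len(bivariate), "classes, covariance", round(covariance(bivariate), 12))
```

In the second distribution sites with fast synonymous rates also have fast nonsynonymous rates, which the product form cannot represent.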

Based on what you want to do (compare alignment-wide dN/dS along branches between different data sets), a GA-branch style approach seems ideal. It will

(a). Free you from having to assign branches to rate classes a priori
(b). Avoid model overfitting as the free ratio model will almost certainly do (hence leading to very large variances in parameter estimates)
(c). Provide natural confidence intervals for dN/dS over branches averaged over models.
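
One common way to average an estimate over several candidate models is to weight each model by its Akaike weight. The Python sketch below shows that calculation for a single branch; whether it matches the GA-branch implementation in every detail is an assumption here, and the AIC scores and dN/dS values are made up.

```python
import math


def akaike_weights(aic_scores):
    """Convert AIC scores into model weights that sum to 1."""
    best = min(aic_scores)
    raw = [math.exp(-0.5 * (score - best)) for score in aic_scores]
    total = sum(raw)
    return [value / total for value in raw]


def model_average(estimates, weights):
    """Weighted mean of one quantity across models."""
    return sum(e * w for e, w in zip(estimates, weights))


# Hypothetical: three candidate models, their AIC scores, and the dN/dS
# each one assigns to the same branch.
aics = [21402.0, 21404.0, 21410.0]
branch_omega = [0.8, 0.8, 0.3]

weights = akaike_weights(aics)
averaged = model_average(branch_omega, weights)

# Share of the total model weight in which this branch has dN/dS > 0.5.
support = sum(w for omega, w in zip(branch_omega, weights) if omega > 0.5)

print(f"model-averaged dN/dS = {averaged:.3f}, support for dN/dS > 0.5 = {support:.3f}")
```

Because poorly supported models get very little weight, no branch has to be assigned to a rate class in advance, and the weighted spread of values across models gives a measure of uncertainty for each branch.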

Let me know if I can be of further assistance.

Sergei


Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
 
Sarah
YaBB Newbies
Re: Review methods for dN and dS on trees
Reply #12 - Sep 13th, 2006 at 5:13pm
 
Dear Sergei,

I just ran a GA job on the same sequences without problems. The program and output formats are great! I'm looking into running the program on a local cluster so I can see the effects of site-by-site variation and different topologies.

Argh, you're (obviously) right about the GA model not being branch-site--don't know what I was thinking. It seems that one could, in theory, run FEL on the same tree used for GA, and map codon substitutions to the tree to see on which branches selection may have occurred at particular codons--but this clearly does not yield the same precision claimed by the branch-site model. That said, I am wary of false positives in the branch-site model.

Thanks,
Sarah

 
Sergei
YaBB Administrator
Re: Review methods for dN and dS on trees
Reply #13 - Sep 14th, 2006 at 12:14pm
 
Dear Sarah,

Let me know if you run into problems running jobs locally.

Indeed, one could use FEL on the same tree to look for codons under selection post-hoc (i.e. use the GA to find putatively selected branches, then use FEL to test for selection on those branches), but there are some statistical issues of bias here, whereby one uses the same data to first formulate a hypothesis (i.e. find branches under selection) and then test for evidence for/against it using the same data. It's permissible for exploratory data analysis, though.

Alternatively, you could use SLAC/FEL to infer substitution histories for each codon, and then correlate them with the branches under selection globally. However, this is also mostly an exploratory analysis, without much statistical rigor.

Generally speaking, there is not very much power in any branch-site type method, even if the 'foreground' branch is specified a priori (as shown by poor power in simulation studies by Yang et al in their 2005 MBE paper). Also, there is a problem of model mis-specification (i.e. postulating which branches are under selection).

Cheers,
Sergei
 
Sarah
YaBB Newbies
Re: Review methods for dN and dS on trees
Reply #14 - Sep 17th, 2006 at 1:20pm
 
I don't have administrative privileges on the cluster where we're trying to install the GA batch file, but so far the administrators haven't run into any preliminary compilation problems. In the meantime, I've tried to run a few more analyses on your server. The queue has remained the same since Friday, and one analysis (started Thursday, #7627?) seems stuck on generation 100--the progress page hasn't changed. Am I using too many system resources? Again, I plan to run locally soon.

I've submitted overlapping sequence files in the GA analysis, e.g. A, A+B, B, B+C, etc. I've noticed that a few of the branches get quite different results between the single-group analysis (A) and the paired analysis (A+B). I was hoping to explore this problem when running the GA routine locally, first by increasing the number of categories. Is that possible? If not, could I constrain a few? It seems like it would also help to constrain the analysis to a specific nucleotide model and provide consistent topologies, rather than relying on NJ. Does this seem reasonable?

Sarah

p.s. The FEL would be exploratory only; I plan to use the percentiles generated by the GA to infer statistical support for dN>dS on particular branches (I do have a hypothesis about which ones will be selected). That looks like the approach you took in your 2005 MBE paper.
 