HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Methodology Questions >> How to >> Review methods for dN and dS on trees
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1157562913
Message started by Sarah on Sep 6th, 2006 at 10:15am
Title: Review methods for dN and dS on trees Post by Sarah on Sep 6th, 2006 at 10:15am
I'm new to HyPhy and have been trying to understand the array of different methods for inferring dN and dS on trees. I've looked at the static HyPhy documentation, messages here, and the batch files themselves, and I'm still unclear.
In particular, I was wondering if you could explain the differences between:
Please feel free to direct me to explanatory sources I might have missed. Apologies for the broad questions--I can be more specific about what I'm trying to calculate, but I thought others might benefit from an overview too. Thanks, Sarah
Title: Re: Review methods for dN and dS on trees Post by artpoon on Sep 6th, 2006 at 12:49pm
Dear Sarah,
Hi, I'm covering for Sergei this week while he's abroad. Regarding the batch file dNdSRateAnalysis.bf:
• HyPhy attempts to fit one of several available codon substitution rate models to your data. The parameters of the codon rate matrix are estimated either as global variables or as local to each branch of your tree.
• By selecting "Run all available models" in the "Rate Variation Options" window, HyPhy will iterate through all five possible models of rate variation across sites (starting at line 530) so that you can compare their likelihoods.
• The option to select between default or randomized initial values for the rate distribution parameters (line 407) actually appears to be unused, possibly left over from earlier versions of the batch file.
• Given the number of sequences that you are trying to fit and the complexity of the model, the elapsed time doesn't seem unreasonable for your computer. (There are a lot of parameters being fit to a lot of data!) You might want to keep an eye on the messages.log file for error messages, just to be sure that nothing has gone awry.
I can't find the batch file dNdSpost.bf anywhere on my computer! Is this from an earlier version of HyPhy? As far as I can tell, post_sns.bf is a post-processing batch file that generates and displays trees based specifically on either the synonymous or the nonsynonymous substitution rates estimated by fitting a codon model. In other words, one would use it after executing something like dNdSRateAnalysis.bf. The step-by-step procedure for fitting a codon model to a codon partition is, as you say, in the documentation. It is basically equivalent to dNdSRateAnalysis.bf, except executed through the HyPhy GUI. Under the "Codon Selection" submenu, there is a batch file dNdSBivariateRateAnalysis.bf which applies the site-specific estimation of synonymous substitution rates described in Kosakovsky Pond and Muse (2005) MBE 22(12):2375.
Under the "Positive Selection" submenu, the batch file QuickSelectionDetection.bf applies methods described in Kosakovsky Pond and Frost (2005) MBE 22(5):1208. It first fits a codon model to the data in a similar manner to dNdSRateAnalysis.bf, then reconstructs ancestral states to infer the number of nonsynonymous (NS) and synonymous (S) substitutions per site. Okay! I'm sure Sergei could provide a much more informed overview, but I hope this helps you out a bit. Lemme know if you need more detailed explanations on stuff. - Art.
Title: Re: Review methods for dN and dS on trees Post by artpoon on Sep 7th, 2006 at 2:21pm
Dear Sarah,
• Sure, it seems reasonable to me to apply CodonModelCompare.bf before running dNdSRateAnalysis.bf.
• I don't think bootstrap.bf will yield what you're looking for. Instead, try simpleBootstrap.bf.
• Again, a partition is handled like any other data set filter in HyPhy, so there's no reason why it shouldn't work with any batch file.
More later, =) - Art.
Title: Re: Review methods for dN and dS on trees Post by Simon on Sep 7th, 2006 at 2:23pm
Dear Sarah,
Phew! That's a lot of questions... I thought that I would chip in too.
We had a paper looking at selection varying both across branches and across sites, although unlike the 'branch-site' sorts of models, we assumed that the distributions across branches and across sites were independent. Is this what you want? MFPositiveSelection is an old batch file used to estimate dN/dS jointly across multiple datasets (like HIV sequences from different infected individuals). Which of the branch batch files to use really depends on which branches you want to look at: the tips of the tree, an isolated branch, a subtree, or some more complicated setup. Best Simon
Title: Re: Review methods for dN and dS on trees Post by Simon on Sep 7th, 2006 at 2:46pm
Dear Sarah,
It's probably a reasonable approximation to estimate the nucleotide biases using a nucleotide model (NucModelCompare) rather than the codon model (CodonModelCompare); the nucleotide model will be much faster. I'd be interested to hear whether they come up with different answers for the best-fitting nucleotide model. Rather than estimating confidence intervals with the bootstrap, I would use profile likelihood confidence intervals, which will be much faster. If you know which branch(es) to look at, try TestBranchDNDS, which will allow you to specify dual rate variation (using two independent beta-gamma distributions), a custom nucleotide bias, and one or more branches, although I would start by assuming no site-to-site rate variation, if that is just a nuisance term. Following the analysis, you can then plot dN and dS trees (Analyses>Results>Syn and nonsyn trees), obtain profile likelihood confidence intervals (under Analyses>Results>Variance Estimates), etc. Best Simon
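A profile likelihood confidence interval keeps every parameter value whose log-likelihood lies within half the 95% chi-square (1 df) critical value, 3.841 / 2 ≈ 1.92 log-likelihood units, of the maximum. The Python sketch below illustrates the idea on an invented one-parameter Poisson likelihood (k substitutions over an "opportunity" t). It is not HyPhy code; in a real codon model the nuisance parameters would be re-optimized at each fixed value of the parameter of interest, which this toy example does not need.

```python
import math

# Toy one-parameter likelihood (an assumption, for illustration only): k > 0
# substitutions observed over "opportunity" t with Poisson rate lam. In a real
# codon model, nuisance parameters would be re-optimized at each fixed lam.
def log_lik(lam, k, t):
    return k * math.log(lam * t) - lam * t - math.lgamma(k + 1)

CHI2_1DF_95 = 3.841  # 95% quantile of the chi-square distribution with 1 df


def bisect(f, lo, hi, iters=200):
    """Find a root of f in [lo, hi], assuming f(lo) and f(hi) differ in sign."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def profile_ci(k, t):
    """95% profile likelihood interval for lam: all values whose
    log-likelihood is within CHI2_1DF_95 / 2 of the maximum."""
    mle = k / t
    max_ll = log_lik(mle, k, t)

    def g(lam):
        # Negative inside the interval, positive outside it.
        return 2.0 * (max_ll - log_lik(lam, k, t)) - CHI2_1DF_95

    lower = bisect(g, mle * 1e-9, mle)
    upper_end = mle * 2
    while g(upper_end) < 0:  # widen until the upper bracket lies outside
        upper_end *= 2
    upper = bisect(g, mle, upper_end)
    return mle, lower, upper


mle, lo, hi = profile_ci(k=12, t=40.0)
print(f"MLE = {mle:.3f}, 95% profile CI = ({lo:.3f}, {hi:.3f})")
```

Because it only needs one-dimensional root searches on the original data, rather than a full refit of the model for every bootstrap replicate, this kind of interval is usually far cheaper than the bootstrap.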
Title: Re: Review methods for dN and dS on trees Post by Sarah on Sep 7th, 2006 at 4:44pm
Hi, Art & Simon,
First, your GA approach is impressive and ideally what I would do, but I don't have the skills/time at the moment. If these preliminary analyses are suggestive, I would like to pursue it. (I assume it's not implemented anywhere in HyPhy.) TestBranchDNDS.bf could be bad here: I don't want to specify branches a priori because I expect a lot of variation in selection across the tree. I was hoping that dNdSBivariateRateAnalysis.bf (or the last model in dNdSRateAnalysis.bf) would provide a way around this problem. Lineage Dual is obviously not as nice as pulling from discrete categories of omegas, but it's a start... though I'm not sure if it's what I'm running now... or if anything's running... or exactly what form the output will take. It would definitely be good to run independent branch and site models and compare them (I suspect I'll need both branch and site variation). Will later try the free ratio model in PAML. I'll test NucModelCompare.bf and CodonModelCompare.bf once dNdSBivariateRateAnalysis.bf has finished. Thanks again for your help; I wish I weren't so new to this. Rather than peppering you with more questions, I will focus on understanding the batch code and linking the literature to the options in the software. Thanks again, Sarah
Title: Re: Review methods for dN and dS on trees Post by Simon on Sep 7th, 2006 at 5:02pm
Dear Sarah,
The GA branch analyses are available for download separately. This approach is only feasible for a small number of sequences (I'd say 20 or fewer, in which case you could run the analyses on our cluster; it's very easy), and I'd use no site-to-site rate variation to begin with. Best Simon
Title: Re: Review methods for dN and dS on trees Post by Simon on Sep 7th, 2006 at 5:27pm
Dear Sarah,
One more thing: you can fit models allowing different dN/dS classes for each branch in HyPhy by specifying 'local' models rather than 'global' models. This is easy to do in the graphical user interface; if the model you want isn't there, you can make your own. You can fit a 'local' model, with separate dN/dS categories per branch, in much the same way as described in one of Sergei's excellent tutorials. Have fun learning about HyPhy! Simon
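One rough way to see why fully 'local' models get expensive is to count rate parameters. The Python sketch below assumes a hypothetical parameterization: an unrooted, fully bifurcating tree with 2n - 3 branches; under 'global', one length per branch plus a single shared dN/dS; under 'local', separate synonymous and nonsynonymous rates on every branch. Nucleotide biases and codon frequencies are ignored, so HyPhy's own parameter counts will differ.

```python
def n_branches(n_taxa):
    """Number of branches in an unrooted, fully bifurcating tree."""
    return 2 * n_taxa - 3


def n_rate_params(n_taxa, model):
    """Hypothetical count of rate parameters (biases/frequencies ignored).

    'global': one length per branch plus a single shared dN/dS ratio.
    'local':  separate synonymous and nonsynonymous rates on every branch.
    """
    b = n_branches(n_taxa)
    if model == "global":
        return b + 1
    if model == "local":
        return 2 * b
    raise ValueError(f"unknown model: {model}")


for n in (13, 50):
    print(n, n_rate_params(n, "global"), n_rate_params(n, "local"))
```

For a 13-sequence alignment this gives 24 versus 46 rate parameters under these assumptions, and the gap grows linearly with the number of taxa, which is part of why a fully local ("free ratio") model tends to overfit.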
Title: Re: Review methods for dN and dS on trees Post by Sarah on Sep 13th, 2006 at 12:43pm
Thanks again for all the help. I've been doing a lot of rereading and my earlier confusion is slowly resolving.
Is the GA, then, your only "branch-site" model, i.e., a model that infers which codons have been selected on particular branches? Is there a reference for dNdSBivariateRateAnalysis.bf? I have a single Athlon processor chugging away on CodonModelCompare.bf. I'll let you know the results when they're in. Sarah p.s. As an aside, I started a 13-sequence, 984 bp job running on your GA server on Friday. Between Friday and Tuesday, I never got beyond the "This page will update every 15 minutes until your program starts running" page, even though the job was clearly running when I checked the job queue, and I refreshed my browser's cache regularly. Today the same page yields "Not found" and the job is no longer running. Did I lose the results? (Did they ever exist?)
Title: Re: Review methods for dN and dS on trees Post by Sergei on Sep 13th, 2006 at 1:12pm
Dear Sarah,
GA-Branch is not a branch-site method; it looks for signatures of alignment-wide selection at the level of a single branch. You can include site-by-site rate variation as well, but it will be independent of the branch-by-branch rate variation (i.e. a 'slow' site will be slow in all branches of the tree). I am not sure why your analysis died; would you please try it again and let me know if the problem persists? I did implement a PAML-style branch-site model upon request (search the message boards way back and you should find the reference), in case you want to try that. There is no reference for dNdSBivariateRateAnalysis.bf because it is an as-yet unpublished method. It is very similar in spirit to dNdSRateAnalysis.bf (which is described in the paper we wrote with Spencer Muse in MBE), except that dN and dS are no longer assumed to come from independent distributions, but rather from a general discrete bivariate distribution. The key difference is that dN and dS can (and in general will) co-vary in the dNdSBivariateRateAnalysis.bf analysis. Another thing about this analysis is that selection strength does not vary along the tree. Based on what you want to do (compare alignment-wide dN/dS along branches between different data sets), a GA-Branch style approach seems ideal. It will (a) free you from having to assign branches to rate classes a priori; (b) avoid the model overfitting that the free ratio model will almost certainly suffer from (leading to very large variances in parameter estimates); and (c) provide natural confidence intervals for dN/dS on branches, averaged over models. Let me know if I can be of further assistance. Sergei
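Point (c) rests on weighting many candidate models by how well they fit. The Python sketch below shows the generic technique, small-sample corrected AIC (c-AIC) plus Akaike weights, used to average one branch's dN/dS over models. The log-likelihoods, parameter counts and per-branch dN/dS values are invented, and taking the number of codon sites as the c-AIC sample size is an assumption; this illustrates the idea rather than reproducing GA-Branch's internals.

```python
import math


def caic(log_lik, k, n):
    """Small-sample corrected AIC; n is the sample size (assumed here to be
    the number of codon sites) and must exceed k + 1."""
    return -2.0 * log_lik + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)


def akaike_weights(scores):
    """Convert (c-)AIC scores into normalized Akaike weights."""
    best = min(scores)
    raw = [math.exp(-0.5 * (s - best)) for s in scores]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical fitted models: (log-likelihood, number of parameters,
# dN/dS assigned to one branch of interest). All values are invented.
models = [(-5210.4, 30, 0.45), (-5204.9, 32, 1.80), (-5205.6, 31, 1.20)]
n_sites = 328  # e.g. a 984 bp alignment = 328 codons

weights = akaike_weights([caic(ll, k, n_sites) for ll, k, _ in models])
avg_omega = sum(w * om for w, (_, _, om) in zip(weights, models))
# Total weight on models in which this branch has dN/dS > 1
support_pos = sum(w for w, (_, _, om) in zip(weights, models) if om > 1.0)

print(f"weights = {[round(w, 3) for w in weights]}")
print(f"model-averaged dN/dS = {avg_omega:.2f}, weight on dN/dS > 1 = {support_pos:.2f}")
```

Summing the weights of the models that place the branch above dN/dS = 1 is one way to express model-averaged support for positive selection on that branch, without assigning branches to rate classes in advance.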
Title: Re: Review methods for dN and dS on trees Post by Sarah on Sep 13th, 2006 at 5:13pm
Dear Sergei,
I just ran a GA job on the same sequences without problems. The program and output formats are great! I'm looking into running the program on a local cluster so I can see the effects of site-by-site variation and different topologies. Argh, you're (obviously) right about the GA model not being branch-site--don't know what I was thinking. It seems that one could, in theory, run FEL on the same tree used for GA, and map codon substitutions to the tree to see on which branches selection may have occurred at particular codons--but this clearly does not yield the same precision claimed by the branch-site model. That said, I am wary of false positives in the branch-site model. Thanks, Sarah
Title: Re: Review methods for dN and dS on trees Post by Sergei on Sep 14th, 2006 at 12:14pm
Dear Sarah,
Let me know if you run into problems running jobs locally. Indeed, one could use FEL on the same tree to look for codons under selection post hoc (i.e. use the GA to find putatively selected branches, then use FEL to test for selection on those branches), but there is a statistical bias issue here: one uses the same data first to formulate a hypothesis (i.e. find branches under selection) and then to test for evidence for or against it. It's permissible for exploratory data analysis, though. Alternatively, you could use SLAC/FEL to infer substitution histories for each codon, and then correlate them with the branches found to be under selection across the whole alignment. However, this is also mostly an exploratory analysis, without much statistical rigor. Generally speaking, there is not much power in any branch-site type method, even if the 'foreground' branch is specified a priori (as shown by the poor power in the simulation studies of Yang et al. in their 2005 MBE paper). There is also a problem of model misspecification (i.e. postulating which branches are under selection). Cheers, Sergei
Title: Re: Review methods for dN and dS on trees Post by Sarah on Sep 17th, 2006 at 1:20pm
I don't have administrative privileges on the cluster where we're trying to install the GA batch file, but so far the administrators haven't run into any preliminary compilation problems. In the meantime, I've tried to run a few more analyses on your server. The queue has remained the same since Friday, and one analysis (started Thursday, #7627?) seems stuck on generation 100--the progress page hasn't changed. Am I using too many system resources? Again, I plan to run locally soon.
I've submitted overlapping sequence files to the GA analysis, e.g. A, A+B, B, B+C, etc. I've noticed that a few of the branches get quite different results in the single-group analysis (A) versus the paired analysis (A+B). When I run the GA routine locally, I was hoping to explore this problem first by increasing the number of categories. Is that possible? If not, could I constrain a few? It seems it would also help to constrain the analysis to a specific nucleotide model and to provide consistent topologies, rather than relying on NJ. Does this seem reasonable? Sarah p.s. The FEL analysis would be exploratory only; I plan to use the percentiles generated by the GA to infer statistical support for dN>dS on particular branches (I do have a hypothesis about which ones will be selected). That looks like the approach you took in your 2005 MBE paper.
Title: Re: Review methods for dN and dS on trees Post by Sergei on Sep 17th, 2006 at 1:45pm
Dear Sarah,
One of the nodes on our cluster seems to have developed hardware problems, hence the hung jobs. I restarted the queue just now, having taken the problem node out of the pool; we'll see how that goes. You can indeed increase the number of categories when running the analysis locally, and also tighten the convergence criterion (e.g. stop only after 50 or 100 generations with no c-AIC improvement, instead of 30). Using a consistent tree would be a good idea, because lineage-specific analyses could be influenced quite a bit by changes in the topology. I could take a look at the specific outputs that give different results and see if anything else stands out; it's a bit difficult to speak in generalities and be helpful here :) Cheers, Sergei
Title: Re: Review methods for dN and dS on trees Post by Sarah on Nov 13th, 2006 at 10:07am
Just wanted to say that CodonModelCompare found three good, statistically indistinguishable models, and NucModelCompare (after two months on two processors) settled on the most likely of those three. Sarah
Title: Re: Review methods for dN and dS on trees Post by Sergei on Nov 13th, 2006 at 10:16am
Dear Sarah,
Two months? Wow! Did you use the branch length approximation heuristic? It speeds things up by a factor of 50 or so, and almost always gets the same results. Cheers, Sergei P.S. Also, check out the linked chapter.
Title: Re: Review methods for dN and dS on trees Post by Sarah on Nov 13th, 2006 at 11:33am
My mistake--it was a single processor (Athlon XP2000+ I think). I used the branch length approximation.
I'd love to read that chapter. Sarah