Welcome, Guest. Please Login
YaBB - Yet another Bulletin Board
 
  HomeHelpSearchLogin  
 
use user-defined branch lengths for analyses (Read 4289 times)
Seraina
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 7
use user-defined branch lengths for analyses
Apr 18th, 2009 at 2:15am
 
Hello,

I'm trying to use the branch lengths specified in the tree file for the estimation of site specific substitution rates while estimating the model parameters (actually, a GTR) from the data. How do I have to proceed to constrain the branch lengths before the likelihood optimization?
Thanks for your help!

Seraina
Back to top
 
 
IP Logged
 
wayne
Global Moderator
*****
Offline


I love YaBB 1G - SP1!

Posts: 57
San Diego, CA
Gender: male
Re: use user-defined branch lengths for analyses
Reply #1 - Apr 22nd, 2009 at 10:29am
 
Hi Seraina,

you can use the SiteRates2.bf batch file in ./TemplateBatchFiles/ which will implement a site by site rate estimation with other parameters shared across the tree and branch lengths constrained to that in your tree file.

cheers ./w
Back to top
 

Assistant Project Scientist&&Antiviral Research Center&&Department of Pathology&&University of California, San Diego
WWW WWW  
IP Logged
 
Seraina
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 7
Re: use user-defined branch lengths for analyses
Reply #2 - Apr 22nd, 2009 at 11:56pm
 
Hi Wayne,
thanks for your reply. Actually, I've already used (and modified for my purpose) the batch file SiteRates2.bf. However, when it comes to the step "Optimize (resNull,lf)", not only the model parameters, but also the branch lengths of my input tree are modified. So what I'm looking for is a way to constrain the optimization procedure to only estimate the parameters of the GTR model from the data, but not modifying the branch lengths. Is this possible?
Thanks for the help!
Seraina
Back to top
 
 
IP Logged
 
Seraina
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 7
Re: use user-defined branch lengths for analyses
Reply #3 - Apr 23rd, 2009 at 12:25am
 
Okay, maybe I've missed something there - I was actually only using the SiteRates.bf. Do I correctly understand that the line
"ReplicateConstraint ("this1.?.?:=this2.?.?__",givenTree,givenTree);" is constraining all the branch lengths of the tree to the actual values? And "this1.?.?" means all the branches of the tree (with this1.? being all the nodes?)?
Is there any reference file where I can find details about the class definitions in HyPhy?
Thanks again!
Seraina
Back to top
 
 
IP Logged
 
wayne
Global Moderator
*****
Offline


I love YaBB 1G - SP1!

Posts: 57
San Diego, CA
Gender: male
Re: use user-defined branch lengths for analyses
Reply #4 - Apr 23rd, 2009 at 4:34pm
 
Hi Seraina, I'll try answer this post and the one to the feedback board here. so yes there is a difference between SiteRates.bf and SiteRates2.bf.

SiteRates.bf will first fit a model of the entire alignment in which it estimates both branch length parameters and model parameters (AC, GC etc depending on your model). In the site wise model, the model parameters are constrained to those estimated across the alignment, and the branch lengths (per site) are set to someFactorX*alignmentWideBranchLength. So the site wise analysis returns a substitution rate at a site relative to the average across all sites. It does not fit independent model parameters for each site (AC, GT, etc are all shared across all sites).

SiteRates2.bf will do the same, but the BranchLengths in someFactorX*alignmentWideBranchLength are taken from file. This would be useful if you had some independent estimates of length of branches in units of time.

to answer your direct ?

ReplicateConstraint ("this1.?.?:=this2.?.?__",givenTree,givenTree)

in the SiteRates2.bf file will set the givenTree to exact values from the tree read from file. These will apply to all parameters associated with branches, but not with model parameters that are global across all branches.

There was a problem with the previous SiteRates2.bf in that not all the someFactorX's were being applied. I've attached an edit that will also go into the next release. If what I've described above is what you want  then use this code. But from your later posts I think you actually want independent model parameters (GTR) for each site.

hope this helps somewhat ./w
Back to top
 
Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login (7 KB | )

Assistant Project Scientist&&Antiviral Research Center&&Department of Pathology&&University of California, San Diego
WWW WWW  
IP Logged
 
Seraina
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 7
Re: use user-defined branch lengths for analyses
Reply #5 - Apr 24th, 2009 at 5:20am
 
Hi Wayne,
thank you for your answers!
What I actually need is exactly what the program DNARates by Gary Olsen has done, which is to calculate site specific rates with reference to a "known" tree with fixed branch lengths. So the branch lengths should be global (and not be recalculated or scaled for each site). The rate should then be given with reference to the branch length in the original file, and not relative to any average over sites.
The rates estimated at each site would accordingly be the same no matter whether I use the whole alignment or only the first 20 sites.
Thanks for the help!
Seraina
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: use user-defined branch lengths for analyses
Reply #6 - Apr 24th, 2009 at 7:36am
 
Dear Seraina,

Back in the day, I had a very long discussion with someone else who wanted to do a similar thing: Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login

There are two salient points to be carried over to your case

1). Branch lengths can mean two things: a priori measures (i.e. millions of years since divergence) or [what is usually meant for model fitting] genetic distance. The latter is a product of evolutionary time and evolutionary rate. Because for a single gene, the evolutionary time is presumably the same for every site, if you hold branch lengths (and here I take them to mean genetic distances) the same for every site, that means that the rate is the same for every site. 

In your input tree, what do the branch lengths mean?

2). The reason HyPhy gives you different estimates for 20 sites vs the total alignment is because it estimates model parameters (e.g. AT rate and the base frequencies) either from the first 20 sites, or the whole alignment -- giving different values. Those values will then influence the estimates of site-specific rates. The only way to remove this effect, is to specify the substitution model (base frequencies and substitution rates a priori). I am quite certain this is what DNARates used to do (with F84 model supplied by the user and base frequencies are set to 0.25 or prompted for).

SiteRates2.bf can be modified for this purpose (take a look at the files from the thread I quoted). Let me know if this helps.

Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Seraina
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 7
Re: use user-defined branch lengths for analyses
Reply #7 - Apr 24th, 2009 at 8:44am
 
Dear Sergei,

thanks a lot for your answer - this previous discussion with Francesc was exactly what I was looking for. I think I now more or less understand your approach, and it should really give similar results to DNARates. What HyPhy actually does better is to estimate a rate of zero for invariant positions - in contrast to DNARates.

Indeed, the branch lengths I have are in units of time - so a chronogram as in Francesc's case. I get very similar rate distributions in HyPhy and DNARates when first re-estimating branch lengths under the molecular clock constraint in HyPhy and then estimating the rates (with a modified SiteRates.bf). However, the rate estimates are rather different when using the SiteRates2.bf and the same branch lengths as I used for input in DNARates - I'll try to figure out why this is the case.

One last question: does the branch parameter constraint in the SiteRates2.bf really work? :

global rateX = 1;
ReplicateConstraint ("this1.?.?:=rateX*this2.?.?__",givenTree,givenTree);

Does this really freeze the branch length to the values at the moment (or only keeping them proportional with rateX, or even doing actually nothing)? Just wondering because in the example for constraints in the manual (a=3; b=2; c:=a+b; a=5; --> c=7), the constraint is for the parameters and not their current values.

Sincerely
Seraina

Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: use user-defined branch lengths for analyses
Reply #8 - Apr 24th, 2009 at 9:12am
 
Dear Seraina,

Code:
x:=x__;
 



can be read as 'constrain x to its current value'. The '__' postfix is the evaluator operation that replaces the variable (x) with its current value before the formula is evaluated.

Cheers,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Seraina
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 7
Re: use user-defined branch lengths for analyses
Reply #9 - May 8th, 2009 at 8:17am
 
Dear Sergei (and others),

Looking at the SiteRates2.bf again, I found out what the problem was. Actually, the rates were output correctly on the screen (i.e. in substitutions per unit of time, with the unit of time as specified by the branch lengths in the input tree), but then there was some code at the bottom of the file which rescaled all the rates according to the average site rate of the alignment. The rates saved to file and output in the diagram were thus wrong (or let's say, not as one would expect with a dated tree).
Attached is thus a new batch file which can be used with user-specified branch lengths. Could it be included in the next release of HyPhy? Or do I just post it in the Contributions?

One last question still: The implementation is currently working with the 3 trees that you mentioned in your posts to Francesc:
- the input tree
- the expected tree (which is rateX*inputTree)
- the site tree (which is siteRate*expectedTree)
But then, you also use a parameter "scalingfactor" which is (total tree length of the expected tree) / (total tree length of input tree). In my understanding, this should be the same as the factor "rateX", but it attains only a quite similar value. Why do you have to make the apparent detour via the estimated branch lengths?

Finally, thanks a lot for all the help (also to Wayne and artpoon)!
Seraina
Back to top
 
Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login (7 KB | )
 
IP Logged