HyPhy message board | |
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Methodology Questions >> How to >> constraining branch lengths to observed values http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1127479438 Message started by jamescotton on Sep 23rd, 2005 at 5:43am |
Title: constraining branch lengths to observed values Post by jamescotton on Sep 23rd, 2005 at 5:43am
How do I tell HYPHY to optimise a likelihood function but treat the branch lengths as observed values rather than as variables to optimise - I want to give hyphy a tree with branch lengths (expected substitutions per site) and then estimate the substitution model parameters under a variety of different models (upto a "local" GRM, if possible).
If possible, I then want to go on the simulate data under the estimated model on the observed tree+br lens. Sorry I can't figure it out! James |
Title: Re: constraining branch lengths to observed values Post by jamescotton on Sep 27th, 2005 at 9:22am
Hi Sergei - thanks for the previous suggestions.. I've made a little bit of progress with understanding HyPhy, but not enough, i'm afraid.. I've got this far, and have a problem in that the branch lengths stored in the tree object don't seem to be updated.. I get output like this:
Quote:
and so on for every branch - the BLmod values are correct, but don't seem to store in the tree properly. The same happens if i use '=' rather than ':=' for the assignment in the first "executecommands" statement in the loop. I don't understand why this is happening!.. The second problem is that all the branch lengths are estimated to be the same length in the output.. Quote:
I *think* I understand this - i'm constraining each branch length to the value of branchLengthsNow[k]/scalingB, and the program tries to optimise this parameter (scalingB) and so optimises all the branch lengths to be the same.. do I need to somehow "de-reference" these values to say "constrain the branch length to the current value of branchLengthNow[h]/scalingB rather than to be the value of these variables that can then be optimised? i've no idea how to do this, or even if this makes sense in HyPhy.. My program is pasted below - sorry for such a long post, but i'm finding it a rather steep learning curve just now! Thanks, James Quote:
|
Title: Re: constraining branch lengths to observed values Post by Sergei on Sep 27th, 2005 at 10:25am
Dear James,
Sorry about the learning curve, but thanks for your perseverance. Take a look at the example below (which I haven't tested, since I have no data for your tree). It should help clarify some things. Code (] DataSet ds = ReadDataFile ("Darterfile.nex"); DataSetFilter filteredData = CreateFilter (ds,1); ACCEPT_BRANCH_LENGTHS = 1; ACCEPT_ROOTED_TREES = 1; /* define a dummy topology to read in branch lengths */ UseModel (USE_NO_MODEL); Topology dummy = "(Pe_occidentalis:0.044344,Pe_conspicillatus:0.077582,(((M_serrator:0.100006,(S_sula:0.044100,S_dactylatra:0.076275):0.103486):0.037020,(A_anhin ga:0.197727,(A_novaehollandiae:0.106082,A_rufa:0.068100):0.071018):0.079408):0.026938,((P_carbo:0.033352,P_varius:0.037379):0.034797,P_melanoleu cos:0.101265):0.131502):0.223011);"; /* make sure there are no spaces in branch lengths which could come from line wrapping*/ branchNames = BranchName (dummy,-1); /* populate a vector of branch names; post order traversal */ branchLengthsNow = BranchLength (dummy,-1); /* populate a vector of branch lengths; post order traversal */ /* this defines a global GRT model */ global AC = 0.1; global AT = 0.1; global CG = 0.1; global CT = 0.1; global GT = 0.3; t = 1; gtrmatrix = [*,AC*t,t,AT*t}{AC*t,*,CG*t,CT*t}{t,CG*t,*,GT*t}{AT*t,CT*t,GT*t,*]; HarvestFrequencies (vectorOfFrequencies,filteredData,1,1,1); Model GTRmodel = (gtrmatrix,vectorOfFrequencies, 1); /* Last parameter of 1 is correct; HyPhy will automaticall mutliply all rate entries by the appropriate nucleotide frequency */ Tree L_1 = ""+dummy; /* set L1 to the same string as the dummy topology; ""+ forces the conversion to string type*/ /* now we compute the explicit scaling expression for the branch length of the form BranchLength = t * factor, where factor depends on AC,AT,CG,CT,GT and base frequencies */ /* since it will depend on some parameters which will be optimized, the assignment is used */ global tToBranchLengthFactor := 2*(vectorOfFrequencies[0):
|
Title: Re: constraining branch lengths to observed values Post by jamescotton on Sep 27th, 2005 at 10:37am
Wow! thanks for the quick reply..
The trick of turning the value into a string is what I was missing, I think, and of storing the branch lengths elsewhere first... but it doesn't work: it seems to be trying to estimate k, or something! Quote:
|
Title: Re: constraining branch lengths to observed values Post by Sergei on Sep 27th, 2005 at 11:07am
Dear James,
wrote on Sep 27th, 2005 at 10:37am:
Of course (slaps self on head)! HyPhy will scan variable dependencies before optimizing the likelihood function and find 'k' to be an independent parameter. To fix this, use the following ExecuteCommands fix (this replaces 'k' with its actual numeric value). Change: Code (] ExecuteCommands ("L_1."+branchNames[k):
to Code (] ExecuteCommands ("L_1."+branchNames[k):
Cheers, Sergei |
Title: Re: constraining branch lengths to observed values Post by jamescotton on Sep 28th, 2005 at 3:13am
Hi - That now works, or at least it very nearly did - Just for future reference, I had to change
Code (] Topology dummy = "blahblahblah"; [/code):
presumably, a "Topology" object doesn't store branch lengths? Anyway - thanks for all your help - you've been really great - I'm going to try and code the same thing for some "local" models now, and think i'm slowly getting the hang of the HYPHY language.. i'll let you know how I get on. Thanks again, James |
Title: Re: constraining branch lengths to observed values Post by jamescotton on Sep 28th, 2005 at 5:55am
Sorry to keep bothering you Sergei - another quick question...
Is there an easy way to change a variable from "global" to "local" before re-optimising a likelihood function, to enable easy comparing of likelihoods under two models, or would I need to do this by constraining parameters to be identical at every branch? thanks, J |
Title: Re: constraining branch lengths to observed values Post by jamescotton on Sep 28th, 2005 at 6:19am
Hi Sergei - i've now got HyPhy successfully doing parameter estimates under local and global GTR models and estimating base frequencies under ML with specified branch lengths, so i'm nearly there! - Thanks for all your help.
I do have another quick question - what does this do? [code] EMBED_FREQUENCY_DEPENDENCE = 1; [/code] I've used it in my estimating base freqs, because it was in the example batch file I cannibalised, but i've no idea what it does - it isn't in the command ref. you made for me! James |
Title: Re: constraining branch lengths to observed values Post by Sergei on Sep 28th, 2005 at 7:45am
Dear James,
wrote on Sep 28th, 2005 at 6:19am:
Good question (especially since I don't remember myself what it does). Let me look at the sources at get back to you when I figure it out (I'll also add the entry to documentation). Sergei |
Title: Re: constraining branch lengths to observed values Post by jamescotton on Sep 28th, 2005 at 9:16am
hi Sergei - another problem! When I try and fit a local GTR model with constrained branch lengths, where all of the base composition and substitution rate parameters are "local", I get a non-sensical (+ve) log likelihood..
I have a feeling this might be because i'm overfitting the model (would having local composition and matrix parameters makes some of them non-identifiable?.. i'm not sure). Of course, there could be a problem with my code. I've fitted the same model but with global base composition parameters without any problem, and get a sensible LogL. I've pasted some of the output below, and my code in the next message as otherwise (it's too long for the forum!) J Quote:
|
Title: Re: constraining branch lengths to observed values Post by jamescotton on Sep 28th, 2005 at 9:16am This is the batch file.. Code (] /* global estPiA; global estPiC; global estPiG; global estPiT; global sum; */ /*estPiA:<1; estPiC:<1; estPiG:<1; estPiT:=Abs(1-estPiA-estPiC-estPiG); sum := estPiA+estPiC+estPiG+estPiT; */ EMBED_FREQUENCY_DEPENDENCE = 1; DataSet ds = ReadDataFile ("Darterfile.nex"); DataSetFilter filteredData = CreateFilter (ds,1); ACCEPT_BRANCH_LENGTHS = 1; ACCEPT_ROOTED_TREES = 1; /* define a dummy topology to read in branch lengths */ UseModel (USE_NO_MODEL); Tree dummy = "(Pe_occidentalis:0.044344,Pe_conspicillatus:0.077582,(((M_serrator:0.100006,(S_sula:0.044100,S_dactylatra:0.076275):0.103486):0.037020,(A_anhinga:0.197727,(A_novaehollandiae:0.106082,A_rufa:0.068100):0.071018):0.079408):0.026938,((P_carbo:0.033352,P_varius:0.037379):0.034797,P_melanoleucos:0.101265):0.131502):0.223011);"; /* make sure there are no spaces in branch lengths which could come from line wrapping*/ branchNames = BranchName (dummy,-1); /* populate a vector of branch names; post order traversal */ branchLengthsNow = BranchLength (dummy,-1); /* populate a vector of branch lengths; post order traversal */ for (i=0; i < Columns(branchNames)-1; i = i+1) { fprintf (stdout,branchNames[i):
|
Title: Re: constraining branch lengths to observed values Post by Sergei on Sep 28th, 2005 at 11:13am
Dear James,
The local case is a bit trickier, because one has to be very careful about which parameters must remain global (base frequencies for instance) and define a branch specific branch length constraint. Give this a try: (may contain bugs, since I haven't tried to run it - missing the data file). Code (] /* frequency parameters must be global (shared by tree branches), otherwise we don't really know character states weights at the root and can't compute proper likelihood; By not declaring frequency parameters as globals, you are liable to get nonsensical results */ /* also, notice the trick to define four [0,1):
Cheers, Sergei |
HyPhy message board » Powered by YaBB 2.5.2! YaBB Forum Software © 2000-2024. All Rights Reserved. |