HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Methodology Questions >> How to >> constraining branch lengths to observed values
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1127479438

Message started by jamescotton on Sep 23rd, 2005 at 5:43am

Title: constraining branch lengths to observed values
Post by jamescotton on Sep 23rd, 2005 at 5:43am
How do I tell HYPHY to optimise a likelihood function but treat the branch lengths as observed values rather than as variables to optimise - I want to give hyphy a tree with branch lengths (expected substitutions per site) and then estimate the substitution model parameters under a variety of different models (upto a "local" GRM, if possible).

If possible, I then want to go on the simulate data under the estimated model on the observed tree+br lens.

Sorry I can't figure it out!

James

Title: Re: constraining branch lengths to observed values
Post by Sergei on Sep 25th, 2005 at 9:32pm
Dear James,


wrote on Sep 23rd, 2005 at 5:43am:
I want to give hyphy a tree with branch lengths (expected substitutions per site) and then estimate the substitution model parameters under a variety of different models (upto a "local" GRM, if possible).


You may find this message useful: Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login.

Because of possible model variety, HyPhy doesn't intrisically know how to map E[subs] to model parameters; for most models it's easy to work the relationship out and map branch lengths to model parameters. Let me know if the above posting is insufficient and I'll be happy to provide more info. I'll put together a little example for a local GRM model and add a link when it's done.


wrote on Sep 23rd, 2005 at 5:43am:


If possible, I then want to go on the simulate data under the estimated model on the observed tree+br lens.


That part is easy; just use SimulateDataSet (Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login) once you have fitted a likelihood function, e.g (assuming 'lf' is the likelihood function ID):

[code]
for (i=0; i<100; i=i+1)
{
    DataSet simmedData = SimulateDataSet (lf);
    DataSetFilter simDF = CreateFilter (simmedData,1);
    myFilePath = "some path/sim_"+i+".seq";
    fprintf (myFilePath,CLEAR_FILE,simDF);
}
[/code]

Cheers,
Sergei

Title: Re: constraining branch lengths to observed values
Post by jamescotton on Sep 27th, 2005 at 9:22am
Hi Sergei - thanks for the previous suggestions.. I've made a little bit of progress with understanding HyPhy, but not enough, i'm afraid.. I've got this far, and have a problem in that the branch lengths stored in the tree object don't seem to be updated.. I get output like this:


Quote:
branch scaling factor computed to be 0.161777

BLmod=0.044344
Branch Pe_occidentalis.
      Length before scaling:    0.00717
      Length after scaling :    0.00717
BLmod=0.077582
Branch Pe_conspicillatus.
      Length before scaling:    0.01255
      Length after scaling :    0.01255
BLmod=0.100006
Branch M_serrator.
      Length before scaling:    0.01618
      Length after scaling :    0.01618
BLmod=0.0441
Branch S_sula.
      Length before scaling:    0.00713
      Length after scaling :    0.00713


and so on for every branch - the BLmod values are correct, but don't seem to store in the tree properly. The same happens if i use '=' rather than ':=' for the assignment in the first "executecommands" statement in the loop. I don't understand why this is happening!.. The second problem is that all the branch lengths are estimated to be the same length in the output..


Quote:
global AC=0.2932585265898854;
global AT=0.1951842034375856;
global CG=0.07364278346962116;
global CT=1.274516987700602;
global GT=0.03398007119957082;
k=0.02;
scalingB=0.05617924196392339;
L_1.Pe_occidentalis.t:=branchLengthsNow[k]/scalingB;
L_1.Pe_conspicillatus.t:=branchLengthsNow[k]/scalingB;
L_1.M_serrator.t:=branchLengthsNow[k]/scalingB;
L_1.S_sula.t:=branchLengthsNow[k]/scalingB;
L_1.S_dactylatra.t:=branchLengthsNow[k]/scalingB;
L_1.Node7.t:=branchLengthsNow[k]/scalingB;
L_1.Node5.t:=branchLengthsNow[k]/scalingB;
L_1.A_anhinga.t:=branchLengthsNow[k]/scalingB;
L_1.A_novaehollandiae.t:=branchLengthsNow[k]/scalingB;
L_1.A_rufa.t:=branchLengthsNow[k]/scalingB;
L_1.Node12.t:=branchLengthsNow[k]/scalingB;
L_1.Node10.t:=branchLengthsNow[k]/scalingB;
L_1.Node4.t:=branchLengthsNow[k]/scalingB;
L_1.P_carbo.t:=branchLengthsNow[k]/scalingB;
L_1.P_varius.t:=branchLengthsNow[k]/scalingB;
L_1.Node16.t:=branchLengthsNow[k]/scalingB;
L_1.P_melanoleucos.t:=branchLengthsNow[k]/scalingB;
L_1.Node15.t:=branchLengthsNow[k]/scalingB;
L_1.Node3.t:=branchLengthsNow[k]/scalingB;
Tree L_1=(Pe_occidentalis:0.0469995,Pe_conspicillatus:0.0469995,(((M_serrator:0.0469995,(S_sula:0.0469995,S_dactylatra:0.0469995)Node7:0.0469995)Node5:0.0469995,(A_anhinga:0.0469995,(A_novaehollandiae:0.0469995,A_rufa:0.0469995)Node12:0.0469995)Node10:0.0469995)Node4:0.0469995,((P_carbo:0.0469995,P_varius:0.0469995)Node16:0.0469995,P_melanoleucos:0.0469995)Node15:0.0469995)Node3:0.0469995);


I *think* I understand this - i'm constraining each branch length to the value of branchLengthsNow[k]/scalingB, and the program tries to optimise this parameter (scalingB) and so optimises all the branch lengths to be the same.. do I need to somehow "de-reference" these values to say "constrain the branch length to the current value of branchLengthNow[h]/scalingB rather than to be the value of these variables that can then be optimised? i've no idea how to do this, or even if this makes sense in HyPhy..

My program is pasted below - sorry for such a long post, but i'm finding it a rather steep learning curve just now!

Thanks,
James

Quote:
DataSet ds = ReadDataFile ("Darterfile.nex");
DataSetFilter filteredData = CreateFilter (ds,1);

ACCEPT_BRANCH_LENGTHS = 1;
ACCEPT_ROOTED_TREES = 1;

global AC = 0.1;
global AT = 0.1;
global CG = 0.1;
global CT = 0.1;
global GT = 0.3;

t    = 1;

gtrmatrix = [*,AC*t,t,AT*t}{AC*t,*,CG*t,CT*t}{t,CG*t,*,GT*t}{AT*t,CT*t,GT*t,*];
HarvestFrequencies (vectorOfFrequencies,filteredData,1,1,1);


Model GTRmodel = (gtrmatrix,vectorOfFrequencies, 1); /* NOT SURE IF THIS SHOULD BE 1 or 0 FOR THE LAST PARAMETER */
Tree L_1 = "(Pe_occidentalis:0.044344,Pe_conspicillatus:0.077582,(((M_serrator:0.100006,(S_sula:0.044100,S_dactylatra:0.076275):0.103486):0.037020,(A_anhinga:0.197727,(A_novaehollandiae:0.106082,A_rufa:0.068100):0.071018):0.079408):0.026938,((P_carbo:0.033352,P_varius:0.037379):0.034797,P_melanoleucos:0.101265):0.131502):0.223011);";

scalingB = computeScalingFactorB (gtrmatrix, vectorOfFrequencies);
fprintf (stdout, "\nBranch scaling factor computed to be ", scalingB,"\n\n");


branchNames  = BranchName  (L_1,-1);
branchLengthsNow = BranchLength (L_1,-1);

for (k = 0; k < Columns (branchNames)-1; k=k+1)
{
     ExecuteCommands ("L_1."+branchNames[k]+".t:=branchLengthsNow[k]/scalingB;");
     fprintf (stdout,"BLmod=",branchLengthsNow[k]/scalingB,"\n");
     fprintf (stdout, "Branch ", branchNames[k],".\n\t Length before scaling: ", Format (branchLengthsNow[k],10,5),  
             "\n\t Length after scaling : ", Format (BranchLength (L_1,k),10,5), "\n");
}


LikelihoodFunction lf = (filteredData,L_1);

Optimize (res,lf);
LIKELIHOOD_FUNCTION_OUTPUT=5;
fprintf (stdout, "\n",lf);

function computeScalingFactorB (rateMatrix, baseFreqs)
{
B = 0;
for (n1 = 0; n1 < 4; n1 = n1+1)
{
 for (n2 = 0; n2 < 4; n2 = n2+1)
 {
  if (n2!=n1)
  {
   B = B + baseFreqs[n1]*baseFreqs[n2]*rateMatrix[n1][n2];
  }
 }
}
return B;
}

Title: Re: constraining branch lengths to observed values
Post by Sergei on Sep 27th, 2005 at 10:25am
Dear James,

Sorry about the learning curve, but thanks for your perseverance. Take a look at the example below (which I haven't tested, since I have no data for your tree). It should help clarify some things.


Code (]

DataSet ds                 = ReadDataFile ("Darterfile.nex");
DataSetFilter filteredData = CreateFilter (ds,1);
 
ACCEPT_BRANCH_LENGTHS = 1;
ACCEPT_ROOTED_TREES   = 1;

/* define a dummy topology to read in branch lengths */

UseModel (USE_NO_MODEL);
Topology dummy = "(Pe_occidentalis:0.044344,Pe_conspicillatus:0.077582,(((M_serrator:0.100006,(S_sula:0.044100,S_dactylatra:0.076275):0.103486):0.037020,(A_anhin ga:0.197727,(A_novaehollandiae:0.106082,A_rufa:0.068100):0.071018):0.079408):0.026938,((P_carbo:0.033352,P_varius:0.037379):0.034797,P_melanoleu cos:0.101265):0.131502):0.223011);";
/* make sure there are no spaces in branch lengths which could come from line wrapping*/

branchNames          = BranchName   (dummy,-1); /* populate a vector of branch names; post order traversal */
branchLengthsNow  = BranchLength (dummy,-1); /* populate a vector of branch lengths; post order traversal */

 
/* this defines a global GRT model */

global AC = 0.1;
global AT = 0.1;
global CG = 0.1;
global CT = 0.1;
global GT = 0.3;
 
t            = 1;
 
gtrmatrix = [*,AC*t,t,AT*t}{AC*t,*,CG*t,CT*t}{t,CG*t,*,GT*t}{AT*t,CT*t,GT*t,*];
HarvestFrequencies (vectorOfFrequencies,filteredData,1,1,1);
 
 
Model GTRmodel = (gtrmatrix,vectorOfFrequencies, 1);
/* Last parameter of 1 is correct; HyPhy will automaticall mutliply all rate entries by the appropriate nucleotide frequency */

Tree L_1 = ""+dummy; /* set L1 to the same string as the dummy topology; ""+ forces the conversion to string type*/

/* now we compute the explicit scaling expression for the branch length of the form BranchLength = t * factor, where factor depends on AC,AT,CG,CT,GT and base frequencies */
/* since it will depend on some parameters which will be optimized, the assignment is used */

global        tToBranchLengthFactor := 2*(vectorOfFrequencies[0):

*vectorOfFrequencies[1]*AC+vectorOfFrequencies[0]*vectorOfFrequencies[2]+
                                                       vectorOfFrequencies[0]*vectorOfFrequencies[3]*AT+vectorOfFrequencies[1]*vectorOfFrequencies[2]*CG+
                                                       vectorOfFrequencies[1]*vectorOfFrequencies[3]*CT+vectorOfFrequencies[2]*vectorOfFrequencies[3]*GT);
                                                       
                                                /* 2* is because the contribution of the (i,j) and the (j,i) rate to the branch length is the same for a reversible model */
                                               
/* now force all branch lengths to be equal to those read from the string by setting t := givenLength/tToBranchLengthFactor for every branch */

for (k = 0; k < Columns (branchNames)-1; k=k+1)
{
       ExecuteCommands ("L_1."+branchNames[k]+".t:=branchLengthsNow[k]/tToBranchLengthFactor;");
}  
 
LikelihoodFunction lf = (filteredData,L_1);
 
Optimize (res,lf);
LIKELIHOOD_FUNCTION_OUTPUT=5;
fprintf (stdout, "\n",lf);


Title: Re: constraining branch lengths to observed values
Post by jamescotton on Sep 27th, 2005 at 10:37am
Wow! thanks for the quick reply..
The trick of turning the value into a string is what I was missing, I think, and of storing the branch lengths elsewhere first...
but it doesn't work: it seems to be trying to estimate k, or something!


Quote:
global AC=0.06324555320336758;
global AT=0.06324555320336758;
global CG=0.06324555320336758;
global CT=0.06324555320336758;
global GT=0.06324555320336758;
k=0.02;
L_1.Pe_occidentalis.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.Pe_conspicillatus.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.M_serrator.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.S_sula.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.S_dactylatra.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.Node7.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.Node5.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.A_anhinga.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.A_novaehollandiae.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.A_rufa.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.Node12.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.Node4.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.P_carbo.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.P_varius.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.Node16.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.P_melanoleucos.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.Node10.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.Node15.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.Node3.t:=branchLengthsNow[k]/tToBranchLengthFactor;
global tToBranchLengthFactor:=2*(vectorOfFrequencies[0]*vectorOfFrequencies[1]*AC+vectorOfFrequencies[0]*vectorOfFrequencies[2]+vectorOfFrequencies[0]*vectorOfFrequencies[3]*AT+vectorOfFrequencies[1]*vectorOfFrequencies[2]*CG+vectorOfFrequencies[1]*vectorOfFrequencies[3]*CT+vectorOfFrequencies[2]*vectorOfFrequencies[3]*GT);
Tree L_1=(Pe_occidentalis:0,Pe_conspicillatus:0,(((M_serrator:0,(S_sula:0,S_dactylatra:0)Node7:0)Node5:0,(A_anhinga:0,(A_novaehollandiae:0,A_rufa:0)Node12:0)Node10:0)Node4:0,((P_carbo:0,P_varius:0)Node16:0,P_melanoleucos:0)Node15:0)Node3:0);

Title: Re: constraining branch lengths to observed values
Post by Sergei on Sep 27th, 2005 at 11:07am
Dear James,


wrote on Sep 27th, 2005 at 10:37am:
it seems to be trying to estimate k, or something!


Of course (slaps self on head)!

HyPhy will scan variable dependencies before optimizing the likelihood function and find 'k' to be an independent parameter. To fix this, use the following ExecuteCommands fix (this replaces 'k' with its actual numeric value).

Change:


Code (]
ExecuteCommands ("L_1."+branchNames[k):

+".t:=branchLengthsNow[k]/tToBranchLengthFactor;");


to


Code (]
ExecuteCommands ("L_1."+branchNames[k):

+".t:="+branchLengthsNow[k]+"/tToBranchLengthFactor;");


Cheers,
Sergei

Title: Re: constraining branch lengths to observed values
Post by jamescotton on Sep 28th, 2005 at 3:13am
Hi - That now works, or at least it very nearly did - Just for future reference, I had to change


Code (]
Topology dummy = "blahblahblah";
[/code):



into

[code]
Tree dummy = "blahblahblah";


presumably, a "Topology" object doesn't store branch lengths?

Anyway - thanks for all your help - you've been really great - I'm going to try and code the same thing for some "local" models now, and think i'm slowly getting the hang of the HYPHY language..

i'll let you know how I get on.

Thanks again,
James

Title: Re: constraining branch lengths to observed values
Post by jamescotton on Sep 28th, 2005 at 5:55am
Sorry to keep bothering you Sergei - another quick question...

Is there an easy way to change a variable from "global" to "local" before re-optimising a likelihood function, to enable easy comparing of likelihoods under two models, or would I need to do this by constraining parameters to be identical at every branch?

thanks,
J

Title: Re: constraining branch lengths to observed values
Post by jamescotton on Sep 28th, 2005 at 6:19am
Hi Sergei - i've now got HyPhy successfully doing parameter estimates under local and global GTR models and estimating base frequencies under ML with specified branch lengths, so i'm nearly there! - Thanks for all your help.

I do have another quick question - what does this do?

[code]
EMBED_FREQUENCY_DEPENDENCE = 1;
[/code]

I've used it in my estimating base freqs, because it was in the example batch file I cannibalised, but i've no idea what it does - it isn't in the command ref. you made for me!

James

Title: Re: constraining branch lengths to observed values
Post by Sergei on Sep 28th, 2005 at 7:45am
Dear James,


wrote on Sep 28th, 2005 at 6:19am:
I do have another quick question - what does this do?

[code]
EMBED_FREQUENCY_DEPENDENCE = 1;
[/code]


Good question (especially since I don't remember myself what it does). Let me look at the sources at get back to you when I figure it out (I'll also add the entry to documentation).

Sergei

Title: Re: constraining branch lengths to observed values
Post by jamescotton on Sep 28th, 2005 at 9:16am
hi Sergei - another problem! When I try and fit a local GTR model with constrained branch lengths, where all of the base composition and substitution rate parameters are "local", I get a non-sensical (+ve) log likelihood..

I have a feeling this might be because i'm overfitting the model (would having local composition and matrix parameters makes some of them non-identifiable?.. i'm not sure). Of course, there could be a problem with my code. I've fitted the same model but with global base composition parameters without any problem, and get a sensible LogL.

I've pasted some of the output  below, and my code in the next message as otherwise (it's too long for the forum!)

J


Quote:
e
(LOTS OF STUFF REMOVED)
L_1.Node15.sum:=L_1.Node15.estPiT+L_1.Node15.estPiT+L_1.Node15.estPiT+L_1.Node15.estPiT;
L_1.Node15.estPiT:=Abs(1-L_1.Node15.estPiA-L_1.Node15.estPiA-L_1.Node15.estPiA);
L_1.Node3.t:=0.223011/tToBranchLengthFactor;
L_1.Node3.sum:=L_1.Node3.estPiT+L_1.Node3.estPiT+L_1.Node3.estPiT+L_1.Node3.estPiT;
L_1.Node3.estPiT:=Abs(1-L_1.Node3.estPiA-L_1.Node3.estPiA-L_1.Node3.estPiA);
Tree L_1=(Pe_occidentalis:0.044344,Pe_conspicillatus:0.077582,(((M_serrator:0.100006,(S_sula:0.0441,S_dactylatra:0.076275)Node7:0.103486)Node5:0.03702,(A_anhinga:0.197727,(A_novaehollandiae:0.106082,A_rufa:0.0681)Node12:0.071018)Node10:0.079408)Node4:0.026938,((P_carbo:0.033352,P_varius:0.037379)Node16:0.034797,P_melanoleucos:0.101265)Node15:0.131502)Node3:0.223011);
Log Likelihood = 1368.54748043482;
Tree L_1=(Pe_occidentalis:0.044344,Pe_conspicillatus:0.077582,(((M_serrator:0.100006,(S_sula:0.0441,S_dactylatra:0.076275)Node7:0)Node5:0,(A_anhinga:0.197727,(A_novaehollandiae:0.106082,A_rufa:0.0681)Node12:0.071018)Node10:0.079408)Node4:0.026938,((P_carbo:0.033352,P_varius:0.037379)Node16:0,P_melanoleucos:0)Node15:0.131502)Node3:0.223011);I found Lk=1368.55, with 162 params of which 0 are global params
I found Lk=1368.55, with 162 params of which 0 are global params



Title: Re: constraining branch lengths to observed values
Post by jamescotton on Sep 28th, 2005 at 9:16am

This is the batch file..


Code (]
/*
global estPiA;
global estPiC;
global estPiG;
global estPiT;
global sum;
*/
/*estPiA:<1;
estPiC:<1;
estPiG:<1;
estPiT:=Abs(1-estPiA-estPiC-estPiG);
sum := estPiA+estPiC+estPiG+estPiT;
*/
EMBED_FREQUENCY_DEPENDENCE = 1;

DataSet ds       = ReadDataFile ("Darterfile.nex");  
DataSetFilter filteredData = CreateFilter (ds,1);  

ACCEPT_BRANCH_LENGTHS = 1;  
ACCEPT_ROOTED_TREES   = 1;  



/* define a dummy topology to read in branch lengths */


UseModel (USE_NO_MODEL);
Tree dummy = "(Pe_occidentalis:0.044344,Pe_conspicillatus:0.077582,(((M_serrator:0.100006,(S_sula:0.044100,S_dactylatra:0.076275):0.103486):0.037020,(A_anhinga:0.197727,(A_novaehollandiae:0.106082,A_rufa:0.068100):0.071018):0.079408):0.026938,((P_carbo:0.033352,P_varius:0.037379):0.034797,P_melanoleucos:0.101265):0.131502):0.223011);";  
/* make sure there are no spaces in branch lengths which could come from line wrapping*/  

branchNames     = BranchName   (dummy,-1); /* populate a vector of branch names; post order traversal */
branchLengthsNow  = BranchLength (dummy,-1); /* populate a vector of branch lengths; post order traversal */

for (i=0; i < Columns(branchNames)-1; i = i+1)
{
           fprintf (stdout,branchNames[i):

,"=",branchLengthsNow[i],"\n");
}
 
/* this defines a global GRT model */  
/*
global AC = 0.1;  
global AT = 0.1;  
global CG = 0.1;  
global CT = 0.1;  
global GT = 0.3;  
*/
t       = 1;  

gtrmatrix = [*,AC*t,t,AT*t}{AC*t,*,CG*t,CT*t}{t,CG*t,*,GT*t}{AT*t,CT*t,GT*t,*];  
/*HarvestFrequencies (vectorOfFrequencies,filteredData,1,1,1);  */
vectorOfFrequencies = [estPiA/sum},{estPiC/sum},{estPiG/sum},{estPiT/sum];
 
 
Model GTRmodel = (gtrmatrix,vectorOfFrequencies, 1);  
/* Last parameter of 1 is correct; HyPhy will automaticall mutliply all rate entries by the appropriate nucleotide frequency */  

Tree L_1 = ""+dummy; /* set L1 to the same string as the dummy topology; ""+ forces the conversion to string type*/

/* now we compute the explicit scaling expression for the branch length of the form BranchLength = t * factor, where factor depends on AC,AT,CG,CT,GT and base frequencies */
/* since it will depend on some parameters which will be optimized, the assignment is used */

tToBranchLengthFactor := 2*(vectorOfFrequencies[0]*vectorOfFrequencies[1]*AC+vectorOfFrequencies[0]*vectorOfFrequencies[2]+
          vectorOfFrequencies[0]*vectorOfFrequencies[3]*AT+vectorOfFrequencies[1]*vectorOfFrequencies[2]*CG+
          vectorOfFrequencies[1]*vectorOfFrequencies[3]*CT+vectorOfFrequencies[2]*vectorOfFrequencies[3]*GT);
         
        /* 2* is because the contribution of the (i,j) and the (j,i) rate to the branch length is the same for a reversible model */
       
/* now force all branch lengths to be equal to those read from the string by setting t := givenLength/tToBranchLengthFactor for every branch */

/*DO ALL THE CONSTRAINTS*/
ReplicateConstraint ("this1.?.estPiA<1", L_1);
ReplicateConstraint ("this1.?.estPiC<1", L_1);
ReplicateConstraint ("this1.?.estPiG<1", L_1);
ReplicateConstraint ("this1.?.estPiT:=Abs(1-this2.?.estPiA-this2.?.estPiC-this2.?.estPiG)", L_1, L_1);
ReplicateConstraint ("this1.?.sum:=this2.?.estPiT+this2.?.estPiA+this2.?.estPiC+this2.?.estPiG", L_1, L_1);

for (k = 0; k < Columns (branchNames)-1; k=k+1)  
{  
  ExecuteCommands ("L_1."+branchNames[k]+".t:="+branchLengthsNow[k]+"/tToBranchLengthFactor;");
 
}  


LikelihoodFunction lf = (filteredData,L_1);  

Optimize (res,lf);  
LIKELIHOOD_FUNCTION_OUTPUT=5;  
fprintf (stdout, "\n",lf);
LIKELIHOOD_FUNCTION_OUTPUT=2;
fprintf (stdout, "\n",lf);

fprintf(stdout, "I found Lk=",res[1][0],", with ",res[1][1], " params of which ",res[1][2]," are global params\n");

LFCompute (lf,LF_START_COMPUTE);
LFCompute (lf,res2);
LFCompute (lf,LF_DONE_COMPUTE);

fprintf(stdout, "I found Lk=",res[1][0],", with ",res[1][1], " params of which ",res[1][2]," are global params\n");

/*
HarvestFrequencies (obsFreq,filteredData,1,1,1);

fprintf (stdout, "\nEstimated Frequencies:", vectorOfFrequencies, "\nObserved Frequencies", obsFreq);
*/

Title: Re: constraining branch lengths to observed values
Post by Sergei on Sep 28th, 2005 at 11:13am
Dear James,

The local case is a bit trickier, because one has to be very careful about which parameters must remain global (base frequencies for instance) and define a branch specific branch length constraint.

Give this a try: (may contain bugs, since I haven't tried to run it - missing the data file).


Code (]
/* frequency parameters must be global (shared by tree branches), otherwise we don't really know
character states weights at the root and can't compute proper likelihood;
By not declaring frequency parameters as globals, you are liable to get nonsensical results
*/

/* also, notice the trick to define four [0,1):

parameters which sum to 1 without scaling,
but rather with 3 auxiliary variables
ReplicateConstraint doesn't work on :< constraints; those need to be set explicitly
*/

global aux1 = 0.5;
aux1 :< 1;
global aux2 = 0.33;
aux2 :< 1;
global aux3 = 0.25;
aux3 :< 1;

global estPiA := aux1;
global estPiC := (1-aux1)*aux2;
global estPiG := (1-aux1)*(1-aux2)*aux3
global estPiT := (1-aux1)*(1-aux2)*(1-aux3);


  DataSet ds  = ReadDataFile ("Darterfile.nex");  
DataSetFilter filteredData = CreateFilter (ds,1);  
 
ACCEPT_BRANCH_LENGTHS = 1;  
ACCEPT_ROOTED_TREES   = 1;  
 
 
 
/* define a dummy topology to read in branch lengths */  
 
UseModel (USE_NO_MODEL);  
Tree dummy = "(Pe_occidentalis:0.044344,Pe_conspicillatus:0.077582,(((M_serrator:0.10 0006,(S_sula:0.044100,S_dactylatra:0.076275):0.103486):0.037020,(A_anhin ga:0.197727,(A_novaehollandiae:0.106082,A_rufa:0.068100):0.071018):0.079 408):0.026938,((P_carbo:0.033352,P_varius:0.037379):0.034797,P_melanoleu cos:0.101265):0.131502):0.223011);";  
/* make sure there are no spaces in branch lengths which could come from line wrapping*/  
 
branchNames       = BranchName   (dummy,-1); /* populate a vector of branch names; post order traversal */
branchLengthsNow  = BranchLength (dummy,-1); /* populate a vector of branch lengths; post order traversal */
 
for (i=0; i < Columns(branchNames)-1; i = i+1)
{
       fprintf (stdout,branchNames[i],"=",branchLengthsNow[i],"\n");
}
   
t  = 1;  /* t will now serve as a scaling parameter to constraints branch lengths */
 
gtrmatrix = [*,AC*t,t,AT*t}
                  {AC*t,*,CG*t,CT*t}
                  {t,CG*t,*,GT*t}
                  {AT*t,CT*t,GT*t,*];  

vectorOfFrequencies = [estPiA},{estPiC},{estPiG},{estPiT];
   
   
Model GTRmodel = (gtrmatrix,vectorOfFrequencies, 1);  
 
Tree L_1 = ""+dummy; /* set L1 to the same string as the dummy topology; ""+ forces the conversion to string type*/
 
     
/* now force all branch lengths to be equal to those read from the string by setting t := givenLength/tToBranchLengthFactor for every branch;
because tToBranchLengthFactor now depends on local branch lengths we need to redefine the scaling factor for each branch */
 
 
for (k = 0; k < Columns (branchNames)-1; k=k+1)  
{  
  branchPrefix = "L_1."+branchNames[k]+".";
   ExecuteCommands (branchPrefix+".t:=0.5*"+branchLengthsNow[k]+"/("
                             + "estPiA*estPiC*"+branchPrefix+".AC+"
                             + "estPiA*estPiG+"
                             + "estPiA*estPiT*"+branchPrefix+".AT+"
                             + "estPiC*estPiG*"+branchPrefix+".CG+"
                             + "estPiC*estPiT*"+branchPrefix+".CT+"
                             + "estPiG*estPiT*"+branchPrefix+".GT)"
     );  
   
}  
 
LikelihoodFunction lf = (filteredData,L_1);  
 
Optimize (res,lf);  
LIKELIHOOD_FUNCTION_OUTPUT=5;  
fprintf (stdout, "\n",lf);  


Cheers,
Sergei

HyPhy message board » Powered by YaBB 2.5.2!
YaBB Forum Software © 2000-2024. All Rights Reserved.