Welcome, Guest. Please Login
YaBB - Yet another Bulletin Board
 
  HomeHelpSearchLogin  
 
constraining branch lengths to observed values (Read 7077 times)
jamescotton
YaBB Newbies
*
Offline



Posts: 28
London
Gender: male
constraining branch lengths to observed values
Sep 23rd, 2005 at 5:43am
 
How do I tell HYPHY to optimise a likelihood function but treat the branch lengths as observed values rather than as variables to optimise - I want to give hyphy a tree with branch lengths (expected substitutions per site) and then estimate the substitution model parameters under a variety of different models (upto a "local" GRM, if possible).

If possible, I then want to go on the simulate data under the estimated model on the observed tree+br lens.

Sorry I can't figure it out!

James
Back to top
 

James Cotton&&School of Biological and Chemical Sciences&&Queen Mary, University of London&&+44 (0)207 882 3645&&j.a.cotton@qmul.ac.uk&&Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login
WWW WWW  
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: constraining branch lengths to observed values
Reply #1 - Sep 25th, 2005 at 9:32pm
 
Dear James,

Quote:
I want to give hyphy a tree with branch lengths (expected substitutions per site) and then estimate the substitution model parameters under a variety of different models (upto a "local" GRM, if possible).


You may find this message useful: Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login.

Because of possible model variety, HyPhy doesn't intrisically know how to map E[subs] to model parameters; for most models it's easy to work the relationship out and map branch lengths to model parameters. Let me know if the above posting is insufficient and I'll be happy to provide more info. I'll put together a little example for a local GRM model and add a link when it's done.

Quote:


If possible, I then want to go on the simulate data under the estimated model on the observed tree+br lens.



That part is easy; just use SimulateDataSet (Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login) once you have fitted a likelihood function, e.g (assuming 'lf' is the likelihood function ID):

Code:
for (i=0; i<100; i=i+1)
{
     DataSet simmedData = SimulateDataSet (lf);
     DataSetFilter simDF = CreateFilter (simmedData,1);
     myFilePath = "some path/sim_"+i+".seq";
     fprintf (myFilePath,CLEAR_FILE,simDF);
}
 



Cheers,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
jamescotton
YaBB Newbies
*
Offline



Posts: 28
London
Gender: male
Re: constraining branch lengths to observed values
Reply #2 - Sep 27th, 2005 at 9:22am
 
Hi Sergei - thanks for the previous suggestions.. I've made a little bit of progress with understanding HyPhy, but not enough, i'm afraid.. I've got this far, and have a problem in that the branch lengths stored in the tree object don't seem to be updated.. I get output like this:

[quote]
branch scaling factor computed to be 0.161777

BLmod=0.044344
Branch Pe_occidentalis.
      Length before scaling:    0.00717
      Length after scaling :    0.00717
BLmod=0.077582
Branch Pe_conspicillatus.
      Length before scaling:    0.01255
      Length after scaling :    0.01255
BLmod=0.100006
Branch M_serrator.
      Length before scaling:    0.01618
      Length after scaling :    0.01618
BLmod=0.0441
Branch S_sula.
      Length before scaling:    0.00713
      Length after scaling :    0.00713
[/quote]

and so on for every branch - the BLmod values are correct, but don't seem to store in the tree properly. The same happens if i use '=' rather than ':=' for the assignment in the first "executecommands" statement in the loop. I don't understand why this is happening!.. The second problem is that all the branch lengths are estimated to be the same length in the output..

[quote]
global AC=0.2932585265898854;
global AT=0.1951842034375856;
global CG=0.07364278346962116;
global CT=1.274516987700602;
global GT=0.03398007119957082;
k=0.02;
scalingB=0.05617924196392339;
L_1.Pe_occidentalis.t:=branchLengthsNow[k]/scalingB;
L_1.Pe_conspicillatus.t:=branchLengthsNow[k]/scalingB;
L_1.M_serrator.t:=branchLengthsNow[k]/scalingB;
L_1.S_sula.t:=branchLengthsNow[k]/scalingB;
L_1.S_dactylatra.t:=branchLengthsNow[k]/scalingB;
L_1.Node7.t:=branchLengthsNow[k]/scalingB;
L_1.Node5.t:=branchLengthsNow[k]/scalingB;
L_1.A_anhinga.t:=branchLengthsNow[k]/scalingB;
L_1.A_novaehollandiae.t:=branchLengthsNow[k]/scalingB;
L_1.A_rufa.t:=branchLengthsNow[k]/scalingB;
L_1.Node12.t:=branchLengthsNow[k]/scalingB;
L_1.Node10.t:=branchLengthsNow[k]/scalingB;
L_1.Node4.t:=branchLengthsNow[k]/scalingB;
L_1.P_carbo.t:=branchLengthsNow[k]/scalingB;
L_1.P_varius.t:=branchLengthsNow[k]/scalingB;
L_1.Node16.t:=branchLengthsNow[k]/scalingB;
L_1.P_melanoleucos.t:=branchLengthsNow[k]/scalingB;
L_1.Node15.t:=branchLengthsNow[k]/scalingB;
L_1.Node3.t:=branchLengthsNow[k]/scalingB;
Tree L_1=(Pe_occidentalis:0.0469995,Pe_conspicillatus:0.0469995,(((M_serrator:0.04699
95,(S_sula:0.0469995,S_dactylatra:0.0469995)Node7:0.0469995)Node5:0.0469995,(A_a
nhinga:0.0469995,(A_novaehollandiae:0.0469995,A_rufa:0.0469995)Node12:0.0469995)
Node10:0.0469995)Node4:0.0469995,((P_carbo:0.0469995,P_varius:0.0469995)Node16:0
.0469995,P_melanoleucos:0.0469995)Node15:0.0469995)Node3:0.0469995);
[/quote]

I *think* I understand this - i'm constraining each branch length to the value of branchLengthsNow[k]/scalingB, and the program tries to optimise this parameter (scalingB) and so optimises all the branch lengths to be the same.. do I need to somehow "de-reference" these values to say "constrain the branch length to the current value of branchLengthNow[h]/scalingB rather than to be the value of these variables that can then be optimised? i've no idea how to do this, or even if this makes sense in HyPhy..

My program is pasted below - sorry for such a long post, but i'm finding it a rather steep learning curve just now!

Thanks,
James
[quote]
DataSet ds = ReadDataFile ("Darterfile.nex");
DataSetFilter filteredData = CreateFilter (ds,1);

ACCEPT_BRANCH_LENGTHS = 1;
ACCEPT_ROOTED_TREES = 1;

global AC = 0.1;
global AT = 0.1;
global CG = 0.1;
global CT = 0.1;
global GT = 0.3;

t    = 1;

gtrmatrix = {{*,AC*t,t,AT*t}{AC*t,*,CG*t,CT*t}{t,CG*t,*,GT*t}{AT*t,CT*t,GT*t,*}};
HarvestFrequencies (vectorOfFrequencies,filteredData,1,1,1);


Model GTRmodel = (gtrmatrix,vectorOfFrequencies, 1); /* NOT SURE IF THIS SHOULD BE 1 or 0 FOR THE LAST PARAMETER */
Tree L_1 = "(Pe_occidentalis:0.044344,Pe_conspicillatus:0.077582,(((M_serrator:0.100006,(S_
sula:0.044100,S_dactylatra:0.076275):0.103486):0.037020,(A_anhinga:0.197727,(A_n
ovaehollandiae:0.106082,A_rufa:0.068100):0.071018):0.079408):0.026938,((P_carbo:
0.033352,P_varius:0.037379):0.034797,P_melanoleucos:0.101265):0.131502):0.223011
);";

scalingB = computeScalingFactorB (gtrmatrix, vectorOfFrequencies);
fprintf (stdout, "\nBranch scaling factor computed to be ", scalingB,"\n\n");


branchNames  = BranchName  (L_1,-1);
branchLengthsNow = BranchLength (L_1,-1);

for (k = 0; k < Columns (branchNames)-1; k=k+1)
{
     ExecuteCommands ("L_1."+branchNames[k]+".t:=branchLengthsNow[k]/scalingB;");
     fprintf (stdout,"BLmod=",branchLengthsNow[k]/scalingB,"\n");
     fprintf (stdout, "Branch ", branchNames[k],".\n\t Length before scaling: ", Format (branchLengthsNow[k],10,5), 
             "\n\t Length after scaling : ", Format (BranchLength (L_1,k),10,5), "\n");
}


LikelihoodFunction lf = (filteredData,L_1);

Optimize (res,lf);
LIKELIHOOD_FUNCTION_OUTPUT=5;
fprintf (stdout, "\n",lf);

function computeScalingFactorB (rateMatrix, baseFreqs)
{
B = 0;
for (n1 = 0; n1 < 4; n1 = n1+1)
{
  for (n2 = 0; n2 < 4; n2 = n2+1)
  {
   if (n2!=n1)
   {
    B = B + baseFreqs[n1]*baseFreqs[n2]*rateMatrix[n1][n2];
   }
  }
}
return B;
}
[/quote]
Back to top
 

James Cotton&&School of Biological and Chemical Sciences&&Queen Mary, University of London&&+44 (0)207 882 3645&&j.a.cotton@qmul.ac.uk&&Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login
WWW WWW  
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: constraining branch lengths to observed values
Reply #3 - Sep 27th, 2005 at 10:25am
 
Dear James,

Sorry about the learning curve, but thanks for your perseverance. Take a look at the example below (which I haven't tested, since I have no data for your tree). It should help clarify some things.

[code]

DataSet ds                 = ReadDataFile ("Darterfile.nex");
DataSetFilter filteredData = CreateFilter (ds,1);
 
ACCEPT_BRANCH_LENGTHS = 1;
ACCEPT_ROOTED_TREES   = 1;

/* define a dummy topology to read in branch lengths */

UseModel (USE_NO_MODEL);
Topology dummy = "(Pe_occidentalis:0.044344,Pe_conspicillatus:0.077582,(((M_serrator:0.100006,(S_sula:0.044100,S_dactylatra:0.076275):0.103486):0.037020,(A_anhin ga:0.197727,(A_novaehollandiae:0.106082,A_rufa:0.068100):0.071018):0.079408):0.026938,((P_carbo:0.033352,P_varius:0.037379):0.034797,P_melanoleu cos:0.101265):0.131502):0.223011);";
/* make sure there are no spaces in branch lengths which could come from line wrapping*/

branchNames          = BranchName   (dummy,-1); /* populate a vector of branch names; post order traversal */
branchLengthsNow  = BranchLength (dummy,-1); /* populate a vector of branch lengths; post order traversal */

 
/* this defines a global GRT model */

global AC = 0.1;
global AT = 0.1;
global CG = 0.1;
global CT = 0.1;
global GT = 0.3;
 
t            = 1;
 
gtrmatrix = {{*,AC*t,t,AT*t}{AC*t,*,CG*t,CT*t}{t,CG*t,*,GT*t}{AT*t,CT*t,GT*t,*}};
HarvestFrequencies (vectorOfFrequencies,filteredData,1,1,1);
 
 
Model GTRmodel = (gtrmatrix,vectorOfFrequencies, 1);
/* Last parameter of 1 is correct; HyPhy will automaticall mutliply all rate entries by the appropriate nucleotide frequency */

Tree L_1 = ""+dummy; /* set L1 to the same string as the dummy topology; ""+ forces the conversion to string type*/

/* now we compute the explicit scaling expression for the branch length of the form BranchLength = t * factor, where factor depends on AC,AT,CG,CT,GT and base frequencies */
/* since it will depend on some parameters which will be optimized, the assignment is used */

global        tToBranchLengthFactor := 2*(vectorOfFrequencies[0]*vectorOfFrequencies[1]*AC+vectorOfFrequencies[0]*vectorOfFrequencies[2]+
                                                       vectorOfFrequencies[0]*vectorOfFrequencies[3]*AT+vectorOfFrequencies[1]*vectorOfFrequencies[2]*CG+
                                                       vectorOfFrequencies[1]*vectorOfFrequencies[3]*CT+vectorOfFrequencies[2]*vectorOfFrequencies[3]*GT);
                                                      
                                                /* 2* is because the contribution of the (i,j) and the (j,i) rate to the branch length is the same for a reversible model */
                                               
/* now force all branch lengths to be equal to those read from the string by setting t := givenLength/tToBranchLengthFactor for every branch */

for (k = 0; k < Columns (branchNames)-1; k=k+1)
{
       ExecuteCommands ("L_1."+branchNames[k]+".t:=branchLengthsNow[k]/tToBranchLengthFactor;");
}  
 
LikelihoodFunction lf = (filteredData,L_1);
 
Optimize (res,lf);
LIKELIHOOD_FUNCTION_OUTPUT=5;
fprintf (stdout, "\n",lf);

[/code]
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
jamescotton
YaBB Newbies
*
Offline



Posts: 28
London
Gender: male
Re: constraining branch lengths to observed values
Reply #4 - Sep 27th, 2005 at 10:37am
 
Wow! thanks for the quick reply..
The trick of turning the value into a string is what I was missing, I think, and of storing the branch lengths elsewhere first...
but it doesn't work: it seems to be trying to estimate k, or something!

Quote:
global AC=0.06324555320336758;
global AT=0.06324555320336758;
global CG=0.06324555320336758;
global CT=0.06324555320336758;
global GT=0.06324555320336758;
k=0.02;
L_1.Pe_occidentalis.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.Pe_conspicillatus.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.M_serrator.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.S_sula.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.S_dactylatra.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.Node7.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.Node5.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.A_anhinga.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.A_novaehollandiae.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.A_rufa.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.Node12.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.Node4.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.P_carbo.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.P_varius.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.Node16.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.P_melanoleucos.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.Node10.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.Node15.t:=branchLengthsNow[k]/tToBranchLengthFactor;
L_1.Node3.t:=branchLengthsNow[k]/tToBranchLengthFactor;
global tToBranchLengthFactor:=2*(vectorOfFrequencies[0]*vectorOfFrequencies[1]*AC+vectorOfFrequencies[0]*vectorOfFrequencies[2]+vectorOfFrequencies[0]*vectorOfFrequencies[3]*AT+vectorOfFrequencies[1]*vectorOfFrequencies[2]*CG+vectorOfFrequencies[1]*vectorOfFrequencies[3]*CT+vectorOfFrequencies[2]*vectorOfFrequencies[3]*GT);
Tree L_1=(Pe_occidentalis:0,Pe_conspicillatus:0,(((M_serrator:0,(S_sula:0,S_dactylatr
a:0)Node7:0)Node5:0,(A_anhinga:0,(A_novaehollandiae:0,A_rufa:0)Node12:0)Node10:0
)Node4:0,((P_carbo:0,P_varius:0)Node16:0,P_melanoleucos:0)Node15:0)Node3:0);
Back to top
 

James Cotton&&School of Biological and Chemical Sciences&&Queen Mary, University of London&&+44 (0)207 882 3645&&j.a.cotton@qmul.ac.uk&&Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login
WWW WWW  
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: constraining branch lengths to observed values
Reply #5 - Sep 27th, 2005 at 11:07am
 
Dear James,

Quote:
it seems to be trying to estimate k, or something!


Of course (slaps self on head)!

HyPhy will scan variable dependencies before optimizing the likelihood function and find 'k' to be an independent parameter. To fix this, use the following ExecuteCommands fix (this replaces 'k' with its actual numeric value).

Change:

Code:
ExecuteCommands ("L_1."+branchNames[k]+".t:=branchLengthsNow[k]/tToBranchLengthFactor;");
 



to

Code:
ExecuteCommands ("L_1."+branchNames[k]+".t:="+branchLengthsNow[k]+"/tToBranchLengthFactor;");
 



Cheers,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
jamescotton
YaBB Newbies
*
Offline



Posts: 28
London
Gender: male
Re: constraining branch lengths to observed values
Reply #6 - Sep 28th, 2005 at 3:13am
 
Hi - That now works, or at least it very nearly did - Just for future reference, I had to change

Code:
Topology dummy = "blahblahblah";
 



into

Code:
Tree dummy = "blahblahblah";
 



presumably, a "Topology" object doesn't store branch lengths?

Anyway - thanks for all your help - you've been really great - I'm going to try and code the same thing for some "local" models now, and think i'm slowly getting the hang of the HYPHY language..

i'll let you know how I get on.

Thanks again,
James
Back to top
 

James Cotton&&School of Biological and Chemical Sciences&&Queen Mary, University of London&&+44 (0)207 882 3645&&j.a.cotton@qmul.ac.uk&&Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login
WWW WWW  
IP Logged
 
jamescotton
YaBB Newbies
*
Offline



Posts: 28
London
Gender: male
Re: constraining branch lengths to observed values
Reply #7 - Sep 28th, 2005 at 5:55am
 
Sorry to keep bothering you Sergei - another quick question...

Is there an easy way to change a variable from "global" to "local" before re-optimising a likelihood function, to enable easy comparing of likelihoods under two models, or would I need to do this by constraining parameters to be identical at every branch?

thanks,
J
Back to top
 

James Cotton&&School of Biological and Chemical Sciences&&Queen Mary, University of London&&+44 (0)207 882 3645&&j.a.cotton@qmul.ac.uk&&Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login
WWW WWW  
IP Logged
 
jamescotton
YaBB Newbies
*
Offline



Posts: 28
London
Gender: male
Re: constraining branch lengths to observed values
Reply #8 - Sep 28th, 2005 at 6:19am
 
Hi Sergei - i've now got HyPhy successfully doing parameter estimates under local and global GTR models and estimating base frequencies under ML with specified branch lengths, so i'm nearly there! - Thanks for all your help.

I do have another quick question - what does this do?

Code:
EMBED_FREQUENCY_DEPENDENCE = 1;
 



I've used it in my estimating base freqs, because it was in the example batch file I cannibalised, but i've no idea what it does - it isn't in the command ref. you made for me!

James
Back to top
 

James Cotton&&School of Biological and Chemical Sciences&&Queen Mary, University of London&&+44 (0)207 882 3645&&j.a.cotton@qmul.ac.uk&&Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login
WWW WWW  
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: constraining branch lengths to observed values
Reply #9 - Sep 28th, 2005 at 7:45am
 
Dear James,

Quote:
I do have another quick question - what does this do?

Code:
EMBED_FREQUENCY_DEPENDENCE = 1;
 




Good question (especially since I don't remember myself what it does). Let me look at the sources at get back to you when I figure it out (I'll also add the entry to documentation).

Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
jamescotton
YaBB Newbies
*
Offline



Posts: 28
London
Gender: male
Re: constraining branch lengths to observed values
Reply #10 - Sep 28th, 2005 at 9:16am
 
hi Sergei - another problem! When I try and fit a local GTR model with constrained branch lengths, where all of the base composition and substitution rate parameters are "local", I get a non-sensical (+ve) log likelihood..

I have a feeling this might be because i'm overfitting the model (would having local composition and matrix parameters makes some of them non-identifiable?.. i'm not sure). Of course, there could be a problem with my code. I've fitted the same model but with global base composition parameters without any problem, and get a sensible LogL.

I've pasted some of the output  below, and my code in the next message as otherwise (it's too long for the forum!)

J

[quote]

e
(LOTS OF STUFF REMOVED)
L_1.Node15.sum:=L_1.Node15.estPiT+L_1.Node15.estPiT+L_1.Node15.estPiT+L_1.Node15
.estPiT;
L_1.Node15.estPiT:=Abs(1-L_1.Node15.estPiA-L_1.Node15.estPiA-L_1.Node15.estPiA);

L_1.Node3.t:=0.223011/tToBranchLengthFactor;
L_1.Node3.sum:=L_1.Node3.estPiT+L_1.Node3.estPiT+L_1.Node3.estPiT+L_1.Node3.estP
iT;
L_1.Node3.estPiT:=Abs(1-L_1.Node3.estPiA-L_1.Node3.estPiA-L_1.Node3.estPiA);
Tree L_1=(Pe_occidentalis:0.044344,Pe_conspicillatus:0.077582,(((M_serrator:0.100006,
(S_sula:0.0441,S_dactylatra:0.076275)Node7:0.103486)Node5:0.03702,(A_anhinga:0.1
97727,(A_novaehollandiae:0.106082,A_rufa:0.0681)Node12:0.071018)Node10:0.079408)
Node4:0.026938,((P_carbo:0.033352,P_varius:0.037379)Node16:0.034797,P_melanoleuc
os:0.101265)Node15:0.131502)Node3:0.223011);
Log Likelihood = 1368.54748043482;
Tree L_1=(Pe_occidentalis:0.044344,Pe_conspicillatus:0.077582,(((M_serrator:0.100006,
(S_sula:0.0441,S_dactylatra:0.076275)Node7:0)Node5:0,(A_anhinga:0.197727,(A_nova
ehollandiae:0.106082,A_rufa:0.0681)Node12:0.071018)Node10:0.079408)Node4:0.02693
8,((P_carbo:0.033352,P_varius:0.037379)Node16:0,P_melanoleucos:0)Node15:0.131502
)Node3:0.223011);I found Lk=1368.55, with 162 params of which 0 are global params
I found Lk=1368.55, with 162 params of which 0 are global params
[/quote]

Back to top
 

James Cotton&&School of Biological and Chemical Sciences&&Queen Mary, University of London&&+44 (0)207 882 3645&&j.a.cotton@qmul.ac.uk&&Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login
WWW WWW  
IP Logged
 
jamescotton
YaBB Newbies
*
Offline



Posts: 28
London
Gender: male
Re: constraining branch lengths to observed values
Reply #11 - Sep 28th, 2005 at 9:16am
 

This is the batch file..

Code:
/*
global estPiA;
global estPiC;
global estPiG;
global estPiT;
global sum;
*/
/*estPiA:<1;
estPiC:<1;
estPiG:<1;
estPiT:=Abs(1-estPiA-estPiC-estPiG);
sum := estPiA+estPiC+estPiG+estPiT;
*/
EMBED_FREQUENCY_DEPENDENCE = 1;

DataSet ds	 = ReadDataFile ("Darterfile.nex");
DataSetFilter filteredData = CreateFilter (ds,1);

ACCEPT_BRANCH_LENGTHS = 1;
ACCEPT_ROOTED_TREES   = 1;



/* define a dummy topology to read in branch lengths */


UseModel (USE_NO_MODEL);
Tree dummy = "(Pe_occidentalis:0.044344,Pe_conspicillatus:0.077582,(((M_serrator:0.100006,(S_sula:0.044100,S_dactylatra:0.076275):0.103486):0.037020,(A_anhinga:0.197727,(A_novaehollandiae:0.106082,A_rufa:0.068100):0.071018):0.079408):0.026938,((P_carbo:0.033352,P_varius:0.037379):0.034797,P_melanoleucos:0.101265):0.131502):0.223011);";
/* make sure there are no spaces in branch lengths which could come from line wrapping*/

branchNames     = BranchName   (dummy,-1); /* populate a vector of branch names; post order traversal */
branchLengthsNow  = BranchLength (dummy,-1); /* populate a vector of branch lengths; post order traversal */

for (i=0; i < Columns(branchNames)-1; i = i+1)
{
		fprintf (stdout,branchNames[i],"=",branchLengthsNow[i],"\n");
}

/* this defines a global GRT model */
/*
global AC = 0.1;
global AT = 0.1;
global CG = 0.1;
global CT = 0.1;
global GT = 0.3;
 */
t	 = 1;

gtrmatrix = {{*,AC*t,t,AT*t}{AC*t,*,CG*t,CT*t}{t,CG*t,*,GT*t}{AT*t,CT*t,GT*t,*}};
/*HarvestFrequencies (vectorOfFrequencies,filteredData,1,1,1);  */
vectorOfFrequencies = {{estPiA/sum},{estPiC/sum},{estPiG/sum},{estPiT/sum}};


Model GTRmodel = (gtrmatrix,vectorOfFrequencies, 1);
/* Last parameter of 1 is correct; HyPhy will automaticall mutliply all rate entries by the appropriate nucleotide frequency */

Tree L_1 = ""+dummy; /* set L1 to the same string as the dummy topology; ""+ forces the conversion to string type*/

/* now we compute the explicit scaling expression for the branch length of the form BranchLength = t * factor, where factor depends on AC,AT,CG,CT,GT and base frequencies */
/* since it will depend on some parameters which will be optimized, the assignment is used */

 tToBranchLengthFactor := 2*(vectorOfFrequencies[0]*vectorOfFrequencies[1]*AC+vectorOfFrequencies[0]*vectorOfFrequencies[2]+
	     vectorOfFrequencies[0]*vectorOfFrequencies[3]*AT+vectorOfFrequencies[1]*vectorOfFrequencies[2]*CG+
	     vectorOfFrequencies[1]*vectorOfFrequencies[3]*CT+vectorOfFrequencies[2]*vectorOfFrequencies[3]*GT);

	   /* 2* is because the contribution of the (i,j) and the (j,i) rate to the branch length is the same for a reversible model */

/* now force all branch lengths to be equal to those read from the string by setting t := givenLength/tToBranchLengthFactor for every branch */

/*DO ALL THE CONSTRAINTS*/
ReplicateConstraint ("this1.?.estPiA<1", L_1);
ReplicateConstraint ("this1.?.estPiC<1", L_1);
ReplicateConstraint ("this1.?.estPiG<1", L_1);
ReplicateConstraint ("this1.?.estPiT:=Abs(1-this2.?.estPiA-this2.?.estPiC-this2.?.estPiG)", L_1, L_1);
ReplicateConstraint ("this1.?.sum:=this2.?.estPiT+this2.?.estPiA+this2.?.estPiC+this2.?.estPiG", L_1, L_1);

for (k = 0; k < Columns (branchNames)-1; k=k+1)
{
   ExecuteCommands ("L_1."+branchNames[k]+".t:="+branchLengthsNow[k]+"/tToBranchLengthFactor;");

}


LikelihoodFunction lf = (filteredData,L_1);

Optimize (res,lf);
LIKELIHOOD_FUNCTION_OUTPUT=5;
fprintf (stdout, "\n",lf);
LIKELIHOOD_FUNCTION_OUTPUT=2;
fprintf (stdout, "\n",lf);

fprintf(stdout, "I found Lk=",res[1][0],", with ",res[1][1], " params of which ",res[1][2]," are global params\n");

LFCompute (lf,LF_START_COMPUTE);
LFCompute (lf,res2);
LFCompute (lf,LF_DONE_COMPUTE);

fprintf(stdout, "I found Lk=",res[1][0],", with ",res[1][1], " params of which ",res[1][2]," are global params\n");

/*
HarvestFrequencies (obsFreq,filteredData,1,1,1);

fprintf (stdout, "\nEstimated Frequencies:", vectorOfFrequencies, "\nObserved Frequencies", obsFreq);
*/
 

Back to top
 

James Cotton&&School of Biological and Chemical Sciences&&Queen Mary, University of London&&+44 (0)207 882 3645&&j.a.cotton@qmul.ac.uk&&Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login
WWW WWW  
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: constraining branch lengths to observed values
Reply #12 - Sep 28th, 2005 at 11:13am
 
Dear James,

The local case is a bit trickier, because one has to be very careful about which parameters must remain global (base frequencies for instance) and define a branch specific branch length constraint.

Give this a try: (may contain bugs, since I haven't tried to run it - missing the data file).

Code:
/* frequency parameters must be global (shared by tree branches), otherwise we don't really know
character states weights at the root and can't compute proper likelihood;
By not declaring frequency parameters as globals, you are liable to get nonsensical results
*/

/* also, notice the trick to define four [0,1] parameters which sum to 1 without scaling,
but rather with 3 auxiliary variables
ReplicateConstraint doesn't work on :< constraints; those need to be set explicitly
*/

global aux1 = 0.5;
aux1 :< 1;
global aux2 = 0.33;
aux2 :< 1;
global aux3 = 0.25;
aux3 :< 1;

global estPiA := aux1;
global estPiC := (1-aux1)*aux2;
global estPiG := (1-aux1)*(1-aux2)*aux3
global estPiT := (1-aux1)*(1-aux2)*(1-aux3);


  DataSet ds  = ReadDataFile ("Darterfile.nex");  
DataSetFilter filteredData = CreateFilter (ds,1);  
 
ACCEPT_BRANCH_LENGTHS = 1;  
ACCEPT_ROOTED_TREES   = 1;  
 
 
 
/* define a dummy topology to read in branch lengths */  
 
UseModel (USE_NO_MODEL);  
Tree dummy = "(Pe_occidentalis:0.044344,Pe_conspicillatus:0.077582,(((M_serrator:0.10 0006,(S_sula:0.044100,S_dactylatra:0.076275):0.103486):0.037020,(A_anhin ga:0.197727,(A_novaehollandiae:0.106082,A_rufa:0.068100):0.071018):0.079 408):0.026938,((P_carbo:0.033352,P_varius:0.037379):0.034797,P_melanoleu cos:0.101265):0.131502):0.223011);";  
/* make sure there are no spaces in branch lengths which could come from line wrapping*/  
 
branchNames       = BranchName   (dummy,-1); /* populate a vector of branch names; post order traversal */
branchLengthsNow  = BranchLength (dummy,-1); /* populate a vector of branch lengths; post order traversal */
 
for (i=0; i < Columns(branchNames)-1; i = i+1)
{
  	fprintf (stdout,branchNames[i],"=",branchLengthsNow[i],"\n");
}
   
t  = 1;  /* t will now serve as a scaling parameter to constraints branch lengths */
 
gtrmatrix = {{*,AC*t,t,AT*t}
			 {AC*t,*,CG*t,CT*t}
			 {t,CG*t,*,GT*t}
			 {AT*t,CT*t,GT*t,*}};  

vectorOfFrequencies = {{estPiA},{estPiC},{estPiG},{estPiT}};
   
   
Model GTRmodel = (gtrmatrix,vectorOfFrequencies, 1);  
 
Tree L_1 = ""+dummy; /* set L1 to the same string as the dummy topology; ""+ forces the conversion to string type*/
 
     
/* now force all branch lengths to be equal to those read from the string by setting t := givenLength/tToBranchLengthFactor for every branch;
because tToBranchLengthFactor now depends on local branch lengths we need to redefine the scaling factor for each branch */
 
 
for (k = 0; k < Columns (branchNames)-1; k=k+1)  
{  
   branchPrefix = "L_1."+branchNames[k]+".";
   ExecuteCommands (branchPrefix+".t:=0.5*"+branchLengthsNow[k]+"/("
					+ "estPiA*estPiC*"+branchPrefix+".AC+"
					+ "estPiA*estPiG+"
					+ "estPiA*estPiT*"+branchPrefix+".AT+"
					+ "estPiC*estPiG*"+branchPrefix+".CG+"
					+ "estPiC*estPiT*"+branchPrefix+".CT+"
					+ "estPiG*estPiT*"+branchPrefix+".GT)"
	);  
   
}  
 
LikelihoodFunction lf = (filteredData,L_1);  
 
Optimize (res,lf);  
LIKELIHOOD_FUNCTION_OUTPUT=5;  
fprintf (stdout, "\n",lf);  
 



Cheers,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged