Welcome, Guest. Please Login
YaBB - Yet another Bulletin Board
 
  HomeHelpSearchLogin  
 
Reversible binary model-how to fix branch lengths? (Read 3020 times)
Aidan Budd
YaBB Newbies
*
Offline


Monkey fed...

Posts: 29
Heidelberg, Germany
Gender: male
Reversible binary model-how to fix branch lengths?
Mar 6th, 2008 at 2:21am
 
Hi,

I am trying to get an ML estimate for the rate of a reversible model of binary character change for a given binary character dataset, for a given tree topology, and given set of branch lengths - estimating character composition proportions (pi's) from the data set.

I'm now stuck trying to set up a tree using the rev model - after I set up the model, and get a tree (with specified branch lengths), for some reason, the branches are all set to length = 1. I've already set ACCEPT_BRANCH_LENGTHS = 1, and use very similar code to do the same thing for the non-rev model (where it works, thanks to Sergei's help). I've put the code at the end of this post.

Tearing my hair out about this - would be great if you could suggest how to fix these branches to the values in the string.

Once this works, for running optimisation of the rate character with fixed  branch lengths, under the rev model, I will set a constraint on "t" of branches of the optimised tree by scaling the initial branch lengths by the inverse of the rate used to optimise the tree i.e. initial branch length / rate. Sound reasonable?

Many thanks

Aidan

Code:
dssRev = "$BASESET:01\n#a\n101110101\n#b\n101011000\n#c\n011001111\n#d\n111001110";
DataSet myDataRev = ReadFromString(dssRev);
DataSetFilter myFilterRev = CreateFilter (myDataRev,1);

ACCEPT_BRANCH_LENGTHS = 1;
ACCEPT_ROOTED_TREES   = 1;

global rateRev = 1;

/*tRev=1;   t=1?*/

fprintf (stdout, "\n\nStart New Analysis\n");

binaryRevQ = {{*,rateRev}
              {rateRev,*}};

HarvestFrequencies (eqFrequenciesRev, myFilterRev, 1, 1, 1);
/*eqFrequenciesRev = {{0.5}{0.5}};*/

/*For a fixed topology and fixed set of branch lengths, get the ML estimate of the rate of a reversible model*/
/* With rev model, need to have final parameter set to 0...? */
Model binaryRev = (binaryRevQ, eqFrequenciesRev, 0);

Tree myTreeRev = "(((a:1,b:3):1,c:1):1,d:1);";
treeString = Format (myTreeRev, 1, 1);
fprintf (stdout, "tree string before optimisation: ", treeString, "\n");
 

Back to top
 

Aidan Budd&&Computational Biologist&&EMBL Heideberg, Germany
 
IP Logged
 
Aidan Budd
YaBB Newbies
*
Offline


Monkey fed...

Posts: 29
Heidelberg, Germany
Gender: male
Re: Reversible binary model-how to fix branch leng
Reply #1 - Mar 6th, 2008 at 8:47am
 
So... I realise now that this rev model forces equilibrium frequencies of 0.5 for both 1 and 0  characters - (for two characters with the rate of change the same from 0 to 1 as 1 to 0, the equilibrium frequency has to be 0.5 for both of them...)

I found that including "t" into the substitution matrix, and not defining "t=1" initially, gives me a tree whose branch lengths don't change as I change the rate parameter (which is what I'm looking for - I want to optimise the rate parameter without changing branch lenghts or topology).

To get a tree whose branch lengths don't change after "Optimize" (i.e. branches that stay the same for different values of the rate parameter), I have to place the constraints on branch lengths AFTER optimising the likelihood function

Somehow I'm surprised that the constraints need to be placed afterwards like this - I'd have thought they need to be constrained BEFORE doing the optimisation - something I'm missing here...

I realise I just don't get what "t" is doing here, and how it makes a difference whether or not it is the description of Q.

Also, in the script below, I find that changing the rate has no effect on the likelihood of the model - not sure whether this is because I'm somehow asking a nonsensical question here, am coding things wrong, or that I'm just dealing with a likelihood that doesn't depend on rate....??

Any hints/suggestions on these issues much appreciated.

Sorry for all the Newbie questions - I hope if the conceptual problems I seem to have can be ironed out, I'll be able to sort stuff out much better on my own...?!    Smiley

Aidan

Code:
dssRev = "$BASESET:01\n#a\n101110101111111\n#b\n101011000101111\n#c\n011001111001000\n#d\n111001110100001";
DataSet myDataRev = ReadFromString(dssRev);
DataSetFilter myFilterRev = CreateFilter (myDataRev,1);

ACCEPT_BRANCH_LENGTHS      = 1;
ACCEPT_ROOTED_TREES        = 1;
LIKELIHOOD_FUNCTION_OUTPUT = 1;


global rateRev = 1;

/*tRev=1;   t=1?*/

/*t=1;*/

fprintf (stdout, "\n\nStart New Analysis\n");

binaryRevQ = {{*,rateRev*t}
              {rateRev*t,*}};

eqFrequenciesRev = {{0.5}{0.5}};

/*For a fixed topology and fixed set of branch lengths, get the ML estimate of the rate of a reversible model*/
/* With rev model, need to have final parameter set to 0...? */
Model binaryRev = (binaryRevQ, eqFrequenciesRev, 0);

Tree myTreeRev = (((a:0.1,b:0.3):0.1,c:0.1):0.1,d:1);
/*get all branch names and branch lengths*/
branchNamesRev  = BranchName  (myTreeRev,-1);
branchLengthsInitialRev = BranchLength (myTreeRev,-1);

LikelihoodFunction lfRev = (myFilterRev,myTreeRev);
fprintf (stdout, "\nLikelihood function before optimisation\n",lfRev);

treeString = Format (myTreeRev, 1, 1);
fprintf (stdout, "tree string before optimisation: ", treeString, "\n");

Optimize (res,lfRev);

for (k = 0; k < Columns (branchNamesRev)-1; k=k+1)
{
 outString = "myTreeRev."+branchNamesRev[k]+".t:="+branchLengthsInitialRev[k]+"/rateRev;";
 ExecuteCommands (outString);
 fprintf (stdout, "executing: ", outString, "\n");
}

fprintf (stdout, "\nLikelihood function after optimisation\n",lfRev);
treeString = Format (myTreeRev, 1, 1);
fprintf (stdout, "\ntree string with single timeRev rate:",rateRev," ", treeString, "\n");

rateRev = 0.01;

fprintf (stdout, "\nLikelihood function with rateRev set to: ",rateRev, "\n",lfRev);
treeString = Format (myTreeRev, 1, 1);
fprintf (stdout, "\ntree string with single timeRev rate:",rateRev," ", treeString, "\n");
 



Back to top
 

Aidan Budd&&Computational Biologist&&EMBL Heideberg, Germany
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Reversible binary model-how to fix branch leng
Reply #2 - Mar 10th, 2008 at 9:11am
 
Dear Aidan,

First, let me tell you about the 't' parameter. Traditionally (in a nucleotide setting), people have been defining the rate matrix (Q), which describes the substitution process, and a branch length parameter 't', which describes evolutionary time. Transition probability matrix was computed as exp (Q*t).

In HyPhy, the definition of the rate matrix includes both the traditional Q and the time parameter 't' explicitly. The main reason for that is because HyPhy permits each branch in the tree to have more than one parameter.

Now the binary reversible model. If you define the rate matrix as {{*,rate*t}{rate*t,*}, it may look like you have two parameters, but in reality you can only estimate the product: rate*t. This is because the phylogenetic likelihood function will only contain references to rate*t, not rate and t separately. This is actually the case for all time-reversible models: in one way or another only rate*time products are estimable. This is why your example code was not able to estimate the rate directly.

Hence your Q matrix may as well be {{*,t}{t,*}}. Your frequencies do not have to be fixed at 0.5, they can be anything of the form {{p,(1-p}}. This is because another 'implicit' convention in defining reversible rate matrices is that each entry is post-multiplied by the target stationary state frequency. This is simply a convention, because every time-reversible rate matrix can be written this way. In HyPhy, if you toggle the last argument of the Model ... = ... construct to 1, HyPhy will carry out the implicit multiplication for you.

So,

Code:
Q = {{*,t}{t,*}}; global p = 0.5; eFV = {{p,1-p}};
Model binaryREV = (Q,eFV);
 



will do what you want. Indeed, the explicit rate matrix will be

Code:
-(1-p)*t, (1-p)*t
t*p, -t*p
 



and you can check that pi = {{p,1-p}} is the stationary distribution, by check that the matrix multiplication pi*Q yields a 0 vector.

HTH,
Sergei

Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Aidan Budd
YaBB Newbies
*
Offline


Monkey fed...

Posts: 29
Heidelberg, Germany
Gender: male
Re: Reversible binary model-how to fix branch leng
Reply #3 - Mar 11th, 2008 at 11:13am
 
Thanks, Sergei

I'm still having trouble controlling branch-lengths on these trees, though -

I want branch lengths to be the expected number of changes per character across the branch (given the time associated with the branch).

This can be obtained by summing the number of 1->0 and 0->1 transitions expected in a given time.

If I was using a "classic" Q matrix {{*,rate}{rate,*}} this would be

BL = pi_0*rate*t + pi_1*rate*t

As pi_0 + pi_1 = 1, this would give

BL = rate*t

(rate and t are confounded due the the model being reversible)

In HyPhy, t is built directly into the Q-like matrix (i.e. it's the product of a "classic" Q and t), and HyPhy Q-like matrix also incorporates the equilibrium frequencies of the characters so, as you indicated, in HyPhy my matrix would actually look like

Code:
-(1-p)*t, (1-p)*t
t*p, -t*p
 



So, to get the BL (i.e. expected number of changes per character across the branch) from here, I should just need to sum the diagonal of the matrix and multiply by -1  (or, as it's only a 2*2 matrix, I could just sum the inverse diagonal) to give

BL = (1-p)t + tp
BL = t

So, what I don't get is why, when I use the code below, the branch-lengths of the tree object are half those I specify in the tree string. Each branch has t set to the value specified in the tree string, and I thought I'd just shown that BL = t. So I'd expected the tree to have branch-lengths equal to t...?

Code:
dssRev = "$BASESET:01\n#a\n101110101111111\n#b\n101011000101111\n#c\n011001111001000\n#d\n111001110100001";
DataSet myDataRev = ReadFromString(dssRev);
DataSetFilter myFilterRev = CreateFilter (myDataRev,1);

ACCEPT_BRANCH_LENGTHS      = 1;
ACCEPT_ROOTED_TREES        = 1;
LIKELIHOOD_FUNCTION_OUTPUT = 1;

fprintf (stdout, "\n\nStart New Analysis\n");

binaryRevQ = {{*,t}
              {t,*}};

global p = 0.5;

eqFrequenciesRev = {{p,1-p}};

Model binaryRev = (binaryRevQ, eqFrequenciesRev, 1);

Tree myTreeRev = (((a:1,b:3):1,c:1):1,d:1);
treeString = Format (myTreeRev, 1, 1);
fprintf (stdout, "tree string before optimisation: ", treeString, "\n");
LikelihoodFunction lfRev = (myFilterRev,myTreeRev);
fprintf (stdout, "\nLikelihood function before optimisation\n",lfRev);
 



Where am I going wrong in my thinking here..?

Thanks

Aidan

Back to top
 

Aidan Budd&&Computational Biologist&&EMBL Heideberg, Germany
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Reversible binary model-how to fix branch leng
Reply #4 - Mar 11th, 2008 at 11:59am
 
Dear Aidan,

Aidan Budd wrote on Mar 11th, 2008 at 11:13am:
Code:
-(1-p)*t, (1-p)*t
t*p, -t*p
 



So, to get the BL (i.e. expected number of changes per character across the branch) from here, I should just need to sum the diagonal of the matrix and multiply by -1  (or, as it's only a 2*2 matrix, I could just sum the inverse diagonal) to give

BL = (1-p)t + tp
BL = t



The branch length expression (for any rate matrix, whose eq. freqs are given by pi) is BL = sum_i pi_i * q_ii.
In your case: BL = p(1-p)t + (1-p)tp = 2 tp(1-p).

Try this code:

Code:
dssRev = "$BASESET:01\n#a\n101110101\n#b\n101011000\n#c\n011001111\n#d\n111001110";
DataSet myDataRev = ReadFromString(dssRev);
DataSetFilter myFilterRev = CreateFilter (myDataRev,1);

ACCEPT_BRANCH_LENGTHS = 1;
ACCEPT_ROOTED_TREES  = 1;

/*tRev=1;  t=1?*/

fprintf (stdout, "\n\nStart New Analysis\n");

binaryRevQ = {{*,t}
	 		 {t,*}};

HarvestFrequencies (eqFrequenciesRev, myFilterRev, 1, 1, 1);
/*eqFrequenciesRev = {{0.5}{0.5}};*/

/*For a fixed topology and fixed set of branch lengths, get the ML estimate of the rate of a reversible model*/
/* With rev model, need to have final parameter set to 0...? */

Model binaryRev = (binaryRevQ, eqFrequenciesRev);

Tree myTreeRev = "(((a:1,b:3):1,c:1):1,d:1);";

BLS = 2*eqFrequenciesRev[0]*(1-eqFrequenciesRev[0]);
/* BLS solves BLS * t = BL, hence t needs to be set to BL/BLS*/

bn = BranchName (myTreeRev, -1);

for (k = 0; k < Columns (bn) - 1; k=k+1)
{
	ExecuteCommands ("myTreeRev." + bn[k] + ".t = " + "myTreeRev." + bn[k] + ".t / BLS");
}

treeString 	   = Format (myTreeRev, 1, 1);
fprintf (stdout, "tree string before optimisation: ", treeString, "\n");
 



HTH,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged