Welcome, Guest. Please Login
YaBB - Yet another Bulletin Board
 
  HomeHelpSearchLogin  
 
model parameters and branch lengths (Read 4570 times)
eliot
YaBB Newbies
*
Offline


I love YaBB 1G - SP1!

Posts: 9
model parameters and branch lengths
Apr 1st, 2008 at 9:50pm
 
hello,
                                                                              
I'm wondering about the relationship between the parameters in a model (say the F81 model) and branch lengths.
                                                                              
Take for example, the file relrate.bf. If I print out the parameter myTree.b.mu (in the unconstrained case), it is different from the branch length (in the tree that's calculated by HYPHY) by a fixed ratio. This ratio seems to be the same for different branches. What is the meaning of this ratio? What is myTree.b.mu? I thought it was the mu in the F81 rate matrix. But if this were the case, it shouldn't always have the same proportion with branch length, right? Or am I thinking about this wrong?
                                                                              
If we had a model like HKY85, we would have myTree.b.transversion and myTree.b.transition. What would be the relationship between these and the branch length?
                                                                              
thanks!
Back to top
 
 
IP Logged
 
Art Poon
Global Moderator
*****
Offline


Feed your monkey!

Posts: 0
Re: model parameters and branch lengths
Reply #1 - Apr 2nd, 2008 at 1:22pm
 
Hello Eliot,

Branch lengths are measured in units of expected number of substitutions.  The parameter mu is different from the branch length reported in HyPhy because the latter is scaled by the expected number of substitutions, calculated by summing the rates of all possible substitutions (all parameters in the off-diagonal of the rate matrix).

In order to see this, try the following example code:

Code:
DataSet 		nucleotideSequences = ReadDataFile ("data/3.seq");
DataSetFilter	filteredData = CreateFilter (nucleotideSequences,1);

HarvestFrequencies (observedFreqs, filteredData, 1, 1, 1);

F81RateMatrix =

		{{*,mu,mu,mu}
		 {mu,*,mu,mu}
		 {mu,mu,*,mu}
		 {mu,mu,mu,*}};

Model 	F81 = (F81RateMatrix, observedFreqs);


ALLOW_SEQUENCE_MISMATCH = 1;

Tree	threeTaxaTree = (a,b,c);

LikelihoodFunction  theLnLik = (filteredData, threeTaxaTree);

Optimize (paramValues, theLnLik);
unconstrainedLnLik = paramValues[1][0];

fprintf  (stdout, "\n0).UNCONSTRAINED MODEL:", theLnLik);
fprintf (stdout, "\n", observedFreqs, "\n");

/*_____________________________________________________________________ */
function computeScalingFactorB(rateMatrix, baseFreqs)
{
	B = 0;
	for (n1 = 0; n1 < Rows(rateMatrix); n1 = n1+1)
	{
		for (n2 = 0; n2 < Columns(rateMatrix); n2 = n2+1)
		{
			if (n2 != n1)
			{
				B = B + baseFreqs[n1]*baseFreqs[n2]*rateMatrix[n1][n2];
			}
		}
	}
	return B;
}

fprintf (stdout, "scaling factor = ", computeScalingFactorB (F81RateMatrix, observedFreqs), "\n");
 



Hope this helps,
- Art.
Back to top
 
 
IP Logged
 
eliot
YaBB Newbies
*
Offline


I love YaBB 1G - SP1!

Posts: 9
Re: model parameters and branch lengths
Reply #2 - Apr 3rd, 2008 at 12:47pm
 
hi Art,

That does help. I can see that

threeTaxaTree.a.mu = branch length * mu (i.e. entry in the rate matrix) / B

This makes it seem like there are two parameters which must be estimated on each branch, mu and branch length. Is that right? If so, why does paramValues[1][1] say there are only 3 parameters (ie one per branch)?

thanks again.
Back to top
 
 
IP Logged
 
Art Poon
Global Moderator
*****
Offline


Feed your monkey!

Posts: 0
Re: model parameters and branch lengths
Reply #3 - Apr 3rd, 2008 at 3:15pm
 
Hi eliot,

The three parameters being estimated are the branch lengths, in units of expected number of substitutions.   The local parameter mu is confounded with time, in that it's impossible to estimate mu and (t) separately for each branch.  Smiley

Cheers,
- Art.
Back to top
 
 
IP Logged
 
eliot
YaBB Newbies
*
Offline


I love YaBB 1G - SP1!

Posts: 9
Re: model parameters and branch lengths
Reply #4 - Apr 3rd, 2008 at 5:26pm
 
hi Art,

I'm still confused about something here. Ok, so when I run your file on 3.seq I get

Tree threeTaxaTree=(a:0.0850301,b:0.0733855,c:0.165368);
threeTaxaTree.a.mu: 0.116828

scaling factor = 0.165368

You said that the branch length is scaled by the expected number of substitutions, which is the scaling factor. I was expecting this would mean you multiply threeTaxaTree.a.mu by the scaling factor.

But 0.116828*0.165368 obviously is not 0.0850301

If your computeScalingFactorB function had the line

B = B + baseFreqs[n1]*baseFreqs[n2];

instead of

B = B + baseFreqs[n1]*baseFreqs[n2]*rateMatrix[n1][n2];

then B=0.727825 and it works out.


Why does B have the rateMatrix term in it? And what is the interpretation of "my version" of B?

Thanks again for your help,
Eliot
Back to top
 
 
IP Logged
 
Art Poon
Global Moderator
*****
Offline


Feed your monkey!

Posts: 0
Re: model parameters and branch lengths
Reply #5 - Apr 3rd, 2008 at 6:07pm
 
Hi eliot,

Okay, let's forget the scaling factor for one sec --- I'm not sure it's working correctly anyhow.  Embarrassed

The expected number of substitutions on a branch is the number reported when HyPhy spools results from the likelihood function:

Code:
0).UNCONSTRAINED MODEL:Log Likelihood = -944.001000454328;
Tree threeTaxaTree=(a:0.0850301,b:0.0733855,c:0.165368);
 



So for branch 'c', E[s] = 0.165368.  mu is a local parameter in the F81 model that is estimated for every branch in the tree, and for 'c', mu = 0.227208.   How do we get from mu to E[s]?

E[s] is calculated by taking the negative of the weighted sum of the rate matrix Q, i.e.

E[s] = - (p(A) (-mu p(C) - mu p(G) - mu p(T)) + p(C) (-mu p(A) - mu p(G) - mu p(T) ) + ...


Now our stationary distribution is estimated from observed nucleotide frequencies:
Code:
>observedFreqs
{
{    0.349333333333}
{    0.151111111111}
{    0.214222222222}
{    0.285333333333}
}
 




Proceeding through the formula for E[s], we get:
Code:
>0.349333333333 * 0.227208119226 * (1-0.349333333333)
0.0516443
>0.151111111111 * 0.227208119226 * (1-0.151111111111)
0.0291455
>0.214222222222 * 0.227208119226 * (1-0.214222222222)
0.0382462
>0.285333333333 * 0.227208119226 * (1-0.285333333333)
0.0463319
>0.0516443 + 0.0291455 + 0.0382462 + 0.0463319
0.165368
 



What this formula for E[s] is doing is taking into account multiple substitutions.  In other words, a nucleotide might not only change from A to G, but subsequently back to A over the evolutionary time represented by the branch.  (Recall that this time is already confounded by, i.e. represented in, mu).

Better?   Smiley
- Art.
Back to top
 
 
IP Logged
 
eliot
YaBB Newbies
*
Offline


I love YaBB 1G - SP1!

Posts: 9
Re: model parameters and branch lengths
Reply #6 - Apr 4th, 2008 at 2:47pm
 
Yes, that helps. Thanks!
Back to top
 
 
IP Logged