Welcome, Guest. Please Login
YaBB - Yet another Bulletin Board
 
  HomeHelpSearchLogin  
 
Pages: 1 2 
Unrestricted/non reversible models (Read 8113 times)
Jacek Majewski
Guest


Unrestricted/non reversible models
Dec 19th, 2005 at 7:50pm
 
Hi,

Does HyPhy handle unconstrained, non-time reversible models? I'm trying to implement an 11 rate-parameter model (similar to PAML nucleotide model 8), but so far I've had not luck - the results seem very suspicious. Most likely I'm doing something drastically wrong.

I want the rate parameters to be constant across the phylogeny, so I define my parameters and matrix as below:

ACCEPT_ROOTED_TREES=1;

global a;
global b;
global c;
global d;
global e;
global f;
global g;
global h;
global i;
global j;
global k;
global l;

UNRESTRateMatrix = {{*,m*a,m*b,m*c}{m*d,*,m*e,m*f}{m*g,m*h,*,m*i}{m*j,m*l,m*k,*}};
Model UNREST = (UNRESTRateMatrix, obsFreqs, 0);

Here are some problems:

1) I don't want to multiply the rate matrix by nucleotide frequencies, since all the parameters can vary independently and I don't need time-reversibility - > I use the 0 option as the third parameter of  the "Model". Why do I still need to supply the frequency vector as the second argument? How are those frequencies used? The likelihood, tree, and parameter values change if I use equalFreqs vs. obsFreqs, even though I don't multiply the matrix by the frequency vector.

2. The model, as it is above, seems over-parametriized. I think I should be able to get rid of one of the global rate parameters (a - k). I tried this by setting one of the parameters to 1 (i.e. replacing, say, a by 1) in the matrix definition, but this alters the likelihood and is dependent on which parameter is held constant.

What am I doing wrong and how should I parametrize the rate matrix? Thanks in advance for any help,

Jacek
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Unrestricted/non reversible models
Reply #1 - Dec 19th, 2005 at 10:06pm
 
Dear Jacek,

HyPhy doesn't restrict models to be time-reversible (just Markovian). To answer your questions...

In order to compute the likelihood of a given tree, HyPhy needs to know the distrubution of characters at the root of the tree (so that summation over all possible ancestral states in the pruning algorithm can be seeded). This is what obsFreqs is for in the Model statment. It is also improtant to keep in mind that another standard assumption of phylogenetic evolutonary models is stationarity of base frequencies along the entire tree (for reversible and non-reversible models). If your transition matrix for time t is R(t) = exp (Q*t), then stationary distribution pi (a row vector here) satisfies pi * R(t) = pi for all t>=0 (this is the same as stating that pi * Q = 0), i.e. the substitution process does not modify the distribution of characters.

If the distribution of nucleotides you specify for the root is not stationary for Q, then at each internal node (and tips of the tree), the expected frequency of nucleotides is going to be different, depending on the branch length; this may or may not be biologicall sensible, but one should at the very least like to have the expected frequency of nucleotides at the leaves (i.e. for observed sequences) to be the same (or similar) to the proportion observed in these sequences; unless care is taken, a non-stationary process may fail this spectacularly.

Now we are ready to begin adressing your second question. In general, the stationary distribution of the unrestricted model is not going to match your observed frequencies. If you wish this restriction to hold, 3 out of 12 rate parameters have to be constrained (based on the values of obsFreqs). Take a look at the file NRM.bf in TemplateBatchFiles/TemplateModels; it defines a general model (non-reversible) which is forced to have the stationary distribution which matches observed counts. Incidentally, this model is an available option in AnalyzeNucProtData.bf standard analysis (so you can use HyPhy to fit non-reversible models out of the box).

If you attempt to fit an unrestricted model as defined in your example, the estimation procedure will probably attempt to adjust rates in such a way as to approximate the stationary distribution by something close to obsFreq. When you constrain some of the parameters to be '1', you may be breaking the ability of the model to adjust rates and frequencies, while for other parameters the '1' constaint is fine.

An alternative solution is to replace the obsFreq in the defintion of the model by the actual stationary distribution defined via a-k parameters. You can work out what this distribution is by solving

(pi_a, pi_c, pi_g, pi_t) * UNRESTRateMatrix = 0, analytically for pi_{a,c,g,t} (each irreducible discrete state Markov process is guaranteed to have a stationary distribution, so the above equation will have a solution in terms of a-k). In this case you can arbitarily set one of the rates to 1 (it is best to choose a transition, e.g. A<->G or C<->T).

Then you could define equilibFreq = {{expression1}{expression2}{expression3}{expression4}}, using the solutions of the above system and then write,

Model UNREST = (UNRESTRateMatrix, equilibFreq, 0);

Hope this helps. Please don't hesitate to ask for clarification.

Cheers,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Jacek
Guest


Re: Unrestricted/non reversible models
Reply #2 - Dec 20th, 2005 at 1:08pm
 
Hi Sergei.

Thanks for the reply. I like the second option - solving the equations analytically for the equilibrium frequencies, since that would allow me to use the unconstrained model and interpret the parameters directly as relative rates of substitutions. BUT, I remember at some point trying to solve the equilibrium distribution for the 12 parameter model, and the solution is painfully long...

So I chose the first suggestion: I used the NRM.mdl definition. However, I seem to run into the same problem that I had before. When I use your original code, which sets AG as 1, I get an answer that for my dataset may differ by as much as 20 LogLikelihood units from setting another parameter, GT = 1 (I made sure that I replaced GT = 1 in both the matrix definition and the global parameter definition for TG).

     global TA:=AT+(AC-CA)*r1__+(AG-GA)*r2__;
     global TC:=CT+(CA-AC)*r0__+(CG-GC)*r2__;
     global TG:=1+(GA-AG)*r0__+(GC-CG)*r1__;

     
     ModelMatrixName =
                 {{*,t*AC,t*AG,t*AT}
                  {t*CA,*,t*CG,t*CT}
                  {t*GA,t*GC,*,t*1}
                  {t*TA,t*TC,t*TG,*}};
                 
Fixing most of the other parameters alters the likelihood to some extent.

I also tried this with the GTR model, and I still get differences when using different parameters as a reference (although not as big, about 2 units).

I've redone this many times now, and can't find a problem. Do you have any ideas?

Jacek
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Unrestricted/non reversible models
Reply #3 - Dec 20th, 2005 at 2:27pm
 
Dear Jacek,

This sounds (at least for GTR) as a convergence issue, unless you did not remember to add a
Code:
global AG=1;
 


statement before model definition. In this case, each branch will have its own AG rate (you can check this by printing all parameter values after the likelihood optimization; if AG appears as givenTree.nodename.AG a bunch of times, then you have a local parameter), which will yield better likelihoods than the default parameterization.

Check messages.log to see if the optimization routines terminate prematurely (before convergence).

If you data set is fairly small, default optimization settings might be bad. Try adjusting OPTIMIZATION_PRECISION and MAXIMUM_ITERATIONS_PER_VARIABLE parameters in the batch file.

Also, if you pick a rate which is very small as the reference, other relative rates may be very large (and difficult to estimate well).

Lastly, there is a way to have HyPhy deduce correct stationary distribution for a model run-time (so you don't need to solve analytically). I can give an example of how, if you want.

Hope this helps,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Jacek  Majewski
Guest


Re: Unrestricted/non reversible models
Reply #4 - Dec 21st, 2005 at 7:27am
 
Thanks!!!

I'm glad I asked instead of banging my head against the wall. The MAXIMUM_ITERATIONS_PER_VARIABLE was indeed the problem. Now everything converges fine.  I would very much appreciate  and example of how to:

"deduce correct stationary distribution for a model run-time (so you don't need to solve analytically)".

HyPhy seems like a great tool, but expectedly with all its flexibility it takes a while to figure out. I may be bothering you again...

Jacek
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Unrestricted/non reversible models
Reply #5 - Jan 4th, 2006 at 5:34pm
 
Dear Jacek,

Quote:
"deduce correct stationary distribution for a model run-time (so you don't need to solve analytically)".


Sorry I didn't get back to you sooner (holidays intervened).

At any rate, here's the code for such a model

Code:
/* This file defines the transition matrix for the General Non-Reversible model without expicit frequency parameterization
   The file should be used as follows:

   01/04/2006  by Sergei L. Kosakovsky Pond
*/


global AC = 1;
global AT = 1;
global CA = 1;
global CG = 1;
global CT = 1;
global GA = 1;
global GC = 1;
global GT = 1;
global TA = 1;
global TC = 1;
global TG = 1;


function PopulateModelMatrix (ModelMatrixName&)
{
	/* All the global rate parameters are defined relative to
	the rate for A->G. For instance, CG represents the ratio
	of the rates C->G/A->G. These parameters are also going
	to be confounded with character frequency parameters */


	ModelMatrixName =
			{{*,t*AC,t,t*AT}
			 {t*CA,*,t*CG,t*CT}
			 {t*GA,t*GC,*,t*GT}
			 {t*TA,t*TC,t*TG,*}};

	return 0;
}

NRM = 0;
MULTIPLY_BY_FREQS = PopulateModelMatrix ("NRM");

dynamicFreqs = {4,1};

vectorOfFrequencies = {4,1};
vectorOfFrequencies [0] := computeEQF (AC,AT,CG,CT,GT,CA,GA,TA,GC,TC,TG);
vectorOfFrequencies [1] := dynamicFreqs[1];
vectorOfFrequencies [2] := dynamicFreqs[2];
vectorOfFrequencies [3] := dynamicFreqs[3];


Model GRMModel = (NRM, vectorOfFrequencies, MULTIPLY_BY_FREQS);
FREQUENCY_SENSITIVE = 0;

/*--------------------------------------------------------*/

ffunction computeEQF (dummy1,dummy2,dummy3,dummy4,dummy5,dummy6,dummy7,dummy8,dummy9,dummy10,dummy11)
{
	t = 1;
	tempMatrix = NRM;
	for (k=0; k<4; k=k+1)
	{
		tempMatrix[k][3] = 1;
	}
	tempMatrix = Inverse(tempMatrix);
	for (k=0; k<4; k=k+1)
	{
		dynamicFreqs[k] = tempMatrix[3][k];
	}
	return dynamicFreqs[0];
}
 

.

The trick (pointed out to me by Raazesh Sainudiin) is to notice that if you take the rate matrix for the Markov process (Q), set all the entries of the last column to '1' and then invert it, the equilibrium frequencies of the process will constitute the last row of the inverse.

Cheers,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Marissa
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 6
Re: Unrestricted/non reversible models
Reply #6 - Jun 2nd, 2009 at 9:59am
 
Hello,

In the code that is provided here, is the output the substitution rates? If not, how can I calculate them?

Thanks,

Marissa
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Unrestricted/non reversible models
Reply #7 - Jun 3rd, 2009 at 10:28pm
 
Dear Marissa,

If you are thinking in terms of standard (e.g. GTR or HKY85) substitution rates, you need to factor out the equilibrium frequencies, by diving each entry of the estimated matrix (i,j) by the j-th entry of the estimated equilibrium frequency vector.

Code:
t=1;
MLE_rate_matrix = NRM;
MLE_base_freqs = vectorOfFrequencies;

for (r = 0; r < 16; r=r+1)
{
	MLE_rate_matrix[r] = MLE_rate_matrix[r]/MLE_base_freqs[r%4];
}

 



Cheers,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Marissa
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 6
Re: Unrestricted/non reversible models
Reply #8 - Jun 11th, 2009 at 4:11pm
 
Hello Sergei,

Thank you for the help and the prompt response!

After thinking about this more, I think I was not asking about quite the right thing. To give you an idea about what it is that I am looking for, I will explain what I am doing and ask if you have any insight about what I should be calculating.

I'm trying to compare the nucleotide substitution process in two different data sets. Both of these consist of ancestor descendent alignments where we have an ancestral transposon aligned to the human genome. I would like to be able to compare the rate of one particular substitution (say C->T) in the two data sets.

The processes I'm interested in are not time-reversible, so the model you created in your 2006 code earlier in this thread seems to be the right one to use.

I believe that what I want to compare is the instantaneous rate matrix for the two data sets. However, it seems to me that with the model in your 2006 code, I cannot compare the values I get for C to T substitutions (in the rate matrix) between two different data sets, because everything is relative to A to G substitutions. Is that correct?

Is the instantaneous rate matrix what I want? And if so, is there a way to compare it between the two datasets?

Thanks,

Marissa
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Unrestricted/non reversible models
Reply #9 - Jun 12th, 2009 at 1:01pm
 
Dear Marissa,

For any evolutionary model (reversible or not), we cannot independently estimate rates and times (all you get is evolutionary distance, which is of course the product of the two). It is therefore very difficult to compare absolute rates if you don't have any calibration information (i.e. a way to measure evolutionary time).

The best you can do without this information is compare relative rates (e.g. does one sample accumulate C->T faster, PER UNIT TIME, than the other?). For that purpose, the actual parameterization of the model does not matter very much -- all you would do is fit the NonREV model to both alignments separately to get the alternative (C->T are different), and then fit a (more complicated to set up) model where the fitting is done jointly, but C->T (i.e. CT/dynamicFreq[4]) are constrained to be the same between two alignments.

If you need pointers, I can help you modify the code to fit the null model.

Cheers,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Marissa
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 6
Re: Unrestricted/non reversible models
Reply #10 - Jun 15th, 2009 at 11:57am
 
Dear Sergei,

For our two data sets, we know that the evolutionary time is the same for the two data sets. Given that information, is there some way to compare the rates?

Also, here is an attempt I made at doing what you suggest (which to me sounds like calculating a joint likelihood and performing a log ratio test), modified from your 2006 code. However, I think this code isn't doing quite what I want it to. I would certainly appreciate any pointers!

[code]
/* Load data 1 */
DataSet seq1 = ReadDataFile ("Data/human/NNNA2.fa");
DataSetFilter filter1 = CreateFilter (seq1, 1);

/* Load data 2 */
DataSet seq2 = ReadDataFile ("Data/human/NNNG2.fa");
DataSetFilter filter2 = CreateFilter (seq2, 1);

/* Define general non-reversible model */
/* This file defines the transition matrix for the General Non-Reversible model
  without explicit frequency parameterization.
  01/04/2006  by Sergei L. Kosakovsky Pond
*/
global AC = 1;
global AG = 1;
global AT = 1;
global CA = 1;
global CG = 1;
global CT = 1;
global GA = 1;
global GC = 1;
global GT = 1;
global TA = 1;
global TC = 1;
global TG = 1;

function PopulateModelMatrix (ModelMatrixName&)
{
     ModelMatrixName =
                 {{*,t*AC,t,t*AT}
                  {t*CA,*,t*CG,t*CT}
                  {t*GA,t*GC,*,t*GT}
                  {t*TA,t*TC,t*TG,*}};
     return 0;
}

/* NRM = Non-Reversible Model */
NRM = 0;
MULTIPLY_BY_FREQS = PopulateModelMatrix ("NRM");

/* Calculate equilibrium frequencies */
dynamicFreqs = {4,1};
vectorOfFrequencies = {4,1};
vectorOfFrequencies [0] := computeEQF (AC,AT,CG,CT,GT,CA,GA,TA,GC,TC,TG);
vectorOfFrequencies [1] := dynamicFreqs[1];
vectorOfFrequencies [2] := dynamicFreqs[2];
vectorOfFrequencies [3] := dynamicFreqs[3];

Model GRMModel = (NRM, vectorOfFrequencies, MULTIPLY_BY_FREQS);
FREQUENCY_SENSITIVE = 0;
ACCEPT_ROOTED_TREES = 1;

/* Create two trees, one for seq1 and one for seq2.
  We set the anc branch to 0 by setting t to 0. */
Tree tree1 = (anc,der);
Tree tree2 = (anc,der);
tree1.anc.t := 0;
tree2.anc.t := 0;

/*---------- Unconstrained Model ----------*/

LikelihoodFunction theLnLik = (filter1, tree1, filter2, tree2);

Optimize (paramValues, theLnLik);
unconstrLik = paramValues[1][0];

/*----------- Constrained Model -----------*/

LikelihoodFunction theLnLik = (filter1, tree1, filter2, tree2);

/* Constraint */
tree1.der.CT := tree2.der.CT;

Optimize (paramValues, theLnLik);
constrainedLnLik = paramValues[1][0];


/* Compute and print the log ratio */
LRT = 2 (unconstrLik - constrainedLnLik);
fprintf  (stdout, "LRT: ", LRT, "\n");


/*--------------- function to compute equilibrium frequencies ---------------*/

ffunction computeEQF (dummy1,dummy2,dummy3,dummy4,dummy5,dummy6,dummy7,dummy8,dummy9,dummy10,dummy11)
{
     t = 1;
     tempMatrix = NRM;
     for (k=0; k<4; k=k+1)
     {
           tempMatrix[k][3] = 1;
     }
     tempMatrix = Inverse(tempMatrix);
     for (k=0; k<4; k=k+1)
     {
           dynamicFreqs[k] = tempMatrix[3][k];
     }
     return dynamicFreqs[0];
}
[/code]

Thank you for all the help!

Marissa
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Unrestricted/non reversible models
Reply #11 - Jun 16th, 2009 at 8:58pm
 
Dear Marissa,

All the rate parameters are global, hence they will be shared by both data sets. The idea is to clone the model definition but change all variable names to AC_2, AG_2  (for example).

Finally, because in this parameterization, the rate is actually confounded with the frequencies (ie the parameter CT is a product of the C->T rate times the frequency of the target, T), the constraint you probably want is more along the lines of:

Code:
CT_2 := CT*dynamicFreqs_2[3]/dynamicFreqs[3]
 



Give that a go and let me know if you run into problems.

Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Marissa
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 6
Re: Unrestricted/non reversible models
Reply #12 - Jun 18th, 2009 at 3:55pm
 
Dear Sergei,

I fixed that up and now it's working fine. Thanks!

Now that I know that the instantaneous substitution rate (in my case CT) is larger in one data set than the other, what I would also like to know is the magnitude of the difference between the C->T rates between the data sets, given we know that the evolutionary time is the same for both data sets.

In your 2006 code, it looks like I should be comparing the values in NRM (and in my fixed code, NRM_2 for the second data set). However, I did some testing with modeled data, and the values in NRM seem to fluctuate related to the ACGT content of the sequence even when the underlying substitution rates are the same, but not in any obvious way that I could simply factor out.

My question in, what do the values in NRM represent and what should I be comparing to find out the relative substitution rates between my two data sets?

Thanks in advance,

Marissa
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Unrestricted/non reversible models
Reply #13 - Jun 22nd, 2009 at 1:19pm
 
Dear Marissa,

Can you paste in your current code?

Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Marissa
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 6
Re: Unrestricted/non reversible models
Reply #14 - Jun 22nd, 2009 at 1:26pm
 
Here is the code I've been using:

[code]
/* Load data 1 */
DataSet seq1 = ReadDataFile ("Data/modeled/AT1_s_1.fa");
DataSetFilter filter1 = CreateFilter (seq1, 1);

/* Load data 2 */
DataSet seq2 = ReadDataFile ("Data/human/AT2_s_1.fa");
DataSetFilter filter2 = CreateFilter (seq2, 1);

/* Define general non-reversible model */
/* This file defines the transition matrix for the General Non-Reversible model
  without explicit frequency parameterization.
  01/04/2006  by Sergei L. Kosakovsky Pond
*/

global AC = 1;
global AG = 1;
global AT = 1;
global CA = 1;
global CG = 1;
global CT = 1;
global GA = 1;
global GC = 1;
global GT = 1;
global TA = 1;
global TC = 1;
global TG = 1;

global AC_2 = 1;
global AG_2 = 1;
global AT_2 = 1;
global CA_2 = 1;
global CG_2 = 1;
global CT_2 = 1;
global GA_2 = 1;
global GC_2 = 1;
global GT_2 = 1;
global TA_2 = 1;
global TC_2 = 1;
global TG_2 = 1;


function PopulateModelMatrix (ModelMatrixName&)
{
     ModelMatrixName =
                 {{*,t*AC,t,t*AT}
                  {t*CA,*,t*CG,t*CT}
                  {t*GA,t*GC,*,t*GT}
                  {t*TA,t*TC,t*TG,*}};
     return 0;
}

function PopulateModelMatrix_2 (ModelMatrixName&)
{
     ModelMatrixName =
                 {{*, t_2*AC_2, t_2, t_2*AT_2}
                  {t_2*CA_2, *, t_2*CG_2, t_2*CT_2}
                  {t_2*GA_2, t_2*GC_2, *, t_2*GT_2}
                  {t_2*TA_2, t_2*TC_2, t_2*TG_2, *}};
     return 0;
}


/* Create two trees, one for seq1 and one for seq2.
  We set the anc branch to 0 by setting t to 0. */

/* NRM = Non-Reversible Model */
NRM = 0;
MULTIPLY_BY_FREQS = PopulateModelMatrix ("NRM");

/* Calculate equilibrium frequencies */
dynamicFreqs = {4,1};
vectorOfFrequencies = {4,1};
vectorOfFrequencies [0] := computeEQF (AC,AT,CG,CT,GT,CA,GA,TA,GC,TC,TG);
vectorOfFrequencies [1] := dynamicFreqs[1];
vectorOfFrequencies [2] := dynamicFreqs[2];
vectorOfFrequencies [3] := dynamicFreqs[3];

/* Create the model for the */
Model GRMModel = (NRM, vectorOfFrequencies, MULTIPLY_BY_FREQS);
FREQUENCY_SENSITIVE = 0;
ACCEPT_ROOTED_TREES = 1;

Tree tree1 = (anc,der);
tree1.anc.t := 0;

NRM_2 = 0;
MULTIPLY_BY_FREQS_2 = PopulateModelMatrix_2 ("NRM_2");

dynamicFreqs_2 = {4,1};
vectorOfFrequencies_2 = {4,1};
vectorOfFrequencies_2 [0] := computeEQF_2 (AC_2,AT_2,CG_2,CT_2,GT_2,CA_2,GA_2,TA_2,GC_2,TC_2,TG_2);
vectorOfFrequencies_2 [1] := dynamicFreqs_2[1];
vectorOfFrequencies_2 [2] := dynamicFreqs_2[2];
vectorOfFrequencies_2 [3] := dynamicFreqs_2[3];

Model GRMModel_2 = (NRM_2, vectorOfFrequencies_2, MULTIPLY_BY_FREQS_2);
FREQUENCY_SENSITIVE = 0;
ACCEPT_ROOTED_TREES = 1;

Tree tree2 = (anc,der);
tree2.anc.t_2 := 0;

/*---------- Unconstrained Model ----------*/

LikelihoodFunction theLnLik = (filter1, tree1, filter2, tree2);

Optimize (paramValues, theLnLik);
unconstrLik = paramValues[1][0];
unConstrainedParamCount = paramValues[1][1];
fprintf  (stdout, "unconstrained param count: ", unConstrainedParamCount,"\n");

fprintf (stdout, "NRM:\n", NRM, "\n");

/*----------- Constrained Model -----------*/

LikelihoodFunction theLnLik = (filter1, tree1, filter2, tree2);

/* Constraint */
t_2 := t*dynamicFreqs_2[2]/dynamicFreqs[2];

Optimize (paramValues, theLnLik);
constrainedLnLik = paramValues[1][0];
constrainedParamCount = paramValues[1][1];
fprintf (stdout, "constrained param count: ", constrainedParamCount, "\n");

fprintf (stdout, "NRM_2:\n", NRM_2, "\n");

/* Compute and print the log ratio */
LRT = 2 (unconstrLik - constrainedLnLik);
fprintf  (stdout, "LRT: ", LRT, "\n");


/*--------------- function to compute equilibrium frequencies ---------------*/

ffunction computeEQF (dummy1,dummy2,dummy3,dummy4,dummy5,dummy6,dummy7,dummy8,dummy9,dummy10,dummy11)
{
     t = 1;
     tempMatrix = NRM;
     for (k=0; k<4; k=k+1)
     {
           tempMatrix[k][3] = 1;
     }
     tempMatrix = Inverse(tempMatrix);
     for (k=0; k<4; k=k+1)
     {
           dynamicFreqs[k] = tempMatrix[3][k];
     }
     return dynamicFreqs[0];
}

ffunction computeEQF_2 (dummy1,dummy2,dummy3,dummy4,dummy5,dummy6,dummy7,dummy8,dummy9,dummy10,dummy11)
{
     t_2 = 1;
     tempMatrix = NRM_2;
     for (k=0; k<4; k=k+1)
     {
           tempMatrix[k][3] = 1;
     }
     tempMatrix = Inverse(tempMatrix);
     for (k=0; k<4; k=k+1)
     {
           dynamicFreqs_2[k] = tempMatrix[3][k];
     }
     return dynamicFreqs_2[0];
}
[/code]

The two data sets being loaded are modeled data with equal ACGT content (seq1) and double AT content (seq2), where the instantaneous substitution rate to any other base is 1%.

Marissa
Back to top
 
 
IP Logged
 
Pages: 1 2