Welcome, Guest. Please Login
YaBB - Yet another Bulletin Board
 
  HomeHelpSearchLogin  
 
Non-reversible, branch length scaling (Read 1044 times)
sn
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 5
Non-reversible, branch length scaling
May 29th, 2009 at 9:36am
 
Hi Sergei,

Based on various posts in the archives I have made a .bf to do the following:
On a tree of 4 species, estimate a distinct general non-reversible model (including ML nucleotide freqs) for each of the two main clades.
Then use the resulting trees (incl subst models and freqs) on a different dataset, only allowing one global scaling parameter on the branch lengths.

Would you mind taking a look at the .bf below, to see if it does this correctly?

Is it necessary to specify the nuc. freqs. at the root, or are they included in the dynamic freq estimates?
Also, is it important which of the subst rates are used to scale the others?

I have included the line
baseSetPrefix = "$BASESET:\"ACGT\"\n$TOKEN:\"N\"=\"\"\n$TOKEN:\"-\"=\"\"\n";
to avoid including gaps in the base frequency estimates, but what exactly does this do? Does it affect only base frequency estimates, or also the way gaps are evaluated in the likelihood?

Finally (sorry to be barraging you with questions), does Hyphy have an easy way to filter based on fraction of gap-columns in an alignment?

Thanks for your time,
sn

[code]
/* Create two general non-reversible models (one for each main clade)
   Estimate models (rate-params, freqs, branch lengths).
   Then apply the estimated tree to a new dataset, allowing only a global
   scaling parameter to change.
   (In the end it should be possible to use scaling params estimated on
   several datasets (from the same reference tree) to compare tree lengths,
   e.g. Sa / Sb = 0.8).
*/

ACCEPT_ROOTED_TREES = 1;
MAXIMUM_ITERATIONS_PER_VARIABLE = 10000;

/*Does this affect the likelihood procedure, or only the base freqs?*/
baseSetPrefix = "$BASESET:\"ACGT\"\n$TOKEN:\"N\"=\"\"\n$TOKEN:\"-\"=\"\"\n";


/* ------- Estimate the first tree ------------ */
fscanf (data1.mfa,"Raw",rawData);
DataSet aln = ReadFromString (baseSetPrefix+rawData);
DataSetFilter  mydat = CreateFilter (aln,1);

/*Get the nonrev matrix and specify dynamic frequencies.
The files below are copies of 'NRM+Freqs.mdl', with '1' or '2' appended to
variable and function names, to be able to have two distinct models.
*/
#include "NRM_freqs1.mdl"; /* creates GRMModel1 and vectorOfFrequencies1*/
#include "NRM_freqs2.mdl"; /* creates GRMModel2 and vectorOfFrequencies2*/

Tree myTree = "((A{GRMModel1},B{GRMModel1}){GRMModel1},(C{GRMModel2},D{GRMModel2}){GRMModel2})";

/*Uniform distro at root*/
rootFreqs = {{0.25} {0.25}{0.25}{0.25}};
LikelihoodFunction3  myLf = (mydat, myTree,rootFreqs);

/* Or are root freqs taken into account by the dynamic freqs? */
/*LikelihoodFunction  myLf = (mydat, myTree);*/

Optimize (myLfParams, myLf);
fprintf  (stdout, myLf);
fprintf  (stdout, "\n----------\n");

/*------------- Scale tree on the second data-set ----------------*/
fscanf (data2.mfa,"Raw",rawData);
DataSet aln2 = ReadFromString (baseSetPrefix+rawData);
DataSetFilter  mydat2 = CreateFilter (aln2,1);

Tree myTree2 = "((A{GRMModel1},B{GRMModel1}){GRMModel1},(C{GRMModel2},D{GRMModel2}){GRMModel2})";
global S;

/* Make trees identical, except scaling on t */
ReplicateConstraint ("this1.?.?:=this2.?.?__", myTree2,myTree);
ClearConstraints(myTree2.A.t,myTree2.B.t,myTree2.Node1.t,myTree2.C.t,myTree2.D.t,myTree2.Node4.t);
ReplicateConstraint ("this1.?.t:=S*this2.?.t__", myTree2,myTree);

LikelihoodFunction3  myLf2 = (mydat2, myTree2,rootFreqs);

Optimize (myLfParams2, myLf2);
fprintf  (stdout, myLf2);
fprintf  (stdout, "Scaling: ",S,"\n");

[/code]
Back to top
 
 
IP Logged