sn
YaBB Newbies
Offline
Feed your monkey!
Posts: 5
|
Hi Sergei,
Based on various posts in the archives I have made a .bf to do the following: On a tree of 4 species, estimate a distinct general non-reversible model (including ML nucleotide freqs) for each of the two main clades. Then use the resulting trees (incl subst models and freqs) on a different dataset, only allowing one global scaling parameter on the branch lengths.
Would you mind taking a look at the .bf below, to see if it does this correctly?
Is it necessary to specify the nuc. freqs. at the root, or are they included in the dynamic freq estimates? Also, is it important which of the subst rates are used to scale the others?
I have included the line baseSetPrefix = "$BASESET:\"ACGT\"\n$TOKEN:\"N\"=\"\"\n$TOKEN:\"-\"=\"\"\n"; to avoid including gaps in the base frequency estimates, but what exactly does this do? Does it affect only base frequency estimates, or also the way gaps are evaluated in the likelihood?
Finally (sorry to be barraging you with questions), does Hyphy have an easy way to filter based on fraction of gap-columns in an alignment?
Thanks for your time, sn
[code] /* Create two general non-reversible models (one for each main clade) Estimate models (rate-params, freqs, branch lengths). Then apply the estimated tree to a new dataset, allowing only a global scaling parameter to change. (In the end it should be possible to use scaling params estimated on several datasets (from the same reference tree) to compare tree lengths, e.g. Sa / Sb = 0.8). */
ACCEPT_ROOTED_TREES = 1; MAXIMUM_ITERATIONS_PER_VARIABLE = 10000;
/*Does this affect the likelihood procedure, or only the base freqs?*/ baseSetPrefix = "$BASESET:\"ACGT\"\n$TOKEN:\"N\"=\"\"\n$TOKEN:\"-\"=\"\"\n";
/* ------- Estimate the first tree ------------ */ fscanf (data1.mfa,"Raw",rawData); DataSet aln = ReadFromString (baseSetPrefix+rawData); DataSetFilter mydat = CreateFilter (aln,1);
/*Get the nonrev matrix and specify dynamic frequencies. The files below are copies of 'NRM+Freqs.mdl', with '1' or '2' appended to variable and function names, to be able to have two distinct models. */ #include "NRM_freqs1.mdl"; /* creates GRMModel1 and vectorOfFrequencies1*/ #include "NRM_freqs2.mdl"; /* creates GRMModel2 and vectorOfFrequencies2*/
Tree myTree = "((A{GRMModel1},B{GRMModel1}){GRMModel1},(C{GRMModel2},D{GRMModel2}){GRMModel2})";
/*Uniform distro at root*/ rootFreqs = {{0.25} {0.25}{0.25}{0.25}}; LikelihoodFunction3 myLf = (mydat, myTree,rootFreqs);
/* Or are root freqs taken into account by the dynamic freqs? */ /*LikelihoodFunction myLf = (mydat, myTree);*/
Optimize (myLfParams, myLf); fprintf (stdout, myLf); fprintf (stdout, "\n----------\n");
/*------------- Scale tree on the second data-set ----------------*/ fscanf (data2.mfa,"Raw",rawData); DataSet aln2 = ReadFromString (baseSetPrefix+rawData); DataSetFilter mydat2 = CreateFilter (aln2,1);
Tree myTree2 = "((A{GRMModel1},B{GRMModel1}){GRMModel1},(C{GRMModel2},D{GRMModel2}){GRMModel2})"; global S;
/* Make trees identical, except scaling on t */ ReplicateConstraint ("this1.?.?:=this2.?.?__", myTree2,myTree); ClearConstraints(myTree2.A.t,myTree2.B.t,myTree2.Node1.t,myTree2.C.t,myTree2.D.t,myTree2.Node4.t); ReplicateConstraint ("this1.?.t:=S*this2.?.t__", myTree2,myTree);
LikelihoodFunction3 myLf2 = (mydat2, myTree2,rootFreqs);
Optimize (myLfParams2, myLf2); fprintf (stdout, myLf2); fprintf (stdout, "Scaling: ",S,"\n");
[/code]
|