HyPhy message board - Non-reversible, branch length scaling

	Welcome, Guest. Please Login

Home

Help

HyPhy message board › Methodology Questions › How to › Non-reversible, branch length scaling

(Moderators: Sergei, Simon)

‹ Previous Topic | Next Topic ›

Pages: 1

Send Topic

Non-reversible, branch length scaling (Read 1043 times)

YaBB Newbies

Offline

Feed your monkey!

Posts: 5

Non-reversible, branch length scaling
May 29^th, 2009 at 9:36am

Hi Sergei,

Based on various posts in the archives I have made a .bf to do the following:
On a tree of 4 species, estimate a distinct general non-reversible model (including ML nucleotide freqs) for each of the two main clades.
Then use the resulting trees (incl subst models and freqs) on a different dataset, only allowing one global scaling parameter on the branch lengths.

Would you mind taking a look at the .bf below, to see if it does this correctly?

Is it necessary to specify the nuc. freqs. at the root, or are they included in the dynamic freq estimates?
Also, is it important which of the subst rates are used to scale the others?

I have included the line
baseSetPrefix = "$BASESET:\"ACGT\"\n$TOKEN:\"N\"=\"\"\n$TOKEN:\"-\"=\"\"\n";
to avoid including gaps in the base frequency estimates, but what exactly does this do? Does it affect only base frequency estimates, or also the way gaps are evaluated in the likelihood?

Finally (sorry to be barraging you with questions), does Hyphy have an easy way to filter based on fraction of gap-columns in an alignment?

Thanks for your time,
sn

[code]
/* Create two general non-reversible models (one for each main clade)
Estimate models (rate-params, freqs, branch lengths).
Then apply the estimated tree to a new dataset, allowing only a global
scaling parameter to change.
(In the end it should be possible to use scaling params estimated on
several datasets (from the same reference tree) to compare tree lengths,
e.g. Sa / Sb = 0.8).
*/

ACCEPT_ROOTED_TREES = 1;
MAXIMUM_ITERATIONS_PER_VARIABLE = 10000;

/*Does this affect the likelihood procedure, or only the base freqs?*/
baseSetPrefix = "$BASESET:\"ACGT\"\n$TOKEN:\"N\"=\"\"\n$TOKEN:\"-\"=\"\"\n";

/* ------- Estimate the first tree ------------ */
fscanf (data1.mfa,"Raw",rawData);
DataSet aln = ReadFromString (baseSetPrefix+rawData);
DataSetFilter mydat = CreateFilter (aln,1);

/*Get the nonrev matrix and specify dynamic frequencies.
The files below are copies of 'NRM+Freqs.mdl', with '1' or '2' appended to
variable and function names, to be able to have two distinct models.
*/
#include "NRM_freqs1.mdl"; /* creates GRMModel1 and vectorOfFrequencies1*/
#include "NRM_freqs2.mdl"; /* creates GRMModel2 and vectorOfFrequencies2*/

Tree myTree = "((A{GRMModel1},B{GRMModel1}){GRMModel1},(C{GRMModel2},D{GRMModel2}){GRMModel2})";

/*Uniform distro at root*/
rootFreqs = {{0.25} {0.25}{0.25}{0.25}};
LikelihoodFunction3 myLf = (mydat, myTree,rootFreqs);

/* Or are root freqs taken into account by the dynamic freqs? */
/*LikelihoodFunction myLf = (mydat, myTree);*/

Optimize (myLfParams, myLf);
fprintf (stdout, myLf);
fprintf (stdout, "\n----------\n");

/*------------- Scale tree on the second data-set ----------------*/
fscanf (data2.mfa,"Raw",rawData);
DataSet aln2 = ReadFromString (baseSetPrefix+rawData);
DataSetFilter mydat2 = CreateFilter (aln2,1);

Tree myTree2 = "((A{GRMModel1},B{GRMModel1}){GRMModel1},(C{GRMModel2},D{GRMModel2}){GRMModel2})";
global S;

/* Make trees identical, except scaling on t */
ReplicateConstraint ("this1.?.?:=this2.?.?__", myTree2,myTree);
ClearConstraints(myTree2.A.t,myTree2.B.t,myTree2.Node1.t,myTree2.C.t,myTree2.D.t,myTree2.Node4.t);
ReplicateConstraint ("this1.?.t:=S*this2.?.t__", myTree2,myTree);

LikelihoodFunction3 myLf2 = (mydat2, myTree2,rootFreqs);

Optimize (myLfParams2, myLf2);
fprintf (stdout, myLf2);
fprintf (stdout, "Scaling: ",S,"\n");

[/code]

IP Logged

Pages: 1

Send Topic

‹ Previous Topic | Next Topic ›

« Home

‹ Board

Top of this page