HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Methodology Questions >> How to >> Non-reversible, branch length scaling
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1243615017

Message started by sn on May 29th, 2009 at 9:36am

Title: Non-reversible, branch length scaling
Post by sn on May 29th, 2009 at 9:36am
Hi Sergei,

Based on various posts in the archives I have made a .bf to do the following:
On a tree of 4 species, estimate a distinct general non-reversible model (including ML nucleotide freqs) for each of the two main clades.
Then use the resulting trees (incl subst models and freqs) on a different dataset, only allowing one global scaling parameter on the branch lengths.

Would you mind taking a look at the .bf below, to see if it does this correctly?

Is it necessary to specify the nuc. freqs. at the root, or are they included in the dynamic freq estimates?
Also, is it important which of the subst rates are used to scale the others?

I have included the line
baseSetPrefix = "$BASESET:\"ACGT\"\n$TOKEN:\"N\"=\"\"\n$TOKEN:\"-\"=\"\"\n";
to avoid including gaps in the base frequency estimates, but what exactly does this do? Does it affect only base frequency estimates, or also the way gaps are evaluated in the likelihood?

Finally (sorry to be barraging you with questions), does Hyphy have an easy way to filter based on fraction of gap-columns in an alignment?

Thanks for your time,
sn

[code]
/* Create two general non-reversible models (one for each main clade)
  Estimate models (rate-params, freqs, branch lengths).
  Then apply the estimated tree to a new dataset, allowing only a global
  scaling parameter to change.
  (In the end it should be possible to use scaling params estimated on
  several datasets (from the same reference tree) to compare tree lengths,
  e.g. Sa / Sb = 0.8).
*/

ACCEPT_ROOTED_TREES = 1;
MAXIMUM_ITERATIONS_PER_VARIABLE = 10000;

/*Does this affect the likelihood procedure, or only the base freqs?*/
baseSetPrefix = "$BASESET:\"ACGT\"\n$TOKEN:\"N\"=\"\"\n$TOKEN:\"-\"=\"\"\n";


/* ------- Estimate the first tree ------------ */
fscanf (data1.mfa,"Raw",rawData);
DataSet aln = ReadFromString (baseSetPrefix+rawData);
DataSetFilter  mydat = CreateFilter (aln,1);

/*Get the nonrev matrix and specify dynamic frequencies.
The files below are copies of 'NRM+Freqs.mdl', with '1' or '2' appended to
variable and function names, to be able to have two distinct models.
*/
#include "NRM_freqs1.mdl"; /* creates GRMModel1 and vectorOfFrequencies1*/
#include "NRM_freqs2.mdl"; /* creates GRMModel2 and vectorOfFrequencies2*/

Tree myTree = "((A{GRMModel1},B{GRMModel1}){GRMModel1},(C{GRMModel2},D{GRMModel2}){GRMModel2})";

/*Uniform distro at root*/
rootFreqs = {{0.25} {0.25}{0.25}{0.25}};
LikelihoodFunction3  myLf = (mydat, myTree,rootFreqs);

/* Or are root freqs taken into account by the dynamic freqs? */
/*LikelihoodFunction  myLf = (mydat, myTree);*/

Optimize (myLfParams, myLf);
fprintf  (stdout, myLf);
fprintf  (stdout, "\n----------\n");

/*------------- Scale tree on the second data-set ----------------*/
fscanf (data2.mfa,"Raw",rawData);
DataSet aln2 = ReadFromString (baseSetPrefix+rawData);
DataSetFilter  mydat2 = CreateFilter (aln2,1);

Tree myTree2 = "((A{GRMModel1},B{GRMModel1}){GRMModel1},(C{GRMModel2},D{GRMModel2}){GRMModel2})";
global S;

/* Make trees identical, except scaling on t */
ReplicateConstraint ("this1.?.?:=this2.?.?__", myTree2,myTree);
ClearConstraints(myTree2.A.t,myTree2.B.t,myTree2.Node1.t,myTree2.C.t,myTree2.D.t,myTree2.Node4.t);
ReplicateConstraint ("this1.?.t:=S*this2.?.t__", myTree2,myTree);

LikelihoodFunction3  myLf2 = (mydat2, myTree2,rootFreqs);

Optimize (myLfParams2, myLf2);
fprintf  (stdout, myLf2);
fprintf  (stdout, "Scaling: ",S,"\n");

[/code]

HyPhy message board » Powered by YaBB 2.5.2!
YaBB Forum Software © 2000-2024. All Rights Reserved.