Hi again, Sergi,
getting there, slowly
I have a couple more questions - put them all together than a bunch of separate posts, sorry for the long mail.
There are a couple of things I'm trying to do.
Firstly, I'd like to evaluate the likelihood of some binary data for a given topology and branch length dataset, and a given forward and backward rate parameter. The idea here is to sample a range of different forward and reverse rate parameters to generate a plot of the likelihood surface associated with the parameters.
(I've tried running without the $BASESET:01 statement, but it seems to have trouble reading the data in that case - also if I use fasta format).
This seems to work reasonably, although I don't understand why, after setting new values of forwardRate and backwardRate, the branch length parameters for the tree change by some scaling factor. I think this is something to do with "t" in the binary matrix (not clear to me what it's for...). Looks like I'm missing something fundamental here, would be great if you could point me to a part of the HyPhy docs that I could read through to get to grips with this better.
Code:dss = "$BASESET:01\n#a\n10111\n#b\n10101\n#c\n01100\n#d\n11100";
DataSet myData = ReadFromString(dss);
DataSetFilter myFilter = CreateFilter (myData,1);
ACCEPT_BRANCH_LENGTHS = 1;
ACCEPT_ROOTED_TREES = 1;
global forwardRate = 1;
global backwardRate = 1;
binaryQ = {{*, forwardRate*t}
{backwardRate*t,*}};
eqFrequencies = {{backwardRate/(forwardRate+ backwardRate), forwardRate/(forwardRate+ backwardRate)}};
Model nonRevBinary = (binaryQ, eqFrequencies,0);
Tree myTree = "(((a:1,b:3):1,c:1):1,d:1);";
LikelihoodFunction lf = (myFilter,myTree);
LIKELIHOOD_FUNCTION_OUTPUT=5;
fprintf (stdout, "\n",lf);
LIKELIHOOD_FUNCTION_OUTPUT=2;
fprintf (stdout,"\n", lf);
forwardRate = 0.9;
backwardRate = 0.4;
LIKELIHOOD_FUNCTION_OUTPUT=5;
fprintf (stdout,"\n", lf);
LIKELIHOOD_FUNCTION_OUTPUT=2;
fprintf (stdout,"\n", lf);
Secondly, I'd like to run this analysis, but optimising forward and backward rates - again, this seems to work reasonably, again for fixed topology and branch lengths. (I want to do this to check that the analysis where I explore the values myself finds similar/the same likelihood and parameter values as when the optimisation in HyPhy is run)
Again, this seems to work reasonably well, although I have again this issue with the scaling that I don't get -
Code:fprintf (stdout,"\n\nOptimise rates but not branch lengths");
dss = "$BASESET:01\n#a\n10111\n#b\n10101\n#c\n01100\n#d\n11100";
DataSet myData = ReadFromString(dss);
DataSetFilter myFilter = CreateFilter (myData,1);
ACCEPT_BRANCH_LENGTHS = 1;
ACCEPT_ROOTED_TREES = 1;
global forwardRate = 1;
global backwardRate = 1;
binaryQ = {{*, forwardRate*t}
{backwardRate*t,*}};
eqFrequencies = {{backwardRate/(forwardRate+ backwardRate), forwardRate/(forwardRate+ backwardRate)}};
UseModel (USE_NO_MODEL);
Tree dummy = "(((a:1,b:3):1,c:1):1,d:1);"; /* NB it has to be a Tree, not a Topology object */
branchNames = BranchName (dummy,-1); /* populate a vector of branch names; post order traversal */
branchLengthsNow = BranchLength (dummy,-1); /* populate a vector of branch lengths; post order traversal */
Model nonRevBinary = (binaryQ, eqFrequencies,0);
Tree myTree = "(((a:1,b:3):1,c:1):1,d:1);";
for (k = 0; k < Columns (branchNames)-1; k=k+1)
{
ExecuteCommands ("myTree."+branchNames[k]+".t:="+branchLengthsNow[k]+";");
}
LikelihoodFunction lf = (myFilter,myTree);
Optimize (res,lf);
LIKELIHOOD_FUNCTION_OUTPUT=5;
fprintf (stdout, "\n",lf);
LIKELIHOOD_FUNCTION_OUTPUT=2;
fprintf (stdout,"\n", lf);
Thirdly, I'd like to run the analysis using a second binary model, attached to some of the internal branches (the idea here is to attache this model to internal branches that are spawned after a duplication event - will allowing these two extra parameters [or perhaps just one extra parameter that scales the non-post-duplication-matrix by a constant] provide a significantly better fit to the data using a likelihood ratio test. I think I need to set the equilibrium frequencies for the two models equal - I've tried to do that in this script, but not sure I've done it properly, just wanted to ask how it looked to you
Code:fprintf (stdout,"\n\nOptimise rates but not branch lengths, use different rate matrices for different branches");
dss = "$BASESET:01\n#a\n10111\n#b\n10101\n#c\n01100\n#d\n11100\n#e\n11100\n#f\n11001";
DataSet myData = ReadFromString(dss);
DataSetFilter myFilter = CreateFilter (myData,1);
ACCEPT_BRANCH_LENGTHS = 1;
ACCEPT_ROOTED_TREES = 1;
global forwardRateA = 1;
global backwardRateA = 1;
binaryQ_A = {{*, forwardRateA*t}
{backwardRateA*t,*}};
/* I need to constrain the two equilibrium frequencies to be the same, so I get them from just one of the two matrices...*/
eqFrequencies = {{backwardRateA/(forwardRateA+ backwardRateA), forwardRateA/(forwardRateA+ backwardRateA)}};
global forwardRateB = 1;
global backwardRateB = 1;
binaryQ_B = {{*, forwardRateB*t}
{backwardRateB*t,*}};
UseModel (USE_NO_MODEL);
Tree dummy = "((((a:1{nonRevBinaryA},b:1{nonRevBinaryA}):0.2{nonRevBinaryB},(c:1{nonRevBinaryA},d:1{nonRevBinaryA}):0.2{nonRevBinaryB}):1{nonRevBinaryA},e:1.5{nonRevBinaryA}):1{nonRevBinaryA},f:2{nonRevBinaryA});"; /* NB it has to be a Tree, not a Topology object */
branchNames = BranchName (dummy,-1); /* populate a vector of branch names; post order traversal */
branchLengthsNow = BranchLength (dummy,-1); /* populate a vector of branch lengths; post order traversal */
Model nonRevBinaryA = (binaryQ_A, eqFrequencies,0);
Model nonRevBinaryB = (binaryQ_B, eqFrequencies,0);
Tree myTree = ""+dummy;
for (k = 0; k < Columns (branchNames)-1; k=k+1)
{
ExecuteCommands ("myTree."+branchNames[k]+".t:="+branchLengthsNow[k]+";");
}
LikelihoodFunction lf = (myFilter,myTree);
Optimize (res,lf);
LIKELIHOOD_FUNCTION_OUTPUT=5;
fprintf (stdout, "\n",lf);
LIKELIHOOD_FUNCTION_OUTPUT=2;
fprintf (stdout,"\n", lf);
Any help or feedback much appreciated - and thanks again for all your help so far.
Aidan