HyPhy message board
HYPHY Package >> Contributions >> One tree, branches estimated across partitions

Message started by Jose_Patane on Aug 29th, 2006 at 1:21pm

Title: One tree, branches estimated across partitions
Post by Jose_Patane on Aug 29th, 2006 at 1:21pm
Hi all,

This is something PAUP* does as well... but further extensions (not in the code) could be aggregated  (e.g. each partition having its own gamma and Pinvar, which PAUP* does not allow - unless the partition is the whole dataset, of course).

/* One tree in the end: each branch based on a normalized sum of
branch lengths across different partitions. Proportional model assumed
(as in PAUP*), with partition weights estimated by ML.

Partition by codon position.

Basefreqs - estimated by ML for the whole data.
Substitution rates - estimated by ML for the whole data. */

myTopology = "some_tree";

DataSet Whole_gene = ReadDataFile("some_data_set");

/* All positions included. */
DataSetFilter All_sites = CreateFilter (Whole_gene,1);

/* Separate by codon position. */
DataSetFilter filteredData1 = CreateFilter (Whole_gene,1,"<100>");
DataSetFilter filteredData2 = CreateFilter (Whole_gene,1,"<010>");
DataSetFilter filteredData3 = CreateFilter (Whole_gene,1,"<001>");

/* A way (posted by Sergei) to
estimate frequencies from the whole dataset. */
global P1; P1 :< 1;
global P2; P2 :< 1;
global P3; P3 :< 1;

global f_A := P1;
global f_C := (1-P1) * P2;
global f_G := (1-P1) * (1-P2) * P3;
global f_T := (1-P1) * (1-P2) * (1-P3);

estFreq = [f_A}{f_C}{f_G}{f_T];

/* Absolute rates. */
global AC;
global AG;
global AT;
global CG;
global CT;
global GT;

/* Proportional model of branch length estimation across data partitions. */
global R;
global S;

/* Each partition is weighted accordingly. */
global codon_1st := All_sites.sites / (filteredData1.sites + R * filteredData2.sites + S * filteredData3.sites );
global codon_2nd := R * codon_1st;
global codon_3rd := S * codon_1st;

REVRateMatrix_1 = [  *     ,AC*m,AG*m,AT*m}
                               {AC*m,    *   ,CG*m,CT*m}
                               {AG*m,CG*m,   *   ,GT*m}
                               {AT*m,CT*m,GT*m,    *   ];

Model REV_1 = (REVRateMatrix_1, estFreq);

/*Partition analysis*/
Tree myTree_1 = myTopology;
Tree myTree_2 = myTopology;
Tree myTree_3 = myTopology;
Tree myTree_4 = myTopology;

/* Tree2.taxon.branch = Tree1.taxon.branch x K (based on another hint by Sergei) */
ReplicateConstraint ("this1.?.m := this2.?.m*R", myTree_2, myTree_1);
ReplicateConstraint ("this1.?.m := this2.?.m*S", myTree_3, myTree_1);

LikelihoodFunction theLikFun = (filteredData1, myTree_1, filteredData2, myTree_2, filteredData3, myTree_3);


/* Not really necessary, just normalizing so that mean rate=1 */
sum_of_rates = AC + AG + AT + CG + CT + GT;

r1 = AC/sum_of_rates;
r2 = AG/sum_of_rates;
r3 = AT/sum_of_rates;  
r4 = CG/sum_of_rates;
r5 = CT/sum_of_rates;  
r6 = GT/sum_of_rates;

fprintf(stdout,"\nAC = ", r1);
fprintf(stdout,"\nAG = ", r2);
fprintf(stdout,"\nAT = ", r3);
fprintf(stdout,"\nCG = ", r4);
fprintf(stdout,"\nCT = ", r5);
fprintf(stdout,"\nGT = ", r6);


/* Sum of estimated branch lengths for each partition,
  normalized by the sum of each partition's weight.  */
ReplicateConstraint ("this1.?.m := (this2.?.m + this3.? + this4.?)/(codon_1st + codon_2nd + codon_3rd)",
                     myTree_4, myTree_1, myTree_2, myTree_3);

Code tested with 2 datasets, both giving the same branch lengths as estimated running PAUP*.

Title: Re: One tree, branches estimated across partitions
Post by Sergei on Aug 29th, 2006 at 4:23pm
Dear Jose,

Thanks for yet another excellent contribution.


Title: Re: One tree, branches estimated across partitions
Post by Jose_Patane on Aug 30th, 2006 at 11:46am
Hmmm... hey, Sergei, after reading your reply on "ML basefreqs for each partition", I started having this one doubt: in the model I specified above, it now seems to me that I'm estimating global ML estimates of both basefreqs and rates based on only 1/3 of the data (all the 1st codon positions)... is that right? I'm asking that because formerly I thought that such parameters were global in the sense that they were based on the whole dataset, which doesn't seem to be the case (to be more specific, I assigned 'filteredData1' to be estimated on myTree_1, instead of 'All_sites'). Is that right? If so, my conception about the proportional model was a bit mistaken (although, in the code above, it's properly coded, for it matches PAUP estimates).

Now, being even more specific: in both ways (ML whole, or ML of just a part), the degrees of freedom would be the same, for freqs and rates are estimated just once. But the precision and/or branch lengths, in the end, could be different... right ???

Title: Re: One tree, branches estimated across partitions
Post by Sergei on Aug 30th, 2006 at 12:18pm
Dear Jose,

In your example, all global parameters (rates and base frequencies) will be estimated jointly from the three partitions (as expected). The distinction between 'global' and 'local' relates only to the tree. For example, if your model depends on a global parameter 'R' and a local parameter 't', then when you assign it to a tree Tree T = (1,2,3) - as an example - HyPhy will keep one copy of 'R' for all branches in the tree, and create a branch specific (i.e. local) parameter for every branch in the tree: T.1.t, T.2.t and T.3.t.

In your example, all global parameters affect each partition - hence the likelihood function estimation is joint. Note, how in my example of partition specific frequencies, I renamed global variables to be different among partitions (or to be more precise, among models attached to partitions), to force HyPhy to estimate these parameters separately.


Title: Re: One tree, branches estimated across partitions
Post by Jose_Patane on Aug 30th, 2006 at 1:01pm
Once more, thanks, now I think I got the idea!

HyPhy message board » Powered by YaBB 2.5.2!
YaBB Forum Software © 2000-2020. All Rights Reserved.