Welcome, Guest. Please Login
YaBB - Yet another Bulletin Board
 
  HomeHelpSearchLogin  
 
One tree, branches estimated across partitions (Read 27966 times)
Jose_Patane
YaBB Newbies
*
Offline



Posts: 26
Brazil
Gender: male
One tree, branches estimated across partitions
Aug 29th, 2006 at 1:21pm
 
Hi all,

This is something PAUP* does as well... but further extensions (not in the code) could be aggregated  (e.g. each partition having its own gamma and Pinvar, which PAUP* does not allow - unless the partition is the whole dataset, of course).

--------------------------------------------------------------------
/* One tree in the end: each branch based on a normalized sum of
branch lengths across different partitions. Proportional model assumed
(as in PAUP*), with partition weights estimated by ML.

Partition by codon position.

Basefreqs - estimated by ML for the whole data.
Substitution rates - estimated by ML for the whole data. */


myTopology = "some_tree";

DataSet Whole_gene = ReadDataFile("some_data_set");

/* All positions included. */
DataSetFilter All_sites = CreateFilter (Whole_gene,1);

/* Separate by codon position. */
DataSetFilter filteredData1 = CreateFilter (Whole_gene,1,"<100>");
DataSetFilter filteredData2 = CreateFilter (Whole_gene,1,"<010>");
DataSetFilter filteredData3 = CreateFilter (Whole_gene,1,"<001>");

/* A way (posted by Sergei) to
estimate frequencies from the whole dataset. */
global P1; P1 :< 1;
global P2; P2 :< 1;
global P3; P3 :< 1;

global f_A := P1;
global f_C := (1-P1) * P2;
global f_G := (1-P1) * (1-P2) * P3;
global f_T := (1-P1) * (1-P2) * (1-P3);

estFreq = {{f_A}{f_C}{f_G}{f_T}};

/* Absolute rates. */
global AC;
global AG;
global AT;
global CG;
global CT;
global GT;

/* Proportional model of branch length estimation across data partitions. */
global R;
global S;

/* Each partition is weighted accordingly. */
global codon_1st := All_sites.sites / (filteredData1.sites + R * filteredData2.sites + S * filteredData3.sites );
global codon_2nd := R * codon_1st;
global codon_3rd := S * codon_1st;

REVRateMatrix_1 = {{  *     ,AC*m,AG*m,AT*m}
                               {AC*m,    *   ,CG*m,CT*m}
                               {AG*m,CG*m,   *   ,GT*m}
                               {AT*m,CT*m,GT*m,    *   }};

Model REV_1 = (REVRateMatrix_1, estFreq);

/*Partition analysis*/
UseModel(REV_1);
Tree myTree_1 = myTopology;
Tree myTree_2 = myTopology;
Tree myTree_3 = myTopology;
Tree myTree_4 = myTopology;

/* Tree2.taxon.branch = Tree1.taxon.branch x K (based on another hint by Sergei) */
ReplicateConstraint ("this1.?.m := this2.?.m*R", myTree_2, myTree_1);
ReplicateConstraint ("this1.?.m := this2.?.m*S", myTree_3, myTree_1);

LikelihoodFunction theLikFun = (filteredData1, myTree_1, filteredData2, myTree_2, filteredData3, myTree_3);
Optimize(results,theLikFun);

fprintf(stdout,theLikFun);

/* Not really necessary, just normalizing so that mean rate=1 */
sum_of_rates = AC + AG + AT + CG + CT + GT;

r1 = AC/sum_of_rates;
r2 = AG/sum_of_rates;
r3 = AT/sum_of_rates; 
r4 = CG/sum_of_rates;
r5 = CT/sum_of_rates; 
r6 = GT/sum_of_rates;

fprintf(stdout,"\nAC = ", r1);
fprintf(stdout,"\nAG = ", r2);
fprintf(stdout,"\nAT = ", r3);
fprintf(stdout,"\nCG = ", r4);
fprintf(stdout,"\nCT = ", r5);
fprintf(stdout,"\nGT = ", r6);

fprintf(stdout,"\n\n");


/* Sum of estimated branch lengths for each partition,
   normalized by the sum of each partition's weight.  */
ReplicateConstraint ("this1.?.m := (this2.?.m + this3.? + this4.?)/(codon_1st + codon_2nd + codon_3rd)",
                     myTree_4, myTree_1, myTree_2, myTree_3);

--------------------------------------------------------------------------------
-------
Code tested with 2 datasets, both giving the same branch lengths as estimated running PAUP*.
Back to top
« Last Edit: Sep 14th, 2006 at 8:25am by Jose_Patane »  
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: One tree, branches estimated across partitions
Reply #1 - Aug 29th, 2006 at 4:23pm
 
Dear Jose,

Thanks for yet another excellent contribution.

Best,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Jose_Patane
YaBB Newbies
*
Offline



Posts: 26
Brazil
Gender: male
Re: One tree, branches estimated across partitions
Reply #2 - Aug 30th, 2006 at 11:46am
 
Hmmm... hey, Sergei, after reading your reply on "ML basefreqs for each partition", I started having this one doubt: in the model I specified above, it now seems to me that I'm estimating global ML estimates of both basefreqs and rates based on only 1/3 of the data (all the 1st codon positions)... is that right? I'm asking that because formerly I thought that such parameters were global in the sense that they were based on the whole dataset, which doesn't seem to be the case (to be more specific, I assigned 'filteredData1' to be estimated on myTree_1, instead of 'All_sites'). Is that right? If so, my conception about the proportional model was a bit mistaken (although, in the code above, it's properly coded, for it matches PAUP estimates).

Now, being even more specific: in both ways (ML whole, or ML of just a part), the degrees of freedom would be the same, for freqs and rates are estimated just once. But the precision and/or branch lengths, in the end, could be different... right ???
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: One tree, branches estimated across partitions
Reply #3 - Aug 30th, 2006 at 12:18pm
 
Dear Jose,

In your example, all global parameters (rates and base frequencies) will be estimated jointly from the three partitions (as expected). The distinction between 'global' and 'local' relates only to the tree. For example, if your model depends on a global parameter 'R' and a local parameter 't', then when you assign it to a tree Tree T = (1,2,3) - as an example - HyPhy will keep one copy of 'R' for all branches in the tree, and create a branch specific (i.e. local) parameter for every branch in the tree: T.1.t, T.2.t and T.3.t.

In your example, all global parameters affect each partition - hence the likelihood function estimation is joint. Note, how in my example of partition specific frequencies, I renamed global variables to be different among partitions (or to be more precise, among models attached to partitions), to force HyPhy to estimate these parameters separately.

Cheers,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Jose_Patane
YaBB Newbies
*
Offline



Posts: 26
Brazil
Gender: male
Re: One tree, branches estimated across partitions
Reply #4 - Aug 30th, 2006 at 1:01pm
 
Once more, thanks, now I think I got the idea!
Back to top
 
 
IP Logged