HyPhy message board - One tree, branches estimated across partitions

	Welcome, Guest. Please Login

Home

Help

HyPhy message board › HYPHY Package › Contributions › One tree, branches estimated across partitions

(Moderators: Sergei, Simon)

‹ Previous Topic | Next Topic ›

Pages: 1

Send Topic

One tree, branches estimated across partitions (Read 4230 times)

Jose_Patane

YaBB Newbies

Offline

Posts: 26
Brazil
Gender: male

One tree, branches estimated across partitions
Aug 29^th, 2006 at 1:21pm

Hi all,

This is something PAUP* does as well... but further extensions (not in the code) could be aggregated (e.g. each partition having its own gamma and Pinvar, which PAUP* does not allow - unless the partition is the whole dataset, of course).

--------------------------------------------------------------------
/* One tree in the end: each branch based on a normalized sum of
branch lengths across different partitions. Proportional model assumed
(as in PAUP*), with partition weights estimated by ML.

Partition by codon position.

Basefreqs - estimated by ML for the whole data.
Substitution rates - estimated by ML for the whole data. */

myTopology = "some_tree";

DataSet Whole_gene = ReadDataFile("some_data_set");

/* All positions included. */
DataSetFilter All_sites = CreateFilter (Whole_gene,1);

/* Separate by codon position. */
DataSetFilter filteredData1 = CreateFilter (Whole_gene,1,"<100>");
DataSetFilter filteredData2 = CreateFilter (Whole_gene,1,"<010>");
DataSetFilter filteredData3 = CreateFilter (Whole_gene,1,"<001>");

/* A way (posted by Sergei) to
estimate frequencies from the whole dataset. */
global P1; P1 :< 1;
global P2; P2 :< 1;
global P3; P3 :< 1;

global f_A := P1;
global f_C := (1-P1) * P2;
global f_G := (1-P1) * (1-P2) * P3;
global f_T := (1-P1) * (1-P2) * (1-P3);

estFreq = {{f_A}{f_C}{f_G}{f_T}};

/* Absolute rates. */
global AC;
global AG;
global AT;
global CG;
global CT;
global GT;

/* Proportional model of branch length estimation across data partitions. */
global R;
global S;

/* Each partition is weighted accordingly. */
global codon_1st := All_sites.sites / (filteredData1.sites + R * filteredData2.sites + S * filteredData3.sites );
global codon_2nd := R * codon_1st;
global codon_3rd := S * codon_1st;

REVRateMatrix_1 = {{ *    ,AC*m,AG*m,AT*m}
   {AC*m, * ,CG*m,CT*m}
   {AG*m,CG*m, * ,GT*m}
   {AT*m,CT*m,GT*m, * }};

Model REV_1 = (REVRateMatrix_1, estFreq);

/*Partition analysis*/
UseModel(REV_1);
Tree myTree_1 = myTopology;
Tree myTree_2 = myTopology;
Tree myTree_3 = myTopology;
Tree myTree_4 = myTopology;

/* Tree2.taxon.branch = Tree1.taxon.branch x K (based on another hint by Sergei) */
ReplicateConstraint ("this1.?.m := this2.?.m*R", myTree_2, myTree_1);
ReplicateConstraint ("this1.?.m := this2.?.m*S", myTree_3, myTree_1);

LikelihoodFunction theLikFun = (filteredData1, myTree_1, filteredData2, myTree_2, filteredData3, myTree_3);
Optimize(results,theLikFun);

fprintf(stdout,theLikFun);

/* Not really necessary, just normalizing so that mean rate=1 */
sum_of_rates = AC + AG + AT + CG + CT + GT;

r1 = AC/sum_of_rates;
r2 = AG/sum_of_rates;
r3 = AT/sum_of_rates;
r4 = CG/sum_of_rates;
r5 = CT/sum_of_rates;
r6 = GT/sum_of_rates;

fprintf(stdout,"\nAC = ", r1);
fprintf(stdout,"\nAG = ", r2);
fprintf(stdout,"\nAT = ", r3);
fprintf(stdout,"\nCG = ", r4);
fprintf(stdout,"\nCT = ", r5);
fprintf(stdout,"\nGT = ", r6);

fprintf(stdout,"\n\n");

/* Sum of estimated branch lengths for each partition,
normalized by the sum of each partition's weight. */
ReplicateConstraint ("this1.?.m := (this2.?.m + this3.? + this4.?)/(codon_1st + codon_2nd + codon_3rd)",
   myTree_4, myTree_1, myTree_2, myTree_3);

--------------------------------------------------------------------------------
-------
Code tested with 2 datasets, both giving the same branch lengths as estimated running PAUP*.

« Last Edit: Sep 14^th, 2006 at 8:25am by Jose_Patane »

IP Logged

Sergei YaBB Administrator Offline Datamonkeys are forever... Posts: 1658 UCSD Gender:	Re: One tree, branches estimated across partitions Reply #1 - Aug 29^th, 2006 at 4:23pm Dear Jose, Thanks for yet another excellent contribution. Best, Sergei
Back to top	Associate Professor Division of Infectious Diseases Division of Biomedical Informatics School of Medicine University of California San Diego WWW IP Logged

Jose_Patane

YaBB Newbies

Offline

Posts: 26
Brazil
Gender: male

Re: One tree, branches estimated across partitions
Reply #2 - Aug 30^th, 2006 at 11:46am

Hmmm... hey, Sergei, after reading your reply on "ML basefreqs for each partition", I started having this one doubt: in the model I specified above, it now seems to me that I'm estimating global ML estimates of both basefreqs and rates based on only 1/3 of the data (all the 1st codon positions)... is that right? I'm asking that because formerly I thought that such parameters were global in the sense that they were based on the whole dataset, which doesn't seem to be the case (to be more specific, I assigned 'filteredData1' to be estimated on myTree_1, instead of 'All_sites'). Is that right? If so, my conception about the proportional model was a bit mistaken (although, in the code above, it's properly coded, for it matches PAUP estimates).

Now, being even more specific: in both ways (ML whole, or ML of just a part), the degrees of freedom would be the same, for freqs and rates are estimated just once. But the precision and/or branch lengths, in the end, could be different... right ???

IP Logged

Sergei

YaBB Administrator

Offline

Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male

Re: One tree, branches estimated across partitions
Reply #3 - Aug 30^th, 2006 at 12:18pm

Dear Jose,

In your example, all global parameters (rates and base frequencies) will be estimated jointly from the three partitions (as expected). The distinction between 'global' and 'local' relates only to the tree. For example, if your model depends on a global parameter 'R' and a local parameter 't', then when you assign it to a tree Tree T = (1,2,3) - as an example - HyPhy will keep one copy of 'R' for all branches in the tree, and create a branch specific (i.e. local) parameter for every branch in the tree: T.1.t, T.2.t and T.3.t.

In your example, all global parameters affect each partition - hence the likelihood function estimation is joint. Note, how in my example of partition specific frequencies, I renamed global variables to be different among partitions (or to be more precise, among models attached to partitions), to force HyPhy to estimate these parameters separately.

Cheers,
Sergei

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego

WWW

IP Logged

Jose_Patane YaBB Newbies Offline Posts: 26 Brazil Gender:	Re: One tree, branches estimated across partitions Reply #4 - Aug 30^th, 2006 at 1:01pm Once more, thanks, now I think I got the idea!
Back to top	IP Logged

Pages: 1

Send Topic

‹ Previous Topic | Next Topic ›

« Home

‹ Board

Top of this page