Jose_Patane
YaBB Newbies
Offline
Posts: 26
Brazil
Gender:

Hi all,
This is something PAUP* does as well... but further extensions (not in the code) could be aggregated (e.g. each partition having its own gamma and Pinvar, which PAUP* does not allow  unless the partition is the whole dataset, of course).
 /* One tree in the end: each branch based on a normalized sum of branch lengths across different partitions. Proportional model assumed (as in PAUP*), with partition weights estimated by ML.
Partition by codon position.
Basefreqs  estimated by ML for the whole data. Substitution rates  estimated by ML for the whole data. */
myTopology = "some_tree";
DataSet Whole_gene = ReadDataFile("some_data_set");
/* All positions included. */ DataSetFilter All_sites = CreateFilter (Whole_gene,1);
/* Separate by codon position. */ DataSetFilter filteredData1 = CreateFilter (Whole_gene,1,"<100>"); DataSetFilter filteredData2 = CreateFilter (Whole_gene,1,"<010>"); DataSetFilter filteredData3 = CreateFilter (Whole_gene,1,"<001>");
/* A way (posted by Sergei) to estimate frequencies from the whole dataset. */ global P1; P1 :< 1; global P2; P2 :< 1; global P3; P3 :< 1;
global f_A := P1; global f_C := (1P1) * P2; global f_G := (1P1) * (1P2) * P3; global f_T := (1P1) * (1P2) * (1P3);
estFreq = {{f_A}{f_C}{f_G}{f_T}};
/* Absolute rates. */ global AC; global AG; global AT; global CG; global CT; global GT;
/* Proportional model of branch length estimation across data partitions. */ global R; global S;
/* Each partition is weighted accordingly. */ global codon_1st := All_sites.sites / (filteredData1.sites + R * filteredData2.sites + S * filteredData3.sites ); global codon_2nd := R * codon_1st; global codon_3rd := S * codon_1st;
REVRateMatrix_1 = {{ * ,AC*m,AG*m,AT*m} {AC*m, * ,CG*m,CT*m} {AG*m,CG*m, * ,GT*m} {AT*m,CT*m,GT*m, * }};
Model REV_1 = (REVRateMatrix_1, estFreq);
/*Partition analysis*/ UseModel(REV_1); Tree myTree_1 = myTopology; Tree myTree_2 = myTopology; Tree myTree_3 = myTopology; Tree myTree_4 = myTopology;
/* Tree2.taxon.branch = Tree1.taxon.branch x K (based on another hint by Sergei) */ ReplicateConstraint ("this1.?.m := this2.?.m*R", myTree_2, myTree_1); ReplicateConstraint ("this1.?.m := this2.?.m*S", myTree_3, myTree_1);
LikelihoodFunction theLikFun = (filteredData1, myTree_1, filteredData2, myTree_2, filteredData3, myTree_3); Optimize(results,theLikFun);
fprintf(stdout,theLikFun);
/* Not really necessary, just normalizing so that mean rate=1 */ sum_of_rates = AC + AG + AT + CG + CT + GT;
r1 = AC/sum_of_rates; r2 = AG/sum_of_rates; r3 = AT/sum_of_rates; r4 = CG/sum_of_rates; r5 = CT/sum_of_rates; r6 = GT/sum_of_rates;
fprintf(stdout,"\nAC = ", r1); fprintf(stdout,"\nAG = ", r2); fprintf(stdout,"\nAT = ", r3); fprintf(stdout,"\nCG = ", r4); fprintf(stdout,"\nCT = ", r5); fprintf(stdout,"\nGT = ", r6);
fprintf(stdout,"\n\n");
/* Sum of estimated branch lengths for each partition, normalized by the sum of each partition's weight. */ ReplicateConstraint ("this1.?.m := (this2.?.m + this3.? + this4.?)/(codon_1st + codon_2nd + codon_3rd)", myTree_4, myTree_1, myTree_2, myTree_3);
  Code tested with 2 datasets, both giving the same branch lengths as estimated running PAUP*.
