I am trying to reproduce the dN and dS calculations from PAML with the GY94 model in HYPHY. The likelihood score and kappa match the PAML values and the ratio of nonSynRate/synRate from the matrix matches the omega value from PAML.
However, I am calculating both the mutational opportunity and physical sites definitions of dN and dS as defined in Yang's book
Computational Molecular Evolution, and I cannot get these values to match up with PAML. I believe the problem is with the function listed below to calculate the rho values, but I cannot find any bugs in the function. Do you know if there are any rounding discrepancies between PAML and HYPHY? Has anyone performed these calculations or are the calculations built into HYPHY?
Code:/*----------------------------------------------------------------------------------------------------------
A HYPHY translation of a Python function written by Tong Zhou to calculate the proportions
of synonymous and nonsynonymous mutations, or rho. Takes as parameters a 61 x 1 frequency vector,
a 61 x 61 rate matrix, and the number of species in the alignment. Returns the proportion of
nonsynonymous mutations, the proportion of synonymous mutations, and the sum of all the mutations
divided by the number of species, which is equal to the t parameter. Function must be placed after
optimization in the script. To calculate dN or dS, the function must be called twice: once where the
selection on the protein has been estimated and once where selection on the protein has been fixed.
----------------------------------------------------------------------------------------------------------*/
function calcRhoRate(frequency_vector, rate_matrix, species)
{
Codon_Index = {
{14,13,14,13, 7, 7, 7, 7,19, 5,19, 5, 2, 2, 3, 2,
12,11,12,11, 6, 6, 6, 6,19,19,19,19, 1, 1, 1, 1,
16,15,16,15, 8, 8, 8, 8,20,20,20,20, 4, 4, 4, 4,
9, 9, 5, 5, 5, 5, 17,18,17, 1, 0, 1, 0}
};
Codon_Index_ENG = {
{"AAA","AAC","AAG","AAT","ACA","ACC","ACG","ACT","AGA","AGC","AGG","AGT","ATA","ATC","ATG","ATT",
"CAA","CAC","CAG","CAT","CCA","CCC","CCG","CCT","CGA","CGC","CGG","CGT","CTA","CTC","CTG","CTT",
"GAA","GAC","GAG","GAT","GCA","GCC","GCG","GCT","GGA","GGC","GGG","GGT","GTA","GTC","GTG","GTT",
"TAC", "TAT","TCA","TCC","TCG","TCT", "TGC","TGG","TGT","TTA","TTC","TTG","TTT"}
};
n = 61;
sum_diag = 0.0;
sum_all = 0.0;
sum_syn = 0.0;
for (i=0; i<n; i=i+1)
{
for (j=0; j<n; j=j+1)
{
if (i == j)
{
sum_diag = sum_diag + (frequency_vector[i] * rate_matrix[i][j] * frequency_vector[j]);
}
else
{
codon_i = Codon_Index[i];
codon_j = Codon_Index[j];
sum_all = sum_all + (frequency_vector[i] * rate_matrix[i][j] * frequency_vector[j]);
if (codon_i == codon_j)
{
sum_syn = sum_syn + (frequency_vector[i] * rate_matrix[i][j] * frequency_vector[j]);
}
}
}
}
if (sum_all != 0)
{
rho_ns = (sum_all - sum_syn) / sum_all;
rho_syn = sum_syn / sum_all;
}
else
{
rho_ns = 0;
rho_syn = 0;
}
t = sum_all/species;
return {{ rho_ns, rho_syn, t }};
}