HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Methodology Questions >> How to >> PAML vs HYPHY - Calculation of dN and dS
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1282767440

Message started by scherrer on Aug 25th, 2010 at 1:17pm

Title: PAML vs HYPHY - Calculation of dN and dS
Post by scherrer on Aug 25th, 2010 at 1:17pm
I am trying to reproduce the dN and dS calculations from PAML with the GY94 model in HYPHY.  The likelihood score and kappa match the PAML values and the ratio of nonSynRate/synRate from the matrix matches the omega value from PAML.

However, I am calculating both the mutational opportunity and physical sites definitions of dN and dS as defined in Yang's book Computational Molecular Evolution, and I cannot get these values to match up with PAML.  I believe the problem is with the function listed below to calculate the rho values, but I cannot find any bugs in the function.  Do you know if there are any rounding discrepancies between PAML and HYPHY?  Has anyone performed these calculations or are the calculations built into HYPHY?


Code (]/*----------------------------------------------------------------------------------------------------------
     A HYPHY translation of a Python function written by Tong Zhou to calculate the proportions
     of synonymous and nonsynonymous mutations, or rho.  Takes as parameters a 61 x 1 frequency vector,
     a 61 x 61 rate matrix, and the number of species in the alignment.  Returns the proportion of
     nonsynonymous mutations, the proportion of synonymous mutations, and the sum of all the mutations
     divided by the number of species, which is equal to the t parameter.  Function must be placed after
     optimization in the script.  To calculate dN or dS, the function must be called twice: once where the
     selection on the protein has been estimated and once where selection on the protein has been fixed.
 ----------------------------------------------------------------------------------------------------------*/
function calcRhoRate(frequency_vector, rate_matrix, species)
{
     Codon_Index = {
                             {14,13,14,13, 7, 7, 7, 7,19, 5,19, 5, 2, 2, 3, 2,
                              12,11,12,11, 6, 6, 6, 6,19,19,19,19, 1, 1, 1, 1,
                              16,15,16,15, 8, 8, 8, 8,20,20,20,20, 4, 4, 4, 4,
                                  9,    9, 5, 5, 5, 5,   17,18,17, 1, 0, 1, 0}
                       };
                             
     Codon_Index_ENG = {

           {"AAA","AAC","AAG","AAT","ACA","ACC","ACG","ACT","AGA","AGC","AGG","AGT","ATA","ATC","ATG","ATT",
            "CAA","CAC","CAG","CAT","CCA","CCC","CCG","CCT","CGA","CGC","CGG","CGT","CTA","CTC","CTG","CTT",
            "GAA","GAC","GAG","GAT","GCA","GCC","GCG","GCT","GGA","GGC","GGG","GGT","GTA","GTC","GTG","GTT",
                  "TAC",      "TAT","TCA","TCC","TCG","TCT",      "TGC","TGG","TGT","TTA","TTC","TTG","TTT"}

                          };                        
                             
                             
     n = 61;
     sum_diag = 0.0;
     sum_all = 0.0;
     sum_syn = 0.0;

     for (i=0; i<n; i=i+1)
     {
           for (j=0; j<n; j=j+1)
           {
                 if (i == j)
                 {
                       sum_diag = sum_diag + (frequency_vector[i):

* rate_matrix[i][j] * frequency_vector[j]);
                 }
                 else
                 {
                             codon_i = Codon_Index[i];
                             codon_j = Codon_Index[j];
                             sum_all =  sum_all + (frequency_vector[i] * rate_matrix[i][j] * frequency_vector[j]);
                             if (codon_i == codon_j)
                             {      
                                   sum_syn = sum_syn + (frequency_vector[i] * rate_matrix[i][j] * frequency_vector[j]);
                             }
                 }
           
           }
     }

     if (sum_all != 0)
     {
           rho_ns = (sum_all - sum_syn) / sum_all;
           rho_syn = sum_syn / sum_all;
     }
     else
     {
           rho_ns = 0;
           rho_syn = 0;
     }

     t = sum_all/species;
     return {{ rho_ns, rho_syn, t }};
     
}

Title: Re: PAML vs HYPHY - Calculation of dN and dS
Post by Sergei on Aug 25th, 2010 at 1:47pm
Hi there,

The code looks correct to me; I would check the scaling of the rate matrix being passed to the function -- I presume in Yang's book it's scaled to have the expected number of substitutions set to the distance between two sequences (or the tree length for multiple alignments). Can you post the code that CALLS your calcRhoRate function?

Sergei

Title: Re: PAML vs HYPHY - Calculation of dN and dS
Post by scherrer on Aug 26th, 2010 at 10:54am
Sergei,

Here is the code I am using along with the PAML code.  Thanks for your help!

Mike
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=PHtest.zip (90 KB | )

Title: Re: PAML vs HYPHY - Calculation of dN and dS
Post by Sergei on Sep 3rd, 2010 at 5:34pm
Hi Mike,

Could you tell me which results from PAML are you trying to match?

In case you haven't already -- take a look at section 1.4.3 in Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login for a description of how HyPhy deals with related quantities.

Sergei

Title: Re: PAML vs HYPHY - Calculation of dN and dS
Post by scherrer on Sep 8th, 2010 at 1:14pm
I am trying to reproduce dN and dS as calculated by both the mutational opportunity and physical sites definitions, but have not had any success using the attached script.

Title: Re: PAML vs HYPHY - Calculation of dN and dS
Post by Sergei on Sep 8th, 2010 at 1:20pm

scherrer wrote on Sep 8th, 2010 at 1:14pm:
I am trying to reproduce dN and dS as calculated by both the mutational opportunity and physical sites definitions, but have not had any success using the attached script.


Right: but because PAML output is quite cryptic, could you tell me WHICH numbers you are trying to replicate?

S.

Title: Re: PAML vs HYPHY - Calculation of dN and dS
Post by scherrer on Sep 9th, 2010 at 8:13am
Sergei,

Sorry about that, you are right - PAML output is very cryptic  :-?

I have attached a pdf file of the output from the PAML run - dN and dS, highlighted in red are calculated using the mutational opportunity definition and dN* and dS*, highlighted in yellow, are calculated using the physical sites definition.  These are the numbers I am trying to reproduce.

Mike
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=output.pdf (9 KB | )

Title: Re: PAML vs HYPHY - Calculation of dN and dS
Post by Sergei on Sep 15th, 2010 at 10:00pm
Hi Mike,

I looked at the code at it seems to be logically sound.
I may need to look at Yang's book to understand exactly what is being computed by PAML for dS,dN,dS* and dN*.

Sergei

HyPhy message board » Powered by YaBB 2.5.2!
YaBB Forum Software © 2000-2024. All Rights Reserved.