Welcome, Guest. Please Login
YaBB - Yet another Bulletin Board
 
  HomeHelpSearchLogin  
 
PAML vs HYPHY - Calculation of dN and dS (Read 4805 times)
scherrer
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 8
PAML vs HYPHY - Calculation of dN and dS
Aug 25th, 2010 at 1:17pm
 
I am trying to reproduce the dN and dS calculations from PAML with the GY94 model in HYPHY.  The likelihood score and kappa match the PAML values and the ratio of nonSynRate/synRate from the matrix matches the omega value from PAML.

However, I am calculating both the mutational opportunity and physical sites definitions of dN and dS as defined in Yang's book Computational Molecular Evolution, and I cannot get these values to match up with PAML.  I believe the problem is with the function listed below to calculate the rho values, but I cannot find any bugs in the function.  Do you know if there are any rounding discrepancies between PAML and HYPHY?  Has anyone performed these calculations or are the calculations built into HYPHY?

Code:
/*----------------------------------------------------------------------------------------------------------
	A HYPHY translation of a Python function written by Tong Zhou to calculate the proportions
	of synonymous and nonsynonymous mutations, or rho.  Takes as parameters a 61 x 1 frequency vector,
	a 61 x 61 rate matrix, and the number of species in the alignment.  Returns the proportion of
	nonsynonymous mutations, the proportion of synonymous mutations, and the sum of all the mutations
	divided by the number of species, which is equal to the t parameter.  Function must be placed after
	optimization in the script.  To calculate dN or dS, the function must be called twice: once where the
	selection on the protein has been estimated and once where selection on the protein has been fixed.
  ----------------------------------------------------------------------------------------------------------*/
function calcRhoRate(frequency_vector, rate_matrix, species)
{
	Codon_Index = {
					{14,13,14,13, 7, 7, 7, 7,19, 5,19, 5, 2, 2, 3, 2,
					 12,11,12,11, 6, 6, 6, 6,19,19,19,19, 1, 1, 1, 1,
					 16,15,16,15, 8, 8, 8, 8,20,20,20,20, 4, 4, 4, 4,
					     9,    9, 5, 5, 5, 5,   17,18,17, 1, 0, 1, 0}
				};

	Codon_Index_ENG = {

		{"AAA","AAC","AAG","AAT","ACA","ACC","ACG","ACT","AGA","AGC","AGG","AGT","ATA","ATC","ATG","ATT",
		 "CAA","CAC","CAG","CAT","CCA","CCC","CCG","CCT","CGA","CGC","CGG","CGT","CTA","CTC","CTG","CTT",
		 "GAA","GAC","GAG","GAT","GCA","GCC","GCG","GCT","GGA","GGC","GGG","GGT","GTA","GTC","GTG","GTT",
		       "TAC",      "TAT","TCA","TCC","TCG","TCT",      "TGC","TGG","TGT","TTA","TTC","TTG","TTT"}

				   };


	n = 61;
	sum_diag = 0.0;
	sum_all = 0.0;
	sum_syn = 0.0;

	for (i=0; i<n; i=i+1)
	{
		for (j=0; j<n; j=j+1)
		{
			if (i == j)
			{
				sum_diag = sum_diag + (frequency_vector[i] * rate_matrix[i][j] * frequency_vector[j]);
			}
			else
			{
					codon_i = Codon_Index[i];
					codon_j = Codon_Index[j];
					sum_all =  sum_all + (frequency_vector[i] * rate_matrix[i][j] * frequency_vector[j]);
					if (codon_i == codon_j)
					{
						sum_syn = sum_syn + (frequency_vector[i] * rate_matrix[i][j] * frequency_vector[j]);
					}
			}

		}
	}

	if (sum_all != 0)
	{
		rho_ns = (sum_all - sum_syn) / sum_all;
		rho_syn = sum_syn / sum_all;
	}
	else
	{
		rho_ns = 0;
		rho_syn = 0;
	}

	t = sum_all/species;
	return {{ rho_ns, rho_syn, t }};

} 

Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: PAML vs HYPHY - Calculation of dN and dS
Reply #1 - Aug 25th, 2010 at 1:47pm
 
Hi there,

The code looks correct to me; I would check the scaling of the rate matrix being passed to the function -- I presume in Yang's book it's scaled to have the expected number of substitutions set to the distance between two sequences (or the tree length for multiple alignments). Can you post the code that CALLS your calcRhoRate function?

Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
scherrer
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 8
Re: PAML vs HYPHY - Calculation of dN and dS
Reply #2 - Aug 26th, 2010 at 10:54am
 
Sergei,

Here is the code I am using along with the PAML code.  Thanks for your help!

Mike
Back to top
 
Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login (90 KB | )
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: PAML vs HYPHY - Calculation of dN and dS
Reply #3 - Sep 3rd, 2010 at 5:34pm
 
Hi Mike,

Could you tell me which results from PAML are you trying to match?

In case you haven't already -- take a look at section 1.4.3 in Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login for a description of how HyPhy deals with related quantities.

Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
scherrer
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 8
Re: PAML vs HYPHY - Calculation of dN and dS
Reply #4 - Sep 8th, 2010 at 1:14pm
 
I am trying to reproduce dN and dS as calculated by both the mutational opportunity and physical sites definitions, but have not had any success using the attached script.
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: PAML vs HYPHY - Calculation of dN and dS
Reply #5 - Sep 8th, 2010 at 1:20pm
 
scherrer wrote on Sep 8th, 2010 at 1:14pm:
I am trying to reproduce dN and dS as calculated by both the mutational opportunity and physical sites definitions, but have not had any success using the attached script.


Right: but because PAML output is quite cryptic, could you tell me WHICH numbers you are trying to replicate?

S.
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
scherrer
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 8
Re: PAML vs HYPHY - Calculation of dN and dS
Reply #6 - Sep 9th, 2010 at 8:13am
 
Sergei,

Sorry about that, you are right - PAML output is very cryptic  Huh

I have attached a pdf file of the output from the PAML run - dN and dS, highlighted in red are calculated using the mutational opportunity definition and dN* and dS*, highlighted in yellow, are calculated using the physical sites definition.  These are the numbers I am trying to reproduce.

Mike
Back to top
 
Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login (9 KB | )
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: PAML vs HYPHY - Calculation of dN and dS
Reply #7 - Sep 15th, 2010 at 10:00pm
 
Hi Mike,

I looked at the code at it seems to be logically sound.
I may need to look at Yang's book to understand exactly what is being computed by PAML for dS,dN,dS* and dN*.

Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged