Welcome, Guest. Please Login
YaBB - Yet another Bulletin Board
 
  HomeHelpSearchLogin  
 
non-synonymous and synonymous rates (Read 6253 times)
adk
Guest


non-synonymous and synonymous rates
Apr 26th, 2005 at 9:13pm
 
Hi,

I'm interested in using your software to estimate not just the dN/dS (aka omega) paramter, but also the branch lengths in units of dN or dS. Can you give a simple example batch file of how I could extract that information. Btw I love the software!

many thanks,

Andy
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: non-synonymous and synonymous rates
Reply #1 - Apr 27th, 2005 at 7:23am
 
Dear Andy,

Quote:
I'm interested in using your software to estimate not just the dN/dS (aka omega) paramter, but also the branch lengths in units of dN or dS. Can you give a simple example batch file of how I could extract that information. Btw I love the software!



I will put an example file together later today.

Cheers,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: non-synonymous and synonymous rates
Reply #2 - Apr 27th, 2005 at 11:23am
 
Dear Andy,

I am afraid the answer to your question is complicated Sad because of the multiple model options.

In the simplest cases you can:

1). If you fit a local model (independent dN and dS) per branch, you can use the tree viewer to scale the tree on synRate (or nonSynRate) and get the values of these parameters in this way
2). If you use AnalyzeCodonData.bf (a standard analysis) to fit a model, you can then call Analysis->Results->syn and non-syn trees (all GUI versions, or the command line version called with -p flag) to  display the trees scaled on the expected synonymous and non-synonymous substitutions per site.

In the options above don't satisfy your needs, let me know and I'll be glad to answer a more specific question.

Also, take a look at pgs 13 and 14 in the 'Getting Started with HyPhy' pdf (included with recent distributions or accessible fromMultimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login) file for an explanation of how model parameters relate to expected numbers of substitutions.

Cheers,
Sergei

Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
adk
Guest


Re: non-synonymous and synonymous rates
Reply #3 - Apr 27th, 2005 at 1:22pm
 
Sergei,
Thanks for the reply. I understand the complication of the multiple models. What I was looking for was a way to script an analysis (i.e. use the batch command language and not the GUI), to get out the expected non-synonymous and synonymous rates. Can you provide an example of this or point me to a piece of source code to look at?
cheers,
Andy
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: non-synonymous and synonymous rates
Reply #4 - Apr 27th, 2005 at 1:44pm
 
Dear Andy,

Quote:
Sergei,
What I was looking for was a way to script an analysis (i.e. use the batch command language and not the GUI), to get out the expected non-synonymous and synonymous rates. Can you provide an example of this or point me to a piece of source code to look at?


Sure thing. The logic is as follows (if there is no rate variation):

1). Traverse the tree node by node
2). For every node obtain a numerical form of the rate matrix using ML parameter values
3). Use the standard variable (_GeneticCode, defined by the inclusion of chooseGeneticCode.def) to partition the rate matrix into syn and non-syn cells
4). Evaluate the appropriately weighted rate sums over the rate matrix to produce E[syn subs/codon] and E[non syn subs/codon]

If there is rate variation involved the same logic applies for every rate category and the final result is obtained by computing another expectation (I can give you details for this as well).

The batch code can be found in TemplateBatchFiles/postDNDS.bf and the relevant bits are pasted below.

This is quite hairy (I really should make a default function for this), so do not hesitate to ask as many questions as needed Smiley

Cheers,
Sergei



/* get all branch names of tree givenTree (replace with own name if necessary) in a column vector*/

branchNames = BranchName (givenTree,-1);

/* count all branches */

T = Columns (branchNames);

/* get the rate matrix from the first node */
ExecuteCommands(
"GetInformation (aRateMx, givenTree."+branchNames[0]+");");

/* obtain the number of non-stop codons from the dimension of the matrix */
nonStopCount = Columns (aRateMx);


synM    = {nonStopCount,nonStopCount};
nonSynM = {nonStopCount,nonStopCount};

vertOnes = {nonStopCount,1};
horOnes  = {1,nonStopCount};

for (h1 = 0; h1<nonStopCount; h1=h1+1)
{
     vertOnes [h1] = 1;
     horOnes  [h1] = 1;
}

/* make syn and non-syn template matrices; i.e. precompute which cells are synonymous and which are non-synonymous;
also: populate the vector of equilibrium frequencies which will be used to
normalize the rate matrix; this steps assumes that _GeneticCode and vectorOfFrequencies (codon freqs) are defined */

hShift = 0;
for (h1 = 0; h1 < 64; h1=h1+1)
{
     gc1 = _Genetic_Code[h1];
     if (gc1 == 10)
     {
           hShift = hShift+1;
     }
     else
     {
           vShift = hShift;
           for (v1 = h1+1; v1 < 64; v1=v1+1)
           {
                 gc2 = _Genetic_Code[v1];
                 if (gc2 == 10)
                 {
                       vShift = vShift + 1;
                 }
                 else
                 {
                       if (gc1 == gc2)
                       {
                             synM [h1-hShift][v1-vShift] = vectorOfFrequencies[h1-hShift];
                             synM [v1-vShift][h1-hShift] = vectorOfFrequencies[v1-vShift];
                       }
                       else
                       {
                             nonSynM [h1-hShift][v1-vShift] = vectorOfFrequencies[h1-hShift];
                             nonSynM [v1-vShift][h1-hShift] = vectorOfFrequencies[v1-vShift];
                       }
                 }
           }
     }
}

synSubsAVL = {};
nsSubsAVL  = {};

/*a utility function to produce tree strings is this file (found in TemplateBatchFiles)*/
#include "TreeTools.ibf";

for (h1=0; h1 < T-1; h1=h1+1)
{
     abn = branchNames[h1];
/* get a rate matrix from the node */
     ExecuteCommands("GetInformation (aRateMx, givenTree."+abn+");");
/* use matrix algebra to compute synSubs */
     synSubs  = (horOnes*(aRateMx$synM))*vertOnes;
/* use matrix algebra to compute non-synSubs */      
nsynSubs = (horOnes*(aRateMx$nonSynM))*vertOnes;
/* normalize sub counts per codon */
     synSubs = synSubs[0]/3;
     nsynSubs = nsynSubs[0]/3;
     
     synSubsAVL[abn] = synSubs;
     nsSubsAVL [abn] = nsynSubs;
}

treeAVL = givenTree ^ 0;

synTreeString            = PostOrderAVL2StringDistances (treeAVL, synSubsAVL);
nonSynTreeString      = PostOrderAVL2StringDistances (treeAVL, nsSubsAVL);

fprintf (stdout, "\nE[Syn subs/nucleotide site] tree: \n\t",    synTreeString,         "\n");
fprintf (stdout, "\nE[Non-syn subs/nucleotide site] tree: \n\t", nonSynTreeString, "\n");
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
adk
Guest


Re: non-synonymous and synonymous rates
Reply #5 - Apr 27th, 2005 at 2:54pm
 
Hi Sergei,
This is great. In fact I saw this code before. One general question I have is how do I call this file/routine from another script? For example I imagine scripting an entire analysis where I define the data file, tree file, model, frequencies, etc. in the script, then optimize the likelihood function, and then do this post-processing. Is there a simple way to call the post_dnds routine from another script? I'm clearly just getting my head around the nuances of the language.
Thanks for your help,
Andy
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: non-synonymous and synonymous rates
Reply #6 - Apr 27th, 2005 at 3:02pm
 
Dear Andy,

Quote:
Is there a simple way to call the post_dnds routine from another script?


You could use:

#include "fileName";

to insert the code in a particular place.

Alternatively, you can call:

fscanf (fileName,"Raw",fileCommands);

and then

ExecuteCommands (fileCommands);

HTH,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
adk
Guest


Re: non-synonymous and synonymous rates
Reply #7 - May 11th, 2005 at 1:42pm
 
one more question along these lines- can I express dN not as the number of non-syn. substitutions per site, but as the number of non-syn. substitutions per non-syn. site? does hyphy estimate the number of synonymous and non-synonymous sites in a sequence alignment?

cheers,

andy
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Syn and non-syn sites
Reply #8 - May 11th, 2005 at 2:10pm
 
Dear Andy,

Quote:
one more question along these lines- can I express dN not as the number of non-syn. substitutions per site, but as the number of non-syn. substitutions per non-syn. site? does hyphy estimate the number of synonymous and non-synonymous sites in a sequence alignment?


Not by default. I personally think that the notion of a synonymous (or a non-synonymous) site as most people commonly interpret it is flawed. For a good explanation of why see Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login. I meant to implement some of the measures from this paper for a while; I'll post a note here when I do.

In the meantime...

Simon and I have implemented a counting procedure, which does estimate per site syn and non-syn content, based on the ancestral reconstruction from a codon model. After you have fitted a codon model (e.g. using AnalyzeCodonData.bf), choose Analyses->Results -> Suzuki-Gojobori.... (if you use a command line version, use a -p flag on the command line to be prompted for result processing modules at the end of the run). Respond to various options (choose the 1st option in all dialogs and enter some p-value). You will be presented with a table, which will include E[S] and E[NS] columns for each site - those are the estimates of average syn and non-syn sites in that alignment column. If you add up those quantities over all sites, you'll get an estimate of all syn (non-syn) sites in the alignment. This is actually a pretty involved estimate (see Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login for complete details): it uses ML ancestral states and counts changes over the tree...

HTH,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
adk
Guest


Re: non-synonymous and synonymous rates
Reply #9 - May 11th, 2005 at 2:23pm
 
Hey again,

Sounds good. Can your point me to an individual batchfile that I might be able to include in a batchfile I am writing for the analysis?

thanks,
Andy
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: non-synonymous and synonymous rates
Reply #10 - May 11th, 2005 at 2:36pm
 
Dear Andy,

Quote:
Sounds good. Can your point me to an individual batchfile that I might be able to include in a batchfile I am writing for the analysis?


Take a look at post_counting.bf and SGEmulator.bf (in TemplateBatchFiles). The latter does all the work. It assumes (you can modify the source to change that):

1). That you have included 'TemplateModels/chooseGeneticCode.def'
2). You likelihood function is named lf
3). You data filter is called 'filteredData'
4). Your tree is givenTree

At the end - before the last

if (pipeThroughFlag == 0)

the code will populate a matrix called resultMatrix (Nx12), where N is the number of codons. Columns 3 and 4 will contain the estimates of syn (non-syn) sites per codon.

You can sum up over those columns to arrive at the total numbers for the whole sequence.

Cheers,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged