HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Methodology Questions >> How to >> non-synonymous and synonymous rates
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1114575193

Message started by adk on Apr 26th, 2005 at 9:13pm

Title: non-synonymous and synonymous rates
Post by adk on Apr 26th, 2005 at 9:13pm
Hi,

I'm interested in using your software to estimate not just the dN/dS (aka omega) paramter, but also the branch lengths in units of dN or dS. Can you give a simple example batch file of how I could extract that information. Btw I love the software!

many thanks,

Andy

Title: Re: non-synonymous and synonymous rates
Post by Sergei on Apr 27th, 2005 at 7:23am
Dear Andy,


wrote on Apr 26th, 2005 at 9:13pm:
I'm interested in using your software to estimate not just the dN/dS (aka omega) paramter, but also the branch lengths in units of dN or dS. Can you give a simple example batch file of how I could extract that information. Btw I love the software!


I will put an example file together later today.

Cheers,
Sergei

Title: Re: non-synonymous and synonymous rates
Post by Sergei on Apr 27th, 2005 at 11:23am
Dear Andy,

I am afraid the answer to your question is complicated :( because of the multiple model options.

In the simplest cases you can:

1). If you fit a local model (independent dN and dS) per branch, you can use the tree viewer to scale the tree on synRate (or nonSynRate) and get the values of these parameters in this way
2). If you use AnalyzeCodonData.bf (a standard analysis) to fit a model, you can then call Analysis->Results->syn and non-syn trees (all GUI versions, or the command line version called with -p flag) to  display the trees scaled on the expected synonymous and non-synonymous substitutions per site.

In the options above don't satisfy your needs, let me know and I'll be glad to answer a more specific question.

Also, take a look at pgs 13 and 14 in the 'Getting Started with HyPhy' pdf (included with recent distributions or accessible fromMultimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login) file for an explanation of how model parameters relate to expected numbers of substitutions.

Cheers,
Sergei


Title: Re: non-synonymous and synonymous rates
Post by adk on Apr 27th, 2005 at 1:22pm
Sergei,
Thanks for the reply. I understand the complication of the multiple models. What I was looking for was a way to script an analysis (i.e. use the batch command language and not the GUI), to get out the expected non-synonymous and synonymous rates. Can you provide an example of this or point me to a piece of source code to look at?
cheers,
Andy

Title: Re: non-synonymous and synonymous rates
Post by Sergei on Apr 27th, 2005 at 1:44pm
Dear Andy,


wrote on Apr 27th, 2005 at 1:22pm:
Sergei,
What I was looking for was a way to script an analysis (i.e. use the batch command language and not the GUI), to get out the expected non-synonymous and synonymous rates. Can you provide an example of this or point me to a piece of source code to look at?


Sure thing. The logic is as follows (if there is no rate variation):

1). Traverse the tree node by node
2). For every node obtain a numerical form of the rate matrix using ML parameter values
3). Use the standard variable (_GeneticCode, defined by the inclusion of chooseGeneticCode.def) to partition the rate matrix into syn and non-syn cells
4). Evaluate the appropriately weighted rate sums over the rate matrix to produce E[syn subs/codon] and E[non syn subs/codon]

If there is rate variation involved the same logic applies for every rate category and the final result is obtained by computing another expectation (I can give you details for this as well).

The batch code can be found in TemplateBatchFiles/postDNDS.bf and the relevant bits are pasted below.

This is quite hairy (I really should make a default function for this), so do not hesitate to ask as many questions as needed :)

Cheers,
Sergei



/* get all branch names of tree givenTree (replace with own name if necessary) in a column vector*/

branchNames = BranchName (givenTree,-1);

/* count all branches */

T = Columns (branchNames);

/* get the rate matrix from the first node */
ExecuteCommands(
"GetInformation (aRateMx, givenTree."+branchNames[0]+");");

/* obtain the number of non-stop codons from the dimension of the matrix */
nonStopCount = Columns (aRateMx);


synM    = {nonStopCount,nonStopCount};
nonSynM = {nonStopCount,nonStopCount};

vertOnes = {nonStopCount,1};
horOnes  = {1,nonStopCount};

for (h1 = 0; h1<nonStopCount; h1=h1+1)
{
     vertOnes [h1] = 1;
     horOnes  [h1] = 1;
}

/* make syn and non-syn template matrices; i.e. precompute which cells are synonymous and which are non-synonymous;
also: populate the vector of equilibrium frequencies which will be used to
normalize the rate matrix; this steps assumes that _GeneticCode and vectorOfFrequencies (codon freqs) are defined */

hShift = 0;
for (h1 = 0; h1 < 64; h1=h1+1)
{
     gc1 = _Genetic_Code[h1];
     if (gc1 == 10)
     {
           hShift = hShift+1;
     }
     else
     {
           vShift = hShift;
           for (v1 = h1+1; v1 < 64; v1=v1+1)
           {
                 gc2 = _Genetic_Code[v1];
                 if (gc2 == 10)
                 {
                       vShift = vShift + 1;
                 }
                 else
                 {
                       if (gc1 == gc2)
                       {
                             synM [h1-hShift][v1-vShift] = vectorOfFrequencies[h1-hShift];
                             synM [v1-vShift][h1-hShift] = vectorOfFrequencies[v1-vShift];
                       }
                       else
                       {
                             nonSynM [h1-hShift][v1-vShift] = vectorOfFrequencies[h1-hShift];
                             nonSynM [v1-vShift][h1-hShift] = vectorOfFrequencies[v1-vShift];
                       }
                 }
           }
     }
}

synSubsAVL = {};
nsSubsAVL  = {};

/*a utility function to produce tree strings is this file (found in TemplateBatchFiles)*/
#include "TreeTools.ibf";

for (h1=0; h1 < T-1; h1=h1+1)
{
     abn = branchNames[h1];
/* get a rate matrix from the node */
     ExecuteCommands("GetInformation (aRateMx, givenTree."+abn+");");
/* use matrix algebra to compute synSubs */
     synSubs  = (horOnes*(aRateMx$synM))*vertOnes;
/* use matrix algebra to compute non-synSubs */      
nsynSubs = (horOnes*(aRateMx$nonSynM))*vertOnes;
/* normalize sub counts per codon */
     synSubs = synSubs[0]/3;
     nsynSubs = nsynSubs[0]/3;
     
     synSubsAVL[abn] = synSubs;
     nsSubsAVL [abn] = nsynSubs;
}

treeAVL = givenTree ^ 0;

synTreeString            = PostOrderAVL2StringDistances (treeAVL, synSubsAVL);
nonSynTreeString      = PostOrderAVL2StringDistances (treeAVL, nsSubsAVL);

fprintf (stdout, "\nE[Syn subs/nucleotide site] tree: \n\t",    synTreeString,         "\n");
fprintf (stdout, "\nE[Non-syn subs/nucleotide site] tree: \n\t", nonSynTreeString, "\n");

Title: Re: non-synonymous and synonymous rates
Post by adk on Apr 27th, 2005 at 2:54pm
Hi Sergei,
This is great. In fact I saw this code before. One general question I have is how do I call this file/routine from another script? For example I imagine scripting an entire analysis where I define the data file, tree file, model, frequencies, etc. in the script, then optimize the likelihood function, and then do this post-processing. Is there a simple way to call the post_dnds routine from another script? I'm clearly just getting my head around the nuances of the language.
Thanks for your help,
Andy

Title: Re: non-synonymous and synonymous rates
Post by Sergei on Apr 27th, 2005 at 3:02pm
Dear Andy,


wrote on Apr 27th, 2005 at 2:54pm:
Is there a simple way to call the post_dnds routine from another script?


You could use:

#include "fileName";

to insert the code in a particular place.

Alternatively, you can call:

fscanf (fileName,"Raw",fileCommands);

and then

ExecuteCommands (fileCommands);

HTH,
Sergei

Title: Re: non-synonymous and synonymous rates
Post by adk on May 11th, 2005 at 1:42pm
one more question along these lines- can I express dN not as the number of non-syn. substitutions per site, but as the number of non-syn. substitutions per non-syn. site? does hyphy estimate the number of synonymous and non-synonymous sites in a sequence alignment?

cheers,

andy

Title: Syn and non-syn sites
Post by Sergei on May 11th, 2005 at 2:10pm
Dear Andy,


wrote on May 11th, 2005 at 1:42pm:
one more question along these lines- can I express dN not as the number of non-syn. substitutions per site, but as the number of non-syn. substitutions per non-syn. site? does hyphy estimate the number of synonymous and non-synonymous sites in a sequence alignment?


Not by default. I personally think that the notion of a synonymous (or a non-synonymous) site as most people commonly interpret it is flawed. For a good explanation of why see Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login. I meant to implement some of the measures from this paper for a while; I'll post a note here when I do.

In the meantime...

Simon and I have implemented a counting procedure, which does estimate per site syn and non-syn content, based on the ancestral reconstruction from a codon model. After you have fitted a codon model (e.g. using AnalyzeCodonData.bf), choose Analyses->Results -> Suzuki-Gojobori.... (if you use a command line version, use a -p flag on the command line to be prompted for result processing modules at the end of the run). Respond to various options (choose the 1st option in all dialogs and enter some p-value). You will be presented with a table, which will include E[S] and E[NS] columns for each site - those are the estimates of average syn and non-syn sites in that alignment column. If you add up those quantities over all sites, you'll get an estimate of all syn (non-syn) sites in the alignment. This is actually a pretty involved estimate (see Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login for complete details): it uses ML ancestral states and counts changes over the tree...

HTH,
Sergei

Title: Re: non-synonymous and synonymous rates
Post by adk on May 11th, 2005 at 2:23pm
Hey again,

Sounds good. Can your point me to an individual batchfile that I might be able to include in a batchfile I am writing for the analysis?

thanks,
Andy

Title: Re: non-synonymous and synonymous rates
Post by Sergei on May 11th, 2005 at 2:36pm
Dear Andy,


wrote on May 11th, 2005 at 2:23pm:
Sounds good. Can your point me to an individual batchfile that I might be able to include in a batchfile I am writing for the analysis?


Take a look at post_counting.bf and SGEmulator.bf (in TemplateBatchFiles). The latter does all the work. It assumes (you can modify the source to change that):

1). That you have included 'TemplateModels/chooseGeneticCode.def'
2). You likelihood function is named lf
3). You data filter is called 'filteredData'
4). Your tree is givenTree

At the end - before the last

if (pipeThroughFlag == 0)

the code will populate a matrix called resultMatrix (Nx12), where N is the number of codons. Columns 3 and 4 will contain the estimates of syn (non-syn) sites per codon.

You can sum up over those columns to arrive at the total numbers for the whole sequence.

Cheers,
Sergei

HyPhy message board » Powered by YaBB 2.5.2!
YaBB Forum Software © 2000-2024. All Rights Reserved.