HyPhy message board | |
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Methodology Questions >> How to >> non-synonymous and synonymous rates http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1114575193 Message started by adk on Apr 26th, 2005 at 9:13pm |
Title: non-synonymous and synonymous rates Post by adk on Apr 26th, 2005 at 9:13pm
Hi,
I'm interested in using your software to estimate not just the dN/dS (aka omega) paramter, but also the branch lengths in units of dN or dS. Can you give a simple example batch file of how I could extract that information. Btw I love the software! many thanks, Andy |
Title: Re: non-synonymous and synonymous rates Post by Sergei on Apr 27th, 2005 at 7:23am
Dear Andy,
wrote on Apr 26th, 2005 at 9:13pm:
I will put an example file together later today. Cheers, Sergei |
Title: Re: non-synonymous and synonymous rates Post by adk on Apr 27th, 2005 at 1:22pm
Sergei,
Thanks for the reply. I understand the complication of the multiple models. What I was looking for was a way to script an analysis (i.e. use the batch command language and not the GUI), to get out the expected non-synonymous and synonymous rates. Can you provide an example of this or point me to a piece of source code to look at? cheers, Andy |
Title: Re: non-synonymous and synonymous rates Post by Sergei on Apr 27th, 2005 at 1:44pm
Dear Andy,
wrote on Apr 27th, 2005 at 1:22pm:
Sure thing. The logic is as follows (if there is no rate variation): 1). Traverse the tree node by node 2). For every node obtain a numerical form of the rate matrix using ML parameter values 3). Use the standard variable (_GeneticCode, defined by the inclusion of chooseGeneticCode.def) to partition the rate matrix into syn and non-syn cells 4). Evaluate the appropriately weighted rate sums over the rate matrix to produce E[syn subs/codon] and E[non syn subs/codon] If there is rate variation involved the same logic applies for every rate category and the final result is obtained by computing another expectation (I can give you details for this as well). The batch code can be found in TemplateBatchFiles/postDNDS.bf and the relevant bits are pasted below. This is quite hairy (I really should make a default function for this), so do not hesitate to ask as many questions as needed :) Cheers, Sergei /* get all branch names of tree givenTree (replace with own name if necessary) in a column vector*/ branchNames = BranchName (givenTree,-1); /* count all branches */ T = Columns (branchNames); /* get the rate matrix from the first node */ ExecuteCommands( "GetInformation (aRateMx, givenTree."+branchNames[0]+");"); /* obtain the number of non-stop codons from the dimension of the matrix */ nonStopCount = Columns (aRateMx); synM = {nonStopCount,nonStopCount}; nonSynM = {nonStopCount,nonStopCount}; vertOnes = {nonStopCount,1}; horOnes = {1,nonStopCount}; for (h1 = 0; h1<nonStopCount; h1=h1+1) { vertOnes [h1] = 1; horOnes [h1] = 1; } /* make syn and non-syn template matrices; i.e. precompute which cells are synonymous and which are non-synonymous; also: populate the vector of equilibrium frequencies which will be used to normalize the rate matrix; this steps assumes that _GeneticCode and vectorOfFrequencies (codon freqs) are defined */ hShift = 0; for (h1 = 0; h1 < 64; h1=h1+1) { gc1 = _Genetic_Code[h1]; if (gc1 == 10) { hShift = hShift+1; } else { vShift = hShift; for (v1 = h1+1; v1 < 64; v1=v1+1) { gc2 = _Genetic_Code[v1]; if (gc2 == 10) { vShift = vShift + 1; } else { if (gc1 == gc2) { synM [h1-hShift][v1-vShift] = vectorOfFrequencies[h1-hShift]; synM [v1-vShift][h1-hShift] = vectorOfFrequencies[v1-vShift]; } else { nonSynM [h1-hShift][v1-vShift] = vectorOfFrequencies[h1-hShift]; nonSynM [v1-vShift][h1-hShift] = vectorOfFrequencies[v1-vShift]; } } } } } synSubsAVL = {}; nsSubsAVL = {}; /*a utility function to produce tree strings is this file (found in TemplateBatchFiles)*/ #include "TreeTools.ibf"; for (h1=0; h1 < T-1; h1=h1+1) { abn = branchNames[h1]; /* get a rate matrix from the node */ ExecuteCommands("GetInformation (aRateMx, givenTree."+abn+");"); /* use matrix algebra to compute synSubs */ synSubs = (horOnes*(aRateMx$synM))*vertOnes; /* use matrix algebra to compute non-synSubs */ nsynSubs = (horOnes*(aRateMx$nonSynM))*vertOnes; /* normalize sub counts per codon */ synSubs = synSubs[0]/3; nsynSubs = nsynSubs[0]/3; synSubsAVL[abn] = synSubs; nsSubsAVL [abn] = nsynSubs; } treeAVL = givenTree ^ 0; synTreeString = PostOrderAVL2StringDistances (treeAVL, synSubsAVL); nonSynTreeString = PostOrderAVL2StringDistances (treeAVL, nsSubsAVL); fprintf (stdout, "\nE[Syn subs/nucleotide site] tree: \n\t", synTreeString, "\n"); fprintf (stdout, "\nE[Non-syn subs/nucleotide site] tree: \n\t", nonSynTreeString, "\n"); |
Title: Re: non-synonymous and synonymous rates Post by adk on Apr 27th, 2005 at 2:54pm
Hi Sergei,
This is great. In fact I saw this code before. One general question I have is how do I call this file/routine from another script? For example I imagine scripting an entire analysis where I define the data file, tree file, model, frequencies, etc. in the script, then optimize the likelihood function, and then do this post-processing. Is there a simple way to call the post_dnds routine from another script? I'm clearly just getting my head around the nuances of the language. Thanks for your help, Andy |
Title: Re: non-synonymous and synonymous rates Post by Sergei on Apr 27th, 2005 at 3:02pm
Dear Andy,
wrote on Apr 27th, 2005 at 2:54pm:
You could use: #include "fileName"; to insert the code in a particular place. Alternatively, you can call: fscanf (fileName,"Raw",fileCommands); and then ExecuteCommands (fileCommands); HTH, Sergei |
Title: Re: non-synonymous and synonymous rates Post by adk on May 11th, 2005 at 1:42pm
one more question along these lines- can I express dN not as the number of non-syn. substitutions per site, but as the number of non-syn. substitutions per non-syn. site? does hyphy estimate the number of synonymous and non-synonymous sites in a sequence alignment?
cheers, andy |
Title: Re: non-synonymous and synonymous rates Post by adk on May 11th, 2005 at 2:23pm
Hey again,
Sounds good. Can your point me to an individual batchfile that I might be able to include in a batchfile I am writing for the analysis? thanks, Andy |
Title: Re: non-synonymous and synonymous rates Post by Sergei on May 11th, 2005 at 2:36pm
Dear Andy,
wrote on May 11th, 2005 at 2:23pm:
Take a look at post_counting.bf and SGEmulator.bf (in TemplateBatchFiles). The latter does all the work. It assumes (you can modify the source to change that): 1). That you have included 'TemplateModels/chooseGeneticCode.def' 2). You likelihood function is named lf 3). You data filter is called 'filteredData' 4). Your tree is givenTree At the end - before the last if (pipeThroughFlag == 0) the code will populate a matrix called resultMatrix (Nx12), where N is the number of codons. Columns 3 and 4 will contain the estimates of syn (non-syn) sites per codon. You can sum up over those columns to arrive at the total numbers for the whole sequence. Cheers, Sergei |
HyPhy message board » Powered by YaBB 2.5.2! YaBB Forum Software © 2000-2024. All Rights Reserved. |