Dear Andy,
Quote:Sergei,
What I was looking for was a way to script an analysis (i.e. use the batch command language and not the GUI), to get out the expected non-synonymous and synonymous rates. Can you provide an example of this or point me to a piece of source code to look at?
Sure thing. The logic is as follows (if there is no rate variation):
1). Traverse the tree node by node
2). For every node obtain a numerical form of the rate matrix using ML parameter values
3). Use the standard variable (_GeneticCode, defined by the inclusion of chooseGeneticCode.def) to partition the rate matrix into syn and non-syn cells
4). Evaluate the appropriately weighted rate sums over the rate matrix to produce E[syn subs/codon] and E[non syn subs/codon]
If there is rate variation involved the same logic applies for every rate category and the final result is obtained by computing another expectation (I can give you details for this as well).
The batch code can be found in TemplateBatchFiles/postDNDS.bf and the relevant bits are pasted below.
This is quite hairy (I really should make a default function for this), so do not hesitate to ask as many questions as needed
Cheers,
Sergei
/* get all branch names of tree givenTree (replace with own name if necessary) in a column vector*/
branchNames = BranchName (givenTree,-1);
/* count all branches */
T = Columns (branchNames);
/* get the rate matrix from the first node */
ExecuteCommands(
"GetInformation (aRateMx, givenTree."+branchNames[0]+");");
/* obtain the number of non-stop codons from the dimension of the matrix */
nonStopCount = Columns (aRateMx);
synM = {nonStopCount,nonStopCount};
nonSynM = {nonStopCount,nonStopCount};
vertOnes = {nonStopCount,1};
horOnes = {1,nonStopCount};
for (h1 = 0; h1<nonStopCount; h1=h1+1)
{
vertOnes [h1] = 1;
horOnes [h1] = 1;
}
/* make syn and non-syn template matrices; i.e. precompute which cells are synonymous and which are non-synonymous;
also: populate the vector of equilibrium frequencies which will be used to
normalize the rate matrix; this steps assumes that _GeneticCode and vectorOfFrequencies (codon freqs) are defined */
hShift = 0;
for (h1 = 0; h1 < 64; h1=h1+1)
{
gc1 = _Genetic_Code[h1];
if (gc1 == 10)
{
hShift = hShift+1;
}
else
{
vShift = hShift;
for (v1 = h1+1; v1 < 64; v1=v1+1)
{
gc2 = _Genetic_Code[v1];
if (gc2 == 10)
{
vShift = vShift + 1;
}
else
{
if (gc1 == gc2)
{
synM [h1-hShift][v1-vShift] = vectorOfFrequencies[h1-hShift];
synM [v1-vShift][h1-hShift] = vectorOfFrequencies[v1-vShift];
}
else
{
nonSynM [h1-hShift][v1-vShift] = vectorOfFrequencies[h1-hShift];
nonSynM [v1-vShift][h1-hShift] = vectorOfFrequencies[v1-vShift];
}
}
}
}
}
synSubsAVL = {};
nsSubsAVL = {};
/*a utility function to produce tree strings is this file (found in TemplateBatchFiles)*/
#include "TreeTools.ibf";
for (h1=0; h1 < T-1; h1=h1+1)
{
abn = branchNames[h1];
/* get a rate matrix from the node */
ExecuteCommands("GetInformation (aRateMx, givenTree."+abn+");");
/* use matrix algebra to compute synSubs */
synSubs = (horOnes*(aRateMx$synM))*vertOnes;
/* use matrix algebra to compute non-synSubs */
nsynSubs = (horOnes*(aRateMx$nonSynM))*vertOnes;
/* normalize sub counts per codon */
synSubs = synSubs[0]/3;
nsynSubs = nsynSubs[0]/3;
synSubsAVL[abn] = synSubs;
nsSubsAVL [abn] = nsynSubs;
}
treeAVL = givenTree ^ 0;
synTreeString = PostOrderAVL2StringDistances (treeAVL, synSubsAVL);
nonSynTreeString = PostOrderAVL2StringDistances (treeAVL, nsSubsAVL);
fprintf (stdout, "\nE[Syn subs/nucleotide site] tree: \n\t", synTreeString, "\n");
fprintf (stdout, "\nE[Non-syn subs/nucleotide site] tree: \n\t", nonSynTreeString, "\n");