HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Methodology Questions >> How to >> simulation questions
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1375907542

Message started by jrm on Aug 7th, 2013 at 1:32pm

Title: simulation questions
Post by jrm on Aug 7th, 2013 at 1:32pm
Hi;

I would like to simulate data under the codon mixture models of Nielsen and Yang (1998) (M3), but instead of a single transition to transversion ratio parameter, I would like to include exchangability parameters from a GTR model.  I think I would do something like the code below for M0, but what about M3?

Cheers,

Joseph

VERBOSITY_LEVEL = 1;

// the 61x61 rate matrix is Q_ij = { 0 if codons differ at more than 1 position, TC*pi_j for synonymous with T->C, ... omega*GA*pi_j for nonsynonymouse with G->A,... }
// for a given set of parameters the rate matrix is specified like...
GTRRateMatrix     = { {*,0.00234,....,0.00323} ... {0.0000932,...,*} };
// specify frequencies
simFreqs          = {{0.03509833,0.00660730,0.02214928...}};
Model GTR         = (GTRRateMatrix, simFreqs);
Tree fiveTaxaTree = (1:1,2:1):1,3:2,4:2,5:2);

for (i=0; i<100; i++) {
    DataSet simulatedData = Simulate(?);
    fprintf(stdout DataSet, "\n\n");
}


Title: Re: simulation questions
Post by Sergei on Aug 7th, 2013 at 1:41pm
Hi Joseph,

You probably want something like this (where GeneticCodeExclusions will be set to "TAA,TAG,TGA", and codonCharacters is the length of the simulated alignment)

[code]
codonCharacters = {{"A","C","G","T"}
                                       {"3",GeneticCodeExclusions,"",""}};
                                       
SetDialogPrompt ("Save simulated data to:");
fprintf            (PROMPT_FOR_FILE, CLEAR_FILE);
save_sims_to     = LAST_FILE_PATH;
IS_TREE_PRESENT_IN_DATA = 1;
DATAFILE_TREE = Format (fiveTaxaTree,1,1);

fprintf (stdout, "\n");

for (it = 0; it < replicates; it += 1) {             
    DataSet sim   = Simulate      (fiveTaxaTree, simFreqs,codonCharacters,codons,0);
    DataSetFilter simFilter = CreateFilter (sim,1);      
     fName = save_sims_to + "." + it;
     fprintf            (fName, CLEAR_FILE,simFilter);
    SetParameter (STATUS_BAR_STATUS_STRING, "Replicate " + (it+1) + " of " + replicates + " generated", 0);
}            
[/code]

Sergei

Title: Re: simulation questions
Post by jrm on Aug 9th, 2013 at 12:53pm
Hi Sergei;

Thanks for your response.  I'm not quite clear on "codonCharacters is the length of the simulated alignment".  I'm also unclear how to get the mixture of the mixture model.  Wouldn't three rate matrices be required?

Below is what I have now.

Thanks again,

Joseph

VERBOSITY_LEVEL = 1;
GTRRateMatrix =
{
{          *, 0.00660730, 0.02214928, 0.00940329, 0.02401722,...}
...
};
simFreqs = {{0.03509833,0.00660730,0.02214928,0.00940329...}};
Model GTR = (GTRRateMatrix, simFreqs);
Tree fiveTaxaTree = ((1:1,2:1):1,3:2,4:2,5:2);
GeneticCodeExclusions = "TAA,TAG,TGA";
codonCharacters = {{"A","C","G","T"} {"3",GeneticCodeExclusions,"",""}};
SetDialogPrompt("\nSave simulated data to");
fprintf(PROMPT_FOR_FILE, CLEAR_FILE);
save_sims_to = LAST_FILE_PATH;
IS_TREE_PRESENT_IN_DATA = 1;
DATAFILE_TREE = Format(fiveTaxaTree,1,1);
fprintf (stdout, "\n");
for (it=0; it<100; it=it+1) {
    DataSet sim = Simulate(fiveTaxaTree,simFreqs,codonCharacters,codons,0);
    DataSetFilter simFilter = CreateFilter(sim,1);
    fName = save_sims_to + "." + it;
    fprintf(fName,CLEAR_FILE,simFilter);
    /*SetParameter(STATUS_BAR_STATUS_STRING, "Replicate " + (it+1) + " of " + replicates + " generated", 0);*/
    fprintf(stdout, "Replicate ", it+1, " of 100 generated");
}

Title: Re: simulation questions
Post by Sergei on Aug 12th, 2013 at 4:46pm
Hi Joseph,

Sorry, I meant to say 'codons' is the length of the simulated alignment.
For simulating under the mixture model: if you include a category variable (e.g. omega for M3) in your rate matrix, HyPhy will automatically sample from the distribution for the variable and simulate the data accordingly. The M3 category variable definition can be found in TemplateBatchFiles/TemplateModels/Yang200Distributions.def, e.g.


Code (]
global P1 = 1/3;
                       global P2 = .5;
                       P1:<1;
                       P2:<1;
                       global W1 = .25;
                       global R_1 = .5;
                       global R_2 = 3;
                       R_1:<1;
                       R_2:>1;
                       categFreqMatrix = {{P1,(1-P1)*P2, (1-P1)*(1-P2)}} ;
                       categRateMatrix = {{W1*R_1,W1,W1*R_2}};
                       category c = (3, categFreqMatrix , MEAN, ,categRateMatrix, 0, 1e25);            
[/code):

And then wherever you would use omega in the model, use 'c' instead.

Alternatively, you can specify three matrices, three trees, and make three separate calls to Simulate to generate the data under each of the model, and then Concatenate them as in

[code]

DataSet sim1   = Simulate      (fiveTaxaTree1, simFreqs,codonCharacters,codons1,0);
DataSet sim2   = Simulate      (fiveTaxaTree2, simFreqs,codonCharacters,codons2,0);
DataSet sim3   = Simulate      (fiveTaxaTree3, simFreqs,codonCharacters,codons3,0);
DataSet joint   = Concatenate      (sim1,sim2,sim3);


Sergei

HyPhy message board » Powered by YaBB 2.5.2!
YaBB Forum Software © 2000-2024. All Rights Reserved.