Welcome, Guest. Please Login
YaBB - Yet another Bulletin Board
 
  HomeHelpSearchLogin  
 
simulation questions (Read 2771 times)
jrm
YaBB Newbies
*
Offline



Posts: 2
simulation questions
Aug 7th, 2013 at 1:32pm
 
Hi;

I would like to simulate data under the codon mixture models of Nielsen and Yang (1998) (M3), but instead of a single transition to transversion ratio parameter, I would like to include exchangability parameters from a GTR model.  I think I would do something like the code below for M0, but what about M3?

Cheers,

Joseph

VERBOSITY_LEVEL = 1;

// the 61x61 rate matrix is Q_ij = { 0 if codons differ at more than 1 position, TC*pi_j for synonymous with T->C, ... omega*GA*pi_j for nonsynonymouse with G->A,... }
// for a given set of parameters the rate matrix is specified like...
GTRRateMatrix     = { {*,0.00234,....,0.00323} ... {0.0000932,...,*} };
// specify frequencies
simFreqs          = {{0.03509833,0.00660730,0.02214928...}};
Model GTR         = (GTRRateMatrix, simFreqs);
Tree fiveTaxaTree = (1:1,2:1):1,3:2,4:2,5:2);

for (i=0; i<100; i++) {
    DataSet simulatedData = Simulate(?);
    fprintf(stdout DataSet, "\n\n");
}

Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: simulation questions
Reply #1 - Aug 7th, 2013 at 1:41pm
 
Hi Joseph,

You probably want something like this (where GeneticCodeExclusions will be set to "TAA,TAG,TGA", and codonCharacters is the length of the simulated alignment)

Code:
codonCharacters = {{"A","C","G","T"}
			  			   {"3",GeneticCodeExclusions,"",""}};

SetDialogPrompt ("Save simulated data to:");
fprintf 		(PROMPT_FOR_FILE, CLEAR_FILE);
save_sims_to     = LAST_FILE_PATH;
IS_TREE_PRESENT_IN_DATA = 1;
DATAFILE_TREE = Format (fiveTaxaTree,1,1);

fprintf (stdout, "\n");

for (it = 0; it < replicates; it += 1) {
    DataSet sim   = Simulate 	(fiveTaxaTree, simFreqs,codonCharacters,codons,0);
    DataSetFilter simFilter = CreateFilter (sim,1);
	fName = save_sims_to + "." + it;
	fprintf 		(fName, CLEAR_FILE,simFilter);
    SetParameter (STATUS_BAR_STATUS_STRING, "Replicate " + (it+1) + " of " + replicates + " generated", 0);
}
 



Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
jrm
YaBB Newbies
*
Offline



Posts: 2
Re: simulation questions
Reply #2 - Aug 9th, 2013 at 12:53pm
 
Hi Sergei;

Thanks for your response.  I'm not quite clear on "codonCharacters is the length of the simulated alignment".  I'm also unclear how to get the mixture of the mixture model.  Wouldn't three rate matrices be required?

Below is what I have now.

Thanks again,

Joseph

VERBOSITY_LEVEL = 1;
GTRRateMatrix =
{
{          *, 0.00660730, 0.02214928, 0.00940329, 0.02401722,...}
...
};
simFreqs = {{0.03509833,0.00660730,0.02214928,0.00940329...}};
Model GTR = (GTRRateMatrix, simFreqs);
Tree fiveTaxaTree = ((1:1,2:1):1,3:2,4:2,5:2);
GeneticCodeExclusions = "TAA,TAG,TGA";
codonCharacters = {{"A","C","G","T"} {"3",GeneticCodeExclusions,"",""}};
SetDialogPrompt("\nSave simulated data to");
fprintf(PROMPT_FOR_FILE, CLEAR_FILE);
save_sims_to = LAST_FILE_PATH;
IS_TREE_PRESENT_IN_DATA = 1;
DATAFILE_TREE = Format(fiveTaxaTree,1,1);
fprintf (stdout, "\n");
for (it=0; it<100; it=it+1) {
    DataSet sim = Simulate(fiveTaxaTree,simFreqs,codonCharacters,codons,0);
    DataSetFilter simFilter = CreateFilter(sim,1);
    fName = save_sims_to + "." + it;
    fprintf(fName,CLEAR_FILE,simFilter);
    /*SetParameter(STATUS_BAR_STATUS_STRING, "Replicate " + (it+1) + " of " + replicates + " generated", 0);*/
    fprintf(stdout, "Replicate ", it+1, " of 100 generated");
}
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: simulation questions
Reply #3 - Aug 12th, 2013 at 4:46pm
 
Hi Joseph,

Sorry, I meant to say 'codons' is the length of the simulated alignment.
For simulating under the mixture model: if you include a category variable (e.g. omega for M3) in your rate matrix, HyPhy will automatically sample from the distribution for the variable and simulate the data accordingly. The M3 category variable definition can be found in TemplateBatchFiles/TemplateModels/Yang200Distributions.def, e.g.

Code:
global P1 = 1/3;
				global P2 = .5;
				P1:<1;
				P2:<1;
				global W1 = .25;
				global R_1 = .5;
				global R_2 = 3;
				R_1:<1;
				R_2:>1;
				categFreqMatrix = {{P1,(1-P1)*P2, (1-P1)*(1-P2)}} ;
				categRateMatrix = {{W1*R_1,W1,W1*R_2}};
				category c = (3, categFreqMatrix , MEAN, ,categRateMatrix, 0, 1e25);
 



And then wherever you would use omega in the model, use 'c' instead.

Alternatively, you can specify three matrices, three trees, and make three separate calls to Simulate to generate the data under each of the model, and then Concatenate them as in

Code:
DataSet sim1   = Simulate 	(fiveTaxaTree1, simFreqs,codonCharacters,codons1,0);
DataSet sim2   = Simulate 	(fiveTaxaTree2, simFreqs,codonCharacters,codons2,0);
DataSet sim3   = Simulate 	(fiveTaxaTree3, simFreqs,codonCharacters,codons3,0);
DataSet joint   = Concatenate 	(sim1,sim2,sim3);

 



Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged