NP
YaBB Newbies
Offline
Posts: 11
|
Hi everyone,
I run HyPhy to test the likelihood of a tree and an alignment, evolving under two different models of evolution. To test the preference of one model or the other, I use a Likelihood Ratio Test (LRT).
I run it for each site of the alignment, because I am interested in finding out which sites are more likely to evolve under one or the other model of evolution.
One of the models I´ve chosen, as a null model (H0), is JTT, which is a non-site-specific one. The alternative model consist of a site-specific substitution matrix derived from SCPE simulations (a program from G Parisi).
So I find out the LogLikelihood for each site under the two different models (keeping the same alignment and tree) and then I compute the 2delta statistic, being 2delta = 2 * ( Maximum likelihood under H1 - Maximum likelihood under H0 )
I´m trying to run a Parametric Bootstrap procedure in order to generate a distribution of 100 values of 2delta for each site, from which I could assess the validity of the previously computed 2delta.
(Please find below the script I´m currently using).
What I would like to know is how could I run HyPhy in some kind of batch mode, in order to use different SCPE LogLk each time, but the same JTT LogLk computed once. The point is that I have 30 set of different SCPE-derived site-specific matrices for one protein (each set consists of one matrix for each site of the protein), so If I run the following script for each of the 30 sets, each time HyPhy will generate a different distribution of 100 sets of LogLikelihood for each site under JTT. It is problematic, as it requires too much calculation time and it´s useless, provided the JTT matrix and the derived LogLks will be random from the first time I find them, so I wouldn´t need to generate a different random set each tiem (instead of the SCPE LogLikelihoods, which will be specific among the 30 sets).
So I would be very grateful if anyone can give me directions about how to write a script for a Parametric Bootstrap which will allow me to include the existing logLikelihoods in different Parametric Bootstrap rounds.
Thank you very much in advance to all of you, and forgive me for such a long post!
The script for the Parametric Bootstrap (I´ve previously found the logLikelihood for each site under both models, I´m not pasting the script here to keep things focused):
/*100 rounds of ParamBootstrap*/ for (simCounter = 1; simCounter<=100; simCounter = simCounter+1) {
/*initialize the LogLK*/
lkh0=0; lkh1=0;
/*Initialize and repeat the procedure for each site, given #sites=72*/
for(site=0;site<72;site=site+1) {
ExecuteCommands("DataSet simulated"+site+" = SimulateDataSet (lkh0"+site+");"); ExecuteCommands("DataSetFilter classh0s"+site+" = CreateFilter (simulated"+site+",1);"); ExecuteCommands("LikelihoodFunction lkh0"+site+"= (classh0s"+site+",treeh0"+site+");"); ExecuteCommands("Optimize (paramsh0s"+site+", lkh0"+site+");"); ExecuteCommands("fprintf(site-CodJTT-H0.out,paramsh0s"+site+"[1][0],\"\n\");");
ExecuteCommands("DataSetFilter classh1s"+site+" = CreateFilter (simulated"+site+",1);"); ExecuteCommands("LikelihoodFunction lkh1"+site+"= (classh1s"+site+",treeh1"+site+");"); ExecuteCommands("Optimize (paramsh1s"+site+", lkh1"+site+");"); ExecuteCommands("fprintf(site-CodJTT-H1.out,paramsh1s"+site+"[1][0],\"\n\");");
fprintf(site-CodJTT-dist2delta.out,simCounter," ",lkh0," ",lkh1," ", 2*(lkh1-lkh0), "\n");
}
}
|