Welcome, Guest. Please Login
YaBB - Yet another Bulletin Board
 
  HomeHelpSearchLogin  
 
Parametric Bootstrap using existing values (Read 1619 times)
NP
YaBB Newbies
*
Offline



Posts: 11
Parametric Bootstrap using existing values
Mar 15th, 2007 at 8:13am
 
Hi everyone,

I run HyPhy to test the likelihood of a tree and an alignment, evolving under two different models of evolution. To test the preference of one model or the other, I use a Likelihood Ratio Test (LRT).

I run it for each site of the alignment, because I am interested in finding out which sites are more likely to evolve under one or the other model of evolution.

One of the models I´ve chosen, as a null model (H0), is JTT, which is a non-site-specific one. The alternative model consist of a site-specific substitution matrix derived from SCPE simulations (a program from G Parisi).

So I find out the LogLikelihood for each site under the two different models (keeping the same alignment and tree) and then I compute the 2delta statistic, being
2delta = 2 * ( Maximum likelihood under H1 - Maximum likelihood under H0 )

I´m trying to run a Parametric Bootstrap procedure in order to generate a distribution of 100 values of 2delta for each site, from which I could assess the validity of the previously computed 2delta.

(Please find below the script I´m currently using).

What I would like to know is how could I run HyPhy in some kind of batch mode, in order to use different SCPE LogLk each time, but the same JTT LogLk computed once. The point is that I have 30 set of different SCPE-derived site-specific matrices for one protein (each set consists of one matrix for each site of the protein), so If I run the following script for each of the 30 sets,  each time HyPhy will generate a different distribution of 100 sets of LogLikelihood for each site under JTT. It is problematic, as it requires too much calculation time and it´s useless, provided the JTT matrix and the derived LogLks will be random from the first time I find them, so I wouldn´t need to generate a different random set each tiem (instead of the SCPE LogLikelihoods, which will be specific among the 30 sets).

So I would be very grateful if anyone can give me directions about how to write a script for a Parametric Bootstrap which will allow me to include the existing logLikelihoods in different Parametric Bootstrap rounds.

Thank you very much in advance to all of you, and forgive me for such a long post!


The script for the Parametric Bootstrap (I´ve previously found the logLikelihood for each site under both models, I´m not pasting the script here to keep things focused):

/*100 rounds of ParamBootstrap*/
for (simCounter = 1; simCounter<=100; simCounter = simCounter+1)
{

/*initialize the LogLK*/

    lkh0=0;
    lkh1=0;

/*Initialize and repeat the procedure for each site, given #sites=72*/

    for(site=0;site<72;site=site+1) {

       ExecuteCommands("DataSet simulated"+site+" = SimulateDataSet (lkh0"+site+");");
    
       ExecuteCommands("DataSetFilter classh0s"+site+" = CreateFilter (simulated"+site+",1);");
       ExecuteCommands("LikelihoodFunction lkh0"+site+"= (classh0s"+site+",treeh0"+site+");");
       ExecuteCommands("Optimize (paramsh0s"+site+", lkh0"+site+");");
       ExecuteCommands("fprintf(site-CodJTT-H0.out,paramsh0s"+site+"[1][0],\"\n\");");

       ExecuteCommands("DataSetFilter classh1s"+site+" = CreateFilter (simulated"+site+",1);");
       ExecuteCommands("LikelihoodFunction lkh1"+site+"= (classh1s"+site+",treeh1"+site+");");
       ExecuteCommands("Optimize (paramsh1s"+site+", lkh1"+site+");");
       ExecuteCommands("fprintf(site-CodJTT-H1.out,paramsh1s"+site+"[1][0],\"\n\");");

    fprintf(site-CodJTT-dist2delta.out,simCounter," ",lkh0," ",lkh1," ", 2*(lkh1-lkh0), "\n");

    }

}
Back to top
 
 
IP Logged
 
Art Poon
Global Moderator
*****
Offline


Feed your monkey!

Posts: 0
Re: Parametric Bootstrap using existing values
Reply #1 - Mar 23rd, 2007 at 1:53pm
 
Dear NPalopoli,

Sorry for the delay in responding to your message!

If I understand your problem correctly, then you're doing a parametric bootstrap over your null model (JTT) for a given site, so that each of 30 alternative "SCPE" models is evaluated for each column of a simulated data set.

What's wrong with nesting a loop over the various SCPE models within the bootstrap replication loop?  Also, why not move SimulateDataSet outside of the loop over sites?  You can always filter it for each site later.  Finally, you probably want to be using the tree estimated from the entire alignment, rather than using a site-specific tree. 

I'm appending a modified version of your script.  One of the issues that comes up is re-specifying the model when it has already been linked to the tree, which I decided to do by defining a generalized model matrix with parameters that could subsequently be constrained to JTT or SCPE values.  I'm assuming that your SCPE matrices are fully reversible with the maximum number of parameters:



/* populate general reversible amino acid substitution matrix with global variables */
rev_matrix = {20,20};
global            t = 1;

for (aa1 = 0; aa1 < 20; aa1 = aa1 + 1)
{
     for (aa2 = aa1 + 1; aa2 < 20; aa2 = aa2 + 1)
     {
           if (aa1 != aa2)
           {
                 if (aa1 == 0 && aa2 == 1)
                 {
                       rev_matrix[0][1]:=t;
                       rev_matrix[1][0]:=t;
                 }
                 else
                 {
                       ExecuteCommands ("global gv_"+aa1+"_"+aa2+" = 1;");
                       ExecuteCommands ("rev_matrix["+aa1+"]["+aa2+"] := gv_"+aa1+"_"+aa2+" * t;")
                       ExecuteCommands ("rev_matrix["+aa2+"]["+aa1+"] := gv_"+aa1+"_"+aa2+" * t;")
                 }
           }
     }
}


     

function constrain_to_JTT (noarg)
{
     /* set global variables to JTT values here */
}

function constrain_to_SCPE (mode)
{
     /* set global variables to SCPE values according to mode (which of 30 matrices) */
}


/*100 rounds of ParamBootstrap*/
for (simCounter = 1; simCounter<=100; simCounter = simCounter+1)
{
     DataSet                  sim      = SimulateDataSet (lkh0);
     DataSetFilter      temp      = CreateFilter (sim, 1);
     
     HarvestFrequencies (obs_freqs, temp, 1, 1, 0);
     
     Model                  rev_mod = (rev_matrix, obs_freqs);
     
     for (site = 0; site < 72; site = site+1)
     {
           DataSetFilter      dsf = CreateFilter (sim, 1, siteIndex == site);
           
           constrain_to_JTT (0);
           
           LikelihoodFunction      lf = (dsf, tree);
           Optimize (res0, lkh0);
           
           fprintf (site-CodJTT_H0.out, simCounter, "\t", site, "\t", res0[1][0], "\n");
           
           for (mod = 0; mod < 30; mod = mod+1)
           {
                 constrain_to_SCPE(mod);
                 Optimize(res, lkh0);
                 
                 fprintf (site-CodJTT-H1.out, simCounter, "\t", site, "\t", mod, "\t", res[1][0], "\t", 2*(res[1][0]-res0[1][0], "\n");
           }
     }
}



I hope I've interpreted your problem correctly!
Best,
- Art.
Back to top
 
 
IP Logged
 
NP
YaBB Newbies
*
Offline



Posts: 11
Re: Parametric Bootstrap using existing values
Reply #2 - Mar 23rd, 2007 at 4:03pm
 
Dear Art,
Thank you very much for your answer! It arrives much sooner than what I could have expected.
Inspecting your script, I think you´ve got the answer to my problem! I still have to finish my runs to know if the script is fulfilling my needs. I am quite a newbie with HyPhy, so I would need some time to learn all the commands.
Thank you very much once again.
Nicolas.-
Back to top
 
 
IP Logged