Simulating Sequences Under Selection
Sergei
UCSD
May 3rd, 2005 at 11:23am
 
Greetings,

I have posted a set of analyses and example simulations to evaluate statistical properties of tests for selection as discussed in Kosakovsky Pond and Frost 2005 (Mol. Biol. and Evolution 22(5):1208-1222).

The archive can be downloaded from [link] (5MB).

The README file is attached:


There are three kinds of HyPhy analyses (batch files) included in this distribution:

1). Files used to simulate data under MG94xREV models of evolution with variable dS and dN
across sites. The main file is Simulator/CodonSimulator.bf, which is configured using
simulation.config in the same directory; see the comments in simulation.config for the available
user options. Open CodonSimulator.bf in HyPhy as a batch file to generate codon data.
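To illustrate what "variable dS and dN across sites" means, here is a minimal Python sketch (not part of the archive) that gives each simulated site a (dS, dN) pair drawn from M synonymous and N non-synonymous rate classes with joint weights. The function name `draw_site_rates`, the class values and the weights are all hypothetical; CodonSimulator.bf takes its real settings from simulation.config, and its sampling scheme may differ.

```python
import random

def draw_site_rates(ds_classes, dn_classes, weights, n_sites, seed=None):
    """Draw one (dS, dN) pair per site from a discrete joint distribution.

    ds_classes: M synonymous rates
    dn_classes: N non-synonymous rates
    weights: M x N matrix; weights[i][j] is the probability of (ds_classes[i], dn_classes[j])
    """
    rng = random.Random(seed)
    # enumerate pairs in the same row-major order as the flattened weight matrix
    pairs = [(ds, dn) for ds in ds_classes for dn in dn_classes]
    flat = [w for row in weights for w in row]
    if abs(sum(flat) - 1.0) > 1e-9:
        raise ValueError("joint class weights must sum to 1")
    return rng.choices(pairs, weights=flat, k=n_sites)

# Hypothetical 2 x 3 setup: two dS classes, three dN classes
rates = draw_site_rates([0.5, 1.5], [0.1, 1.0, 3.0],
                        [[0.2, 0.2, 0.1], [0.2, 0.2, 0.1]],
                        n_sites=10, seed=1)
for ds, dn in rates:
    print(f"dS={ds:.1f} dN={dn:.1f} dN-dS={dn - ds:+.1f}")
```

Sites drawn from a class with dN > dS play the role of positively selected sites in the later ROC analyses.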

2). The analysis files are located in the Analysis directory. All of the analyses
REQUIRE an MPI environment. Follow these instructions exactly for the analyses
to run properly.

     i).  cd to the dNdSSimulator directory
     ii). Invoke the analysis you want to run as follows (use the appropriate syntax for your MPI distribution),
            replacing HYPHY_BASE with the full path to the HYPHY distribution directory
            (e.g. /home/sergei/HYPHY/):
           
           $ mpirun -np 16 HYPHY_BASE/HYPHYMPI BASEPATH=HYPHY_BASE Analysis/fileToRun.bf
           
     iii). Follow the prompts for each analysis as needed.
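If you drive the analyses from a script, a small helper can assemble the command line from step ii. The sketch below is hypothetical and not included in the distribution. It only builds the argument list; `mpirun` and `-np` follow the example above and may need adjusting for your MPI distribution.

```python
import os
import shlex

def hyphy_mpi_command(hyphy_base, batch_file, n_procs=16, mpi_optimizer=False):
    """Return the argument list for launching a HyPhy batch file under MPI."""
    args = [
        "mpirun", "-np", str(n_procs),
        os.path.join(hyphy_base, "HYPHYMPI"),  # HYPHY_BASE/HYPHYMPI
        f"BASEPATH={hyphy_base}",
    ]
    if mpi_optimizer:
        # REL_MPI.bf is invoked with the extra MPIOPTIMIZER argument
        args.append("MPIOPTIMIZER")
    args.append(batch_file)
    return args

cmd = hyphy_mpi_command("/home/sergei/HYPHY/", "Analysis/SLAC_MPI.bf")
print(shlex.join(cmd))
```

The list can then be passed to `subprocess.run(cmd, check=True, cwd=...)`, with `cwd` pointing at the dNdSSimulator directory as step i requires.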

     (A). SLAC_MPI.bf and WAC_MPI.bf process data generated with CodonSimulator.bf using the SLAC
     or WAC methods, respectively. For each sample file sample.xx they generate sample.nucfit.xx (nucleotide model
     fits) and sample.results.xx (tables of quantities inferred by the algorithm, suitable for processing
     with the files discussed below). The analyses also generate a sample.dNdS file, later used by FEL_MPI.bf.
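The naming convention above can be captured in a short sketch, for instance to check that an SLAC or WAC run produced everything the later steps expect. It assumes that the xx suffix is a numeric replicate index (as in sample.results.0) and that a single sample.dNdS file is written per set of samples. The function name `expected_outputs` is hypothetical.

```python
import re

def expected_outputs(sample_file):
    """Map an input such as 'sample.7' to the output names described for SLAC_MPI.bf / WAC_MPI.bf."""
    match = re.fullmatch(r"(?P<stem>.+)\.(?P<idx>\d+)", sample_file)
    if match is None:
        raise ValueError(f"expected <stem>.<index>, got {sample_file!r}")
    stem, idx = match["stem"], match["idx"]
    return {
        "nucfit": f"{stem}.nucfit.{idx}",    # nucleotide model fit
        "results": f"{stem}.results.{idx}",  # per-site quantities for the result processors
        "dNdS": f"{stem}.dNdS",              # assumed shared across replicates; read by FEL_MPI.bf
    }

print(expected_outputs("sample.0"))
```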
     
     (B). FEL_MPI.bf runs the FEL method on simulated data. It requires the files generated by SLAC_MPI.bf
     and prompts for the location of the sample.dNdS file written earlier by SLAC_MPI.bf.
     
     (C). REL_MPI.bf runs the REL method on simulated data. It should be invoked as follows (adjust MPI settings
     as needed: the analysis requires MxN+1 processors, where M is the number of synonymous rate classes
     and N is the number of non-synonymous rate classes):
     
           $ mpirun -np 16 HYPHY_BASE/HYPHYMPI BASEPATH=HYPHY_BASE MPIOPTIMIZER Analysis/REL_MPI.bf
           
     The analysis outputs sample.marginals.xx and sample.bayesFactors.xx files, which are used later by the
     result processors, as well as sample.fullfit.xx files. A sample.fullfit.xx file contains a self-contained
     likelihood function with parameter MLEs in NEXUS format and can be opened in HyPhy as a batch file if you
     want to explore the fit of the model to a particular data file.
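The processor requirement is simple arithmetic: with M synonymous and N non-synonymous rate classes, REL_MPI.bf needs MxN+1 processes, so `-np 16` corresponds to, for instance, M = 3 and N = 5. A hypothetical helper for picking the `-np` value:

```python
def rel_processor_count(m_syn, n_nonsyn):
    """Number of MPI processes REL_MPI.bf needs: M x N + 1."""
    if m_syn < 1 or n_nonsyn < 1:
        raise ValueError("need at least one synonymous and one non-synonymous rate class")
    # one process per (dS, dN) rate-class combination, plus one more
    return m_syn * n_nonsyn + 1

print(rel_processor_count(3, 5))  # 16
```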
           
           
3). Result processing modules. These batch files require a GUI version of HyPhy to work properly. All of them use
     the files output by the analyses in (2) and a configuration file, 'rateProfile.config', which must contain the 'omega'
     matrix used for the simulations, as defined in 'simulation.config'. If rateProfile.config does not use the same
     omegas that were used to generate the data, the processed results will be invalid.
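Because mismatched omegas invalidate the results without any error, it can be worth comparing the two matrices before processing. The sketch below assumes both matrices have already been read into lists of numeric rows. Parsing simulation.config and rateProfile.config is not shown because their format is not described here, and `omegas_match` is a hypothetical name.

```python
import math

def omegas_match(sim_omegas, profile_omegas, rel_tol=1e-9):
    """True if two omega matrices (lists of numeric rows) agree entry by entry."""
    if len(sim_omegas) != len(profile_omegas):
        return False
    for sim_row, profile_row in zip(sim_omegas, profile_omegas):
        if len(sim_row) != len(profile_row):
            return False
        if not all(math.isclose(a, b, rel_tol=rel_tol) for a, b in zip(sim_row, profile_row)):
            return False
    return True

# Hypothetical matrices, assumed already parsed from the two configuration files
from_simulation = [[0.5, 1.0], [1.0, 2.0]]
from_profile = [[0.5, 1.0], [1.0, 2.5]]
if not omegas_match(from_simulation, from_profile):
    print("rateProfile.config does not match simulation.config; results would be invalid")
```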

     (A). SLAC_ROC.bf prompts for the sample.results.0 file and generates ROC curves for detecting positively
     and negatively selected sites, and tabulates Type I and Type II errors for a given nominal p-value.
     SLAC_Rates.bf tabulates averages of inferred dN-dS at each site and compares them to the values used to simulate
     the data.
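For readers unfamiliar with these summaries, here is one way to compute Type I and Type II error rates at a nominal p-value, and the ROC points obtained by sweeping that threshold. The sketch assumes per-site p-values plus a per-site truth label taken from the simulation (for example, True where the simulated dN exceeds dS when testing for positive selection). SLAC_ROC.bf may define these quantities differently, for instance in how neutral sites are counted.

```python
def error_rates(p_values, is_selected, alpha):
    """Return (Type I error rate, Type II error rate) when sites with p <= alpha are called."""
    called = [p <= alpha for p in p_values]
    # calls at sites that were NOT simulated under selection
    null_calls = [c for c, truth in zip(called, is_selected) if not truth]
    # calls at sites that WERE simulated under selection
    alt_calls = [c for c, truth in zip(called, is_selected) if truth]
    type1 = sum(null_calls) / len(null_calls) if null_calls else float("nan")
    type2 = alt_calls.count(False) / len(alt_calls) if alt_calls else float("nan")
    return type1, type2


def roc_points(p_values, is_selected):
    """(false positive rate, true positive rate) pairs, one per distinct p-value threshold."""
    points = [(0.0, 0.0)]
    for alpha in sorted(set(p_values)):
        type1, type2 = error_rates(p_values, is_selected, alpha)
        points.append((type1, 1.0 - type2))
    return points


# Hypothetical data: five sites, two of them simulated with dN > dS
p_values = [0.01, 0.20, 0.03, 0.60, 0.04]
is_selected = [True, False, True, False, False]
print(error_rates(p_values, is_selected, alpha=0.05))
print(roc_points(p_values, is_selected))
```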
     
     (B). FEL_ROC.bf prompts for the sample.fel.0 file and generates ROC curves for detecting positively
     and negatively selected sites, and tabulates Type I and Type II errors for a given nominal p-value.

     (C). REL_ROC.bf prompts for the sample.results.0 file (the output of SLAC_MPI.bf) and generates ROC curves for detecting positively
     and negatively selected sites, and tabulates Type I and Type II errors for a given nominal Bayes factor.
     It requires the sample.marginals.xx files to be present in the same directory as the sample.results.xx files.
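A site-level Bayes factor is commonly defined as the posterior odds of dN > dS divided by the prior odds. The sketch below computes that ratio and flags sites at a nominal threshold. It assumes you already have each site's posterior probability of dN > dS (for example, summed from the per-class marginals) and the prior probability (the total weight of rate classes with dN > dS). Whether REL_MPI.bf writes exactly this quantity to sample.bayesFactors.xx is an assumption, and the function names are hypothetical.

```python
def bayes_factor(posterior, prior):
    """Posterior odds of dN > dS at a site divided by the prior odds."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior probability must lie strictly between 0 and 1")
    if posterior >= 1.0:
        return float("inf")
    posterior_odds = posterior / (1.0 - posterior)
    prior_odds = prior / (1.0 - prior)
    return posterior_odds / prior_odds


def call_sites(posteriors, prior, threshold):
    """Flag sites whose Bayes factor reaches the nominal threshold."""
    return [bayes_factor(p, prior) >= threshold for p in posteriors]


# Hypothetical values: prior P(dN > dS) = 0.2, nominal Bayes factor 10
posteriors = [0.1, 0.5, 0.8]
print([round(bayes_factor(p, 0.2), 3) for p in posteriors])
print(call_sites(posteriors, prior=0.2, threshold=10))
```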
     
     (D). Integrative_Rates.bf tabulates averages of inferred dN-dS for all three methods at each site and compares them to
     the values used to simulate the data.
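The comparison of averaged estimates against simulated values can be sketched as follows. The sketch assumes the inferred dN-dS values have been collected into one list per replicate (one entry per site) and that the true per-site dN-dS values are known from simulation.config. The summary statistics (per-site bias and its root mean square) are illustrative choices, not necessarily what Integrative_Rates.bf reports.

```python
from statistics import fmean

def per_site_summary(inferred, true_values):
    """Average inferred dN-dS over replicates at each site and compare with the simulated values.

    inferred: list of replicates, each a list with one dN-dS estimate per site
    true_values: dN-dS used to simulate each site
    Returns (per-site means, per-site bias, root mean square of the bias).
    """
    n_sites = len(true_values)
    if not inferred or any(len(rep) != n_sites for rep in inferred):
        raise ValueError("need at least one replicate with one estimate per site")
    means = [fmean(rep[i] for rep in inferred) for i in range(n_sites)]
    bias = [m - t for m, t in zip(means, true_values)]
    rms_bias = fmean(b * b for b in bias) ** 0.5
    return means, bias, rms_bias


# Hypothetical: two replicates of a three-site alignment
means, bias, rms_bias = per_site_summary([[0.0, 1.0, -0.5], [0.2, 1.4, -0.3]],
                                         [0.0, 1.0, -0.5])
print(means, bias, rms_bias)
```

Running the same summary on each method's estimates gives a simple side-by-side comparison of SLAC, FEL and REL.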
     
     You can try the result processing files on the result files included in the 'ExampleSimulations' directory.
     
     
For further information contact help at datamonkey dot org.

 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego