simulated sequences (Read 5173 times)
dmerl
simulated sequences
Apr 29th, 2005 at 2:39pm
 
Hello Sergei - Is there a way to generate a simulated sequence alignment  with HYPHY, so that the sequences are evolved according to one of the underlying substitution models in HYPHY?

many thanks,
Dan
 
Sergei
Re: simulated sequences
Reply #1 - Apr 29th, 2005 at 2:53pm
 
Dear Dan,

Quote:
Hello Sergei - Is there a way to generate a simulated sequence alignment  with HYPHY, so that the sequences are evolved according to one of the underlying substitution models in HYPHY?


Luckily, HyPhy has some of the most powerful simulation tools available in public domain software. Essentially, if one can define a likelihood function (with multiple models, partitions, etc.), a single call to SimulateDataSet (lf) in the batch language will simulate data based on the models and parameter values attached to that likelihood function. For some examples, take a look at Examples/Simulations in the HyPhy distribution. Also look at pages 47 and on in [link not preserved].

Another command is Simulate (if you do not want to define a likelihood function explicitly). Look at [link not preserved] for some detail.

One can do this through the interface as well, using the data viewer and the Data->Simulate menu options. You can look at [link not preserved] (the first 15 pages or so) for an introduction on how to set up analyses in the interface.

HTH,
Sergei
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
 
dmerl
Re: simulated sequences
Reply #2 - May 2nd, 2005 at 1:55pm
 
Thanks as always Sergei - here's another:

If I have fit a variable rates codon model such as MG94xREV_3x4_DualRV_GDD with 3 rate classes to some data, and then simulate an alignment based on this likelihood function, is there a way of finding out which rate classes were used for each site during the generation of the alignment? That is, can I find out which sites were simulated under positive selection?

thanks again,
Dan
 
Sergei
Re: simulated sequences
Reply #3 - May 2nd, 2005 at 2:04pm
 
Dear Dan,

Quote:
If I have fit a variable rates codon model such as MG94xREV_3x4_DualRV_GDD with 3 rate classes to some data, and then simulate an alignment based on this likelihood, is there a way of finding out which rate classes were used for each site during the generation of the alignment?  ie so as to know which sites were simulated under positive selection?



Not through the interface (I should add that actually). However, if you simulate a data set using SimulateDataSet, you can specify optional arguments as in the following example:


DataSet simmedData=SimulateDataSet (lf, "", myRates, myCatVars);


This will store the rates used for simulation in myRates (a 2xN matrix, where N is the number of sites). myCatVars will store the names of the category variables used for simulation. Based on this raw information you can tabulate which sites had dN>dS.
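
The tabulation step can be done outside HyPhy once the rate matrix has been exported. Below is a minimal Python sketch, not part of the HyPhy distribution. It assumes (the post does not say this) that the rows of the 2xN matrix follow the order of the names in myCatVars, and it treats the two stored rates directly as dS and dN, which may need rescaling depending on how your model parameterizes them. The category variable names "c" and "d" and all numbers are hypothetical.

```python
def sites_with_dn_gt_ds(rates, cat_vars, syn_name, nonsyn_name):
    """Return 0-based indices of sites whose simulated dN exceeds dS.

    rates       -- 2 x N nested list, one row per category variable
    cat_vars    -- category variable names, assumed to be in row order
    syn_name    -- name of the variable holding synonymous rates
    nonsyn_name -- name of the variable holding nonsynonymous rates
    """
    if len(rates) != len(cat_vars):
        raise ValueError("expected one row of rates per category variable")
    ds_row = rates[cat_vars.index(syn_name)]
    dn_row = rates[cat_vars.index(nonsyn_name)]
    if len(ds_row) != len(dn_row):
        raise ValueError("rate rows must have the same number of sites")
    return [site for site, (dn, ds) in enumerate(zip(dn_row, ds_row)) if dn > ds]


# Hypothetical values for a 5-site alignment; not output from a real run.
example_cat_vars = ["c", "d"]
example_rates = [
    [1.0, 0.5, 1.0, 2.0, 0.5],  # row for "c": synonymous rate at each site
    [0.2, 1.5, 1.0, 0.1, 3.0],  # row for "d": nonsynonymous rate at each site
]
selected = sites_with_dn_gt_ds(example_rates, example_cat_vars, "c", "d")
print(selected)
```

Sites where the two rates are equal are not reported, since the comparison is strict.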

I'm putting together a set of simulation scripts from the "Not so different..." for someone who requested it; I should put this up later in the afternoon - you may want to look at those for simulation purposes as well.

Best,
Sergei
 
Sergei
Re: simulated sequences
Reply #4 - May 2nd, 2005 at 2:39pm
 
Dear Dan,

Quote:
Thanks as always Sergei - here's another:

If I have fit a variable rates codon model such as MG94xREV_3x4_DualRV_GDD with 3 rate classes to some data, and then simulate an alignment based on this likelihood, is there a way of finding out which rate classes were used for each site during the generation of the alignment?  ie so as to know which sites were simulated under positive selection?



Drop this file [link not preserved] into UserAddIns in the HyPhy directory, relaunch the program, and reload your MG94xREV_3x4_DualRV_GDD (or MG94xREV_3x4_DualRV) fit. Then click on the button with the gearwheels in the console window and choose SimulateAndTabDN_DS. This will simulate a single alignment based on the likelihood function (it will let you choose one if you have multiple), save it to a file, and open a chart window with the dN, dS and dN-dS values used for simulation at each site.

You can look at the source of the file in case you wanna know how it's done.

HTH,
Sergei
 
dmerl
Re: simulated sequences
Reply #5 - May 2nd, 2005 at 3:25pm
 
excellent! thanks!
 
dmerl
Re: simulated sequences
Reply #6 - May 2nd, 2005 at 5:17pm
 

Hi again Sergei,

A slightly different but related question: whenever I save my results after optimizing the likelihood function and attempt to reload the saved results in a different HyPhy session, HyPhy attempts to re-optimize the likelihood function. I see that the batch file generated when I saved my LF contains an Optimize call. Can I just get rid of that line? Will the batch file then reload the previously calculated results? This batch file was generated using "export likelihood function" from the gear menu next to the console.

thanks again,
Dan
 
Sergei
Re: simulated sequences
Reply #7 - May 2nd, 2005 at 5:20pm
 
Dear Dan,

Quote:
 I see in the batch file that was generated when I saved my LF that there is an Optimize call - can I just get rid of that line?    will the batch file reload the previously calculated results?  this batch file was generated using the "export likelihood function" from the gear menu next to the console.


The 'Optimize' call should not be there... you should just remove it. I'll fix the AddIn in the next build.

In the meantime, open AddIns/ExportLikelihoodFunction in a text editor and comment out (using C style comments /* */) the line before last. This will get rid of the Optimize call.
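
For a batch file that was already saved with the Optimize call in it, the same C-style commenting can be applied to the saved file itself. Here is a minimal Python sketch, offered as an illustration only. It assumes the call sits on a single line of the form "Optimize (res, lf);" (the identifiers res and lf and the sample fragment are hypothetical), so check your own exported file before relying on it.

```python
import re

# Assumption: the call occupies one line that starts with "Optimize"
# and ends with ";", for example "Optimize (res, lf);".
OPTIMIZE_CALL = re.compile(r"^\s*Optimize\s*\(.*\)\s*;\s*$")


def comment_out_optimize(batch_text):
    """Wrap every single-line Optimize call in C-style comments."""
    out = []
    for line in batch_text.splitlines():
        if OPTIMIZE_CALL.match(line):
            out.append("/* " + line.strip() + " */")
        else:
            out.append(line)
    return "\n".join(out)


# Hypothetical fragment; "(...)" stands in for whatever the export wrote.
example = "LikelihoodFunction lf = (...);\nOptimize (res, lf);"
print(comment_out_optimize(example))
```

Lines that do not match the pattern are passed through unchanged, so running the script on an already edited file leaves it as it is.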

Cheers,
Sergei