Welcome, Guest. Please Login
YaBB - Yet another Bulletin Board
 
  HomeHelpSearchLogin  
 
"Free-ratios"-like analysis in PAML (Read 4725 times)
avilella
YaBB Newbies
*
Offline


I love YaBB 1G - SP1!

Posts: 35
"Free-ratios"-like analysis in PAML
Jul 26th, 2005 at 6:10am
 
Dears,

I would like to know if it is possible to analyse a 4-species alignment in a similar fashion as the "Free-ratios" model in PAML. Specifically, I would like to specify that each branch has it's own omega ratio.

For what I see, I should use BranchClassDNDS.bf (or maybe SelectionLRT.bf), but I don't see how to prepartition the tree for such condition.

What is the format to prepartition the tree?

Any help would be greatly appreciated,

Bests,

    Albert.
Back to top
 
 
IP Logged
 
Simon
Ex Member


Re: "Free-ratios"-like analysis in PAML
Reply #1 - Jul 26th, 2005 at 8:18am
 
Dear Albert,

You can easily test a free ratio model for a 4 species alignment using the graphical interface. Open the data (using the File menu), select the sequences, generate a partition of these sequences (Data>Selection -> Partition), toggle the partition type to 'Codon', add your tree, choose a codon substitution model, then choose the 'local' option in the 'Parameter type' section. You can then optimize the likelihood function. The nice thing about this approach is that you can then easily perform likelihood ratio tests for differences in dN/dS, etc.

Best
Simon
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: "Free-ratios"-like analysis in PAML
Reply #2 - Jul 26th, 2005 at 9:05am
 
Dear Albert,

To add to what Simon had already said, you can also use the Standard Analysis approach if all you want to do is to fit the 'free-ratio' model (I don't particularly like this description, but it is the accepted one).

Choose Basic Analyses->AnalyzeCodonData.bf and follow the prompts until you get to the model parameter dialog.

Most models (e.g. MG94 and derivatives and GY94) will have the 'Local' option, which you need to select. HyPhy, unlike most other packages is able to fit an arbitrary number of model parameters per branch - take a look at the HyPhy walk-through document [url]http://www.hyphy.org/docs/HyphyDocs.pdf[/url] starting at page 12, for a detailed example in the context of GUI based nucleotide models. In the codon model context this means that dS (paramterized by 'synRate' in most models) and dN ('nonSynRate') will be fitted separately for every branch.

For instance, I used the MG94Custom codon modelwith 010010 model string, i.e. MG94xHKY85, on the p51.nex example file which is included with the distribution (in the 'data' directory).

[b]Global Model[/b] (dN/dS is the same for all branches)

[tt]
______________RESULTS______________
Log Likelihood = -3192.68505802982;
Shared Parameters:
AC=0.148414
R=0.23238
AT=AC=0.148414
CG=AC=0.148414
CT=1=1
GT=AC=0.148414

Tree givenTree=(((D_CD_83_ELI_ACC_K03454:0.0206494,D_CD_83_NDK_ACC_M27323:0.0109048)N
ode3:0.0114611,D_UG_94_94UG114_ACC_U88824:0.0597732)Node2:0.00370895,D_CD_84_84Z
R085_ACC_U88822:0.0313679,(B_US_83_RF_ACC_M17451:0.028637,((B_FR_83_HXB2_ACC_K03
455:0.0123754,B_US_86_JRFL_ACC_U63632:0.0194165)Node11:0.00196883,B_US_90_WEAU16
0_ACC_U21135:0.0228499)Node10:0.00563753)Node8:0.0264147);
[/tt]

If I now go to 'Analyses->Results->View Results, and choose the 'Parameters+Values' option, HyPhy prints out the complete list of model parameters and their MLEs (in the command line version, include -p on the command line when you start HyPhy to call result processing at the end of an analysis):

[b]Local Model (every branch with own dN and dS)[/b]

[tt]
______________RESULTS______________
Log Likelihood = -3180.50428940175;
Shared Parameters:
AC=0.150083
AT=AC=0.150083
CG=AC=0.150083
CT=1=1
GT=AC=0.150083

Tree givenTree=(((D_CD_83_ELI_ACC_K03454:0.0195808,D_CD_83_NDK_ACC_M27323:0.0106406)N
ode3:0.0130999,D_UG_94_94UG114_ACC_U88824:0.0582857)Node2:0.00383832,D_CD_84_84Z
R085_ACC_U88822:0.0298677,(B_US_83_RF_ACC_M17451:0.0289752,((B_FR_83_HXB2_ACC_K0
3455:0.0124571,B_US_86_JRFL_ACC_U63632:0.0191141)Node11:0.00234226,B_US_90_WEAU1
60_ACC_U21135:0.023113)Node10:0.0047989)Node8:0.0283525);
[/tt]

And the 'View Results' output

[tt]
global AC=0.1500833478035055;
givenTree.B_US_90_WEAU160_ACC_U21135.synRate=0.1849469616011374;
givenTree.Node10.synRate=0.0229292435029055;
givenTree.Node8.nonSynRate=0.03246789433625548;
givenTree.Node8.synRate=0.2484940702663311;
givenTree.Node11.synRate=0.02726373716021484;
givenTree.B_US_86_JRFL_ACC_U63632.synRate=0.1275403099620458;
givenTree.D_CD_83_NDK_ACC_M27323.nonSynRate=0.02818187734159924;
givenTree.Node11.nonSynRate=0;
givenTree.Node10.nonSynRate=0.01311409837601478;
givenTree.B_US_90_WEAU160_ACC_U21135.nonSynRate=0.03348708338783461;
givenTree.B_FR_83_HXB2_ACC_K03455.nonSynRate=0.01405335626654316;
givenTree.B_US_86_JRFL_ACC_U63632.nonSynRate=0.03781224086019262;
givenTree.B_US_83_RF_ACC_M17451.nonSynRate=0.04109613927773711;
givenTree.D_CD_84_84ZR085_ACC_U88822.nonSynRate=0.06246557853601946;
givenTree.D_CD_83_ELI_ACC_K03454.synRate=0.08830950094179651;
givenTree.D_CD_83_NDK_ACC_M27323.synRate=0.05309109774992424;
givenTree.D_CD_84_84ZR085_ACC_U88822.synRate=0.1908060683975823;
givenTree.Node3.synRate=0.1424544294182528;
givenTree.D_UG_94_94UG114_ACC_U88824.synRate=0.3941850584375284;
givenTree.Node2.nonSynRate=0.0006161123625948533;
givenTree.Node3.nonSynRate=0.003993412386792515;
givenTree.B_US_83_RF_ACC_M17451.synRate=0.2340765380757457;
givenTree.B_FR_83_HXB2_ACC_K03455.synRate=0.1097111783727254;
givenTree.D_CD_83_ELI_ACC_K03454.nonSynRate=0.05559894940886865;
givenTree.D_UG_94_94UG114_ACC_U88824.nonSynRate=0.1132037804037879;
givenTree.Node2.synRate=0.04313067425617549;
global AT:=AC;
global CG:=AC;
global CT:=1;
global GT:=AC;
[/tt]

Each branch now has two paramters, and compute the dN/dS for a branch you simply divide the two appropriate estimates. Note, that this parameterization allows for dS to be 0, and dN>0, which is the infinite ratio (and can lead to numerical issues) in PAML.

You can also call Analysis->Results->Syn and nonsyn trees to get the trees scaled on E[syn subs] and E[non syn subs].

Finally you can use the GUI (as described in the tutorial, pp13-15), to draw trees on dN/dS and do LRT as Simon suggested.

Cheers,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
avilella
YaBB Newbies
*
Offline


I love YaBB 1G - SP1!

Posts: 35
Re: "Free-ratios"-like analysis in PAML
Reply #3 - Jul 26th, 2005 at 10:10am
 
My datasets are typically of very long seqs (MBs) for few species, and usually the runs of the analyses take a lot of time (hours/days).

If I can pre-set the parameters of the analysis in a params.txt file, then I can batch the jobs to a server computer and run them like:

./HYPHYMP < params.txt > stdout.txt

In my previous tests, I couldn't find a way to "plug" the results of the batch part to the "Analysis->Results" part, in the GUI, that you describe.

Is this possible?

Then, ideally, I would be able to have the longest part of the analysis done as a batch job and then use the resulting data in the GUI for the "Analysis->Results" part.

Thanks in advance,

    Albert.
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: "Free-ratios"-like analysis in PAML
Reply #4 - Jul 26th, 2005 at 11:19am
 
Quote:
./HYPHYMP < params.txt > stdout.txt

In my previous tests, I couldn't find a way to "plug" the results of the batch part to the "Analysis->Results" part, in the GUI, that you describe. 

Is this possible?


This is indeed possible.

1). Add the following lines to the end of params.txt (square bracketed stuff are my comments and should not be included in the file)


y [In response to "Continue with result processing?"]
2 [Save Results]
7 [Save all the components of the analysis in a self-contained HyPhy batch file.]
File Name to Store Results in
n [In response to "Continue with result processing?"]


2). Call ./HYPHYMPU CPU=xx -p <params.txt > stdout.thx

where xx is the number of processors on your system.

3). Open the result file made by the first step (with the name you had specified) in the GUI version (File->Open->Open Batch File) and use the Result processing modules.

HTH,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged