|
Description
|
This example demonstrates how to use a codon data set consisting of
several partitions with different functional constraints to compare selective pressures among the partitions.
This example is very similar to some of the tests performed by
Yang and Swanson (2002).
The logic of the test is quite straightforward:
- Start with a codon data set with two (or more) classes of sites (e.g. buried and exposed sites).
- Fit a codon-substitution model independently to each partition (alternative hypothesis). The user has great flexibility here in terms
of which model parameters are estimated independently (e.g. base frequencies, branch lengths, nucleotide bias rates, etc.).
- Fit the same model again, but force the dN/dS ratio to be the same in all partitions (null hypothesis).
- Test whether the dN/dS ratios differ significantly by applying a likelihood ratio test (LRT) to the two models.
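The final step compares twice the log-likelihood difference between the two fits against a chi-square distribution whose degrees of freedom equal the number of extra free parameters in the alternative model (in this tutorial, 1: two dN/dS ratios versus one shared ratio). The Python sketch below computes that statistic and its asymptotic p-value using only the standard library. The function names are hypothetical helpers for illustration, not part of HyPhy; the chi-square tail is evaluated with the closed forms of the regularized upper incomplete gamma function, which hold for integer degrees of freedom only.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df >= 1."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(k, y) = exp(-y) * sum_{j=0}^{k-1} y^j / j!, with k = df / 2.
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= y / j
            total += term
        return math.exp(-y) * total
    # Odd df: Q(n + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{j=0}^{n-1} y^(j + 1/2) / Gamma(j + 3/2).
    total = math.erfc(math.sqrt(y))
    for j in range(df // 2):
        total += math.exp(-y) * y ** (j + 0.5) / math.gamma(j + 1.5)
    return total


def likelihood_ratio_test(lnl_null, lnl_alt, n_params_null, n_params_alt):
    """Return (2*LR, degrees of freedom, asymptotic p-value) for two nested models."""
    stat = 2.0 * (lnl_alt - lnl_null)
    df = n_params_alt - n_params_null
    if df < 1:
        raise ValueError("the alternative model must have more free parameters")
    return stat, df, chi2_sf(stat, df)


# 2*LR and DF as reported by HyPhy for the lysin example in this tutorial.
print(chi2_sf(23.5078, 1))
```

For the values HyPhy reports in this tutorial (2*LR = 23.5078, DF = 1), `chi2_sf` gives approximately 1.244e-06, in line with HyPhy's printed P-Value.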
|
|
Load and set up an example data set.
|
We will use the lysin data set analyzed by Swanson and Yang.
The data file is in NEXUS format and includes partitioning information in an
ASSUMPTIONS block. The example file can be obtained
here (you can also load
your own data file and define the partition using one of the many tools in the
HyPhy data panel).
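Partitions in a NEXUS ASSUMPTIONS block are typically declared with CHARSET statements that list alignment columns. As a rough illustration of how such a line maps to sites, the Python sketch below parses a simplified CHARSET line (single sites, ranges, and an optional backslash step suffix only) and then maps nucleotide positions to codons, assuming, as in standard NEXUS, that positions count nucleotide columns starting at 1. The helper names and the partition in the example are hypothetical, not taken from the lysin file, and HyPhy's own NEXUS reader accepts more syntax than this.

```python
import re


def parse_charset(line):
    r"""Parse a simplified NEXUS line such as 'CHARSET buried = 1-6 13-15;'.

    Supports single sites ('12'), ranges ('1-6') and stepped ranges ('3-12\3').
    Returns (name, sorted list of 1-based nucleotide positions).
    """
    match = re.match(r"\s*CHARSET\s+(\w+)\s*=\s*(.*?);", line, re.IGNORECASE)
    if match is None:
        raise ValueError("not a CHARSET statement: " + line)
    name, body = match.group(1), match.group(2)
    sites = set()
    for token in body.split():
        step = 1
        if "\\" in token:
            # Stepped range, e.g. 3-12\3 keeps every third position.
            token, step_text = token.split("\\")
            step = int(step_text)
        if "-" in token:
            start, end = (int(part) for part in token.split("-"))
        else:
            start = end = int(token)
        sites.update(range(start, end + 1, step))
    return name, sorted(sites)


def codon_sites(nucleotide_sites):
    """Map 1-based nucleotide positions to the 1-based codons that contain them."""
    return sorted({(site - 1) // 3 + 1 for site in nucleotide_sites})


# Hypothetical partition, for illustration only.
name, sites = parse_charset("CHARSET buried = 1-6 13-15;")
print(name, codon_sites(sites))
```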
Save this example
file to your hard drive, open it in HyPhy using File:Open:Open Data File, then
change the model settings to match the picture below.
We use the Muse-Gaut 94 (MG94) model with two additional nucleotide bias parameters that correct for
transition/transversion bias and allow unequal rates of A↔G and C↔T transitions.
Note the yellow light in the status bar:
HyPhy is now ready to build the likelihood function for the alternative hypothesis, in which all model parameters
(branch lengths, base frequencies, dN/dS, and both nucleotide bias rates) are estimated independently
for each partition.
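To make the model choice more concrete, here is a minimal Python sketch of an MG94-style off-diagonal rate between two sense codons: zero unless they differ at exactly one position, otherwise a nucleotide bias term times the frequency of the target nucleotide at that codon position, further multiplied by dN/dS (omega) when the change is nonsynonymous. The bias parameterization is an assumption chosen to match the description above (transversions fixed at 1, hypothetical parameters `r_ag` and `r_ct` for A↔G and C↔T), as is the use of position-specific nucleotide frequencies; HyPhy's own parameter names and frequency options may differ. Diagonal entries and rate normalization are omitted.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code; codons enumerated in TCAG order at each of the three positions.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
GENETIC_CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def nucleotide_bias(x, y, r_ag, r_ct):
    """Assumed bias: transversions have rate 1, A<->G has r_ag, C<->T has r_ct."""
    pair = {x, y}
    if pair == {"A", "G"}:
        return r_ag
    if pair == {"C", "T"}:
        return r_ct
    return 1.0


def mg94_rate(src, dst, omega, position_freqs, r_ag, r_ct):
    """Off-diagonal MG94-style rate from codon src to codon dst."""
    if GENETIC_CODE[src] == "*" or GENETIC_CODE[dst] == "*":
        return 0.0  # stop codons are excluded from the state space
    diffs = [i for i in range(3) if src[i] != dst[i]]
    if len(diffs) != 1:
        return 0.0  # only single-nucleotide changes have a nonzero instantaneous rate
    pos = diffs[0]
    rate = nucleotide_bias(src[pos], dst[pos], r_ag, r_ct) * position_freqs[pos][dst[pos]]
    if GENETIC_CODE[src] != GENETIC_CODE[dst]:
        rate *= omega  # nonsynonymous change, scaled by dN/dS
    return rate


# Uniform frequencies at every codon position, for illustration only.
uniform = [{b: 0.25 for b in BASES} for _ in range(3)]
print(mg94_rate("TTT", "TTC", 0.44, uniform, r_ag=2.0, r_ct=2.0))  # synonymous C<->T
print(mg94_rate("TTT", "TTA", 0.44, uniform, r_ag=2.0, r_ct=2.0))  # Phe -> Leu transversion
```

With these illustrative settings, the synonymous change TTT→TTC gets 2 × 0.25 = 0.5, while the nonsynonymous transversion TTT→TTA gets 0.25 × 0.44 = 0.11. Fitting separate omega values per partition corresponds to the alternative hypothesis; sharing one omega corresponds to the null.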
|
|
Fit the alternative hypothesis
|
Choose 'Likelihood:Build Function', followed by 'Likelihood:Optimize'.
HyPhy will now fit the model and display a table of
parameter estimates, as shown below.
Note that the buried partition has a dN/dS estimate of 0.44, while
the exposed partition has 1.22; the log-likelihood of the alternative model is -4428.83.
Save this likelihood function (LF) state: click the pulldown menu that shows 'Current LF', choose
'Save LF State', and name your hypothesis "Independent dN/dS Model". Once this is done (the
active LF state should now read "Independent dN/dS Model"), open the same menu again
and choose 'Select as alternative'.
|
|
Fit the null hypothesis
|
Click on the row for "buried_Shared_R" and shift-click on the row for "exposed_Shared_R" (both rows will now be highlighted).
Click the 'Constrain selected parameters to be equal' button in the window button bar (2nd from the left), and choose
which constraint to impose (the two options are equivalent, so either will do).
Select 'Likelihood:Optimize LF' and wait while HyPhy re-estimates parameter values with the constraint enforced. The result
will look like this:
Follow the same steps as for the alternative hypothesis to save the results of the null model (name it, e.g., "Constrained dN/dS"),
and select the saved LF state as 'Null'.
|
|
Test dN/dS for equality between partitions.
|
Finally, choose 'LRT' from the window pulldown menu. The results are printed to the console:
Likelihood Ratio Test
2*LR = 23.5078
DF = 1
P-Value = 1.2441e-06
Consequently, the data indicate with a high degree of confidence that dN/dS differs between the buried and exposed partitions. One can also verify the p-value
with a parametric bootstrap simulation (this is a time-consuming procedure!).
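A parametric bootstrap replaces the chi-square approximation with an empirical null distribution: simulate many alignments from the fitted null (constrained dN/dS) model, fit both hypotheses to each replicate, and record 2*LR each time. The simulation and refitting would be done in HyPhy and are not shown here. The Python sketch below, using a hypothetical helper name, only turns a list of replicate statistics into a Monte Carlo p-value with the common (count + 1)/(N + 1) estimator, which never returns exactly zero.

```python
def bootstrap_pvalue(observed_stat, replicate_stats):
    """Monte Carlo p-value: (replicates with 2*LR >= observed, plus 1) / (N + 1)."""
    if not replicate_stats:
        raise ValueError("at least one bootstrap replicate is required")
    exceed = sum(1 for s in replicate_stats if s >= observed_stat)
    return (exceed + 1) / (len(replicate_stats) + 1)


# Hypothetical replicate statistics, for illustration only.
print(bootstrap_pvalue(5.0, [1.0, 2.0, 6.0, 7.0]))
```

With N replicates the smallest attainable p-value is 1/(N + 1), so resolving a value near 1e-06 would take on the order of a million replicates. In practice a bootstrap for this example mainly confirms that the observed 2*LR lies beyond every simulated value.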
|
|
Save the results.
|
|
Switch back to the data panel and select 'File:Save', choosing 'Format: Include sequence data, NEXUS' in the file dialog.
Choose a file name for the analysis and save it to disk. The resulting file can be opened by HyPhy (using 'File:Open:Open Batch File'),
and all the model fits and hypotheses will be restored.
|