Dear Gustavo,
Quote:We are studying a family of a protein that adopts two different quaternary structures in different species. We can define a substitution model specific for each quaternary structure. Is it possible to perform ML analysis of a given tree (that includes both types of species) combining our quaternary structure-specific substitution models? I know that HYPHY can use different models for different positions in the alignment, but I don't know if it is possible to use specie-specific models.
It is indeed possible to do have different models in different parts of the tree. I am attaching a complete example (based on the standard p51.nex file which comes with the HyPhy distribution; the example will actually download the file from our server) to fit a rather meaningless mix of HKY85 and F81 to different parts of the tree; in this case HKY85 to subtype B viruses and F81 to subtype D viruses.
The key part of the example is the use of extended Newick syntax to define the tree and attach different models to different branches.
One caveat though: the models should have the same equilibrium frequencies; this limitation is NOT imposed by HyPhy - it will happily compute the likelihood, but the likelihood may actually be incorrect, because we can't infer the distribution of characters at the root.
Cheers,
Sergei
Code:/* download the data, make data filter and count character frequencies */
GetURL (p51data, "http://www.hyphy.org/pubs/seqs/p51.nex");
DataSet ds = ReadFromString (p51data);
DataSetFilter dsf = CreateFilter (ds,1);
HarvestFrequencies (freqs, ds, 1, 1, 1);
/* define the F81 and HKY85 rate matrices */
F81Matrix = {{*,t,t,t}
{t,*,t,t}
{t,t,*,t}
{t,t,t,*}};
HKY85Matrix = {{*,tv,ts,tv}
{tv,*,tv,ts}
{ts,tv,*,tv}
{tv,ts,tv,*}};
/* define the F81 and HKY85 models */
Model F81 = (F81Matrix, freqs, 1);
Model HKY85 = (HKY85Matrix, freqs, 1);
/* define the extended tree syntax; model name enclosed in curly braces following the name
of the sequence or a closing ) for internal nodes.
Note that we don't have to specify HKY85 explictly (although it's done below for clarity),
because HyPhy will use the last defined model (or the model set with UseModel), to define
all branches without an explicit model attachment.
*/
Tree p51_tree = ((((D_CD_83_ELI_ACC_K03454{F81},D_CD_83_NDK_ACC_M27323{F81}){F81},
D_UG_94_94UG114_ACC_U88824{F81}){F81},D_CD_84_84ZR085_ACC_U88822{F81}){F81},
B_US_83_RF_ACC_M17451{HKY85},((B_FR_83_HXB2_ACC_K03455{HKY85},B_US_86_JRFL_ACC_U63632{HKY85}){HKY85},
B_US_90_WEAU160_ACC_U21135{HKY85}){HKY85});
LikelihoodFunction lf = (dsf, p51_tree);
Optimize (res, lf);
LIKELIHOOD_FUNCTION_OUTPUT = 1;
fprintf (stdout, lf);