HyPhy message board - ML with defined constraints ?

Bryony Mackenzie

ML with defined constraints ?
May 11th, 2005 at 3:22am

Hello,

I have a very large protein data set (104 partitions, 76 taxa ~17000 amino acid positions). Obviously it is far too large for ML analysis :(

I would like to set constraints so that the well defined subgroups in my tree are fixed and ML analysis is carried out for the deep branches only.

To do this I need either 1) a program that can take a constraints tree and the large dataset and calculate ML using only deep branch arrangements or 2) to reconstruct the ancestor sequences for each subgroup and use these to reduce the number of taxa in my analysis.

Would HyPhy be suitable for either of these methods? How large a data set can it cope with on a single processor? There is a grid system available to me if the program can run in parallel.

many thanks
basm101


Sergei

Re: ML with defined constraints ?
Reply #1 - May 11th, 2005 at 7:25am

Dear Bryony,

Your problem is definitely an interesting one...

Quote:

Hello,

I have a very large protein data set (104 partitions, 76 taxa ~17000 amino acid positions). Obviously it is far too large for ML analysis :(

HyPhy can be used to implement both approaches, but each would take some work. You could try (2) more or less out of the box by: creating a separate data set for each resolved subtree; fitting an amino acid model to that subtree independently (you'd have to root each subtree using an outgroup, since most amino-acid models are time reversible and so cannot place the root on their own); reconstructing ancestral sequences based on that model fit; and replacing the subtree with its MRCA sequence. This can be done via standard analyses (AnalyzeNucProtData.bf, followed by Reconstruct Ancestral Sequences from the Analyses->Results menu).
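The bookkeeping around those steps (cutting out one data set per subgroup, then swapping each subgroup for a single ancestral sequence) can be scripted outside HyPhy. Below is a minimal Python sketch of that part only. It assumes the alignment is already loaded as a dict from taxon name to aligned sequence; the function names, the toy taxa, and the `_MRCA` label are all hypothetical, and the ancestral sequences are placeholders standing in for whatever the HyPhy reconstruction returns.

```python
from typing import Dict, List

Alignment = Dict[str, str]  # taxon name -> aligned amino-acid sequence


def subgroup_dataset(alignment: Alignment, members: List[str], outgroup: str) -> Alignment:
    """Return the rows for one resolved subgroup plus the outgroup used to root it."""
    if outgroup in members:
        raise ValueError("outgroup must lie outside the subgroup")
    names = members + [outgroup]
    missing = [name for name in names if name not in alignment]
    if missing:
        raise KeyError(f"taxa not in alignment: {missing}")
    return {name: alignment[name] for name in names}


def collapse_subgroups(alignment: Alignment,
                       subgroups: Dict[str, List[str]],
                       ancestors: Dict[str, str]) -> Alignment:
    """Replace every subgroup's taxa with one reconstructed MRCA sequence."""
    collapsed = {taxon for members in subgroups.values() for taxon in members}
    # Keep taxa that belong to no subgroup, then add one row per subgroup.
    reduced = {name: seq for name, seq in alignment.items() if name not in collapsed}
    for group in subgroups:
        reduced[f"{group}_MRCA"] = ancestors[group]  # hypothetical naming scheme
    return reduced


def to_fasta(alignment: Alignment) -> str:
    """Serialise an alignment as FASTA text."""
    return "".join(f">{name}\n{seq}\n" for name, seq in alignment.items())


# Toy data (not from the thread): two subgroups and one ungrouped taxon.
toy = {"A1": "MKV", "A2": "MKI", "B1": "MRV", "B2": "MRL", "C": "MQV"}
groups = {"A": ["A1", "A2"], "B": ["B1", "B2"]}

per_group = {g: subgroup_dataset(toy, m, outgroup="C") for g, m in groups.items()}
placeholder_ancestors = {"A": "MKV", "B": "MRV"}  # would come from the model fit
reduced = collapse_subgroups(toy, groups, placeholder_ancestors)
print(to_fasta(reduced))
```

Each entry of `per_group` would be written out and analysed separately; `reduced` is the smaller data set on which the deep branches are then estimated.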

Are you trying to fit each of the 104 partitions with an individual model, or just treat all 17K residues as a contiguous block?

Approach (1) could also be implemented, but it would require quite a bit of custom HyPhy scripting. If you are interested in pursuing this, I'll be happy to give you some pointers and help you along the way.
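To give a rough sense of how much the constraints shrink the search in approach (1): the number of unrooted binary topologies on n labelled tips is (2n-5)!!, so treating each fixed subgroup as a single tip cuts the space enormously. The Python sketch below just evaluates that formula. The figure of 8 subgroups is a made-up illustration, since the thread does not say how many subgroups there are.

```python
def unrooted_topologies(n_leaves: int) -> int:
    """Count unrooted binary topologies on n labelled leaves: (2n-5)!!."""
    if n_leaves < 3:
        return 1  # zero, one or two leaves admit a single topology
    count = 1
    # Multiply the odd numbers 3, 5, ..., 2n-5.
    for k in range(3, 2 * n_leaves - 4, 2):
        count *= k
    return count


full = unrooted_topologies(76)    # all 76 taxa unconstrained
backbone = unrooted_topologies(8)  # hypothetical: 8 fixed subgroups as tips
print(f"unconstrained: about 10^{len(str(full)) - 1} topologies")
print(f"backbone only: {backbone} topologies")
```

With a backbone that small, every arrangement of the deep branches could in principle be scored exhaustively rather than searched heuristically.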

Quote:

How large a data set can it cope with on a single processor ? There is a grid system available to me if the program could run in parallel.

You should be able to fit your data to a given tree on a single-processor machine in a reasonable time (depending on which models, and how many different models, are used). The HyPhy batch language can be used to implement the search part of (1) in parallel on an MPI cluster.

Feel free to ask further questions.

It would be helpful, however, to have more details, especially about how you want to deal with the 104 separate partitions: a joint analysis (with one or multiple models), or a separate fit on each partition.

HTH,
Sergei

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
