HyPhy message board - Modeling State Uncertainty

Nicola

YaBB Newbies

Offline

Curious HyPhy user

Posts: 13
Gender: male

Modeling State Uncertainty
Apr 30^th, 2013 at 3:21am

Dear all,

I am developing a special model to allow polymorphisms in a phylogenetic analysis.
Until now I assumed that observed allele frequencies are the "true" ones, but now I want to account for sampling variance ("uncertainty", or "ambiguity" one could say). That is, observed states at leaves will not be really observed any more, and tips of the tree will be similar to inner nodes.

The perfect solution for me would be to specify the input file not the usual way: ">species1 ATTGCA...", but explicitly declaring at each site the probability of each character: ">species1 (A:0.9,C:0.05,G:0.05)TTG(A:0.05,C:0.9,G:0.05)A..." or something similar (in fact, instead of nucleotides I have codons).
I guess that to obtain this result I would have to modify the functions "ReadDataFile", "CreateFilter", probably also "LikelihoodFunction"? Does this sound feasible in a reasonable time?

The alternative option would be to introduce new characters, the "observed" states (Ao, Co, Go, To) distinct from the "unobserved" states (Au, Cu, Gu, Tu), allow only unobserved characters in the inner phylogeny, add "dummy" branches, one below each tree tip, and on those dummy branches allow only substitutions from unobserved to observed states. This should work fine, but should also slow down everything (as the state space would have increased dimension, and I am already basically working with codons).

Alternatively, is there a way to allow at most 1 substitution on a branch?

Thanks, ad best regards,

Nicola De Maio

Back to top

IP Logged

Nicola YaBB Newbies Offline Curious HyPhy user Posts: 13 Gender:	Re: Modeling State Uncertainty Reply #1 - Apr 30^th, 2013 at 3:29am I forgot to mention that I am defining my own bf for HyPhy. Also, I just thought, a nice solution could be to specify the probability matrix instead of the rate matrix for "dummy" branches. Is there an easy way to do that? Thanks again, Nicola De Maio
Back to top	IP Logged

Sergei YaBB Administrator Offline Datamonkeys are forever... Posts: 1658 UCSD Gender:	Re: Modeling State Uncertainty Reply #2 - Apr 30^th, 2013 at 9:38am Hi Nicola, Yes, there is a way to define the transition matrix directly. All you do is include a flag in the Model definition as in: Code: Model MG = (TransitionMatrix, EqFreqs ,EXPLICIT_FORM_MATRIX_EXPONENTIAL); Sergei
Back to top	Associate Professor Division of Infectious Diseases Division of Biomedical Informatics School of Medicine University of California San Diego WWW IP Logged

Sergei

YaBB Administrator

Offline

Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male

Re: Modeling State Uncertainty
Reply #3 - Apr 30^th, 2013 at 9:41am

Hi Nicola,

Let me think about how to do what you propose in the most efficient way. There is a hack that allows one to define explicit character frequencies at the leaves, but I only needed it for three taxon trees. The tricky part is not in the likelihood function (it already supports such distributions for ambiguous characters, such as R), but in providing the frequency distribution to the likelihood function from the batch language.

I'll write back in a little bit once I have thought about it some more.

Sergei

Back to top

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego

WWW

IP Logged

Nicola

YaBB Newbies

Offline

Curious HyPhy user

Posts: 13
Gender: male

Re: Modeling State Uncertainty
Reply #4 - Apr 30^th, 2013 at 11:41am

Hi Sergei,

Thank you very much!
I think that I should be able to solve my problems with the command that you suggested me! I will let you know!

One small complication is that I need to enforce the molecular clock on the original tree, but I guess it should be fine once I fix the "dummy" branches to a constant, uniform, length.

Thank you very much again! It was an incredibly fast and helpful reply!

Nicola De Maio

Back to top

IP Logged

Nicola

YaBB Newbies

Offline

Curious HyPhy user

Posts: 13
Gender: male

Re: Modeling State Uncertainty
Reply #5 - May 2^nd, 2013 at 7:08am

Hi Sergei,

I am using the flag that you suggested, with something similar to the following example:

Code:

matrix_probs = {{0.9,0.1,0.0,0.0}{0.0,0.9,0.1,0.0}{0.0,0.1,0.9,0.0}{0.0,0.0,0.1,0.9}};

Model forLeaves = (matrix_probs, Freqs ,EXPLICIT_FORM_MATRIX_EXPONENTIAL);

But I get the error message:

"The expression for the explicit matrix exponential passed to Model must be a valid matrix-valued HyPhy formula that is not an assignment"

While it seems clear to me that I define the probability matrix in the wrong way, it is not clear how I should define it otherwise: as a matrix exponential?

Cheers,
Nicola

Back to top

IP Logged

Nicola

YaBB Newbies

Offline

Curious HyPhy user

Posts: 13
Gender: male

Re: Modeling State Uncertainty
Reply #6 - May 3^rd, 2013 at 4:05am

Hi again!

I found the error, I just had to use "matrix_probs" instead of matrix_probs.

But now I have more problems.
With the MolecularClock() function, If I set:

Code:

Tree   myTree = "((s3{MLeaf})u3,((s2{MLeaf})u2,(s1{MLeaf})u1)));";
myTree.s1.t := 1.0;
myTree.s2.t := 1.0;
myTree.s3.t := 1.0;
LikelihoodFunction theLF = (myFilter, myTree);
MolecularClock (myTree,t);

I get the (reasonable) error message: "Molecular clock constraint has failed, since variable t is not an independent member of the node testTree.s3".

I am sure that I can get around this in particular instances by defining the molecular clock constraints by hand, but, as I am running a topology search algorithm, I would like to have something that works on any tree.

Cheers,
Nicola

Back to top

IP Logged

Nicola

YaBB Newbies

Offline

Curious HyPhy user

Posts: 13
Gender: male

Re: Modeling State Uncertainty
Reply #7 - May 20^th, 2013 at 9:14am

Dear Sergei,

I have the following problem:
the command EXPLICIT_FORM_MATRIX_EXPONENTIAL that you suggested me works fine for a nucleotide state space. Yet, when I use for example dinucleotides or trinucleotides, then I get always a likelihood of -1e+100 (and parameters are not correctly estimated).

I am attaching the simplest possible examples.
In the first case ("nuc") I have a nucleotide state space, and a 4x4 identity matrix, and on a single nucleotide alignment the log-likelihood is correctly 0.
In the second case ("dinuc") I define a dinucleotide state space with a 16x16 identity matrix, a single dinucleotide alignment, but this time the log-likelihood is -1e+100.
Maybe I am doing something wrong, but the two cases are extremely similar, and I can't see what could cause the problem.

Thank you again for the great help so far,
Nicola De Maio

Back to top

Multimedia File Viewing and Clickable Links are available for Registered Members only!! You need to

Login (3 KB | 1 )

IP Logged

Sergei

YaBB Administrator

Offline

Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male

Re: Modeling State Uncertainty
Reply #8 - May 21^st, 2013 at 1:16pm

Hi Nicola,

It's a HyPhy bug (basically, the 16x16 matrix you specify gets stored in a sparse format, but the LF evaluator assumes that the matrix is dense, which it will be if derived from the rate matrix). I'll push a fix in shortly.

Sergei

PS Fix pushed to github (Multimedia File Viewing and Clickable Links are available for Registered Members only!! You need to

Login , Multimedia File Viewing and Clickable Links are available for Registered Members only!! You need to

Login). Please check out, recompile HyPhy and the issue should disappear.

Back to top

« Last Edit: May 21^st, 2013 at 4:09pm by Sergei »

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego

WWW

IP Logged

Nicola YaBB Newbies Offline Curious HyPhy user Posts: 13 Gender:	Re: Modeling State Uncertainty Reply #9 - May 25^th, 2013 at 11:03am Thank you very much! Now seems to work perfectly! Best Regards, Nicola
Back to top	IP Logged

	Welcome, Guest. Please Login