HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Methodology Questions >> How to >> Modeling State Uncertainty
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1367317273

Message started by nicola de maio on Apr 30th, 2013 at 3:21am

Title: Modeling State Uncertainty
Post by nicola de maio on Apr 30th, 2013 at 3:21am
Dear all,

I am developing a special model to allow polymorphisms in a phylogenetic analysis.
Until now I assumed that observed allele frequencies are the "true" ones, but now I want to account for sampling variance ("uncertainty", or "ambiguity" one could say). That is, observed states at leaves will not be really observed any more, and tips of the tree will be similar to inner nodes.

The perfect solution for me would be to specify the input file not the usual way: ">species1 ATTGCA...", but explicitly declaring at each site the probability of each character: ">species1 (A:0.9,C:0.05,G:0.05)TTG(A:0.05,C:0.9,G:0.05)A..." or something similar (in fact, instead of nucleotides I have codons).
I guess that to obtain this result I would have to modify the functions "ReadDataFile", "CreateFilter", probably also "LikelihoodFunction"? Does this sound feasible in a reasonable time?

The alternative option would be to introduce new characters, the "observed" states (Ao, Co, Go, To) distinct from the "unobserved" states (Au, Cu, Gu, Tu), allow only unobserved characters in the inner phylogeny, add "dummy" branches, one below each tree tip, and on those dummy branches allow only substitutions from unobserved to observed states. This should work fine, but should also slow down everything (as the state space would have increased dimension, and I am already basically working with codons).

Alternatively, is there a way to allow at most 1 substitution on a branch?


Thanks, ad best regards,

Nicola De Maio

Title: Re: Modeling State Uncertainty
Post by nicola de maio on Apr 30th, 2013 at 3:29am
I forgot to mention that I am defining my own bf for HyPhy.

Also, I just thought, a nice solution could be to specify the probability matrix instead of the rate matrix for "dummy" branches. Is there an easy way to do that?

Thanks again,

Nicola De Maio

Title: Re: Modeling State Uncertainty
Post by Sergei on Apr 30th, 2013 at 9:38am
Hi Nicola,

Yes, there is a way to define the transition matrix directly. All you do is include a flag in the Model definition as in:

[code]
Model            MG = (TransitionMatrix, EqFreqs ,EXPLICIT_FORM_MATRIX_EXPONENTIAL);
[/code]

Sergei

Title: Re: Modeling State Uncertainty
Post by Sergei on Apr 30th, 2013 at 9:41am
Hi Nicola,

Let me think about how to do what you propose in the most efficient way. There is a hack that allows one to define explicit character frequencies at the leaves, but I only needed it for three taxon trees. The tricky part is not in the likelihood function (it already supports such distributions for ambiguous characters, such as R), but in providing the frequency distribution to the likelihood function from the batch language.

I'll write back in a little bit once I have thought about it some more.

Sergei

Title: Re: Modeling State Uncertainty
Post by nicola de maio on Apr 30th, 2013 at 11:41am
Hi Sergei,

Thank you very much!
I think that I should be able to solve my problems with the command that you suggested me! I will let you know!

One small complication is that I need to enforce the molecular clock on the original tree, but I guess it should be fine once I fix the "dummy" branches to a constant, uniform, length.

Thank you very much again! It was an incredibly fast and helpful reply!

Nicola De Maio

Title: Re: Modeling State Uncertainty
Post by nicola de maio on May 2nd, 2013 at 7:08am
Hi Sergei,

I am using the flag that you suggested, with something similar to the following example:

[code]
matrix_probs = {{0.9,0.1,0.0,0.0}{0.0,0.9,0.1,0.0}{0.0,0.1,0.9,0.0}{0.0,0.0,0.1,0.9}};

Model forLeaves = (matrix_probs, Freqs ,EXPLICIT_FORM_MATRIX_EXPONENTIAL);[/code]

But I get the error message:


"The expression for the explicit matrix exponential passed to Model must be a valid matrix-valued HyPhy formula that is not an assignment"


While it seems clear to me that I define the probability matrix in the wrong way, it is not clear how I should define it otherwise: as a matrix exponential?


Cheers,
Nicola

Title: Re: Modeling State Uncertainty
Post by nicola de maio on May 3rd, 2013 at 4:05am
Hi again!

I found the error, I just had to use "matrix_probs" instead of matrix_probs.

But now I have more problems.
With the MolecularClock() function, If I set:

[code]
Tree   myTree = "((s3{MLeaf})u3,((s2{MLeaf})u2,(s1{MLeaf})u1)));";
myTree.s1.t := 1.0;
myTree.s2.t := 1.0;
myTree.s3.t := 1.0;
LikelihoodFunction theLF = (myFilter, myTree);
MolecularClock (myTree,t);
[/code]

I get the (reasonable) error message: "Molecular clock constraint has failed, since variable t is not an independent member of the node testTree.s3".

I am sure that I can get around this in particular instances by defining the molecular clock constraints by hand, but, as I am running a topology search algorithm, I would like to have something that works on any tree.

Cheers,
Nicola

Title: Re: Modeling State Uncertainty
Post by nicola de maio on May 20th, 2013 at 9:14am
Dear Sergei,

I have the following problem:
the command EXPLICIT_FORM_MATRIX_EXPONENTIAL that you suggested me works fine for a nucleotide state space. Yet, when I use for example dinucleotides or trinucleotides, then I get always a likelihood of -1e+100 (and parameters are not correctly estimated).

I am attaching the simplest possible examples.
In the first case ("nuc") I have a nucleotide state space, and a 4x4 identity matrix, and on a single nucleotide alignment the log-likelihood is correctly 0.
In the second case ("dinuc") I define a dinucleotide state space with a 16x16 identity matrix, a single dinucleotide alignment, but this time the log-likelihood is -1e+100.
Maybe I am doing something wrong, but the two cases are extremely similar, and I can't see what could cause the problem.

Thank you again for the great help so far,
Nicola De Maio
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=Examples.zip (3 KB | 1 )

Title: Re: Modeling State Uncertainty
Post by Sergei on May 21st, 2013 at 1:16pm
Hi Nicola,

It's a HyPhy bug (basically, the 16x16 matrix you specify gets stored in a sparse format, but the LF evaluator assumes that the matrix is dense, which it will be if derived from the rate matrix). I'll push a fix in shortly.

Sergei

PS Fix pushed to github (Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login , Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login). Please check out, recompile HyPhy and the issue should disappear.

Title: Re: Modeling State Uncertainty
Post by nicola de maio on May 25th, 2013 at 11:03am
Thank you very much!
Now seems to work perfectly!

Best Regards,
Nicola

HyPhy message board » Powered by YaBB 2.5.2!
YaBB Forum Software © 2000-2024. All Rights Reserved.