Welcome, Guest. Please Login
YaBB - Yet another Bulletin Board
 
  HomeHelpSearchLogin  
 
Modeling State Uncertainty (Read 5718 times)
Nicola
YaBB Newbies
*
Offline


Curious HyPhy user

Posts: 13
Gender: male
Modeling State Uncertainty
Apr 30th, 2013 at 3:21am
 
Dear all,

I am developing a special model to allow polymorphisms in a phylogenetic analysis.
Until now I assumed that observed allele frequencies are the "true" ones, but now I want to account for sampling variance ("uncertainty", or "ambiguity" one could say). That is, observed states at leaves will not be really observed any more, and tips of the tree will be similar to inner nodes.

The perfect solution for me would be to specify the input file not the usual way: ">species1 ATTGCA...", but explicitly declaring at each site the probability of each character: ">species1 (A:0.9,C:0.05,G:0.05)TTG(A:0.05,C:0.9,G:0.05)A..." or something similar (in fact, instead of nucleotides I have codons).
I guess that to obtain this result I would have to modify the functions "ReadDataFile", "CreateFilter", probably also "LikelihoodFunction"? Does this sound feasible in a reasonable time?

The alternative option would be to introduce new characters, the "observed" states (Ao, Co, Go, To) distinct from the "unobserved" states (Au, Cu, Gu, Tu), allow only unobserved characters in the inner phylogeny, add "dummy" branches, one below each tree tip, and on those dummy branches allow only substitutions from unobserved to observed states. This should work fine, but should also slow down everything (as the state space would have increased dimension, and I am already basically working with codons).

Alternatively, is there a way to allow at most 1 substitution on a branch?


Thanks, ad best regards,

Nicola De Maio
Back to top
 
 
IP Logged
 
Nicola
YaBB Newbies
*
Offline


Curious HyPhy user

Posts: 13
Gender: male
Re: Modeling State Uncertainty
Reply #1 - Apr 30th, 2013 at 3:29am
 
I forgot to mention that I am defining my own bf for HyPhy.

Also, I just thought, a nice solution could be to specify the probability matrix instead of the rate matrix for "dummy" branches. Is there an easy way to do that?

Thanks again,

Nicola De Maio
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Modeling State Uncertainty
Reply #2 - Apr 30th, 2013 at 9:38am
 
Hi Nicola,

Yes, there is a way to define the transition matrix directly. All you do is include a flag in the Model definition as in:

Code:
Model 		MG = (TransitionMatrix, EqFreqs ,EXPLICIT_FORM_MATRIX_EXPONENTIAL);
 



Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Modeling State Uncertainty
Reply #3 - Apr 30th, 2013 at 9:41am
 
Hi Nicola,

Let me think about how to do what you propose in the most efficient way. There is a hack that allows one to define explicit character frequencies at the leaves, but I only needed it for three taxon trees. The tricky part is not in the likelihood function (it already supports such distributions for ambiguous characters, such as R), but in providing the frequency distribution to the likelihood function from the batch language.

I'll write back in a little bit once I have thought about it some more.

Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Nicola
YaBB Newbies
*
Offline


Curious HyPhy user

Posts: 13
Gender: male
Re: Modeling State Uncertainty
Reply #4 - Apr 30th, 2013 at 11:41am
 
Hi Sergei,

Thank you very much!
I think that I should be able to solve my problems with the command that you suggested me! I will let you know!

One small complication is that I need to enforce the molecular clock on the original tree, but I guess it should be fine once I fix the "dummy" branches to a constant, uniform, length.

Thank you very much again! It was an incredibly fast and helpful reply!

Nicola De Maio
Back to top
 
 
IP Logged
 
Nicola
YaBB Newbies
*
Offline


Curious HyPhy user

Posts: 13
Gender: male
Re: Modeling State Uncertainty
Reply #5 - May 2nd, 2013 at 7:08am
 
Hi Sergei,

I am using the flag that you suggested, with something similar to the following example:

Code:
matrix_probs = {{0.9,0.1,0.0,0.0}{0.0,0.9,0.1,0.0}{0.0,0.1,0.9,0.0}{0.0,0.0,0.1,0.9}};

Model forLeaves = (matrix_probs, Freqs ,EXPLICIT_FORM_MATRIX_EXPONENTIAL); 



But I get the error message:


"The expression for the explicit matrix exponential passed to Model must be a valid matrix-valued HyPhy formula that is not an assignment"


While it seems clear to me that I define the probability matrix in the wrong way, it is not clear how I should define it otherwise: as a matrix exponential?


Cheers,
Nicola
Back to top
 
 
IP Logged
 
Nicola
YaBB Newbies
*
Offline


Curious HyPhy user

Posts: 13
Gender: male
Re: Modeling State Uncertainty
Reply #6 - May 3rd, 2013 at 4:05am
 
Hi again!

I found the error, I just had to use "matrix_probs" instead of matrix_probs.

But now I have more problems.
With the MolecularClock() function, If I set:

Code:
Tree   myTree = "((s3{MLeaf})u3,((s2{MLeaf})u2,(s1{MLeaf})u1)));";
myTree.s1.t := 1.0;
myTree.s2.t := 1.0;
myTree.s3.t := 1.0;
LikelihoodFunction theLF = (myFilter, myTree);
MolecularClock (myTree,t);
 



I get the (reasonable) error message: "Molecular clock constraint has failed, since variable t is not an independent member of the node testTree.s3".

I am sure that I can get around this in particular instances by defining the molecular clock constraints by hand, but, as I am running a topology search algorithm, I would like to have something that works on any tree.

Cheers,
Nicola
Back to top
 
 
IP Logged
 
Nicola
YaBB Newbies
*
Offline


Curious HyPhy user

Posts: 13
Gender: male
Re: Modeling State Uncertainty
Reply #7 - May 20th, 2013 at 9:14am
 
Dear Sergei,

I have the following problem:
the command EXPLICIT_FORM_MATRIX_EXPONENTIAL that you suggested me works fine for a nucleotide state space. Yet, when I use for example dinucleotides or trinucleotides, then I get always a likelihood of -1e+100 (and parameters are not correctly estimated).

I am attaching the simplest possible examples.
In the first case ("nuc") I have a nucleotide state space, and a 4x4 identity matrix, and on a single nucleotide alignment the log-likelihood is correctly 0.
In the second case ("dinuc") I define a dinucleotide state space with a 16x16 identity matrix, a single dinucleotide alignment, but this time the log-likelihood is -1e+100.
Maybe I am doing something wrong, but the two cases are extremely similar, and I can't see what could cause the problem.

Thank you again for the great help so far,
Nicola De Maio
Back to top
 
Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login (3 KB | 1 )
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Modeling State Uncertainty
Reply #8 - May 21st, 2013 at 1:16pm
 
Hi Nicola,

It's a HyPhy bug (basically, the 16x16 matrix you specify gets stored in a sparse format, but the LF evaluator assumes that the matrix is dense, which it will be if derived from the rate matrix). I'll push a fix in shortly.

Sergei

PS Fix pushed to github (Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login , Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login). Please check out, recompile HyPhy and the issue should disappear.
Back to top
« Last Edit: May 21st, 2013 at 4:09pm by Sergei »  

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Nicola
YaBB Newbies
*
Offline


Curious HyPhy user

Posts: 13
Gender: male
Re: Modeling State Uncertainty
Reply #9 - May 25th, 2013 at 11:03am
 
Thank you very much!
Now seems to work perfectly!

Best Regards,
Nicola
Back to top
 
 
IP Logged