Welcome, Guest. Please Login
YaBB - Yet another Bulletin Board
 
  HomeHelpSearchLogin  
 
Site Rate with custom BF, base freqs problem (Read 2400 times)
Francesc
YaBB Newbies
*
Offline


I love YaBB 1G - SP1!

Posts: 32
Site Rate with custom BF, base freqs problem
Sep 6th, 2011 at 1:22pm
 
Hi.
I have been in contact before concerning to a BF that I created with your help to estimate substitution rates in an alignment for each site independently. I am using the BF in a pipeline of a webapp. THANKS!

Please, find attached the BF and samples files.
The BF has hardcoded the input files and the model (HKY85 with kappa=2).

First issue:
I am estimating the base freqs using:
Code:
HarvestFrequencies (Freqs, myFilter, 1, 1, 1); 



and I am getting the following:
Quote:
Base freqs: {
{    0.191379310345}
{    0.170689655172}
{    0.274137931034}
{    0.363793103448}
}


If I calculate them manually or with phyml I am getting:
Quote:
0.1463
0.1098
0.2927
0.4512


Estimating the freqs in the maximum likelihood framework with phyml:
Quote:
0.2413
0.1309
0.2497
0.3781


Isn't that function estimating the frequencies from the data?
Is there any error in Hyphy? Are they estimated in the LH framework but I am getting different results than PhyML? Which one should I use?

Issue 2:
I am getting rates when the column in the alignment is full of indels except for 1 site ("one informative site", sites 1 and 2 in the sample alignment, see below).  That doesn't make much sense. Is there any way to "filter" that sites (or filter sites with a minimum X number of informative sites) and maybe show them us "undefined rate"?


Issue 3:
Usually, I use chronograms with millions years as units for branch lengths. That implies that some branches can be easily over 100. I usually change the unit to work with values <10 and then I modify the rates with the same factor accordingly. I am including the same tree in different magnitudes and there is a problem when units are too high.
Why is this happening? What's the proper range to work with?

Original tree:
Quote:
Chronogram length (time units): 578
Site  1 Rate =  1.1109 subst/time, info sites = 1
Site  2 Rate =  1.1109 subst/time, info sites = 1
Site  3 Rate =  1.0148 subst/time, info sites = 2
Site  4 Rate =  1.1477 subst/time, info sites = 5
Site  5 Rate =  1.0514 subst/time, info sites = 5
Site  6 Rate =  1.1476 subst/time, info sites = 5
Site  7 Rate =  1.0514 subst/time, info sites = 5
Site  8 Rate =  1.0514 subst/time, info sites = 5
Site  9 Rate =  1.0514 subst/time, info sites = 5
Site 10 Rate =  1.1477 subst/time, info sites = 5


Original tree divided by 100:

Quote:
Chronogram length (time units): 5.78
Site  1 Total Rate =  0.9914 subst/time, info sites = 1
Site  2 Total Rate =  1.0522 subst/time, info sites = 1
Site  3 Total Rate =  0.0000 subst/time, info sites = 2
Site  4 Total Rate =  0.0000 subst/time, info sites = 5
Site  5 Total Rate =  0.0000 subst/time, info sites = 5
Site  6 Total Rate =  0.3230 subst/time, info sites = 5
Site  7 Total Rate =  0.2319 subst/time, info sites = 5
Site  8 Total Rate =  2.9807 subst/time, info sites = 5
Site  9 Total Rate =  0.0000 subst/time, info sites = 5
Site 10 Total Rate =  0.0000 subst/time, info sites = 5



Original tree divided by 1000:
Quote:
Chronogram length (time units): 0.578
Site 1 Rate =  1.1470 subst/time, info sites = 1
Site 2 Rate =  1.0697 subst/time, info sites = 1
Site 3 Rate =  0.0000 subst/time, info sites = 2
Site 4 Rate =  0.0000 subst/time, info sites = 5
Site 5 Rate =  0.0000 subst/time, info sites = 5
Site 6 Rate =  3.2301 subst/time, info sites = 5
Site 7 Rate =  2.3185 subst/time, info sites = 5
Site 8 Rate = 29.8072 subst/time, info sites = 5
Site 9 Rate =  0.0000 subst/time, info sites = 5
Site10 Rate =  0.0000 subst/time, info sites = 5



For the last two trees in which there is no branch lengths bigger than two, the columns with enough informative sites have the same rate (after applying correction factor).


Thank you very much for your time and help,
Francesc

Back to top
 
Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login (2 KB | )
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Site Rate with custom BF, base freqs problem
Reply #1 - Sep 12th, 2011 at 3:42pm
 
Hi Francesc,

1). Add the line

Code:
COUNT_GAPS_IN_FREQUENCIES = 0;
 



at the top of your .bf file. By default, HyPhy counts gaps as contributing .25 to each nucleotide count.

2). A site with a single character has a likelihood that doesn't depend on model parameters. It is simply the probability of drawing the corresponding letter (e.g. 'C' in the first site of your example) from the equilibrium distribution. Because of this, rate estimates for such sites are meaningless (any value of parameters will give exactly the same likelihood). You can filter such sites, for example, by checking that they don't match a regular expression (see attached example).

3). Set the initial rate value to something small, otherwise you may end up pushing branch lengths to infinity and the optimizer won't be able to find its way to the maximum (because the likelihood surface is flat at infinite branch lengths). A good rule of thumb is to set siteRate = 0.1/chronoLength;

Sergei

Back to top
 
Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login (4 KB | )

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Francesc
YaBB Newbies
*
Offline


I love YaBB 1G - SP1!

Posts: 32
Re: Site Rate with custom BF, base freqs problem
Reply #2 - Sep 14th, 2011 at 5:02pm
 
Thanks Sergei.
I've never caught you off guard.
I incorporated all changes in the BF and in the web app.

I don't know If I mentioned that to you before...
the website is this one:
Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login

and this is the publication:
Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login

Thanks for your patience and help,
Francesc
Back to top
 
 
IP Logged