HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Site Rate with custom BF, base freqs problem
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1315340521
Message started by Francesc on Sep 6th, 2011 at 1:22pm
Title: Site Rate with custom BF, base freqs problem Post by Francesc on Sep 6th, 2011 at 1:22pm
Hi.
I have been in contact before about a batch file (BF) that I created with your help to estimate substitution rates for each site of an alignment independently. I am using the BF in the pipeline of a web app. THANKS! Please find attached the BF and sample files. The input files and the model (HKY85 with kappa=2) are hardcoded in the BF.

First issue: I am estimating the base frequencies using: [code]HarvestFrequencies (Freqs, myFilter, 1, 1, 1);[/code] and I am getting the following: Quote:
If I calculate them manually or with PhyML, I get: Quote:
Estimating the frequencies in the maximum likelihood framework with PhyML, I get: Quote:
Isn't that function estimating the frequencies from the data? Is there an error in HyPhy? Or are they estimated in the likelihood framework, but with results that differ from PhyML's? Which values should I use?

Issue 2: I am getting rates even when a column in the alignment is all indels except for one sequence ("one informative site"; sites 1 and 2 in the sample alignment, see below). That doesn't make much sense. Is there any way to filter those sites (or filter sites with fewer than some minimum number X of informative characters) and perhaps report them as "undefined rate"?

Issue 3: I usually use chronograms with millions of years as the unit for branch lengths. That means some branch lengths can easily exceed 100. I usually change the unit so that the values are <10 and then correct the rates by the same factor. I am including the same tree at different magnitudes, and there is a problem when the values are too large. Why is this happening? What is the proper range to work with?

Original tree: Quote:
Original tree divided by 100: Quote:
Original tree divided by 1000: Quote:
For the last two trees, in which there are no branch lengths greater than two, the columns with enough informative sites have the same rate (after applying the correction factor).

Thank you very much for your time and help,
Francesc
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=BF_and_sample_files.zip (2 KB)
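The unit change described in Issue 3 can be sketched as follows. This is a minimal Python illustration, not part of the attached BF: the function names and the example tree are hypothetical, and it assumes plain Newick branch lengths written after a colon. Because a site's expected number of substitutions is rate times branch length, dividing branch lengths by a factor multiplies the fitted rate by that factor, so the rate is divided by it again to return to the original units.

```python
import re

# Matches a Newick branch length such as ":120", ":0.8" or ":1e-3".
BRANCH_LENGTH = re.compile(r":(\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def scale_newick(newick, factor):
    """Divide every branch length in a Newick string by `factor`."""
    def rescale(match):
        scaled = float(match.group(1)) / factor
        return ":" + format(scaled, ".10g")
    return BRANCH_LENGTH.sub(rescale, newick)


def rate_in_original_units(rate_on_scaled_tree, factor):
    """Convert a rate fitted on the scaled tree back to the original units.

    rate * length is unchanged by the rescaling, so a tree shortened by
    `factor` yields rates that are `factor` times larger.
    """
    return rate_on_scaled_tree / factor


# Hypothetical chronogram in millions of years (not the tree from the post).
example_tree = "((A:120,B:120):80,C:200);"
scaled_tree = scale_newick(example_tree, 100)
```

Here `scaled_tree` holds the same topology with branch lengths in units of 100 million years, and a rate estimated on it would be passed through `rate_in_original_units(rate, 100)` to express it per million years.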
Title: Re: Site Rate with custom BF, base freqs problem Post by Sergei on Sep 12th, 2011 at 3:42pm
Hi Francesc,
1). Add the line [code] COUNT_GAPS_IN_FREQUENCIES = 0; [/code] at the top of your .bf file. By default, HyPhy counts gaps as contributing .25 to each nucleotide count.

2). A site with a single character has a likelihood that doesn't depend on model parameters. It is simply the probability of drawing the corresponding letter (e.g. 'C' in the first site of your example) from the equilibrium distribution. Because of this, rate estimates for such sites are meaningless (any value of the parameters will give exactly the same likelihood). You can filter such sites, for example, by checking that they don't match a regular expression (see attached example).

3). Set the initial rate value to something small; otherwise you may end up pushing branch lengths to infinity, and the optimizer won't be able to find its way to the maximum (because the likelihood surface is flat at infinite branch lengths). A good rule of thumb is to set siteRate = 0.1/chronoLength;

Sergei
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?action=downloadfile;file=SiteRates_GTR_002.bf (4 KB)
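Points 1) and 2) can be illustrated outside HyPhy. The following is a minimal Python sketch, not HyPhy batch code and not the contents of the attached SiteRates_GTR_002.bf. It assumes that gaps are written as '-', that only unambiguous A, C, G and T count as characters, and that each gap adds 0.25 to every nucleotide count when gap counting is on, as described in point 1). The function names, the regular expression and the small alignment are made up for illustration.

```python
import re

NUCLEOTIDES = "ACGT"


def base_frequencies(sequences, count_gaps):
    """Return {base: frequency} pooled over all aligned sequences.

    With count_gaps=True each '-' adds 0.25 to every nucleotide count;
    with count_gaps=False gaps are ignored.
    """
    counts = {base: 0.0 for base in NUCLEOTIDES}
    for sequence in sequences:
        for char in sequence.upper():
            if char in counts:
                counts[char] += 1.0
            elif char == "-" and count_gaps:
                for base in NUCLEOTIDES:
                    counts[base] += 0.25
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no countable characters in the alignment")
    return {base: counts[base] / total for base in NUCLEOTIDES}


def sites_with_enough_data(sequences, min_characters=2):
    """Return 0-based indices of columns with at least min_characters nucleotides."""
    keep = []
    for index, column in enumerate(zip(*sequences)):
        resolved = re.findall(r"[ACGT]", "".join(column).upper())
        if len(resolved) >= min_characters:
            keep.append(index)
    return keep


# Made-up alignment: the first two columns hold a single character each.
alignment = ["C-ACGT", "--ACGA", "-GAC-A"]
freqs_ignoring_gaps = base_frequencies(alignment, count_gaps=False)
freqs_counting_gaps = base_frequencies(alignment, count_gaps=True)
usable_sites = sites_with_enough_data(alignment)
```

Counting gaps pulls every frequency toward 0.25, so a gap-rich alignment gives visibly different values under the two settings. Columns left out of `usable_sites` are the ones whose rate would be reported as undefined.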