Message started by Epistatic on Oct 26th, 2006 at 11:51am

Title: Site specific rate estimation (SiteRates.bf)
Post by Epistatic on Oct 26th, 2006 at 11:51am

I need to estimate site-specific rates for my alignments, as DNArates does.  I chose HYPHY because it is really easy to use and will also estimate site rates for protein alignments.

I compared my HYPHY output for a nucleotide file with the DNArates output and noticed a few differences.  The branch lengths of my input tree affect the rate estimates in DNArates: if I halve all my branch lengths, the rate estimates change.  With HYPHY, if I halve my branch lengths, move the decimal point over, or even input a tree with zero branch lengths, I get the same rate estimates.  I was curious whether HYPHY was even using the input tree, so I changed the topology and got slightly different rates.

I would appreciate knowing how HYPHY is estimating the substitution rates in reference to the input tree.  I am using the rate estimates to profile phylogenetic informativeness of genes and would like to use and recommend HYPHY instead of DNArates.

Thank you!

Title: Re: Site specific rate estimation (SiteRates.bf)
Post by Sergei on Oct 26th, 2006 at 11:59am

For this analysis, HyPhy does not use the branch lengths input with the tree at all. What HyPhy reports is rate * tree length, because of the standard confounding of evolutionary rates and times. Branch lengths are estimated from the entire alignment before site-by-site estimation of rates is carried out.
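The confounding of rates and times can be illustrated with a small sketch. It assumes the Jukes-Cantor (JC69) model purely for illustration (the thread does not say which model the analysis uses); the function name `jc69_p_same` is hypothetical. Under JC69 the probability that a site is unchanged depends only on the product rate * time, so two different (rate, time) pairs with the same product cannot be told apart from sequence data alone.

```python
import math

def jc69_p_same(rate, time):
    """Probability that a site shows the same nucleotide after `time`
    under JC69, with `rate` in expected substitutions/site/unit time.
    JC69 is an assumption made for this illustration only."""
    d = rate * time  # expected substitutions per site along the branch
    return 0.25 + 0.75 * math.exp(-4.0 * d / 3.0)

# Two different (rate, time) pairs with the same product rate * time.
fast_short = jc69_p_same(rate=2.0, time=0.5)
slow_long = jc69_p_same(rate=0.5, time=2.0)

# The two probabilities coincide, so the data identify only rate * time.
print(fast_short, slow_long)
```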

If you want to decouple rates and times (e.g. you have some information to date the tree), then it's a simple matter of dividing the output by the [known] tree length.

Hope this helps,

Title: Re: Site specific rate estimation (SiteRates.bf)
Post by Travis_Clark on Oct 26th, 2006 at 12:15pm
WOW, Thank you for the ultra-quick reply.

I was inputting a chronogram (made with r8s) as my input tree.  If I understand your reply correctly, the HYPHY output is the number of times a site changed on the tree, not the actual rate?  I will divide the output by the tree length as you suggest!

Thank you again!

Title: Re: Site specific rate estimation (SiteRates.bf)
Post by Sergei on Oct 26th, 2006 at 12:43pm
Dear Travis,

Essentially, this should work. Rates are measured in expected substitutions/site/unit time, not quite in the number of times the site changed, but the same in spirit.


Title: Re: Site specific rate estimation (SiteRates.bf)
Post by Travis_Clark on Jan 4th, 2007 at 6:49am
Hi Sergei,

I have more or less come back to my original question because I am not confident I am analyzing my data correctly.

In the analysis, HYPHY outputs a Newick-formatted tree with branch lengths.  Is this the tree length that HYPHY estimates to do the analysis?  If so, is this the proper tree length to divide the data by to get the rate?

I believe I was in error when I previously divided by the tree length of my final phylogeny for the rate estimations of different genes.  I was finding odd results when analyzing the same gene as nucleotide versus protein sequence.

Best regards,


Title: Re: Site specific rate estimation (SiteRates.bf)
Post by Sergei on Jan 4th, 2007 at 11:03am
Dear Travis,

You should NOT use the tree output by HyPhy to determine the total tree length, because that length is the product of (mean rate)*evolutionary_time, hence normalizing your site-specific rate estimate by this length will result in
(site rate)/((mean rate) * (evolutionary time))

Site rates, as output by HyPhy, are relative to the 'mean' (not necessarily average, I use this term loosely) rate for the entire alignment, and can only be compared to each other. Because all sites in the same gene (most likely) evolved for the same duration of time, this comparison is meaningful.

If you have some other means of estimating tree lengths in units of time (e.g. based on a molecular clock), then you can divide out the time.
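Dividing out the time can be sketched as follows. The numbers are made up for illustration and are not real HyPhy output; the function name `absolute_site_rates` and the 400-unit tree length are hypothetical, and the tree length is assumed to come from an independent dating analysis rather than from the tree HyPhy prints.

```python
def absolute_site_rates(scaled_rates, tree_length_in_time):
    """Divide per-site rate * time values by an independently dated
    total tree length (sum of branch durations, in time units)."""
    if tree_length_in_time <= 0:
        raise ValueError("tree length must be positive")
    return [r / tree_length_in_time for r in scaled_rates]

# Hypothetical per-site values (rate * time), NOT real HyPhy output.
scaled = [0.5, 1.0, 2.0]

# Hypothetical total tree length in time units, e.g. from a chronogram.
dated_tree_length = 400.0

rates = absolute_site_rates(scaled, dated_tree_length)
print(rates)
```

The division rescales every site by the same constant, so the ratios between sites are unchanged; only the units move from "relative" to "per unit time".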

What are some of the oddities you are seeing?

Hope this helps.

Title: Re: Site specific rate estimation (SiteRates.bf)
Post by nicola de maio on Nov 30th, 2013 at 11:10am
Dear Sergei,

I have the opposite problem to Travis's.
I want to use HyPhy for dating, assuming that I specify my transition matrix Q with the correct mutation rates.
Now, Q is a bit complicated (PoMo).
Nevertheless, I understand from this thread that I can estimate the time t defining the probability matrix e^{tQ} of a branch, using the HyPhy output branch length T:
t = T / (-\sum_i \pi_i q_{ii})
with \pi_i the equilibrium frequency of state i.

Do you think this is correct?
Best wishes,
Nicola De Maio
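The relation proposed in this post can be written out as a short sketch. It assumes that the reported branch length T is the expected number of substitutions per site under an unnormalized Q, which the thread does not confirm; the function name `branch_time` and the toy 4-state matrix are hypothetical and stand in for the more complicated PoMo matrix.

```python
def branch_time(branch_length, pi, Q):
    """Return t = T / (-sum_i pi_i * q_ii), as proposed in the post.
    Assumes T is in expected substitutions per site under Q as given."""
    mean_rate = -sum(pi[i] * Q[i][i] for i in range(len(pi)))
    if mean_rate <= 0:
        raise ValueError("mean rate must be positive")
    return branch_length / mean_rate

# Toy 4-state matrix with equal off-diagonal rates mu (illustration only).
mu = 0.25
Q = [[-3 * mu if i == j else mu for j in range(4)] for i in range(4)]
pi = [0.25] * 4  # equilibrium frequencies of this toy matrix

T = 1.5  # hypothetical branch length in expected substitutions per site
t = branch_time(T, pi, Q)
print(t)
```

For this toy matrix the mean rate is 3 * mu = 0.75, so the sketch recovers t = 1.5 / 0.75.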

