HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
HYPHY Package >> HyPhy feedback >> character entropy
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1370024179

Message started by Rosz on May 31st, 2013 at 11:16am

Title: character entropy
Post by Rosz on May 31st, 2013 at 11:16am
Does anyone know how HyPhy calculates the character entropy in a multiple sequence alignment of nucleotide sequences? I've searched the manual and this forum but couldn't find the answer.

In addition, can someone suggest a better way to present an entropy plot than the rather crowded bar chart? I'm thinking of using a simple moving average over, say, 100 bp. Does that sound reasonable? Thank you.

Title: Re: character entropy
Post by Sergei on Jun 3rd, 2013 at 4:54pm
Hi there,

HyPhy uses the standard sum of -p log(p), where the sum is over the four nucleotides (A, C, G, T), p is the frequency of a given nucleotide at the site, and log is base 2.
The only trick is how HyPhy deals with ambiguous bases (e.g. R, Y): it adds fractional counts to each of the possible resolutions (e.g. an R contributes 0.5 A and 0.5 G).
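
To make the formula concrete, here is a minimal Python sketch of the per-site calculation described above. It is not HyPhy's code: the function name column_entropy, the IUPAC table, and the choice to skip gaps and unrecognized symbols are all assumptions made for illustration.

```python
import math

# IUPAC nucleotide codes mapped to the bases they may stand for.
# Treating N as an even four-way split is an assumption of this sketch.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def column_entropy(column):
    """Shannon entropy (base 2) of one alignment column given as a string."""
    counts = dict.fromkeys("ACGT", 0.0)
    for symbol in column.upper():
        bases = IUPAC.get(symbol)
        if bases is None:
            # Gaps and unknown symbols are skipped (an assumption).
            continue
        # An ambiguous base adds an equal fraction to each resolution,
        # e.g. R adds 0.5 to A and 0.5 to G.
        for base in bases:
            counts[base] += 1.0 / len(bases)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = 0.0
    for count in counts.values():
        if count > 0:
            p = count / total
            entropy -= p * math.log2(p)
    return entropy
```

For example, column_entropy("ACGT") gives the maximal value of 2 bits, and a column with a single base gives 0.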

There are a number of ways to present a plot of sequence variation. A moving average is reasonable, but whether it is the right choice depends on what you are trying to show. I would actually encourage you NOT to use entropy: it ignores the fact that all sequences are related by a phylogenetic tree. As an extreme case, consider a site with 25% of each base (the entropy is 2, which is the maximal value for nucleotide data). If the sequences are related by a tree like ((A,C),(G,T)), where A stands for all the sequences that have an A, etc., then only 3 substitutions are needed to explain the observed pattern, no matter how many sequences there are.
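
The count of 3 substitutions in this example can be checked with a small parsimony sketch. The code below uses Fitch's algorithm on a binary tree written as nested Python tuples; the function name fitch and the tuple encoding are assumptions for illustration, and this is not something HyPhy runs for you.

```python
def fitch(tree):
    """Return (state_set, min_substitutions) for a binary tree.

    A leaf is a one-letter string (the base observed in that sequence);
    an internal node is a 2-tuple of subtrees.
    """
    if isinstance(tree, str):
        return {tree}, 0
    left_states, left_cost = fitch(tree[0])
    right_states, right_cost = fitch(tree[1])
    shared = left_states & right_states
    if shared:
        # The children can agree on a state: no extra substitution here.
        return shared, left_cost + right_cost
    # The children disagree: one substitution is needed on this split.
    return left_states | right_states, left_cost + right_cost + 1


# Four clades, each containing only one base, arranged as ((A,C),(G,T)).
clade_a = (("A", "A"), "A")
clade_c = ("C", "C")
clade_g = ("G", ("G", "G"))
clade_t = ("T", "T")
site_tree = ((clade_a, clade_c), (clade_g, clade_t))
states, substitutions = fitch(site_tree)
print(substitutions)  # 3
```

Adding more sequences to any of the four clades raises the sequence count but leaves the parsimony score at 3, which is the point of the example: entropy cannot tell a single old split from frequent recent change.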

I would recommend that you estimate site-by-site evolutionary rates (e.g. using the SiteRates.bf standard analysis) and then smooth those. Better yet, just estimate MEAN rates (or whatever you are interested in) using phylogenetic likelihood (implemented in SlidingWindowAnalysis.bf); the output is a CSV file which reports sliding window averages.
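
If you do smooth per-site values yourself (rates, or entropies if you insist), a plain sliding-window mean is enough. The sketch below is only an illustration of that smoothing step, with a hypothetical function name; it is not what SlidingWindowAnalysis.bf computes, since that analysis fits the model to each window rather than averaging per-site numbers.

```python
def moving_average(values, window):
    """Mean of every run of `window` consecutive values.

    Returns len(values) - window + 1 averages; the first one covers
    values[0:window].
    """
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running_total = sum(values[:window])
    averages = [running_total / window]
    for i in range(window, len(values)):
        # Slide the window one site to the right.
        running_total += values[i] - values[i - window]
        averages.append(running_total / window)
    return averages
```

With window=100 this corresponds to the 100 bp moving average asked about in the original question.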

Sergei
