character entropy (Read 86 times)
YaBB Newbies

Curious HyPhy user

Posts: 3
character entropy
May 31st, 2013 at 11:16am
Does anyone know how HyPhy calculates the character entropy in a multiple sequence alignment of nucleotide sequences? I've searched the manual and this forum but couldn't find the answer.

In addition, can someone suggest a better way to present an entropy plot than the rather crowded bar chart? I'm thinking of using a simple moving average over, say, 100 bp. Does that sound reasonable? Thank you.
YaBB Administrator

Datamonkeys are forever...

Posts: 1658
Gender: male
Re: character entropy
Reply #1 - Jun 3rd, 2013 at 4:54pm
Hi there,

HyPhy uses the standard (Shannon) entropy, i.e. the sum of -p log(p), where the sum is over nucleotides (A, C, G, T), p is the frequency of a given nucleotide at the site, and the log has base 2.
The only trick is how HyPhy deals with ambiguous bases (e.g. R, Y): each one adds fractional counts to its possible resolutions (e.g. an R would contribute 0.5 A and 0.5 G).
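
To make the formula concrete, here is a minimal Python sketch of the per-site calculation as described above. It is not HyPhy's actual code: the function name `column_entropy` and the IUPAC table are mine, and treating N as 0.25 of each base and skipping gaps or other characters are assumptions on my part.

```python
import math
from collections import defaultdict

# IUPAC nucleotide codes mapped to their possible resolutions.
# Assumption: N is spread equally over all four bases.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def column_entropy(column):
    """Entropy (in bits) of one alignment column given as a string."""
    counts = defaultdict(float)
    for ch in column.upper():
        bases = IUPAC.get(ch)
        if bases is None:
            # Assumption: gaps and unknown characters are ignored.
            continue
        # An ambiguous base adds a fractional count to each resolution.
        for b in bases:
            counts[b] += 1.0 / len(bases)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total)
                for c in counts.values() if c > 0)
```

For example, `column_entropy("ACGT")` gives the maximal value of 2 bits, and `column_entropy("RR")` gives 1 bit because the two R's resolve to one A and one G in total.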

There are a number of ways to present a plot of sequence variation: a moving average is reasonable, but whether it is the right choice depends on what you are trying to show. I would actually encourage you NOT to use entropy, because it ignores the fact that all sequences are related by a phylogenetic tree. As an extreme case, consider a site with 25% of each base (the entropy is 2 bits, which is the maximal value for nucleotide data). If the sequences are related by a tree like ((A,C),(G,T)), where A stands for all the sequences that have an A, etc., then only 3 substitutions are needed to explain the observed pattern.
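
The "3 substitutions" figure can be checked with a small parsimony count. The sketch below uses the Fitch algorithm on a binary tree written as nested tuples; it is only an illustration (the name `fitch` and the tuple encoding are hypothetical, and this is not how HyPhy estimates rates).

```python
def fitch(tree):
    """Return (state_set, min_substitutions) for a binary tree.

    The tree is a nested 2-tuple; each leaf is a one-letter base.
    """
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    left_states, left_cost = fitch(left)
    right_states, right_cost = fitch(right)
    common = left_states & right_states
    if common:
        # Children can share a state: no extra substitution needed here.
        return common, left_cost + right_cost
    # Children disagree: one substitution on a branch below this node.
    return left_states | right_states, left_cost + right_cost + 1

# Each letter stands for the whole clade of sequences carrying that base.
tree = (("A", "C"), ("G", "T"))
states, substitutions = fitch(tree)
print(substitutions)  # 3
```

Expanding each clade into several identical tips, e.g. `((("A","A"),("C","C")),(("G","G"),("T","T")))`, still gives 3, while the entropy of the column stays at its maximum.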

I would recommend that you use some estimate of site-by-site evolutionary rates (e.g. from the SiteRates.bf standard analysis) and then smooth that. Better yet, just estimate MEAN rates (or whatever you are interested in) using phylogenetic likelihood (implemented in SlidingWindowAnalysis.bf) -- the output is a CSV file which reports sliding window averages.


Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego