character entropy
Rosz
May 31st, 2013 at 11:16am
 
Does anyone know how HyPhy calculates the character entropy in a multiple sequence alignment of nucleotide sequences? I've searched the manual and this forum but couldn't find the answer.

In addition, can someone suggest a better way to present an entropy plot than the rather crowded bar chart? I'm thinking of using a simple moving average over, say, 100 bp. Does that sound reasonable? Thank you.
Sergei
UCSD
Re: character entropy
Reply #1 - Jun 3rd, 2013 at 4:54pm
 
Hi there,

HyPhy uses the standard sum of -p log(p), where the sum is over nucleotides (A, C, G, T), p is the frequency of a given nucleotide at the site, and log has base 2.
The only trick is how HyPhy deals with ambiguous bases (e.g. R, Y): each one adds fractional counts to its possible resolutions (e.g. an R would contribute 0.5 A and 0.5 G).
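
To make the formula concrete, here is a rough Python sketch of the per-site calculation described above. It is not HyPhy's own code. The function names, the IUPAC table, and the choice to skip gaps and unrecognised symbols are my assumptions for illustration, and HyPhy may treat N and gaps differently.

```python
import math

# IUPAC nucleotide codes mapped to the bases they can stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def column_entropy(column):
    """Shannon entropy (base 2) of one alignment column given as a string."""
    counts = dict.fromkeys("ACGT", 0.0)
    for symbol in column.upper():
        bases = IUPAC.get(symbol)
        if bases is None:
            # Assumption: gaps and unknown symbols are simply skipped.
            continue
        # An ambiguous base spreads one count evenly over its resolutions,
        # e.g. R adds 0.5 to A and 0.5 to G.
        for base in bases:
            counts[base] += 1.0 / len(bases)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum(
        (c / total) * math.log2(c / total) for c in counts.values() if c > 0
    )


def alignment_entropy(sequences):
    """Entropy of every column of an alignment given as equal-length strings."""
    return [column_entropy("".join(col)) for col in zip(*sequences)]
```

For example, column_entropy("ACGT") gives 2.0 (the maximum for nucleotides), and a column made only of R's gives 1.0, because each R is split half and half between A and G.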

There are a number of ways to present a plot of sequence variation: a moving average is reasonable, but whether it is the right choice depends on what you are trying to show. I would actually encourage you NOT to use entropy: it ignores the fact that all sequences are related by a phylogenetic tree. As an extreme case, consider a site with 25% of each base (the entropy is 2 bits, which is the maximal value for nucleotide data). However, if the sequences are related by a tree like ((A,C),(G,T)), where A stands for all the sequences that have an A, etc., then only 3 substitutions are needed to explain the observed pattern.
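
The count of 3 in the example above can be checked with a small parsimony calculation. The sketch below uses Fitch's algorithm on a tree written as nested tuples. The function name and tree encoding are mine, for illustration only, and have nothing to do with how HyPhy represents trees.

```python
def fitch(tree):
    """Return (possible_states, min_substitutions) for a binary tree.

    A leaf is a one-letter string (the base observed in that clade);
    an internal node is a 2-tuple (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {tree}, 0
    left, right = tree
    left_states, left_cost = fitch(left)
    right_states, right_cost = fitch(right)
    shared = left_states & right_states
    if shared:
        # The children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # The children disagree: one substitution is required on this split.
    return left_states | right_states, left_cost + right_cost + 1


states, substitutions = fitch((("A", "C"), ("G", "T")))
print(substitutions)  # 3
```

So the site has maximal entropy no matter how many sequences sit in each clade, yet the tree explains it with just three changes.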

I would recommend you use some estimate of site-by-site evolutionary rates, and then smooth that (e.g. using the SiteRates.bf standard analysis). Better yet, just estimate MEAN rates (or whatever you are interested in) using phylogenetic likelihood (implemented in SlidingWindowAnalysis.bf); the output is a CSV file which reports sliding window averages.
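
If you end up smoothing per-site values yourself (rates or entropies), a plain moving average is only a few lines. This is a generic sketch and not what SlidingWindowAnalysis.bf does internally. The function name and the decision to report only full windows are my assumptions.

```python
def moving_average(values, window):
    """Simple moving average; returns one value per full window."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    averages = [running / window]
    for i in range(window, len(values)):
        # Slide the window one site to the right.
        running += values[i] - values[i - window]
        averages.append(running / window)
    return averages


print(moving_average([1, 2, 3, 4, 5], 3))  # [2.0, 3.0, 4.0]
```

With a 100 bp window, as suggested in the question, you would call moving_average(per_site_values, 100), where per_site_values is whatever per-site list you computed.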

Sergei
Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego