YaBB - Yet another Bulletin Board
effect of missing data (Read 2346 times)
Dan Fulop
YaBB Newbies


Feed your monkey!

Posts: 29
effect of missing data
Jul 27th, 2011 at 1:10pm
 
Hi,

Does specifying missing nucleotide data with 'N' instead of '-' or '?' affect results when fitting codon models? That is, are Ns averaged over all the codons they could resolve to, such that an N in the first or second codon position inflates the number of non-synonymous substitutions?

Thanks!
Dan.
Sergei
YaBB Administrator


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: effect of missing data
Reply #1 - Jul 29th, 2011 at 4:58am
 
Hi Dan,

N's and '-'s are treated as missing data and will NOT contribute to dN estimates in maximum likelihood analyses. See page 5 of [link].

They DO, however, contribute to frequency calculations (e.g. ANT contributes 1/4 to the tally of each of AAT, ACT, AGT and ATT). You can override this behavior (for '-' only) by setting

COUNT_GAPS_IN_FREQUENCIES = 0;

in your BF.
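To make the tally rule above concrete, here is a minimal Python sketch of fractional codon counting in which an ambiguous codon spreads its weight evenly over every codon it could resolve to. It is an illustration under stated assumptions, not HyPhy's implementation: the function `codon_tally` and its `count_gaps` flag are hypothetical, with `count_gaps=False` meant to mimic COUNT_GAPS_IN_FREQUENCIES = 0 by skipping codons that contain '-', while codons with 'N' are always counted. It assumes only A, C, G, T, N and '-' appear in three-letter codons and does no special handling of stop codons.

```python
from collections import Counter
from itertools import product

# Assumed symbol set: resolved bases map to themselves; 'N' and '-' are
# treated as fully ambiguous and map to all four bases.
RESOLUTIONS = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "N": "ACGT", "-": "ACGT",
}


def codon_tally(codons, count_gaps=True):
    """Return fractional codon counts, splitting ambiguous codons evenly.

    count_gaps is a hypothetical stand-in for COUNT_GAPS_IN_FREQUENCIES:
    when False, any codon containing '-' is skipped entirely, while codons
    containing 'N' are always counted.
    """
    tally = Counter()
    for codon in codons:
        codon = codon.upper()
        if not count_gaps and "-" in codon:
            continue
        # Enumerate every fully resolved codon compatible with this one.
        resolved = ["".join(p) for p in product(*(RESOLUTIONS[b] for b in codon))]
        weight = 1 / len(resolved)
        for r in resolved:
            tally[r] += weight
    return tally


if __name__ == "__main__":
    print(dict(codon_tally(["ANT"])))
    # {'AAT': 0.25, 'ACT': 0.25, 'AGT': 0.25, 'ATT': 0.25}
    print(codon_tally(["AAT", "ANT"]) == codon_tally(["AAT", "A-T"]))  # True
    print(dict(codon_tally(["AAT", "A-T"], count_gaps=False)))  # {'AAT': 1.0}
```

In this sketch, 'ANT' and 'A-T' produce identical tallies under the default setting; the choice between 'N' and '-' only changes the frequencies once gaps are excluded from counting.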

Sergei

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
Dan Fulop
YaBB Newbies


Feed your monkey!

Posts: 29
Re: effect of missing data
Reply #2 - Jul 29th, 2011 at 11:11am
 
Hi Sergei,

Thanks for your reply.

If I understand correctly, then, even though N's and '-'s do not contribute to dN estimates, they will affect the codon frequencies differently, right?

I wouldn't want to turn off counting gaps in the frequencies, but I also don't want to bias the frequencies by using N's instead of '-'s.

In our case, N's are sites where we do not have enough coverage in a given species to call either a SNP or the reference nucleotide.

So it seems more appropriate to me to code them as '-' rather than 'N' because of their effect on the codon frequencies. Am I right?

Thanks,
Dan.