Hi,
I am NOT a bioinformatician, nor a statistician, nor a molecular evolution guru, so please go easy on me
-- these may well be dumb questions with obvious answers.
I have what I would guess to be a somewhat unusual situation for this board: I have done some evolution experiments in which defined CLONES of small DNA viruses were inoculated into hosts and then passaged over a period of time. Viral DNA was then isolated and cloned, and a small number (eleven) of complete 2.7 kb genomes were sequenced. All in all, the data set includes 41 unique point mutations.
That is, I know exactly what the ancestral sequence(s) were, and I can define each and every mutation precisely.
First question:
Is there a relatively simple/easy-to-use application out there that will tell me what the expected dN/dS should be given:
a) a known input sequence,
b) specified frequencies for each of the 12 possible mutation types (i.e. A->C, A->G, A->T, C->A, C->G, etc.),
c) the assumption of neutral drift (i.e. neither "positive" nor "purifying" selection)
d) the assumption that each position is equally likely to become mutated?
- I would like to do a chi-squared test to determine whether my observed dN/dS is significantly different from what I would expect to see in the absence of selection.
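In case it helps to see what I have in mind, here is a rough sketch of the calculation I'm imagining (all the names here are mine, not from any existing package, and the uniform mutation spectrum at the bottom is just a placeholder for my measured frequencies):

```python
from itertools import product

# Standard genetic code, built from a compact 64-character string
# (rows/columns in TCAG order).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for (i, a), (j, b), (k, c) in product(enumerate(BASES), repeat=3)}

def expected_neutral_counts(seq, mut_freq):
    """Enumerate every possible point mutation in an in-frame coding
    sequence, weight each by the frequency of its mutation type, and
    tally the expected non-synonymous (N) vs synonymous (S) mass,
    assuming every position is equally likely to mutate."""
    n_mass = s_mass = 0.0
    for i in range(0, len(seq) - len(seq) % 3, 3):
        codon = seq[i:i + 3]
        aa = CODON_TABLE[codon]
        for pos in range(3):
            for new in BASES:
                if new == codon[pos]:
                    continue
                w = mut_freq[codon[pos] + ">" + new]   # e.g. "A>G"
                mutant = codon[:pos] + new + codon[pos + 1:]
                if CODON_TABLE[mutant] == aa:
                    s_mass += w
                else:
                    n_mass += w
    return n_mass, s_mass

# Uniform spectrum just for illustration -- substitute the measured
# frequencies of the 12 mutation types here.
uniform = {a + ">" + b: 1 / 12 for a in BASES for b in BASES if a != b}
n, s = expected_neutral_counts("ATGTTTAAA", uniform)
```

The expected neutral split between non-synonymous and synonymous changes would then just be n : s, which (scaled to the observed total of 41 mutations) should give me the expected counts for a chi-squared test. Does something like this already exist?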
Which brings me to my second question:
How/why does everyone seem to assume that:
dN/dS > 1 means positive selection,
dN/dS = 1 means no selection,
dN/dS < 1 means purifying selection ?
What I mean is, does it really work out such that random mutations in a "typical" coding sequence result in equal numbers of synonymous and non-synonymous changes..?!? As far as I can tell, mutations at the first and second codon positions are almost always non-synonymous, and only third-position mutations are usually synonymous -- so surely the raw ratio of non-synonymous to synonymous counts should be well away from 1 in the absence of selection?
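To check my own intuition, I tried tallying this directly from the standard genetic code (again, a quick sketch of my own, not output from any package):

```python
from itertools import product

# Standard genetic code in a compact 64-character string (TCAG order).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for (i, a), (j, b), (k, c) in product(enumerate(BASES), repeat=3)}

# For each codon position, the fraction of single-base changes (over all
# 61 sense codons, stop codons excluded) that leave the amino acid unchanged.
syn_fraction = {}
for pos in range(3):
    syn = total = 0
    for codon, aa in CODON_TABLE.items():
        if aa == "*":
            continue
        for new in BASES:
            if new == codon[pos]:
                continue
            mutant = codon[:pos] + new + codon[pos + 1:]
            if CODON_TABLE[mutant] == aa:
                syn += 1
            total += 1
    syn_fraction[pos + 1] = syn / total
    print(f"codon position {pos + 1}: {syn / total:.1%} synonymous")
```

If that tally is right, the large majority of random mutations in a typical reading frame are non-synonymous, which is exactly why I don't see how the naive neutral expectation for the ratio of counts can be 1.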
I'd really appreciate some help/insight on either of these two questions (preferably both!).
Eric