mammalian sequence analysis
sunilkalmadi
mammalian sequence analysis
Jan 25th, 2008 at 5:59pm
 
Dear all,
First of all, I would like to congratulate the authors on developing such a wonderful software package and, more importantly, on the even better written manuals, which are understandable even by people like me who are new to molecular evolutionary studies. However, since the documentation focuses on viral sequences, I have a few questions about mammalian sequences that I would like to clarify:

1. The global mean dN/dS value gives no idea of rate heterogeneity across sites. What is the classical criterion for categorizing a gene as “positively/negatively selected”? Is it the global dN/dS, the number of selected sites, their proportion, or the selection strength?

2. Will a nucleotide substitution bias model and codon model estimated for one mammalian sequence alignment hold for other mammalian datasets as well, or is it necessary to estimate these models for each dataset?

3. If the study is focused on mammalian phylogeny, the number of species-specific gene sequences is usually small (4–8). Is such a dataset large enough for inferring positively selected sites? SLAC is for larger datasets and REL gives more false positives for smaller datasets; can FEL results be expected to hold for these small datasets? If so, what would be a conservative significance level?

4. Can a universally accepted tree topology based on mitochondrial DNA be used for analyzing mammalian datasets, or is it necessary to construct a tree from the sequence alignment itself? Since the chance of horizontal gene transfer in mammals is low, is there any reason to run GARD for recombination detection prior to selection analysis?

Thanks in advance.
Cheers.
Sunil

Sergei
Re: mammalian sequence analysis
Reply #1 - Jan 27th, 2008 at 6:04pm
 
Dear Sunil,

Good questions!


Quote:
1. The global mean dN/dS value gives no idea of rate heterogeneity across sites. What is the classical criterion for categorizing a gene as “positively/negatively selected”? Is it the global dN/dS, the number of selected sites, their proportion, or the selection strength?


I would say that a traditional test is to claim positive selection if a proportion p > 0 of sites has dN > dS (i.e., selection operates on some sites). You could also say that a gene is under selection if at least one site is detected by a site-wise method (but only after an appropriate correction for multiple testing). Take a look at our selection book chapter for further insight. If you are after this test, I would recommend PARRIS (implemented in datamonkey.org), as it can also deal with recombinant data.
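To make the "at least one site after multiple-testing correction" criterion concrete, here is a minimal sketch. It assumes you already have one p-value per codon site from a site-wise method (for example FEL), and it uses the Holm-Bonferroni procedure as the correction; the post does not name a specific correction, so Holm is an illustrative choice, and the function names are hypothetical, not part of Datamonkey or HyPhy.

```python
def holm_adjust(pvalues):
    """Return Holm-Bonferroni adjusted p-values in the original site order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        # The p-value at 0-based rank r is multiplied by (m - r), capped at 1
        candidate = min(1.0, (m - rank) * pvalues[i])
        # Keep adjusted values non-decreasing along the sorted order
        running_max = max(running_max, candidate)
        adjusted[i] = running_max
    return adjusted


def gene_under_selection(site_pvalues, alpha=0.05):
    """Call the gene selected if at least one site survives the correction.

    Returns (is_selected, list of 1-based site indices that survive).
    """
    adjusted = holm_adjust(site_pvalues)
    selected_sites = [i + 1 for i, p in enumerate(adjusted) if p <= alpha]
    return len(selected_sites) > 0, selected_sites
```

With illustrative p-values `[0.001, 0.2, 0.04, 0.5]`, `gene_under_selection` returns `(True, [1])`: site 3 (raw p = 0.04) would pass an uncorrected 0.05 cutoff but not the corrected one.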

Quote:
2. Will a nucleotide substitution bias model and codon model estimated for one mammalian sequence alignment hold for other mammalian datasets as well, or is it necessary to estimate these models for each dataset?


It is probably best to estimate the appropriate model for each data set; it's not computationally expensive.
Different genes may call for different models, as may alignments of different sizes (simpler models for smaller alignments, more complex models for larger ones).
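One way to see why smaller alignments end up with simpler models is an information criterion with a small-sample correction. The sketch below uses AICc (AIC = 2k - 2 lnL, plus the correction 2k(k+1)/(n - k - 1)); taking n as the number of alignment sites is a common convention and an assumption here, and this is not necessarily the exact procedure Datamonkey uses for its model selection. The model labels and log-likelihoods are hypothetical placeholders for values you would obtain by fitting each candidate model to your own alignment.

```python
def aicc(log_likelihood, k, n):
    """Small-sample corrected AIC for a fit with k free parameters and n sites."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    aic = 2 * k - 2 * log_likelihood
    # The correction term grows as n shrinks, penalizing parameter-rich models
    return aic + 2 * k * (k + 1) / (n - k - 1)


def pick_model(fits, n):
    """fits maps a model label to (log_likelihood, number_of_free_parameters).

    Returns the label with the lowest AICc and the full score table.
    """
    scores = {name: aicc(lnl, k, n) for name, (lnl, k) in fits.items()}
    best = min(scores, key=scores.get)
    return best, scores
```

For example, `pick_model({"HKY85": (-2510.0, 5), "REV": (-2505.0, 9)}, n=300)` picks `"REV"` (AICc of about 5028.62 versus 5030.20), but the same comparison has to be redone for each new alignment because both the likelihoods and the penalty depend on the data.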

Quote:
3. If the study is focused on mammalian phylogeny, the number of species-specific gene sequences is usually small (4–8). Is such a dataset large enough for inferring positively selected sites? SLAC is for larger datasets and REL gives more false positives for smaller datasets; can FEL results be expected to hold for these small datasets? If so, what would be a conservative significance level?


Generally speaking, 4–8 sequences have almost no power to detect selection at a single site. You are better off testing for selection in general (e.g., with PARRIS) or doing a region test: for example, you can partition your gene into surface and buried residues (if that is known), or similarly based on structure, and compare dN/dS between the two partitions. SLAC and FEL will have very little power with 4–8 sequences, and REL could suffer from errors associated with parameter estimates derived from small data sets. To detect selection from 4–8 sequences your model must be spot on and have very few parameters; in reality even the best models are biologically wrong, and we don't really know a priori which parameters matter (e.g., whether synonymous rates vary from site to site) and which don't.
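A region test of this kind is often framed as a likelihood ratio test between two nested codon-model fits: a null model with one dN/dS shared by both partitions (e.g. surface and buried residues) and an alternative with a separate dN/dS per partition, i.e. one extra parameter. The sketch below only does the final arithmetic; it assumes you have already obtained both maximized log-likelihoods (for instance from HyPhy), the numbers shown are hypothetical, and the chi-square reference with 1 degree of freedom is the standard asymptotic approximation, which may be optimistic with so few sequences.

```python
import math


def region_lrt(lnl_shared, lnl_separate):
    """LRT: one dN/dS for both partitions (null) vs one per partition (alternative).

    Assumes the alternative has exactly one extra free parameter, so the
    statistic is compared against a chi-square distribution with 1 df.
    """
    # Clamp at zero in case the optimizer returned a slightly worse alternative fit
    stat = max(0.0, 2.0 * (lnl_separate - lnl_shared))
    # Chi-square (1 df) survival function: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value
```

With hypothetical log-likelihoods, `region_lrt(-3050.2, -3046.1)` gives a statistic of 8.2 and a p-value of roughly 0.004, suggesting the two partitions differ in dN/dS.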

Quote:
4. Can a universally accepted tree topology based on mitochondrial DNA be used for analyzing mammalian datasets, or is it necessary to construct a tree from the sequence alignment itself? Since the chance of horizontal gene transfer in mammals is low, is there any reason to run GARD for recombination detection prior to selection analysis?


The first part has to do with gene trees vs. species trees; I can't really say without looking at your genes. As far as screening for possible recombination goes: in some mammalian gene families (e.g., immune genes such as interferons), there is a lot of gene conversion, which would look like recombination in phylogenies. GARD is fairly conservative; you could always run your analysis with and without a GARD screen to see whether that makes a difference.
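If you do run the selection analysis twice, a simple way to judge "whether that makes a difference" is to compare the sets of sites flagged in each run. The helper below is a hypothetical sketch, not a Datamonkey feature; it assumes you have extracted the selected codon positions from each run yourself, and the site numbers in the example are made up.

```python
def compare_site_calls(without_gard, with_gard):
    """Summarize how selected-site calls change after a GARD-informed re-analysis."""
    a, b = set(without_gard), set(with_gard)
    union = a | b
    return {
        "both": sorted(a & b),
        "only_without_gard": sorted(a - b),
        "only_with_gard": sorted(b - a),
        # Jaccard index: 1.0 means identical calls, 0.0 means no overlap
        "jaccard": len(a & b) / len(union) if union else 1.0,
    }
```

For instance, `compare_site_calls([12, 57, 130], [12, 130, 201])` reports sites 12 and 130 in both runs, site 57 only without GARD, site 201 only with GARD, and a Jaccard index of 0.5.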

Cheers,
Sergei


Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
sunilkalmadi
Re: mammalian sequence analysis
Reply #2 - Jan 28th, 2008 at 9:43am
 
Thanks a lot, Sergei. That was very informative; I hope I can make good use of it.

cheers
sunil