HyPhy message board - mammalian sequence analysis

(Moderators: Sergei, Simon)

mammalian sequence analysis (Read 1658 times)

sunilkalmadi

Jan 25th, 2008 at 5:59pm

Dears,
First of all, I would like to congratulate the authors for developing such a wonderful software package and, more importantly, for the well-written manuals, which are understandable even to people like me who are new to molecular evolutionary studies. However, since the documentation focuses on viral sequences, I have some questions about mammalian sequences that I would like to clarify:

1. The global mean dN/dS value gives no information about rate heterogeneity across sites. What is the classical criterion for calling a gene “positively/negatively selected”? Is it the global dN/dS, the number of selected sites, their proportion, or the selection strength?

2. Will a nucleotide substitution bias model and a codon model estimated for one mammalian sequence alignment hold for other mammalian datasets as well, or is it necessary to estimate these models for each dataset?

3. If the study is focused on mammalian phylogeny, the number of species-specific gene sequences is usually small (4-8). Is such a dataset large enough for inferring positively selected sites? SLAC is meant for larger datasets and REL gives more false positives on smaller datasets; can FEL results be expected to hold for such small datasets? If yes, what would be a conservative significance level?

4. Can a universally accepted tree topology based on mitochondrial DNA be used for analyzing mammalian datasets, or is it necessary to construct a tree from the sequence alignment itself? Since the chance of horizontal gene transfer among mammals is low, is there any reason to run a GARD analysis to detect recombination prior to the selection analysis?

Thanks in advance.
Cheers.
Sunil

Sergei

Re: mammalian sequence analysis
Reply #1 - Jan 27th, 2008 at 6:04pm

Dear Sunil,

Good questions!

Quote:

1. The global mean dN/dS value gives no information about rate heterogeneity across sites. What is the classical criterion for calling a gene “positively/negatively selected”? Is it the global dN/dS, the number of selected sites, their proportion, or the selection strength?

I would say that a traditional test is to claim positive selection if a proportion p>0 of sites has dN>dS (i.e. selection operates on some sites). You could also say that a gene is under selection if at least one site is detected by a site-wise method (but only after an appropriate correction for multiple testing). Take a look at our selection book chapter for further insight. If you are after this test, I would recommend PARRIS (implemented in datamonkey.org), as it can also deal with recombinant data.
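
The "at least one site after correction for multiple testing" criterion can be sketched as follows. This is only an illustration: the reply does not say which correction to use, so Holm's step-down procedure is an assumption here, and the per-site p-values are made-up numbers standing in for the output of a site-wise method.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return indices of sites still significant after Holm's step-down correction."""
    m = len(p_values)
    # Visit sites from the smallest p-value to the largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    significant = []
    for rank, idx in enumerate(order):
        # The k-th smallest p-value is compared against alpha / (m - k).
        if p_values[idx] <= alpha / (m - rank):
            significant.append(idx)
        else:
            break  # once one test fails, all larger p-values fail too
    return sorted(significant)


# Hypothetical per-site p-values (not real data).
site_p_values = [0.0004, 0.03, 0.2, 0.8, 0.012, 0.6]
selected_sites = holm_bonferroni(site_p_values)

# Gene-level call under this criterion: at least one site survives correction.
gene_under_selection = len(selected_sites) > 0
```

Note that sites with raw p-values of 0.03 and 0.012 would pass an uncorrected 0.05 threshold but do not survive the correction in this example.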

Quote:

2. Will a nucleotide substitution bias model and a codon model estimated for one mammalian sequence alignment hold for other mammalian datasets as well, or is it necessary to estimate these models for each dataset?

It is probably best to estimate the appropriate model for each data set; it's not computationally expensive.
Different genes may call for different models, and so may alignments of different sizes (simpler models for smaller alignments, more complex models for larger ones).
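
One common way to let alignment size influence model complexity is a small-sample information criterion. The sketch below uses AICc, which is an assumption on my part (the reply does not name a criterion), and it treats the number of alignment columns as the sample size, which is itself a debatable convention. The model labels, log-likelihoods, and parameter counts are hypothetical placeholders for values you would get from fitting each candidate model.

```python
def aicc(log_likelihood, n_params, sample_size):
    """Small-sample corrected AIC; lower is better."""
    aic = -2.0 * log_likelihood + 2.0 * n_params
    denom = sample_size - n_params - 1
    if denom <= 0:
        # Too many parameters for the amount of data: treat as unusable.
        return float("inf")
    return aic + 2.0 * n_params * (n_params + 1) / denom


def best_model(candidates, sample_size):
    """candidates maps a label to (log_likelihood, n_params)."""
    return min(candidates, key=lambda name: aicc(*candidates[name], sample_size))


# Hypothetical fits for one alignment of 150 columns (not real results).
candidates = {
    "simpler model": (-1200.0, 5),
    "richer model": (-1195.0, 12),
}
chosen = best_model(candidates, sample_size=150)
```

Here the richer model improves the log-likelihood by only 5 units while adding 7 parameters, so the penalty outweighs the gain and the simpler model is chosen.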

Quote:

3. If the study is focused on mammalian phylogeny, the number of species-specific gene sequences is usually small (4-8). Is such a dataset large enough for inferring positively selected sites? SLAC is meant for larger datasets and REL gives more false positives on smaller datasets; can FEL results be expected to hold for such small datasets? If yes, what would be a conservative significance level?

Generally speaking, 4-8 sequences have almost no power to detect selection at a single site. You are better off testing for selection in general (e.g. with PARRIS), or doing a region test: for example, you can partition your gene into surface and buried residues (if that is known), or into similar structure-based classes, and compare dN/dS between the two partitions. SLAC and FEL will have very little power with 4-8 sequences, and REL could suffer from errors associated with parameter estimates derived from small data sets. To detect selection from 4-8 sequences your model must be spot on and have very few parameters. In reality even the best models are biologically wrong, and we don't really know a priori which parameters matter (e.g. do synonymous rates vary from site to site?) and which don't.
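
A region test of this kind is often framed as a likelihood ratio test, although the reply does not prescribe one, so treat the following as a sketch under that assumption. The null model has a single dN/dS shared by both partitions, the alternative has a separate dN/dS per partition (one extra parameter), and the statistic is compared with a chi-square distribution with one degree of freedom. The log-likelihoods are hypothetical placeholders for values obtained by fitting both models, and the chi-square approximation may be rough for very small data sets.

```python
import math


def lrt_one_df(lnl_null, lnl_alt):
    """Likelihood ratio test for nested models differing by one parameter.

    Returns (statistic, p_value) using the chi-square(1) approximation.
    """
    stat = max(0.0, 2.0 * (lnl_alt - lnl_null))
    # Survival function of chi-square with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Hypothetical log-likelihoods (not real results):
lnl_shared = -2310.4    # one dN/dS for surface and buried sites
lnl_separate = -2306.1  # separate dN/dS for each partition

stat, p_value = lrt_one_df(lnl_shared, lnl_separate)
partitions_differ = p_value < 0.05
```

A significant result only says that dN/dS differs between the partitions; whether either partition has dN/dS above 1 has to be read from the fitted estimates.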

Quote:

4. Can a universally accepted tree topology based on mitochondrial DNA be used for analyzing mammalian datasets, or is it necessary to construct a tree from the sequence alignment itself? Since the chance of horizontal gene transfer among mammals is low, is there any reason to run a GARD analysis to detect recombination prior to the selection analysis?

The first part has to do with gene trees vs species trees; I can't really say without looking at your genes. As far as screening for possible recombination goes: in some gene families in mammals (e.g. immune genes such as interferon), there is a lot of gene conversion, which would look like recombination in phylogenies. GARD is fairly conservative; you could always try your analysis with and without a GARD screen to see if that makes a difference.

Cheers,
Sergei

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego

sunilkalmadi

Re: mammalian sequence analysis
Reply #2 - Jan 28th, 2008 at 9:43am

Thanks a lot Sergei, that was very informative. I hope I can make good use of it.

Cheers,
Sunil