HyPhy message board

Message started by Ana Guimaraes on Nov 29th, 2012 at 9:28am

Title: Positive selection in paralogous gene families
Post by Ana Guimaraes on Nov 29th, 2012 at 9:28am
Dear community,

We work with a bacterium that has many paralogous gene families (2 to 69 members each). Some families may undergo recombination and others may not; some also undergo phase variation through slipped-strand mispairing. We would like to know whether these families are under positive selection within the genome, but we have only one sequenced strain. Which test would be most appropriate? For example, if we decide to use recombination-corrected FEL for all families, will it still work for families in which no recombination is occurring (we cannot confirm which ones recombine)?
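Slipped-strand mispairing acts on short repetitive sequences such as mononucleotide tracts. One rough way to flag candidate phase-variable genes is therefore to scan each coding sequence for long homopolymer runs. The following is a minimal Python sketch, not a HyPhy feature. The function name `homopolymer_tracts` and the default cutoff of 7 bases are illustrative assumptions, not established thresholds.

```python
import re


def homopolymer_tracts(seq, min_len=7):
    """Return (start, base, length) for every run of a single base >= min_len.

    seq: nucleotide string; positions are 0-based.
    min_len: hypothetical cutoff for a "long" tract; tune per genome.
    """
    # A base followed by at least (min_len - 1) copies of itself.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [(m.start(), m.group(1), len(m.group(0)))
            for m in pattern.finditer(seq.upper())]


# Example: an 8-base poly-G tract starting at position 2.
print(homopolymer_tracts("AT" + "G" * 8 + "CA"))
```

A hit only marks a gene as a candidate. Confirming phase variation still requires experimental or comparative evidence.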
Thank you very much
Ana G.

Title: Re: Positive selection in paralogous gene families
Post by konrad on Dec 4th, 2012 at 2:07am
Dear Ana,

Using recombination correction in the absence of recombination is not a problem. It may cost some power, but it should not invalidate the test, so it is the right thing to do when you are unsure.
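A recombination-corrected analysis typically splits the codon alignment at inferred breakpoints (for example, breakpoints reported by GARD) and fits a separate tree to each segment. The plain-Python sketch below is not HyPhy code, and the helper `split_at_breakpoints` is hypothetical. It shows only the partitioning step. With no breakpoints, the alignment comes back as a single unchanged segment, which is why the correction is harmless when there is no recombination.

```python
def split_at_breakpoints(alignment, breakpoints):
    """Split a codon alignment into segments at nucleotide breakpoints.

    alignment: dict mapping sequence name -> aligned nucleotide string.
    breakpoints: 0-based nucleotide positions of inferred recombination.
    Breakpoints are snapped to the nearest codon boundary so that every
    segment stays in reading frame.
    """
    length = len(next(iter(alignment.values())))
    if any(len(seq) != length for seq in alignment.values()):
        raise ValueError("sequences must be aligned to equal length")
    # Snap to multiples of 3, drop duplicates and cuts at the alignment ends.
    cuts = sorted({round(bp / 3) * 3 for bp in breakpoints})
    cuts = [c for c in cuts if 0 < c < length]
    bounds = [0] + cuts + [length]
    # One sub-alignment per segment; each would get its own tree downstream.
    return [{name: seq[start:end] for name, seq in alignment.items()}
            for start, end in zip(bounds, bounds[1:])]


aln = {"a": "ATGAAACCCGGG", "b": "ATGAAGCCCGGA"}
print(split_at_breakpoints(aln, []))   # a single segment: the whole alignment
print(split_at_breakpoints(aln, [7]))  # cut snapped to position 6
```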

The phase variation sounds like a concern: are these frameshift mutations? If so, they would completely invalidate any codon model analysis.
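A frameshift in a coding sequence usually shows up as a length that is not a multiple of three or as a premature (internal) stop codon. A quick pre-screen before any codon model analysis can therefore catch the obvious cases, though compensating frameshifts can slip through. This minimal Python sketch assumes ungapped CDS strings and the stop codons TAA, TAG and TGA, which apply to the standard and bacterial codes. Adjust the set for organisms that reassign TGA. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # assumed stop set (standard/bacterial code)


def frame_problems(seq):
    """Return reasons a CDS looks out of frame (an empty list if none)."""
    seq = seq.upper().replace("-", "")  # drop alignment gaps, if any
    problems = []
    if len(seq) % 3 != 0:
        problems.append(f"length {len(seq)} is not a multiple of 3")
    # Split the in-frame part of the sequence into codons.
    codons = [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]
    # A terminal stop is expected; any earlier stop suggests a frameshift.
    for idx, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            problems.append(f"internal stop codon {codon} at codon {idx + 1}")
    return problems


print(frame_problems("ATGTAAAAATAA"))
```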

Hope this helps,
Konrad Scheffler

Title: Re: Positive selection in paralogous gene families
Post by Ana Guimaraes on Dec 6th, 2012 at 10:04am
Dear Dr Scheffler,

Thank you for your reply.
Yes, we believe that some of these families have mononucleotide repeats that cause frameshift mutations (we have observed this mechanism in one family; the others were not analyzed). We clustered the families with BLASTCLUST using a 70% length-coverage threshold, so most protein fragments resulting from frameshifts were probably excluded. Still, with the data we have, it is hard to know which families undergo phase variation or recombination. We plan to analyze 8 different bacterial species, all of which have multiple families. If we cluster the proteins with a high length-coverage threshold (e.g., 70-80%), can the model be valid? I know we will exclude some genes that may be expressed when the repeats contract or expand (and lose some power), but I cannot think of any other way.
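A simplified coverage rule shows how a length-coverage threshold acts on frameshift fragments: the aligned region must span at least a given fraction of both proteins. This is a hypothetical Python sketch in the spirit of a length-coverage criterion like BLASTCLUST's. It does not reimplement how BLASTCLUST actually computes coverage, and the function and parameter names are illustrative.

```python
def coverage_ok(q_span, s_span, q_len, s_len, threshold=0.7, both=True):
    """Decide whether a pairwise hit covers enough of each protein.

    q_span, s_span: (start, end) of the aligned region, 1-based inclusive,
    on the query and subject proteins.
    q_len, s_len: full lengths of the query and subject proteins.
    both: if True, require the threshold on both proteins; else on either.
    """
    q_cov = (q_span[1] - q_span[0] + 1) / q_len
    s_cov = (s_span[1] - s_span[0] + 1) / s_len
    if both:
        return q_cov >= threshold and s_cov >= threshold
    return q_cov >= threshold or s_cov >= threshold


# A 120-residue fragment aligned over its full length to a 300-residue paralog.
print(coverage_ok((1, 120), (1, 120), 300, 120))
```

In this example, coverage is 1.0 on the fragment but only 0.4 on the full-length protein. A 70% threshold applied to both sequences therefore keeps the pair out of the same cluster, which is why most frameshift fragments drop out.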

I guess for now we can stick with recombination-corrected FEL. Or is there another test that could help in this situation?

Thank you very much for your help

