GARD interpretation
Anita
GARD interpretation
Jan 13th, 2010 at 2:42am
 
I would like to check my interpretation of a GARD analysis of a set of toxin sequences. My whole file is too large to run on the webserver, and attempts to install HyPhy locally have failed (even on a Linux box that has an MPI library installed I get the message "requires an MPI environment to run"). I have therefore split my sequences up into subfiles. Reassuringly, the analyses have tended to come up with the same number of significant breakpoints, many of which are in the same locations. I have also analysed the whole file using RDP3, and the breakpoints it identifies tend to confirm those found by GARD. I then removed all the identified recombinants and ran the GARD analysis again, but was surprised to see that the same results were obtained.

Checking the positions of the identified breakpoints, I have noticed that they correspond (very precisely in some cases) to intron/exon boundaries. Essentially, the breakpoints have divided my alignment into the following regions: 5'UTR and translated signal region (there is also a breakpoint within this region, but it is the least stable in position between analyses and not very strongly supported), the coding region of exon2 and a bit of intron2, first half of intron2 (very long), second half of intron2, exon3 (coding), intron3, exon4 (coding and 3'UTR).

My interpretation is therefore that this is not the result of true recombination (with the possible exception of the breakpoint in the middle of intron2) but instead reflects positive selection on the protein-coding region. I therefore plan to separate introns and exons and analyse them separately, using the introns together with the 5'UTR, 3'UTR and signal region for estimating the phylogenetic history of the alleles (having first removed the clearly identified recombinants), and using the protein-coding region for studying selection. Does this sound like a justifiable approach?
Do you think it would be worth repeating the analysis with the protein coding regions excised?
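
For illustration only, the comparison of breakpoints against intron/exon boundaries could be scripted along the following lines. This is a minimal sketch, not part of GARD or RDP3: the coordinates, the tolerance and the function name nearest_boundary are all made-up placeholders, and real positions would have to be taken from the alignment and its annotation.

```shell
#!/bin/sh
# Hypothetical alignment coordinates, invented for illustration only.
BREAKPOINTS="118 402 1250"   # breakpoint positions reported by the analysis
BOUNDARIES="120 400 2000"    # annotated intron/exon boundary positions
TOLERANCE=5                  # assumed cut-off (in sites) for "coincides with"

# Print "<nearest boundary> <distance>" for one breakpoint.
# $1 = breakpoint position, remaining arguments = boundary positions.
nearest_boundary() {
    bp=$1
    shift
    best=""
    best_d=""
    for b in "$@"; do
        d=$((bp - b))
        if [ "$d" -lt 0 ]; then
            d=$((0 - d))
        fi
        if [ -z "$best_d" ] || [ "$d" -lt "$best_d" ]; then
            best=$b
            best_d=$d
        fi
    done
    echo "$best $best_d"
}

for bp in $BREAKPOINTS; do
    # Split the "boundary distance" pair into $1 and $2.
    set -- $(nearest_boundary "$bp" $BOUNDARIES)
    if [ "$2" -le "$TOLERANCE" ]; then
        echo "breakpoint $bp: within $TOLERANCE sites of boundary $1"
    else
        echo "breakpoint $bp: nearest boundary $1 is $2 sites away"
    fi
done
```

A breakpoint that is far from every boundary (such as the one in the middle of intron2) would show up in the second kind of line.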
 
Sergei
UCSD
Re: GARD interpretation
Reply #1 - Jan 14th, 2010 at 4:42pm
 
Hi Anita,

GARD will run locally if you start HyPhy in an MPI environment (i.e. not just build it on a machine with libmpich), e.g. using

Code:
$ mpirun -np 21 HYPHYMPI

or

Code:
$ qsub myMPIHyPhyJob.sh

etc.
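
As a rough sketch of what a launcher script might contain, assuming mpirun is the MPI launcher on your system: the variable names NP, HYPHY_BIN and RUN_HYPHY are hypothetical, and any scheduler directives are left out because they are site-specific. By default the script only prints the command it would run.

```shell
#!/bin/sh
# Sketch of a launcher for an MPI build of HyPhy.
# NP, HYPHY_BIN and RUN_HYPHY are hypothetical names used for illustration.
NP="${NP:-21}"                      # number of MPI processes
HYPHY_BIN="${HYPHY_BIN:-HYPHYMPI}"  # name or path of the MPI-enabled binary

# Print the command line that would be executed.
launch_cmd() {
    printf 'mpirun -np %s %s\n' "$1" "$2"
}

# Succeed only if an mpirun launcher is available on PATH.
have_mpirun() {
    command -v mpirun >/dev/null 2>&1
}

if [ "${RUN_HYPHY:-0}" = "1" ]; then
    if have_mpirun; then
        # Start HyPhy inside an MPI environment.
        mpirun -np "$NP" "$HYPHY_BIN"
    else
        echo "mpirun not found on PATH; cannot start HyPhy under MPI" >&2
        exit 1
    fi
else
    # Dry run: show the command without executing it.
    echo "Would run: $(launch_cmd "$NP" "$HYPHY_BIN")"
fi
```

Setting RUN_HYPHY=1 in the environment switches from the dry run to an actual launch.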

GARD can report breakpoints that are due to differing branch lengths (e.g. heterotachy), even though the topology of the tree is the same on both sides. To reject this possibility, GARD also reports KH test p-values - were those significant in your case?
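
One way to tabulate this, sketched under stated assumptions: each breakpoint comes with two KH p-values (one per direction of comparison), and a breakpoint is treated as supporting a topology change only if both fall below a chosen threshold. The 0.05 threshold, the "both directions" rule, the function name classify and the example values in the comments are assumptions for illustration, not GARD output.

```shell
#!/bin/sh
# ALPHA is an assumed significance threshold, not a value prescribed by GARD.
ALPHA=0.05

# classify <position> <p_left> <p_right>
# Prints the position followed by a verdict based on the two p-values.
classify() {
    awk -v a="$ALPHA" -v pos="$1" -v l="$2" -v r="$3" 'BEGIN {
        if (l + 0 < a + 0 && r + 0 < a + 0)
            print pos, "topological incongruence supported"
        else
            print pos, "may reflect branch-length differences only"
    }'
}

# Hypothetical p-values, invented for illustration.
classify 402 0.01 0.002
classify 1250 0.30 0.04
```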

It is worth running coding and non-coding regions separately through GARD to confirm your suspicion. Also, datamonkey.org will let you analyze (potentially) recombinant data for selection using a partitioned data approach. I would screen coding data through GARD and then run it through selection screens on datamonkey.org with and without accounting for recombination (if any is detected) and see if this makes any difference on the inference of selection.

Sergei

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego