PARRIS : Too few sites for c-AIC inference
ljouneau
PARRIS : Too few sites for c-AIC inference
Jul 31st, 2008 at 7:55am
 
Hi,

Using PARRIS, I got the following error:
"Too few sites for c-AIC inference"

I did not find anything similar using the forum search.

Can someone tell me whether this means that my sequences are too short to be analyzed?

Thanks in advance for any help.
--
Jouneau Luc
Ingénieur de Recherche
Unité Virologie et Immunologie Moléculaire
Unité Biologie du Développement et Reproduction
INRA - Domaine de Vilvert - Jouy en Josas 78352 Cedex
Tél: +(33) 1 34 65 24 76 - eMail: luc.jouneau@jouy.inra.fr
 
Sergei
UCSD
Re: PARRIS : Too few sites for c-AIC inference
Reply #1 - Jul 31st, 2008 at 6:42pm
 
Dear Luc,

Sounds like you may indeed have too few sites. This usually happens when you have only about twice as many sites as sequences (the exact cutoff is determined by the total number of model parameters). Have you tried removing any identical sequences your alignment may contain?
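Removing identical sequences is straightforward once the alignment is in memory. The sketch below is a minimal illustration, assuming the alignment has already been read into a list of (name, sequence) pairs (FASTA parsing is left out); `remove_identical` is a hypothetical helper, not part of HyPhy or PARRIS. It keeps the first sequence of each group of exact duplicates.

```python
def remove_identical(alignment):
    """Return a new alignment keeping only the first sequence of each
    group of identical aligned sequences.

    alignment: list of (name, sequence) pairs, all of the same length.
    Hypothetical helper for illustration; not a HyPhy function.
    """
    seen = set()
    kept = []
    for name, seq in alignment:
        key = seq.upper()  # treat 'acgt' and 'ACGT' as the same sequence
        if key not in seen:
            seen.add(key)
            kept.append((name, seq))
    return kept


aln = [
    ("s1", "ATGCCA"),
    ("s2", "ATGCCA"),  # identical to s1, so it is dropped
    ("s3", "ATGTCA"),
]
print([name for name, _ in remove_identical(aln)])  # ['s1', 's3']
```

Only exact matches are removed: sequences that differ solely by gaps or ambiguity codes are kept. Each sequence removed takes two branches (and hence two parameters) out of an unrooted tree.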

Cheers,
Sergei

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
 
ljouneau
Re: PARRIS : Too few sites for c-AIC inference
Reply #2 - Aug 1st, 2008 at 8:23am
 
Dear Sergei,

Thanks for your quick answer.

We tried to run the GARD/HyPhy program on 3 different sets of sequences:

- set A contained 38 sequences, with a length of 435 nucleotides
- set B contained 55 sequences, with a length of 465 nucleotides
- set C contained 49 sequences, with a length of 258 nucleotides

For all 3 sets, we wanted to search for 2 or 20 recombination sites. The program worked well for sets A and B, but not for set C.
For set C, we got the error reported above.

You suggest that we try to minimise the number of sites in our dataset and remove identical sequences. Could you please specify the optimal parameters / optimal sequence length to use in our analysis of set C, without losing too much information about our sequences?

Best regards
Luc
 
Sergei
Re: PARRIS : Too few sites for c-AIC inference
Reply #3 - Aug 1st, 2008 at 8:34am
 
Dear Luc,

The basic idea behind enforcing a minimum number of sites is that one can't infer a good tree from too few sites (imagine trying to build a tree of 5000 sequences from 50 sites). Hence, GARD demands at least p + 2 sites per partition, where p is the number of parameters (number of tree branches + global parameters).
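For reference, the standard small-sample corrected AIC is AICc = AIC + 2p(p + 1)/(n - p - 1), with AIC = 2p - 2 ln L, p the number of parameters and n the number of observations (here, sites). The correction term is only defined when n - p - 1 >= 1, that is n >= p + 2, which matches the cutoff above. Whether GARD uses exactly this form, and how it applies it per partition, is an assumption not shown in this thread; `aic` and `aicc` below are hypothetical helpers.

```python
def aic(log_likelihood, p):
    """Akaike information criterion: 2p - 2 ln L."""
    return 2 * p - 2 * log_likelihood


def aicc(log_likelihood, p, n):
    """Small-sample corrected AIC (c-AIC).

    p: number of estimated parameters; n: number of observations (sites).
    The correction 2p(p+1)/(n-p-1) needs n - p - 1 >= 1, i.e. n >= p + 2.
    """
    if n < p + 2:
        raise ValueError(
            f"Too few sites for c-AIC: need n >= p + 2 = {p + 2}, got n = {n}"
        )
    return aic(log_likelihood, p) + 2 * p * (p + 1) / (n - p - 1)


print(aicc(-1000.0, p=10, n=12))  # 2240.0 (AIC 2020.0 + correction 220.0)
try:
    aicc(-1000.0, p=10, n=11)  # one site short of p + 2
except ValueError as err:
    print(err)
```

As n grows, the correction shrinks toward zero and c-AIC approaches plain AIC; as n approaches p + 1, it blows up, which is why the check is done before computing anything.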

For your third dataset (set C, 49 sequences), there are 2*49-3 = 95 branch lengths PER partition. GARD will expect at least two partitions, so there are 190 parameters to estimate for branch lengths alone. As you can see, we are trying to estimate almost as many parameters as there are observations (258 sites), which is statistically untenable.
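The arithmetic above can be turned into a quick feasibility check. This is a rough sketch that reads the "p + 2 sites per partition" rule literally, with p = branches + global parameters; `n_global` defaults to 0 because GARD's actual global parameter count is not given here (an assumption), so the results are lower bounds. All function names are hypothetical.

```python
def branch_count(n_sequences):
    """Branches in a fully resolved unrooted tree: 2N - 3."""
    return 2 * n_sequences - 3


def min_sites(n_sequences, n_partitions, n_global=0):
    """Minimum alignment length if every partition needs p + 2 sites,
    with p = branches + n_global (n_global is an assumed count)."""
    p = branch_count(n_sequences) + n_global
    return n_partitions * (p + 2)


def max_breakpoints(n_sequences, n_sites, n_global=0):
    """Largest number of breakpoints B such that B + 1 partitions fit.
    A result of -1 means even a single partition does not fit."""
    per_partition = branch_count(n_sequences) + n_global + 2
    return n_sites // per_partition - 1


n_seq, n_sites = 49, 258  # set C from this thread
print(branch_count(n_seq))              # 95
print(2 * branch_count(n_seq))          # 190
print(min_sites(n_seq, 2))              # 194
print(max_breakpoints(n_seq, n_sites))  # 1
```

Under this literal reading, set C has room for at most one breakpoint, and any global parameters only make the bound stricter. Searching for 20 breakpoints (21 partitions) would need at least `min_sites(49, 21)`, i.e. 2037 sites.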

You can run a single-breakpoint partition analysis on any dataset (e.g. using …) and base your inference on AIC (not c-AIC), which does not demand a minimum number of sites.
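To see why plain AIC imposes no site minimum, note that it has no sample-size term: comparing a single-tree model with a one-breakpoint (two-tree) model only needs each model's log-likelihood and parameter count. The sketch below assumes the split model has a second full set of branch lengths plus one extra parameter for the breakpoint position; that count, the helper `prefer_breakpoint`, and the log-likelihood values are hypothetical and only illustrate the comparison. Global parameters are shared by both models and cancel in the difference, so they are omitted.

```python
def aic(log_likelihood, p):
    """Akaike information criterion: 2p - 2 ln L (no dependence on n)."""
    return 2 * p - 2 * log_likelihood


def prefer_breakpoint(logl_single, logl_split, n_sequences):
    """Compare one tree vs. two trees separated by one breakpoint.

    Returns (delta, supported) with delta = AIC(single) - AIC(split);
    a positive delta means the breakpoint model has the lower AIC.
    """
    branches = 2 * n_sequences - 3
    p_single = branches
    # Assumption: second set of branch lengths + 1 breakpoint position.
    p_split = 2 * branches + 1
    delta = aic(logl_single, p_single) - aic(logl_split, p_split)
    return delta, delta > 0


# Hypothetical log-likelihoods for a 49-sequence alignment.
print(prefer_breakpoint(-5000.0, -4850.0, 49))  # (108.0, True)
print(prefer_breakpoint(-5000.0, -4950.0, 49))  # (-92.0, False)
```

The trade-off is that AIC does not penalize parameter-rich models on short alignments as strongly as c-AIC, so a supported breakpoint on a short alignment deserves extra scrutiny.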

Alternatively, reduce the size of your alignment by pruning sequences. You can remove those that are closely related (e.g. pick one per clade), but this is very heuristic and should be decided based on what your data are.
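One way to prune without building a tree first is a greedy distance threshold, used here as a stand-in for "one per clade" (it is a different, simpler heuristic, not the procedure described above). A sequence is kept only if its p-distance to every already-kept sequence exceeds a threshold. The threshold value, and the helpers `p_distance` and `prune`, are hypothetical; a sensible threshold depends on your data.

```python
def p_distance(a, b):
    """Proportion of differing positions between two aligned sequences,
    skipping columns where either has a gap '-' or an 'N'."""
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            diffs += 1
    # No comparable columns: treat as indistinguishable (distance 0).
    return diffs / compared if compared else 0.0


def prune(alignment, threshold):
    """Greedy pruning in input order: keep a sequence only if it is more
    than `threshold` p-distance away from every sequence already kept."""
    kept = []
    for name, seq in alignment:
        if all(p_distance(seq, other) > threshold for _, other in kept):
            kept.append((name, seq))
    return kept


aln = [
    ("a", "AAAAAAAAAA"),
    ("b", "AAAAAAAAAT"),  # 0.1 from a: dropped
    ("c", "TTTTAAAAAA"),  # 0.4 from a: kept
    ("d", "TTTTAAAAAT"),  # 0.1 from c: dropped
]
print([name for name, _ in prune(aln, threshold=0.15)])  # ['a', 'c']
```

Because the pass is greedy, the result depends on input order; putting the sequences you most want to retain first makes them the representatives.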

Cheers,
Sergei

