YaBB - Yet another Bulletin Board
 
very simple statistic, but I'm not sure how/where! (Read 4098 times)
rbjmax
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 17
very simple statistic, but I'm not sure how/where!
Oct 28th, 2011 at 2:09pm
 
I need to provide a colleague with the maximum % nucleotide and amino acid diversity in my set of sequences (overall and within subpopulations). Everywhere I look I see mean measures, and I feel goofy for not being able to find this!

Thanks in advance,
Rachel
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: very simple statistic, but I'm not sure how/where!
Reply #1 - Oct 31st, 2011 at 7:22am
 
Hi Rachel,

F_ST (a standard analysis under Compartmentalization) is probably your best bet. It will compute mean pairwise diversity (no phylogenetic correction).

Sergei

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
 
rbjmax
Re: very simple statistic, but I'm not sure how/where!
Reply #2 - Nov 1st, 2011 at 2:27pm
 
Thanks, Sergei. I'm after maximum measures of diversity, not the mean, so that I can compare with previously published data. At this point I'm considering just a crude pairwise matrix, but the number of taxa involved makes my eyes sore just imagining it. I'll keep looking.

Rachel
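
For anyone reading along: the "crude pairwise matrix" idea does not have to be inspected by eye. Below is a minimal Python sketch, not anything from HyPhy or Datamonkey, that scans all pairs of aligned sequences and reports the largest percent difference (a plain p-distance, no model correction), both overall and within subpopulations. The function names, the `groups` mapping from sequence name to subpopulation label, and the `missing` characters are all assumptions made for illustration. It works on nucleotide or amino acid alignments; for nucleotides you may want to pass `missing="-?N"`.

```python
from itertools import combinations


def p_distance(a, b, missing="-?"):
    """Percent of differing sites between two aligned sequences.

    Sites where either sequence has a character in `missing` are skipped.
    Returns None if no site can be compared.
    """
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in missing or y in missing:
            continue
        compared += 1
        if x != y:
            diffs += 1
    return 100.0 * diffs / compared if compared else None


def max_pairwise(seqs, missing="-?"):
    """Return (max percent distance, name1, name2), or None if no pair is comparable.

    `seqs` is a dict mapping sequence name -> aligned sequence.
    """
    best = None
    for (n1, s1), (n2, s2) in combinations(seqs.items(), 2):
        d = p_distance(s1, s2, missing)
        if d is not None and (best is None or d > best[0]):
            best = (d, n1, n2)
    return best


def max_by_group(seqs, groups, missing="-?"):
    """Maximum pairwise distance within each subpopulation.

    `groups` is a dict mapping sequence name -> subpopulation label.
    """
    by_group = {}
    for name, seq in seqs.items():
        by_group.setdefault(groups[name], {})[name] = seq
    return {g: max_pairwise(members, missing) for g, members in by_group.items()}
```

Calling `max_pairwise(seqs)` gives the overall maximum and the pair that produced it, and `max_by_group(seqs, groups)` gives the same per subpopulation. The loop is quadratic in the number of sequences, so it is fine for hundreds or a few thousand taxa but slow in pure Python beyond that.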
 
Sergei
Re: very simple statistic, but I'm not sure how/where!
Reply #3 - Nov 1st, 2011 at 7:35pm
 
Hi Rachel,

If your sequences are aligned, I have a really fast command-line utility that can compute pairwise TN93 distances on 50,000 sequences in under 5 minutes and output the maximum as well as all distances above a fixed threshold. Let me know if you are interested.

Sergei
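
To make the idea concrete, here is a rough Python sketch of what such a scan computes: the Tamura-Nei (1993) distance for every pair, the maximum, and the list of pairs above a threshold. This is not the utility mentioned above and makes no claim about how it works internally. The function names are hypothetical, base frequencies are estimated from the whole alignment (an assumption; they could instead be estimated per pair), all four bases are assumed to occur, and sites with gaps or ambiguity codes are simply skipped.

```python
import math
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def base_frequencies(seqs):
    """Frequencies of A, C, G, T pooled over all sequences (assumes each base occurs)."""
    counts = {b: 0 for b in "ACGT"}
    for s in seqs:
        for ch in s.upper():
            if ch in counts:
                counts[ch] += 1
    total = sum(counts.values())
    return {b: c / total for b, c in counts.items()}


def tn93(a, b, freq):
    """TN93 distance between two aligned sequences; None if it is undefined."""
    n = ag = ct = tv = 0
    for x, y in zip(a.upper(), b.upper()):
        if x not in freq or y not in freq:
            continue  # skip gaps and ambiguity codes
        n += 1
        if x == y:
            continue
        pair = {x, y}
        if pair == PURINES:
            ag += 1  # A<->G transition
        elif pair == PYRIMIDINES:
            ct += 1  # C<->T transition
        else:
            tv += 1  # transversion
    if n == 0:
        return None
    p1, p2, q = ag / n, ct / n, tv / n
    gA, gC, gG, gT = (freq[base] for base in "ACGT")
    gR, gY = gA + gG, gC + gT
    w1 = 1 - gR * p1 / (2 * gA * gG) - q / (2 * gR)
    w2 = 1 - gY * p2 / (2 * gC * gT) - q / (2 * gY)
    w3 = 1 - q / (2 * gR * gY)
    if min(w1, w2, w3) <= 0:
        return None  # too divergent for the correction to be defined
    k1 = 2 * gA * gG / gR
    k2 = 2 * gC * gT / gY
    k3 = 2 * (gR * gY - gA * gG * gY / gR - gC * gT * gR / gY)
    return -k1 * math.log(w1) - k2 * math.log(w2) - k3 * math.log(w3)


def scan(seqs, threshold):
    """Return ((max distance, name1, name2), [(name1, name2, d) with d > threshold]).

    `seqs` is a dict mapping sequence name -> aligned nucleotide sequence.
    """
    freq = base_frequencies(seqs.values())
    largest, above = None, []
    for (n1, s1), (n2, s2) in combinations(seqs.items(), 2):
        d = tn93(s1, s2, freq)
        if d is None:
            continue
        if largest is None or d > largest[0]:
            largest = (d, n1, n2)
        if d > threshold:
            above.append((n1, n2, d))
    return largest, above
```

Distances here are in expected substitutions per site, not percent. A pure Python loop like this will be far slower than a compiled tool on tens of thousands of sequences; it is meant only to show what is being computed.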
 
rbjmax
Re: very simple statistic, but I'm not sure how/where!
Reply #4 - Nov 2nd, 2011 at 10:32am
 
That would be super, Sergei! Thank you!

Rachel
 
Sergei
Re: very simple statistic, but I'm not sure how/where!
Reply #5 - Nov 3rd, 2011 at 3:21pm
 
Hi Rachel,

Here you go. Do the usual Linux source install:

./configure
make
make install

You should now have the TN93dist binary, which takes a .fasta file and a number of arguments (hopefully the help line is somewhat meaningful) and runs the pairwise distance calculation. The maximum distance is printed to the console; only those distances exceeding the threshold you put on the command line are written to a file.

Sergei

[Attachment, 87 KB; available to registered forum members only]

 
Sergei
Re: very simple statistic, but I'm not sure how/where!
Reply #6 - Nov 8th, 2011 at 3:58pm
 
Hi Rachel,

A more documented version:

[Link; available to registered forum members only]

Sergei