Harvest Frequencies / Create Filter error
flashtop
YaBB Newbies
*
Offline


Feed your monkey!

Posts: 18
Harvest Frequencies / Create Filter error
Apr 15th, 2009 at 3:03am
 
hi guys

Using the GTK HyPhy MP build, the batch file test.bf below suggests that either the HarvestFrequencies or the CreateFilter command does not cut out the stop codons: both codon frequency vectors are 64x1 and identical.

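### test.bf ####
/* Read the alignment from disk. */
DataSet PF01226 = ReadDataFile("/home/flash/Jedi/pipeline/drumline/coli-dN-high.fasta");
/* Codon filter (unit size 3) with the three stop codons listed as exclusions. */
DataSetFilter entireF_nostops = CreateFilter(PF01226,3,"","","TAA,TAG,TGA");
/* Same codon filter without any exclusions. */
DataSetFilter entireF_withstops = CreateFilter(PF01226,3);
/* Collect codon frequencies through each filter. */
HarvestFrequencies(obsFreqs,entireF_nostops,3,3,1);
HarvestFrequencies(estFreqs,entireF_withstops,3,3,1);
/* Print both vectors for comparison. */
fprintf(stdout, obsFreqs, estFreqs);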

The message log says that:

"More than 50% of characters in the data are not in the alphabet."

Perhaps the filter does not count the alphabet exclusions as codons but rather as nucleotides??

The commands work fine on nucleotide data, as run with test2.bf:

### test2.bf ###
/* Same test on single nucleotides (unit size 1), excluding T in the first filter. */
DataSet PF01226 = ReadDataFile("/home/flash/Jedi/pipeline/drumline/coli-dN-high.fasta");
DataSetFilter entireF_nostops = CreateFilter(PF01226,1,"","","T");
DataSetFilter entireF_withstops = CreateFilter(PF01226,1);
/* Collect nucleotide frequencies through each filter and print them. */
HarvestFrequencies(obsFreqs,entireF_nostops,1,1,1);
HarvestFrequencies(estFreqs,entireF_withstops,1,1,1);
fprintf(stdout, obsFreqs, estFreqs);

I've put the FASTA file I'm using through StripStopCodons.bf under Standard Analyses, which reported that no stop codons were present. I can't attach the file, but I can mail it to you if you need it.
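
For an independent sanity check outside HyPhy, here is a minimal Python sketch of an in-frame stop codon scan. It is not what StripStopCodons.bf does internally. The function name `find_inframe_stops` and the toy sequences are made up for illustration, and the sketch assumes the universal genetic code and a reading frame that starts at the first column.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # universal genetic code (assumption)

def find_inframe_stops(seq):
    """Return zero-based codon positions of in-frame stop codons in seq."""
    seq = seq.upper().replace("U", "T")
    hits = []
    # Walk the sequence in steps of 3, ignoring a trailing partial codon.
    for i in range(0, len(seq) - len(seq) % 3, 3):
        if seq[i:i + 3] in STOP_CODONS:
            hits.append(i // 3)
    return hits

# Toy sequences (hypothetical, not taken from the alignment discussed here).
toy = {"seq1": "ATGAAATGA", "seq2": "ATG---CCC"}
report = {name: find_inframe_stops(s) for name, s in toy.items()}
print(report)
```

Only triplets that start on a codon boundary are reported, so the out-of-frame TGA near the start of seq1 is ignored and only its final codon is flagged.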

Would you guys mind checking it out if you have time?

Thanx
Gordon
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Harvest Frequencies / Create Filter error
Reply #1 - Apr 16th, 2009 at 10:20am
 
Hi Gordon,

Are there a lot of gaps or 'N' characters in your FASTA file? Sometimes this confuses HyPhy's format autodetect mechanism. Can you e-mail me the alignment?

Cheers,
Sergei

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
 
Sergei
Re: Harvest Frequencies / Create Filter error
Reply #2 - Apr 20th, 2009 at 10:57am
 
Dear Gordon,

Right; I forgot that HarvestFrequencies actually operates on the dataset itself (it only uses the filter to select the sites and sequences that are in that filter), hence you will see a 64x1 vector in both cases. Because entireF_nostops does NOT include TAA, TAG or TGA in its valid character states, the entries for those codons in the frequency vector will be 0.
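
As a rough illustration of this point, here is a toy Python model (not HyPhy's implementation). It drops alignment columns that contain an excluded codon and then tallies what is left over the full 64-codon alphabet, so the vector keeps 64 entries and the excluded codons end up at 0. The names `harvest` and `toy_alignment`, the A, C, G, T codon ordering, and the rule that a column with an excluded codon is dropped are all assumptions made for the example.

```python
from itertools import product

CODONS = ["".join(c) for c in product("ACGT", repeat=3)]  # assumed ordering
INDEX = {codon: i for i, codon in enumerate(CODONS)}

def harvest(alignment, exclusions=()):
    """Toy codon frequencies over all 64 codons.

    alignment: list of equal-length sequences, each a list of codons.
    exclusions: codons whose presence removes the whole column (assumed rule).
    """
    n_sites = len(alignment[0])
    kept = [s for s in range(n_sites)
            if not any(seq[s] in exclusions for seq in alignment)]
    counts = [0.0] * 64
    for seq in alignment:
        for s in kept:
            counts[INDEX[seq[s]]] += 1
    total = sum(counts)
    return [c / total for c in counts] if total else counts

# Hypothetical two-sequence alignment with one stop codon in the last column.
toy_alignment = [["ATG", "AAA", "TAA"],
                 ["ATG", "AAG", "CCC"]]
with_stops = harvest(toy_alignment)
no_stops = harvest(toy_alignment, exclusions=("TAA", "TAG", "TGA"))
print(len(no_stops), no_stops[INDEX["TAA"]], with_stops[INDEX["TAA"]])
```

Both results have 64 entries; only the TAA slot differs in kind, being zero once the stop codons are excluded.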

HTH,
Sergei
 
flashtop
Re: Harvest Frequencies / Create Filter error
Reply #3 - May 13th, 2009 at 4:27am
 
hi Sergei

Yeah, it should, but I get two identical 64x1 freq vectors??

Here are my 2 vectors obsFreqs and estFreqs:

Code:
{
{   0.0180327803299}
{   0.0161491971654}
{   0.0143287799705}
{   0.0161426852098}
{   0.0138856413902}
{   0.0167759728943}
{   0.0163474862142}
{   0.0140292300117}
{   0.0130202024876}
{   0.0161042646716}
{   0.0129345702712}
{    0.014326500786}
{   0.0135359493731}
{   0.0166851311133}
{   0.0176264342989}
{   0.0181184125464}
{   0.0159763047436}
{   0.0144043186557}
{    0.018909289557}
{   0.0155106999164}
{   0.0140757904944}
{   0.0141148622282}
{    0.017333070698}
{   0.0141718418399}
{    0.013516739104}
{    0.017267625544}
{   0.0141936568912}
{   0.0162410157397}
{   0.0136733516368}
{   0.0148214094135}
{   0.0226149179054}
{   0.0150597469894}
{   0.0192283753827}
{   0.0160287259863}
{   0.0162481788909}
{   0.0186048556316}
{   0.0158125290596}
{   0.0182587451901}
{   0.0206229106797}
{   0.0151577519215}
{    0.014420272947}
{   0.0185648071045}
{   0.0149747659685}
{   0.0161107766272}
{   0.0146774951942}
{   0.0158493216089}
{   0.0174092605788}
{   0.0155947041439}
{   0.0126610681349}
{    0.014584048631}
{   0.0126610681349}
{   0.0157373159721}
{   0.0138986653014}
{   0.0142545436763}
{   0.0145410697238}
{   0.0139445745886}
{   0.0126610681349}
{   0.0139631336621}
{   0.0161534299366}
{   0.0136665140834}
{    0.015736013581}
{   0.0151636126816}
{   0.0155920993617}
{   0.0172904173887}
}
{
{   0.0180327803299}
{   0.0161491971654}
{   0.0143287799705}
{   0.0161426852098}
{   0.0138856413902}
{   0.0167759728943}
{   0.0163474862142}
{   0.0140292300117}
{   0.0130202024876}
{   0.0161042646716}
{   0.0129345702712}
{    0.014326500786}
{   0.0135359493731}
{   0.0166851311133}
{   0.0176264342989}
{   0.0181184125464}
{   0.0159763047436}
{   0.0144043186557}
{    0.018909289557}
{   0.0155106999164}
{   0.0140757904944}
{   0.0141148622282}
{    0.017333070698}
{   0.0141718418399}
{    0.013516739104}
{    0.017267625544}
{   0.0141936568912}
{   0.0162410157397}
{   0.0136733516368}
{   0.0148214094135}
{   0.0226149179054}
{   0.0150597469894}
{   0.0192283753827}
{   0.0160287259863}
{   0.0162481788909}
{   0.0186048556316}
{   0.0158125290596}
{   0.0182587451901}
{   0.0206229106797}
{   0.0151577519215}
{    0.014420272947}
{   0.0185648071045}
{   0.0149747659685}
{   0.0161107766272}
{   0.0146774951942}
{   0.0158493216089}
{   0.0174092605788}
{   0.0155947041439}
{   0.0126610681349}
{    0.014584048631}
{   0.0126610681349}
{   0.0157373159721}
{   0.0138986653014}
{   0.0142545436763}
{   0.0145410697238}
{   0.0139445745886}
{   0.0126610681349}
{   0.0139631336621}
{   0.0161534299366}
{   0.0136665140834}
{    0.015736013581}
{   0.0151636126816}
{   0.0155920993617}
{   0.0172904173887}
} 



sorry for getting back to this so late

cheers
g
 
Sergei
Re: Harvest Frequencies / Create Filter error
Reply #4 - May 13th, 2009 at 9:47am
 
Hi Gordon,

That your frequency vectors are the same makes sense. Filtering stop codons does nothing (because there aren't any), so HarvestFrequencies starts with the same dataset in both cases. The reason you see non-zero entries in the TAA, TAG and TGA slots is that the default behavior in HyPhy is to allocate frequencies of ambiguous states (including gaps) to all possible resolutions. Hence a '---' will contribute 1/64 to each vector entry.
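
To locate those slots in a 64x1 vector, here is a small Python sketch (not HyPhy code). It assumes the entries follow alphabetical A, C, G, T codon order (AAA, AAC, ..., TTT), which is an assumption about the vector layout; `codon_index` is a made-up helper name.

```python
from itertools import product

NUCS = "ACGT"  # assumed nucleotide ordering
CODONS = ["".join(c) for c in product(NUCS, repeat=3)]  # AAA, AAC, ..., TTT

def codon_index(codon):
    """Zero-based position of a codon in the assumed 64-entry ordering."""
    return (16 * NUCS.index(codon[0])
            + 4 * NUCS.index(codon[1])
            + NUCS.index(codon[2]))

STOP_CODONS = ["TAA", "TAG", "TGA"]
stop_slots = [codon_index(c) for c in STOP_CODONS]
print(stop_slots)  # [48, 50, 56]
```

Under that assumed ordering the stop codons sit at zero-based positions 48, 50 and 56. In both vectors posted above, those three positions hold the same value, 0.0126610681349, which is also the smallest value in the vector. That is consistent with the stop codon slots receiving only the evenly spread share from gaps and other ambiguous codons.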

To override the behavior, set
Code:
COUNT_GAPS_IN_FREQUENCIES = 0;
 


before calling HarvestFrequencies. Note that other ambiguities (e.g. S,M and ?) will not be affected by this setting, only '-' will be.
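
The effect of the setting can be mimicked with a toy Python model (again, not HyPhy's implementation). Each codon is expanded into its possible resolutions and its weight is split evenly among them; a `count_gaps` flag, standing in for COUNT_GAPS_IN_FREQUENCIES, decides whether codons containing '-' are counted at all. The function name, the reduced ambiguity table and the rule that any codon containing '-' is skipped when the flag is off are assumptions made for the example.

```python
from itertools import product

NUCS = "ACGT"  # assumed ordering of the 64-entry vector
CODONS = ["".join(c) for c in product(NUCS, repeat=3)]
INDEX = {codon: i for i, codon in enumerate(CODONS)}

# Reduced ambiguity table: IUPAC codes S and M, plus '?' and '-' as "any base".
RESOLVE = {"A": "A", "C": "C", "G": "G", "T": "T",
           "S": "CG", "M": "AC", "?": "ACGT", "-": "ACGT"}

def toy_frequencies(codons, count_gaps=True):
    """Split each codon's weight evenly over its possible resolutions."""
    counts = [0.0] * 64
    for codon in codons:
        if "-" in codon and not count_gaps:
            continue  # assumed behaviour when gaps are not counted
        options = ["".join(p) for p in product(*(RESOLVE[ch] for ch in codon))]
        for resolved in options:
            counts[INDEX[resolved]] += 1.0 / len(options)
    total = sum(counts)
    return [c / total for c in counts] if total else counts

# Hypothetical data: two plain codons, one all-gap codon, one with an 'S'.
data = ["ATG", "---", "AAS", "ATG"]
with_gaps = toy_frequencies(data, count_gaps=True)
without_gaps = toy_frequencies(data, count_gaps=False)
print(with_gaps[INDEX["TAA"]], without_gaps[INDEX["TAA"]])
```

In this toy run the '---' codon puts a small amount of weight on TAA when gaps are counted and none when they are not, while the 'S' in AAS is still split between AAC and AAG either way.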

Sergei

 
flashtop
Re: Harvest Frequencies / Create Filter error
Reply #5 - Jun 11th, 2009 at 9:13am
 
awesome that did the trick thanx a lot

I've been offline for quite a while, so I've had enough time to think of something else to ask you!! :)

You wouldn't have a GYcustom.mdl lying around (like the MGcustom.mdl that comes with the batch language) that I could use in my analysis? I've tried hacking one together, but I can't seem to get the tree inference parts right. Any guidelines?

I can post my code if it will help at all.

I also see that you've posted an upgrade to the GenomeFitters download, which I'll try to get onto my machine tomorrow.

Keep well
Gordon

 
Sergei
Re: Harvest Frequencies / Create Filter error
Reply #6 - Jun 11th, 2009 at 10:41am
 
Hi Gordon,

No GYCustom.mdl; do send me your code and I'll see if I can spot what's causing the issues.

Cheers,
Sergei