Odd bootstrap output
Jeff
YaBB Newbies
HyPhy Junkie

Posts: 9
Odd bootstrap output
Jan 11th, 2007 at 8:28am
 
Hi Sergei,

Sorry to bug you again.  I am delving into variance estimates using post_npbs.bf and have run into some odd output.

It seems like the output is fine for the first half, where most of the ML estimates are very similar to the Mean BS estimates.  Then, halfway through, it spits out a funny Mean that looks more like a log-likelihood value than a parameter Mean.  After that, most of the Means are very different from the ML estimates.  Some sample output is below.

After Spec6.nonSynRate (where the problems seem to start), I think the Means may be right but are not being associated with the proper parameter names.  For example, there are two parameters with ML estimates of 0.23-0.25 and two different parameters with Means of 0.23-0.25.  There are also four parameters with ML estimates of 0 and four with Means of essentially 0.

This happens whether I am starting from AnalyzeCodonData.bf (MG94W9 w/local params) or AnalyzeNucProtData.bf (HKY85 w/local params), with multiple different data sets, with nonparametric and parametric analyses, and on the recent Windows and Unix builds.

Is it possibly a tree traversal issue, or maybe a loop indexing issue?  Any help would be greatly appreciated.  I can send you the appropriate files if you want them.


Thanks!
Jeff


______________READ THE FOLLOWING DATA______________
10 species:{Spec1,Spec2,Spec3,Spec4,Spec5,Spec6,Spec7,Spec8,Spec9,Spec10};
Total Sites:666;
Distinct Sites:210
______________RESULTS______________
Log Likelihood = -2518.56927129667;
Tree givenTree=(((Spec1:0.0804376,(Spec2:0.0706275,Spec3:0.0359743)Node5:0)Node3:7.05826e-31,Spec4:0.0303331)Node2:0.0142859,(Spec5:0.411042,Spec6:0.00961238)Node9:0.0177847,((Spec7:0.00570061,Spec8:0.0640759)Node13:0.0147856,(Spec9:0.0105448,Spec10:0.123455)Node16:0.00932126)Node12:7.05826e-31);

How many data replicates should be generated?:10

Iteration 1/10
Iteration 2/10
Iteration 3/10
Iteration 4/10
Iteration 5/10
Iteration 6/10
Iteration 7/10
Iteration 8/10
Iteration 9/10
Iteration 10/10

           BOOTSTRAPPING SUMMARY

+--------------------------------------------------------------------------+
| Parameter                  |    MLE      |     Mean     |    Variance    |
+--------------------------------------------------------------------------+
| givenTree.Spec8.nonSynRate |    0.030080 |     0.028633 |      0.0000643 |
| givenTree.Spec7.synRate    |    0.032956 |     0.033718 |      0.0006663 |
| givenTree.Spec8.synRate    |    0.277769 |     0.299612 |      0.0093143 |
| givenTree.Node13.nonSynRate|    0.014308 |     0.014675 |      0.0000638 |
| givenTree.Spec7.nonSynRate |    0.000000 |     0.000000 |      0.0000000 |
| givenTree.Spec6.synRate    |    0.006907 |     0.021542 |      0.0004916 |
| givenTree.Node9.nonSynRate |    0.015058 |     0.014620 |      0.0000307 |
| givenTree.Node9.synRate    |    0.056430 |     0.030523 |      0.0015137 |
| givenTree.Node13.synRate   |    0.041403 |     0.040092 |      0.0003577 |
| givenTree.Node16.synRate   |    0.043033 |     0.042696 |      0.0006757 |
| givenTree.Node16.nonSynRate|    0.003524 |     0.002781 |      0.0000066 |
| givenTree.Node12.nonSynRate|    0.000000 |     0.000000 |      0.0000000 |
| givenTree.Node12.synRate   |    0.000000 |     0.000000 |      0.0000000 |
| givenTree.Spec10.synRate   |    0.565417 |     0.548985 |      0.0027109 |
| givenTree.Spec9.nonSynRate |    0.013558 |     0.009385 |      0.0000165 |
| givenTree.Spec9.synRate    |    0.019195 |     0.019439 |      0.0002286 |
| givenTree.Spec10.nonSynRate|    0.048140 |     0.055572 |      0.0002249 |
| givenTree.Spec6.nonSynRate |    0.015797 | -2474.726445 |  10351.7719102 |
| givenTree.Node3.synRate    |    0.000000 |     0.093623 |      0.0014363 |
| givenTree.Spec1.nonSynRate |    0.071306 |     0.029711 |      0.0001926 |
| givenTree.Spec3.synRate    |    0.106497 |     0.000482 |      0.0000010 |
| givenTree.Node3.nonSynRate |    0.000000 |     0.000000 |      0.0000000 |
| givenTree.Node5.synRate    |    0.000000 |     0.232996 |      0.0049520 |
| givenTree.Spec4.synRate    |    0.073258 |     0.069984 |      0.0000960 |
| givenTree.Spec4.nonSynRate |    0.033145 |     0.251677 |      0.0038649 |
| givenTree.Spec1.synRate    |    0.245362 |     0.063980 |      0.0001488 |
| givenTree.Spec3.nonSynRate |    0.032941 |     0.000000 |     -0.0000000 |
| givenTree.Node2.synRate    |    0.026516 |     0.032334 |      0.0003458 |
| givenTree.Spec5.nonSynRate |    0.173676 |     0.166593 |      0.0008723 |
| givenTree.Spec5.synRate    |    1.841278 |     1.974571 |      0.1907162 |
| givenTree.Node2.nonSynRate |    0.018203 |     0.020235 |      0.0000209 |
| givenTree.Spec2.nonSynRate |    0.057648 |     0.000000 |      0.0000000 |
| givenTree.Spec2.synRate    |    0.230723 |     0.028381 |      0.0000396 |
| givenTree.Node5.nonSynRate |    0.000000 |     0.078498 |      0.0009763 |
| Ln-Lklhood                 |-2518.569271 |     0.000000 |      0.0000000 |
+--------------------------------------------------------------------------+
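
The guess above, that the Means after Spec6.nonSynRate are right but attached to the wrong names, can be probed by comparing the MLE and Mean columns as printed and again after sorting each column independently. The Python sketch below does this with the sixteen rows between Spec6.nonSynRate and Ln-Lklhood, copied from the table; the helper name total_gap is made up for illustration and nothing here is part of HyPhy. Sorting can never increase the total gap, so a large drop is suggestive of a permutation, not proof of one.

```python
# (name without the "givenTree." prefix, MLE, bootstrap Mean), copied from
# the rows after Spec6.nonSynRate in the summary table above.
ROWS = [
    ("Node3.synRate",    0.000000, 0.093623),
    ("Spec1.nonSynRate", 0.071306, 0.029711),
    ("Spec3.synRate",    0.106497, 0.000482),
    ("Node3.nonSynRate", 0.000000, 0.000000),
    ("Node5.synRate",    0.000000, 0.232996),
    ("Spec4.synRate",    0.073258, 0.069984),
    ("Spec4.nonSynRate", 0.033145, 0.251677),
    ("Spec1.synRate",    0.245362, 0.063980),
    ("Spec3.nonSynRate", 0.032941, 0.000000),
    ("Node2.synRate",    0.026516, 0.032334),
    ("Spec5.nonSynRate", 0.173676, 0.166593),
    ("Spec5.synRate",    1.841278, 1.974571),
    ("Node2.nonSynRate", 0.018203, 0.020235),
    ("Spec2.nonSynRate", 0.057648, 0.000000),
    ("Spec2.synRate",    0.230723, 0.028381),
    ("Node5.nonSynRate", 0.000000, 0.078498),
]


def total_gap(mles, means):
    """Sum of absolute differences between position-paired values."""
    return sum(abs(a - b) for a, b in zip(mles, means))


mles = [row[1] for row in ROWS]
means = [row[2] for row in ROWS]

# Pairing as printed (by parameter name) versus pairing by rank.
gap_as_listed = total_gap(mles, means)
gap_sorted = total_gap(sorted(mles), sorted(means))

print(f"gap with names as printed:      {gap_as_listed:.3f}")
print(f"gap after sorting both columns: {gap_sorted:.3f}")
```

For these rows the gap shrinks several-fold after sorting, which fits the idea that the values themselves are sound but are listed against the wrong names.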
 
Sergei
YaBB Administrator
Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Odd bootstrap output
Reply #1 - Jan 11th, 2007 at 9:15am
 
Dear Jeff,

I took a quick look at the simpleBootstrap.bf file (where the bootstrapping code is) and couldn't find any apparent indexing bugs.

Could you e-mail me the file and model settings I need to recreate the problem?

Cheers,
Sergei

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
 
Sergei
Re: Odd bootstrap output
Reply #2 - Jan 12th, 2007 at 9:04am
 
Dear Jeff,

I did manage to introduce an indexing bug affecting the screen output when I added the code that writes branch lengths to a CSV file for coding data. The offending lines are lines 339 and 340 of simpleBootstrap.bf, which should read:

Code:
dataMatrix[0][dataDimension]=dataMatrix[0][dataDimension]+temp;
dataMatrix[1][dataDimension]=dataMatrix[1][dataDimension]+temp*temp;
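
For context, the two corrected lines appear to keep a running sum (row 0) and a running sum of squares (row 1) in the column indexed by dataDimension, from which a Mean and a Variance can be computed once all replicates are in. Below is a minimal Python sketch of that bookkeeping. The function name summarize and the n-1 (sample) denominator are assumptions for illustration, not taken from simpleBootstrap.bf.

```python
def summarize(replicates):
    """Per-column mean and variance from running sums.

    replicates: list of equal-length lists, one list of parameter
    estimates per bootstrap replicate (at least two replicates).
    """
    n = len(replicates)
    width = len(replicates[0])
    sums = [0.0] * width      # plays the role of dataMatrix[0][...]
    sq_sums = [0.0] * width   # plays the role of dataMatrix[1][...]

    for estimates in replicates:
        for col, value in enumerate(estimates):
            sums[col] += value
            sq_sums[col] += value * value

    means = [s / n for s in sums]
    # Assumption: sample variance with an n-1 denominator.
    variances = [(q - n * m * m) / (n - 1) for q, m in zip(sq_sums, means)]
    return means, variances


means, variances = summarize([[1.0, 10.0], [3.0, 10.0]])
print(means, variances)
```

If a value is added at the wrong column index, the sums are still valid numbers but end up under another parameter's name. Because the variance is a difference of two accumulated sums, rounding can also leave a tiny negative result for a parameter that barely changes across replicates, which may explain a Variance printed as -0.0000000.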
 



Thanks for pointing this out yet again!
Cheers,
Sergei
 
Sergei
Re: Odd bootstrap output
Reply #3 - Jan 12th, 2007 at 9:15am
 
Dear Jeff,

There is another indexing problem in the file ... stay tuned for an update.

Cheers,
Sergei
 
Sergei
Re: Odd bootstrap output
Reply #4 - Jan 15th, 2007 at 10:58pm
 
Dear Jeff,

Download the updated file (the link is visible to registered members only) and place it in the TemplateBatchFiles directory.

I'll roll this into the next build as well.

Thanks for the report!
Sergei
 
Jeff
Re: Odd bootstrap output
Reply #5 - Jan 16th, 2007 at 6:28am
 
Hi Sergei,

Thanks for the quick update.  So far the means and variances look good for my smaller data sets, so it seems like the updated file takes care of the problem.  I'll let you know if I notice any issues with my larger data sets.

Thanks again!
Jeff
 
Danny
YaBB Newbies
Posts: 31
Re: Odd bootstrap output
Reply #6 - Feb 23rd, 2009 at 1:01am
 
Dear Sergei,

I seem to be having similar strange results with nonparametric bootstrap estimates using simpleBootstrap.bf.  However, for the same dataset the parametric means and variances seem reasonable.  Below are some of the summary values that seem strange from the nonparametric bootstraps.

+------------------------------------------------------------------------------------+
| Parameter                    |           MLE |          Mean |            Variance |
+------------------------------------------------------------------------------------+
| AC                           |      0.680938 |     26.357789 |       18220.0152105 |
| AT                           |      0.106453 |    101.102172 |      999795.7313688 |
| givenTree.Node114.nonSynRate |      0.517753 |    332.868276 |     2954236.9980986 |
| givenTree.Node114.synRate    |      1.072250 |     47.574958 |      135608.0128181 |
| Ln-Lklhood                   | -21483.516754 | -34435.874822 | 29173030835.8369141 |
+------------------------------------------------------------------------------------+

I am using the simpleBootstrap.bf that comes with 0.99beta and is identical to the one in 1.00beta.  I have modified the script to write the summary to a file and I have hardwired some of the #includes and ExecuteAFile arguments, but I don't see anything I changed that could have affected the results.  Again, the parametric bootstraps seemed fine with the same script.  I was using the MPI version.  I am using the same tree and dataset that I sent you a few days ago.  Before I try to debug this myself, I was hoping the difference in behaviour between the parametric and nonparametric runs might give you a clue as to what could be happening.  Below are links to my slightly modified simpleBootstrap.bf and the nonparametric results.

[Three links; visible to registered members only]

p.s. I haven't been able to connect to [link visible to registered members only] for several days.
 
Sergei
Re: Odd bootstrap output
Reply #7 - Feb 23rd, 2009 at 7:31am
 
Hi Danny,

Let me take a look; the nonparametric bootstrap file that you are using has not been updated in a long while.

Cheers,
Sergei
 
Danny
Re: Odd bootstrap output
Reply #8 - Feb 25th, 2009 at 3:10pm
 
Dear Sergei,

I just noticed that Syn_sites + NS_sites != Total_sites in both the parametric and nonparametric csv files.  This is true for the MLE rows also.

-Danny
 
Sergei
Re: Odd bootstrap output
Reply #9 - Feb 25th, 2009 at 3:32pm
 
Dear Danny,

That's unfortunate wording; I should change the column labels. Total sites simply reports the number of codon columns in the alignment. Synonymous + non-synonymous sites will actually always be less than the number of columns in the data, because the way HyPhy counts the expected number of syn and ns sites (see the box on page 14 of the manuscript; the link is visible to registered members only) does not give every codon 3 sites. For example, codon TGG will only have 7/9 sites because two of its one-mutation neighbors (TAG and TGA) are stop codons and contribute neither to syn nor to non-syn counts.
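
The TGG example can be checked by listing the nine codons that are one nucleotide change away and seeing which of them are stops. The Python sketch below assumes the universal genetic code (stop codons TAA, TAG and TGA); the helper names are made up for illustration, and it does not reproduce HyPhy's actual site-counting procedure.

```python
# Assumption: universal genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def one_step_neighbors(codon):
    """All codons that differ from `codon` at exactly one position."""
    neighbors = []
    for pos in range(3):
        for nuc in "ACGT":
            if nuc != codon[pos]:
                neighbors.append(codon[:pos] + nuc + codon[pos + 1:])
    return neighbors


def stop_neighbors(codon):
    """The one-step neighbors of `codon` that are stop codons."""
    return [n for n in one_step_neighbors(codon) if n in STOP_CODONS]


neighbors = one_step_neighbors("TGG")
stops = stop_neighbors("TGG")
print(len(neighbors), "neighbors, of which stops:", stops)
print("non-stop neighbors:", len(neighbors) - len(stops))
```

Seven of the nine neighbors of TGG are sense codons, which is where the 7/9 in the example comes from.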

HTH,
Sergei
 
Danny
Re: Odd bootstrap output
Reply #10 - Mar 5th, 2009 at 5:11pm
 
Dear Sergei,

I just noticed a problem with the simpleBootstrap.bf when run with HYPHYMPI.  The problem occurs with both parametric and nonparametric iterations.

The parameter values for the BS iterations are badly scrambled with respect to the headers in the output.

I'm guessing that possibly the simulated MLEs  are getting a different index when

simulatedResults = simulatedLF_MLES;

is set in the MPI version.

In any case the GetString order of simulatedLF does not match the order in simulatedResults.  For example during the simulated variable mapping loop in ProcessIterate:

i  _i  var  res[0][_i]  simulatedResults[0][i]  simulatedResults[0][_i]
2  1   GT   0.295536    0.0143457               0.291832

It seems that with MPI the simulatedLF gets out of sync with simulatedResults.  Interestingly, simulatedResults[0][_i] gives the right values because it is ordered like the native lf.
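
A mismatch like this is the kind of problem that goes away when values are matched by parameter name instead of by position. A minimal Python sketch of that idea follows; the function name align_by_name and the example numbers are made up for illustration (this is not HyPhy code, and the values are not from this run).

```python
def align_by_name(reference_names, other_names, other_values):
    """Reorder other_values so they follow the order of reference_names.

    other_names[i] is the parameter name that belongs to other_values[i].
    Raises KeyError if a reference name is missing from other_names.
    """
    index_of = {name: i for i, name in enumerate(other_names)}
    return [other_values[index_of[name]] for name in reference_names]


# Hypothetical example: the same three parameters reported in two orders.
native_order = ["AC", "AT", "GT"]
simulated_order = ["AT", "GT", "AC"]
simulated_values = [0.2, 0.3, 0.1]   # illustrative values only

aligned = align_by_name(native_order, simulated_order, simulated_values)
print(list(zip(native_order, aligned)))
```

Looking values up by name costs one dictionary build per replicate and does not depend on either list being in any particular order.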

Maybe the simulatedLF is getting passed by value to the MPI functions?

Although GetString seems to be able to step through simulatedLF, if I call:

LIKELIHOOD_FUNCTION_OUTPUT=5; fprintf(simulatedLF, simulatedLF, "\n");

just before the GetString stepping, only a zero is printed to the file.

-Danny
 
Sergei
Re: Odd bootstrap output
Reply #11 - Mar 5th, 2009 at 5:14pm
 
Dear Danny,

That's bad! I'll clean that up. Thanks for pointing out the scrambling behavior!

Sergei