GADatedClock
aculasso
YaBB Newbies
*
Offline


monkey business

Posts: 13
Buenos Aires - Argentina
Gender: male
GADatedClock
Mar 22nd, 2011 at 12:27pm
 
Hello!

I'm trying to estimate the substitution rate of a region of the Hepatitis C virus.
I have several (7) samples from one patient, spanning several years.
So far I have two estimates: one from BEAST using cloned sequences from each sample, and a "quick and dirty" one from a linear regression of branch length over time on the best tree of the direct sequences (7 taxa). I say "the best" because it came from an exhaustive search under the best-fit model.
Some time ago I used DatedTips to estimate this rate, and the result was an order of magnitude lower than the other estimates.
Now I'm trying to run the GADatedClock analysis.
I supply the sequences and the tree, choose local or global estimation, and always get this error:

Expression syntax/semantics error.
Current BL Command:ExportAMatrix(modelFile,StringToMatrix(currentPopulation[populationSize-1]),1,1) Current Task has been terminated. Would you like to see the remaining error...

and the program stops.

Here is the output of the console:

What units are the dates measured in (e.g. months. This is only used for reporting the results.)?days
Read the following dates:
Nov1998_16301:      16301      days
Jul1999_16524:      16524      days
Mar2000_16772:      16772      days
Feb2002_17490:      17490      days
Ago2006_19107:      19107      days
Oct2001_17356:      17356      days

______________FREE RATES______________
Log Likelihood = -1146.88982261698;
Shared Parameters:
R=0.494877=0.494877

Tree givenTree=(Nov1998_16301:0.00676707,Jul1999_16524:0,(Mar2000_16772:0.0107896,(Feb2002_17490:0.00564523,(Ago2006_19107:0.0309347,Oct2001_17356:0.00512011)50:0.00606189)28:0.0071273)98:0.0200414);

BEST AIC=2313.78
Fitting the single rate model...
Pass 1(0) log-L = -1155.32
Pass 2(1) log-L = -1155.32
Pass 3(2) log-L = -1155.32

______________SINGLE RATE______________
Log Likelihood = -1155.32412978027;
Shared Parameters:
clockTree_scaler_0=0.373943
R=0.494877=0.494877

Tree clockTree=(Nov1998_16301:0.00617536,Jul1999_16524:0.0102448,(Mar2000_16772:0.00491676,(Feb2002_17490:0.0101288,(Ago2006_19107:0.0357021,Oct2001_17356:0.00374861)50:0.00393489)28:0.0078905)98:0.00985373);

AIC=2320.65


Starting GA with 2 rate classes
Baseline Model Fit
c-AIC = 2320.65
Starting the GA now...
Generation 2 with 2 rate classes.
Best c-AIC found far is 2310.53
This is a 10.1221 improvement over the single-rate model
GA has considered 29 unique models so far
Total run time so far is 0 hrs 0 mins 4 seconds
Average cluster time per generation is  4.00 seconds
Average CPU time per model is  0.14 seconds
Convergence criterion: 0/50

Other useful data:
Model HKY
OS: Windows SP3
HyPhy Version: 1.0020080508

Any suggestions to solve this issue?

Thanks in advance!

Andrés
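
For readers unfamiliar with the "quick and dirty" approach mentioned above, here is a minimal sketch of a least-squares regression of root-to-tip distance on sampling date. The function name `fit_rate` and all numbers are made up for illustration and are not taken from this data set. Because tips on a tree share branches, the points are not independent, so the slope is only a rough rate estimate.

```python
def fit_rate(dates, distances):
    """Ordinary least-squares fit of distance = intercept + rate * date.

    dates:     sampling times (any unit, e.g. days)
    distances: root-to-tip distances in substitutions per site
    Returns (rate, intercept); rate is in substitutions/site per date unit.
    """
    n = len(dates)
    if n < 2 or n != len(distances):
        raise ValueError("need two or more points and equal-length inputs")
    mean_t = sum(dates) / n
    mean_d = sum(distances) / n
    sxx = sum((t - mean_t) ** 2 for t in dates)
    if sxx == 0:
        raise ValueError("all sampling dates are identical")
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(dates, distances))
    rate = sxy / sxx
    intercept = mean_d - rate * mean_t
    return rate, intercept


# Synthetic example: distances generated from a known rate of 1e-5 per day.
dates = [0.0, 365.0, 730.0, 1095.0]
distances = [0.010 + 1e-5 * t for t in dates]
rate, intercept = fit_rate(dates, distances)
print(rate, intercept)
```

With real data the distances would be read off the rooted tree, and the choice of root changes the result.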

 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: GADatedClock
Reply #1 - Mar 23rd, 2011 at 7:35pm
 
Hi Andres,

Please download the 2.00 version of HyPhy for Windows.

The version that you are using is about 3 years old and it's quite likely that the error you see has been fixed in the interim. Please do let me know, however, if the error persists in the new version as well, and we'll try to diagnose the problem.

Sergei

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
 
aculasso
Reply #2 - Mar 24th, 2011 at 2:37pm
 
Thank you very much.
I tried version 2.0 of HyPhy for Windows and got a similar error. However, with the version compiled from SVN on Linux I was able to complete the analysis (only after changing my regional settings so that the decimal separator is "." instead of ",").
Is there any recommended reading on the theoretical framework behind this analysis?
I'm familiar with phylogenetic analysis using distance, parsimony, maximum likelihood and Bayesian methods, and I also know something about genetic algorithms.

Thank you!

Andrés
 
Sergei
Reply #3 - Mar 24th, 2011 at 2:45pm
 
Hi Andres,

This analysis is something that I never quite finished. I'll see if I can dig up some documentation for you.

Sergei
 
aculasso
Reply #4 - Mar 26th, 2011 at 5:10pm
 
Thanks!

BTW, is there a way to load decimal dates (e.g. 42.32 years) in any of the tip-date analyses? (The tree-loading script does not like "." in leaf names.)
 
aculasso
Reply #5 - Mar 27th, 2011 at 1:13pm
 
Just an update:
I've checked the DatedTips scripts. They support decimal values (the regular expression that searches for the date has an optional dot).
However, in the HyPhy source code, the if blocks at lines 1443 and 1445 of calcnode.cpp both lack an '||c=="."' condition, which would allow a node name to contain a dot.
Unfortunately, with these lines modified the tree can be read, but the dotted node names clash with "variable names"...
I've tried to dig deeper into the source code, but I'm a "BASIC (Sinclair Z80)" programmer and don't have enough knowledge to figure out what is happening.
Meanwhile I've converted the dates into days, which lets me use integers, but as a result the substitution rates are in substitutions per site per day. I can convert them to s/s/y, but since they are really small numbers (10E-8 or less) I'm wondering whether some "number conversion" (single/double) is biasing my results.

Thank you again for your time...

Andrés
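
On the single/double question above: floating-point precision is relative, so a small magnitude such as 1e-8 does not by itself lose accuracy. The sketch below assumes IEEE-754 floats and a 365.25-day year; the rate value is hypothetical, and nothing here inspects how HyPhy stores its numbers internally.

```python
import struct

DAYS_PER_YEAR = 365.25  # assumption: average Julian year


def per_day_to_per_year(rate_per_day):
    """Convert substitutions/site/day to substitutions/site/year."""
    return rate_per_day * DAYS_PER_YEAR


def as_single(x):
    """Round-trip a Python float (a double) through single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]


rate_per_day = 2.5e-8  # hypothetical rate, not a result from this thread
rate_per_year = per_day_to_per_year(rate_per_day)

# Relative error introduced if the value were squeezed into a 32-bit float.
rel_err_single = abs(as_single(rate_per_day) - rate_per_day) / rate_per_day

print(rate_per_year)
print(rel_err_single)
```

Even in single precision the relative error stays below about 1e-7, which is far smaller than the statistical uncertainty of a rate estimated from a handful of samples.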
 
Sergei
Reply #6 - Mar 30th, 2011 at 1:12pm
 
Hi Andres,

I am impressed with your patience digging through the HyPhy source code! In the HyPhy language '.' works like member access in C, i.e. "tree.node.variable". Adding a '.' inside a node name would break a lot of downstream dependencies. The easiest thing to do is simply to convert all the dates to integers (up to a given precision, e.g. by multiplying the dates by 100).

Sorry there isn't an easier fix.

Sergei
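
The "multiply by 100" suggestion above could be applied to the tip labels before loading the tree, roughly as follows. This is a sketch that assumes labels of the form name_date, as in the console output earlier in the thread; the function names and sample labels are hypothetical.

```python
import re
from decimal import Decimal

# A date (integer or decimal) at the end of the label, after an underscore.
DATE_AT_END = re.compile(r"_(\d+(?:\.\d+)?)$")


def integerize_label(label, scale=100):
    """Rewrite 'name_42.32' as 'name_4232' so the label contains no dot."""
    match = DATE_AT_END.search(label)
    if match is None:
        raise ValueError(f"no trailing date in label: {label!r}")
    scaled = Decimal(match.group(1)) * scale
    if scaled != scaled.to_integral_value():
        raise ValueError(f"scale {scale} is too small to keep {label!r} exact")
    return f"{label[:match.start()]}_{int(scaled)}"


def rate_per_original_unit(rate_per_scaled_unit, scale=100):
    """Undo the scaling: a rate per 1/scale year becomes a rate per year."""
    return rate_per_scaled_unit * scale


print(integerize_label("sampleA_42.32"))  # hypothetical label
print(integerize_label("sampleB_43"))
```

Using `Decimal` rather than `float` avoids rounding surprises when checking that the scaled date is a whole number.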
 
aculasso
Reply #7 - Mar 30th, 2011 at 1:54pm
 
I think it should be possible to alter tipdates.ibf so that it translates another symbol (a "D" or "#" or whatever) into a "." when it reads the dates, which would allow decimal dates.
I'll keep you informed if I'm successful.

Thanks!

Andrés
 
Sergei
Reply #8 - Mar 30th, 2011 at 1:56pm
 
Hi Andres,

That will work too; I suggest using the underscore symbol, as it will be accepted by the tree parser.

Sergei
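
To illustrate the underscore idea, here is a small sketch of encoding a decimal date into a label and decoding it again. It is not the code in tipdates.ibf; the names `encode_label` and `decode_date` and the label layout name_integer_fraction are assumptions made for this example.

```python
import re

# 'name_42_32' carries the decimal date 42.32; 'name_16301' carries 16301.
ENCODED_DATE = re.compile(r"_(\d+)_(\d+)$")
PLAIN_DATE = re.compile(r"_(\d+)$")


def encode_label(name, date_text):
    """Append a date to a name, writing the decimal point as '_'."""
    return f"{name}_{date_text.replace('.', '_')}"


def decode_date(label):
    """Recover the date from a label produced by encode_label.

    Caveat: a name that itself ends in '_digits' followed by an integer
    date would be misread as a decimal date, so names should avoid that.
    """
    match = ENCODED_DATE.search(label)
    if match:
        return float(f"{match.group(1)}.{match.group(2)}")
    match = PLAIN_DATE.search(label)
    if match:
        return float(match.group(1))
    raise ValueError(f"no date found in label: {label!r}")


label = encode_label("Nov1998", "42.32")
print(label, decode_date(label))
```

The same translation would have to be applied wherever the analysis script extracts dates from leaf names.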