HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1300822057

Message started by aculasso on Mar 22nd, 2011 at 12:27pm

Title: GADatedClock
Post by aculasso on Mar 22nd, 2011 at 12:27pm
Hello!

I'm trying to estimate the substitution rate of a region of the Hepatitis C virus genome.
I have several (7) samples from one patient, spanning several years.
So far I have two estimates: one from BEAST, using cloned sequences from each sample, and a "quick and dirty" estimate from a linear regression of branch length against time on the best tree of the direct sequences (7 taxa). I say "the best" because it was the result of an exhaustive search under the best-fitting model.
Some time ago I used DatedTips to estimate this rate, and the result was an order of magnitude lower than the other estimates.
Now I'm trying to run the GADatedClock analysis.
I supply the sequences and the tree, choose local or global estimation, and always get this error:

Expression syntax/semantics error.
Current BL Command:ExportAMatrix(modelFile,StringToMatrix(currentPopulation[populationSize-1]),1,1) Current Task has been terminated. Would you like to see the remaining error...

and the program stops.

Here is the output of the console:

What units are the dates measured in (e.g. months. This is only used for reporting the results.)?days
Read the following dates:
Nov1998_16301:      16301      days
Jul1999_16524:      16524      days
Mar2000_16772:      16772      days
Feb2002_17490:      17490      days
Ago2006_19107:      19107      days
Oct2001_17356:      17356      days

______________FREE RATES______________
Log Likelihood = -1146.88982261698;
Shared Parameters:
R=0.494877=0.494877

Tree givenTree=(Nov1998_16301:0.00676707,Jul1999_16524:0,(Mar2000_16772:0.0107896,(Feb2002_17490:0.00564523,(Ago2006_19107:0.0309347,Oct2001_17356:0.00512011)50:0.00606189)28:0.0071273)98:0.0200414);

BEST AIC=2313.78
Fitting the single rate model...
Pass 1(0) log-L = -1155.32
Pass 2(1) log-L = -1155.32
Pass 3(2) log-L = -1155.32

______________SINGLE RATE______________
Log Likelihood = -1155.32412978027;
Shared Parameters:
clockTree_scaler_0=0.373943
R=0.494877=0.494877

Tree clockTree=(Nov1998_16301:0.00617536,Jul1999_16524:0.0102448,(Mar2000_16772:0.00491676,(Feb2002_17490:0.0101288,(Ago2006_19107:0.0357021,Oct2001_17356:0.00374861)50:0.00393489)28:0.0078905)98:0.00985373);

AIC=2320.65


Starting GA with 2 rate classes
Baseline Model Fit
c-AIC = 2320.65
Starting the GA now...
Generation 2 with 2 rate classes.
Best c-AIC found far is 2310.53
This is a 10.1221 improvement over the single-rate model
GA has considered 29 unique models so far
Total run time so far is 0 hrs 0 mins 4 seconds
Average cluster time per generation is  4.00 seconds
Average CPU time per model is  0.14 seconds
Convergence criterion: 0/50

Other useful data:
Model HKY
OS: Windows SP3
HyPhy Version: 1.0020080508

Any suggestions to solve this issue?

Thanks in advance!

Andrés
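
Note: the "quick and dirty" regression described in this post can be sketched as follows. This is only an illustration, not the poster's actual procedure. The branch lengths and dates are copied from the free-rates tree in the console output above; placing the root at the top-level trifurcation of that Newick string and using 365.25 days per year are assumptions made here.

```python
# Root-to-tip regression sketch. Branch lengths are copied from the
# free-rates tree in the console output; the root is ASSUMED to sit at
# the top-level trifurcation of that Newick string.

# Internal branches, named here after their support labels in the tree.
B98 = 0.0200414
B28 = 0.0071273
B50 = 0.00606189

# taxon -> (sampling date in days, root-to-tip distance)
tips = {
    "Nov1998_16301": (16301, 0.00676707),
    "Jul1999_16524": (16524, 0.0),
    "Mar2000_16772": (16772, B98 + 0.0107896),
    "Feb2002_17490": (17490, B98 + B28 + 0.00564523),
    "Ago2006_19107": (19107, B98 + B28 + B50 + 0.0309347),
    "Oct2001_17356": (17356, B98 + B28 + B50 + 0.00512011),
}


def least_squares(points):
    """Ordinary least-squares fit y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope_per_day, intercept = least_squares(list(tips.values()))
slope_per_year = slope_per_day * 365.25  # ASSUMES 365.25 days per year

print(f"rate ~ {slope_per_day:.3e} substitutions/site/day")
print(f"rate ~ {slope_per_year:.3e} substitutions/site/year")
```

The slope is the rate estimate in substitutions per site per day. A different rooting would change the root-to-tip distances and therefore the estimate.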


Title: Re: GADatedClock
Post by Sergei on Mar 23rd, 2011 at 7:35pm
Hi Andres,

Please download the 2.00 version of HyPhy for Windows from [link visible to registered members only].

The version that you are using is about 3 years old and it's quite likely that the error you see has been fixed in the interim. Please do let me know, however, if the error persists in the new version as well, and we'll try to diagnose the problem.

Sergei

Title: Re: GADatedClock
Post by aculasso on Mar 24th, 2011 at 2:37pm
Thank you very much.
I tried version 2.0 of HyPhy for Windows and got a similar error. However, with the version compiled from SVN on Linux I was able to complete the analysis (only after tuning my regional settings, switching "," to ".").
Is there any suggested reading that would help me understand the theoretical framework behind this analysis?
I'm familiar with phylogenetic analysis using distance, parsimony, maximum likelihood, and Bayesian methods, and I also know something about genetic algorithms.

Thank you!

Andrés

Title: Re: GADatedClock
Post by Sergei on Mar 24th, 2011 at 2:45pm
Hi Andres,

This analysis is something that I never quite finished. I'll see if I can dig up some documentation for you.

Sergei

Title: Re: GADatedClock
Post by aculasso on Mar 26th, 2011 at 5:10pm
Thanks!

BTW, is there some way to load decimal dates (e.g. 42.32 years) in any of the tip-date analyses? (The tree-loading script does not like "." in leaf names.)

Title: Re: GADatedClock
Post by aculasso on Mar 27th, 2011 at 1:13pm
Just an update:
I've checked the DatedTips scripts. They support decimal values (the regular expression that searches for the date has an optional dot).
However, in the HyPhy source code, the if blocks at lines 1443 and 1445 of calcnode.cpp both lack an '||c=="."', which would allow the name of a node to contain a dot.
Unfortunately, modifying these lines lets the tree be read, but the dotted node names then clash with "variable names"...
I've tried to dig deeper into the source code, but I'm a "BASIC (Sinclair Z80)" programmer and don't know enough to figure out what is happening.
Meanwhile, I've converted the dates into days, which lets me use integers, but as a result the substitution rates are in substitutions per site per day. I can convert them to s/s/y, but since they are really small numbers (10E-8 or less) I'm wondering whether some number conversion (single/double precision) is biasing my results.

Thank you again for your time...

Andrés
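
Note on the precision question above: floating-point precision is relative, so a value near 1e-8 keeps the same number of significant digits as a value near 1. The sketch below compares a per-day to per-year conversion in double and in single precision against exact rational arithmetic. The rate value is hypothetical, 365.25 days per year is an assumption, and whether HyPhy stores rates as doubles is not confirmed anywhere in this thread.

```python
import struct
from fractions import Fraction

DAYS_PER_YEAR = 365.25  # ASSUMPTION: Julian year


def to_single(x):
    """Round a Python float (a double) to the nearest IEEE 754 single."""
    return struct.unpack("f", struct.pack("f", x))[0]


rate_per_day = 1e-8  # HYPOTHETICAL rate in substitutions/site/day

# Exact rational product of the stored double values, used as the reference.
exact = Fraction(rate_per_day) * Fraction(DAYS_PER_YEAR)

# Conversion carried out in double precision.
double_result = rate_per_day * DAYS_PER_YEAR

# Same conversion with the input and the result rounded to single precision.
single_result = to_single(to_single(rate_per_day) * DAYS_PER_YEAR)

rel_err_double = float(abs(Fraction(double_result) - exact) / exact)
rel_err_single = float(abs(Fraction(single_result) - exact) / exact)

print(f"double precision: relative error {rel_err_double:.1e}")
print(f"single precision: relative error {rel_err_single:.1e}")
```

Under these assumptions, the unit conversion itself contributes a relative error far smaller than any plausible statistical uncertainty in the rate estimate, even in single precision.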

Title: Re: GADatedClock
Post by Sergei on Mar 30th, 2011 at 1:12pm
Hi Andres,

I am impressed by your patience in digging through the HyPhy source code! In the HyPhy language, '.' works much like member access in C, i.e. "tree.node.variable". Allowing a '.' inside a node name would break a lot of downstream dependencies. The easiest thing to do is simply to convert all the dates to integers (up to a given precision, e.g. by multiplying the dates by 100).

Sorry there isn't an easier fix.

Sergei
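
Note: the integer-scaling suggestion in this reply might look like the sketch below. The label "sampleA" and both function names are made up for illustration; the only convention taken from the thread is a tip name that ends in an underscore followed by an integer date. Once dates are scaled by 100, the time unit becomes 1/100 of the original unit, so an estimated rate has to be multiplied by 100 to express it per original unit.

```python
def encode_tip_name(label, date, decimals=2):
    """Append the date to the label as an integer scaled by 10**decimals."""
    scaled = round(date * 10 ** decimals)
    return f"{label}_{scaled}"


def rate_per_original_unit(rate_per_scaled_unit, decimals=2):
    """Convert a rate per scaled time unit into a rate per original unit."""
    return rate_per_scaled_unit * 10 ** decimals


# "sampleA" is a HYPOTHETICAL label; 42.32 years is the example from the thread.
name = encode_tip_name("sampleA", 42.32)
print(name)

# A HYPOTHETICAL rate estimated per 1/100 year, converted to a rate per year.
print(rate_per_original_unit(2e-5))
```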

Title: Re: GADatedClock
Post by aculasso on Mar 30th, 2011 at 1:54pm
I think it should be possible to alter tipdates.ibf so that, when it reads the data, it translates some other symbol (a "D" or "#" or whatever) into a "." in the date, which would allow decimal dates.
I'll keep you informed if I'm successful.

Thanks!

Andrés

Title: Re: GADatedClock
Post by Sergei on Mar 30th, 2011 at 1:56pm
Hi Andres,

That will work too; I suggest using the underscore symbol, as it will be accepted by the tree parser.

Sergei
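
Note: the underscore idea from the last two posts can be illustrated outside HyPhy as below. This is a Python sketch of the translation logic only, not a patch to tipdates.ibf. The naming convention (label, whole part, fractional part, each separated by "_") and the function names are assumptions made for this example.

```python
def encode_decimal_date(label, date_text):
    """Build a tip name in which '_' stands in for the decimal point.

    A fractional field is always written, so decoding stays unambiguous.
    """
    whole, _, frac = date_text.partition(".")
    return f"{label}_{whole}_{frac or '0'}"


def decode_decimal_date(tip_name):
    """Split '<label>_<whole>_<frac>' and rebuild the decimal date."""
    label, whole, frac = tip_name.rsplit("_", 2)
    return label, float(f"{whole}.{frac}")


# "sampleA" is a HYPOTHETICAL label; 42.32 is the example from the thread.
tip = encode_decimal_date("sampleA", "42.32")
print(tip)
print(decode_decimal_date(tip))
```

Names that carry a single integer date field, such as Nov1998_16301, do not fit this convention and would need to be renamed before decoding.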
