Welcome, Guest. Please Login
YaBB - Yet another Bulletin Board
 
  HomeHelpSearchLogin  
 
Site rate estimation (new batch file) (Read 6828 times)
Francesc
YaBB Newbies
*
Offline


I love YaBB 1G - SP1!

Posts: 32
Site rate estimation (new batch file)
Sep 27th, 2010 at 2:54pm
 
Hi,

I started a very long topic long time ago (Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login) about modifying the siteRate.bf to estimate the rates for each site independently and using a chronogram. I think that bf file is included with Hyphy (siteRates2.bf).
There are some issues/complexities with that file that doesn't make it suitable for my purposes.

Tongue Sorry for bringing back this topic.

My main goal with this new topic is to simplify (easier to understand) to the bare-bones that bf file hardcoding the model (allowing anyone) and to use it without GUI.

Approach:
- Site rate estimation for each site individually.
- Using an alignment and a chronogram (ultrametric tree).Filenames are hardcoded in the bf file.
- Model is also hardcoded with the parameters fixed.
- My idea to estimate the rates is to optimize only the branch length for each site based on the chronogram topology and the model. If I am not wrong that should be the expected substitutions for that site, Then dividing the expected substitutions by the chronogram lengths, one should get the rate in subs/time units.

I am including the bf file I am preparing and really simple input files for testing.
Embarrassed Still doesn't work.

How can i estimate the branch length only?
Should I create a global parameter in matrix/model?
Does my approach make sense?


Thank you very much for your time and help,

Francesc
Back to top
« Last Edit: Sep 28th, 2010 at 1:41pm by Francesc »  
Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login (1 KB | )
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Site rate estimation (new batch file)
Reply #1 - Sep 29th, 2010 at 9:25am
 
Hi Francesc,

In your implementation, there was no parameter to optimize (all branch lengths were set to 1).
I added the code to take your chronogram as a guide tree, and ensure that at each site and each branch we have

branch length = site rate * chronogram branch length.

On your data the output looks like:

Code:
Chronogram length (time units): 9
 Site    1 Total subst =  1.3710 subst, Rate =  0.1523 subst/time, Log(L) -4.0538
 Site    2 Total subst =  1.3711 subst, Rate =  0.1523 subst/time, Log(L) -4.0538
 Site    3 Total subst =  1.5668 subst, Rate =  0.1741 subst/time, Log(L) -4.7104
 Site    4 Total subst =  1.3710 subst, Rate =  0.1523 subst/time, Log(L) -4.0538
 Site    5 Total subst =  0.0000 subst, Rate =  0.0000 subst/time, Log(L) -1.3863
THE END
 



Cheers,
Sergei
Back to top
 
Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login (3 KB | )

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Francesc
YaBB Newbies
*
Offline


I love YaBB 1G - SP1!

Posts: 32
Re: Site rate estimation (new batch file)
Reply #2 - Sep 29th, 2010 at 2:46pm
 
Hi Sergei,
Thanks for the quick reply. You really take care of people using your soft.

Yes, I notice that I wasn't optimizing anything and also I tried to do a similar approach than you but I didn't know how to fix some global parameters.
This operator [code]:=[/code] makes the global parameters to be fixed, right?

The approach works fine. Thanks!
I also compared it with rate4site that has JC model for DNA sequences and the results are identical.
DNArates still produces different results when I am trying to reproduce JC but it is another proof that DNArates is doing something that I cannot control.

I did a couple of changes in the script, Removing some comments and changing the model to make it easier to modify.
So, if I want to use K2P with Kappa=2, the model will look like:
[code]
global                  AC := 1;
global                  AT := 1;
global                      AG := 2;      /*I added AG now */
global                  CG := 1;
global                  CT := 2;
global                  GT := 1;

                          /* A         C         G          T */                          
GTRRateMatrix = {{*,       AC*t,     AG*t,     AT*t}
                                {AC*t,    *,        CG*t,     CT*t}
                                {AG*t,    CG*t,     *,        GT*t}
                                {AT*t,    CT*t,     GT*t,       * }};
[/code]

In the same way to have a HKY85 model, I would have to change the fixed frequencies for the estimated ones.
[code]
HarvestFrequencies (obsFreqs, myFilter, 1, 1, 1);
[/code]

So, with this script any GTR model can be applied right?

I think this file is easier to understand and modify (even to work with GUI) than  siteRates2.bf.

I will use this script for my pipeline with command-line hyphy.
THe command line version of hyphy is not available in the dmg file for MAC OS X right?
Do I have to compile it from source to get it? I am having some trouble to compile it in the Mac.
I usually use Ubuntu.

Thanks!!!
Francesc
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Site rate estimation (new batch file)
Reply #3 - Oct 13th, 2010 at 10:39pm
 
Hi Francesc,

[quote]
This operator [code]:=[/code] makes the global parameters to be fixed, right?
[/quote]

Yes, ':=' is the constraint assignment operator.

[quote]

I did a couple of changes in the script, Removing some comments and changing the model to make it easier to modify.
So, if I want to use K2P with Kappa=2, the model will look like:
[code]
global                  AC := 1;
global                  AT := 1;
global                      AG := 2;      /*I added AG now */
global                  CG := 1;
global                  CT := 2;
global                  GT := 1;

                          /* A         C         G          T */                          
GTRRateMatrix = {{*,       AC*t,     AG*t,     AT*t}
                                {AC*t,    *,        CG*t,     CT*t}
                                {AG*t,    CG*t,     *,        GT*t}
                                {AT*t,    CT*t,     GT*t,       * }};
[/code]

In the same way to have a HKY85 model, I would have to change the fixed frequencies for the estimated ones.
[code]
HarvestFrequencies (obsFreqs, myFilter, 1, 1, 1);
[/code]

So, with this script any GTR model can be applied right?

[/quote]

Both of your examples are correct, and, indeed the GTR model can be applied by setting global model parameters (AC, AG etc) accordingly.

[quote]
I will use this script for my pipeline with command-line hyphy.
THe command line version of hyphy is not available in the dmg file for MAC OS X right?
Do I have to compile it from source to get it? I am having some trouble to compile it in the Mac.
[/quote]

Do you have Developer Tools installed on the Mac (it ships with the OS X disks, but is typically NOT installed by default)?
Open a terminal and type in
[code]
g++ --version
[/code]

at the prompt. If your gcc version is 4.2 or greater, then the standard Linux build instructions should generate a command line binary.

Sergei

Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Francesc
YaBB Newbies
*
Offline


I love YaBB 1G - SP1!

Posts: 32
Re: Site rate estimation (new batch file)
Reply #4 - Oct 15th, 2010 at 9:02am
 
Thanks Sergei.
Yep, I have gcc version 4.3.1.
The error I am getting is related to the flag -fast. It says it doesn't recognize it.
In any case, the command line version seems to be compiled correctly.
It works.
I would like to add a detail in the script.
Now it is printing the site number but I would like to print also the nucleotide of the first sequence.
So:
site 1 A rate=....
site 2 C rate=...
How can I do that?
Thanks.
Francesc
Back to top
 
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Site rate estimation (new batch file)
Reply #5 - Oct 15th, 2010 at 10:49am
 
Hi Francesc,

Edit the script (build.sh) to say -03 (the letter O) in place of -fast to make sure the compilation includes optimizations.
Here's an example of grabbing letters from a simple alignment and using the consensus sequence as well (I am guessing this is what you would prefer in place of just the first letter).


Code:
test =
">1
ACGT
>2
CACT
>3
ACAT";

DataSet 		ds 		= ReadFromString (test);
DataSetFilter	filter  = CreateFilter	 (ds,1);

GetDataInfo     (firstSequence, filter, 0);
GetDataInfo		(consensusSequence, filter, "CONSENSUS");

fprintf (stdout, "\n");

for (siteID = 0; siteID < filter.sites; siteID += 1)
{
	fprintf (stdout, "Site ", siteID+1, " ", firstSequence[siteID], " ", consensusSequence[siteID], "\n");
}
 



Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Francesc
YaBB Newbies
*
Offline


I love YaBB 1G - SP1!

Posts: 32
Re: Site rate estimation (new batch file)
Reply #6 - Nov 1st, 2010 at 2:55pm
 
Thanks for all your help.
I really appreciate it. Smiley
I am using this script in a web application that implements  the Townsend (2007) phylogenetic informativeness analysis.
Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login

Briefly, the method uses the substitution rates for predicting the utility of loci to solve specific phylogenetic questions.
The web application facilitates uploading of alignments and ultrametric trees to calculate and depict profiles of
informativeness over specified time ranges and to provide rankings of locus prioritization for epochs of interest.

The web application is fully functional and can be found here:
Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login

In the instructions sections, there are files to test it.

In the web, we are using hyphy and this script as default option to calculate the substitution rates (fist step to get
the phylogenetic informativeness profiles). DNArates and Rate4site (for aa sequences) are also available.
Of course, if someone uses the web application, we ask to cite also the software used to calculate the rates.

The attached bf file needs to be modified with tree file, base frequencies and model parameters.

cheers,
Francesc
Back to top
 
Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login (3 KB | )
 
IP Logged
 
Francesc
YaBB Newbies
*
Offline


I love YaBB 1G - SP1!

Posts: 32
Re: Site rate estimation (new batch file)
Reply #7 - Nov 23rd, 2010 at 9:23am
 
Hi Sergei,
I am having some problems with the Site rate script we created.
So, I have this alignment (rpb2_align_cut.txt) and a tree (tree.txt).
I run them several times with the bf file (SiteRates_GTR_template.bf) using K2P model with kappa=2.59 (see bf file for details).
For all runs there are sites that have the same rate (1-622) and others that
the rate changes substantially (after 622 some change and other no). I don't
think is a question of the position in the alignment.
ex:

Site 20 we always get 8162.0301 subst/time (There are a lot of sites in that
scale)
Site 702 we get:
6256.3822      7763.2148      0.0448 in 3 different runs.

The problem is that when I run each of this sites individually the rate
doesn't change.
For site 20, the rate is 0.5043 subst/time (different from the first one)
For site 702, the rate is 0.0448 subst/time (same as one of the runs above).

I attached the files that I used for this testing.

Any idea what is happening?

Extra information:
I am using command-line hyphymp compiled in a Mac OS X 10.6. The compilation is not perfect but I think what it fails is the GUI.
I repeated the test with hyphy GUI precompiled version (dmg) and the results are the same.

Thanks,
Francesc
Back to top
 
Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login (15 KB | )
 
IP Logged
 
Sergei
YaBB Administrator
*****
Offline


Datamonkeys are forever...

Posts: 1658
UCSD
Gender: male
Re: Site rate estimation (new batch file)
Reply #8 - Nov 24th, 2010 at 4:59pm
 
Hi Francesc,

Looks like site 5 has legitimate 'infinite' rates, but what happened after that is that the 'siteRate' variable was stuck at the value estimated from the previous site, and when HyPhy tried to initialize the computation for site 6, the really bad starting value for siteRate (infinity from site 5) prevented the optimizer from converging properly.

Add the line

Code:
siteRate = 1;
 



before you call Optimize on siteLikelihood.

Cheers,
Sergei
Back to top
 

Associate Professor
Division of Infectious Diseases
Division of Biomedical Informatics
School of Medicine
University of California San Diego
WWW WWW  
IP Logged
 
Francesc
YaBB Newbies
*
Offline


I love YaBB 1G - SP1!

Posts: 32
Re: Site rate estimation (new batch file)
Reply #9 - Nov 26th, 2010 at 10:29am
 
Thanks Sergei.
It seems to work fine now.
I am uploading the new bf just in case.
Back to top
 
Multimedia File Viewing and Clickable Links are available for Registered Members only!!  You need to Login Login (3 KB | )
 
IP Logged