HyPhy message board
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl
Theoretical questions >> Sequence Analysis >> site-specific models
http://www.hyphy.org/cgi-bin/hyphy_forums/YaBB.pl?num=1229990147

Message started by gustavo on Dec 22nd, 2008 at 3:55pm

Title: site-specific models
Post by gustavo on Dec 22nd, 2008 at 3:55pm
HI,
I am using different models with a site-specific approach.  For example the following script assigns JTT model for each site:


  lk=0;
   for(i=0;i<len;i=i+1){
       ExecuteCommands("#include \"Jones1.mdl\";");
       ExecuteCommands("DataSetFilter class"+i+" = CreateFilter (data,1,\""+i+"\",\"\");");
       ExecuteCommands("Tree tree"+i+" = treestring;");
   }

lfDefCommand = "";
lfDefCommand * "LikelihoodFunction lk = (class0,tree0,";
for(i=1;i<len-1;i=i+1){
        lfDefCommand * ("class"+i+",tree"+i+", ");
}
i=len-1;
lfDefCommand * ("class"+i+",tree"+i+"); ");
lfDefCommand * 0;
ExecuteCommands (lfDefCommand);

   for(i=1;i<len;i=i+1){
       ExecuteCommands("ReplicateConstraint (\"this1.?.a:=this2.?.a\",tree"+i+",tree0);");
   }

   Optimize(params,lk);

   fprintf(jones.out,"\n Jones Log(L) con constraints lk = ", params[1][0]);



The result using this script for a given data is = -21177.3. This value agrees with the conventional use of JTT:

DataSetFilter myset = CreateFilter (data,1);
#include "Jones1.mdl";
Model JTT = (jones, jonesF,1);
Tree trees = treestring;
LikelihoodFunction lk = (myset,trees);
Optimize (params, lk);
fprintf(outs,"Jones = ", params[1][0],"\n");


In a second script that I am using, I tried to use site-specific residue frequencies as follows:

lk=0;
   for(i=0;i<len;i=i+1){
       ExecuteCommands("#include \"Jones1.mdl\";");
       ExecuteCommands("DataSetFilter class"+i+" = CreateFilter (data,1,\""+i+"\",\"\");");
       obsFreqs={20,1};
       obsFreqs=0;
       ExecuteCommands("HarvestFrequencies (obsFreqs, class"+i+", 1, 1, 1);");
       ExecuteCommands("Model JTT = (jones, obsFreqs,1);");
       ExecuteCommands("Tree tree"+i+" = treestring;");
   }

lfDefCommand = "";
lfDefCommand * "LikelihoodFunction lk = (class0,tree0,";
for(i=1;i<len-1;i=i+1){
        lfDefCommand * ("class"+i+",tree"+i+", ");
}
i=len-1;
lfDefCommand * ("class"+i+",tree"+i+"); ");
lfDefCommand * 0;
fprintf (stdout, "\nExecuting: ", lfDefCommand, "\n");
ExecuteCommands (lfDefCommand);

   for(i=1;i<len;i=i+1){
       ExecuteCommands("ReplicateConstraint (\"this1.?.a:=this2.?.a\",tree"+i+",tree0);");
   }

   Optimize(params,lk);

   fprintf(jones.out,"\n jones Log(L) + F con constraints lk = ", params[1][0]);

Using the same data set and topology,  the log lik in this case is -28016.2.
At first, I thought that the inclusion of site-specific frequencies will enhance the likelihood, but the results indicate the contrary.  Do you know why considering a site-specific frequency performs worst than using model equilibrium frequencies?
Thank you very much for your help.
Best regards,
gustavo


Title: Re: site-specific models
Post by Sergei on Dec 22nd, 2008 at 5:03pm
Dear gustavo,

In your second script, the model was actually overwritten with every site iteration and you ended up using just the model defined for the last site in the loop. Try using

[code]
for(i=0;i<len;i=i+1){
      ExecuteCommands("#include \"Jones1.mdl\";");
      ExecuteCommands("DataSetFilter class"+i+" = CreateFilter (data,1,\""+i+"\",\"\");");
      ExecuteCommands("HarvestFrequencies (obsFreqs"+i+", class"+i+", 1, 1, 1);");
      ExecuteCommands("Model JTT_"+i+" = (jones, obsFreqs"+i+",1);");
      ExecuteCommands("Tree tree"+i+" = treestring;");
  }
[/code]

Cheers,
Sergei

HyPhy message board » Powered by YaBB 2.5.2!
YaBB Forum Software © 2000-2024. All Rights Reserved.