Simon_UCT
Hi Sergei,
I'm working on the project that Ken mentioned. The figures we got for the dN/dS rate seemed peculiar: for example, of about 2700 alignments examined, about 2200 had the identical value 0.529849.
I have discovered what I think is the cause of this result; perhaps you could confirm it?
Data source: The species in the analysis are all strains of M. tuberculosis, with genes restricted to those for which a known orthologue exists in all 5 TB strains.
Problem: After re-examining this data (by running CleanStopCodons.bf on each of the files containing the orthologue sequences), it became clear that most of these files actually contained only 1 or 2 unique sequences. If I understand correctly, deriving dN/dS from such sequences would fail to provide any statistically useful data, and should produce none at all when only 1 unique sequence is submitted. Yet even in such cases a figure was returned from somewhere.
I actually realized this by manually running individual orthologue clusters on DataMonkey, which immediately picked up that there were too few unique sequences to proceed.
What I don't get, however, is what HyPhy was doing. What are the numbers it generated, and why did it not throw an error like DataMonkey did?
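For reference, the kind of pre-check described above (counting unique sequences per orthologue file before running the analysis) could be sketched roughly as follows. This is a minimal Python sketch, not code from HyPhy or DataMonkey: the function names, the FASTA input format, and the threshold of 3 unique sequences are all assumptions made for illustration.

```python
def read_fasta(text):
    """Parse FASTA-formatted text into a list of (name, sequence) pairs."""
    records = []
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                records.append((name, "".join(chunks)))
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        records.append((name, "".join(chunks)))
    return records


def count_unique_sequences(records):
    """Count distinct sequences, ignoring letter case."""
    return len({seq.upper() for _, seq in records})


def has_enough_unique(records, minimum=3):
    """Hypothetical filter: the threshold of 3 is an assumption, not a HyPhy rule."""
    return count_unique_sequences(records) >= minimum


# Made-up toy alignment: four strains, but only two distinct sequences.
example = """>strainA
ATGAAACCC
>strainB
ATGAAACCC
>strainC
atgaaaccc
>strainD
ATGAAGCCC
"""

records = read_fasta(example)
print(count_unique_sequences(records))  # 2
print(has_enough_unique(records))       # False
```

Files that fail such a check could be dropped from files.list before the batch run, so that no dN/dS figure is reported for them at all.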
I attach below Ken's batch script which ran the analysis:
/* This wrapper assumes you have a file listing all the aligned files you want to process, but without a .fasta or .nex extension, i.e. file1 file2 ... fileN
where the actual data files are
file1.nex file2.nex ... fileN.nex
and trees are in files
file1.tree file2.tree ... fileN.tree
these can easily be generated in *nix
$ find `pwd` -name "*.fasta" | sed 's/\.fasta$//' > files.list
have a look at the files.list that I've sent, though the paths will not be correct
*/
ExecuteAFile (HYPHY_BASE_DIRECTORY + "TemplateBatchFiles" + DIRECTORY_SEPARATOR + "Utility" + DIRECTORY_SEPARATOR + "ReadDelimitedFiles.bf");
ExecuteAFile (HYPHY_BASE_DIRECTORY + "TemplateBatchFiles" + DIRECTORY_SEPARATOR + "Utility" + DIRECTORY_SEPARATOR + "GrabBag.bf");
/* you can replace the following two lines with a path to file. i.e.: fscanf ( "pathtofile", "Lines", _inDirectoryPaths );*/
SetDialogPrompt ( "Provide a list of files to process:" );
fscanf ( PROMPT_FOR_FILE, "Lines", _inDirectoryPaths );
fprintf (stdout, "[READ ", Columns (_inDirectoryPaths), " file path lines]\n");
/* these options would be passed to the gui/menu, some of which are options implemented in the QuickSelectionDetection.bf which I hacked for this analysis */
_options = {};
_options [ "00" ] = "Universal";
_options [ "01" ] = "New Analysis";
/* option2 is the aligned codon file, see for loop below */
_options [ "03" ] = "Default";
/* option4 is the tree file, see for loop below */
/* option5 is the save file for nucleotide model fit, see for loop below */
_options [ "06" ] = "Estimate";
for ( _fileLine = 0; _fileLine < Columns ( _inDirectoryPaths ); _fileLine = _fileLine + 1 ) {
    pathParts = splitFilePath(_inDirectoryPaths[_fileLine]);
    dir_prefix = pathParts["DIRECTORY"];
    file_name = pathParts["FILENAME"];
    fprintf ( stdout, "Processing ", file_name, "\n" );
    _options [ "02" ] = _inDirectoryPaths[_fileLine] + ".aln.nuc"; /* note I am adding the file extension here - change to what is appropriate for you */
    _options [ "04" ] = _inDirectoryPaths[_fileLine] + ".tree";
    _options [ "05" ] = _inDirectoryPaths[_fileLine] + ".nuc.fit";
    ExecuteAFile ( "globaldNdS.bf", _options );
    outfile = "";
    outfile * ( _inDirectoryPaths [_fileLine ] + ".globaldNdS" );
    outfile * 0;
    fprintf ( outfile, CLEAR_FILE, dNdS, "\n" );
    Delete (lf); /* removes the existing likelihood function */
}
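To show how the repeated value was spotted across the per-file outputs, here is a rough Python sketch that tallies dN/dS values and reports the most frequent one. It is an illustration only: the helper names are hypothetical, the sample values are made up (apart from 0.529849, which is the figure mentioned above), and it assumes each .globaldNdS file holds a single number on one line.

```python
from collections import Counter


def read_value(path):
    """Read one number from an output file (assumes a single value per file)."""
    with open(path) as handle:
        return float(handle.read().strip())


def tally_values(values, digits=6):
    """Count how often each value occurs after rounding to `digits` places."""
    return Counter(round(v, digits) for v in values)


def most_common_share(values, digits=6):
    """Return (value, count, fraction) for the most frequent value."""
    counts = tally_values(values, digits)
    value, n = counts.most_common(1)[0]
    return value, n, n / len(values)


# Made-up sample standing in for values collected with read_value().
sample = [0.529849, 0.529849, 0.529849, 0.12, 1.4]
value, n, share = most_common_share(sample)
print(value, n, share)
```

A single value accounting for most of the files, as with the roughly 2200 of 2700 alignments here, is a hint that the figure is not being driven by the data in each alignment.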
Thanks,
Simon