Dear Sergei,
Yes, it's me again.
I'm working on a parser for the GARD splits files and for the output of RecombinationProcessor.bf, to automagically cut genes showing breakpoint evidence into the corresponding fragments.
For many genes, the HTML output says "GARD found no evidence of recombination", while the splits files describe several partitions, usually two, with the first one reduced to a few sites (1 to 10 sites). The problem is that if you run RecombinationProcessor.bf on those splits files, some KH tests are significant.
So, should I first apply a partition length cut-off to the splits file? Obviously one solution would be not to run RecombinationProcessor.bf on those files, but I do not plan to read all the HTML outputs. Another solution would be to parse the HTML files, but this is not very elegant!
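To make the cut-off idea concrete, here is a minimal Python sketch of what I have in mind. It assumes the splits file alternates a "start-end" line with a Newick tree line, as in the examples in this post, and that the coordinates are 0-based and inclusive (so "0-3" is 4 sites). The function names and the `min_sites` threshold are hypothetical, not anything from GARD itself.

```python
import re

# A range line looks like "0-3" or "4-854" (assumed 0-based, inclusive).
RANGE_RE = re.compile(r"^(\d+)-(\d+)$")

def read_partitions(text):
    """Return a list of (start, end, tree) tuples from splits-file text."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    partitions = []
    i = 0
    while i < len(lines):
        match = RANGE_RE.match(lines[i])
        if match and i + 1 < len(lines):
            # The line following a range is taken to be its tree.
            partitions.append((int(match.group(1)), int(match.group(2)), lines[i + 1]))
            i += 2
        else:
            i += 1
    return partitions

def short_partitions(partitions, min_sites=10):
    """Return (start, end) for every partition shorter than min_sites."""
    return [(s, e) for s, e, _ in partitions if e - s + 1 < min_sites]
```

A gene would then be skipped (not passed to RecombinationProcessor.bf) whenever `short_partitions(...)` returns a non-empty list.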
Here are two examples:
Gene 1 splits file:
Code:
0-0
((((((pyoM1:0,pyoSSI1:0):0,pyo9429:0):0,pyo8232:0):0,pyo6180:0):0,pyo5005:0)[...]
1-1292
(((((((aga2603:9.73301,canis:1.37088):1.15602,pyo6180:0.00680905):0,pyo8232:0.000802671):0,pyo10394:0):0.000796244,pyo10270:0.00412654):0,pyo2096:0):0,pyo9429:0[...]
Output of RecombinationProcessor.bf on Gene 1:
Code:
[...]
KH Testing partition 1
Tree 2 base LRT = 1.77636e-15. p-value = 0.0001
[...]
KH Testing partition 2
Tree 1 base LRT = 2.00005e+06. p-value = 0.0001
Gene 2 splits file:
Code:
0-3
((((((pyoM49591:0,pyoSSI1:0):0,pyoM1:0):0,pyo9429:0):0,pyo8232:0):0,pyo6180:0):0,pyo5005:0[...]
4-854
(((pyo10750:0.0757482,pyoSSI1:0):0,pyo6180:0):0,pyo315:0,(((aga2603:49.1025,canis:36.4356):43.8836,(pyo10394:0.024977,pyo8232:0):0.0250474):0,(((pyo10270:0[...]
Output of RecombinationProcessor.bf on Gene 2:
Code:
[...]
KH Testing partition 1
Tree 2 base LRT = 3.85687. p-value = 0.0001
[...]
KH Testing partition 2
Tree 1 base LRT = 9.99944e+06. p-value = 0.0057
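For the RecombinationProcessor.bf side, this is roughly how I would pull out the KH results. It is only a sketch that assumes the two line shapes shown above ("KH Testing partition N" and "Tree N base LRT = X. p-value = Y"); the parts of the output I replaced with [...] are simply ignored, and the function name is hypothetical.

```python
import re

PARTITION_RE = re.compile(r"KH Testing partition (\d+)")
# The LRT value is followed by a full stop, e.g. "LRT = 3.85687. p-value = 0.0001".
RESULT_RE = re.compile(r"Tree (\d+) base LRT = (\S+)\. p-value = (\S+)")

def parse_kh_results(text):
    """Return one dict per 'Tree ... base LRT' line, tagged with its partition."""
    results = []
    current_partition = None
    for line in text.splitlines():
        header = PARTITION_RE.search(line)
        if header:
            current_partition = int(header.group(1))
            continue
        result = RESULT_RE.search(line)
        if result and current_partition is not None:
            results.append({
                "partition": current_partition,
                "tree": int(result.group(1)),
                "lrt": float(result.group(2)),
                "p_value": float(result.group(3)),
            })
    return results
```

Each entry can then be compared against whatever significance threshold is appropriate before deciding to cut the gene.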
Thanks,
-Tristan