A two-stage detector was designed to find rho-independent transcription terminators in the Escherichia coli genome. The detector combines a stochastic context-free grammar (SCFG) component with a support vector machine (SVM) component. To find terminators, the SCFG searches the intergenic regions of the nucleotide sequence for local matches to a terminator grammar that was designed and trained using examples of known terminators. The grammar selects the sequences that are the best terminator candidates and assigns each one a prefix, stem-loop, suffix structure using the Cocke-Younger-Kasami (CYK) algorithm, modified to incorporate the energy effects of base pairing. The parameters of this inferred structure are passed to the SVM classifier, which distinguishes true terminators from non-terminators that nonetheless score highly under the terminator grammar. The SVM was trained with negative examples drawn from intergenic sequences that include both featureless regions and RNA gene regions (each assigned a prefix, stem-loop, suffix structure by the SCFG), so that it can distinguish terminators from either kind of sequence. The classifier was found to be 96.4% successful during testing.
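To make the SCFG stage more concrete, here is a minimal sketch of a CYK-style dynamic program that assigns a prefix, stem-loop, suffix structure to a DNA sequence. It is not the authors' grammar or parameters. It assumes a toy hairpin rule `H -> x H y | loop` (no bulges or interior loops), invented log-odds pair scores, a hypothetical `stack_bonus` term standing in for base-pairing energy effects, and a hypothetical reward for T residues just downstream of the stem. Like CYK for parsing, it keeps the maximum-scoring parse per span rather than summing over parses. All names (`parse_terminator`, `stack_bonus`, `SUFFIX_LEN`, etc.) are illustrative.

```python
import math

# Toy scores for illustration only; NOT the trained parameters from the paper.
# Log-odds-style pair scores for canonical pairs plus G-T wobble (DNA alphabet).
PAIR_SCORE = {("G", "C"): 3.0, ("C", "G"): 3.0,
              ("A", "T"): 2.0, ("T", "A"): 2.0,
              ("G", "T"): 1.0, ("T", "G"): 1.0}
GC_PAIRS = {("G", "C"), ("C", "G")}
MIN_LOOP, MAX_LOOP = 3, 8   # hypothetical hairpin-loop length limits
SUFFIX_LEN = 8              # hypothetical window scanned for a T-rich tail
NEG_INF = -math.inf


def stack_bonus(outer, inner):
    """Hypothetical stand-in for stacking energy: stacked G-C pairs earn more."""
    return 1.0 if outer in GC_PAIRS and inner in GC_PAIRS else 0.5


def loop_score(length):
    """Hypothetical hairpin-loop penalty; lengths outside the limits are disallowed."""
    if MIN_LOOP <= length <= MAX_LOOP:
        return -1.0 - 0.25 * (length - MIN_LOOP)
    return NEG_INF


def best_stem_loop(seq):
    """CYK-style max-score DP for the toy rule H -> x H y | loop.

    stem[i][j] is the best score of a hairpin whose outermost pair is (i, j);
    inner_is_pair[i][j] records whether that pair stacks on (i+1, j-1).
    """
    n = len(seq)
    stem = [[NEG_INF] * n for _ in range(n)]
    inner_is_pair = [[False] * n for _ in range(n)]
    # Fill by increasing span so stem[i+1][j-1] is final before stem[i][j].
    for span in range(MIN_LOOP + 2, n + 1):
        for i in range(n - span + 1):
            j = i + span - 1
            pair = (seq[i], seq[j])
            if pair not in PAIR_SCORE:
                continue
            close_loop = loop_score(j - i - 1)   # option 1: (i, j) closes the loop
            extend = NEG_INF                      # option 2: stack on (i+1, j-1)
            if stem[i + 1][j - 1] > NEG_INF:
                extend = stem[i + 1][j - 1] + stack_bonus(pair, (seq[i + 1], seq[j - 1]))
            if extend > close_loop:
                stem[i][j] = PAIR_SCORE[pair] + extend
                inner_is_pair[i][j] = True
            elif close_loop > NEG_INF:
                stem[i][j] = PAIR_SCORE[pair] + close_loop
    return stem, inner_is_pair


def parse_terminator(seq):
    """Split seq into prefix / stem-loop / suffix and return features, or None."""
    stem, inner_is_pair = best_stem_loop(seq)
    n = len(seq)
    best_score, best_i, best_j = NEG_INF, None, None
    for i in range(n):
        for j in range(i, n):
            if stem[i][j] == NEG_INF:
                continue
            # Hypothetical suffix term: reward T residues just downstream of the stem.
            total = stem[i][j] + 0.5 * seq[j + 1:j + 1 + SUFFIX_LEN].count("T")
            if total > best_score:
                best_score, best_i, best_j = total, i, j
    if best_i is None:
        return None
    # Trace back down the stem to count pairs and measure the loop.
    a, b, pairs, gc = best_i, best_j, 0, 0
    while True:
        pairs += 1
        gc += int((seq[a], seq[b]) in GC_PAIRS)
        if not inner_is_pair[a][b]:
            break
        a, b = a + 1, b - 1
    return {"prefix": seq[:best_i], "stem_loop": seq[best_i:best_j + 1],
            "suffix": seq[best_j + 1:], "score": best_score,
            "stem_pairs": pairs, "gc_pairs": gc, "loop_len": b - a - 1,
            "suffix_T": seq[best_j + 1:best_j + 1 + SUFFIX_LEN].count("T")}


if __name__ == "__main__":
    # Synthetic terminator-like sequence: GC-rich stem, GAAA loop, T-rich tail.
    print(parse_terminator("CCTC" + "GCCGCC" + "GAAA" + "GGCGGC" + "TTTTTTTT"))
```

On the synthetic example, the sketch assigns the prefix `CCTC`, a stem of six G-C pairs closing a four-base `GAAA` loop, and the suffix `TTTTTTTT`. Two tables are kept so that each cell knows whether its pair stacks on an inner pair, which is what lets a stacking term enter an otherwise ordinary CYK recursion. The numeric fields (`stem_pairs`, `gc_pairs`, `loop_len`, `suffix_T`, `score`) illustrate the kind of structural parameters that could be assembled into a feature vector for an SVM such as scikit-learn's `sklearn.svm.SVC`; the abstract does not say which features the authors actually pass to their classifier.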
Francis-Lyon, Patricia; Cristianini, Nello; and Holbrook, Stephen, "Terminator Detection by Support Vector Machine Utilizing a Stochastic Context-Free Grammar" (2007). Nursing and Health Professions Faculty Research and Publications. 163.