Search code documentation
The search submodule documentation.
search_hmm
Tool to search an HMM against a database of sequences
May be run as a module, e.g. python -m tral.search.search_hmm -h
- 
class tral.search.search_hmm.TralHit(id, prob, logodds, states)
- Encapsulates key information about a TRAL search result.
- This is similar to some objects from the tral.repeat package, but collects data together and provides serialization methods.
- 
classmethod parse_line(line)
- Parse a line from the TSV serialization into a TralHit.
- Inverse of to_line().
- Args:
- line (str): tab-separated line representing one hit
- Returns: A new TralHit instance
 - 
to_line()
- Converts this hit to a tab-delimited line.
- Does not include the trailing newline.
- The hit may be recreated with parse_line(line).
- Return: (str)
 - 
to_treks(sequence, hmm=None)
- Converts this hit to T-REKS format - Parameters
- sequence (str or Bio.SeqRecord or Bio.Seq) – sequence for this hit 
- hmm (hmm.HMM) – (optional) the HMM used to generate this hit. If specified, results in a more accurate set of states 
 
 - Return: (str) - See: tral.sequence.repeat_detection_io.treks_get_repeats 
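The to_line() / parse_line() pair above describes a tab-delimited round trip. The following is a minimal sketch of that idea only. `HitSketch` is a hypothetical stand-in, not the library's TralHit, and the column order and the space-joined encoding of the states are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HitSketch:
    """Hypothetical stand-in for TralHit; not the library class."""
    id: str
    prob: float
    logodds: float
    states: List[str]

    def to_line(self) -> str:
        # Assumed column order: id, prob, logodds, space-joined states.
        # repr() is used so that the floats survive the round trip exactly.
        fields = [self.id, repr(self.prob), repr(self.logodds), " ".join(self.states)]
        return "\t".join(fields)  # no trailing newline

    @classmethod
    def parse_line(cls, line: str) -> "HitSketch":
        # Inverse of to_line(); tolerates a trailing newline from file iteration.
        id_, prob, logodds, states = line.rstrip("\n").split("\t")
        return cls(id_, float(prob), float(logodds), states.split())


hit = HitSketch("seq1", 0.97, 12.5, ["N", "M1", "M2", "C"])
line = hit.to_line()
restored = HitSketch.parse_line(line)
```

With the real class, the same round trip would presumably be `TralHit.parse_line(hit.to_line())`, as the two docstrings state.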
 
- 
classmethod 
filter_hmm
Filter search_hmm results according to various statistics
May be run as a module, e.g. python -m tral.search.filter_hmm -h
Filtering is performed by filter_search_results. Other methods provide I/O and supporting roles.
- 
tral.search.filter_hmm.count_repeats(states, hmm_length=0)
- Count the number of repeats matched by the hmm.
- Partial repeats at the beginning and end of the matched region are counted fractionally.
- For example, consider a 3-column HMM with the following states:
- N N N M2 M3 M1 I1 M2 M1 M2 C C C
- This would have 2/3 + 1 + 2/3 ≈ 2.3 repeats.
- Parameters
- states – list of states traversed by the hmm. Only match states (starting with ‘M’) are considered, and are assumed to be numbered sequentially from 1 
- hmm_length – number of HMM states. If not given, guessed based on the observed states (may be inaccurate) 
 
- Returns
- (float) Number of repeats 
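One reading of the counting rule that reproduces the 2.3 figure from the example above (as 2/3 + 1 + 2/3) is sketched here. It is an interpretation of the docstring, not the library's implementation. The function name `count_repeats_sketch` is hypothetical, and the rule that a new pass through the HMM starts whenever the match column number fails to increase is an assumption.

```python
def count_repeats_sketch(states, hmm_length=0):
    """Fractional repeat count; an interpretation, not tral's own code."""
    # Only match states ("M<column>") are considered.
    cols = [int(s[1:]) for s in states if s.startswith("M")]
    if not cols:
        return 0.0
    # If the HMM length is unknown, guess it from the largest observed column.
    length = hmm_length or max(cols)

    # Split the match columns into passes through the HMM: a new pass
    # begins whenever the column number does not increase (assumption).
    passes = [[cols[0]]]
    for col in cols[1:]:
        if col <= passes[-1][-1]:
            passes.append([col])
        else:
            passes[-1].append(col)

    if len(passes) == 1:
        # A single, possibly partial, repeat.
        return (passes[0][-1] - passes[0][0] + 1) / length

    first = (length - passes[0][0] + 1) / length  # partial repeat at the start
    last = passes[-1][-1] / length                # partial repeat at the end
    return first + (len(passes) - 2) + last       # interior passes count as 1


example = "N N N M2 M3 M1 I1 M2 M1 M2 C C C".split()
n_repeats = count_repeats_sketch(example, hmm_length=3)
```

Under this reading, interior passes count as one full repeat even when a column is skipped, which is what the middle term of the documented sum suggests.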
 
- 
tral.search.filter_hmm.filter_fasta(databasefile, outfile, hits, usedescription=True)
- Filter a fasta file to IDs contained in the hits - Parameters
- databasefile – Name of input fasta. May be gzipped 
- outfile – Name of output fasta. Uncompressed. 
- hits – Iterable of TralHits 
- usedescription – Include full description (header line) from the input fasta database. If false, only the name (first word after the ‘>’) will be used. 
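The behaviour described by these parameters can be approximated with the standard library alone. The sketch below is not the library's code: `filter_fasta_sketch` and `open_maybe_gzip` are hypothetical names, the function takes plain ID strings rather than TralHit objects, and detecting gzip input by its magic bytes is an assumption about how "may be gzipped" could be handled.

```python
import gzip


def open_maybe_gzip(path):
    """Open a text file, transparently handling gzip (detected by magic bytes)."""
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rt")
    return open(path, "r")


def filter_fasta_sketch(databasefile, outfile, ids, usedescription=True):
    """Copy only the records whose name is in ids; the output is uncompressed."""
    wanted = set(ids)
    keep = False
    with open_maybe_gzip(databasefile) as src, open(outfile, "w") as dst:
        for line in src:
            if line.startswith(">"):
                header = line[1:].rstrip("\n")
                # The name is the first word after the '>'.
                name = header.split(None, 1)[0] if header.strip() else ""
                keep = name in wanted
                if keep:
                    dst.write(">" + (header if usedescription else name) + "\n")
            elif keep:
                # Sequence lines belong to the most recent header.
                dst.write(line)
```

Streaming the database line by line keeps memory use independent of the database size, which matters for large sequence collections.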
 
 
- 
tral.search.filter_hmm.filter_search_results(results, repeats=2, log_odds=8.0)
- Get the set of hits passing the specified filter - Parameters
- results – TSV file with TRAL hits 
- repeats – minimum number of repeats 
- log_odds – minimum log odds score 
 
- Returns
- Generator over all hits that pass the filter 
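The two thresholds can be pictured as a simple generator filter. This is a sketch under stated assumptions, not the library's implementation. The `Hit` tuple and its `n_repeats` field are hypothetical, the sketch takes in-memory hits rather than a TSV file, and treating both minimums as inclusive (`>=`) is an assumption.

```python
from collections import namedtuple

# Hypothetical, simplified hit record with a precomputed repeat count.
Hit = namedtuple("Hit", "id logodds n_repeats")


def filter_hits_sketch(hits, repeats=2, log_odds=8.0):
    """Yield only hits meeting both minimums (defaults mirror the signature above)."""
    for hit in hits:
        if hit.n_repeats >= repeats and hit.logodds >= log_odds:
            yield hit


hits = [
    Hit("a", 12.0, 3.3),  # passes both thresholds
    Hit("b", 5.0, 4.0),   # log odds too low
    Hit("c", 9.5, 1.7),   # too few repeats
]
passing = [h.id for h in filter_hits_sketch(hits)]
```

Because the result is a generator, hits are evaluated lazily and can be passed straight to a consumer such as a FASTA filter without building an intermediate list.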
 
- 
tral.search.filter_hmm.match_seqs(hits, fastafile)
- Pair a series of hits with sequences.
- Requires loading fastafile into memory.
- Return: generator of (TralHit, SeqRecord) tuples 
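The memory requirement follows from the pairing strategy: the sequences are indexed by ID once, then each hit is looked up. A minimal sketch of that pattern follows. `Hit` and `Record` are hypothetical stand-ins for TralHit and SeqRecord, the sketch takes already-parsed records instead of a FASTA file name, and silently skipping hits with no matching sequence is an assumption.

```python
from collections import namedtuple

# Hypothetical stand-ins for TralHit and Bio.SeqRecord.
Hit = namedtuple("Hit", "id logodds")
Record = namedtuple("Record", "id seq")


def match_seqs_sketch(hits, records):
    """Yield (hit, record) pairs, matching on the shared ID."""
    # Index every record by ID; this is the step that holds the
    # whole sequence database in memory.
    by_id = {rec.id: rec for rec in records}
    for hit in hits:
        if hit.id in by_id:  # assumption: unmatched hits are skipped
            yield hit, by_id[hit.id]


records = [Record("s1", "ACDE"), Record("s2", "KLMN")]
hits = [Hit("s2", 10.1), Hit("s9", 8.4), Hit("s1", 15.0)]
pairs = list(match_seqs_sketch(hits, records))
```

The pairs come out in hit order, not database order, because the dictionary lookup is driven by the hits.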
- 
tral.search.filter_hmm.parse_hits(results)
- Generator for hits from a search_hmm output TSV file 

