Search code documentation
Documentation for the search submodule.
search_hmm
Tool to search an HMM against a database of sequences
May be run as a module, e.g. python -m tral.search.search_hmm -h
-
class tral.search.search_hmm.TralHit(id, prob, logodds, states)
Encapsulates key information about a TRAL search result.
This is similar to some objects from the tral.repeat package, but collects data together and provides serialization methods.
-
classmethod parse_line(line)
Parse a line from the TSV serialization into a TralHit.
Inverse of to_line().
Args: line (str): tab-separated line representing one hit
Returns: A new TralHit instance
-
to_line()
Converts this hit to a tab-delimited line.
Does not include endline.
May be recreated by parse_line(line)
Return: (str)
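The round trip between to_line() and parse_line() can be illustrated with a minimal stand-in class. This is a sketch, not TRAL's implementation: the class name HitSketch, the column order (id, prob, logodds, states), and the space-joined encoding of the states column are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HitSketch:
    """Hypothetical stand-in for TralHit; the field layout is assumed."""
    id: str
    prob: float
    logodds: float
    states: List[str]

    def to_line(self) -> str:
        # Assumed column order; no trailing newline, as documented above.
        return "\t".join(
            [self.id, repr(self.prob), repr(self.logodds), " ".join(self.states)]
        )

    @classmethod
    def parse_line(cls, line: str) -> "HitSketch":
        # Tolerate a trailing newline so lines read from a file parse directly.
        hit_id, prob, logodds, states = line.rstrip("\n").split("\t")
        return cls(hit_id, float(prob), float(logodds), states.split())


hit = HitSketch("seq1", 0.95, 12.5, ["N", "M1", "M2", "C"])
line = hit.to_line()
roundtrip = HitSketch.parse_line(line)
```

Because the serialized line carries no newline, the caller adds one when writing several hits to a file.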
-
to_treks(sequence, hmm=None)
Converts this hit to T-REKS format.
- Parameters
sequence (str or Bio.SeqRecord or Bio.Seq) – sequence for this hit
hmm (hmm.HMM) – (optional) the HMM used to generate this hit. If specified, results in a more accurate set of states
Return: (str)
See: tral.sequence.repeat_detection_io.treks_get_repeats
-
classmethod
filter_hmm
Filter search_hmm results according to various statistics
May be run as a module, e.g. python -m tral.search.filter_hmm -h
Filtering is performed by filter_search_results. Other methods provide I/O and supporting roles.
-
tral.search.filter_hmm.count_repeats(states, hmm_length=0)
Count the number of repeats matched by the hmm.
Partial repeats at the beginning and end of the hmm are counted fractionally.
For example, consider a 3-column HMM with the following states:
N N N M2 M3 M1 I1 M2 M1 M2 C C C
This would have 2/3 + 1 + 2/3 ≈ 2.3 repeats.
- Parameters
states – list of states traversed by the hmm. Only match states (starting with ‘M’) are considered, and are assumed to be numbered sequentially from 1
hmm_length – number of HMM states. If not given, guessed based on the observed states (may be inaccurate)
- Returns
(float) Number of repeats
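The counting rule in the example can be sketched as follows. This is one reading of the description, not the library's code: the function name count_repeats_sketch is hypothetical, and it assumes that a new repeat starts whenever the match-state column number fails to increase, that interior repeats count as 1 even if a column is skipped, and that a path with a single run is scored by the span of columns it covers.

```python
def count_repeats_sketch(states, hmm_length=0):
    """Hypothetical re-implementation of the documented counting rule."""
    # Keep only match states and read their column numbers ("M2" -> 2).
    columns = [int(s[1:]) for s in states if s.startswith("M")]
    if not columns:
        return 0.0
    if not hmm_length:
        # Guess the HMM length from the highest column seen (may be too small).
        hmm_length = max(columns)

    # Assumption: a new repeat begins when the column number does not increase.
    runs = [[columns[0]]]
    for col in columns[1:]:
        if col > runs[-1][-1]:
            runs[-1].append(col)
        else:
            runs.append([col])

    if len(runs) == 1:
        # Single run: fraction of the HMM spanned from first to last column.
        return (runs[0][-1] - runs[0][0] + 1) / hmm_length

    # First run: from its first column to the end of the HMM.
    first = (hmm_length - runs[0][0] + 1) / hmm_length
    # Last run: from the start of the HMM to its last column.
    last = runs[-1][-1] / hmm_length
    # Interior runs each count as one full repeat.
    return first + (len(runs) - 2) + last


states = "N N N M2 M3 M1 I1 M2 M1 M2 C C C".split()
n_repeats = count_repeats_sketch(states, hmm_length=3)
```

For the states above, the three runs are M2 M3, M1 M2, and M1 M2, which gives 2/3 + 1 + 2/3 under these assumptions.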
-
tral.search.filter_hmm.filter_fasta(databasefile, outfile, hits, usedescription=True)
Filter a fasta file to IDs contained in the hits
- Parameters
databasefile – Name of input fasta. May be gzipped
outfile – Name of output fasta. Uncompressed.
hits – Iterable of TralHits
usedescription – Include full description (header line) from the input fasta database. If false, only the name (first word after the ‘>’) will be used.
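The effect of these parameters can be illustrated with a plain-Python sketch. It is not the library's implementation: filter_fasta_sketch is a hypothetical helper that takes bare sequence IDs instead of TralHits, and it assumes gzipped input is recognized by a ".gz" suffix.

```python
import gzip
import os
import tempfile


def filter_fasta_sketch(databasefile, outfile, ids, usedescription=True):
    """Copy records whose name is in `ids` (hypothetical helper)."""
    wanted = set(ids)
    # Assumption: gzipped input is detected from the file name.
    opener = gzip.open if str(databasefile).endswith(".gz") else open
    keep = False
    with opener(databasefile, "rt") as src, open(outfile, "w") as dst:
        for line in src:
            if line.startswith(">"):
                # The name is the first word after '>'.
                fields = line[1:].split()
                name = fields[0] if fields else ""
                keep = name in wanted
                if keep:
                    header = line.rstrip("\n") if usedescription else ">" + name
                    dst.write(header + "\n")
            elif keep:
                # Sequence lines belong to the most recent header.
                dst.write(line)


# Small demonstration on a temporary two-record database.
workdir = tempfile.mkdtemp()
db = os.path.join(workdir, "db.fasta")
out = os.path.join(workdir, "out.fasta")
with open(db, "w") as fh:
    fh.write(">seq1 first protein\nMKV\n>seq2 second protein\nGGA\n")

filter_fasta_sketch(db, out, ["seq2"], usedescription=False)
with open(out) as fh:
    filtered = fh.read()
```

With usedescription=False the output header is shortened to the name only; with the default, the full header line from the database is copied.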
-
tral.search.filter_hmm.filter_search_results(results, repeats=2, log_odds=8.0)
Get the set of hits passing the specified filter
- Parameters
results – TSV file with TRAL hits
repeats – minimum number of repeats
log_odds – minimum log odds score
- Returns
Generator for all hits passing the filter
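The two thresholds act together, and the result is produced lazily. The sketch below shows that behaviour on in-memory data rather than a TSV file. The names ScoredHit and filter_hits_sketch are hypothetical, the repeat count is assumed to be precomputed, and treating both minimums as inclusive is an assumption.

```python
from collections import namedtuple

# Hypothetical record: a hit with its log odds score and repeat count.
ScoredHit = namedtuple("ScoredHit", ["id", "logodds", "n_repeats"])


def filter_hits_sketch(hits, repeats=2, log_odds=8.0):
    """Yield hits meeting both minimums (assumed inclusive)."""
    for hit in hits:
        if hit.n_repeats >= repeats and hit.logodds >= log_odds:
            yield hit


hits = [
    ScoredHit("a", 12.0, 2.3),  # passes both thresholds
    ScoredHit("b", 9.0, 1.5),   # too few repeats
    ScoredHit("c", 3.0, 4.0),   # log odds too low
]
# The generator is consumed once; build a list if the hits are needed again.
passing = [h.id for h in filter_hits_sketch(hits)]
```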
-
tral.search.filter_hmm.match_seqs(hits, fastafile)
Pair a series of hits with sequences
Requires loading fastafile into memory
Return: generator of (TralHit, SeqRecord) tuples
-
tral.search.filter_hmm.parse_hits(results)
Generator for hits from a search_hmm output TSV file