Search code documentation¶

The search submodule documentation.

search_hmm¶

Tool to search an HMM against a database of sequences

May be run as a module, e.g. python -m tral.search.search_hmm -h

class tral.search.search_hmm.TralHit(id, prob, logodds, states)[source]¶

Encapsulates key information about a TRAL search result.

This is similar to some objects from the tral.repeat package, but collects data together and provides serialization methods.

classmethod parse_line(line)[source]¶

Parse a line from the TSV serialization to a TralHit

Inverse of to_line()

Args: line (str): tab-separated line representing one hit

Returns: A new TralHit instance

to_line()[source]¶

Converts this hit to a tab-delimited line

Does not include endline.

May be recreated by parse_line(line)

Return: (str)
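
The round trip between to_line() and parse_line() can be illustrated with a small stand-in class. This is a minimal sketch, not TRAL's implementation: the class name HitSketch, the column order (id, prob, logodds, states) and the space-separated encoding of the states are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HitSketch:
    """Hypothetical stand-in for TralHit; not the real class."""
    id: str
    prob: float
    logodds: float
    states: List[str]

    def to_line(self) -> str:
        # Assumed column order and state encoding; no trailing newline.
        fields = [self.id, repr(self.prob), repr(self.logodds), " ".join(self.states)]
        return "\t".join(fields)

    @classmethod
    def parse_line(cls, line: str) -> "HitSketch":
        # Inverse of to_line(); tolerate the newline left by file iteration.
        id_, prob, logodds, states = line.rstrip("\n").split("\t")
        return cls(id_, float(prob), float(logodds), states.split())


hit = HitSketch("seq1", 0.98, 12.5, ["N", "M1", "M2", "C"])
line = hit.to_line()
roundtrip = HitSketch.parse_line(line + "\n")
```

Because to_line() omits the end-of-line character, the caller is responsible for adding it when writing a file and for stripping it (or letting the parser do so) when reading.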

to_treks(sequence, hmm=None)[source]¶

Converts this hit to T-REKS format

Parameters

sequence (str or Bio.SeqRecord or Bio.Seq) – sequence for this hit
hmm (hmm.HMM) – (optional) the HMM used to generate this hit. If specified, results in a more accurate set of states

Return: (str)

See: tral.sequence.repeat_detection_io.treks_get_repeats

tral.search.search_hmm.main(args=None)[source]¶: search_hmm main method

tral.search.search_hmm.opengzip(filename)[source]¶

Open a file that may be gzip-compressed

Checks the magic bytes to see if the file was compressed.

Parameters: filename (-) – Filename
Returns: An open file handle
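
Detecting compression from the magic bytes, rather than from the file extension, might look like the following. This is a sketch under stated assumptions: the helper name open_maybe_gzip is hypothetical, and returning a text-mode handle is a choice made here, not something the documentation above specifies. The two-byte signature 0x1f 0x8b is the standard gzip header.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def open_maybe_gzip(filename):
    """Hypothetical helper: return a text-mode handle, decompressing if needed."""
    with open(filename, "rb") as raw:
        is_gzip = raw.read(2) == GZIP_MAGIC
    if is_gzip:
        return gzip.open(filename, "rt")
    return open(filename, "r")


# Usage: the same call reads both a plain and a gzip-compressed file.
tmpdir = tempfile.mkdtemp()
plain_path = os.path.join(tmpdir, "db.fasta")
gz_path = os.path.join(tmpdir, "db.fasta.gz")
with open(plain_path, "w") as fh:
    fh.write(">seq1\nACGT\n")
with gzip.open(gz_path, "wt") as fh:
    fh.write(">seq1\nACGT\n")

with open_maybe_gzip(plain_path) as fh:
    plain_text = fh.read()
with open_maybe_gzip(gz_path) as fh:
    gz_text = fh.read()
```

Checking the content instead of the name means a compressed file without a .gz suffix is still handled correctly.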

tral.search.search_hmm.shuffle_seq(seq)[source]¶

Randomly shuffle a sequence

Parameters: seq (-) – input sequence
Returns: Bio.Seq.MutableSeq with shuffled characters
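
Shuffling keeps the residue composition of a sequence while destroying the order. The sketch below shows the idea with the standard library only; the helper name shuffle_chars is hypothetical, and it returns a plain str rather than a Bio.Seq.MutableSeq to avoid a Biopython dependency.

```python
import random


def shuffle_chars(seq, rng=random):
    """Hypothetical helper: return a shuffled copy of seq as a str."""
    chars = list(str(seq))
    rng.shuffle(chars)  # shuffles the list in place
    return "".join(chars)


original = "MKVLAAGIVGLLLA"
# A seeded generator makes the result reproducible between runs.
shuffled = shuffle_chars(original, random.Random(0))
```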

filter_hmm¶

Filter search_hmm results according to various statistics

May be run as a module, e.g. python -m tral.search.filter_hmm -h

Filtering is performed by filter_search_results. Other methods provide I/O and supporting roles.

tral.search.filter_hmm.count_repeats(states, hmm_length=0)[source]¶

Count the number of repeats matched by the hmm.

Partial repeats at the beginning and end of the HMM match are counted as fractions of a full repeat.

For example, consider a 3-column HMM with the following states:

N N N M2 M3 M1 I1 M2 M1 M2 C C C

This would have 2/3 + 1 + 2/3 ≈ 2.3 repeats.

Parameters

states (-) – list of states traversed by the hmm. Only match states (starting with ‘M’) are considered, and are assumed to be numbered sequentially from 1
hmm_length (-) – number of HMM states. If not given, guessed based on the observed states (may be inaccurate)

Returns

(float) Number of repeats
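
One counting rule that reproduces the example above is sketched here. It is an assumption, not the library's algorithm: match states are split into runs wherever the column number stops increasing, the first and last runs are scored by the fraction of the HMM they span, and every run in between counts as one repeat. The name count_repeats_sketch is hypothetical.

```python
def count_repeats_sketch(states, hmm_length=0):
    """Hypothetical re-implementation; one reading consistent with the example."""
    # Keep only match states and read their column numbers: "M2" -> 2.
    cols = [int(s[1:]) for s in states if s.startswith("M")]
    if not cols:
        return 0.0
    if not hmm_length:
        # Guess the HMM length from the observed states (may be too small).
        hmm_length = max(cols)

    # Start a new run whenever the column number wraps around.
    runs = [[cols[0]]]
    for col in cols[1:]:
        if col <= runs[-1][-1]:
            runs.append([col])
        else:
            runs[-1].append(col)

    if len(runs) == 1:
        # A single run covers only the columns between its first and last state.
        return (runs[0][-1] - runs[0][0] + 1) / hmm_length

    first = (hmm_length - runs[0][0] + 1) / hmm_length  # partial repeat at the start
    last = runs[-1][-1] / hmm_length                    # partial repeat at the end
    return first + (len(runs) - 2) + last


states = "N N N M2 M3 M1 I1 M2 M1 M2 C C C".split()
total = count_repeats_sketch(states, hmm_length=3)
```

For the states shown, the runs are (M2, M3), (M1, M2) and (M1, M2), giving 2/3 + 1 + 2/3.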

tral.search.filter_hmm.filter_fasta(databasefile, outfile, hits, usedescription=True)[source]¶

Filter a fasta file to IDs contained in the hits

Parameters

databasefile – Name of input fasta. May be gzipped
outfile – Name of output fasta. Uncompressed.
hits – Iterable of TralHits
usedescription – Include full description (header line) from the input fasta database. If false, only the name (first word after the ‘>’) will be used.

tral.search.filter_hmm.filter_search_results(results, repeats=2, log_odds=8.0)[source]¶

Get the set of hits passing the specified filter

Parameters

results – TSV file with TRAL hits
repeats – minimum number of repeats
log_odds – minimum log odds score

Returns

Generator over the hits that pass the filter
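
The thresholding can be pictured as a lazy generator over hits. This sketch makes several assumptions: it takes an in-memory iterable instead of a TSV file, the Hit tuple with a precomputed repeats field is a hypothetical stand-in for TralHit, and both comparisons are treated as inclusive.

```python
from collections import namedtuple

# Hypothetical stand-in; the repeats field replaces a call to count_repeats.
Hit = namedtuple("Hit", ["id", "logodds", "repeats"])


def filter_hits_sketch(hits, repeats=2, log_odds=8.0):
    """Lazily yield hits that meet both thresholds (inclusive, by assumption)."""
    for hit in hits:
        if hit.repeats >= repeats and hit.logodds >= log_odds:
            yield hit


hits = [
    Hit("a", 12.0, 3.0),  # passes both thresholds
    Hit("b", 5.0, 4.0),   # log odds too low
    Hit("c", 9.5, 1.3),   # too few repeats
    Hit("d", 8.0, 2.0),   # exactly on both thresholds
]
passing = [h.id for h in filter_hits_sketch(hits)]
```

Returning a generator lets a caller stream results straight into a writer without holding every hit in memory.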

tral.search.filter_hmm.main(args=None)[source]¶: filter_hmm main method

tral.search.filter_hmm.match_seqs(hits, fastafile)[source]¶

Pair a series of hits with sequences

Requires loading fastafile into memory

Return: generator of (TralHit, SeqRecord) tuples
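
The memory requirement follows from building an ID-to-sequence lookup before any pair is produced. A minimal sketch of that pattern, using plain tuples in place of TralHit and SeqRecord objects; the name match_seqs_sketch and the choice to skip hits without a matching sequence are assumptions.

```python
def match_seqs_sketch(hit_ids, records):
    """Hypothetical helper: pair each hit ID with its sequence."""
    # Load every sequence up front; memory use scales with the FASTA size.
    by_id = dict(records)
    for hit_id in hit_ids:
        if hit_id in by_id:
            yield hit_id, by_id[hit_id]


records = [("seq1", "ACGT"), ("seq2", "GGCC"), ("seq3", "TTAA")]
pairs = list(match_seqs_sketch(["seq3", "seq1", "missing"], records))
```

Pairs come out in the order of the hits, not the order of the sequences in the file.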

tral.search.filter_hmm.parse_hits(results)[source]¶: Generator for hits from a search_hmm output TSV file

tral.search.filter_hmm.write_hits(hits, outfile)[source]¶

Write a collection of hits to a TSV file

Parameters

hits – iterable of TralHit
outfile – filename

tral.search.filter_hmm.write_treks(databasefile, outfile, hits, hmm=None)[source]¶

Write a collection of hits as a T-REKS file

Parameters

hits – iterable of TralHit
outfile – filename
