Search code documentation

The search submodule documentation.

search_hmm

Tool to search an HMM against a database of sequences

May be run as a module, e.g. python -m tral.search.search_hmm -h

class tral.search.search_hmm.TralHit(id, prob, logodds, states)

Encapsulates key information about a TRAL search result.

This is similar to some objects from the tral.repeat package, but collects data together and provides serialization methods.

classmethod parse_line(line)

Parse a line from the TSV serialization into a TralHit.

Inverse of to_line().

Parameters
  • line (str) – tab-separated line representing one hit

Returns: A new TralHit instance

to_line()

Converts this hit to a tab-delimited line.

Does not include a trailing newline.

May be recreated by parse_line(line)

Return: (str)
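to_line() and parse_line() are meant to round-trip. The following is a minimal sketch of that contract using a hypothetical HitSketch class; the column order and the comma-joined state list are assumptions made for illustration, not TRAL's actual TSV layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HitSketch:
    """Hypothetical stand-in for TralHit; the real serialized layout may differ."""
    id: str
    prob: float
    logodds: float
    states: List[str]

    def to_line(self) -> str:
        # Assumed layout: id, prob, logodds, comma-joined states; no trailing newline.
        return "\t".join([self.id, repr(self.prob), repr(self.logodds), ",".join(self.states)])

    @classmethod
    def parse_line(cls, line: str) -> "HitSketch":
        # Drop a possible trailing newline, then split on tabs (inverse of to_line).
        hit_id, prob, logodds, states = line.rstrip("\n").split("\t")
        return cls(hit_id, float(prob), float(logodds), states.split(",") if states else [])


hit = HitSketch("seq1", 0.98, 12.5, ["N", "M1", "M2", "C"])
restored = HitSketch.parse_line(hit.to_line() + "\n")
```

Because the sketch's parse_line strips a trailing newline, lines read directly from a file can be passed in unchanged.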

to_treks(sequence, hmm=None)

Converts this hit to T-REKS format

Parameters
  • sequence (str or Bio.SeqRecord or Bio.Seq) – sequence for this hit

  • hmm (hmm.HMM) – (optional) the HMM used to generate this hit. If specified, results in a more accurate set of states

Return: (str)

See: tral.sequence.repeat_detection_io.treks_get_repeats

tral.search.search_hmm.main(args=None)

search_hmm main method

tral.search.search_hmm.opengzip(filename)

Open a file that may optionally be gzip-compressed.

Checks the file's leading magic bytes to determine whether it is compressed.

Parameters

filename – name of the file to open

Returns

An open file handle
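The magic-byte check can be sketched with the standard gzip module. Every gzip stream begins with the bytes 0x1f 0x8b (RFC 1952). The name open_maybe_gzip is hypothetical, and returning a text-mode handle is an assumption about how the result is used, not a description of opengzip itself.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)


def open_maybe_gzip(filename):
    """Open filename for text reading, decompressing it if it is gzip data.

    Hypothetical sketch of the magic-byte check; not TRAL's own code.
    """
    # Peek at the first two bytes in binary mode, then reopen appropriately.
    with open(filename, "rb") as fh:
        magic = fh.read(2)
    if magic == GZIP_MAGIC:
        return gzip.open(filename, "rt")
    return open(filename, "r")
```

Checking the content rather than the .gz extension means a compressed file is still detected even if it was renamed.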

tral.search.search_hmm.shuffle_seq(seq)

Randomly shuffle a sequence

Parameters

seq – input sequence

Returns

Bio.Seq.MutableSeq with shuffled characters
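A minimal sketch of the shuffling step using random.shuffle. Unlike shuffle_seq, it returns a plain str rather than a Bio.Seq.MutableSeq so that it has no Biopython dependency; the name shuffle_sequence and the optional rng parameter are hypothetical.

```python
import random


def shuffle_sequence(seq, rng=None):
    """Return a random permutation of the characters in seq as a new str.

    Hypothetical sketch; TRAL's shuffle_seq returns a Bio.Seq.MutableSeq instead.
    """
    if rng is None:
        rng = random.Random()
    residues = list(str(seq))
    rng.shuffle(residues)  # in-place Fisher-Yates shuffle
    return "".join(residues)
```

A shuffled copy keeps the residue composition of the original but destroys any repeat structure. Passing a seeded random.Random makes the result reproducible.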

filter_hmm

Filter search_hmm results according to various statistics

May be run as a module, e.g. python -m tral.search.filter_hmm -h

Filtering is performed by filter_search_results. Other methods provide I/O and supporting roles.

tral.search.filter_hmm.count_repeats(states, hmm_length=0)

Count the number of repeats matched by the HMM.

Partial repeats at the beginning and end of the HMM are counted as fractions of a full repeat.

For example, consider a 3-column HMM with the following states:

N N N M2 M3 M1 I1 M2 M1 M2 C C C

This would have 2/3 + 1 + 2/3 ≈ 2.33 repeats: the leading M2 M3 and the trailing M1 M2 each span two of the three columns, and the repeat in between counts as one.

Parameters
  • states – list of states traversed by the HMM. Only match states (starting with ‘M’) are considered, and are assumed to be numbered sequentially from 1

  • hmm_length – number of match states (columns) in the HMM. If not given, guessed based on the observed states (may be inaccurate)

Returns

(float) Number of repeats
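The worked example above can be reproduced with the sketch below. It assumes that a new repeat begins whenever the match-state index fails to increase, and that a partial repeat counts as the fraction of HMM columns it spans. The name count_repeats_sketch is hypothetical, and TRAL's own implementation may treat edge cases (for example, deletions) differently.

```python
from typing import List


def count_repeats_sketch(states: List[str], hmm_length: int = 0) -> float:
    """Count (possibly fractional) repeats along an HMM state path.

    Hypothetical sketch: partial first/last repeats are scored by the
    fraction of HMM columns they span.
    """
    # Keep only match states ("M1", "M2", ...) and extract their column index.
    matches = [int(s[1:]) for s in states if s.startswith("M")]
    if not matches:
        return 0.0
    if hmm_length <= 0:
        # Guess the HMM length from the highest column observed.
        hmm_length = max(matches)

    # Split the path into repeat units: a new unit starts whenever the column
    # index does not increase (the path wrapped back to an earlier column).
    units = [[matches[0]]]
    for prev, cur in zip(matches, matches[1:]):
        if cur <= prev:
            units.append([cur])
        else:
            units[-1].append(cur)

    if len(units) == 1:
        # A single (possibly partial) unit: fraction of columns spanned.
        return (units[0][-1] - units[0][0] + 1) / hmm_length
    first = (hmm_length - units[0][0] + 1) / hmm_length  # first column seen to the end
    last = units[-1][-1] / hmm_length                     # column 1 to last column seen
    return first + (len(units) - 2) + last                # interior units count as whole repeats


states = "N N N M2 M3 M1 I1 M2 M1 M2 C C C".split()
print(round(count_repeats_sketch(states, hmm_length=3), 2))  # 2.33
```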

tral.search.filter_hmm.filter_fasta(databasefile, outfile, hits, usedescription=True)

Filter a FASTA file, keeping only the records whose IDs appear in the hits

Parameters
  • databasefile – Name of input FASTA. May be gzipped

  • outfile – Name of output FASTA. Uncompressed.

  • hits – Iterable of TralHits

  • usedescription – Include full description (header line) from the input FASTA database. If False, only the name (first word after the ‘>’) will be used.
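The filtering rule can be sketched without Biopython by streaming FASTA lines and tracking whether the current record is kept. The name filter_fasta_records and the keep_ids argument (for example, a set built as {hit.id for hit in hits}) are hypothetical, and the sketch assumes hit IDs match the first word of each FASTA header.

```python
def filter_fasta_records(lines, keep_ids, usedescription=True):
    """Yield FASTA lines for records whose name is in keep_ids.

    Hypothetical, dependency-free sketch; the record name is the first
    word after '>', as described for filter_fasta.
    """
    keep = False
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            name = header.split()[0] if header else ""
            keep = name in keep_ids
            if keep:
                # Write either the full header or just the record name.
                yield ">" + (header if usedescription else name) + "\n"
        elif keep:
            # Sequence line of a kept record; normalize the line ending.
            yield line if line.endswith("\n") else line + "\n"
```

Streaming keeps memory use independent of database size, since only the current header decision is held in memory.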

tral.search.filter_hmm.filter_search_results(results, repeats=2, log_odds=8.0)

Get the hits passing the specified filters

Parameters
  • results – TSV file with TRAL hits

  • repeats – minimum number of repeats

  • log_odds – minimum log-odds score

Returns

Generator over the hits that pass the filters
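A minimal sketch of the filtering step as a generator. It assumes both thresholds are inclusive minimums and that a hit must pass both. The Hit class and filter_hits are hypothetical, and the repeat counter is passed in as a callable (for example, a function like count_repeats) rather than read from a results file.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, List


@dataclass
class Hit:
    """Minimal hypothetical stand-in for TralHit (only the fields the filter reads)."""
    id: str
    logodds: float
    states: List[str]


def filter_hits(hits: Iterable[Hit], count_repeats: Callable[[List[str]], float],
                repeats: float = 2, log_odds: float = 8.0) -> Iterator[Hit]:
    for hit in hits:
        # Check the cheap log-odds threshold first; count repeats only if it passes.
        if hit.logodds >= log_odds and count_repeats(hit.states) >= repeats:
            yield hit
```

Because it is a generator, hits are filtered lazily and can be piped straight into a writer without holding the whole result set in memory.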

tral.search.filter_hmm.main(args=None)

filter_hmm main method

tral.search.filter_hmm.match_seqs(hits, fastafile)

Pair a series of hits with their sequences

Requires loading the entire fastafile into memory

Return: generator of (TralHit, SeqRecord) tuples
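Pairing hits with sequences amounts to a dictionary lookup, which is why the whole FASTA file must be loaded first. In this minimal sketch, pair_hits_with_sequences is hypothetical, records is assumed to be an iterable of (name, sequence) tuples instead of Biopython SeqRecords, and hits whose ID is missing are assumed to be skipped.

```python
from typing import Dict


def pair_hits_with_sequences(hits, records):
    """Yield (hit, sequence) pairs for hits whose id appears in records.

    Hypothetical sketch: hits only need an .id attribute. All records are
    loaded into a dict first, so memory grows with the database size.
    """
    by_name: Dict[str, str] = dict(records)
    for hit in hits:
        seq = by_name.get(hit.id)
        if seq is not None:  # assumption: silently skip hits with no sequence
            yield hit, seq
```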

tral.search.filter_hmm.parse_hits(results)

Generator for hits from a search_hmm output TSV file

tral.search.filter_hmm.write_hits(hits, outfile)

Write a collection of hits to a TSV file

Parameters
  • hits – iterable of TralHit

  • outfile – filename

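tral.search.filter_hmm.write_treks(databasefile, outfile, hits, hmm=None)

Write a collection of hits as a T-REKS file

Parameters
  • databasefile – FASTA file containing the sequences for the hits

  • outfile – filename

  • hits – iterable of TralHit

  • hmm – (optional) the HMM used to generate the hits; see TralHit.to_treks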