Command-line interface

str_mut_signatures

STR mutation signature analysis from paired tumor–normal VCF files.

usage: str_mut_signatures [-h] [-v] [--version]
                          {extract,filter,nmf,project} ...

Positional Arguments

command

Possible choices: extract, filter, nmf, project

Subcommands

Named Arguments

-v, --verbose

Enable verbose logging

Default: False

--version

Show the program’s version number and exit.

Sub-commands

extract

Extract somatic STR mutation counts from paired tumor–normal VCF files.

str_mut_signatures extract [-h] --vcf-dir VCF_DIR --out-matrix OUT_MATRIX
                           [--ru-length] [--ru {class,ru}] [--ref-length]
                           [--change]

Named Arguments

--vcf-dir

Directory with STR-annotated, paired tumor–normal VCF files.

--out-matrix

Path to output TSV file with samples as rows and STR mutation features as columns.

--ru-length

Include repeat-unit length as LEN{len(motif)} in feature labels.

Default: False

--ru

Possible choices: class, ru

How to include repeat-unit content in feature labels: ‘class’ (base class AT/GC/MX) or ‘ru’ (full repeat-unit sequence). If not specified, repeat-unit content is not included.

--ref-length

Include reference repeat length in feature labels.

Default: False

--change

Encode tumor–normal repeat-length change and restrict to somatic events.

Default: False
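
To make the effect of the labelling flags concrete, here is a minimal Python sketch of how a feature label could be assembled from them. Only the LEN{len(motif)} component and the AT/GC/MX class names come from the option descriptions above. The "_" separator, the "REF" prefix, the signed change format, the rule used to assign a base class, and the function names are assumptions made for illustration, not the tool's actual label format.

```python
def base_class(motif):
    """Assumed rule: AT if only A/T bases, GC if only G/C bases, else MX (mixed)."""
    bases = set(motif.upper())
    if bases <= {"A", "T"}:
        return "AT"
    if bases <= {"G", "C"}:
        return "GC"
    return "MX"


def feature_label(motif, ref_len, change,
                  ru_length=False, ru=None, ref_length=False, with_change=False):
    """Join the components selected by the (hypothetical) flag values."""
    parts = []
    if ru_length:                      # --ru-length
        parts.append(f"LEN{len(motif)}")
    if ru == "class":                  # --ru class
        parts.append(base_class(motif))
    elif ru == "ru":                   # --ru ru
        parts.append(motif.upper())
    if ref_length:                     # --ref-length (prefix is an assumption)
        parts.append(f"REF{ref_len}")
    if with_change:                    # --change (signed format is an assumption)
        parts.append(f"{change:+d}")
    return "_".join(parts)


label = feature_label("AT", 12, -1,
                      ru_length=True, ru="class", ref_length=True, with_change=True)
print(label)
```

Each enabled flag adds one component, so the number of distinct feature columns grows with every flag that is switched on.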

filter

Filter an STR mutation count matrix using manual thresholds, an elbow heuristic, or a percentile cutoff on feature totals.

str_mut_signatures filter [-h] --matrix MATRIX --out-matrix OUT_MATRIX
                          [--feature-method {manual,elbow,percentile}]
                          [--min-feature-total MIN_FEATURE_TOTAL]
                          [--min-samples-with-feature MIN_SAMPLES_WITH_FEATURE]
                          [--min-sample-total MIN_SAMPLE_TOTAL]
                          [--feature-percentile FEATURE_PERCENTILE]

Named Arguments

--matrix

Input TSV count matrix (samples x STR mutation features).

--out-matrix

Output TSV path for the filtered matrix.

--feature-method

Possible choices: manual, elbow, percentile

Feature filtering method: ‘manual’ (use explicit thresholds), ‘elbow’ (elbow heuristic on feature totals), ‘percentile’ (keep features above a percentile of totals).

Default: 'manual'

--min-feature-total

Minimum total count across all samples for a feature (manual mode).

Default: 10

--min-samples-with-feature

Minimum number of samples in which the feature must be non-zero.

Default: 3

--min-sample-total

Minimum total count per sample (samples, i.e. rows, with a lower total are dropped).

Default: 0

--feature-percentile

Percentile, given as a fraction between 0 and 1, of feature totals used as the threshold when --feature-method is percentile.

Default: 0.9
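
As a rough illustration of the manual and percentile modes and of --min-sample-total, the sketch below filters a small samples x features matrix held as a list of lists. The function names, the linear-interpolation quantile, and the inclusive (>=) comparisons are assumptions; the tool may compute its thresholds differently. The elbow heuristic is not shown because its definition is not given here.

```python
def feature_totals(rows):
    """Column sums of a samples x features count matrix."""
    return [sum(col) for col in zip(*rows)]


def manual_keep(rows, min_feature_total=10, min_samples_with_feature=3):
    """Indices of features passing both explicit thresholds."""
    keep = []
    for j, col in enumerate(zip(*rows)):
        nonzero = sum(1 for v in col if v > 0)
        if sum(col) >= min_feature_total and nonzero >= min_samples_with_feature:
            keep.append(j)
    return keep


def percentile_keep(rows, q=0.9):
    """Indices of features whose total reaches the q-quantile of all totals."""
    totals = feature_totals(rows)
    ordered = sorted(totals)
    pos = q * (len(ordered) - 1)          # fractional rank, linear interpolation
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    threshold = ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)
    return [j for j, t in enumerate(totals) if t >= threshold]


def samples_to_keep(rows, min_sample_total=0):
    """Indices of samples (rows) whose total count reaches the minimum."""
    return [i for i, row in enumerate(rows) if sum(row) >= min_sample_total]


# Toy matrix: 4 samples x 4 features (illustrative values only).
rows = [
    [5, 0, 1, 9],
    [4, 0, 0, 7],
    [6, 1, 0, 8],
    [0, 0, 0, 0],
]
print(manual_keep(rows))         # features kept with the default thresholds
print(percentile_keep(rows))     # features kept at the 0.9 quantile
print(samples_to_keep(rows, 1))  # samples kept when at least one count is required
```

With the default --min-sample-total of 0 no sample is dropped, since every row total is at least zero.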

nmf

Run NMF-based STR mutation signature decomposition on a count matrix.

str_mut_signatures nmf [-h] --matrix MATRIX --outdir OUTDIR --n-signatures
                       N_SIGNATURES [--max-iter MAX_ITER]
                       [--random-state RANDOM_STATE] [--init INIT]
                       [--alpha-W ALPHA_W] [--alpha-H ALPHA_H]
                       [--l1-ratio L1_RATIO]

Named Arguments

--matrix

Input TSV count matrix (samples x STR mutation features).

--outdir

Output directory for NMF results (signatures.tsv, exposures.tsv, metadata.json).

--n-signatures

Number of signatures (rank) for NMF.

--max-iter

Maximum number of NMF iterations.

Default: 200

--random-state

Random seed for NMF.

Default: 0

--init

Initialization method for NMF (passed to sklearn.decomposition.NMF).

Default: 'nndsvd'

--alpha-W

Regularization strength for the W (exposure) matrix; the balance between L1 and L2 penalties is set by --l1-ratio.

Default: 0.0

--alpha-H

Regularization strength for the H (signature) matrix; the balance between L1 and L2 penalties is set by --l1-ratio.

Default: 0.0

--l1-ratio

Elastic-net mixing parameter (0 = L2, 1 = L1).

Default: 0.0

project

Given a new count matrix and an existing NMF result directory, compute exposures of new samples to the learned signatures.

str_mut_signatures project [-h] --matrix MATRIX --nmf-dir NMF_DIR
                           --out-exposures OUT_EXPOSURES

Named Arguments

--matrix

Input TSV count matrix for NEW samples (samples x features).

--nmf-dir

Directory with a saved NMF result (signatures.tsv, metadata.json, …).

--out-exposures

Output TSV for new sample exposures to the signatures.
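
How project fits the exposures is not stated here. One common way to compute exposures against fixed signatures is non-negative least squares (NNLS), solved independently for each new sample. The sketch below uses scipy.optimize.nnls for that. The function name project_exposures and the toy data are hypothetical, and it assumes the new matrix has the same feature columns, in the same order, as the signatures.

```python
import numpy as np
from scipy.optimize import nnls


def project_exposures(counts, signatures):
    """NNLS exposures of each sample (row of `counts`) to fixed signatures.

    counts:     array of shape (n_samples, n_features)
    signatures: array of shape (n_signatures, n_features)
    returns:    array of shape (n_samples, n_signatures)
    """
    counts = np.asarray(counts, dtype=float)
    basis = np.asarray(signatures, dtype=float).T  # features x signatures
    # nnls returns (solution, residual_norm); keep only the solution.
    return np.vstack([nnls(basis, row)[0] for row in counts])


# Toy check: build counts from known exposures and recover them.
signatures = np.array([[1.0, 0.0, 2.0, 0.0],
                       [0.0, 1.0, 0.0, 3.0]])
true_exposures = np.array([[2.0, 1.0],
                           [0.0, 4.0]])
new_counts = true_exposures @ signatures
recovered = project_exposures(new_counts, signatures)
print(np.round(recovered, 6))
```

Because the signatures stay fixed, each sample is fitted on its own, so adding or removing samples in the new matrix does not change the exposures of the others.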

Examples:

# 1) Extract somatic STR mutation counts from paired tumor–normal VCFs
str_mut_signatures extract \
    --vcf-dir data/vcfs \
    --out-matrix counts_len1.tsv \
    --ru-length --ref-length --change

# 2) Filter a count matrix (feature-level filtering)
str_mut_signatures filter \
    --matrix counts_len1.tsv \
    --out-matrix counts_len1.filtered.tsv \
    --feature-method elbow

# 3) Run NMF decomposition on a filtered matrix and save signatures/exposures
str_mut_signatures nmf \
    --matrix counts_len1.filtered.tsv \
    --outdir nmf_results \
    --n-signatures 5

# 4) Project a new cohort onto pre-computed signatures
str_mut_signatures project \
    --matrix new_counts.tsv \
    --nmf-dir nmf_results \
    --out-exposures new_exposures.tsv

# Enable verbose logging (per the usage above, --verbose is a top-level option, so it goes before the subcommand)
str_mut_signatures --verbose extract \
    --vcf-dir data/vcfs/ \
    --out-matrix counts_len1.tsv \
    --ru-length --ref-length --change