Command-line interface
str_mut_signatures
STR mutation signature analysis from paired tumor–normal VCF files.
usage: str_mut_signatures [-h] [-v] [--version]
{extract,filter,nmf,project} ...
Positional Arguments
- command
Possible choices: extract, filter, nmf, project
Subcommands
Named Arguments
- -v, --verbose
Enable verbose logging
Default: False
- --version
show program’s version number and exit
Sub-commands
extract
Extract somatic STR mutation counts from paired tumor–normal VCF files.
str_mut_signatures extract [-h] --vcf-dir VCF_DIR --out-matrix OUT_MATRIX
[--ru-length] [--ru {class,ru}] [--ref-length]
[--change]
Named Arguments
- --vcf-dir
Directory with STR-annotated, paired tumor–normal VCF files.
- --out-matrix
Path to output TSV file with samples as rows and STR mutation features as columns.
- --ru-length
Include repeat-unit length as LEN{len(motif)} in feature labels.
Default: False
- --ru
Possible choices: class, ru
How to include repeat-unit content in feature labels: ‘class’ (base class AT/GC/MX) or ‘ru’ (full repeat-unit sequence). If not specified, repeat-unit content is not included.
- --ref-length
Include reference repeat length in feature labels.
Default: False
- --change
Encode tumor–normal repeat-length change and restrict to somatic events.
Default: False
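To make the effect of the label options concrete, here is a minimal sketch of how such feature labels could be assembled. Only the LEN{len(motif)} part comes from the help text above. The `_` separator, the `REF` prefix, the signed change suffix, the ordering of the parts, and the rule used for the AT/GC/MX classes are assumptions made for illustration and may differ from what the tool actually emits.

```python
def base_class(motif: str) -> str:
    """Classify a repeat unit as AT, GC, or MX (mixed). Assumed rule."""
    bases = set(motif.upper())
    if bases <= {"A", "T"}:
        return "AT"
    if bases <= {"G", "C"}:
        return "GC"
    return "MX"


def feature_label(motif, ref_len, tumor_len, normal_len,
                  ru_length=False, ru=None, ref_length=False, change=False):
    """Build a label from the requested parts (hypothetical format)."""
    parts = []
    if ru_length:                      # mirrors --ru-length
        parts.append(f"LEN{len(motif)}")
    if ru == "class":                  # mirrors --ru class
        parts.append(base_class(motif))
    elif ru == "ru":                   # mirrors --ru ru
        parts.append(motif.upper())
    if ref_length:                     # mirrors --ref-length
        parts.append(f"REF{ref_len}")
    if change:                         # mirrors --change (tumor minus normal)
        parts.append(f"{tumor_len - normal_len:+d}")
    return "_".join(parts)


label = feature_label("CA", ref_len=12, tumor_len=11, normal_len=12,
                      ru_length=True, ru="class", ref_length=True, change=True)
print(label)  # LEN2_MX_REF12_-1
```

The more options are enabled, the more distinct feature columns the output matrix will contain, which is worth keeping in mind before filtering and NMF.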
filter
Filter an STR mutation count matrix using different heuristics.
str_mut_signatures filter [-h] --matrix MATRIX --out-matrix OUT_MATRIX
[--feature-method {manual,elbow,percentile}]
[--min-feature-total MIN_FEATURE_TOTAL]
[--min-samples-with-feature MIN_SAMPLES_WITH_FEATURE]
[--min-sample-total MIN_SAMPLE_TOTAL]
[--feature-percentile FEATURE_PERCENTILE]
Named Arguments
- --matrix
Input TSV count matrix (samples x STR mutation features).
- --out-matrix
Output TSV path for the filtered matrix.
- --feature-method
Possible choices: manual, elbow, percentile
Feature filtering method: ‘manual’ (use explicit thresholds), ‘elbow’ (elbow heuristic on feature totals), ‘percentile’ (keep features above a percentile of totals).
Default: 'manual'
- --min-feature-total
Minimum total count across all samples for a feature (manual mode).
Default: 10
- --min-samples-with-feature
Minimum number of samples in which the feature must be non-zero.
Default: 3
- --min-sample-total
Minimum total count per sample (rows with a lower total are dropped).
Default: 0
- --feature-percentile
Percentile (0–1) of feature totals used as threshold in feature-method=percentile.
Default: 0.9
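The manual thresholds can be pictured with a small dependency-free sketch. It is not the tool's implementation: the function name `filter_matrix`, the nested-dict representation of the matrix, and the order of operations (features are filtered first, then sample totals are computed on the remaining features) are assumptions made for illustration.

```python
def filter_matrix(counts, min_feature_total=10, min_samples_with_feature=3,
                  min_sample_total=0):
    """Filter a {sample: {feature: count}} matrix with manual thresholds."""
    features = sorted({f for row in counts.values() for f in row})

    # Keep features that are frequent enough overall and seen in enough samples.
    keep = []
    for feature in features:
        column = [row.get(feature, 0) for row in counts.values()]
        total = sum(column)
        nonzero = sum(1 for c in column if c > 0)
        if total >= min_feature_total and nonzero >= min_samples_with_feature:
            keep.append(feature)

    # Drop samples whose remaining total falls below the sample threshold.
    filtered = {}
    for sample, row in counts.items():
        new_row = {f: row.get(f, 0) for f in keep}
        if sum(new_row.values()) >= min_sample_total:
            filtered[sample] = new_row
    return filtered


counts = {
    "S1": {"LEN1": 6, "LEN2": 1, "LEN3": 0},
    "S2": {"LEN1": 5, "LEN2": 0, "LEN3": 2},
    "S3": {"LEN1": 4, "LEN2": 0, "LEN3": 9},
}
filtered = filter_matrix(counts)
print(filtered)
```

With the default thresholds, LEN2 is dropped because its total is below 10, and LEN3 is dropped because it is non-zero in only two samples.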
nmf
Run NMF-based STR mutation signature decomposition on a count matrix.
str_mut_signatures nmf [-h] --matrix MATRIX --outdir OUTDIR --n-signatures
N_SIGNATURES [--max-iter MAX_ITER]
[--random-state RANDOM_STATE] [--init INIT]
[--alpha-W ALPHA_W] [--alpha-H ALPHA_H]
[--l1-ratio L1_RATIO]
Named Arguments
- --matrix
Input TSV count matrix (samples x STR mutation features).
- --outdir
Output directory for NMF results (signatures.tsv, exposures.tsv, metadata.json).
- --n-signatures
Number of signatures (rank) for NMF.
- --max-iter
Maximum number of NMF iterations.
Default: 200
- --random-state
Random seed for NMF.
Default: 0
- --init
Initialization method for NMF (passed to sklearn.decomposition.NMF).
Default: 'nndsvd'
- --alpha-W
L1/L2 regularization parameter for the W (exposure) matrix.
Default: 0.0
- --alpha-H
L1/L2 regularization parameter for the H (signature) matrix.
Default: 0.0
- --l1-ratio
Elastic-net mixing parameter (0 = L2, 1 = L1).
Default: 0.0
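The decomposition itself can be illustrated with a small sketch. The help text says the options are passed to sklearn.decomposition.NMF, so the flags presumably correspond to that class's parameters of the same names (with --n-signatures as n_components). The code below deliberately does not call scikit-learn: it swaps in plain multiplicative updates with random initialization, so it stays dependency-free and only shows the shape of the factorization, V ≈ W·H, with W holding exposures (samples x signatures) and H holding signatures (signatures x features). All function names are hypothetical.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def reconstruction_error(v, w, h):
    """Frobenius norm of V - W*H."""
    wh = matmul(w, h)
    return sum((a - b) ** 2 for rv, rw in zip(v, wh) for a, b in zip(rv, rw)) ** 0.5


def nmf(v, n_signatures, max_iter=200, random_state=0, eps=1e-9):
    """Factor V into non-negative W and H with multiplicative updates."""
    rng = random.Random(random_state)
    n, m = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(n_signatures)] for _ in range(n)]
    h = [[rng.random() + eps for _ in range(m)] for _ in range(n_signatures)]
    for _ in range(max_iter):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(n_signatures)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n_signatures)]
             for i in range(n)]
    return w, h


# Toy count matrix: 4 samples x 4 features, built from two patterns.
v = [[5, 0, 3, 0], [4, 0, 2, 0], [0, 6, 0, 2], [0, 3, 0, 1]]
w, h = nmf(v, n_signatures=2)
print(round(reconstruction_error(v, w, h), 4))
```

In this picture, --alpha-W, --alpha-H and --l1-ratio would add penalty terms on W and H to the objective; the sketch leaves them out, which corresponds to their default of 0.0.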
project
Given a new count matrix and an existing NMF result directory, compute exposures of new samples to the learned signatures.
str_mut_signatures project [-h] --matrix MATRIX --nmf-dir NMF_DIR
--out-exposures OUT_EXPOSURES
Named Arguments
- --matrix
Input TSV count matrix for NEW samples (samples x features).
- --nmf-dir
Directory with a saved NMF result (signatures.tsv, metadata.json, …).
- --out-exposures
Output TSV for new sample exposures to the signatures.
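Projection can be thought of as solving for non-negative exposures while the signatures stay fixed. The help text does not state which solver the tool uses, so the sketch below is only one possible approach: multiplicative updates on W with H held constant. The function name `project_exposures` is hypothetical, and the sketch assumes the columns of the new matrix are in the same feature order as the saved signatures.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    return [list(col) for col in zip(*m)]


def project_exposures(v_new, h, max_iter=200, random_state=0, eps=1e-9):
    """Estimate non-negative W such that V_new ~ W*H, keeping H fixed."""
    rng = random.Random(random_state)
    k = len(h)
    ht = transpose(h)
    hht = matmul(h, ht)            # constant because H never changes
    num = matmul(v_new, ht)        # constant numerator V H^T
    w = [[rng.random() + eps for _ in range(k)] for _ in v_new]
    for _ in range(max_iter):
        # W <- W * (V H^T) / (W H H^T)
        den = matmul(w, hht)
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(len(w))]
    return w


# Two toy signatures over three features, and one new sample.
signatures = [[1, 0, 0], [0, 1, 1]]
new_counts = [[2, 3, 3]]
exposures = project_exposures(new_counts, signatures)
print([round(x, 3) for x in exposures[0]])
```

Here the new sample equals 2 times the first signature plus 3 times the second, so the recovered exposures are close to 2 and 3.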
Examples:
# 1) Extract somatic STR mutation counts from paired tumor–normal VCFs
str_mut_signatures extract --vcf-dir data/vcfs --out-matrix counts_len.tsv --ru-length --ref-length --change
# 2) Filter a count matrix (feature-level filtering)
str_mut_signatures filter --matrix counts_len1.tsv --out-matrix counts_len1.filtered.tsv --feature-method elbow
# 3) Run NMF decomposition on a filtered matrix and save signatures/exposures
str_mut_signatures nmf --matrix counts_len1.filtered.tsv --outdir nmf_results --n-signatures 5
# 4) Project a new cohort onto pre-computed signatures
str_mut_signatures project --matrix new_counts.tsv --nmf-dir nmf_results --out-exposures new_exposures.tsv
# Enable verbose logging
str_mut_signatures extract --vcf-dir data/vcfs/ --out-matrix counts_len1.tsv --ru-length --ref-length --change --verbose