SPICE: Selection Patterns In somatic Copy-number Events

SPICE, Selection Patterns In somatic Copy-number Events, is an event-level framework that infers discrete copy-number events from allele-specific profiles.

0. Installation

0.1 Prerequisites

Python >= 3.8
medicc2 (including openfst)

0.2 Install from pip/conda (recommended)

Coming soon!

0.3 Install from source

Install MEDICC2 using conda/mamba:

conda install -c bioconda -c conda-forge medicc2

Alternatively (and preferably), create a new conda environment with MEDICC2 already installed:

conda create -n spice_env -c conda-forge -c bioconda medicc2
conda activate spice_env

Clone the repository:

git clone git@bitbucket.org:schwarzlab/spice.git
cd spice

Install in development mode:

pip install -e .

This will install SPICE and all its dependencies, and make the spice command available in your shell.

0.4 Optional Dependencies

To use SPICE with Snakemake for parallel execution on computing clusters, install snakemake separately:

conda install bioconda::snakemake

1. Configuration

SPICE uses one configuration file per run, which is specified with the --config flag. This means you can keep multiple configs (e.g., in configs/) and select one at runtime.

Parameters and directories not specified in the provided config file are taken from the default config file default_config.yaml. Each config must specify name and directories.base_dir.
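The fallback to default_config.yaml can be pictured as a recursive merge of two dictionaries. The following is a minimal sketch, not SPICE's actual loader: the function names merge_config and check_required, and the default values shown, are hypothetical and chosen only for illustration.

```python
from copy import deepcopy


def merge_config(defaults: dict, override: dict) -> dict:
    """Return a copy of `defaults`, recursively updated with `override`."""
    merged = deepcopy(defaults)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Nested sections (e.g. directories) are merged key by key
            merged[key] = merge_config(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def check_required(config: dict) -> None:
    """Raise if one of the two mandatory entries is missing."""
    if not config.get("name"):
        raise ValueError("config must specify 'name'")
    if not config.get("directories", {}).get("base_dir"):
        raise ValueError("config must specify 'directories.base_dir'")


# Hypothetical defaults, for illustration only (not the real default_config.yaml)
defaults = {
    "directories": {"results_dir": "results", "log_dir": "logs"},
    "params": {"total_cn": False},
}
# User config, already parsed from YAML into a dict
override = {"name": "example_run", "directories": {"base_dir": "/path/to/project"}}

check_required(override)
config = merge_config(defaults, override)
```

In this sketch, keys that appear only in the defaults (such as params.total_cn) survive the merge, while keys set in the user config take precedence.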

1.1 Minimal `config.yaml` override example

Each config must contain a name, a base directory, and the location of the input copy-number file like so:

name: example_run
directories:
   base_dir: /path/to/project
input_files:
   copynumber: data/example_data.tsv

For other parameters that can be modified, see default_config.yaml.

1.2 Relative vs absolute paths

directories.* entries (e.g., data_dir, results_dir, log_dir) as well as input files can be given as relative or absolute paths.
- If relative, SPICE resolves them against directories.base_dir.
- If absolute, SPICE uses them as-is.
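The two rules above can be sketched in a few lines. This is an illustration of the described behaviour, not SPICE's own code; resolve_path is a hypothetical helper, and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath


def resolve_path(base_dir: str, entry: str) -> PurePosixPath:
    """Resolve a config path entry against base_dir unless it is absolute."""
    entry_path = PurePosixPath(entry)
    if entry_path.is_absolute():
        return entry_path  # absolute paths are used as-is
    return PurePosixPath(base_dir) / entry_path  # relative paths hang off base_dir


base_dir = "/path/to/project"
relative = resolve_path(base_dir, "data/example_data.tsv")
absolute = resolve_path(base_dir, "/scratch/results")
```

Here relative ends up under /path/to/project, while absolute is left unchanged.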

2. Usage Overview

SPICE has four main modes:

event_inference: Infer discrete copy-number events from allele-specific profiles
loci_detection: Detect recurrent copy-number loci across samples
loci_assignment: Assign loci to samples based on detected loci patterns
plotting: Generate visualizations of inferred events and detected loci

2.1 Top-level execution examples

For event inference the example config configs/events_example.yaml can be used. For loci detection and assignment the example config configs/loci_example.yaml can be used.

# Event inference
spice event_inference --config configs/events_example.yaml

# Loci detection
spice loci_detection --config configs/loci_example.yaml

# Loci assignment
spice loci_assignment --config configs/loci_example.yaml

# Plotting
spice plotting --config <path/to/config> --plot-events-per-sample <SAMPLE_ID>

For large datasets, we recommend using Snakemake mode on a computing cluster (see respective sections below).

3. Event Inference

Event inference infers discrete copy-number events from allele-specific copy-number profiles by enumerating valid evolutionary paths through the copy-number landscape and selecting the most likely path using k-nearest neighbors or MCMC sampling.

Note that SPICE automatically deletes the results of previous runs with the same name when it is rerun.

3.1 Pipeline Overview

The event inference pipeline runs 6 steps:

preprocessing: Extra preprocessing (filling telomeres, phasing, etc.)
split: Split haplotypes and preprocess input
all_solutions: Enumerate all valid evolutionary paths
disambiguate: Select best path using k-nearest neighbors
large_chroms: Use MCMC sampling for chromosomes with many events
combine: Combine all events into the final output

In each step, non-WGD and WGD samples are treated separately, and samples are split by chromosome and allele, giving file IDs of the form "sample:chrom:allele". Each ID is processed independently and stored in its own file.
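As an illustration of the ID scheme, the sketch below builds and parses IDs of the form "sample:chrom:allele". The helper names make_id and parse_id are hypothetical, and the allele labels cn_a and cn_b are taken from the input column names; SPICE's internal handling may differ.

```python
from itertools import product


def make_id(sample: str, chrom: str, allele: str) -> str:
    """Build a file ID of the form sample:chrom:allele."""
    return f"{sample}:{chrom}:{allele}"


def parse_id(file_id: str) -> tuple:
    """Split a file ID into (sample, chrom, allele).

    Splitting from the right keeps sample names that contain ':' intact
    (an assumption; such names may not be supported by SPICE).
    """
    sample, chrom, allele = file_id.rsplit(":", 2)
    return sample, chrom, allele


samples = ["SA123", "SA456"]
chroms = ["chr1", "chr2"]
alleles = ["cn_a", "cn_b"]

# One ID per sample/chromosome/allele combination
ids = [make_id(s, c, a) for s, c, a in product(samples, chroms, alleles)]
```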

Intermediate files can be removed using

spice event_inference --clean --config <path/to/config>

3.2 Expected Input

SPICE expects tab-separated input files with copy-number segments. See example file data/example_data.tsv.

Required columns:

sample: Sample identifier
chrom: Chromosome name
start: Segment start position
end: Segment end position
cn_a: Copy number for allele A (haplotype-specific)
cn_b: Copy number for allele B (haplotype-specific)

Optional files:

wgd_status: TSV with WGD status per sample (see section 3.2.1)
xy_samples: TSV with sex status per sample (see section 3.2.2)

Total copy-number mode can be enabled by setting params.total_cn: True in the config file.
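Before starting a run, it can be useful to check that an input file has the required columns. The sketch below does this with the standard library only; read_segments is a hypothetical helper, the two example rows are made up, and integer copy numbers are assumed.

```python
import csv
import io

REQUIRED_COLUMNS = ("sample", "chrom", "start", "end", "cn_a", "cn_b")


def read_segments(handle):
    """Read a tab-separated segment table and check the required columns."""
    reader = csv.DictReader(handle, delimiter="\t")
    missing = [c for c in REQUIRED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    segments = []
    for row in reader:
        segments.append({
            "sample": row["sample"],
            "chrom": row["chrom"],
            "start": int(row["start"]),
            "end": int(row["end"]),
            "cn_a": int(row["cn_a"]),
            "cn_b": int(row["cn_b"]),
        })
    return segments


# Made-up example content; a real run would open the TSV file instead
example = io.StringIO(
    "sample\tchrom\tstart\tend\tcn_a\tcn_b\n"
    "SA123\tchr1\t1\t1000000\t1\t1\n"
    "SA123\tchr1\t1000001\t5000000\t2\t1\n"
)
segments = read_segments(example)
```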

3.2.1 WGD Detection

SPICE supports two ways to determine WGD (whole genome duplication) status per sample. The pipeline branches on WGD status and uses different FSTs and neutral CN values accordingly.

Provided status via wgd_status file:
- Set input_files.wgd_status in your config to a TSV file.
- The file must have two columns: first column is the sample identifier (used as index), second column named wgd with boolean values (True/False).
- Example:
```
sample_id	wgd
SA123	True
SA456	False
```
Inferred WGD status:
- If input_files.wgd_status is missing or empty, SPICE infers WGD using copy-number data and the method specified by params.wgd_inference_method.
- Supported values:
  - major_cn: heuristic that calls WGD when at least half of the major copy-number profile is greater than or equal to 2
  - ploidy_loh: PCAWG-style rule combining ploidy and LOH fraction

Notes

WGD status impacts neutral CN values and constraint solving throughout the pipeline, so ensure this is set or inferred correctly.
For haplotype-specific data, neutral CN is 1 (non-WGD) vs 2 (WGD); for total CN, 2 vs 4 respectively.
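To make the major_cn heuristic and the neutral CN values concrete, here is a rough sketch. It assumes that "half" is measured by segment length, which the description above does not specify, and the function names infer_wgd_major_cn and neutral_cn are hypothetical. It is not SPICE's implementation.

```python
def infer_wgd_major_cn(segments) -> bool:
    """Call WGD if at least half of the covered length has major CN >= 2.

    `segments` is a list of (start, end, cn_a, cn_b) tuples for one sample.
    Weighting by segment length is an assumption of this sketch.
    """
    total_length = 0
    high_length = 0
    for start, end, cn_a, cn_b in segments:
        length = end - start
        total_length += length
        if max(cn_a, cn_b) >= 2:
            high_length += length
    return total_length > 0 and high_length / total_length >= 0.5


def neutral_cn(wgd: bool, total_cn: bool = False) -> int:
    """Neutral copy number as stated above: 1 vs 2 per haplotype, 2 vs 4 in total."""
    if total_cn:
        return 4 if wgd else 2
    return 2 if wgd else 1


mostly_doubled = [(0, 100, 2, 1), (100, 150, 1, 1)]
mostly_normal = [(0, 100, 1, 1), (100, 150, 2, 2)]
wgd_calls = (infer_wgd_major_cn(mostly_doubled), infer_wgd_major_cn(mostly_normal))
```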

3.2.2 Sex (XY/XX) Detection

SPICE supports resolving sample sex (XY vs XX) either via a provided file or automatic inference. This affects handling of chrX and chrY in preprocessing and splitting.

Provided status via xy_samples file:
- Set input_files.xy_samples in your config to a TSV file.
- The file must have two columns: first column is the sample identifier (used as index), second column named xy with boolean values (True/False) indicating XY (male) vs XX (female).
- Example:
```
sample_id	xy
SA123	True
SA456	False
```
Inferred XY status:
- If input_files.xy_samples is missing or empty, SPICE infers XY by checking if any segments exist on chromosome chrY for a sample.

Effects

For XY samples with haplotype-specific CN, the minor copy number of chrX and chrY is set to 0 during preprocessing and splitting.
For XX samples, chrY is excluded (no segments on chrY).
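The inference rule and its effect on XY samples can be sketched as follows. The helper names infer_xy and zero_minor_on_sex_chroms are hypothetical, and storing the major copy number in cn_a after the adjustment is an assumption of this sketch rather than documented SPICE behaviour.

```python
def infer_xy(segments) -> dict:
    """Return {sample: bool}; True if the sample has any segment on chrY."""
    status = {}
    for seg in segments:
        status.setdefault(seg["sample"], False)
        if seg["chrom"] == "chrY":
            status[seg["sample"]] = True
    return status


def zero_minor_on_sex_chroms(segments, xy_status) -> list:
    """For XY samples, set the minor copy number on chrX and chrY to 0."""
    adjusted = []
    for seg in segments:
        seg = dict(seg)  # copy so the input list is left untouched
        if xy_status[seg["sample"]] and seg["chrom"] in ("chrX", "chrY"):
            major = max(seg["cn_a"], seg["cn_b"])
            seg["cn_a"], seg["cn_b"] = major, 0  # assumed: major goes to cn_a
        adjusted.append(seg)
    return adjusted


# Made-up segments, for illustration only
segments = [
    {"sample": "SA123", "chrom": "chrX", "cn_a": 1, "cn_b": 1},
    {"sample": "SA123", "chrom": "chrY", "cn_a": 0, "cn_b": 1},
    {"sample": "SA456", "chrom": "chrX", "cn_a": 1, "cn_b": 1},
]
xy_status = infer_xy(segments)
adjusted = zero_minor_on_sex_chroms(segments, xy_status)
```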

3.3 Expected Output

Results are saved in results/{name}/

Main outputs:

final_events.tsv: Summary of inferred events per sample/chromosome/allele with event types, coordinates, and validation metrics
events_summary.tsv: Summary statistics for each ID (sample, chromosome, allele combination), including number of events and path selection method

Intermediate files (with separate directories for WGD and non-WGD profiles):

chrom_data_full/: Preprocessed chromosome data
full_paths_single_solution/: Chromosomes with unique solutions
full_paths_multiple_solutions/: Chromosomes requiring kNN selection
knn_solved_chroms/: Results from kNN selection
mcmc_solved_chroms_large/: Results from MCMC sampling

Intermediate files can be removed using

spice event_inference --clean --config <path/to/config>

3.4 Preprocessing Step Details

The preprocessing step runs only when --run-preprocessing is provided and prepares the input for robust event inference. It performs:

Data normalization: ensures chromosome names use chr prefix; converts starts/ends to integers and adjusts starts to 0-based.
CN capping and filtering: caps copy numbers at 8; removes segments shorter than 1kb.
WGD resolution: loads from wgd_status.tsv or infers as described in section 3.2.1.
Sex resolution: loads from xy_samples.tsv or infers by presence of chrY; for XY samples with haplotype-specific CN, sets minor CN of chrX and chrY to 0.
Neighbor merging: merges adjacent segments with identical CNs to reduce fragmentation.
Telomeres and centromeres: fills telomeric regions and optionally bins/unifies centromeres (can be skipped with --pre-skip-centromeres).
MEDICC2 phasing: optional phasing of haplotypes; can be skipped with --pre-skip-phasing.
Short arms and bounds: handles short arms and aligns segment ends to reference chromosome lengths.

Run control:

Use --run-preprocessing to enable this step (default is to skip and proceed directly to split).
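Three of the preprocessing operations listed above (chromosome name normalization, CN capping with length filtering, and neighbor merging) can be sketched as below. This is a simplified illustration for a single sample, not SPICE's code: the function names are hypothetical, and treating two segments as adjacent only when one ends exactly where the next starts is an assumption.

```python
MAX_CN = 8          # copy numbers are capped at 8
MIN_LENGTH = 1000   # segments shorter than 1 kb are removed


def normalize_chrom(chrom) -> str:
    """Ensure the chromosome name carries the chr prefix."""
    chrom = str(chrom)
    return chrom if chrom.startswith("chr") else "chr" + chrom


def cap_and_filter(segments) -> list:
    """Drop short segments and cap copy numbers.

    `segments` is a list of (chrom, start, end, cn_a, cn_b) tuples.
    """
    kept = []
    for chrom, start, end, cn_a, cn_b in segments:
        if end - start < MIN_LENGTH:
            continue
        kept.append((normalize_chrom(chrom), start, end,
                     min(cn_a, MAX_CN), min(cn_b, MAX_CN)))
    return kept


def merge_neighbors(segments) -> list:
    """Merge touching segments on the same chromosome with identical CNs."""
    merged = []
    for seg in sorted(segments, key=lambda s: (s[0], s[1])):
        if merged:
            prev = merged[-1]
            same_chrom = prev[0] == seg[0]
            touching = prev[2] == seg[1]  # assumed definition of adjacency
            if same_chrom and touching and prev[3:] == seg[3:]:
                merged[-1] = (prev[0], prev[1], seg[2], prev[3], prev[4])
                continue
        merged.append(seg)
    return merged


# Made-up segments, for illustration only
raw = [("1", 0, 5000, 1, 1), ("1", 5000, 9000, 1, 1),
       ("1", 9000, 9500, 3, 1), ("1", 9500, 20000, 12, 1)]
cleaned = merge_neighbors(cap_and_filter(raw))
```

In this example the 500 bp segment is dropped, the copy number of 12 is capped at 8, and the first two segments are merged into one.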

3.5 Parallel Processing

Use multiple cores for event inference:

# Use 8 cores
spice event_inference --config <path/to/config> --cores 8

Using multiple cores can speed up execution, especially when individual runs take a long time, but it can also slow execution down when there are many entries to loop over. We usually recommend using multiple cores only for the large_chroms pipeline step, as it takes the longest per sample.

Note that parallel processing will disable logging for the different subprocesses.

3.6 Snakemake Execution

For parallel execution on computing clusters, use the Snakemake workflow.

Note: Snakemake must be installed separately:

conda install bioconda::snakemake

Coming soon: the Snakemake workflow is not fully implemented yet.

Note: If you get a LockException, run spice --config configs/events_example.yaml --unlock to remove the lock.

3.7 Logging Output

Control where logging output is sent with the --log flag:

--log terminal (default): Writes logs to terminal only
--log file: Writes logs to file only
--log both: Writes logs to both terminal and file

When using --log file or --log both, logs are saved to the configured log directory from the config with a filename pattern: {name}_{timestamp}.log
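To locate a log file from a script, the filename pattern can be reproduced as in the sketch below. The timestamp format used here (YYYYMMDD_HHMMSS) is an assumption, since the pattern above does not specify it, and log_file_path is a hypothetical helper rather than part of SPICE.

```python
from datetime import datetime
from pathlib import PurePosixPath


def log_file_path(log_dir: str, name: str, now=None) -> PurePosixPath:
    """Build {log_dir}/{name}_{timestamp}.log for a run."""
    now = now or datetime.now()
    timestamp = now.strftime("%Y%m%d_%H%M%S")  # assumed timestamp format
    return PurePosixPath(log_dir) / f"{name}_{timestamp}.log"


example_path = log_file_path("logs", "example_run", datetime(2026, 2, 16, 9, 30, 0))
```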

4. Loci Detection

Loci detection identifies recurrently gained or lost copy-number loci across a cohort of samples.

NOTE: SPICE requires a large cohort for de-novo loci calling and is unlikely to produce good results for cohorts with fewer than 1000 samples.

4.1 Pipeline Overview

Coming soon!

4.2 Expected Input

Loci detection requires:

Event inference results: final_events.tsv produced by the event_inference pipeline

4.3 Expected Output

Results are saved in results/{name}

Main outputs:

detected_loci.tsv: List of detected recurrent loci with coordinates and occurrence statistics
loci_summary.tsv: Summary statistics for each detected locus

Intermediate files are saved in results/{name}/events

5. Loci Assignment

Loci assignment assigns predetermined loci to a cohort. This is recommended for smaller cohorts where de-novo loci detection is not feasible.

5.1 Pipeline Overview

Coming soon!

5.2 Expected Input

Loci assignment requires:

Reference loci: objects/reference_loci_position.tsv reference loci set created on TCGA data
Event inference results: final_events.tsv produced by the event_inference pipeline

5.3 Expected Output

Results are saved in results/{name}/

Main outputs:

loci_assignments.tsv: Assignment of loci to samples with presence/absence or quantitative scores
loci_sample_matrix.tsv: Binary or weighted matrix of loci (rows) by samples (columns)

6. Plotting

Plotting generates visualizations of inferred events and detected loci to aid in manual inspection and interpretation of results.

6.1 Event Visualization

Plotting inferred events can be done on the sample or ID (sample, chromosome, allele) level.

# Plot inferred events per sample
spice plotting --config <path/to/config> --plot-events-per-sample <SAMPLE_ID>
spice plotting --config <path/to/config> --plot-events-per-sample <SAMPLE_ID> --plot-unit-size

# Plot per ID (format: sample:chr:cn_a|cn_b)
spice plotting --config <path/to/config> --plot-events-per-id <sample:chr:allele>

Requirements:

Plotting requires final_events.tsv.
Output PNGs are saved to plot_dir/{name}/ (see directories.plot_dir in config; defaults to plots/).
--plot-unit-size switches per-sample plots to unit-size segments.

For interactive exploration, see notebooks/events_plotting.ipynb.

6.2 Loci Visualization

Plotting detected or assigned loci can be done on the chromosome or loci level.

# Plot detected/assigned loci for chromosome 1
spice plotting --config <path/to/config> --plot-loci-on-chrom chr1 --loci-mode detection
spice plotting --config <path/to/config> --plot-loci-on-chrom chr1 --loci-mode assignment

# Plot the detected locus "3" (corresponds to the index in the final_loci_detection.tsv file)
spice plotting --config <path/to/config> --plot-single-locus 3 --loci-mode detection

Requirements:

Plotting requires final_loci_detection.tsv or final_loci_assignment.tsv.
Output PNGs are saved to plot_dir/{name}/ (see directories.plot_dir in config; defaults to plots/).

For interactive exploration, see notebooks/loci_plotting.ipynb.

7. Advanced Usage

7.1 Python API

You can also import and use SPICE functions directly in Python. Note that it is important to run spice.load_config(config_file) before any other spice imports.

import spice

# Load the config before importing any other spice modules
config_file = 'configs/events_example.yaml'
spice.load_config(config_file)

See also the example notebooks for how to use the API.

8. Citation

If you use SPICE in your research, please cite: [TODO]

9. License

GNU GENERAL PUBLIC LICENSE

10. Contact

For questions and issues, please contact: tom.kaufmann@iccb-cologne.org
