SPICE: Selection Patterns In somatic Copy-number Events
Project description
SPICE: Selection Patterns In somatic Copy-number Events
SPICE, Selection Patterns In somatic Copy-number Events, is an event-level framework that infers discrete copy-number events from allele-specific profiles.
0. Installation
0.1. Prerequisites
- Python >= 3.8
- medicc2 (including openfst)
0.2. Install from pip/conda (recommended)
Coming soon!
0.3 Install from source
- Install MEDICC2 using conda/mamba:
conda install -c bioconda -c conda-forge medicc2
Or better directly create a new conda environment with MEDICC2 inside of it
conda create -n spice_env -c conda-forge -c bioconda medicc2
conda activate spice_env
- Clone the repository:
git clone git@bitbucket.org:schwarzlab/spice.git
cd spice
- Install in development mode:
pip install -e .
This will install SPICE and all its dependencies, and make the spice command available in your shell.
0.4 Optional Dependencies
To use SPICE with Snakemake for parallel execution on computing clusters, install snakemake separately:
conda install bioconda::snakemake
1. Configuration
SPICE uses a configuration file for each run which are specified using the --config flag.
This means you can keep multiple configs (e.g., in configs/) and select them at runtime.
Parameters and directories not specified in the provided config file are taken from the default config file default_config.yaml.
Each config must specify name and directories.base_dir.
1.1 Minimal config.yaml override example
Each config must contain a name, a base directory, and the location of the input copy-number file like so:
name: example_run
directories:
base_dir: /path/to/project
input_files:
copynumber: data/example_data.tsv
For other parameters that can be modified, see default_config.yaml.
1.2 Relative vs absolute paths
directories.*entries (e.g.,data_dir,results_dir,log_dir) as well as input files can be given as relative or absolute paths.- If relative, SPICE resolves them against
directories.base_dir. - If absolute, SPICE uses them as-is.
- If relative, SPICE resolves them against
2. Usage Overview
SPICE has four main modes:
- event_inference: Infer discrete copy-number events from allele-specific profiles
- loci_detection: Detect recurrent copy-number loci across samples
- loci_assignment: Assign loci to samples based on detected loci patterns
- plotting: Generate visualizations of inferred events and detected loci
2.1 Top-level execution examples
For event inference the example config configs/events_example.yaml can be used.
For loci detection and assignment the example config configs/loci_example.yaml can be used.
# Event inference
spice event_inference --config configs/events_example.yaml
# Loci detection
spice loci_detection --config configs/loci_example.yaml
# Loci assignment
spice loci_assignment --config configs/loci_example.yaml
# Plotting
spice plotting --config <path/to/config> --plot-events-per-sample <SAMPLE_ID>
For large datasets, we recommend using Snakemake mode on a computing cluster (see respective sections below).
3. Event Inference
Event inference infers discrete copy-number events from allele-specific copy-number profiles by enumerating valid evolutionary paths through the copy-number landscape and selecting the most likely path using k-nearest neighbors or MCMC sampling.
Note that spice automatically deletes previous runs of the same name when it is rerun.
3.1 Pipeline Overview
The event inference pipeline runs 6 steps:
preprocessing: Extra preprocessing (filling telomeres, phasing, etc.)split: Split haplotypes and preprocess inputall_solutions: Enumerate all valid evolutionary pathsdisambiguate: Select best path using k-nearest neighborslarge_chroms: Use MCMC sampling for chromosomes with many eventscombine: Combine all events into the final output
For each step, nonWGD and WGD samples are treated separately and samples are split by chromosome and allele to give the file IDs "sample:chrom:allele". For each step, each ID is calculated separately and stored as separate files.
Intermediate files can be removed using
spice event_inference --clean --config <path/to/config>
3.2 Expected Input
SPICE expects tab-separated input files with copy-number segments. See example file data/example_data.tsv.
Required columns:
sample: Sample identifierchrom: Chromosome namestart: Segment start positionend: Segment end positioncn_a: Copy number for allele A (haplotype-specific)cn_b: Copy number for allele B (haplotype-specific)
Optional files:
wgd_status: TSV with WGD status per sample (see section 1.3)xy_samples: TSV with sex status per sample (see section 1.4)
Total copy-number mode can be enabled by setting params.total_cn: True in the config file.
3.2.1 WGD Detection
SPICE supports two ways to determine WGD (whole genome duplication) status per sample. The pipeline branches on WGD status and uses different FSTs and neutral CN values accordingly.
-
Provided status via
wgd_statusfile:- Set
input_files.wgd_statusin your config to a TSV file. - The file must have two columns: first column is the sample identifier (used as index), second column named
wgdwith boolean values (True/False). - Example:
sample_id wgd SA123 True SA456 False
- Set
-
Inferred WGD status:
- If
input_files.wgd_statusis missing or empty, SPICE infers WGD using copy-number data and the method specified byparams.wgd_inference_method. - Supported values:
major_cn: heuristic whether at least half of the major copy-number is greater or equal to 2ploidy_loh: PCAWG-style rule combining ploidy and LOH fraction
- If
Notes
- WGD status impacts neutral CN values and constraint solving throughout the pipeline, so ensure this is set or inferred correctly.
- For haplotype-specific data, neutral CN is 1 (noWGD) vs 2 (WGD); for total CN, 2 vs 4 respectively.
3.2.2 Sex (XY/XX) Detection
SPICE supports resolving sample sex (XY vs XX) either via a provided file or automatic inference. This affects handling of chrX and chrY in preprocessing and splitting.
-
Provided status via
xy_samplesfile:- Set
input_files.xy_samplesin your config to a TSV file. - The file must have two columns: first column is the sample identifier (used as index), second column named
xywith boolean values (True/False) indicating XY (male) vs XX (female). - Example:
sample_id xy SA123 True SA456 False
- Set
-
Inferred XY status:
- If
input_files.xy_samplesis missing or empty, SPICE infers XY by checking if any segments exist on chromosomechrYfor a sample.
- If
Effects
- For XY samples with haplotype-specific CN, the minor copy number of
chrXandchrYis set to 0 during preprocessing and splitting. - For XX samples,
chrYis excluded (no segments onchrY).
3.3 Expected Output
Results are saved in results/{name}/
Main outputs:
final_events.tsv: Summary of inferred events per sample/chromosome/allele with event types, coordinates, and validation metricsevents_summary.tsv: Summary statistics for each ID (sample, chromosome, allele combination), including number of events and path selection method
Intermediate files (with separate directories for WGD and non-WGD profiles):
chrom_data_full/: Preprocessed chromosome datafull_paths_single_solution/: Chromosomes with unique solutionsfull_paths_multiple_solutions/: Chromosomes requiring kNN selectionknn_solved_chroms/: Results from kNN selectionmcmc_solved_chroms_large/: Results from MCMC sampling
Intermediate files can be removed using
spice event_inference --clean --config <path/to/config>
3.4 Preprocessing Step Details
The preprocessing step runs only when --run-preprocessing is provided and prepares the input for robust event inference. It performs:
- Data normalization: ensures chromosome names use
chrprefix; converts starts/ends to integers and adjusts starts to 0-based. - CN capping and filtering: caps copy numbers at 8; removes segments shorter than 1kb.
- WGD resolution: loads from
wgd_status.tsvor infers as described in section 1.3. - Sex resolution: loads from
xy_samples.tsvor infers by presence ofchrY; for XY samples with haplotype-specific CN, sets minor CN ofchrXandchrYto 0. - Neighbor merging: merges adjacent segments with identical CNs to reduce fragmentation.
- Telomeres and centromeres: fills telomeric regions and optionally bins/unifies centromeres (can be skipped with
--pre-skip-centromeres). - MEDICC2 phasing: optional phasing of haplotypes; can be skipped with
--pre-skip-phasing. - Short arms and bounds: handles short arms and aligns segment ends to reference chromosome lengths.
Run control:
- Use
--run-preprocessingto enable this step (default is to skip and proceed directly tosplit).
3.5 Parallel Processing
Use multiple cores for event inference:
# Use 8 cores
spice event_inference --config <path/to/config> --cores 8
While using multiple cores can technically make execution faster (especially in the case when spice takes a long time for single runs), it can also slow down execution when there are many entries to loop over.
We usually recommend to only use multiple cores for the large_chroms pipeline step as it takes the longest per sample.
Note that parallel processing will disable logging for the different subprocesses.
3.6 Snakemake Execution
For parallel execution on computing clusters, use the Snakemake workflow.
Note: Snakemake must be installed separately:
conda install bioconda::snakemake
Coming soon, not fully implemented yet
Note: If you get a LockException run spice --config configs/events_example.yaml --unlock to remove the lock.
3.7 Logging Output
Control where logging output is sent with the --log flag:
--log terminal(default): Writes logs to terminal only--log file: Writes logs to file only--log both: Writes logs to both terminal and file
When using --log file or --log both, logs are saved to the configured log directory from the config with a filename pattern: {name}_{timestamp}.log
4. Loci Detection
Loci detection identifies recurrently gained or lost copy-number loci across a cohort of samples.
NOTE that SPICE requires a large cohort for de-novo loci calling and it will likely not produce good results for cohorts with less than 1000 samples
4.1 Pipeline Overview
Coming soon!
4.2 Expected Input
Loci detection requires:
- Event inference results:
final_events.tsvproduced by the event_inference pipeline
4.3 Expected Output
Results are saved in results/{name}
Main outputs:
detected_loci.tsv: List of detected recurrent loci with coordinates and occurrence statisticsloci_summary.tsv: Summary statistics for each detected locus
Intermediate files are saved in results/{name}/events
5. Loci Assignment
Loci assignment assigns predetermined loci to a cohort. This is recommended for smaller cohorts where de-novo loci detection is prohibited.
5.1 Pipeline Overview
Coming soon!
5.2 Expected Input
Loci assignment requires:
- Reference loci:
objects/reference_loci_position.tsvreference loci set created on TCGA data - Event inference results:
final_events.tsvproduced by the event_inference pipeline
5.3 Expected Output
Results are saved in results/{name}/
Main outputs:
loci_assignments.tsv: Assignment of loci to samples with presence/absence or quantitative scoresloci_sample_matrix.tsv: Binary or weighted matrix of loci (rows) by samples (columns)
6. Plotting
Plotting generates visualizations of inferred events and detected loci to aid in manual inspection and interpretation of results.
6.1 Event Visualization
Plotting inferred events can be done on the sample or ID (sample, chromosome, allele) level.
# Plot inferred events per sample
spice plotting --config <path/to/config> --plot-events-per-sample <SAMPLE_ID>
spice plotting --config <path/to/config> --plot-events-per-sample <SAMPLE_ID> --plot-unit-size
# Plot per ID (format: sample:chr:cn_a|cn_b)
spice plotting --config <path/to/config> --plot-events-per-id <sample:chr:allele>
Requirements:
- Plotting requires
final_events.tsv. - Output PNGs are saved to
plot_dir/{name}/(seedirectories.plot_dirin config; defaults toplots/). --plot-unit-sizeswitches per-sample plots to unit-size segments.
For interactive exploration, see notebooks/events_plotting.ipynb.
6.2 Loci Visualization
Plotting detected or assigned loci can be done on the chromosome or loci level.
# Plot detected/assigned loci for chromosome 1
spice plotting --config <path/to/config> --plot-loci-on-chrom chr1 --loci-mode detection
spice plotting --config <path/to/config> --plot-loci-on-chrom chr1 --loci-mode assignment
# Plot the detected locus "3" (corresponds to the index in the final_loci_detection.tsv file)
spice plotting --config <path/to/config> --plot-single-locus 3 --loci-mode detection
Requirements:
- Plotting requires
final_loci_detection.tsvorfinal_loci_assignment.tsv. - Output PNGs are saved to
plot_dir/{name}/(seedirectories.plot_dirin config; defaults toplots/).
For interactive exploration, see notebooks/loci_plotting.ipynb.
7. Advanced Usage
7.1 Python API
You can also import and use SPICE functions directly in Python. Note that it is important to run spice.load_config(config_file) before any other spice imports
config_file = 'configs/events_example.yaml'
import spice
spice.load_config(config_file);
See also the example notebooks for how to use the API.
8. Citation
If you use SPICE in your research, please cite: [TODO]
9. License
GNU GENERAL PUBLIC LICENSE
10. Contact
For questions and issues, please contact: tom.kaufmann@iccb-cologne.org
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file scna_spice-0.1.0.tar.gz.
File metadata
- Download URL: scna_spice-0.1.0.tar.gz
- Upload date:
- Size: 1.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5dacad73df5397b95dc129d80ee18d598f42d7f6134fdc21ed5e6fd90c483bf3
|
|
| MD5 |
924693e302d932f71e6bdacaa8259177
|
|
| BLAKE2b-256 |
6cefbf33715252533ad6647f00610c47a0dc30054e5cd514d006743c400d5dca
|
File details
Details for the file scna_spice-0.1.0-py3-none-any.whl.
File metadata
- Download URL: scna_spice-0.1.0-py3-none-any.whl
- Upload date:
- Size: 1.6 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
395302f9286b6b384de7bc3b518feeb103ca465b3549ed922de91c3bed7726bc
|
|
| MD5 |
389556be71cd8d1fbdec45975037f870
|
|
| BLAKE2b-256 |
43bcd243cf75c28ff139d7199eaf0162b32d0191a7422819a0ac094c67caf663
|