Python bindings for the piscem k-mer mapping engine
piscem
Python bindings for piscem, a fast and accurate tool for k-mer-based read mapping against a reference transcriptome or genome. piscem lets you load a pre-built index, map individual reads or read pairs, and perform low-level k-mer queries — all from Python, without running full CLI pipelines.
Installation
pip install piscem
To build from source (requires Rust and maturin):
cd crates/py-piscem
maturin develop --release
Quick start
import piscem
# Load a pre-built piscem index
index = piscem.ReferenceIndex.load("path/to/index_prefix")
print(f"k={index.k}, {index.num_refs} references, {index.num_contigs} unitigs")
Mapping reads
Create a MappingEngine from the index, then map reads one at a time. Each call returns a MappingResult containing the mapping type and a list of hits.
Single-end mapping
engine = index.mapping_engine()
result = engine.map_read(b"ACGTACGT...")
if result.is_mapped:
    for hit in result.hits:
        print(f"{hit.ref_name} pos={hit.pos} fw={hit.is_fw} score={hit.score}")
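The examples above pass each read as a bytes object. As a rough sketch of where those bytes might come from, the helper below reads sequences from a FASTQ stream in binary mode. The names iter_fastq and open_fastq are hypothetical helpers, not part of piscem, and the parser assumes the common four-lines-per-record FASTQ layout.

```python
import gzip
import io
from typing import BinaryIO, Iterator, Tuple


def iter_fastq(handle: BinaryIO) -> Iterator[Tuple[bytes, bytes]]:
    """Yield (read_name, sequence) pairs from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of stream
        if not header.strip():
            continue  # tolerate blank lines between records
        if not header.startswith(b"@"):
            raise ValueError(f"malformed FASTQ header: {header!r}")
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line (not used here)
        # Keep only the first whitespace-delimited token of the header as the name.
        yield header[1:].split()[0], seq


def open_fastq(path: str):
    """Open a plain or gzip-compressed FASTQ file in binary mode (hypothetical helper)."""
    return gzip.open(path, "rb") if path.endswith(".gz") else open(path, "rb")


# Demonstration on an in-memory FASTQ; no piscem index is needed for this part.
demo = io.BytesIO(b"@r1 extra\nACGTACGT\n+\nIIIIIIII\n@r2\nTTGCA\n+\nIIIII\n")
reads = list(iter_fastq(demo))
print(reads)
```

Each sequence yielded this way is already bytes, so it could be handed to engine.map_read(seq) without further conversion.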
Paired-end mapping
result = engine.map_read_pair(read1_seq, read2_seq)
if result.mapping_type == "mapped_pair":
    hit = result.hits[0]
    print(f"{hit.ref_name} frag_len={hit.fragment_length}")
The mapping_type field is one of "unmapped", "single_mapped", "mapped_pair", "mapped_first_orphan", or "mapped_second_orphan".
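Because mapping_type is a plain string drawn from a fixed set, a run can be summarised with a simple counter. The sketch below is illustrative only: tally_mapping_types is a hypothetical helper, and the SimpleNamespace objects stand in for real MappingResult instances so the snippet runs without an index.

```python
from collections import Counter
from types import SimpleNamespace

# The five values listed above, in a fixed reporting order.
MAPPING_TYPES = (
    "unmapped",
    "single_mapped",
    "mapped_pair",
    "mapped_first_orphan",
    "mapped_second_orphan",
)


def tally_mapping_types(results):
    """Count results per mapping type; works on any object with a mapping_type attribute."""
    counts = Counter(r.mapping_type for r in results)
    unknown = set(counts) - set(MAPPING_TYPES)
    if unknown:
        raise ValueError(f"unexpected mapping types: {sorted(unknown)}")
    # Report every known type, including those that never occurred.
    return {t: counts.get(t, 0) for t in MAPPING_TYPES}


# Stand-ins for MappingResult objects (assumption: only mapping_type is needed here).
fake_results = [
    SimpleNamespace(mapping_type=t)
    for t in ("mapped_pair", "unmapped", "mapped_pair", "mapped_first_orphan")
]
summary = tally_mapping_types(fake_results)
print(summary)
```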
Mapping strategy
Two strategies are available, matching the piscem CLI:
engine = index.mapping_engine(strategy="permissive") # default — faster, skips along unitigs
engine = index.mapping_engine(strategy="strict") # queries every k-mer position
You can also tune the maximum k-mer occurrence threshold and maximum mappings per read:
engine = index.mapping_engine(max_hit_occ=512, max_read_occ=5000)
Virtual colors mode (binned mapping)
For scATAC-seq and other binned genomic mapping workflows, create a virtual-color engine that accumulates hits per genomic bin:
vc_engine = index.vcolor_engine(bin_size=2000, overlap=400, thr=0.7)
result = vc_engine.map_read_pair(r1, r2)
for hit in result.hits:
    print(f"bin={hit.bin_id} tid={hit.tid} pos={hit.pos}")
Streaming k-mer queries
For low-level access, a StreamingQuery slides a k-mer window across a sequence and resolves each k-mer against the index, returning unitig and reference coordinates. Consecutive k-mers on the same unitig are resolved by extension rather than a full dictionary lookup.
sq = index.streaming_query()
hits = sq.query_sequence(b"ACGTACGT...")
for i, hit in enumerate(hits):
    if hit is not None:
        for rp in hit.ref_positions:
            print(f"kmer {i}: ref {rp.tid} pos {rp.pos} fw={rp.is_fw}")
print(f"{sq.num_searches} full lookups, {sq.num_extensions} extensions")
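To make the sliding-window idea concrete, here is a pure-Python sketch that enumerates the k-mer windows of a sequence. It does not use piscem and says nothing about how the index resolves k-mers internally; iter_kmers and reverse_complement are hypothetical helpers. It assumes a sequence of length n has n - k + 1 windows, and that a window containing a non-ACGT base cannot be looked up.

```python
# Translation table mapping each base to its complement.
_COMPLEMENT = bytes.maketrans(b"ACGT", b"TGCA")


def reverse_complement(seq: bytes) -> bytes:
    """Return the reverse complement of an uppercase ACGT sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def iter_kmers(seq: bytes, k: int):
    """Yield (offset, kmer) for every length-k window; kmer is None if it has a non-ACGT base."""
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k].upper()
        # Iterating over bytes yields integers, and `int in bytes` tests byte membership.
        if all(base in b"ACGT" for base in kmer):
            yield i, kmer
        else:
            yield i, None


windows = list(iter_kmers(b"ACGTNACGTA", 4))
valid_offsets = [i for i, kmer in windows if kmer is not None]
print(len(windows), valid_offsets)
print(reverse_complement(b"ACGTA"))
```

Here the single N invalidates the four windows that overlap it, which is why only offsets 0, 5, and 6 carry a usable k-mer.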
Index metadata
index.k # k-mer size
index.m # minimizer length
index.num_refs # number of reference sequences
index.num_contigs # number of unitigs
index.has_ec_table # equivalence class table loaded?
index.has_poison_table # poison k-mer table loaded?
index.ref_name(0) # name of the first reference
index.ref_len(0) # length of the first reference
index.ref_names() # list of all reference names
index.ref_lengths() # list of all reference lengths
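The two list accessors can be combined into a name-to-length lookup. This is a small sketch under the assumption that ref_names() and ref_lengths() return parallel lists in the same reference order; reference_lengths is a hypothetical helper, and the stub object stands in for a loaded ReferenceIndex so the snippet runs on its own.

```python
from types import SimpleNamespace


def reference_lengths(index) -> dict:
    """Build a {reference_name: length} dict from an index-like object."""
    names = index.ref_names()
    lengths = index.ref_lengths()
    if len(names) != len(lengths):
        raise ValueError("ref_names() and ref_lengths() differ in length")
    return dict(zip(names, lengths))


# Stub with the same two methods as the index (made-up names and lengths).
stub = SimpleNamespace(
    ref_names=lambda: ["tx1", "tx2"],
    ref_lengths=lambda: [1500, 820],
)
table = reference_lengths(stub)
print(table)
```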
Building an index
You can build a new index from cuttlefish output directly from Python:
index = piscem.ReferenceIndex.build(
    "path/to/cuttlefish_prefix",  # .cf_seg, .cf_seq, .json
    "path/to/output_prefix",      # output index files
    k=31,
    m=19,
    threads=8,
)
To build an index with a poison k-mer table (for filtering spurious mappings near decoy boundaries), pass one or more decoy FASTA files:
index = piscem.ReferenceIndex.build(
    "path/to/cuttlefish_prefix",
    "path/to/output_prefix",
    k=31,
    m=19,
    threads=8,
    decoys=["path/to/decoys.fa.gz"],
)
print(f"Poison table: {index.has_poison_table}") # True
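Before starting a build it can help to confirm that the cuttlefish inputs are present. The sketch below assumes the three files are named by appending .cf_seg, .cf_seq, and .json to the prefix, which is inferred from the comment in the build example rather than from piscem itself; missing_cuttlefish_inputs is a hypothetical helper.

```python
import tempfile
from pathlib import Path

# Suffixes taken from the comment in the build example (assumed naming scheme).
CUTTLEFISH_SUFFIXES = (".cf_seg", ".cf_seq", ".json")


def missing_cuttlefish_inputs(prefix: str) -> list:
    """Return the expected input paths that do not exist as regular files."""
    return [prefix + s for s in CUTTLEFISH_SUFFIXES if not Path(prefix + s).is_file()]


# Demonstration in a temporary directory with one input deliberately absent.
with tempfile.TemporaryDirectory() as tmp:
    prefix = str(Path(tmp) / "demo")
    Path(prefix + ".cf_seg").touch()
    Path(prefix + ".json").touch()
    missing = missing_cuttlefish_inputs(prefix)
    missing_names = [Path(p).name for p in missing]

print(missing_names)
```

An empty list would mean all three inputs were found under that prefix.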
Thread safety
ReferenceIndex is immutable and can be shared freely across threads. MappingEngine and StreamingQuery hold mutable per-read state, so each thread should create its own via index.mapping_engine() or index.streaming_query().
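from concurrent.futures import ThreadPoolExecutor

def map_batch(reads):
    # Each call creates its own engine, because engines hold mutable per-read state.
    eng = index.mapping_engine()
    return [eng.map_read(r) for r in reads]

# read_batches: an iterable of lists of read sequences (bytes)
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(map_batch, read_batches))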
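The thread-pool example takes read_batches as given. One possible way to produce such batches from a flat stream of reads is sketched below; chunked is a hypothetical helper, not part of piscem, and the batch size is arbitrary.

```python
from itertools import islice


def chunked(reads, batch_size):
    """Split any iterable of reads into lists of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(reads)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return  # input exhausted
        yield batch


# Five toy reads split into batches of two; the last batch is shorter.
read_batches = list(chunked([b"ACGT", b"CCGA", b"TTAG", b"GGCA", b"ATAT"], 2))
print([len(b) for b in read_batches])
```

Larger batches mean fewer engine creations per read, since map_batch builds one engine per batch.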
API reference
| Class | Description |
|---|---|
| ReferenceIndex | Load/build indices, access metadata, create engines |
| MappingEngine | Map individual reads or read pairs |
| MappingResult | Mapping output: type + list of hits |
| MappingHit | Single hit: reference ID, position, orientation, score, fragment info |
| StreamingQuery | Low-level sliding-window k-mer queries |
| KmerHit | K-mer lookup result: unitig coordinates + reference positions |
| RefPos | Position on a reference (tid, pos, orientation) |
License
BSD 3-Clause