Python bindings for the piscem k-mer mapping engine


Project description

piscem

Python bindings for piscem, a fast and accurate tool for k-mer-based read mapping against a reference transcriptome or genome. piscem lets you load a pre-built index, map individual reads or read pairs, and perform low-level k-mer queries — all from Python, without running full CLI pipelines.

Installation

pip install piscem

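Prebuilt wheels for 0.2.0 are published only for Linux x86-64 (CPython 3.8+, glibc 2.28+), and no source distribution is available, so on other platforms build from source (requires Rust and maturin):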

cd crates/py-piscem
maturin develop --release

Quick start

import piscem

# Load a pre-built piscem index
index = piscem.ReferenceIndex.load("path/to/index_prefix")

print(f"k={index.k}, {index.num_refs} references, {index.num_contigs} unitigs")

Mapping reads

Create a MappingEngine from the index, then map reads one at a time. Each call returns a MappingResult containing the mapping type and a list of hits.

Single-end mapping

engine = index.mapping_engine()

result = engine.map_read(b"ACGTACGT...")
if result.is_mapped:
    for hit in result.hits:
        print(f"{hit.ref_name}  pos={hit.pos}  fw={hit.is_fw}  score={hit.score}")
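`map_read` takes the read sequence as `bytes`. To feed it reads from a file, a small FASTQ reader is enough. The sketch below is plain Python and not part of the piscem package; `read_fastq` is a hypothetical helper name, and it assumes standard four-line FASTQ records.

```python
import gzip
from typing import Iterator, Tuple


def read_fastq(path: str) -> Iterator[Tuple[str, bytes]]:
    """Yield (read_name, sequence) pairs from a plain or gzipped FASTQ file.

    Sequences are returned as bytes, the input type used by map_read.
    Assumes four-line records (no multi-line FASTQ).
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as fh:
        while True:
            header = fh.readline()
            if not header:
                break  # end of file
            if not header.strip():
                continue  # tolerate blank lines between records
            seq = fh.readline().rstrip(b"\r\n")
            plus = fh.readline()
            qual = fh.readline()
            if not header.startswith(b"@") or not plus.startswith(b"+") or not qual:
                raise ValueError(f"malformed FASTQ record near {header!r}")
            # Read name is the first whitespace-separated token after '@'.
            fields = header[1:].split()
            name = fields[0].decode() if fields else ""
            yield name, seq
```

Each yielded `seq` can be passed straight to `engine.map_read(seq)`.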

Paired-end mapping

# read1_seq and read2_seq are the two mates as bytes, as with map_read
result = engine.map_read_pair(read1_seq, read2_seq)

if result.mapping_type == "mapped_pair":
    hit = result.hits[0]
    print(f"{hit.ref_name}  frag_len={hit.fragment_length}")

The mapping_type field is one of "unmapped", "single_mapped", "mapped_pair", "mapped_first_orphan", or "mapped_second_orphan".
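When mapping many reads it is often useful to summarize how results break down across these five categories. The hypothetical helpers below rely only on the documented `mapping_type` attribute, so they work on any iterable of `MappingResult` objects.

```python
from collections import Counter
from typing import Iterable

# The five mapping_type values listed above.
MAPPING_TYPES = (
    "unmapped",
    "single_mapped",
    "mapped_pair",
    "mapped_first_orphan",
    "mapped_second_orphan",
)


def tally_mapping_types(results: Iterable) -> Counter:
    """Count results by mapping_type; every known type is present, even at 0."""
    counts = Counter({t: 0 for t in MAPPING_TYPES})
    for res in results:
        counts[res.mapping_type] += 1
    return counts


def mapped_fraction(counts: Counter) -> float:
    """Fraction of results whose mapping_type is anything other than 'unmapped'."""
    total = sum(counts.values())
    return (total - counts["unmapped"]) / total if total else 0.0
```

For example, `tally_mapping_types(engine.map_read_pair(r1, r2) for r1, r2 in pairs)` summarizes a paired-end run, where `pairs` is an iterable of mate sequences you supply.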

Mapping strategy

Two strategies are available, matching the piscem CLI:

engine = index.mapping_engine(strategy="permissive")  # default — faster, skips along unitigs
engine = index.mapping_engine(strategy="strict")       # queries every k-mer position
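To see how the choice affects a particular dataset, one option is to map the same reads with both strategies and count where they agree. This sketch uses only `mapping_engine(strategy=...)`, `map_read` and `is_mapped` from the examples above; `compare_strategies` is a hypothetical name, and it compares only mapped/unmapped status, not the hits themselves.

```python
from typing import Dict, Iterable


def compare_strategies(index, reads: Iterable[bytes]) -> Dict[str, int]:
    """Map each read with both strategies and tally agreement on is_mapped.

    `index` is a loaded piscem.ReferenceIndex (or anything with the same
    mapping_engine / map_read / is_mapped interface).
    """
    permissive = index.mapping_engine(strategy="permissive")
    strict = index.mapping_engine(strategy="strict")
    tally = {"both": 0, "permissive_only": 0, "strict_only": 0, "neither": 0}
    for seq in reads:
        p = permissive.map_read(seq).is_mapped
        s = strict.map_read(seq).is_mapped
        if p and s:
            tally["both"] += 1
        elif p:
            tally["permissive_only"] += 1
        elif s:
            tally["strict_only"] += 1
        else:
            tally["neither"] += 1
    return tally
```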

You can also tune the maximum k-mer occurrence threshold and maximum mappings per read:

engine = index.mapping_engine(max_hit_occ=512, max_read_occ=5000)
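As a rough mental model of these two thresholds, the toy functions below drop k-mers that occur too often and suppress reads with too many mappings. This is an illustration under simplifying assumptions, not piscem's code: the hit representation and function names are hypothetical, and the real mapper may apply further rules around these limits.

```python
from typing import List, Sequence, Tuple

# Hypothetical hit representation: (reference id, position).
Hit = Tuple[int, int]


def filter_frequent_kmers(kmer_hits: Sequence[List[Hit]], max_hit_occ: int) -> List[List[Hit]]:
    """Keep only k-mers that occur at least once and at most max_hit_occ times.

    Highly repetitive k-mers add many candidate locations while saying
    little about where the read came from, so they are skipped.
    """
    return [hits for hits in kmer_hits if 0 < len(hits) <= max_hit_occ]


def report_mappings(mappings: List[Hit], max_read_occ: int) -> List[Hit]:
    """Report nothing for a read whose mapping count exceeds max_read_occ."""
    return [] if len(mappings) > max_read_occ else mappings
```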

Virtual colors mode (binned mapping)

For scATAC-seq and other binned genomic mapping workflows, create a virtual-color engine that accumulates hits per genomic bin:

vc_engine = index.vcolor_engine(bin_size=2000, overlap=400, thr=0.7)

result = vc_engine.map_read_pair(r1, r2)
for hit in result.hits:
    print(f"bin={hit.bin_id}  tid={hit.tid}  pos={hit.pos}")
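How positions fall into overlapping bins can be sketched as follows. The bin layout is an assumption made for illustration: bin `b` covers `[b * stride, b * stride + bin_size)` with `stride = bin_size - overlap`, numbered per reference. piscem's own bin boundaries and global `bin_id` numbering may differ, and the `thr` parameter is not modeled.

```python
from typing import List


def bins_for_position(pos: int, bin_size: int, overlap: int) -> List[int]:
    """Return the per-reference indices of all bins that contain `pos`.

    Assumed layout (illustrative only): bin b spans [b*stride, b*stride + bin_size)
    with stride = bin_size - overlap, so neighboring bins share `overlap` bases.
    """
    if not 0 <= overlap < bin_size:
        raise ValueError("require 0 <= overlap < bin_size")
    if pos < 0:
        raise ValueError("pos must be non-negative")
    stride = bin_size - overlap
    last = pos // stride                            # right-most bin starting at or before pos
    first = max(0, (pos - bin_size) // stride + 1)  # left-most bin whose end is still past pos
    return list(range(first, last + 1))
```

With `bin_size=2000` and `overlap=400`, position 1700 lies in bins 0 and 1, while position 2000 lies only in bin 1.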

Streaming k-mer queries

For low-level access, a StreamingQuery slides a k-mer window across a sequence and resolves each k-mer against the index, returning unitig and reference coordinates. Consecutive k-mers on the same unitig are resolved by extension rather than a full dictionary lookup.

sq = index.streaming_query()

hits = sq.query_sequence(b"ACGTACGT...")
for i, hit in enumerate(hits):
    if hit is not None:
        for rp in hit.ref_positions:
            print(f"kmer {i}: ref {rp.tid} pos {rp.pos} fw={rp.is_fw}")

print(f"{sq.num_searches} full lookups, {sq.num_extensions} extensions")
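The difference between full lookups and extensions can be illustrated with a toy index: a dictionary from every unitig k-mer to its unitig coordinates, plus a check on the next unitig base that lets a query skip the dictionary while it stays on the same unitig. `ToyStreamingQuery` is hypothetical and deliberately simplified (forward strand only, `str` input, unitigs assumed to share no k-mers, no reference positions), so it shows the idea behind `num_searches` and `num_extensions` rather than piscem's implementation.

```python
from typing import Dict, List, Optional, Tuple

UnitigPos = Tuple[int, int]  # (unitig id, offset of the k-mer within the unitig)


class ToyStreamingQuery:
    """Forward-strand-only toy of extension-versus-lookup k-mer streaming."""

    def __init__(self, unitigs: List[str], k: int) -> None:
        self.unitigs = unitigs
        self.k = k
        # Full dictionary: every k-mer of every unitig -> its unitig position.
        self.table: Dict[str, UnitigPos] = {}
        for uid, seq in enumerate(unitigs):
            for off in range(len(seq) - k + 1):
                self.table.setdefault(seq[off:off + k], (uid, off))
        self.num_searches = 0
        self.num_extensions = 0

    def query_sequence(self, seq: str) -> List[Optional[UnitigPos]]:
        k = self.k
        hits: List[Optional[UnitigPos]] = []
        prev: Optional[UnitigPos] = None
        for i in range(len(seq) - k + 1):
            hit = None
            if prev is not None:
                uid, off = prev
                unitig = self.unitigs[uid]
                # The previous k-mer matched unitig[off:off+k], so the next one
                # matches unitig[off+1:off+1+k] iff the next base agrees.
                if off + k < len(unitig) and unitig[off + k] == seq[i + k - 1]:
                    hit = (uid, off + 1)
                    self.num_extensions += 1
            if hit is None:
                # Extension impossible or failed: fall back to a dictionary lookup.
                hit = self.table.get(seq[i:i + k])
                self.num_searches += 1
            hits.append(hit)
            prev = hit
        return hits
```

On unitigs `["ACGTAC", "TTGCA"]` with `k=3`, querying `"ACGTACTTGC"` resolves 8 k-mers with 4 full lookups and 4 extensions: lookups happen for the first k-mer, for the two k-mers spanning the junction between the unitigs (neither is in the index), and for the first k-mer of the second unitig.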

Index metadata

index.k                # k-mer size
index.m                # minimizer length
index.num_refs         # number of reference sequences
index.num_contigs      # number of unitigs
index.has_ec_table     # equivalence class table loaded?
index.has_poison_table # poison k-mer table loaded?

index.ref_name(0)      # name of the first reference
index.ref_len(0)       # length of the first reference
index.ref_names()      # list of all reference names
index.ref_lengths()    # list of all reference lengths
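The per-reference metadata is enough to write the `@SQ` lines of a SAM header, for instance when reporting hits in SAM-like form. `sam_sq_lines` is a hypothetical helper that calls only `ref_names()` and `ref_lengths()`, assuming they return lists of `str` and `int` in the same order.

```python
from typing import List


def sam_sq_lines(index) -> List[str]:
    """Build SAM @SQ header lines (SN = name, LN = length) from index metadata."""
    names = index.ref_names()
    lengths = index.ref_lengths()
    if len(names) != len(lengths):
        raise ValueError("ref_names() and ref_lengths() differ in length")
    return [f"@SQ\tSN:{name}\tLN:{length}" for name, length in zip(names, lengths)]
```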

Building an index

You can build a new index from cuttlefish output directly from Python:

index = piscem.ReferenceIndex.build(
    "path/to/cuttlefish_prefix",   # .cf_seg, .cf_seq, .json
    "path/to/output_prefix",       # output index files
    k=31,
    m=19,
    threads=8,
)
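Since `build` reads the cuttlefish output by prefix, a script may want to verify the inputs before starting. The sketch assumes the three suffixes listed in the comment above (`.cf_seg`, `.cf_seq`, `.json`) are appended directly to the prefix; `missing_cuttlefish_files` is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def missing_cuttlefish_files(prefix: str) -> List[str]:
    """Return the expected cuttlefish output paths that do not exist for `prefix`."""
    suffixes = (".cf_seg", ".cf_seq", ".json")
    return [prefix + s for s in suffixes if not Path(prefix + s).is_file()]
```

Checking that this list is empty before calling `build` lets a script report missing inputs up front.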

To build an index with a poison k-mer table (for filtering spurious mappings near decoy boundaries), pass one or more decoy FASTA files:

index = piscem.ReferenceIndex.build(
    "path/to/cuttlefish_prefix",
    "path/to/output_prefix",
    k=31,
    m=19,
    threads=8,
    decoys=["path/to/decoys.fa.gz"],
)
print(f"Poison table: {index.has_poison_table}")  # True

Thread safety

ReferenceIndex is immutable and can be shared freely across threads.
MappingEngine and StreamingQuery hold mutable per-read state — each thread should create its own via index.mapping_engine() or index.streaming_query().

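from concurrent.futures import ThreadPoolExecutor

# `index` is a loaded ReferenceIndex shared by all threads;
# `read_batches` is a list of batches, each a list of read sequences (bytes).
def map_batch(reads):
    eng = index.mapping_engine()  # a fresh engine per call, never shared between threads
    return [eng.map_read(r) for r in reads]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(map_batch, read_batches))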
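The example above creates a fresh engine for every batch. If engine construction turns out to be costly relative to a batch, one alternative is to keep one engine per worker thread with `threading.local`, which still respects the rule that engines are never shared. This is a sketch: `map_batches_threaded` is a hypothetical name, and whether threads give a real speedup depends on whether the bindings release the GIL while mapping, which this README does not state.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List, Sequence


def map_batches_threaded(index, read_batches: Iterable[Sequence[bytes]], workers: int = 4) -> List[list]:
    """Map batches in a thread pool, reusing one MappingEngine per worker thread."""
    local = threading.local()

    def thread_engine():
        # Lazily create this thread's engine the first time the thread needs one.
        if not hasattr(local, "engine"):
            local.engine = index.mapping_engine()
        return local.engine

    def map_batch(reads):
        eng = thread_engine()
        return [eng.map_read(r) for r in reads]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as read_batches.
        return list(pool.map(map_batch, read_batches))
```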

API reference

Class	Description
`ReferenceIndex`	Load/build indices, access metadata, create engines
`MappingEngine`	Map individual reads or read pairs
`MappingResult`	Mapping output: type + list of hits
`MappingHit`	Single hit: reference ID, position, orientation, score, fragment info
`StreamingQuery`	Low-level sliding-window k-mer queries
`KmerHit`	K-mer lookup result: unitig coordinates + reference positions
`RefPos`	Position on a reference (tid, pos, orientation)

License

BSD 3-Clause

Project details


Release history

0.2.0 (this version): May 7, 2026
0.1.4: Mar 17, 2026
0.1.3: Feb 23, 2026
0.1.2: Feb 22, 2026
0.1.1: Feb 22, 2026
0.1.0: Feb 22, 2026

Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution


piscem-0.2.0-cp38-abi3-manylinux_2_28_x86_64.whl (1.7 MB)

Uploaded May 7, 2026; CPython 3.8+, manylinux: glibc 2.28+ x86-64

File details

Details for the file piscem-0.2.0-cp38-abi3-manylinux_2_28_x86_64.whl.

File metadata

Download URL: piscem-0.2.0-cp38-abi3-manylinux_2_28_x86_64.whl
Upload date: May 7, 2026
Size: 1.7 MB
Tags: CPython 3.8+, manylinux: glibc 2.28+ x86-64
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for piscem-0.2.0-cp38-abi3-manylinux_2_28_x86_64.whl
Algorithm	Hash digest
SHA256	`ca40af4d3acf4e9bb4e0ba4e0209865020148eb98f88a370d39ab79a267d7227`
MD5	`6a50f89352adcacde1a42478083e7b12`
BLAKE2b-256	`1d74e002db6377be6d7f91b3fb9b25afe5e07cdaff3c809dc5391872953aefff`

