Python bindings for the piscem k-mer mapping engine

piscem

Python bindings for piscem, a fast and accurate tool for k-mer-based read mapping against a reference transcriptome or genome. With these bindings you can load a pre-built index, map individual reads or read pairs, and run low-level k-mer queries from Python, without invoking the full CLI pipeline.

Installation

pip install piscem

To build from source (requires Rust and maturin):

cd crates/py-piscem
maturin develop --release

Quick start

import piscem

# Load a pre-built piscem index
index = piscem.ReferenceIndex.load("path/to/index_prefix")

print(f"k={index.k}, {index.num_refs} references, {index.num_contigs} unitigs")

Mapping reads

Create a MappingEngine from the index, then map reads one at a time. Each call returns a MappingResult containing the mapping type and a list of hits.

Single-end mapping

engine = index.mapping_engine()

result = engine.map_read(b"ACGTACGT...")
if result.is_mapped:
    for hit in result.hits:
        print(f"{hit.ref_name}  pos={hit.pos}  fw={hit.is_fw}  score={hit.score}")
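
The example above passes the read to map_read as bytes, so sequences have to be pulled out of a FASTQ file before they can be mapped. The following is a minimal sketch of such a reader. iter_fastq_sequences is a hypothetical helper, not part of piscem, and it assumes plain, unwrapped four-line FASTQ records.

```python
import io
from typing import BinaryIO, Iterator


def iter_fastq_sequences(handle: BinaryIO) -> Iterator[bytes]:
    """Yield the sequence line of each four-line FASTQ record as bytes.

    Hypothetical helper (not part of piscem). Assumes each record is exactly
    four lines: header, sequence, '+' separator, qualities.
    """
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line
        if not header.startswith(b"@"):
            raise ValueError(f"malformed FASTQ header: {header!r}")
        yield seq


# Small in-memory example standing in for a FASTQ file opened in binary mode.
demo = io.BytesIO(b"@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nTTGCA\n+\nIIIII\n")
sequences = list(iter_fastq_sequences(demo))
print(sequences)
```

With a real file opened in binary mode, each yielded sequence could be handed to engine.map_read in a loop.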

Paired-end mapping

# read1_seq and read2_seq: the two mate sequences of a pair, supplied by the caller
result = engine.map_read_pair(read1_seq, read2_seq)

if result.mapping_type == "mapped_pair":
    hit = result.hits[0]
    print(f"{hit.ref_name}  frag_len={hit.fragment_length}")

The mapping_type field is one of "unmapped", "single_mapped", "mapped_pair", "mapped_first_orphan", or "mapped_second_orphan".
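
Because mapping_type takes one of a fixed set of strings, a run can be summarized by counting results per type. The sketch below does this with stand-in objects that only mimic the mapping_type attribute, so it runs without piscem. tally_mapping_types is a hypothetical helper name.

```python
from collections import Counter
from types import SimpleNamespace

# The five mapping_type values listed above.
MAPPING_TYPES = (
    "unmapped",
    "single_mapped",
    "mapped_pair",
    "mapped_first_orphan",
    "mapped_second_orphan",
)


def tally_mapping_types(results):
    """Count results per mapping_type, reporting every known type (even zero)."""
    counts = Counter(r.mapping_type for r in results)
    unknown = set(counts) - set(MAPPING_TYPES)
    if unknown:
        raise ValueError(f"unexpected mapping_type values: {sorted(unknown)}")
    return {t: counts.get(t, 0) for t in MAPPING_TYPES}


# Stand-ins for MappingResult; only the mapping_type attribute is modelled.
fake_results = [
    SimpleNamespace(mapping_type="mapped_pair"),
    SimpleNamespace(mapping_type="unmapped"),
    SimpleNamespace(mapping_type="mapped_pair"),
    SimpleNamespace(mapping_type="mapped_first_orphan"),
]
summary = tally_mapping_types(fake_results)
print(summary)
```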

Mapping strategy

Two strategies are available, matching the piscem CLI:

engine = index.mapping_engine(strategy="permissive")  # default: faster, skips along unitigs
engine = index.mapping_engine(strategy="strict")      # queries every k-mer position

You can also tune the maximum k-mer occurrence threshold and maximum mappings per read:

engine = index.mapping_engine(max_hit_occ=512, max_read_occ=5000)
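
To give a feel for what the two thresholds control, here is a conceptual toy, not piscem's implementation. It assumes that k-mers occurring more often than max_hit_occ are ignored and that a read with more than max_read_occ mappings reports none. Whether the real engine treats the limits as inclusive, or applies fallback passes, is not stated here, and both function names are hypothetical.

```python
def filter_by_occurrence(kmer_occurrences, max_hit_occ):
    """Keep only k-mers whose occurrence count is within the threshold.

    kmer_occurrences maps a k-mer to the number of reference positions it hits.
    The inclusive comparison is an assumption of this toy.
    """
    return {
        kmer: occ for kmer, occ in kmer_occurrences.items() if occ <= max_hit_occ
    }


def cap_mappings(mappings, max_read_occ):
    """Return the mappings unchanged, or an empty list if there are too many."""
    return list(mappings) if len(mappings) <= max_read_occ else []


# Toy numbers, chosen only to make the effect visible.
occurrences = {"ACGTA": 3, "CGTAC": 900, "GTACG": 512}
kept = filter_by_occurrence(occurrences, max_hit_occ=512)
print(sorted(kept))

capped = cap_mappings(["t1", "t2", "t3"], max_read_occ=2)
print(capped)
```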

Virtual colors mode (binned mapping)

For scATAC-seq and other binned genomic mapping workflows, create a virtual-color engine that accumulates hits per genomic bin:

vc_engine = index.vcolor_engine(bin_size=2000, overlap=400, thr=0.7)

# r1 and r2: the two mate sequences of a read pair, supplied by the caller
result = vc_engine.map_read_pair(r1, r2)
for hit in result.hits:
    print(f"bin={hit.bin_id}  tid={hit.tid}  pos={hit.pos}")
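
The relationship between bin_size and overlap can be pictured with a small calculation. The sketch below assumes one plausible layout, which is not taken from piscem: bin i nominally covers bin_size bases, and the last overlap bases of each bin also count toward the next bin. bins_for_position is a hypothetical helper, and the thr parameter is not modelled.

```python
def bins_for_position(pos, bin_size=2000, overlap=400):
    """Return the bin indices a 0-based position falls into.

    Assumed layout (not taken from piscem): bin i nominally covers
    [i * bin_size, (i + 1) * bin_size), and the last `overlap` bases of each
    bin are shared with bin i + 1.
    """
    if pos < 0:
        raise ValueError("pos must be non-negative")
    if not 0 <= overlap < bin_size:
        raise ValueError("overlap must satisfy 0 <= overlap < bin_size")
    primary = pos // bin_size
    bins = [primary]
    # Positions in the trailing overlap region also count toward the next bin.
    if pos >= (primary + 1) * bin_size - overlap:
        bins.append(primary + 1)
    return bins


print(bins_for_position(100))   # early in bin 0
print(bins_for_position(1700))  # inside the shared tail of bin 0
print(bins_for_position(2000))  # start of bin 1
```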

Streaming k-mer queries

For low-level access, a StreamingQuery slides a k-mer window across a sequence and resolves each k-mer against the index, returning unitig and reference coordinates. Consecutive k-mers on the same unitig are resolved by extension rather than a full dictionary lookup.

sq = index.streaming_query()

hits = sq.query_sequence(b"ACGTACGT...")
for i, hit in enumerate(hits):
    if hit is not None:
        for rp in hit.ref_positions:
            print(f"kmer {i}: ref {rp.tid} pos {rp.pos} fw={rp.is_fw}")

print(f"{sq.num_searches} full lookups, {sq.num_extensions} extensions")
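
The difference between a full lookup and an extension can be illustrated with a toy model. The sketch below is not piscem's data structure: it stores unitigs as plain strings, uses a Python dict in place of the k-mer dictionary, and handles the forward strand only. All names in it are hypothetical.

```python
def build_kmer_table(unitigs, k):
    """Map every k-mer to its (unitig_id, offset); stands in for the dictionary."""
    table = {}
    for uid, unitig in enumerate(unitigs):
        for off in range(len(unitig) - k + 1):
            table[unitig[off:off + k]] = (uid, off)
    return table


def toy_streaming_query(seq, unitigs, k):
    """Resolve each k-mer of seq, preferring extension over table lookups.

    Returns (hits, num_searches, num_extensions); hits[i] is (unitig_id, offset)
    or None. Forward strand only, to keep the toy short.
    """
    table = build_kmer_table(unitigs, k)
    hits, searches, extensions = [], 0, 0
    prev = None
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        hit = None
        if prev is not None:
            uid, off = prev
            # Cheap check: does the next k-mer on the same unitig match?
            if unitigs[uid][off + 1:off + 1 + k] == kmer:
                hit = (uid, off + 1)
                extensions += 1
        if hit is None:
            hit = table.get(kmer)  # full lookup; None if the k-mer is absent
            searches += 1
        hits.append(hit)
        prev = hit
    return hits, searches, extensions


unitigs = ["ACGTTGCA", "GGGCCCTA"]
hits, searches, extensions = toy_streaming_query("ACGTTGCAT", unitigs, k=4)
print(hits)
print(f"{searches} full lookups, {extensions} extensions")
```

In this toy run the first k-mer needs a lookup, the next four are resolved by walking along the same unitig, and the final k-mer runs off the end of the unitig, which forces a second lookup that finds nothing.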

Index metadata

index.k                 # k-mer size
index.m                 # minimizer length
index.num_refs          # number of reference sequences
index.num_contigs       # number of unitigs
index.has_ec_table      # equivalence class table loaded?
index.has_poison_table  # poison k-mer table loaded?

index.ref_name(0)       # name of the first reference
index.ref_len(0)        # length of the first reference
index.ref_names()       # list of all reference names
index.ref_lengths()     # list of all reference lengths
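
The two list accessors combine naturally into a name-to-length lookup. The sketch below assumes that ref_names() and ref_lengths() return lists in the same reference order, which is not stated above. It uses a stand-in class so that it runs without piscem, and reference_lengths_by_name is a hypothetical helper.

```python
def reference_lengths_by_name(index):
    """Return {reference name: length} using ref_names() and ref_lengths()."""
    names = index.ref_names()
    lengths = index.ref_lengths()
    if len(names) != len(lengths):
        raise ValueError("ref_names() and ref_lengths() disagree in size")
    return dict(zip(names, lengths))


class FakeIndex:
    """Stand-in with the two accessor methods, so the sketch runs without piscem."""

    def ref_names(self):
        return ["tx1", "tx2", "tx3"]

    def ref_lengths(self):
        return [1500, 820, 2310]


lengths = reference_lengths_by_name(FakeIndex())
longest = max(lengths, key=lengths.get)
print(lengths["tx2"], longest)
```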

Building an index

You can also build a new index directly from Python, using cuttlefish output as input:

index = piscem.ReferenceIndex.build(
    "path/to/cuttlefish_prefix",   # .cf_seg, .cf_seq, .json
    "path/to/output_prefix",       # output index files
    k=31,
    m=19,
    threads=8,
)
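
Before starting a build it can be useful to confirm that the cuttlefish files are in place. The sketch below assumes the inputs are named by appending the suffixes from the comment above (.cf_seg, .cf_seq, .json) to the prefix. missing_cuttlefish_inputs is a hypothetical helper, not part of piscem.

```python
import tempfile
from pathlib import Path

# File suffixes named in the comment on the build() call above.
CUTTLEFISH_SUFFIXES = (".cf_seg", ".cf_seq", ".json")


def missing_cuttlefish_inputs(prefix):
    """Return the expected input files for `prefix` that do not exist."""
    return [
        prefix + suffix
        for suffix in CUTTLEFISH_SUFFIXES
        if not Path(prefix + suffix).is_file()
    ]


# Demonstration in a temporary directory with one file deliberately absent.
with tempfile.TemporaryDirectory() as tmp:
    prefix = str(Path(tmp) / "demo")
    Path(prefix + ".cf_seg").touch()
    Path(prefix + ".cf_seq").touch()
    missing = [Path(p).name for p in missing_cuttlefish_inputs(prefix)]

print(missing)
```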

To build an index with a poison k-mer table (for filtering spurious mappings near decoy boundaries), pass one or more decoy FASTA files:

index = piscem.ReferenceIndex.build(
    "path/to/cuttlefish_prefix",
    "path/to/output_prefix",
    k=31,
    m=19,
    threads=8,
    decoys=["path/to/decoys.fa.gz"],
)
print(f"Poison table: {index.has_poison_table}")  # True

Thread safety

  • ReferenceIndex is immutable and can be shared freely across threads.
  • MappingEngine and StreamingQuery hold mutable per-read state, so each thread should create its own via index.mapping_engine() or index.streaming_query().

from concurrent.futures import ThreadPoolExecutor

# read_batches: an iterable of read lists prepared by the caller
def map_batch(reads):
    eng = index.mapping_engine()  # a fresh engine per batch, never shared between threads
    return [eng.map_read(r) for r in reads]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(map_batch, read_batches))
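
The example above leaves the construction of read_batches to the caller. One simple way to produce it is to cut a list of reads into fixed-size chunks, as in the sketch below. make_batches and FakeEngine are hypothetical. The fake engine only returns read lengths, so the block runs without piscem while keeping the one-engine-per-batch pattern.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(reads, batch_size):
    """Split a list of reads into consecutive batches of at most batch_size."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [reads[i:i + batch_size] for i in range(0, len(reads), batch_size)]


class FakeEngine:
    """Stand-in for a per-thread engine; it only reports the read length."""

    def map_read(self, read):
        return len(read)


def map_batch(batch):
    engine = FakeEngine()  # one engine per batch, never shared across threads
    return [engine.map_read(r) for r in batch]


reads = [b"ACGT", b"ACGTAC", b"AC", b"ACGTACGT", b"A"]
read_batches = make_batches(reads, batch_size=2)

with ThreadPoolExecutor(max_workers=2) as pool:
    # pool.map returns results in the same order as the input batches.
    results = list(pool.map(map_batch, read_batches))

print(read_batches)
print(results)
```

Larger batches reduce the number of engines created, at the cost of coarser load balancing across workers.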

API reference

Class           Description
ReferenceIndex  Load/build indices, access metadata, create engines
MappingEngine   Map individual reads or read pairs
MappingResult   Mapping output: type + list of hits
MappingHit      Single hit: reference ID, position, orientation, score, fragment info
StreamingQuery  Low-level sliding-window k-mer queries
KmerHit         K-mer lookup result: unitig coordinates + reference positions
RefPos          Position on a reference (tid, pos, orientation)

License

BSD 3-Clause


Source Distribution

piscem-0.1.2.tar.gz (173.7 kB)
    Source

Built Distributions

piscem-0.1.2-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.5 MB)
    CPython 3.8+, manylinux (glibc 2.17+), x86-64

piscem-0.1.2-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.4 MB)
    CPython 3.8+, manylinux (glibc 2.17+), ARM64

piscem-0.1.2-cp38-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (2.6 MB)
    CPython 3.8+, macOS 10.12+ universal2 (ARM64, x86-64), macOS 10.12+ x86-64, macOS 11.0+ ARM64

