Python bindings for the piscem k-mer mapping engine

piscem

Python bindings for piscem, a fast and accurate tool for k-mer-based read mapping against a reference transcriptome or genome. With these bindings you can load a pre-built index, map individual reads or read pairs, and run low-level k-mer queries, all from Python and without running the full CLI pipeline.

Installation

pip install piscem

To build from source (requires Rust and maturin):

cd crates/py-piscem
maturin develop --release

Quick start

import piscem

# Load a pre-built piscem index
index = piscem.ReferenceIndex.load("path/to/index_prefix")

print(f"k={index.k}, {index.num_refs} references, {index.num_contigs} unitigs")

Mapping reads

Create a MappingEngine from the index, then map reads one at a time. Each call returns a MappingResult containing the mapping type and a list of hits.

Single-end mapping

engine = index.mapping_engine()

result = engine.map_read(b"ACGTACGT...")
if result.is_mapped:
    for hit in result.hits:
        print(f"{hit.ref_name}  pos={hit.pos}  fw={hit.is_fw}  score={hit.score}")
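
Reads usually come from a FASTQ file rather than a literal. The sketch below assumes uncompressed, four-line FASTQ records; iter_fastq_seqs is a hypothetical helper, not part of piscem. It yields each sequence as bytes, the type passed to map_read above.

```python
import io

def iter_fastq_seqs(handle):
    """Yield the sequence line of each four-line FASTQ record as bytes.

    `handle` is any binary file-like object.
    Hypothetical helper for illustration; not part of piscem.
    """
    for line_no, line in enumerate(handle):
        # In a four-line record, the sequence is the second line (index 1).
        if line_no % 4 == 1:
            yield line.strip()

# In-memory demo; a real run would use open("reads.fastq", "rb") instead.
demo = io.BytesIO(b"@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n")
seqs = list(iter_fastq_seqs(demo))
```

Each yielded sequence can then be passed to engine.map_read in a loop.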

Paired-end mapping

# read1_seq and read2_seq hold the two mates of a pair (bytes, as in map_read)
result = engine.map_read_pair(read1_seq, read2_seq)

if result.mapping_type == "mapped_pair":
    hit = result.hits[0]
    print(f"{hit.ref_name}  frag_len={hit.fragment_length}")

The mapping_type field is one of "unmapped", "single_mapped", "mapped_pair", "mapped_first_orphan", or "mapped_second_orphan".
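
Tallying these categories over a run gives a quick view of mapping quality. A minimal sketch: count_mapping_types is a hypothetical helper that relies only on the mapping_type attribute described above, and the stand-in objects exist so the example runs without an index.

```python
from collections import Counter
from types import SimpleNamespace

def count_mapping_types(results):
    """Count how many results fall into each mapping_type category.

    Hypothetical helper; works on any objects with a mapping_type attribute.
    """
    return Counter(r.mapping_type for r in results)

# Stand-ins for MappingResult objects (assumption: only mapping_type is needed).
fake_results = [
    SimpleNamespace(mapping_type="mapped_pair"),
    SimpleNamespace(mapping_type="unmapped"),
    SimpleNamespace(mapping_type="mapped_pair"),
]
counts = count_mapping_types(fake_results)
```

With real data, pass the list of values returned by map_read_pair instead of fake_results.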

Mapping strategy

Two strategies are available, matching the piscem CLI:

engine = index.mapping_engine(strategy="permissive")  # default — faster, skips along unitigs
engine = index.mapping_engine(strategy="strict")       # queries every k-mer position

You can also tune max_hit_occ, the maximum number of occurrences a k-mer may have and still be used for mapping, and max_read_occ, the maximum number of mappings allowed per read:

engine = index.mapping_engine(max_hit_occ=512, max_read_occ=5000)

Virtual colors mode (binned mapping)

For scATAC-seq and other binned genomic mapping workflows, create a virtual-color engine that accumulates hits per genomic bin:

vc_engine = index.vcolor_engine(bin_size=2000, overlap=400, thr=0.7)

result = vc_engine.map_read_pair(r1, r2)
for hit in result.hits:
    print(f"bin={hit.bin_id}  tid={hit.tid}  pos={hit.pos}")
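
Downstream code may want the hits grouped by bin rather than printed. A minimal sketch using only the bin_id, tid, and pos attributes shown above; group_hits_by_bin is a hypothetical helper, and the stand-in hits (with made-up values) let it run without an index.

```python
from collections import defaultdict
from types import SimpleNamespace

def group_hits_by_bin(hits):
    """Group hits into {bin_id: [(tid, pos), ...]}, preserving hit order.

    Hypothetical helper; not part of piscem.
    """
    by_bin = defaultdict(list)
    for hit in hits:
        by_bin[hit.bin_id].append((hit.tid, hit.pos))
    return dict(by_bin)

# Stand-ins for virtual-color hits; the values are illustrative only.
fake_hits = [
    SimpleNamespace(bin_id=7, tid=0, pos=14010),
    SimpleNamespace(bin_id=7, tid=0, pos=14300),
    SimpleNamespace(bin_id=12, tid=1, pos=250),
]
grouped = group_hits_by_bin(fake_hits)
```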

Streaming k-mer queries

For low-level access, a StreamingQuery slides a k-mer window across a sequence and resolves each k-mer against the index, returning unitig and reference coordinates. Consecutive k-mers on the same unitig are resolved by extension rather than a full dictionary lookup.

sq = index.streaming_query()

hits = sq.query_sequence(b"ACGTACGT...")
for i, hit in enumerate(hits):
    if hit is not None:
        for rp in hit.ref_positions:
            print(f"kmer {i}: ref {rp.tid} pos {rp.pos} fw={rp.is_fw}")

print(f"{sq.num_searches} full lookups, {sq.num_extensions} extensions")
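
To interpret the index i in the loop above: assuming query_sequence returns one entry per k-mer start position (the use of enumerate and None suggests this, but it is an assumption), entry i corresponds to the window seq[i:i+k], and a sequence of length n has n - k + 1 windows. The pure-Python sketch below enumerates those windows; iter_kmers is a hypothetical helper and does not reflect how piscem resolves k-mers internally.

```python
def iter_kmers(seq, k):
    """Yield (start, kmer) for every length-k window of seq, left to right.

    Hypothetical helper for illustration; yields nothing if len(seq) < k.
    """
    for start in range(len(seq) - k + 1):
        yield start, seq[start:start + k]

# A 6-base sequence with k=4 has 6 - 4 + 1 = 3 windows.
windows = list(iter_kmers(b"ACGTAC", 4))
```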

Index metadata

index.k               # k-mer size
index.m               # minimizer length
index.num_refs         # number of reference sequences
index.num_contigs      # number of unitigs
index.has_ec_table     # equivalence class table loaded?
index.has_poison_table # poison k-mer table loaded?

index.ref_name(0)      # name of the first reference
index.ref_len(0)       # length of the first reference
index.ref_names()      # list of all reference names
index.ref_lengths()    # list of all reference lengths
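
The two list accessors can be combined into a lookup table. A minimal sketch, assuming ref_names() and ref_lengths() return lists in the same reference order (consistent with ref_name(0) and ref_len(0) above, but not stated explicitly); ref_length_table and FakeIndex are hypothetical, and FakeIndex only stands in for a loaded ReferenceIndex.

```python
def ref_length_table(index):
    """Return {reference name: reference length} for an index-like object.

    Assumes both accessors list references in the same order.
    """
    return dict(zip(index.ref_names(), index.ref_lengths()))

class FakeIndex:
    """Stand-in exposing the two accessors used above; not a real index."""
    def ref_names(self):
        return ["tx1", "tx2"]
    def ref_lengths(self):
        return [1500, 820]

table = ref_length_table(FakeIndex())
```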

Building an index

You can build a new index directly from Python, starting from cuttlefish output:

index = piscem.ReferenceIndex.build(
    "path/to/cuttlefish_prefix",   # .cf_seg, .cf_seq, .json
    "path/to/output_prefix",       # output index files
    k=31,
    m=19,
    threads=8,
)
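
Before a long build it can help to confirm the inputs exist. The sketch below assumes the three cuttlefish files are named by appending the suffixes from the comment above (.cf_seg, .cf_seq, .json) directly to the prefix; that naming and the helper missing_cuttlefish_inputs are assumptions for illustration, not part of piscem.

```python
import os
import tempfile

# Suffixes taken from the comment in the build example; appending them
# directly to the prefix is an assumption.
CUTTLEFISH_SUFFIXES = (".cf_seg", ".cf_seq", ".json")

def missing_cuttlefish_inputs(prefix):
    """Return the expected input paths that do not exist on disk."""
    return [
        prefix + suffix
        for suffix in CUTTLEFISH_SUFFIXES
        if not os.path.isfile(prefix + suffix)
    ]

# Demo in a temporary directory where only the .cf_seg file is present.
with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "demo")
    open(prefix + ".cf_seg", "w").close()
    missing = [os.path.basename(p) for p in missing_cuttlefish_inputs(prefix)]
```

An empty return value means all three expected files were found.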

Thread safety

  • ReferenceIndex is immutable and can be shared freely across threads.
  • MappingEngine and StreamingQuery hold mutable per-read state, so each thread should create its own via index.mapping_engine() or index.streaming_query().

from concurrent.futures import ThreadPoolExecutor

def map_batch(reads):
    # Each task creates its own engine; engines are not shared across threads.
    eng = index.mapping_engine()
    return [eng.map_read(r) for r in reads]

# read_batches: an iterable of lists of reads (bytes), supplied by the caller.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(map_batch, read_batches))
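
The example above takes read_batches as given. One way to produce it is sketched below: make_batches is a hypothetical helper, not part of piscem, that splits any iterable of reads into lists of at most size items.

```python
from itertools import islice

def make_batches(reads, size):
    """Yield successive lists of at most `size` reads from any iterable."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(reads)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

# Five toy reads split into batches of two; the last batch is shorter.
read_batches = list(make_batches([b"AAAA", b"CCCC", b"GGGG", b"TTTT", b"ACGT"], 2))
```

Larger batches mean fewer engine constructions per read, at the cost of coarser load balancing across workers.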

API reference

Class            Description
ReferenceIndex   Load/build indices, access metadata, create engines
MappingEngine    Map individual reads or read pairs
MappingResult    Mapping output: type + list of hits
MappingHit       Single hit: reference ID, position, orientation, score, fragment info
StreamingQuery   Low-level sliding-window k-mer queries
KmerHit          K-mer lookup result: unitig coordinates + reference positions
RefPos           Position on a reference (tid, pos, orientation)

License

BSD 3-Clause

Download files

Source Distribution

piscem-0.1.1.tar.gz (172.9 kB): source

Built Distributions

piscem-0.1.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.1 MB): CPython 3.8+, manylinux (glibc 2.17+), x86-64

piscem-0.1.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.0 MB): CPython 3.8+, manylinux (glibc 2.17+), ARM64

piscem-0.1.1-cp38-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (2.0 MB): CPython 3.8+, macOS 10.12+ universal2 (ARM64, x86-64), macOS 10.12+ x86-64, macOS 11.0+ ARM64

File details

Details for the file piscem-0.1.1.tar.gz.

File metadata

  • Download URL: piscem-0.1.1.tar.gz
  • Upload date:
  • Size: 172.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for piscem-0.1.1.tar.gz
Algorithm Hash digest
SHA256 371cc0c3efc02a6ce3c5463061dc0131c28c412318e4d7468465b861a342feeb
MD5 6c3d7a1c89b5dab6672903cf62196173
BLAKE2b-256 fd5c1b39cbee59f6d343763cdc28e7eb3e19de014ad07feab1d10195251ae5cc

Provenance

The following attestation bundles were made for piscem-0.1.1.tar.gz:

Publisher: publish-py-piscem.yml on COMBINE-lab/piscem-rs

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file piscem-0.1.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for piscem-0.1.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 44888adae7575b304261345f689b5a99f6014b9ddfe560679c381b079fd7c329
MD5 0d6f196d35740257548336c84cee429a
BLAKE2b-256 5a62fca2b92279e4c2ca772b2780ef93c2ca6baf9f52762c7fa848fd1a20660a

Provenance

The following attestation bundles were made for piscem-0.1.1-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish-py-piscem.yml on COMBINE-lab/piscem-rs

File details

Details for the file piscem-0.1.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for piscem-0.1.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 defb310d31933bca8c382ebd5a9d8c7d8fbee52acaf7d98e93cbfd8e0ee1463e
MD5 1fcdfcf9d7be9cd02c130edac56d11a2
BLAKE2b-256 7a9a7362eb4a89857b566a024d30e2184b45730f988685a462480ae8239e4fe9

Provenance

The following attestation bundles were made for piscem-0.1.1-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: publish-py-piscem.yml on COMBINE-lab/piscem-rs

File details

Details for the file piscem-0.1.1-cp38-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl.

File hashes

Hashes for piscem-0.1.1-cp38-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl
Algorithm Hash digest
SHA256 65efacbbd2d53b0fcb7353798f9f13f46869b6e2bf0a052d83b12800545055fd
MD5 0accd69e49a1c0c8393041cb13719090
BLAKE2b-256 ea5fe3813e7bc0ad89b77d1ec27c5d2a8b3ace0b05a90f50e6be23ed03bb3c33

Provenance

The following attestation bundles were made for piscem-0.1.1-cp38-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl:

Publisher: publish-py-piscem.yml on COMBINE-lab/piscem-rs