Python bindings for the piscem k-mer mapping engine

piscem

Python bindings for piscem, a fast and accurate tool for k-mer-based read mapping against a reference transcriptome or genome. With piscem you can load a pre-built index, map individual reads or read pairs, and run low-level k-mer queries from Python, without running the full CLI pipeline.

Installation

pip install piscem

To build from source (requires Rust and maturin):

cd crates/py-piscem
maturin develop --release

Quick start

import piscem

# Load a pre-built piscem index
index = piscem.ReferenceIndex.load("path/to/index_prefix")

print(f"k={index.k}, {index.num_refs} references, {index.num_contigs} unitigs")

Mapping reads

Create a MappingEngine from the index, then map reads one at a time. Each call returns a MappingResult containing the mapping type and a list of hits.

Single-end mapping

engine = index.mapping_engine()

result = engine.map_read(b"ACGTACGT...")
if result.is_mapped:
    for hit in result.hits:
        print(f"{hit.ref_name}  pos={hit.pos}  fw={hit.is_fw}  score={hit.score}")
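To feed reads from a file into map_read, something has to turn FASTQ records into sequences. The sketch below is one way to do that with the standard library only. It assumes plain 4-line FASTQ records (no wrapped sequences) and that map_read accepts bytes, as in the example above. The helper name read_fastq_sequences is hypothetical and not part of piscem.

```python
import gzip
from typing import Iterator


def read_fastq_sequences(path: str) -> Iterator[bytes]:
    """Yield each read sequence in a 4-line-per-record FASTQ file as bytes."""
    # Transparently handle gzip-compressed input based on the file extension.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as handle:
        for line_no, line in enumerate(handle):
            # In a 4-line FASTQ record the sequence is the second line.
            if line_no % 4 == 1:
                yield line.strip()


# Hypothetical usage with an engine created as shown above:
# for seq in read_fastq_sequences("reads.fastq.gz"):
#     result = engine.map_read(seq)
```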

Paired-end mapping

# read1_seq and read2_seq hold the two mates of one read pair
result = engine.map_read_pair(read1_seq, read2_seq)

if result.mapping_type == "mapped_pair":
    hit = result.hits[0]
    print(f"{hit.ref_name}  frag_len={hit.fragment_length}")

The mapping_type field is one of "unmapped", "single_mapped", "mapped_pair", "mapped_first_orphan", or "mapped_second_orphan".
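When mapping many pairs, a per-type tally is a quick sanity check on a run. The following is a minimal sketch that relies only on the mapping_type attribute and the five values listed above. The function name tally_mapping_types is hypothetical, and any value outside the list is counted under its own key rather than dropped.

```python
from typing import Dict, Iterable

# The five mapping_type values listed above.
MAPPING_TYPES = (
    "unmapped",
    "single_mapped",
    "mapped_pair",
    "mapped_first_orphan",
    "mapped_second_orphan",
)


def tally_mapping_types(results: Iterable) -> Dict[str, int]:
    """Count results per mapping_type, reporting zero for types never seen."""
    counts = {name: 0 for name in MAPPING_TYPES}
    for result in results:
        # Unexpected values get their own key instead of being lost.
        counts[result.mapping_type] = counts.get(result.mapping_type, 0) + 1
    return counts


# Hypothetical usage, where `pairs` is an iterable of (read1, read2) tuples:
# counts = tally_mapping_types(engine.map_read_pair(r1, r2) for r1, r2 in pairs)
```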

Mapping strategy

Two strategies are available, matching the piscem CLI:

engine = index.mapping_engine(strategy="permissive")  # default — faster, skips along unitigs
engine = index.mapping_engine(strategy="strict")       # queries every k-mer position

You can also tune the maximum k-mer occurrence threshold (max_hit_occ) and the maximum number of mappings per read (max_read_occ):

engine = index.mapping_engine(max_hit_occ=512, max_read_occ=5000)

Virtual colors mode (binned mapping)

For scATAC-seq and other binned genomic mapping workflows, create a virtual-color engine that accumulates hits per genomic bin:

vc_engine = index.vcolor_engine(bin_size=2000, overlap=400, thr=0.7)

result = vc_engine.map_read_pair(r1, r2)
for hit in result.hits:
    print(f"bin={hit.bin_id}  tid={hit.tid}  pos={hit.pos}")
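Downstream binned workflows usually want hits collected per bin. The sketch below groups hits by bin using only the bin_id, tid, and pos attributes printed in the example above. The helper group_hits_by_bin is hypothetical and makes no assumption about how piscem lays out or numbers its bins.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def group_hits_by_bin(hits: Iterable) -> Dict[int, List[Tuple[int, int]]]:
    """Map each bin_id to the (tid, pos) pairs of the hits that fell in it."""
    by_bin = defaultdict(list)
    for hit in hits:
        by_bin[hit.bin_id].append((hit.tid, hit.pos))
    # Return a plain dict so missing bins raise KeyError instead of appearing.
    return dict(by_bin)


# Hypothetical usage with a result from vc_engine.map_read_pair(r1, r2):
# bins = group_hits_by_bin(result.hits)
```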

Streaming k-mer queries

For low-level access, a StreamingQuery slides a k-mer window across a sequence and resolves each k-mer against the index, returning unitig and reference coordinates. Consecutive k-mers on the same unitig are resolved by extension rather than a full dictionary lookup.

sq = index.streaming_query()

hits = sq.query_sequence(b"ACGTACGT...")
for i, hit in enumerate(hits):
    if hit is not None:
        for rp in hit.ref_positions:
            print(f"kmer {i}: ref {rp.tid} pos {rp.pos} fw={rp.is_fw}")

print(f"{sq.num_searches} full lookups, {sq.num_extensions} extensions")
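The list returned by query_sequence can be condensed into a few numbers: how many k-mers were queried, how many were found, and how often each reference was hit. This sketch assumes, as the example above suggests, that the list holds one entry per k-mer position with None marking a k-mer absent from the index. The function name summarize_kmer_hits is hypothetical.

```python
from collections import Counter
from typing import Iterable, Tuple


def summarize_kmer_hits(hits: Iterable) -> Tuple[int, int, Counter]:
    """Return (k-mers queried, k-mers found, reference-position count per tid)."""
    total = 0
    found = 0
    per_ref = Counter()
    for hit in hits:
        total += 1
        if hit is None:
            continue  # this k-mer was not found in the index
        found += 1
        for rp in hit.ref_positions:
            per_ref[rp.tid] += 1
    return total, found, per_ref


# Hypothetical usage:
# total, found, per_ref = summarize_kmer_hits(sq.query_sequence(seq))
```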

Index metadata

index.k               # k-mer size
index.m               # minimizer length
index.num_refs         # number of reference sequences
index.num_contigs      # number of unitigs
index.has_ec_table     # equivalence class table loaded?
index.has_poison_table # poison k-mer table loaded?

index.ref_name(0)      # name of the first reference
index.ref_len(0)       # length of the first reference
index.ref_names()      # list of all reference names
index.ref_lengths()    # list of all reference lengths
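A name-to-length lookup is often more convenient than two parallel lists. The sketch below builds one from ref_names() and ref_lengths(), assuming both lists are ordered by reference ID so that entries at the same position belong together. The helper reference_lengths_by_name is hypothetical.

```python
from typing import Dict


def reference_lengths_by_name(index) -> Dict[str, int]:
    """Pair each reference name with its length, assuming matching order."""
    names = index.ref_names()
    lengths = index.ref_lengths()
    if len(names) != len(lengths):
        raise ValueError("ref_names() and ref_lengths() differ in length")
    return dict(zip(names, lengths))


# Hypothetical usage:
# lengths = reference_lengths_by_name(index)
# print(lengths[index.ref_name(0)])
```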

Building an index

You can build a new index directly from Python, using cuttlefish output as input:

index = piscem.ReferenceIndex.build(
    "path/to/cuttlefish_prefix",   # .cf_seg, .cf_seq, .json
    "path/to/output_prefix",       # output index files
    k=31,
    m=19,
    threads=8,
)
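Before starting a build, it can help to check that the cuttlefish inputs are present. The sketch below looks for the three files named in the comment above (.cf_seg, .cf_seq, .json). It assumes each suffix is appended directly to the prefix, which is an assumption about the naming scheme. The helper missing_cuttlefish_files is hypothetical.

```python
from pathlib import Path
from typing import List

# Suffixes named in the build example; direct concatenation is an assumption.
CUTTLEFISH_SUFFIXES = (".cf_seg", ".cf_seq", ".json")


def missing_cuttlefish_files(prefix: str) -> List[str]:
    """Return the expected cuttlefish input paths that do not exist."""
    candidates = [prefix + suffix for suffix in CUTTLEFISH_SUFFIXES]
    return [path for path in candidates if not Path(path).is_file()]


# Hypothetical usage:
# missing = missing_cuttlefish_files("path/to/cuttlefish_prefix")
# if missing:
#     raise FileNotFoundError(f"missing cuttlefish files: {missing}")
```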

To build an index with a poison k-mer table (for filtering spurious mappings near decoy boundaries), pass one or more decoy FASTA files:

index = piscem.ReferenceIndex.build(
    "path/to/cuttlefish_prefix",
    "path/to/output_prefix",
    k=31,
    m=19,
    threads=8,
    decoys=["path/to/decoys.fa.gz"],
)
print(f"Poison table: {index.has_poison_table}")  # True

Thread safety

  • ReferenceIndex is immutable and can be shared freely across threads.
  • MappingEngine and StreamingQuery hold mutable per-read state, so each thread should create its own via index.mapping_engine() or index.streaming_query().

from concurrent.futures import ThreadPoolExecutor

def map_batch(reads):
    eng = index.mapping_engine()
    return [eng.map_read(r) for r in reads]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(map_batch, read_batches))
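The example above takes read_batches as given. One way to produce it is to split an iterable of reads into fixed-size lists, as in this minimal sketch. The helper batched and the batch size are hypothetical choices for illustration, not part of piscem.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def batched(items: Iterable, batch_size: int) -> Iterator[List]:
    """Yield consecutive lists of up to batch_size items from an iterable."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # input exhausted
        yield batch


# Hypothetical usage, where `reads` is any iterable of read sequences:
# read_batches = batched(reads, 10_000)
```

Each batch is handed to one call of map_batch, so each worker creates an engine once per batch rather than once per read.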

API reference

Class           Description
ReferenceIndex  Load/build indices, access metadata, create engines
MappingEngine   Map individual reads or read pairs
MappingResult   Mapping output: type + list of hits
MappingHit      Single hit: reference ID, position, orientation, score, fragment info
StreamingQuery  Low-level sliding-window k-mer queries
KmerHit         K-mer lookup result: unitig coordinates + reference positions
RefPos          Position on a reference (tid, pos, orientation)

License

BSD 3-Clause
