Python bindings for the piscem k-mer mapping engine

piscem

Python bindings for piscem, a fast and accurate tool for k-mer-based read mapping against a reference transcriptome or genome. piscem lets you load a pre-built index, map individual reads or read pairs, and perform low-level k-mer queries — all from Python, without running full CLI pipelines.

Installation

pip install piscem

To build from source (requires Rust and maturin):

cd crates/py-piscem
maturin develop --release
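
After either installation route, it can be useful to confirm that the module is visible to the interpreter you intend to use. The following is a minimal sketch using only the standard library; the helper name is_importable is illustrative and not part of piscem.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be located on the current path."""
    return importlib.util.find_spec(module_name) is not None


# Reports whether the bindings are visible to this interpreter.
print("piscem importable:", is_importable("piscem"))
```

If this prints False inside a virtual environment, the package was most likely installed into a different environment.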

Quick start

import piscem

# Load a pre-built piscem index
index = piscem.ReferenceIndex.load("path/to/index_prefix")

print(f"k={index.k}, {index.num_refs} references, {index.num_contigs} unitigs")

Mapping reads

Create a MappingEngine from the index, then map reads one at a time. Each call returns a MappingResult containing the mapping type and a list of hits.

Single-end mapping

engine = index.mapping_engine()

result = engine.map_read(b"ACGTACGT...")
if result.is_mapped:
    for hit in result.hits:
        print(f"{hit.ref_name}  pos={hit.pos}  fw={hit.is_fw}  score={hit.score}")
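
The example above passes the read as a bytes object. As a rough sketch of how reads might be pulled from a FASTQ file in that form, the reader below yields each record's sequence line as bytes. It uses only the standard library, assumes plain 4-line records, and the function name read_fastq_sequences is hypothetical (it is not part of piscem).

```python
import io


def read_fastq_sequences(handle):
    """Yield the sequence line of each 4-line FASTQ record as bytes."""
    while True:
        header = handle.readline()
        if not header:
            break  # end of input
        if not header.startswith(b"@"):
            raise ValueError(f"malformed FASTQ header: {header!r}")
        seq = handle.readline().rstrip(b"\r\n")
        handle.readline()  # '+' separator line (ignored)
        handle.readline()  # quality line (ignored)
        yield seq


# Toy in-memory FASTQ with two records; a real file would be opened in "rb" mode.
demo = io.BytesIO(b"@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nTTGCA\n+\nIIIII\n")
sequences = list(read_fastq_sequences(demo))
print(sequences)
```

Each yielded value could then be handed to engine.map_read in a loop.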

Paired-end mapping

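# read1_seq / read2_seq: the two mate sequences of a pair (e.g. bytes, as passed to map_read above)
result = engine.map_read_pair(read1_seq, read2_seq)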

if result.mapping_type == "mapped_pair":
    hit = result.hits[0]
    print(f"{hit.ref_name}  frag_len={hit.fragment_length}")

The mapping_type field is one of "unmapped", "single_mapped", "mapped_pair", "mapped_first_orphan", or "mapped_second_orphan".
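
When mapping many pairs, a per-category summary of mapping_type is often the first thing to look at. The sketch below tallies the five values listed above. StubResult is a hypothetical stand-in that carries only the mapping_type attribute, so the block runs without an index, and tally_mapping_types is an illustrative helper, not a piscem function.

```python
from collections import Counter, namedtuple

# The five mapping_type values documented above.
MAPPING_TYPES = (
    "unmapped",
    "single_mapped",
    "mapped_pair",
    "mapped_first_orphan",
    "mapped_second_orphan",
)

# Hypothetical stand-in for a result object; only mapping_type is modelled.
StubResult = namedtuple("StubResult", ["mapping_type"])


def tally_mapping_types(results):
    """Count results per mapping type, including types that never occurred."""
    counts = Counter(r.mapping_type for r in results)
    unknown = set(counts) - set(MAPPING_TYPES)
    if unknown:
        raise ValueError(f"unexpected mapping types: {sorted(unknown)}")
    return {t: counts.get(t, 0) for t in MAPPING_TYPES}


demo = [StubResult("mapped_pair"), StubResult("unmapped"), StubResult("mapped_pair")]
summary = tally_mapping_types(demo)
print(summary)
```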

Mapping strategy

Two strategies are available, matching the piscem CLI:

engine = index.mapping_engine(strategy="permissive")  # default — faster, skips along unitigs
engine = index.mapping_engine(strategy="strict")       # queries every k-mer position

You can also tune the maximum k-mer occurrence threshold and maximum mappings per read:

engine = index.mapping_engine(max_hit_occ=512, max_read_occ=5000)

Virtual colors mode (binned mapping)

For scATAC-seq and other binned genomic mapping workflows, create a virtual-color engine that accumulates hits per genomic bin:

vc_engine = index.vcolor_engine(bin_size=2000, overlap=400, thr=0.7)

# r1 / r2: the two mate sequences of a read pair
result = vc_engine.map_read_pair(r1, r2)
for hit in result.hits:
    print(f"bin={hit.bin_id}  tid={hit.tid}  pos={hit.pos}")
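
Downstream of binned mapping, hits are typically grouped by bin. The sketch below groups hit positions by (tid, bin_id) across several results. StubHit and StubResult are hypothetical stand-ins exposing only the attributes used in the loop above, and the numbers are toy values. Keying on the pair (tid, bin_id) is a deliberately conservative choice, because this page does not say whether bin_id is unique across references.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-ins exposing only the attributes shown above.
StubHit = namedtuple("StubHit", ["bin_id", "tid", "pos"])
StubResult = namedtuple("StubResult", ["hits"])


def positions_by_bin(results):
    """Group hit positions by (tid, bin_id) across many mapping results."""
    grouped = defaultdict(list)
    for result in results:
        for hit in result.hits:
            grouped[(hit.tid, hit.bin_id)].append(hit.pos)
    return dict(grouped)


# Toy data: three results, the last one without any hits.
demo = [
    StubResult(hits=[StubHit(bin_id=3, tid=0, pos=6100)]),
    StubResult(hits=[StubHit(bin_id=3, tid=0, pos=6450),
                     StubHit(bin_id=4, tid=0, pos=8020)]),
    StubResult(hits=[]),
]
bins = positions_by_bin(demo)
print(bins)
```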

Streaming k-mer queries

For low-level access, a StreamingQuery slides a k-mer window across a sequence and resolves each k-mer against the index, returning unitig and reference coordinates. Consecutive k-mers on the same unitig are resolved by extension rather than a full dictionary lookup.

sq = index.streaming_query()

hits = sq.query_sequence(b"ACGTACGT...")
for i, hit in enumerate(hits):
    if hit is not None:
        for rp in hit.ref_positions:
            print(f"kmer {i}: ref {rp.tid} pos {rp.pos} fw={rp.is_fw}")

print(f"{sq.num_searches} full lookups, {sq.num_extensions} extensions")
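
To make the sliding window concrete, the pure-Python sketch below enumerates the k-mers of a sequence together with their start offsets. It performs no index lookup and is not how piscem resolves k-mers internally. It only illustrates that a sequence of length n contains n - k + 1 k-mers (none if n < k). That the position i in the loop above corresponds to such a start offset is an assumption on my part.

```python
def sliding_kmers(seq, k):
    """Yield (offset, kmer) for every length-k window of a bytes sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    # range() is empty when the sequence is shorter than k.
    for i in range(len(seq) - k + 1):
        yield i, seq[i:i + k]


windows = list(sliding_kmers(b"ACGTAC", 4))
print(windows)
```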

Index metadata

index.k               # k-mer size
index.m               # minimizer length
index.num_refs         # number of reference sequences
index.num_contigs      # number of unitigs
index.has_ec_table     # equivalence class table loaded?
index.has_poison_table # poison k-mer table loaded?

index.ref_name(0)      # name of the first reference
index.ref_len(0)       # length of the first reference
index.ref_names()      # list of all reference names
index.ref_lengths()    # list of all reference lengths
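
Since ref_names() and ref_lengths() return parallel lists, a name-to-length lookup table can be built by zipping them. The sketch below uses a hypothetical StubIndex with toy values so that it runs without a real index. With a loaded ReferenceIndex the same helper should apply, assuming the two lists are in the same order.

```python
class StubIndex:
    """Hypothetical stand-in exposing only ref_names() and ref_lengths()."""

    def __init__(self, names, lengths):
        self._names = list(names)
        self._lengths = list(lengths)

    def ref_names(self):
        return list(self._names)

    def ref_lengths(self):
        return list(self._lengths)


def ref_length_table(index):
    """Return a {reference name: length} dict built from the two parallel lists."""
    names = index.ref_names()
    lengths = index.ref_lengths()
    if len(names) != len(lengths):
        raise ValueError("ref_names() and ref_lengths() differ in length")
    return dict(zip(names, lengths))


# Toy reference names and lengths, for illustration only.
table = ref_length_table(StubIndex(["tx1", "tx2"], [1500, 820]))
print(table)
```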

Building an index

You can build a new index from cuttlefish output directly from Python:

index = piscem.ReferenceIndex.build(
    "path/to/cuttlefish_prefix",   # .cf_seg, .cf_seq, .json
    "path/to/output_prefix",       # output index files
    k=31,
    m=19,
    threads=8,
)

To build an index with a poison k-mer table (for filtering spurious mappings near decoy boundaries), pass one or more decoy FASTA files:

index = piscem.ReferenceIndex.build(
    "path/to/cuttlefish_prefix",
    "path/to/output_prefix",
    k=31,
    m=19,
    threads=8,
    decoys=["path/to/decoys.fa.gz"],
)
print(f"Poison table: {index.has_poison_table}")  # True
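
Before starting a build, it may help to check that the cuttlefish inputs are present. The sketch below assumes the input files are named by appending the extensions from the comment above (.cf_seg, .cf_seq, .json) directly to the prefix. That naming is an assumption, and missing_cuttlefish_inputs is a hypothetical helper, not part of piscem.

```python
import os
import tempfile

# Extensions taken from the comment in the build example; the naming scheme
# (prefix + extension) is an assumption.
CUTTLEFISH_EXTENSIONS = (".cf_seg", ".cf_seq", ".json")


def missing_cuttlefish_inputs(prefix):
    """Return the expected input paths that do not exist on disk."""
    return [prefix + ext for ext in CUTTLEFISH_EXTENSIONS
            if not os.path.isfile(prefix + ext)]


# Demonstration in a temporary directory with one file deliberately absent.
with tempfile.TemporaryDirectory() as tmp:
    prefix = os.path.join(tmp, "toy")
    for ext in (".cf_seg", ".json"):
        open(prefix + ext, "w").close()
    missing = [os.path.basename(p) for p in missing_cuttlefish_inputs(prefix)]

print(missing)
```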

Thread safety

  • ReferenceIndex is immutable and can be shared freely across threads.
  • MappingEngine and StreamingQuery hold mutable per-read state, so each thread should create its own via index.mapping_engine() or index.streaming_query().

from concurrent.futures import ThreadPoolExecutor

# `index` is a loaded ReferenceIndex; `read_batches` is an iterable of lists of reads
def map_batch(reads):
    eng = index.mapping_engine()  # one engine per task, never shared across threads
    return [eng.map_read(r) for r in reads]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(map_batch, read_batches))
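
The thread-pool example takes read_batches as given. One way to produce it, sketched here with the standard library only, is to split any iterable of reads into fixed-size lists. The helper name make_batches and the batch size are illustrative choices.

```python
from itertools import islice


def make_batches(reads, batch_size):
    """Yield consecutive lists of up to batch_size items from an iterable."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(reads)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return  # input exhausted
        yield batch


demo_batches = list(make_batches([b"A", b"C", b"G", b"T", b"N"], 2))
print(demo_batches)
```

Larger batches reduce the number of times a new engine is created, at the cost of coarser load balancing across workers.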

API reference

Class           Description
ReferenceIndex  Load/build indices, access metadata, create engines
MappingEngine   Map individual reads or read pairs
MappingResult   Mapping output: type + list of hits
MappingHit      Single hit: reference ID, position, orientation, score, fragment info
StreamingQuery  Low-level sliding-window k-mer queries
KmerHit         K-mer lookup result: unitig coordinates + reference positions
RefPos          Position on a reference (tid, pos, orientation)

License

BSD 3-Clause

Source Distribution

  • piscem-0.1.4.tar.gz (176.1 kB): source

Built Distributions

  • piscem-0.1.4-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (1.7 MB): CPython 3.8+, manylinux (glibc 2.17+), x86-64
  • piscem-0.1.4-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (1.5 MB): CPython 3.8+, manylinux (glibc 2.17+), ARM64
  • piscem-0.1.4-cp38-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (2.9 MB): CPython 3.8+, macOS 10.12+ universal2 (ARM64, x86-64), macOS 10.12+ x86-64, macOS 11.0+ ARM64

File details

Details for the file piscem-0.1.4.tar.gz.

File metadata

  • Download URL: piscem-0.1.4.tar.gz
  • Size: 176.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for piscem-0.1.4.tar.gz

Algorithm    Hash digest
SHA256       ea13d442764488929ff8edea4e97f8609eb0f94132f9373dd78946b26b6c930b
MD5          3a1331384fcf121c6dac9f8941e7a680
BLAKE2b-256  0b65ef401867b375ede5bb75d6b3fae33cc8be2608e66fba8013171b17752175

Provenance

The following attestation bundles were made for piscem-0.1.4.tar.gz:

Publisher: publish-py-piscem.yml on COMBINE-lab/piscem-rs

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

