Python interface for misha genomic databases with C++ streaming backends

Project description

PyMisha

Python interface for misha genomic databases. PyMisha provides full read/write access to misha track databases with C++ streaming backends for genome-scale operations.

Features

  • 1D and 2D track support: Dense, sparse, and 2D (rectangle/point) tracks with full CRUD operations.
  • C++ streaming backends: Extraction, summary, quantiles, distribution, lookup, segmentation, Wilcoxon tests, correlation, and sampling all stream through C++ for performance.
  • Virtual tracks: Computed-on-the-fly track views with filtering, shifting, and 30+ aggregation functions.
  • Interval operations: Union, intersection, difference, canonicalization, neighbors, annotation, normalization, random generation, and liftover.
  • Sequence analysis: Extraction, k-mer counting, PWM/PSSM scoring, and Markov-chain synthesis (gsynth).
  • Database management: Create, link, convert, and manage misha-compatible genomic databases.
  • R misha compatibility: Reads and writes the same on-disk formats as R misha (123/145 R exports covered).
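
For readers new to the interval vocabulary, the sketch below shows what union and intersection mean on plain lists of (chrom, start, end) tuples. It assumes 0-based, end-exclusive coordinates (R misha's convention) and is written in pure Python for illustration only. It is not PyMisha's API or implementation.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end); end is exclusive


def union(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping intervals per chromosome (this sketch also merges touching ones)."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or abuts the previous interval: extend it.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the regions covered by both sets, walking two sorted, merged lists."""
    a, b = union(a), union(b)
    out: List[Interval] = []
    i = j = 0
    while i < len(a) and j < len(b):
        ca, sa, ea = a[i]
        cb, sb, eb = b[j]
        if ca != cb:
            # Different chromosomes: advance whichever sorts first.
            if ca < cb:
                i += 1
            else:
                j += 1
            continue
        lo, hi = max(sa, sb), min(ea, eb)
        if lo < hi:
            out.append((ca, lo, hi))
        # Advance the interval that ends first.
        if ea < eb:
            i += 1
        else:
            j += 1
    return out
```

Both lists are sorted with the same key, so the linear walk stays correct even though chromosome names sort lexicographically (chr10 before chr2).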

Installation

pip install pymisha

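Pre-built wheels for version 0.2.3 are published for Linux x86_64 (manylinux, glibc 2.17+; CPython 3.10-3.12) and macOS 11.0+ arm64 (CPython 3.11-3.12). On other platforms and Python versions, pip falls back to the source distribution and compiles it locally.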
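
To predict whether pip will find a wheel or compile from source, a rough check against the 0.2.3 files listed under "Built Distributions" is enough. The helper below is hypothetical (not part of PyMisha) and deliberately coarse: it ignores the glibc 2.17+ and macOS 11.0+ minimums and assumes a CPython interpreter.

```python
import platform
import sys

# (CPython minor version, platform.system(), platform.machine()) for the
# 0.2.3 wheels listed under "Built Distributions".
WHEELS_0_2_3 = {
    (10, "Linux", "x86_64"),
    (11, "Linux", "x86_64"),
    (12, "Linux", "x86_64"),
    (11, "Darwin", "arm64"),
    (12, "Darwin", "arm64"),
}


def has_prebuilt_wheel(minor=None, system=None, machine=None) -> bool:
    """Coarse check; defaults to the running interpreter and platform."""
    minor = sys.version_info.minor if minor is None else minor
    system = platform.system() if system is None else system
    machine = platform.machine() if machine is None else machine
    return (minor, system, machine) in WHEELS_0_2_3


if __name__ == "__main__":
    print("wheel available" if has_prebuilt_wheel() else "pip will build from source")
```

If it reports no wheel, make sure a C++17 compiler is available before running pip install.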

To install from a source checkout (requires a C++17 compiler and numpy), run an editable install with the development extras from the repository root:

pip install -e ".[dev]"

Quick start

PyMisha ships with a built-in examples database, so you can start exploring immediately with no external data:

import pymisha as pm

# Option 1: one-liner to load the bundled examples database
pm.gdb_init_examples()

# Option 2: equivalent explicit form
pm.gsetroot(pm.gdb_examples_path())

# List available tracks and extract data
print(pm.gtrack_ls())
print(pm.gextract("dense_track", pm.gintervals("chr1", 0, 1000)))

To connect to your own misha database, use gsetroot:

import pymisha as pm

# Initialize the database
pm.gsetroot("/path/to/misha_db")

# Create intervals and extract data
intervals = pm.gintervals_from_strings(["chr1:0-1000", "chr1:2000-2600"])
out = pm.gextract("track1", intervals, iterator=100)

# Filter and summarize
filtered = pm.gscreen("track1 > 0.5", intervals)
stats = pm.gsummary("track1", intervals)
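
Following R misha's convention, a numeric iterator such as iterator=100 evaluates the track over fixed 100-bp bins, and gextract returns one row per bin. As a rough mental model only, the hypothetical helpers below parse the same region strings and tile them into bins, assuming (as in R misha's fixed-bin iterator) that bin boundaries fall on multiples of the bin size and that bins are clipped to the query intervals. This is not PyMisha code, and its exact edge handling may differ.

```python
import re
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end); end is exclusive

# Hypothetical parser for "chrom:start-end" strings.
_REGION = re.compile(r"^([^:\s]+):(\d+)-(\d+)$")


def parse_regions(strings: List[str]) -> List[Interval]:
    intervals = []
    for s in strings:
        m = _REGION.match(s.strip())
        if m is None:
            raise ValueError(f"not a chrom:start-end string: {s!r}")
        chrom, start, end = m.group(1), int(m.group(2)), int(m.group(3))
        if start >= end:
            raise ValueError(f"empty or inverted interval: {s!r}")
        intervals.append((chrom, start, end))
    return intervals


def fixed_bins(intervals: List[Interval], binsize: int) -> List[Interval]:
    """Split intervals into bins aligned to multiples of binsize, clipped to each interval."""
    bins = []
    for chrom, start, end in intervals:
        pos = start - start % binsize  # last bin boundary at or before start
        while pos < end:
            bins.append((chrom, max(pos, start), min(pos + binsize, end)))
            pos += binsize
    return bins


regions = parse_regions(["chr1:0-1000", "chr1:2000-2600"])
bins = fixed_bins(regions, 100)
print(len(bins))  # 16 (10 bins + 6 bins)
```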

Thread safety

PyMisha inherits R misha's single-threaded design. Keep the following constraints in mind:

  • Not thread-safe. All module-level state (_GROOT, _UROOT, _VTRACKS, CONFIG) is process-global and unsynchronized. Do not call PyMisha from multiple threads concurrently.
  • One database per process. You cannot have two databases open simultaneously; gsetroot() replaces the active database globally.
  • CONFIG is global. Changing settings like max_processes affects every subsequent operation in the process.
  • Multiprocessing uses fork(). The C++ backend parallelizes via fork() with shared memory (mmap) and semaphores. This is transparent to the caller, but it means PyMisha should not be used inside already-forked worker processes or in a process that relies on fork-unsafe libraries.
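
If an application must call PyMisha from a threaded context (for example, a web server handler), one conservative general pattern, not something PyMisha provides, is to funnel every call through a single process-wide lock so calls never overlap. This only prevents concurrent entry: all threads still share the one active database and the global CONFIG, and it does nothing about the fork() caveat.

```python
import functools
import threading

# One lock for the whole process, shared by every wrapped entry point.
_PYMISHA_LOCK = threading.Lock()


def serialized(func):
    """Wrap func so that at most one wrapped call runs at a time in this process."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        with _PYMISHA_LOCK:
            return func(*args, **kwargs)
    return wrapper
```

For example, `safe_extract = serialized(pm.gextract)` and then call `safe_extract(...)` wherever `pm.gextract(...)` was used. Every PyMisha function that threads touch, including gsetroot, needs the same wrapper, since a single unwrapped call can still overlap with a wrapped one.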

Examples

Creating a genome database

PyMisha can download prebuilt genome databases for common assemblies and set one up with a single call:

import pymisha as pm

# Download a prebuilt genome (mm9, mm10, mm39, hg19, hg38)
pm.gdb_create_genome("hg38", path="/data/genomes")   # creates /data/genomes/hg38/
pm.gsetroot("/data/genomes/hg38")

pm.gchrom_sizes()  # verify it worked

To build a database from your own FASTA files (e.g. a custom assembly):

pm.gdb_create("/data/my_genome", "genome.fa.gz", verbose=True)
pm.gsetroot("/data/my_genome")

See the Creating Genome Databases tutorial for UCSC download workflows and advanced options.
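
When building from your own FASTA, it can help to know which sequence names and lengths the file contains before running gdb_create, and to compare them afterwards with pm.gchrom_sizes(). A minimal stdlib reader is sketched below. It is not part of PyMisha, and it assumes each sequence is named by the first whitespace-delimited word of its header, which may not match how gdb_create names chromosomes.

```python
import gzip
from typing import Dict


def fasta_sizes(path: str) -> Dict[str, int]:
    """Return {sequence name: length} for a plain or gzip-compressed FASTA file."""
    opener = gzip.open if path.endswith(".gz") else open
    sizes: Dict[str, int] = {}
    name = None
    with opener(path, "rt") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            if line.startswith(">"):
                # Name = first whitespace-delimited token of the header line.
                parts = line[1:].split()
                name = parts[0] if parts else ""
                sizes[name] = 0
            elif name is not None:
                sizes[name] += len(line)
    return sizes
```

Usage: `fasta_sizes("genome.fa.gz")` streams the file line by line, so even large assemblies are not loaded into memory at once.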

Optional dependencies

  • pyBigWig: For BigWig import in gtrack_import.
  • pyreadr + Rscript: For loading R-serialized big interval sets.
  • PyYAML: For richer gdataset_info metadata parsing.
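
To see which of these optional features are usable in the current environment, a quick stdlib check can probe for the modules and for the Rscript executable. This is a convenience sketch, not a PyMisha function; note that PyYAML is imported as yaml.

```python
import importlib.util
import shutil
from typing import Dict

# Map each optional dependency (as named above) to its import name.
OPTIONAL_MODULES = {"pyBigWig": "pyBigWig", "pyreadr": "pyreadr", "PyYAML": "yaml"}


def optional_features() -> Dict[str, bool]:
    """Report which optional dependencies are importable / on PATH."""
    status = {
        name: importlib.util.find_spec(module) is not None
        for name, module in OPTIONAL_MODULES.items()
    }
    # Loading R-serialized interval sets also needs the Rscript executable.
    status["Rscript"] = shutil.which("Rscript") is not None
    return status


if __name__ == "__main__":
    for name, ok in optional_features().items():
        print(f"{name:10s} {'found' if ok else 'missing'}")
```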

Using pymisha with an LLM agent

LLM coding agents (Claude Code, Copilot, Cursor) that write pymisha analysis code can pre-load the reference docs in the repository's agent-guides/ directory into their context, to cut down on hallucinated APIs and get more idiomatic recipes.

For agents that fetch context by URL (rather than from a cloned repo), drop these raw URLs into the system prompt:

https://raw.githubusercontent.com/tanaylab/pymisha/main/agent-guides/pymisha-core.md
https://raw.githubusercontent.com/tanaylab/pymisha/main/agent-guides/pymisha-advanced.md
https://raw.githubusercontent.com/tanaylab/pymisha/main/agent-guides/pymisha-anti-patterns.md
https://raw.githubusercontent.com/tanaylab/pymisha/main/agent-guides/skills/importing-tracks/SKILL.md

The guides mirror the equivalent set in R misha, with the same section numbering and the same recipes translated to the pymisha API.
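
For tools that take a pre-built prompt rather than fetching URLs themselves, a small stdlib script can download the guides and concatenate them. This is a sketch: how the resulting text is handed to a particular agent is tool-specific and not shown, and because the URLs point at the main branch, the content can change between releases.

```python
import urllib.request
from typing import Callable, Iterable

BASE = "https://raw.githubusercontent.com/tanaylab/pymisha/main/agent-guides/"
GUIDE_URLS = [
    BASE + "pymisha-core.md",
    BASE + "pymisha-advanced.md",
    BASE + "pymisha-anti-patterns.md",
    BASE + "skills/importing-tracks/SKILL.md",
]


def fetch_text(url: str, timeout: float = 30.0) -> str:
    """Download a URL and decode it as UTF-8."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


def build_context(urls: Iterable[str] = GUIDE_URLS,
                  fetch: Callable[[str], str] = fetch_text) -> str:
    """Concatenate the guides, each preceded by a comment naming its source URL."""
    return "\n\n".join(f"<!-- source: {url} -->\n{fetch(url)}" for url in urls)
```

Calling `build_context()` with no arguments downloads all four guides; the `fetch` parameter lets you substitute a cached or offline reader.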

Missing features

Compared to R misha, the following are not yet implemented:

  • Track Arrays: gtrack.array.* and gvtrack.array.slice.
  • Legacy Conversion: gtrack.convert (for migrating old 2D formats).

License

MIT. See LICENSE for details.

Download files

Download the file for your platform.

Source Distribution

pymisha-0.2.3.tar.gz (1.5 MB)

Built Distributions

  • pymisha-0.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.3 MB): CPython 3.12, manylinux (glibc 2.17+), x86-64
  • pymisha-0.2.3-cp312-cp312-macosx_11_0_arm64.whl (1.4 MB): CPython 3.12, macOS 11.0+, ARM64
  • pymisha-0.2.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.2 MB): CPython 3.11, manylinux (glibc 2.17+), x86-64
  • pymisha-0.2.3-cp311-cp311-macosx_11_0_arm64.whl (1.4 MB): CPython 3.11, macOS 11.0+, ARM64
  • pymisha-0.2.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (11.2 MB): CPython 3.10, manylinux (glibc 2.17+), x86-64

File details

Details for the file pymisha-0.2.3.tar.gz.

File metadata

  • Download URL: pymisha-0.2.3.tar.gz
  • Upload date:
  • Size: 1.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pymisha-0.2.3.tar.gz
Algorithm Hash digest
SHA256 043eb3c9e165421401af60abbe9c5287e30768084cbac8577f04fb21bcd6a286
MD5 b0d0e0eea0b999b3bac7f636434a50c1
BLAKE2b-256 7d5dc7829fc9d67a6cd020d9430a8eab3f4e8b2ae4aed6dbb33ae469fd08d69e
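
To check a manually downloaded sdist against the SHA256 digest above, hashlib is enough; pip's hash-checking mode (--require-hashes) can enforce the same digests at install time. The helper names below are illustrative, and the file path is whatever you saved the download as.

```python
import hashlib

# SHA256 published above for pymisha-0.2.3.tar.gz.
SDIST_SHA256 = "043eb3c9e165421401af60abbe9c5287e30768084cbac8577f04fb21bcd6a286"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large downloads are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected: str = SDIST_SHA256) -> bool:
    """True if the file's SHA256 equals the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

Usage: `matches("pymisha-0.2.3.tar.gz")` returns False if the file was corrupted or altered in transit. For a wheel, pass the SHA256 listed in that wheel's own hash table as `expected`.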

Provenance

The following attestation bundles were made for pymisha-0.2.3.tar.gz:

Publisher: publish.yml on tanaylab/pymisha

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pymisha-0.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for pymisha-0.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 b883f67f24ea9add8ef60b53ccece9a28c57efceff851372713c465bbc2f366d
MD5 41a418722134432d66f3fa76e4f4b04d
BLAKE2b-256 70ba7587c90de03f5690d00ddd80acb52ee43b964984d7ce026a1165ed4fbd28

Provenance

The following attestation bundles were made for pymisha-0.2.3-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on tanaylab/pymisha

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pymisha-0.2.3-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for pymisha-0.2.3-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 fb71a42a29a535f0e880dd4933078464bf49b95ade37d3de5bae6d6d8010264a
MD5 587c84dca77d269c999a0c7c60cc3cb2
BLAKE2b-256 3f852cfff4cece231829aeb60692a69af18a28d14cda0665f716f613c040a6d9

Provenance

The following attestation bundles were made for pymisha-0.2.3-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: publish.yml on tanaylab/pymisha

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pymisha-0.2.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for pymisha-0.2.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 a082808ade92e55f7fafc5b03477a80e0657d0d93efaad0201b38e62f76035c9
MD5 44d685c7b307c25e1b8bf6c340952a5c
BLAKE2b-256 5cba994b3fb2b9c36ab1172ab86b0733715d446c45cc16872f6a8423a97fe01a

Provenance

The following attestation bundles were made for pymisha-0.2.3-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on tanaylab/pymisha

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pymisha-0.2.3-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for pymisha-0.2.3-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 dbd5e6e020c5cb606a68c801d8926117c1a427a36026367e83d33562499da335
MD5 7a13766e77f094d99b73c27601d634e0
BLAKE2b-256 694243d32f5143a183277f0acd45f2d101fbc8b03372888c99ebf23edc0a3569

Provenance

The following attestation bundles were made for pymisha-0.2.3-cp311-cp311-macosx_11_0_arm64.whl:

Publisher: publish.yml on tanaylab/pymisha

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pymisha-0.2.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for pymisha-0.2.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c36114652308ca20ff7055cf58151c8a5af454ec98e5737b148af15a4f3dd42c
MD5 a468f0fae99300a0c8c06eadad3a8181
BLAKE2b-256 ff5885ee40ba8dfb55f81efcfa8fc452d724a4d220b33dd12a093f3651c5c05a

Provenance

The following attestation bundles were made for pymisha-0.2.3-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on tanaylab/pymisha

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
