Skip to main content

Python interface for misha genomic databases with C++ streaming backends

Project description

PyMisha

PyPI CI

Python interface for misha genomic databases. PyMisha provides full read/write access to misha track databases with C++ streaming backends for genome-scale operations.

PyMisha

Features

  • 1D and 2D track support: Dense, sparse, and 2D (rectangle/point) tracks with full CRUD operations.
  • C++ streaming backends: Extraction, summary, quantiles, distribution, lookup, segmentation, Wilcoxon tests, correlation, and sampling all stream through C++ for performance.
  • Virtual tracks: Computed-on-the-fly track views with filtering, shifting, and 30+ aggregation functions.
  • Interval operations: Union, intersection, difference, canonicalization, neighbors, annotation, normalization, random generation, and liftover.
  • Sequence analysis: Extraction, k-mer counting, PWM/PSSM scoring, and Markov-chain synthesis (gsynth).
  • Database management: Create, link, convert, and manage misha-compatible genomic databases.
  • R misha compatibility: Reads and writes the same on-disk formats as R misha (123/145 R exports covered).

Installation

pip install pymisha

Pre-built wheels are available for Linux (x86_64) and macOS (x86_64 and arm64), Python 3.10-3.12.

To install from source (requires a C++17 compiler and numpy):

pip install -e ".[dev]"

Quick start

PyMisha ships with a built-in examples database so you can start exploring immediately -- no external data needed:

import pymisha as pm

# Option 1: one-liner to load the bundled examples database
pm.gdb_init_examples()

# Option 2: equivalent explicit form
pm.gsetroot(pm.gdb_examples_path())

# List available tracks and extract data
print(pm.gtrack_ls())
print(pm.gextract("dense_track", pm.gintervals("chr1", 0, 1000)))

To connect to your own misha database, use gsetroot:

import pymisha as pm

# Initialize the database
pm.gsetroot("/path/to/misha_db")

# Create intervals and extract data
intervals = pm.gintervals_from_strings(["chr1:0-1000", "chr1:2000-2600"])
out = pm.gextract("track1", intervals, iterator=100)

# Filter and summarize
filtered = pm.gscreen("track1 > 0.5", intervals)
stats = pm.gsummary("track1", intervals)

Thread safety

PyMisha inherits R misha's single-threaded design. Keep the following constraints in mind:

  • Not thread-safe. All module-level state (_GROOT, _UROOT, _VTRACKS, CONFIG) is process-global and unsynchronized. Do not call PyMisha from multiple threads concurrently.
  • One database per process. You cannot have two databases open simultaneously; gsetroot() replaces the active database globally.
  • CONFIG is global. Changing settings like max_processes affects every subsequent operation in the process.
  • Multiprocessing uses fork(). The C++ backend parallelizes via fork() with shared memory (mmap) and semaphores. This is transparent to the caller but means PyMisha should not be used inside already-forked worker processes or with fork-unsafe libraries.

Examples

Using the built-in example database:

import pymisha as pm

# Quickest way to get started
pm.gdb_init_examples()

# Or equivalently, using gsetroot with the examples path
pm.gsetroot(pm.gdb_examples_path())

print(pm.gtrack_ls())
print(pm.gextract("dense_track", pm.gintervals("chr1", 0, 1000)))

Creating a genome database

PyMisha ships prebuilt genome databases for common assemblies. Download and set up with a single call:

import pymisha as pm

# Download a prebuilt genome (mm9, mm10, mm39, hg19, hg38)
pm.gdb_create_genome("hg38", path="/data/genomes")   # creates /data/genomes/hg38/
pm.gsetroot("/data/genomes/hg38")

pm.gchrom_sizes()  # verify it worked

To build a database from your own FASTA files (e.g. a custom assembly):

pm.gdb_create("/data/my_genome", "genome.fa.gz", verbose=True)
pm.gsetroot("/data/my_genome")

See the Creating Genome Databases tutorial for UCSC download workflows and advanced options.

Optional dependencies

  • pyBigWig: For BigWig import in gtrack_import.
  • pyreadr + Rscript: For loading R-serialized big interval sets.
  • PyYAML: For richer gdataset_info metadata parsing.

Missing features

Compared to R misha, the following are not yet implemented:

  • Track Arrays: gtrack.array.* and gvtrack.array.slice.
  • Legacy Conversion: gtrack.convert (for migrating old 2D formats).

License

MIT. See LICENSE for details.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pymisha-0.1.66.tar.gz (1.3 MB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

pymisha-0.1.66-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (8.5 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.17+ x86-64

pymisha-0.1.66-cp312-cp312-macosx_11_0_arm64.whl (1.2 MB view details)

Uploaded CPython 3.12macOS 11.0+ ARM64

pymisha-0.1.66-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (8.5 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ x86-64

pymisha-0.1.66-cp311-cp311-macosx_11_0_arm64.whl (1.2 MB view details)

Uploaded CPython 3.11macOS 11.0+ ARM64

pymisha-0.1.66-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (8.5 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ x86-64

File details

Details for the file pymisha-0.1.66.tar.gz.

File metadata

  • Download URL: pymisha-0.1.66.tar.gz
  • Upload date:
  • Size: 1.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pymisha-0.1.66.tar.gz
Algorithm Hash digest
SHA256 65b6dc2a0f35c2d0d51bb66dd9c49dcc46269a5d3e1998351ff222aa19ffdb03
MD5 0f046f4bdf3ed1d1d5dcbc6d35ed38aa
BLAKE2b-256 acd459187d7f86f0272443b9afe8bc748872c5903dc0e5bbbfee3911e5576e77

See more details on using hashes here.

Provenance

The following attestation bundles were made for pymisha-0.1.66.tar.gz:

Publisher: publish.yml on tanaylab/pymisha

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pymisha-0.1.66-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for pymisha-0.1.66-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 70fe3cd1ba1285f87ecbc87d3a006d93cc51ceaa93b16c6d9da271a89ddc2a8e
MD5 f9c130abcb242a52abcde53f0be85831
BLAKE2b-256 d796c110434d2efd2df5d017cfe5cab41e36fefbcac96bb43ff000d5cbcb1a31

See more details on using hashes here.

Provenance

The following attestation bundles were made for pymisha-0.1.66-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on tanaylab/pymisha

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pymisha-0.1.66-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for pymisha-0.1.66-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 0bdcb271f84927d8374e444b517c31cd91ac3f42812ee354a204d6b10a1c8a75
MD5 f0e5453740ad74634ba5e338bd376773
BLAKE2b-256 3d0aaaaf18b73b5e1d06e7d65bbf7c787b833f177cc4a2be835d1cf5ff3455b0

See more details on using hashes here.

Provenance

The following attestation bundles were made for pymisha-0.1.66-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: publish.yml on tanaylab/pymisha

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pymisha-0.1.66-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for pymisha-0.1.66-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 8196e47177ed521d282237aefcdf56d048d362839731ce39c938f0244a77f464
MD5 37cd550dc264382ebe0a1aa1e0a67e86
BLAKE2b-256 b0f20ddd623afeb02e036b492c763f25468fb3f9648b026c8cf859fa3716dbfb

See more details on using hashes here.

Provenance

The following attestation bundles were made for pymisha-0.1.66-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on tanaylab/pymisha

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pymisha-0.1.66-cp311-cp311-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for pymisha-0.1.66-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 4282ef1deb2b808ce0007e864d629077fa9443f1ce33469fe7fd730ed2705f7f
MD5 600a2e1b08b82f3004594771ee0d85ce
BLAKE2b-256 74ad2d521f3ade6f69f3d3c8b394bc853a98d692802c0d61818c9a7266aa99d8

See more details on using hashes here.

Provenance

The following attestation bundles were made for pymisha-0.1.66-cp311-cp311-macosx_11_0_arm64.whl:

Publisher: publish.yml on tanaylab/pymisha

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pymisha-0.1.66-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for pymisha-0.1.66-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 fd588e60b84925e48923ecf3d37a67b23a37c42cd85b8bae9491838e243742fe
MD5 b32424b345bdde53f9de99e87596a392
BLAKE2b-256 1565e42d3adcc269865e3c6a948938ef31b9498c739a84349d6b8ddc11e1c372

See more details on using hashes here.

Provenance

The following attestation bundles were made for pymisha-0.1.66-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on tanaylab/pymisha

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page