Python interface for misha genomic databases with C++ streaming backends

PyMisha

Python interface for misha genomic databases. PyMisha provides full read/write access to misha track databases with C++ streaming backends for genome-scale operations.

Features

  • 1D and 2D track support: Dense, sparse, and 2D (rectangle/point) tracks with full CRUD operations.
  • C++ streaming backends: Extraction, summary, quantiles, distribution, lookup, segmentation, Wilcoxon tests, correlation, and sampling all stream through C++ for performance.
  • Virtual tracks: Computed-on-the-fly track views with filtering, shifting, and 30+ aggregation functions.
  • Interval operations: Union, intersection, difference, canonicalization, neighbors, annotation, normalization, random generation, and liftover.
  • Sequence analysis: Extraction, k-mer counting, PWM/PSSM scoring, and Markov-chain synthesis (gsynth).
  • Database management: Create, link, convert, and manage misha-compatible genomic databases.
  • R misha compatibility: Reads and writes the same on-disk formats as R misha (123 of 145 R exports covered).

Installation

pip install pymisha

Pre-built wheels are available for Linux (x86_64) and macOS (x86_64 and arm64), Python 3.10-3.12.

To install from source, run the following from a checkout of the repository (requires a C++17 compiler and numpy):

pip install -e ".[dev]"

Quick start

PyMisha ships with a built-in examples database so you can start exploring immediately -- no external data needed:

import pymisha as pm

# Option 1: one-liner to load the bundled examples database
pm.gdb_init_examples()

# Option 2: equivalent explicit form
pm.gsetroot(pm.gdb_examples_path())

# List available tracks and extract data
print(pm.gtrack_ls())
print(pm.gextract("dense_track", pm.gintervals("chr1", 0, 1000)))

To connect to your own misha database, use gsetroot:

import pymisha as pm

# Initialize the database
pm.gsetroot("/path/to/misha_db")

# Create intervals and extract data
intervals = pm.gintervals_from_strings(["chr1:0-1000", "chr1:2000-2600"])
out = pm.gextract("track1", intervals, iterator=100)

# Filter and summarize
filtered = pm.gscreen("track1 > 0.5", intervals)
stats = pm.gsummary("track1", intervals)
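
The interval strings passed to gintervals_from_strings above follow a chrom:start-end pattern. The sketch below uses only the standard library to show how such strings break down into a chromosome name and two coordinates. It is not PyMisha's parser: the Interval type, the parse_interval helper, and the start < end check are illustrative assumptions.

```python
import re
from typing import NamedTuple


class Interval(NamedTuple):
    """Hypothetical container for one 1D interval (not a PyMisha type)."""
    chrom: str
    start: int
    end: int


# chrom is everything before the colon; start and end are non-negative integers.
_INTERVAL_RE = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_interval(text: str) -> Interval:
    """Split a 'chrom:start-end' string into its three parts."""
    match = _INTERVAL_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a 'chrom:start-end' string: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    # Assumption for this sketch: an interval must have positive length.
    if start >= end:
        raise ValueError(f"start must be smaller than end: {text!r}")
    return Interval(match["chrom"], start, end)


intervals = [parse_interval(s) for s in ["chr1:0-1000", "chr1:2000-2600"]]
for interval in intervals:
    print(interval.chrom, interval.start, interval.end)
```

A check like this can catch malformed strings early, before they reach the database layer.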

Thread safety

PyMisha inherits R misha's single-threaded design. Keep the following constraints in mind:

  • Not thread-safe. All module-level state (_GROOT, _UROOT, _VTRACKS, CONFIG) is process-global and unsynchronized. Do not call PyMisha from multiple threads concurrently.
  • One database per process. You cannot have two databases open simultaneously; gsetroot() replaces the active database globally.
  • CONFIG is global. Changing settings like max_processes affects every subsequent operation in the process.
  • Multiprocessing uses fork(). The C++ backend parallelizes via fork() with shared memory (mmap) and semaphores. This is transparent to the caller but means PyMisha should not be used inside already-forked worker processes or with fork-unsafe libraries.
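
One way to respect these constraints in a program that already uses threads is to route every PyMisha call through a single lock and to restore any global setting after changing it. The sketch below shows the pattern with stand-ins. The serialized decorator and temporary_setting helper are hypothetical names, and the code assumes that CONFIG behaves like a dict, which the text above does not confirm. A lock only prevents concurrent calls. It does not address the fork() caveat.

```python
import threading
from contextlib import contextmanager
from functools import wraps

# One re-entrant lock shared by every wrapped call in the process.
_PYMISHA_LOCK = threading.RLock()


def serialized(func):
    """Run func while holding the shared lock, so calls never overlap."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        with _PYMISHA_LOCK:
            return func(*args, **kwargs)
    return wrapper


@contextmanager
def temporary_setting(config, key, value):
    """Set config[key] for the duration of a with-block, then restore it.

    Assumes config is dict-like (supports get, item assignment, and del).
    """
    missing = object()
    with _PYMISHA_LOCK:
        previous = config.get(key, missing)
        config[key] = value
        try:
            yield config
        finally:
            if previous is missing:
                del config[key]
            else:
                config[key] = previous


# Stand-ins for demonstration; a real program would pass PyMisha's CONFIG
# and wrap calls such as pm.gextract instead.
fake_config = {"max_processes": 8}


@serialized
def fake_query():
    return fake_config["max_processes"]


with temporary_setting(fake_config, "max_processes", 1):
    inside = fake_query()
after = fake_query()
print(inside, after)
```

The lock is re-entrant so that a wrapped call can run inside a temporary_setting block on the same thread without deadlocking.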

Examples

Using the built-in example database:

import pymisha as pm

# Quickest way to get started
pm.gdb_init_examples()

# Or equivalently, using gsetroot with the examples path
pm.gsetroot(pm.gdb_examples_path())

print(pm.gtrack_ls())
print(pm.gextract("dense_track", pm.gintervals("chr1", 0, 1000)))

Creating a genome database

PyMisha can download prebuilt genome databases for common assemblies and set them up with a single call:

import pymisha as pm

# Download a prebuilt genome (mm9, mm10, mm39, hg19, hg38)
pm.gdb_create_genome("hg38", path="/data/genomes")   # creates /data/genomes/hg38/
pm.gsetroot("/data/genomes/hg38")

pm.gchrom_sizes()  # verify it worked
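
Because the database lands in a subdirectory named after the assembly, a small helper can validate the assembly name and compute the root to pass to gsetroot. This is a sketch: PREBUILT_ASSEMBLIES simply copies the names listed in the comment above, and genome_root is a hypothetical helper, not part of PyMisha.

```python
from pathlib import PurePosixPath

# Assembly names taken from the comment in the example above.
PREBUILT_ASSEMBLIES = ("mm9", "mm10", "mm39", "hg19", "hg38")


def genome_root(assembly: str, base_dir: str) -> str:
    """Return the directory expected to hold the database for an assembly."""
    if assembly not in PREBUILT_ASSEMBLIES:
        raise ValueError(
            f"unknown assembly {assembly!r}; expected one of {PREBUILT_ASSEMBLIES}"
        )
    return str(PurePosixPath(base_dir) / assembly)


root = genome_root("hg38", "/data/genomes")
print(root)
```

The returned string could then be passed to pm.gsetroot after pm.gdb_create_genome has finished.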

To build a database from your own FASTA files (e.g. a custom assembly):

pm.gdb_create("/data/my_genome", "genome.fa.gz", verbose=True)
pm.gsetroot("/data/my_genome")

See the Creating Genome Databases tutorial for UCSC download workflows and advanced options.

Optional dependencies

  • pyBigWig: For BigWig import in gtrack_import.
  • pyreadr + Rscript: For loading R-serialized big interval sets.
  • PyYAML: For richer gdataset_info metadata parsing.
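
To see which of these optional pieces are present in the current environment, a quick check with the standard library is enough. The sketch below is not a PyMisha function, and optional_dependency_status is a hypothetical helper. It relies on PyYAML being imported under the module name yaml and on Rscript being found on PATH.

```python
import importlib.util
import shutil


def optional_dependency_status() -> dict[str, bool]:
    """Report which optional dependencies can be found, without importing them."""
    return {
        "pyBigWig": importlib.util.find_spec("pyBigWig") is not None,
        "pyreadr": importlib.util.find_spec("pyreadr") is not None,
        # Rscript is an executable, not a Python module, so look it up on PATH.
        "Rscript": shutil.which("Rscript") is not None,
        # PyYAML installs the importable module "yaml".
        "PyYAML": importlib.util.find_spec("yaml") is not None,
    }


status = optional_dependency_status()
for name, available in status.items():
    print(f"{name}: {'available' if available else 'missing'}")
```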

Missing features

Compared to R misha, the following are not yet implemented:

  • Track Arrays: gtrack.array.* and gvtrack.array.slice.
  • Legacy Conversion: gtrack.convert (for migrating old 2D formats).

License

MIT. See LICENSE for details.
