Python interface for misha genomic databases with C++ streaming backends

PyMisha

Python interface for misha genomic databases. PyMisha provides full read/write access to misha track databases with C++ streaming backends for genome-scale operations.

Features

  • 1D and 2D track support: Dense, sparse, and 2D (rectangle/point) tracks with full CRUD operations.
  • C++ streaming backends: Extraction, summary, quantiles, distribution, lookup, segmentation, Wilcoxon tests, correlation, and sampling all stream through C++ for performance.
  • Virtual tracks: Computed-on-the-fly track views with filtering, shifting, and 30+ aggregation functions.
  • Interval operations: Union, intersection, difference, canonicalization, neighbors, annotation, normalization, random generation, and liftover.
  • Sequence analysis: Extraction, k-mer counting, PWM/PSSM scoring, and Markov-chain synthesis (gsynth).
  • Database management: Create, link, convert, and manage misha-compatible genomic databases.
  • R misha compatibility: Reads and writes the same on-disk formats as R misha (123 of the 145 functions exported by R misha are covered).
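
To make the interval operations listed above concrete, here is a minimal pure-Python sketch of union and intersection on half-open (chrom, start, end) tuples. It is a conceptual illustration only and does not call PyMisha or claim to mirror its implementation; the function names `union` and `intersection`, the tuple layout, and the choice to merge touching intervals are assumptions made for the example.

```python
from typing import Iterable, List, Tuple

# Hypothetical representation: (chrom, start, end), half-open [start, end).
Interval = Tuple[str, int, int]


def union(intervals: Iterable[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals per chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the previous interval instead of starting a new one.
            prev = merged[-1]
            merged[-1] = (chrom, prev[1], max(prev[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def intersection(a: Iterable[Interval], b: Iterable[Interval]) -> List[Interval]:
    """Return the regions covered by both interval sets."""
    a_u, b_u = union(a), union(b)
    out: List[Interval] = []
    i = j = 0
    # Sweep both sorted, merged lists in parallel.
    while i < len(a_u) and j < len(b_u):
        chrom_a, start_a, end_a = a_u[i]
        chrom_b, start_b, end_b = b_u[j]
        if chrom_a != chrom_b:
            # Advance whichever list is behind in chromosome order.
            if chrom_a < chrom_b:
                i += 1
            else:
                j += 1
            continue
        lo, hi = max(start_a, start_b), min(end_a, end_b)
        if lo < hi:
            out.append((chrom_a, lo, hi))
        # Drop the interval that ends first.
        if end_a <= end_b:
            i += 1
        else:
            j += 1
    return out
```

For real data, the library's own interval functions are the right tool; the sketch only shows what the operations mean.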

Installation

pip install pymisha

Pre-built wheels are available for Linux (x86_64) and macOS (x86_64 and arm64), Python 3.10-3.12.
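
If you are unsure whether pip will find a wheel for your machine, a small standard-library check along these lines can help. It simply encodes the platform and Python versions stated above; the helper name `has_prebuilt_wheel` is hypothetical, and the actual set of published wheels may differ between releases.

```python
import platform
import sys

# Platforms and Python versions listed above as having pre-built wheels.
WHEEL_PLATFORMS = {("Linux", "x86_64"), ("Darwin", "x86_64"), ("Darwin", "arm64")}
WHEEL_PYTHONS = {(3, 10), (3, 11), (3, 12)}


def has_prebuilt_wheel(system=None, machine=None, python=None) -> bool:
    """Return True if the given (or current) environment matches the list above."""
    system = system or platform.system()
    machine = machine or platform.machine()
    python = tuple(python) if python is not None else tuple(sys.version_info[:2])
    return (system, machine) in WHEEL_PLATFORMS and python in WHEEL_PYTHONS


if __name__ == "__main__":
    if has_prebuilt_wheel():
        print("A pre-built wheel should be available for this environment.")
    else:
        print("No listed wheel matches; pip may need to build from source.")
```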

To install from source (requires a C++17 compiler and numpy):

pip install -e ".[dev]"

Quick start

PyMisha ships with a built-in examples database, so you can start exploring immediately without any external data:

import pymisha as pm

# Option 1: one-liner to load the bundled examples database
pm.gdb_init_examples()

# Option 2: equivalent explicit form
pm.gsetroot(pm.gdb_examples_path())

# List available tracks and extract data
print(pm.gtrack_ls())
print(pm.gextract("dense_track", pm.gintervals("chr1", 0, 1000)))

To connect to your own misha database, use gsetroot:

import pymisha as pm

# Initialize the database
pm.gsetroot("/path/to/misha_db")

# Create intervals and extract data
intervals = pm.gintervals_from_strings(["chr1:0-1000", "chr1:2000-2600"])
out = pm.gextract("track1", intervals, iterator=100)

# Filter and summarize
filtered = pm.gscreen("track1 > 0.5", intervals)
stats = pm.gsummary("track1", intervals)
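
The interval strings and the `iterator=100` argument above can be pictured with a small pure-Python sketch. It does not use PyMisha: `parse_interval` and `fixed_bins` are hypothetical helpers, and the sketch assumes that PyMisha follows R misha's conventions of half-open, zero-based intervals and fixed-size bins aligned to multiples of the bin size.

```python
import re
from typing import Iterator, NamedTuple


class Interval(NamedTuple):
    chrom: str
    start: int  # inclusive, zero-based (assumed)
    end: int    # exclusive (assumed)


_PATTERN = re.compile(r"^(?P<chrom>[^:]+):(?P<start>\d+)-(?P<end>\d+)$")


def parse_interval(text: str) -> Interval:
    """Parse a string such as 'chr1:0-1000' into an Interval."""
    match = _PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a 'chrom:start-end' string: {text!r}")
    start, end = int(match["start"]), int(match["end"])
    if start >= end:
        raise ValueError(f"start must be smaller than end: {text!r}")
    return Interval(match["chrom"], start, end)


def fixed_bins(interval: Interval, size: int) -> Iterator[Interval]:
    """Split an interval into bins whose edges fall on multiples of size."""
    if size <= 0:
        raise ValueError("bin size must be positive")
    pos = interval.start
    while pos < interval.end:
        # Next bin boundary strictly after pos, clipped to the interval end.
        stop = min((pos // size + 1) * size, interval.end)
        yield Interval(interval.chrom, pos, stop)
        pos = stop


if __name__ == "__main__":
    for text in ["chr1:0-1000", "chr1:2000-2600"]:
        bins = list(fixed_bins(parse_interval(text), 100))
        print(text, "->", len(bins), "bins")
```

Under these assumptions the two intervals in the example yield 10 and 6 bins respectively, which is the number of rows one would expect per interval from an extraction with a 100 bp iterator.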

Thread safety

PyMisha inherits R misha's single-threaded design. Keep the following constraints in mind:

  • Not thread-safe. All module-level state (_GROOT, _UROOT, _VTRACKS, CONFIG) is process-global and unsynchronized. Do not call PyMisha from multiple threads concurrently.
  • One database per process. You cannot have two databases open simultaneously; gsetroot() replaces the active database globally.
  • CONFIG is global. Changing settings like max_processes affects every subsequent operation in the process.
  • Multiprocessing uses fork(). The C++ backend parallelizes via fork() with shared memory (mmap) and semaphores. This is transparent to the caller but means PyMisha should not be used inside already-forked worker processes or with fork-unsafe libraries.
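
If an application already uses threads and cannot avoid calling PyMisha from more than one of them, one defensive pattern is to funnel every call through a single lock. The sketch below is a generic standard-library pattern, not a PyMisha feature: the `serialized` decorator and the lock name are hypothetical, and it only guarantees that wrapped calls never overlap. It does not lift the one-database-per-process limit or make fork() safe in a multi-threaded program.

```python
import threading
from functools import wraps

# One process-wide lock shared by every wrapped call (hypothetical name).
_PYMISHA_LOCK = threading.RLock()


def serialized(func):
    """Wrap func so that only one thread at a time can execute it."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        with _PYMISHA_LOCK:
            return func(*args, **kwargs)
    return wrapper


# Intended usage (requires pymisha to be installed):
#   import pymisha as pm
#   safe_extract = serialized(pm.gextract)
#   safe_extract("dense_track", pm.gintervals("chr1", 0, 1000))
```

A reentrant lock is used so that a wrapped function can call another wrapped function from the same thread without deadlocking. Where possible, the simpler option is to keep all PyMisha calls on one thread.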

Examples

Creating a genome database

PyMisha can download prebuilt genome databases for common assemblies and set them up with a single call:

import pymisha as pm

# Download a prebuilt genome (mm9, mm10, mm39, hg19, hg38)
pm.gdb_create_genome("hg38", path="/data/genomes")   # creates /data/genomes/hg38/
pm.gsetroot("/data/genomes/hg38")

pm.gchrom_sizes()  # verify it worked

To build a database from your own FASTA files (e.g. a custom assembly):

pm.gdb_create("/data/my_genome", "genome.fa.gz", verbose=True)
pm.gsetroot("/data/my_genome")

See the Creating Genome Databases tutorial for UCSC download workflows and advanced options.

Optional dependencies

  • pyBigWig: For BigWig import in gtrack_import.
  • pyreadr + Rscript: For loading R-serialized big interval sets.
  • PyYAML: For richer gdataset_info metadata parsing.
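
To see which of these optional dependencies are present in the current environment, a standard-library check like the following can be used. The function name `optional_dependency_status` is hypothetical; the import names (`pyBigWig`, `pyreadr`, and `yaml` for PyYAML) are those of the respective packages, and `Rscript` is looked up on the PATH.

```python
import importlib.util
import shutil


def optional_dependency_status() -> dict:
    """Report which optional dependencies are available, without importing them."""
    return {
        "pyBigWig": importlib.util.find_spec("pyBigWig") is not None,
        "pyreadr": importlib.util.find_spec("pyreadr") is not None,
        "Rscript": shutil.which("Rscript") is not None,
        "PyYAML": importlib.util.find_spec("yaml") is not None,
    }


if __name__ == "__main__":
    for name, available in optional_dependency_status().items():
        print(f"{name}: {'found' if available else 'missing'}")
```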

Missing features

Compared to R misha, the following are not yet implemented:

  • Track Arrays: gtrack.array.* and gvtrack.array.slice.
  • Legacy Conversion: gtrack.convert (for migrating old 2D formats).

License

MIT. See LICENSE for details.
