Skip to main content

Python interface for misha genomic databases with C++ streaming backends

Project description

PyMisha

PyPI CI

Python interface for misha genomic databases. PyMisha provides full read/write access to misha track databases with C++ streaming backends for genome-scale operations.

PyMisha

Features

  • 1D and 2D track support: Dense, sparse, and 2D (rectangle/point) tracks with full CRUD operations.
  • C++ streaming backends: Extraction, summary, quantiles, distribution, lookup, segmentation, Wilcoxon tests, correlation, and sampling all stream through C++ for performance.
  • Virtual tracks: Computed-on-the-fly track views with filtering, shifting, and 30+ aggregation functions.
  • Interval operations: Union, intersection, difference, canonicalization, neighbors, annotation, normalization, random generation, and liftover.
  • Sequence analysis: Extraction, k-mer counting, PWM/PSSM scoring, and Markov-chain synthesis (gsynth).
  • Database management: Create, link, convert, and manage misha-compatible genomic databases.
  • R misha compatibility: Reads and writes the same on-disk formats as R misha (123/145 R exports covered).

Installation

pip install pymisha

Pre-built wheels are available for Linux (x86_64) and macOS (x86_64 and arm64), Python 3.10-3.12.

To install from source (requires a C++17 compiler and numpy):

pip install -e ".[dev]"

Quick start

PyMisha ships with a built-in examples database so you can start exploring immediately -- no external data needed:

import pymisha as pm

# Option 1: one-liner to load the bundled examples database
pm.gdb_init_examples()

# Option 2: equivalent explicit form
pm.gsetroot(pm.gdb_examples_path())

# List available tracks and extract data
print(pm.gtrack_ls())
print(pm.gextract("dense_track", pm.gintervals("chr1", 0, 1000)))

To connect to your own misha database, use gsetroot:

import pymisha as pm

# Initialize the database
pm.gsetroot("/path/to/misha_db")

# Create intervals and extract data
intervals = pm.gintervals_from_strings(["chr1:0-1000", "chr1:2000-2600"])
out = pm.gextract("track1", intervals, iterator=100)

# Filter and summarize
filtered = pm.gscreen("track1 > 0.5", intervals)
stats = pm.gsummary("track1", intervals)

Thread safety

PyMisha inherits R misha's single-threaded design. Keep the following constraints in mind:

  • Not thread-safe. All module-level state (_GROOT, _UROOT, _VTRACKS, CONFIG) is process-global and unsynchronized. Do not call PyMisha from multiple threads concurrently.
  • One database per process. You cannot have two databases open simultaneously; gsetroot() replaces the active database globally.
  • CONFIG is global. Changing settings like max_processes affects every subsequent operation in the process.
  • Multiprocessing uses fork(). The C++ backend parallelizes via fork() with shared memory (mmap) and semaphores. This is transparent to the caller but means PyMisha should not be used inside already-forked worker processes or with fork-unsafe libraries.

Examples

Using the built-in example database:

import pymisha as pm

# Quickest way to get started
pm.gdb_init_examples()

# Or equivalently, using gsetroot with the examples path
pm.gsetroot(pm.gdb_examples_path())

print(pm.gtrack_ls())
print(pm.gextract("dense_track", pm.gintervals("chr1", 0, 1000)))

Creating a genome database

PyMisha ships prebuilt genome databases for common assemblies. Download and set up with a single call:

import pymisha as pm

# Download a prebuilt genome (mm9, mm10, mm39, hg19, hg38)
pm.gdb_create_genome("hg38", path="/data/genomes")   # creates /data/genomes/hg38/
pm.gsetroot("/data/genomes/hg38")

pm.gchrom_sizes()  # verify it worked

To build a database from your own FASTA files (e.g. a custom assembly):

pm.gdb_create("/data/my_genome", "genome.fa.gz", verbose=True)
pm.gsetroot("/data/my_genome")

See the Creating Genome Databases tutorial for UCSC download workflows and advanced options.

Optional dependencies

  • pyBigWig: For BigWig import in gtrack_import.
  • pyreadr + Rscript: For loading R-serialized big interval sets.
  • PyYAML: For richer gdataset_info metadata parsing.

Missing features

Compared to R misha, the following are not yet implemented:

  • Track Arrays: gtrack.array.* and gvtrack.array.slice.
  • Legacy Conversion: gtrack.convert (for migrating old 2D formats).

License

MIT. See LICENSE for details.

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pymisha-0.1.62.tar.gz (1.3 MB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

pymisha-0.1.62-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (8.4 MB view details)

Uploaded CPython 3.12manylinux: glibc 2.17+ x86-64

pymisha-0.1.62-cp312-cp312-macosx_11_0_arm64.whl (1.2 MB view details)

Uploaded CPython 3.12macOS 11.0+ ARM64

pymisha-0.1.62-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (8.4 MB view details)

Uploaded CPython 3.11manylinux: glibc 2.17+ x86-64

pymisha-0.1.62-cp311-cp311-macosx_11_0_arm64.whl (1.2 MB view details)

Uploaded CPython 3.11macOS 11.0+ ARM64

pymisha-0.1.62-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (8.4 MB view details)

Uploaded CPython 3.10manylinux: glibc 2.17+ x86-64

File details

Details for the file pymisha-0.1.62.tar.gz.

File metadata

  • Download URL: pymisha-0.1.62.tar.gz
  • Upload date:
  • Size: 1.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pymisha-0.1.62.tar.gz
Algorithm Hash digest
SHA256 6e64884f5938dceebfe4ffd81c99a12c899404cb35b918723ccb5577e6c7bb0d
MD5 ecc870906060eff3af00cc5852f34942
BLAKE2b-256 6eee8f9aba5e8eaa0c213f73051f145f7cfe7f567ee86f38ea61177b61cd5a86

See more details on using hashes here.

Provenance

The following attestation bundles were made for pymisha-0.1.62.tar.gz:

Publisher: publish.yml on tanaylab/pymisha

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pymisha-0.1.62-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for pymisha-0.1.62-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 1f9923a30d3612f7b637367a4027b93c9df8c37f7c4e547f83b2030d493908c3
MD5 5af89e3484755cc9f102c9aa97a0c832
BLAKE2b-256 34e4c1e51143d76e2331efe4d8a987958223ae2411f516b96563f3be548966d3

See more details on using hashes here.

Provenance

The following attestation bundles were made for pymisha-0.1.62-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on tanaylab/pymisha

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pymisha-0.1.62-cp312-cp312-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for pymisha-0.1.62-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 e9a7b3d672c57bad938d0bb73142b75a0a3657dbb7515de0fbf018d076f4b31d
MD5 03db727ed958291a3c71cc41603dbeca
BLAKE2b-256 787eade25b3b1f4961980275dd97e5623d9da57b8556ec3bbf41801bc353420a

See more details on using hashes here.

Provenance

The following attestation bundles were made for pymisha-0.1.62-cp312-cp312-macosx_11_0_arm64.whl:

Publisher: publish.yml on tanaylab/pymisha

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pymisha-0.1.62-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for pymisha-0.1.62-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 3eb0d4c89721410dd3f597617e1d168cd252958343b6a1da7707c44fa5ed71e6
MD5 71da981c44e258c8368a29cf627c966a
BLAKE2b-256 a152cf67906074edeaec8198cdde7cc273b3bda80b26fdd837957f18e0dee59f

See more details on using hashes here.

Provenance

The following attestation bundles were made for pymisha-0.1.62-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on tanaylab/pymisha

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pymisha-0.1.62-cp311-cp311-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for pymisha-0.1.62-cp311-cp311-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 e8d4d293406dfb1b785e06551365ac4dd68983a93ed2a16f15dfd8a1147288c3
MD5 9710164a9bc5d84c2d4f94c95606e61e
BLAKE2b-256 2ae93bad3deb5bb8dec0839c21bc7cad54f66b9f844354ae52d752438d36cb1d

See more details on using hashes here.

Provenance

The following attestation bundles were made for pymisha-0.1.62-cp311-cp311-macosx_11_0_arm64.whl:

Publisher: publish.yml on tanaylab/pymisha

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pymisha-0.1.62-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for pymisha-0.1.62-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 c951345b3bbbd9b8df29fffb6bedc1eb23f8475e30701d7a121bc2305d662dce
MD5 591b1b7d3ec4d5fe6a23d56f9e759d02
BLAKE2b-256 ef70adcc3b0588357e3bcbc13da196d6a8809f9f31813447d381c5f7551e6c02

See more details on using hashes here.

Provenance

The following attestation bundles were made for pymisha-0.1.62-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: publish.yml on tanaylab/pymisha

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page