Search, download, and load SCMORA .h5mu datasets from Hugging Face.

scmora-db

scmora-db is a small Python package for searching, downloading, and loading SCMORA .h5mu datasets from the Hugging Face dataset repository shiny321/genome-db.

The package ships with a lightweight metadata catalog. Large .h5mu files stay on Hugging Face and are downloaded only when requested.

Installation

Python 3.10 or newer is required.

pip install scmora-db

This installs everything needed to search, download, and load .h5mu files as MuData objects.

For local development:

cd path/to/pkg
pip install -e ".[dev]"

Python API

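from scmora_db import search_datasets, download_datasets, load_datasets

# Look up catalog entries by filter.
catalog = search_datasets(dataset_id="GSM5085810_GM12878_rep1")

# Download the datasets that match the given filters and get their local paths.
paths = download_datasets(
    detailed_condition="Control",
    usage_tag="control",
)

# Load a dataset as a MuData object; backed="r" opens it in read-only backed mode.
mdata = load_datasets(
    dataset_id="GSM5085810_GM12878_rep1",
    backed="r",
)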

Supported filters:

  • dataset_id
  • dataset_uid
  • gse_id
  • detailed_condition
  • usage_tag
  • detail_source
  • condition
  • sample_type
  • species
  • reference

dataset_uid is the safest identifier to filter on because it is unique. It is formatted as gse_id/dataset_id.
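
Because dataset_uid joins two identifiers with a slash, it can be split back into its parts. The following is a minimal sketch, assuming a uid contains exactly one slash. The helper split_dataset_uid is hypothetical and not part of scmora-db, and the example uid is inferred from the file_path example in the Metadata section rather than taken from the catalog.

```python
def split_dataset_uid(dataset_uid: str) -> tuple[str, str]:
    """Split a uid of the form gse_id/dataset_id into its two parts.

    Hypothetical helper for illustration; not part of scmora-db.
    """
    gse_id, sep, dataset_id = dataset_uid.partition("/")
    # Reject values without a slash, with an empty part, or with extra slashes.
    if not sep or not gse_id or not dataset_id or "/" in dataset_id:
        raise ValueError(f"expected gse_id/dataset_id, got {dataset_uid!r}")
    return gse_id, dataset_id


# Example uid assumed from the documented file_path layout.
gse_id, dataset_id = split_dataset_uid("GSE166797/GSM5085810_GM12878_rep1")
print(gse_id, dataset_id)
```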

Multi-Match Rule

download_datasets() and load_datasets() apply this default rule to the number of datasets that match the filters:

  • one match: return one path or one MuData object
  • two to five matches: return a list
  • more than five matches: stop and report all matched dataset_uid values

This prevents accidental large downloads.
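
The rule above can be sketched in a few lines. This is an illustration only, not the package's implementation: resolve_matches, MAX_MATCHES, and fake_fetch are hypothetical names, raising an exception is one possible reading of "stop and report", and the behaviour for zero matches is an assumption because the text does not describe it.

```python
from typing import Callable, TypeVar, Union

T = TypeVar("T")
MAX_MATCHES = 5  # upper bound taken from the rule above


def resolve_matches(
    matched_uids: list[str], fetch: Callable[[str], T]
) -> Union[T, list[T]]:
    """Apply the multi-match rule to a list of matched dataset_uid values."""
    if not matched_uids:
        # Assumption: the zero-match case is not documented; raising is a guess.
        raise LookupError("no datasets matched the filters")
    if len(matched_uids) > MAX_MATCHES:
        # Stop before fetching anything and report every matched uid.
        raise ValueError(
            f"{len(matched_uids)} datasets matched; narrow the filters: "
            + ", ".join(matched_uids)
        )
    results = [fetch(uid) for uid in matched_uids]
    # One match returns the bare result; two to five return a list.
    return results[0] if len(results) == 1 else results


def fake_fetch(uid: str) -> str:
    """Stand-in for a real download: map a uid to a made-up local path."""
    return f"./data/{uid}.h5mu"


print(resolve_matches(["GSE0/sample_a"], fake_fetch))
print(resolve_matches(["GSE0/sample_a", "GSE0/sample_b"], fake_fetch))
```

Checking the count before fetching means an overly broad filter fails without transferring any data.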

Command Line

scmora-db search --usage-tag control
scmora-db search --detailed-condition Control
scmora-db search --detail-source "GM12878 (Cell Line)"
scmora-db download --dataset-id GSM5085810_GM12878_rep1
scmora-db load --dataset-id GSM5085810_GM12878_rep1 --backed r

List available metadata values:

scmora-db list dataset-ids
scmora-db list dataset-uids
scmora-db list gse-ids
scmora-db list usage-tags
scmora-db list groups
scmora-db list condition
scmora-db list detailed-conditions
scmora-db list detail-sources
scmora-db list sample-types
scmora-db list species
scmora-db list references

Useful options:

scmora-db download --cache-dir ./hf-cache --local-dir ./data
scmora-db search --prefer-remote
scmora-db download --revision main
scmora-db download --token hf_xxx

Metadata

The bundled metadata file contains 277 datasets and these core columns:

  • dataset_uid
  • dataset_id
  • gse_id
  • file_path
  • file_name
  • species
  • reference
  • group
  • usage_primary
  • usage_tags
  • sample_type
  • detail_source
  • condition
  • detailed_condition

The file_path values are relative to shiny321/genome-db, for example:

GSE166797/GSM5085810_GM12878_rep1.h5mu
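
Since file_path is relative to the repository root, it can be turned into a direct download URL. The sketch below assumes the standard Hugging Face Hub layout for dataset repositories (https://huggingface.co/datasets/<repo>/resolve/<revision>/<path>). The helper resolve_url is hypothetical; scmora-db may fetch files through a different mechanism.

```python
from urllib.parse import quote

REPO_ID = "shiny321/genome-db"  # dataset repository named in this README


def resolve_url(file_path: str, revision: str = "main") -> str:
    """Build a Hugging Face Hub URL for a file in the dataset repository.

    Hypothetical helper for illustration; not part of scmora-db.
    """
    # quote() keeps "/" in file_path by default; the revision is escaped fully.
    return (
        f"https://huggingface.co/datasets/{REPO_ID}/resolve/"
        f"{quote(revision, safe='')}/{quote(file_path)}"
    )


print(resolve_url("GSE166797/GSM5085810_GM12878_rep1.h5mu"))
```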

Build and Publish

python -m pip install -U build twine
python -m build
python -m twine check dist/*
python -m twine upload --repository testpypi dist/*

After verifying the release on TestPyPI, upload it to PyPI:

python -m twine upload dist/*

Download files

Source Distribution

scmora_db-0.1.0.tar.gz (25.5 kB)

Built Distribution

scmora_db-0.1.0-py3-none-any.whl (18.4 kB)

File details

Details for the file scmora_db-0.1.0.tar.gz.

File metadata

  • Download URL: scmora_db-0.1.0.tar.gz
  • Size: 25.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for scmora_db-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        65cfb2d74225e2f16afcdb4d07ee8096bdcedc02d408f62f7d9a7dc0fa8214d3
MD5           6faa1caa9193b6cb27a9fe2448a9a79c
BLAKE2b-256   5445d392082ca2210904f69829dc1f12bcd27a121ef9b52b86678f8db61f5357

Provenance

The following attestation bundles were made for scmora_db-0.1.0.tar.gz:

Publisher: publish.yml on wangyangtang404-web/scmora-db

File details

Details for the file scmora_db-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: scmora_db-0.1.0-py3-none-any.whl
  • Size: 18.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for scmora_db-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        3ac70c3af3062bec87ed9a2a9f606695e85258e5b305e837268531828f0eec71
MD5           262e01ed6e8bb63cd75de3416e3f934c
BLAKE2b-256   84ea22366d0a20d4e76c1256be2b61a5d994d6fd04aca6075b1b9a189cf8f01b

Provenance

The following attestation bundles were made for scmora_db-0.1.0-py3-none-any.whl:

Publisher: publish.yml on wangyangtang404-web/scmora-db
