Search, download, and load SCMORA .h5mu datasets from Hugging Face.
scmora-db
scmora-db is a small Python package for searching, downloading, and loading
SCMORA .h5mu datasets from the Hugging Face dataset repository
shiny321/genome-db.
The package ships with a lightweight metadata catalog. Large .h5mu files stay
on Hugging Face and are downloaded only when requested.
Installation
Python 3.10 or newer is required.
pip install scmora-db
This installs everything needed to search, download, and load .h5mu files as
MuData objects.
For local development:
cd path/to/pkg
pip install -e ".[dev]"
Python API
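from scmora_db import search_datasets, download_datasets, load_datasets

# Search the metadata catalog by dataset ID.
catalog = search_datasets(dataset_id="GSM5085810_GM12878_rep1")

# Download every dataset that matches both filters.
paths = download_datasets(
    detailed_condition="Control",
    usage_tag="control",
)

# Load one dataset as a MuData object in backed read mode.
mdata = load_datasets(
    dataset_id="GSM5085810_GM12878_rep1",
    backed="r",
)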
Supported filters:
- dataset_id
- dataset_uid
- gse_id
- detailed_condition
- usage_tag
- detail_source
- condition
- sample_type
- species
- reference
dataset_uid is the safest unique identifier and is formatted as
GSE_id/dataset_id.
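As a minimal sketch of the GSE_id/dataset_id format, the two helpers below join and split such an identifier. Both function names are hypothetical and are not part of the scmora_db API. The example pairs GSE166797 with GSM5085810_GM12878_rep1 on the assumption that the GSE ID matches the directory name in that dataset's file_path.

```python
def make_dataset_uid(gse_id: str, dataset_id: str) -> str:
    """Join a GSE ID and a dataset ID into 'GSE_id/dataset_id'."""
    return f"{gse_id}/{dataset_id}"


def split_dataset_uid(dataset_uid: str) -> tuple[str, str]:
    """Split 'GSE_id/dataset_id' back into (gse_id, dataset_id)."""
    gse_id, sep, dataset_id = dataset_uid.partition("/")
    if not sep or not gse_id or not dataset_id:
        raise ValueError(f"expected 'GSE_id/dataset_id', got {dataset_uid!r}")
    return gse_id, dataset_id


# Assumed pairing, inferred from the example file path in this README.
uid = make_dataset_uid("GSE166797", "GSM5085810_GM12878_rep1")
print(uid)
print(split_dataset_uid(uid))
```

Because the uid carries the GSE ID, it stays unambiguous even if the same dataset_id were to appear under more than one series.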
Multi-Match Rule
download_datasets() and load_datasets() use this default rule:
- one match: return one path or one MuData object
- two to five matches: return a list
- more than five matches: stop and report all matched dataset_uid values
This prevents accidental large downloads.
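The rule can be sketched in a few lines of plain Python. This is an illustration of the behavior described above, not the package's implementation: resolve_matches and TooManyMatchesError are hypothetical names, raising an exception is only one way to "stop and report", and the zero-match behavior is an assumption because the rule does not cover it.

```python
from __future__ import annotations

from typing import TypeVar

T = TypeVar("T")


class TooManyMatchesError(ValueError):
    """Hypothetical error type used only in this sketch."""


def resolve_matches(matches: dict[str, T], max_matches: int = 5) -> T | list[T]:
    """Apply the multi-match rule to a mapping of dataset_uid -> result."""
    if not matches:
        # Assumption: the rule does not say what happens with zero matches.
        raise LookupError("no datasets matched the given filters")
    results = list(matches.values())
    if len(results) == 1:
        # One match: return the bare path or object.
        return results[0]
    if len(results) <= max_matches:
        # Two to five matches: return a list.
        return results
    # More than five matches: stop and report every matched dataset_uid.
    raise TooManyMatchesError(
        f"{len(results)} datasets matched (limit {max_matches}); "
        "matched dataset_uid values: " + ", ".join(matches)
    )
```

Since the return type depends on the number of matches, calling code that may see either case can check isinstance(result, list) before iterating.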
Command Line
scmora-db search --usage-tag control
scmora-db search --detailed-condition Control
scmora-db search --detail-source "GM12878 (Cell Line)"
scmora-db download --dataset-id GSM5085810_GM12878_rep1
scmora-db load --dataset-id GSM5085810_GM12878_rep1 --backed r
List available metadata values:
scmora-db list dataset-ids
scmora-db list dataset-uids
scmora-db list gse-ids
scmora-db list usage-tags
scmora-db list groups
scmora-db list condition
scmora-db list detailed-conditions
scmora-db list detail-sources
scmora-db list sample-types
scmora-db list species
scmora-db list references
Useful options:
scmora-db download --cache-dir ./hf-cache --local-dir ./data
scmora-db search --prefer-remote
scmora-db download --revision main
scmora-db download --token hf_xxx
Metadata
The bundled metadata file contains 277 datasets and these core columns:
- dataset_uid
- dataset_id
- gse_id
- file_path
- file_name
- species
- reference
- group
- usage_primary
- usage_tags
- sample_type
- detail_source
- condition
- detailed_condition
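To illustrate how filters could be applied to rows with these columns, here is a small stand-alone sketch. It assumes exact string equality with all filters combined by AND, which may differ from how the package actually matches. The filter_rows helper is hypothetical, and the rows are illustrative placeholders rather than values copied from the real catalog.

```python
def filter_rows(rows: list[dict[str, str]], **filters: str) -> list[dict[str, str]]:
    """Keep rows whose columns equal every given filter value (AND logic)."""
    return [
        row
        for row in rows
        if all(row.get(column) == value for column, value in filters.items())
    ]


# Illustrative rows only; the second one is entirely made up.
rows = [
    {
        "dataset_uid": "GSE166797/GSM5085810_GM12878_rep1",
        "dataset_id": "GSM5085810_GM12878_rep1",
        "gse_id": "GSE166797",
        "detailed_condition": "Control",
    },
    {
        "dataset_uid": "GSE000000/GSM0000000_example",
        "dataset_id": "GSM0000000_example",
        "gse_id": "GSE000000",
        "detailed_condition": "Treated",
    },
]

hits = filter_rows(rows, gse_id="GSE166797", detailed_condition="Control")
print([row["dataset_uid"] for row in hits])
```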
The file_path values are relative to shiny321/genome-db, for example:
GSE166797/GSM5085810_GM12878_rep1.h5mu
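Since file_path is relative to the dataset repository, it can be turned into a direct download URL. The sketch below uses only the standard library and the usual Hugging Face pattern for dataset files (/datasets/<repo>/resolve/<revision>/<path>). The dataset_file_url helper is hypothetical, and the package itself may resolve files differently, for example through the huggingface_hub client.

```python
from pathlib import PurePosixPath
from urllib.parse import quote

REPO_ID = "shiny321/genome-db"


def dataset_file_url(file_path: str, revision: str = "main") -> str:
    """Build the resolve URL for a file in the dataset repository."""
    return (
        f"https://huggingface.co/datasets/{REPO_ID}/resolve/"
        f"{quote(revision, safe='')}/{quote(file_path)}"
    )


file_path = "GSE166797/GSM5085810_GM12878_rep1.h5mu"
path = PurePosixPath(file_path)

print(path.parent.as_posix())  # directory part of the relative path
print(path.name)               # file name, including the .h5mu suffix
print(dataset_file_url(file_path))
```

The revision argument plays the same role as the --revision option shown earlier: it pins the URL to a branch, tag, or commit.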
Build and Publish
python -m pip install -U build twine
python -m build
python -m twine check dist/*
python -m twine upload --repository testpypi dist/*
After testing on TestPyPI:
python -m twine upload dist/*
File details
Details for the file scmora_db-0.1.0.tar.gz.
File metadata
- Download URL: scmora_db-0.1.0.tar.gz
- Upload date:
- Size: 25.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 65cfb2d74225e2f16afcdb4d07ee8096bdcedc02d408f62f7d9a7dc0fa8214d3 |
| MD5 | 6faa1caa9193b6cb27a9fe2448a9a79c |
| BLAKE2b-256 | 5445d392082ca2210904f69829dc1f12bcd27a121ef9b52b86678f8db61f5357 |
Provenance
The following attestation bundles were made for scmora_db-0.1.0.tar.gz:
Publisher: publish.yml on wangyangtang404-web/scmora-db
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: scmora_db-0.1.0.tar.gz
  - Subject digest: 65cfb2d74225e2f16afcdb4d07ee8096bdcedc02d408f62f7d9a7dc0fa8214d3
- Sigstore transparency entry: 1848794096
- Sigstore integration time:
- Permalink: wangyangtang404-web/scmora-db@5824e256830e8ee5aa791c804ee9bb89286e9cbc
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/wangyangtang404-web
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5824e256830e8ee5aa791c804ee9bb89286e9cbc
- Trigger Event: release
File details
Details for the file scmora_db-0.1.0-py3-none-any.whl.
File metadata
- Download URL: scmora_db-0.1.0-py3-none-any.whl
- Upload date:
- Size: 18.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3ac70c3af3062bec87ed9a2a9f606695e85258e5b305e837268531828f0eec71 |
| MD5 | 262e01ed6e8bb63cd75de3416e3f934c |
| BLAKE2b-256 | 84ea22366d0a20d4e76c1256be2b61a5d994d6fd04aca6075b1b9a189cf8f01b |
Provenance
The following attestation bundles were made for scmora_db-0.1.0-py3-none-any.whl:
Publisher: publish.yml on wangyangtang404-web/scmora-db
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: scmora_db-0.1.0-py3-none-any.whl
  - Subject digest: 3ac70c3af3062bec87ed9a2a9f606695e85258e5b305e837268531828f0eec71
- Sigstore transparency entry: 1848794321
- Sigstore integration time:
- Permalink: wangyangtang404-web/scmora-db@5824e256830e8ee5aa791c804ee9bb89286e9cbc
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/wangyangtang404-web
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@5824e256830e8ee5aa791c804ee9bb89286e9cbc
- Trigger Event: release