anndata-metadata
anndata-metadata is a Python library and CLI tool for extracting metadata from AnnData .h5ad files, both locally and on S3. When extracting metadata from S3, it uses partial downloads to dramatically speed up extraction.
It provides utilities to summarize cell, gene, and matrix information, and supports batch processing of directories.
It can create a .parquet index of the metadata for all of the files in a directory (S3 or local).
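To illustrate the partial-download idea, here is a minimal sketch of a seekable reader that fetches only the byte ranges a caller asks for. It is not the library's implementation, which this README does not describe. `RangeReader`, `fetch_range`, and the in-memory `blob` are hypothetical names; a real S3 backend would issue ranged GET requests instead of slicing a bytes object.

```python
import io


class RangeReader(io.RawIOBase):
    """Read-only, seekable file-like object that fetches byte ranges on demand."""

    def __init__(self, fetch_range, size):
        super().__init__()
        self._fetch_range = fetch_range  # callable(start, end_exclusive) -> bytes
        self._size = size
        self._pos = 0
        self.bytes_fetched = 0  # running total, to show how little is transferred

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            self._pos = offset
        elif whence == io.SEEK_CUR:
            self._pos += offset
        elif whence == io.SEEK_END:
            self._pos = self._size + offset
        else:
            raise ValueError(f"unsupported whence: {whence}")
        return self._pos

    def readinto(self, buffer):
        # Fetch only the requested window, clipped to the object size.
        end = min(self._pos + len(buffer), self._size)
        if end <= self._pos:
            return 0
        chunk = self._fetch_range(self._pos, end)
        n = len(chunk)
        buffer[:n] = chunk
        self._pos += n
        self.bytes_fetched += n
        return n


# Hypothetical stand-in for a 1 MiB remote object.
blob = bytes(range(256)) * 4096


def fetch_range(start, end):
    # A real backend would request the range start..end-1 over HTTP here.
    return blob[start:end]


reader = RangeReader(fetch_range, len(blob))
reader.seek(-16, io.SEEK_END)
tail = reader.read(16)
reader.seek(0)
head = reader.read(8)
print(f"fetched {reader.bytes_fetched} of {len(blob)} bytes")
```

Because an .h5ad file is an HDF5 container, a reader of this shape can in principle pull only the header and the small groups that hold the metadata, leaving the expression matrix untouched.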
Library Overview
The core library is in src/anndata_metadata/ and provides:
- Metadata extraction: Functions to extract key metadata (cell count, gene count, matrix format, group contents, etc.) from AnnData .h5ad files.
- S3 and local support: Utilities to process files both on local disk and in S3 buckets.
- JSON-serializable output: All metadata is returned as Python dictionaries with native types.
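As a rough illustration of what "native types" buys you, the sketch below coerces a metadata dictionary into built-in Python types and round-trips it through `json`. The helper `to_native`, the `FakeScalar` stand-in, and the dictionary keys are all hypothetical and are not taken from the library's actual output.

```python
import json


def to_native(value):
    """Recursively convert a value into JSON-friendly built-in types."""
    if isinstance(value, dict):
        return {str(k): to_native(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_native(v) for v in value]
    if hasattr(value, "item") and callable(value.item):
        # NumPy scalars expose .item(), which returns a Python built-in.
        return value.item()
    return value


class FakeScalar:
    """Stand-in for a NumPy scalar so this example needs no third-party imports."""

    def __init__(self, value):
        self._value = value

    def item(self):
        return self._value


# Hypothetical metadata; the real keys may differ.
raw = {
    "cell_count": FakeScalar(2700),
    "gene_count": FakeScalar(32738),
    "matrix_format": "csr",
    "obs_columns": ("louvain", "n_genes"),
}

clean = to_native(raw)
text = json.dumps(clean, sort_keys=True)
print(text)
```

Returning plain dictionaries means callers can pass the result straight to `json.dumps` without a custom encoder.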
Installing
pip install anndata-metadata
CLI Usage
Usage:
usage: anndata-metadata [-h] [-o OBS] [-c COUNT] input_path output
Extract AnnData metadata from file(s) or S3 object(s).
positional arguments:
input_path Input file, directory, S3 URI, or S3 directory URI
output Output filename (JSON for single file, Parquet for directory,
'-' for stdout)
options:
-h, --help show this help message and exit
-o OBS, --obs OBS Observation column to count (can be specified multiple times)
-c COUNT, --count COUNT
Maximum number of files to process (for directories/S3
directories)
Examples:
anndata-metadata data/myfile.h5ad metadata.json
anndata-metadata data/ metadata.parquet
anndata-metadata s3://my-bucket/ metadata.parquet
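The help text above can be approximated with a small `argparse` parser. This is a sketch reconstructed from the usage message, not the project's actual `main.py`; treating `--count` as an integer and collecting `--obs` with `action="append"` are assumptions, and the column names `cell_type` and `tissue` are made up for the example.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented command-line interface."""
    parser = argparse.ArgumentParser(
        prog="anndata-metadata",
        description="Extract AnnData metadata from file(s) or S3 object(s).",
    )
    parser.add_argument(
        "input_path",
        help="Input file, directory, S3 URI, or S3 directory URI",
    )
    parser.add_argument(
        "output",
        help="Output filename (JSON for single file, Parquet for directory, "
        "'-' for stdout)",
    )
    parser.add_argument(
        "-o",
        "--obs",
        action="append",  # assumption: repeated flags accumulate into a list
        default=[],
        help="Observation column to count (can be specified multiple times)",
    )
    parser.add_argument(
        "-c",
        "--count",
        type=int,  # assumption: the limit is parsed as an integer
        default=None,
        help="Maximum number of files to process (for directories/S3 directories)",
    )
    return parser


# Hypothetical invocation: count two obs columns over at most 10 files.
args = build_parser().parse_args(
    ["-o", "cell_type", "-o", "tissue", "-c", "10", "s3://my-bucket/", "metadata.parquet"]
)
print(args.input_path, args.output, args.obs, args.count)
```

Repeating `-o` is how several observation columns would be counted in one run under this sketch.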
Development
Setup
This project uses uv for fast Python environment management.
- Install dependencies:
uv sync # installs the dependencies needed to run the command
uv sync --group dev # also installs the dev dependencies for testing and formatting
- Run tests:
uv run pytest
- Format code:
uv run yapf --recursive . --in-place
- Type check (mypy):
uv run mypy
- Run the CLI:
PYTHONPATH=src uv run python -m anndata_metadata
- Build and test the wheel:
uv run python -m build
Then test it in a fresh virtual environment:
python -m venv testenv
source testenv/bin/activate
pip install dist/anndata_metadata-*.whl --force-reinstall
You can now run the CLI command like this:
anndata-metadata
Project Structure
.
├── src/
│   └── anndata_metadata/
│       ├── extract.py # Core metadata extraction logic
│       └── main.py # CLI entry point
├── test/ # Unit tests for extraction functions and CLI
├── README.md # Project documentation
└── pyproject.toml # Project metadata and dependencies
TODO
- add mypy support
- add a wheel and submit to PyPI
- CI/CD pipeline for updating PyPI
- write partial results and skip previously written values
Download files
File details
Details for the file anndata_metadata-0.1.2.tar.gz.
File metadata
- Download URL: anndata_metadata-0.1.2.tar.gz
- Upload date:
- Size: 9.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3ac76e38ff875d24f989d08127d0ed156ae7313536465253097a1a136be2b081 |
| MD5 | 63df2e953031313b09f8b005feef42c8 |
| BLAKE2b-256 | 848a26bd132f17869361cca839a1c1ebf51ca8cc2a833ca3f4144d4348db1a54 |
File details
Details for the file anndata_metadata-0.1.2-py3-none-any.whl.
File metadata
- Download URL: anndata_metadata-0.1.2-py3-none-any.whl
- Upload date:
- Size: 7.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 647852f1da4c888740b7f7705f8fdd9bb4daa95f08beff855115c3d63d259d0f |
| MD5 | dd6c548e04229a53f6b622a915cb4c41 |
| BLAKE2b-256 | f167be2482b56c9d257c93cf7ef4b19424727a6c0880c9d61cdbe4cf84f3cd32 |