anndata-metadata

anndata-metadata is a Python library and CLI tool for extracting metadata from AnnData .h5ad files, both locally and on S3. When reading from S3, it uses partial downloads: it fetches only the parts of each file it needs instead of the whole object, which makes extraction much faster.
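The README does not describe how the partial downloads are implemented. One common approach, sketched here under that assumption, is a seekable file object that turns every read into a ranged request, so an HDF5 reader that only touches the file header and a few small datasets downloads only those bytes. RangeReader and its fetch callback are hypothetical names, not part of anndata-metadata.

```python
import io
from typing import Callable


# Hypothetical helper (not anndata-metadata's API): a seekable, read-only file
# object that downloads only the byte ranges a reader actually asks for.
class RangeReader(io.RawIOBase):
    def __init__(self, size: int, fetch: Callable[[int, int], bytes]) -> None:
        # fetch(start, end) must return the bytes in the inclusive range [start, end].
        self._size = size
        self._fetch = fetch
        self._pos = 0
        self.bytes_fetched = 0  # running total of bytes actually downloaded

    def readable(self) -> bool:
        return True

    def seekable(self) -> bool:
        return True

    def tell(self) -> int:
        return self._pos

    def seek(self, offset: int, whence: int = io.SEEK_SET) -> int:
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == io.SEEK_END:
            new_pos = self._size + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def readinto(self, buffer) -> int:
        # Nothing to fetch at or past the end of the object, or for an empty buffer.
        if self._pos >= self._size or len(buffer) == 0:
            return 0
        # Request just enough bytes to fill the buffer, clipped to the object size.
        end = min(self._pos + len(buffer), self._size) - 1
        data = self._fetch(self._pos, end)
        n = len(data)
        buffer[:n] = data
        self._pos += n
        self.bytes_fetched += n
        return n


if __name__ == "__main__":
    blob = bytes(range(256)) * 4096  # in-memory stand-in for a 1 MiB remote object
    reader = RangeReader(len(blob), lambda start, end: blob[start:end + 1])
    reader.seek(-16, io.SEEK_END)
    tail = reader.read(16)
    print(len(tail), reader.bytes_fetched, len(blob))  # prints: 16 16 1048576
```

To point this at S3 with boto3, the size could come from head_object(Bucket=..., Key=...)["ContentLength"] and fetch could return get_object(Bucket=..., Key=..., Range=f"bytes={start}-{end}")["Body"].read(); h5py can then open the object with h5py.File(reader, "r"). Wrapping the reader in io.BufferedReader with a larger buffer_size is one way to avoid issuing many tiny requests.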

It provides utilities to summarize cell, gene, and matrix information, and supports batch processing of directories.
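For cell-level summaries such as the CLI's -o/--obs option, one plausible approach (assuming "count" means tallying cells per category) relies on how AnnData stores categorical obs columns on disk: integer codes that index into a categories array, with -1 marking a missing value. The sketch below counts directly from those codes; count_categorical and the "<missing>" key are hypothetical, not the library's API.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence


def count_categorical(codes: Iterable[int], categories: Sequence[str]) -> Dict[str, int]:
    """Tally cells per category for one categorical obs column.

    AnnData stores a categorical column as integer codes that index into a
    categories array; a code of -1 marks a missing value.
    """
    tally = Counter(int(code) for code in codes)
    # Report every category, including ones with zero cells, in category order.
    counts = {category: tally.get(i, 0) for i, category in enumerate(categories)}
    missing = tally.get(-1, 0)
    if missing:
        counts["<missing>"] = missing  # hypothetical key for missing values
    return counts


if __name__ == "__main__":
    codes = [0, 1, 1, 2, -1, 1]
    print(count_categorical(codes, ["B cell", "T cell", "NK cell"]))
    # prints: {'B cell': 1, 'T cell': 3, 'NK cell': 1, '<missing>': 1}
```

With h5py and a file written in the current AnnData on-disk layout, the inputs would come from f["obs"][column]["codes"][:] and f["obs"][column]["categories"].asstr()[:], where column is whatever name was passed to -o. Counting the integer codes avoids decoding a string per cell, and the codes array is small enough to suit partial downloads. Files written by older AnnData versions store categoricals differently.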

It can create a .parquet index of the metadata for all of the files in a directory (S3 or local).
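A minimal sketch of the local batch step, assuming the tool walks the directory recursively, processes files in sorted order, and stops after the -c/--count limit. list_h5ad_files, build_index_rows, and the extract callable are hypothetical stand-ins for whatever the library does internally.

```python
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional


def list_h5ad_files(root: str, count: Optional[int] = None) -> List[Path]:
    """Return .h5ad files under root (recursively), sorted, capped at count."""
    files = sorted(p for p in Path(root).rglob("*.h5ad") if p.is_file())
    return files if count is None else files[:count]


def build_index_rows(
    root: str,
    extract: Callable[[Path], Dict[str, Any]],
    count: Optional[int] = None,
) -> List[Dict[str, Any]]:
    """Run extract() on each file and collect one row per file for the index."""
    rows = []
    for path in list_h5ad_files(root, count):
        try:
            meta = extract(path)
        except OSError as exc:
            # Record unreadable or corrupt files instead of aborting the whole batch.
            meta = {"error": str(exc)}
        rows.append({"path": str(path), **meta})
    return rows
```

The rows can then be written as a Parquet index with pandas, e.g. pandas.DataFrame(rows).to_parquet("metadata.parquet"), which needs pyarrow or fastparquet installed. Recording per-file errors as a column is a design choice of this sketch; it keeps one bad file from discarding the work done on the rest of the directory.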

Library Overview

The core library is in src/anndata_metadata/ and provides:

  • Metadata extraction: Functions to extract key metadata (cell count, gene count, matrix format, group contents, etc.) from AnnData .h5ad files.
  • S3 and local support: Utilities to process files both on local disk and in S3 buckets.
  • JSON-serializable output: All metadata is returned as Python dictionaries with native types.
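Values read from HDF5 usually arrive as NumPy scalars, arrays, or bytes, none of which json.dumps accepts directly. The sketch below shows one way to get native types, assuming duck typing on .tolist() (which NumPy scalars and arrays provide) and mapping NaN/Infinity to null. to_native is a hypothetical helper, not necessarily how anndata-metadata does it.

```python
import json
import math
from typing import Any


def to_native(value: Any) -> Any:
    """Recursively convert extracted metadata into JSON-serializable built-ins."""
    if isinstance(value, dict):
        return {str(key): to_native(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_native(item) for item in value]
    if isinstance(value, bytes):
        # HDF5 string attributes are often read back as bytes.
        return value.decode("utf-8", errors="replace")
    if hasattr(value, "tolist"):
        # NumPy scalars and arrays provide .tolist(), which yields Python built-ins.
        return to_native(value.tolist())
    if isinstance(value, float) and not math.isfinite(value):
        return None  # JSON has no NaN or Infinity
    return value


if __name__ == "__main__":
    meta = {"n_obs": 3, "shape": (3, 5), "encoding": b"csr_matrix", "mean": float("nan")}
    print(json.dumps(to_native(meta), allow_nan=False))
```

Passing allow_nan=False to json.dumps makes any value the conversion missed fail loudly instead of emitting non-standard NaN tokens.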

Installing

pip install anndata-metadata

CLI Usage

Full help text (anndata-metadata --help):

usage: anndata-metadata [-h] [-o OBS] [-c COUNT] input_path output

Extract AnnData metadata from file(s) or S3 object(s).

positional arguments:
  input_path            Input file, directory, S3 URI, or S3 directory URI
  output                Output filename (JSON for single file, Parquet for directory,
                        '-' for stdout)

options:
  -h, --help            show this help message and exit
  -o OBS, --obs OBS     Observation column to count (can be specified multiple times)
  -c COUNT, --count COUNT
                        Maximum number of files to process (for directories/S3
                        directories)
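The help text above matches what Python's argparse generates. The parser below is a reconstruction from that text, not the project's main.py; in particular, parsing --count as an int is an assumption. It shows that -o can be repeated because it uses action="append", which collects the values into a list.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented help text (a reconstruction, not main.py)."""
    parser = argparse.ArgumentParser(
        prog="anndata-metadata",
        description="Extract AnnData metadata from file(s) or S3 object(s).",
    )
    parser.add_argument("input_path", help="Input file, directory, S3 URI, or S3 directory URI")
    parser.add_argument(
        "output",
        help="Output filename (JSON for single file, Parquet for directory, '-' for stdout)",
    )
    # action="append" is what lets -o be given several times, collecting a list.
    parser.add_argument(
        "-o", "--obs", action="append", default=[],
        help="Observation column to count (can be specified multiple times)",
    )
    # Assumption: the count is parsed as an integer.
    parser.add_argument(
        "-c", "--count", type=int,
        help="Maximum number of files to process (for directories/S3 directories)",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["-o", "cell_type", "-o", "tissue", "data/", "metadata.parquet"])
    print(args.obs, args.count)  # prints: ['cell_type', 'tissue'] None
```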

Examples:

anndata-metadata data/myfile.h5ad metadata.json
anndata-metadata data/ metadata.parquet
anndata-metadata s3://my-bucket/ metadata.parquet
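The three examples differ only in the kind of input, and the help text ties the output format to it: JSON for a single file, Parquet for a directory, stdout for '-'. A minimal sketch of that dispatch follows, assuming an S3 URI with an empty key or a trailing slash denotes a "directory" (prefix). InputSpec, classify_input, and output_format are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse


@dataclass(frozen=True)
class InputSpec:
    kind: str                    # "s3" or "local"
    is_directory: bool
    bucket: Optional[str] = None
    key: Optional[str] = None    # S3 object key or prefix, without the leading "/"
    path: Optional[str] = None   # local filesystem path


def classify_input(input_path: str) -> InputSpec:
    parsed = urlparse(input_path)
    if parsed.scheme == "s3":
        key = parsed.path.lstrip("/")
        # Assumption: an empty key or a trailing "/" means an S3 "directory" (prefix).
        is_dir = key == "" or key.endswith("/")
        return InputSpec("s3", is_dir, bucket=parsed.netloc, key=key)
    return InputSpec("local", os.path.isdir(input_path), path=input_path)


def output_format(spec: InputSpec, output: str) -> str:
    # Mirrors the help text: '-' is stdout, JSON for one file, Parquet for a directory.
    if output == "-":
        return "stdout"
    return "parquet" if spec.is_directory else "json"


if __name__ == "__main__":
    for arg in ["s3://my-bucket/", "s3://my-bucket/data/myfile.h5ad"]:
        spec = classify_input(arg)
        print(spec.bucket, repr(spec.key), spec.is_directory)
```

One caveat of using urlparse here: an S3 key containing "?" or "#" would be split off as a query or fragment, so a stricter implementation might split the string on the first "/" after "s3://" by hand.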

Development

Setup

This project uses uv for fast Python environment management.

  1. Install dependencies:

    uv sync              # installs the dependencies needed to run the CLI
    uv sync --group dev  # also installs the dev dependencies for testing and formatting
    
  2. Run tests:

    uv run pytest
    
  3. Format code:

    uv run yapf --recursive . --in-place
    
  4. Type check (mypy):

    uv run mypy
    
  5. Run the CLI:

    PYTHONPATH=src uv run python -m anndata_metadata
    
  6. Build and test the wheel:

    uv run python -m build
    

    Then test it in a fresh virtual environment:

     python -m venv testenv
     source testenv/bin/activate
     pip install dist/anndata_metadata-*.whl --force-reinstall   
    

    You can now run the CLI command directly:

     anndata-metadata
    

Project Structure

.
├── src/
│   └── anndata_metadata/
│       ├── extract.py   # Core metadata extraction logic
│       └── main.py      # CLI entry point
├── test/                # Unit tests for extraction functions and CLI
├── README.md            # Project documentation
└── pyproject.toml       # Project metadata and dependencies

TODO

  • add mypy support
  • add a wheel and submit to PyPI
  • CI/CD pipeline for publishing to PyPI
  • write partial results and skip previously written values

Download files

Source Distribution

anndata_metadata-0.1.2.tar.gz (9.0 kB)

Uploaded Source

Built Distribution

anndata_metadata-0.1.2-py3-none-any.whl (7.3 kB)

Uploaded Python 3

File details

Details for the file anndata_metadata-0.1.2.tar.gz.

File metadata

  • Download URL: anndata_metadata-0.1.2.tar.gz
  • Size: 9.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.0

File hashes

Hashes for anndata_metadata-0.1.2.tar.gz
Algorithm Hash digest
SHA256 3ac76e38ff875d24f989d08127d0ed156ae7313536465253097a1a136be2b081
MD5 63df2e953031313b09f8b005feef42c8
BLAKE2b-256 848a26bd132f17869361cca839a1c1ebf51ca8cc2a833ca3f4144d4348db1a54

File details

Details for the file anndata_metadata-0.1.2-py3-none-any.whl.

File hashes

Hashes for anndata_metadata-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 647852f1da4c888740b7f7705f8fdd9bb4daa95f08beff855115c3d63d259d0f
MD5 dd6c548e04229a53f6b622a915cb4c41
BLAKE2b-256 f167be2482b56c9d257c93cf7ef4b19424727a6c0880c9d61cdbe4cf84f3cd32
