
Supports simple queries over ontology annotations in dataframes, using Ubergraph SPARQL queries.


Pandasaurus

Pandasaurus supports simple queries over ontology annotations in dataframes, powered by Ubergraph SPARQL queries. It keeps dependencies light while still offering CURIE validation, enrichment utilities, and graph exports for downstream tooling.
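To make "powered by Ubergraph SPARQL queries" more concrete, here is a minimal sketch of how a list of seed CURIEs might be turned into one SPARQL query that looks for subclass relations among them. The helper names `curie_to_iri` and `build_subclass_query`, and the shape of the query, are illustrative assumptions; they are not Pandasaurus's actual query templates.

```python
# Hypothetical sketch: not the query Pandasaurus actually sends to Ubergraph.
OBO_BASE = "http://purl.obolibrary.org/obo/"


def curie_to_iri(curie):
    """Expand an OBO-style CURIE such as 'CL:0000084' into its PURL IRI."""
    prefix, local_id = curie.split(":", 1)
    return f"{OBO_BASE}{prefix}_{local_id}"


def build_subclass_query(seeds):
    """Build a query for pairs of seed terms related by rdfs:subClassOf."""
    iris = " ".join(f"<{curie_to_iri(seed)}>" for seed in seeds)
    return (
        "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
        "SELECT ?s ?o WHERE {\n"
        f"  VALUES ?s {{ {iris} }}\n"  # restrict subjects to the seeds
        f"  VALUES ?o {{ {iris} }}\n"  # restrict objects to the seeds
        "  ?s rdfs:subClassOf ?o .\n"
        "  FILTER(?s != ?o)\n"  # drop trivial self-pairs
        "}"
    )


query_text = build_subclass_query(["CL:0000084", "CL:0000787"])
print(query_text)
```

Binding both `?s` and `?o` with `VALUES` keeps the result limited to relations between the seeds themselves, so the seeds can be sent in a single request rather than one request per pair.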

Features

  • Validate and update seed CURIEs, catching obsoleted terms with replacement suggestions.
  • Enrich seed lists via simple, minimal, full, contextual, and ancestor-based strategies.
  • Build tabular outputs (pandas.DataFrame) and transitively reduced graphs (rdflib.Graph) for visualization.
  • Run batched SPARQL queries, and write deterministic tests by following the bundled mocking examples.
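The transitive reduction mentioned in the list above can be illustrated in plain Python. This is a minimal sketch of the general technique on a small directed acyclic graph, not the library's implementation (which works on rdflib.Graph objects); the function names and the toy edges are assumptions made for illustration.

```python
def reachable(graph, start):
    """Return every node reachable from `start` by following one or more edges."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def transitive_reduction(edges):
    """Keep only the edges of a DAG that are not implied by a longer path."""
    graph = {}
    for source, target in edges:
        graph.setdefault(source, set()).add(target)

    reduced = set()
    for source, targets in graph.items():
        for target in targets:
            # The edge is redundant if another direct successor already leads to target.
            others = targets - {target}
            implied = any(target in reachable(graph, other) for other in others)
            if not implied:
                reduced.add((source, target))
    return reduced


# Toy subclass edges (child, parent); the third edge is implied by the first two.
edges = {
    ("memory B cell", "B cell"),
    ("B cell", "lymphocyte"),
    ("memory B cell", "lymphocyte"),
}
reduced_edges = transitive_reduction(edges)
print(sorted(reduced_edges))
```

Dropping implied edges leaves only the direct parent-child links, which makes the resulting graph much easier to read when it is drawn.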

Installation

pip install pandasaurus

or with Poetry:

poetry add pandasaurus

Requires Python 3.9–3.11.

Quick Example

from pandasaurus.curie_validator import CurieValidator
from pandasaurus.query import Query

seeds = ["CL:0000084", "CL:0000787", "CL:0000636"]

terms = CurieValidator.construct_term_list(seeds)
CurieValidator.get_validation_report(terms)  # raises if invalid or obsoleted

query = Query(seeds, force_fail=True)
df = query.simple_enrichment()
print(df.head())

See the Quick Start guide for a step-by-step workflow.
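The validation step in the example checks seeds against the ontology. A purely local check on CURIE shape can catch typos before that step. The sketch below assumes OBO-style CURIEs of the form PREFIX:digits; the regular expression and the `split_well_formed` helper are hypothetical, are not part of the Pandasaurus API, and do not detect obsoleted terms.

```python
import re

# Assumed shape: a prefix starting with a letter, a colon, then a numeric local ID,
# for example "CL:0000084". Real-world CURIEs can be more permissive than this.
CURIE_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*:\d+")


def split_well_formed(seeds):
    """Partition seeds into (well_formed, malformed) lists, preserving order."""
    well_formed = [s for s in seeds if CURIE_PATTERN.fullmatch(s)]
    malformed = [s for s in seeds if not CURIE_PATTERN.fullmatch(s)]
    return well_formed, malformed


candidates = ["CL:0000084", "CL:0000787", "CL_0000636", "cl 0000636"]
ok, bad = split_well_formed(candidates)
print("well formed:", ok)
print("malformed:", bad)
```

Only the well-formed seeds would then be passed on to `CurieValidator` and `Query` as in the example above.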

Documentation

Full documentation (quick start, recipes, developer guide, and API reference) lives under docs/ and is published from the gh-pages branch.

To build docs locally:

poetry install -E docs
poetry run sphinx-build -b html docs docs/_build/html

Contributing

Pull requests are welcome! See docs/guides/contributing.rst for details on environment setup, testing, linting, and the release workflow. Pandasaurus aims to remain a small, focused library; please open an issue before introducing large new features.

Background

The first planned use case is to provide enrichment/query tooling for AnnData Cell x Gene matrices following the CZ single cell curation standard.


