Skip to main content

Extract knowledge assertions from tabular data into NCATS Translator-compliant KGX NDJSON — declaratively, with entity resolution and quality control built in.

Project description

Tablassert

PyPI Python License Docs

Extract knowledge assertions from tabular data into NCATS Translator-compliant KGX NDJSON — declaratively, with entity resolution built in and optional quality control.

pip install tablassert
tablassert build config.yaml

Full Documentation — installation guides, tutorials, configuration reference, and API docs.

Installation

pip install tablassert

Base install includes web and Excel support. Optional extras are available for CPU compatibility and QC runtime selection:

pip install "tablassert[rt]"       # Polars build for CPUs without required instructions
pip install "tablassert[qc]"       # Enable QC with CPU ONNX Runtime
pip install "tablassert[qc-cuda]"  # Enable QC with CUDA ONNX Runtime on GPU 0

QC is disabled by default at the graph level. Set qc: true in a graph config to enable the audit stage.

Docker
docker pull ghcr.io/skyeav/tablassert:latest

docker run --rm \
  -v /path/to/config:/data \
  -v /path/to/datassert:/datassert \
  ghcr.io/skyeav/tablassert:latest \
  build /data/graph-config.yaml

Quick Demo

from pathlib import Path
from tablassert.lib import resolve_many

# Resolve gene names to CURIEs against a datassert database
results = resolve_many(
    col="gene",
    entities=["TP53", "BRCA1", "EGFR"],
    datassert=Path("/path/to/datassert"),
    taxon="9606",
)

for row in results:
    print(f"{row['original gene']}{row['gene']} ({row['gene name']})")
# TP53 → HGNC:11998 (TP53)
# BRCA1 → HGNC:1100 (BRCA1)
# EGFR → HGNC:3236 (EGFR)

Point resolve_many() at a datassert database and resolve any iterable of entity strings to CURIEs — no LazyFrame setup, NLP preprocessing, or DuckDB connection management required. For full pipeline builds with YAML configuration, use tablassert build config.yaml.

Key Features

  • Declarative Configuration — YAML-based, no code required
  • Entity Resolution — Maps text to biological entities (genes, diseases, chemicals)
  • Quality Control — Optional three-stage validation (exact → fuzzy → BERT embeddings)
  • KGX Compliance — NCATS Translator-compatible NDJSON output
  • Performance — Lazy evaluation pipelines with Polars and DuckDB-accelerated entity resolution

Contributing

See CONTRIBUTING.md for development setup, code style, and pull request guidelines.

License

Apache License 2.0

Contributors

Skye Lane Goetz — Institute for Systems Biology, CalPoly SLO

Gwênlyn Glusman — Institute for Systems Biology

Jared C. Roach — Institute for Systems Biology

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tablassert-7.4.8.tar.gz (238.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

tablassert-7.4.8-py3-none-any.whl (38.5 kB view details)

Uploaded Python 3

File details

Details for the file tablassert-7.4.8.tar.gz.

File metadata

  • Download URL: tablassert-7.4.8.tar.gz
  • Upload date:
  • Size: 238.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tablassert-7.4.8.tar.gz
Algorithm Hash digest
SHA256 aadc39a468076b336ebe735a812ab3263161793e1de5fb6965de088eb8739615
MD5 25217399bfc7eef53e6ac2d64c92d6e7
BLAKE2b-256 79c76827965de6676752a0edc4bd31045f2a4de0f4ea27e72f63ffe726bbb25d

See more details on using hashes here.

Provenance

The following attestation bundles were made for tablassert-7.4.8.tar.gz:

Publisher: pipy.yml on SkyeAv/Tablassert

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tablassert-7.4.8-py3-none-any.whl.

File metadata

  • Download URL: tablassert-7.4.8-py3-none-any.whl
  • Upload date:
  • Size: 38.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tablassert-7.4.8-py3-none-any.whl
Algorithm Hash digest
SHA256 7afe237b7fa4a0934f791d8279b1b8ae3233b0038240fd6c69b9462bb717d81f
MD5 49b5bda9efacab0084b1de7bf234d14c
BLAKE2b-256 afaa649468f3b7b1fead1b40b0f279988bfcad88b0db535ea7373d7250a44945

See more details on using hashes here.

Provenance

The following attestation bundles were made for tablassert-7.4.8-py3-none-any.whl:

Publisher: pipy.yml on SkyeAv/Tablassert

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page