Skip to main content

Extract knowledge assertions from tabular data into NCATS Translator-compliant KGX NDJSON — declaratively, with entity resolution and quality control built in.

Project description

Tablassert

PyPI Python License Docs

Extract knowledge assertions from tabular data into NCATS Translator-compliant KGX NDJSON — declaratively, with entity resolution built in and optional quality control.

pip install tablassert
tablassert build config.yaml

Full Documentation — installation guides, tutorials, configuration reference, and API docs.

Installation

pip install tablassert

Base install includes web and Excel support. Optional extras are available for CPU compatibility and QC runtime selection:

pip install "tablassert[rt]"       # Polars build for CPUs without required instructions
pip install "tablassert[qc]"       # Enable QC with CPU ONNX Runtime
pip install "tablassert[qc-cuda]"  # Enable QC with CUDA ONNX Runtime on GPU 0

QC is disabled by default at the graph level. Set qc: true in a graph config to enable the audit stage.

Docker
docker pull ghcr.io/skyeav/tablassert:latest

docker run --rm \
  -v /path/to/config:/data \
  -v /path/to/datassert:/datassert \
  ghcr.io/skyeav/tablassert:latest \
  build /data/graph-config.yaml

Quick Demo

from pathlib import Path
from tablassert.lib import resolve_many

# Resolve gene names to CURIEs against a datassert database
results = resolve_many(
    col="gene",
    entities=["TP53", "BRCA1", "EGFR"],
    datassert=Path("/path/to/datassert"),
    taxon="9606",
)

for row in results:
    print(f"{row['original gene']}{row['gene']} ({row['gene name']})")
# TP53 → HGNC:11998 (TP53)
# BRCA1 → HGNC:1100 (BRCA1)
# EGFR → HGNC:3236 (EGFR)

Point resolve_many() at a datassert database and resolve any iterable of entity strings to CURIEs — no LazyFrame setup, NLP preprocessing, or DuckDB connection management required. For full pipeline builds with YAML configuration, use tablassert build config.yaml.

Key Features

  • Declarative Configuration — YAML-based, no code required
  • Entity Resolution — Maps text to biological entities (genes, diseases, chemicals)
  • Quality Control — Optional three-stage validation (exact → fuzzy → BERT embeddings)
  • KGX Compliance — NCATS Translator-compatible NDJSON output
  • Performance — Lazy evaluation pipelines with Polars and DuckDB-accelerated entity resolution

Contributing

See CONTRIBUTING.md for development setup, code style, and pull request guidelines.

License

Apache License 2.0

Contributors

Skye Lane Goetz — Institute for Systems Biology, CalPoly SLO

Gwênlyn Glusman — Institute for Systems Biology

Jared C. Roach — Institute for Systems Biology

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tablassert-7.4.6.tar.gz (238.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

tablassert-7.4.6-py3-none-any.whl (38.2 kB view details)

Uploaded Python 3

File details

Details for the file tablassert-7.4.6.tar.gz.

File metadata

  • Download URL: tablassert-7.4.6.tar.gz
  • Upload date:
  • Size: 238.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tablassert-7.4.6.tar.gz
Algorithm Hash digest
SHA256 87569406715d9d1e279497b775d5c0cb1a784446969c1437fbb654efd24b5593
MD5 b251af7fbfcf43de7e47b7cd97ba78b4
BLAKE2b-256 72b0f802108179de430473913d5cf1b36dffe449e6abf2b01803597d5d2869ec

See more details on using hashes here.

Provenance

The following attestation bundles were made for tablassert-7.4.6.tar.gz:

Publisher: pipy.yml on SkyeAv/Tablassert

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tablassert-7.4.6-py3-none-any.whl.

File metadata

  • Download URL: tablassert-7.4.6-py3-none-any.whl
  • Upload date:
  • Size: 38.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tablassert-7.4.6-py3-none-any.whl
Algorithm Hash digest
SHA256 d04b7e50386f3366d388e431b0884607a066a1b78c8fa838c90b229e8a897dbf
MD5 2ba47db31a3661539095dbec15df27ba
BLAKE2b-256 6d0774d6fe93833c8e919ab00047b17dc5d9ee7d60bfeb4a95f9635c78ff7b05

See more details on using hashes here.

Provenance

The following attestation bundles were made for tablassert-7.4.6-py3-none-any.whl:

Publisher: pipy.yml on SkyeAv/Tablassert

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page