Skip to main content

Extract knowledge assertions from tabular data into NCATS Translator-compliant KGX NDJSON — declaratively, with entity resolution and quality control built in.

Project description

Tablassert

PyPI Python License Docs

Extract knowledge assertions from tabular data into NCATS Translator-compliant KGX NDJSON — declaratively, with entity resolution built in and optional quality control.

pip install tablassert
tablassert build config.yaml

Full Documentation — installation guides, tutorials, configuration reference, and API docs.

Installation

pip install tablassert

Base install includes web and Excel support. Optional extras are available for CPU compatibility and QC runtime selection:

pip install "tablassert[rt]"       # Polars build for CPUs without required instructions
pip install "tablassert[qc]"       # Enable QC with CPU ONNX Runtime
pip install "tablassert[qc-cuda]"  # Enable QC with CUDA ONNX Runtime on GPU 0

QC is disabled by default at the graph level. Set qc: true in a graph config to enable the audit stage.

Docker
docker pull ghcr.io/skyeav/tablassert:latest

docker run --rm \
  -v /path/to/config:/data \
  -v /path/to/datassert:/datassert \
  ghcr.io/skyeav/tablassert:latest \
  build /data/graph-config.yaml

Quick Demo

from pathlib import Path
from tablassert.lib import resolve_many

# Resolve gene names to CURIEs against a datassert database
results = resolve_many(
    col="gene",
    entities=["TP53", "BRCA1", "EGFR"],
    datassert=Path("/path/to/datassert"),
    taxon="9606",
)

for row in results:
    print(f"{row['original gene']}{row['gene']} ({row['gene name']})")
# TP53 → HGNC:11998 (TP53)
# BRCA1 → HGNC:1100 (BRCA1)
# EGFR → HGNC:3236 (EGFR)

Point resolve_many() at a datassert database and resolve any iterable of entity strings to CURIEs — no LazyFrame setup, NLP preprocessing, or DuckDB connection management required. For full pipeline builds with YAML configuration, use tablassert build config.yaml.

Key Features

  • Declarative Configuration — YAML-based, no code required
  • Entity Resolution — Maps text to biological entities (genes, diseases, chemicals)
  • Quality Control — Optional three-stage validation (exact → fuzzy → BERT embeddings)
  • KGX Compliance — NCATS Translator-compatible NDJSON output
  • Performance — Lazy evaluation pipelines with Polars and DuckDB-accelerated entity resolution

Contributing

See CONTRIBUTING.md for development setup, code style, and pull request guidelines.

License

Apache License 2.0

Contributors

Skye Lane Goetz — Institute for Systems Biology, CalPoly SLO

Gwênlyn Glusman — Institute for Systems Biology

Jared C. Roach — Institute for Systems Biology

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tablassert-7.4.2.tar.gz (237.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

tablassert-7.4.2-py3-none-any.whl (38.0 kB view details)

Uploaded Python 3

File details

Details for the file tablassert-7.4.2.tar.gz.

File metadata

  • Download URL: tablassert-7.4.2.tar.gz
  • Upload date:
  • Size: 237.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tablassert-7.4.2.tar.gz
Algorithm Hash digest
SHA256 75c5d3a006ed13bf99dd0923c5dbab1cc9a6e91d2c71f509d81bf476f80e6d42
MD5 1b1dbe8cf0e22fe2cc5fa9b856e21d4f
BLAKE2b-256 87e9514807b91df62039d66725937968ff2c7f96a23bc32c74556d6c37d29889

See more details on using hashes here.

Provenance

The following attestation bundles were made for tablassert-7.4.2.tar.gz:

Publisher: pipy.yml on SkyeAv/Tablassert

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tablassert-7.4.2-py3-none-any.whl.

File metadata

  • Download URL: tablassert-7.4.2-py3-none-any.whl
  • Upload date:
  • Size: 38.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tablassert-7.4.2-py3-none-any.whl
Algorithm Hash digest
SHA256 e4459f93ccdefb79effeb0cc3b1e19fddc5c47011837c69f8e13321d321f16c2
MD5 a45b6214f8c7c9a20f4fe43b43dc9a33
BLAKE2b-256 8c38dff4629db5a1d12e736a5957db68c6d060bab7adf1c936f7c33efb584710

See more details on using hashes here.

Provenance

The following attestation bundles were made for tablassert-7.4.2-py3-none-any.whl:

Publisher: pipy.yml on SkyeAv/Tablassert

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page