Tablassert

Extract knowledge assertions from tabular data into NCATS Translator-compliant KGX NDJSON — declaratively, with entity resolution built in and optional quality control.

pip install tablassert
tablassert build config.yaml

Full Documentation — installation guides, tutorials, configuration reference, and API docs.

Installation

pip install tablassert

Base install includes web and Excel support. Optional extras are available for CPU compatibility and QC runtime selection:

pip install "tablassert[rt]"       # Polars build for older CPUs that lack the instruction sets the default build needs
pip install "tablassert[qc]"       # Enable QC with CPU ONNX Runtime
pip install "tablassert[qc-cuda]"  # Enable QC with CUDA ONNX Runtime on GPU 0

QC is disabled by default at the graph level. Set qc: true in a graph config to enable the audit stage.

Docker

docker pull ghcr.io/skyeav/tablassert:latest

docker run --rm \
  -v /path/to/config:/data \
  -v /path/to/datassert:/datassert \
  ghcr.io/skyeav/tablassert:latest \
  build /data/graph-config.yaml

Quick Demo

from pathlib import Path
from tablassert.lib import resolve_many

# Resolve gene names to CURIEs against a datassert database
results = resolve_many(
    col="gene",
    entities=["TP53", "BRCA1", "EGFR"],
    datassert=Path("/path/to/datassert"),
    taxon="9606",
)

for row in results:
    # Print each input string alongside its resolved CURIE and preferred name
    print(f"{row['original gene']} → {row['gene']} ({row['gene name']})")
# TP53 → HGNC:11998 (TP53)
# BRCA1 → HGNC:1100 (BRCA1)
# EGFR → HGNC:3236 (EGFR)

Point resolve_many() at a datassert database and resolve any iterable of entity strings to CURIEs — no LazyFrame setup, NLP preprocessing, or DuckDB connection management required. For full pipeline builds with YAML configuration, use tablassert build config.yaml.
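
As a possible follow-up to the demo, the resolved rows can be collapsed into a plain lookup table. This is a minimal sketch, assuming the rows are dict-like and keyed as in the demo above ("original gene", "gene", "gene name"). The names curie_lookup and sample_rows are hypothetical, and the sample rows are hand-written stand-ins copied from the demo's expected output, not values returned by resolve_many().

```python
from typing import Dict, Iterable, Mapping


def curie_lookup(rows: Iterable[Mapping[str, str]], col: str = "gene") -> Dict[str, str]:
    """Map each original input string to its resolved CURIE.

    Assumes each row exposes the keys "original <col>" and "<col>",
    mirroring the row shape printed in the Quick Demo.
    """
    original_key = f"original {col}"
    return {row[original_key]: row[col] for row in rows}


# Stand-in rows shaped like the Quick Demo output (not fetched from a database)
sample_rows = [
    {"original gene": "TP53", "gene": "HGNC:11998", "gene name": "TP53"},
    {"original gene": "BRCA1", "gene": "HGNC:1100", "gene name": "BRCA1"},
    {"original gene": "EGFR", "gene": "HGNC:3236", "gene name": "EGFR"},
]

lookup = curie_lookup(sample_rows)
print(lookup["BRCA1"])
# HGNC:1100
```

If the real return type differs from a sequence of dict-like rows, the helper would need adjusting accordingly.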

Key Features

  • Declarative Configuration — YAML-based, no code required
  • Entity Resolution — Maps text to biological entities (genes, diseases, chemicals)
  • Quality Control — Optional three-stage validation (exact → fuzzy → BERT embeddings)
  • KGX Compliance — NCATS Translator-compatible NDJSON output
  • Performance — Lazy evaluation pipelines with Polars and DuckDB-accelerated entity resolution
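
To illustrate the general shape of a staged check like the exact → fuzzy → embedding cascade named above, here is a rough sketch. It is not Tablassert's implementation: staged_match is a hypothetical function, the standard library's difflib stands in for whatever fuzzy matcher the project uses, the embedding stage is reduced to a caller-supplied scoring function, and both thresholds are arbitrary assumptions.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional


def staged_match(
    text: str,
    label: str,
    fuzzy_threshold: float = 0.9,  # arbitrary illustrative cutoff
    embed_similarity: Optional[Callable[[str, str], float]] = None,
    embed_threshold: float = 0.8,  # arbitrary illustrative cutoff
) -> Optional[str]:
    """Return the name of the first stage that accepts the pair, else None."""
    a, b = text.strip().lower(), label.strip().lower()

    # Stage 1: exact match after trivial normalization (cheapest check first)
    if a == b:
        return "exact"

    # Stage 2: fuzzy string similarity (difflib used here as a stand-in)
    if SequenceMatcher(None, a, b).ratio() >= fuzzy_threshold:
        return "fuzzy"

    # Stage 3: embedding similarity, only when the caller supplies a scorer
    if embed_similarity is not None and embed_similarity(a, b) >= embed_threshold:
        return "embedding"

    return None


print(staged_match("TP53", "tp53"))
# exact
print(staged_match("breast cancer", "breast cancers"))
# fuzzy
print(staged_match("heart attack", "myocardial infarction"))
# None
```

Ordering the stages from cheapest to most expensive means the costly embedding comparison only runs for pairs that the string-level checks could not settle.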

Contributing

See CONTRIBUTING.md for development setup, code style, and pull request guidelines.

License

Apache License 2.0

Contributors

Skye Lane Goetz — Institute for Systems Biology, CalPoly SLO

Gwênlyn Glusman — Institute for Systems Biology

Jared C. Roach — Institute for Systems Biology

Download files

Source Distribution

tablassert-7.4.0.tar.gz (236.1 kB)

Uploaded Source

Built Distribution

tablassert-7.4.0-py3-none-any.whl (37.1 kB)

Uploaded Python 3

File details

Details for the file tablassert-7.4.0.tar.gz.

File metadata

  • Download URL: tablassert-7.4.0.tar.gz
  • Upload date:
  • Size: 236.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tablassert-7.4.0.tar.gz:

  • SHA256: 0a9e15424057586b369508c1fd679b2e420d354fb1b3a0c8ce92f6c3794a8d37
  • MD5: d5d13527b45adf1cd325402e311a1552
  • BLAKE2b-256: e7af7d14d1ab75c5dffc717a2b71ea9ee78076826daec8c29c3b01aff0dd992a

Provenance

The following attestation bundles were made for tablassert-7.4.0.tar.gz:

Publisher: pipy.yml on SkyeAv/Tablassert

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tablassert-7.4.0-py3-none-any.whl.

File metadata

  • Download URL: tablassert-7.4.0-py3-none-any.whl
  • Upload date:
  • Size: 37.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tablassert-7.4.0-py3-none-any.whl:

  • SHA256: 26b4dba7d75b9c66871466c19e8ade32586925c590445915b190a48225c63a90
  • MD5: 8d6792944efb9f3a13ae065c0e000ecf
  • BLAKE2b-256: e2cc1065eabc7d4e8b079ff1bc84ff54e0649c13b44a696d3cae1a18080f146d

Provenance

The following attestation bundles were made for tablassert-7.4.0-py3-none-any.whl:

Publisher: pipy.yml on SkyeAv/Tablassert
