Tablassert
Extract knowledge assertions from tabular data into NCATS Translator-compliant KGX NDJSON — declaratively, with entity resolution built in and optional quality control.
pip install tablassert
tablassert build config.yaml
Full Documentation — installation guides, tutorials, configuration reference, and API docs.
Installation
pip install tablassert
Base install includes web and Excel support. Optional extras are available for CPU compatibility and QC runtime selection:
pip install "tablassert[rt]" # Polars build for CPUs that lack the instructions the default build requires
pip install "tablassert[qc]" # Enable QC with CPU ONNX Runtime
pip install "tablassert[qc-cuda]" # Enable QC with CUDA ONNX Runtime on GPU 0
QC is disabled by default at the graph level. Set qc: true in a graph config to enable the audit stage.
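Before turning on qc: true, it can help to confirm which ONNX Runtime build is actually installed. The sketch below uses the real onnxruntime.get_available_providers() call. How Tablassert itself selects a provider is not stated here, so treat this only as an environment check; the helper name available_onnx_providers is hypothetical.

```python
def available_onnx_providers() -> list[str]:
    """Return ONNX Runtime execution providers, or [] if it is not installed."""
    try:
        import onnxruntime as ort  # installed by the qc / qc-cuda extras
    except ImportError:
        return []
    return list(ort.get_available_providers())


providers = available_onnx_providers()
if not providers:
    print("ONNX Runtime is not installed; install a QC extra first")
elif "CUDAExecutionProvider" in providers:
    print("CUDA execution provider is available")
else:
    print("CPU-only ONNX Runtime:", providers)
```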
Docker
docker pull ghcr.io/skyeav/tablassert:latest
docker run --rm \
  -v /path/to/config:/data \
  -v /path/to/datassert:/datassert \
  ghcr.io/skyeav/tablassert:latest \
  build /data/graph-config.yaml
Quick Demo
from pathlib import Path

from tablassert.lib import resolve_many

# Resolve gene names to CURIEs against a datassert database
results = resolve_many(
    col="gene",
    entities=["TP53", "BRCA1", "EGFR"],
    datassert=Path("/path/to/datassert"),
    taxon="9606",
)

for row in results:
    print(f"{row['original gene']} → {row['gene']} ({row['gene name']})")
# TP53 → HGNC:11998 (TP53)
# BRCA1 → HGNC:1100 (BRCA1)
# EGFR → HGNC:3236 (EGFR)
Point resolve_many() at a datassert database and resolve any iterable of entity strings to CURIEs — no LazyFrame setup, NLP preprocessing, or DuckDB connection management required. For full pipeline builds with YAML configuration, use tablassert build config.yaml.
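Once entities are resolved, a common next step is a plain lookup table from the original string to its CURIE. The sketch below assumes each result row behaves like a dict with the keys seen in the demo ("original gene", "gene", "gene name"). The rows are hard-coded from the demo's printed output so the snippet runs without a datassert database, and curie_lookup is a hypothetical helper, not part of tablassert.

```python
# Rows shaped like the demo output above; hard-coded here instead of
# calling resolve_many(), which needs a datassert database on disk.
rows = [
    {"original gene": "TP53", "gene": "HGNC:11998", "gene name": "TP53"},
    {"original gene": "BRCA1", "gene": "HGNC:1100", "gene name": "BRCA1"},
    {"original gene": "EGFR", "gene": "HGNC:3236", "gene name": "EGFR"},
]


def curie_lookup(rows, col):
    """Map each original entity string to its resolved CURIE."""
    return {row[f"original {col}"]: row[col] for row in rows}


lookup = curie_lookup(rows, "gene")
print(lookup["BRCA1"])  # HGNC:1100
```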
Key Features
- Declarative Configuration — YAML-based, no code required
- Entity Resolution — Maps text to biological entities (genes, diseases, chemicals)
- Quality Control — Optional three-stage validation (exact → fuzzy → BERT embeddings)
- KGX Compliance — NCATS Translator-compatible NDJSON output
- Performance — Lazy evaluation pipelines with Polars and DuckDB-accelerated entity resolution
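NDJSON output is one JSON object per line, so it can be read with the standard library alone. The sketch below parses KGX-style edge records, which carry subject, predicate, and object fields. The two sample edges are illustrative only (built from the CURIEs in the demo, not real Tablassert output), and read_ndjson is a hypothetical helper.

```python
import io
import json

# Illustrative edge records only; not output from a real build.
SAMPLE_EDGES = io.StringIO(
    '{"subject": "HGNC:11998", "predicate": "biolink:interacts_with", "object": "HGNC:1100"}\n'
    '{"subject": "HGNC:3236", "predicate": "biolink:interacts_with", "object": "HGNC:11998"}\n'
)


def read_ndjson(handle):
    """Yield one parsed JSON object per non-empty line."""
    for line in handle:
        line = line.strip()
        if line:
            yield json.loads(line)


edges = list(read_ndjson(SAMPLE_EDGES))
for edge in edges:
    print(edge["subject"], edge["predicate"], edge["object"])
```

To read a file on disk, pass an open file handle in place of the in-memory sample.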
Contributing
See CONTRIBUTING.md for development setup, code style, and pull request guidelines.
License
Contributors
Skye Lane Goetz — Institute for Systems Biology, CalPoly SLO
Gwênlyn Glusman — Institute for Systems Biology
Jared C. Roach — Institute for Systems Biology
File details
Details for the file tablassert-7.4.0.tar.gz.
File metadata
- Download URL: tablassert-7.4.0.tar.gz
- Upload date:
- Size: 236.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0a9e15424057586b369508c1fd679b2e420d354fb1b3a0c8ce92f6c3794a8d37 |
| MD5 | d5d13527b45adf1cd325402e311a1552 |
| BLAKE2b-256 | e7af7d14d1ab75c5dffc717a2b71ea9ee78076826daec8c29c3b01aff0dd992a |
Provenance
The following attestation bundles were made for tablassert-7.4.0.tar.gz:
Publisher: pipy.yml on SkyeAv/Tablassert
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tablassert-7.4.0.tar.gz
- Subject digest: 0a9e15424057586b369508c1fd679b2e420d354fb1b3a0c8ce92f6c3794a8d37
- Sigstore transparency entry: 1444974375
- Sigstore integration time:
- Permalink: SkyeAv/Tablassert@f5f0107d9ffd0e4d32a29cf94951a4c7e6be11fb
- Branch / Tag: refs/heads/main
- Owner: https://github.com/SkyeAv
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pipy.yml@f5f0107d9ffd0e4d32a29cf94951a4c7e6be11fb
- Trigger Event: push
File details
Details for the file tablassert-7.4.0-py3-none-any.whl.
File metadata
- Download URL: tablassert-7.4.0-py3-none-any.whl
- Upload date:
- Size: 37.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 26b4dba7d75b9c66871466c19e8ade32586925c590445915b190a48225c63a90 |
| MD5 | 8d6792944efb9f3a13ae065c0e000ecf |
| BLAKE2b-256 | e2cc1065eabc7d4e8b079ff1bc84ff54e0649c13b44a696d3cae1a18080f146d |
Provenance
The following attestation bundles were made for tablassert-7.4.0-py3-none-any.whl:
Publisher: pipy.yml on SkyeAv/Tablassert
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: tablassert-7.4.0-py3-none-any.whl
- Subject digest: 26b4dba7d75b9c66871466c19e8ade32586925c590445915b190a48225c63a90
- Sigstore transparency entry: 1444974575
- Sigstore integration time:
- Permalink: SkyeAv/Tablassert@f5f0107d9ffd0e4d32a29cf94951a4c7e6be11fb
- Branch / Tag: refs/heads/main
- Owner: https://github.com/SkyeAv
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: pipy.yml@f5f0107d9ffd0e4d32a29cf94951a4c7e6be11fb
- Trigger Event: push