Tablassist CLI

A Python CLI tool for AI-assisted generation of Tablassert table configurations. It covers entity resolution, YAML validation, and Biolink documentation lookup.

Tablassist ships with two document extraction modes:

  • extract-text for fast raw extraction via Textract
  • extract-text-semantic for richer Docling-backed semantic extraction with Markdown output and ocr=auto by default

Installation

pip install tablassist

The base install now includes Docling, so extract-text-semantic works without a separate helper script or optional extra.

An optional extra is available for CPU compatibility:

pip install "tablassist[rtcompat]"  # Polars build for CPUs that lack the instructions the default build requires

Requirements

  • Python >= 3.13
  • Environment variables TABLASSIST_USERNAME and TABLASSIST_API_KEY for API-accessing commands
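
Because the API-accessing commands depend on these two variables, a small pre-flight check can catch a missing credential before a command fails. The sketch below is not part of Tablassist: the helper name missing_credentials is hypothetical, and only the two variable names come from the requirements above.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Variable names taken from the requirements list above.
REQUIRED_VARS = ("TABLASSIST_USERNAME", "TABLASSIST_API_KEY")


def missing_credentials(env: Mapping[str, str] | None = None) -> list[str]:
    """Return required variable names that are unset or empty (hypothetical helper)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("Credentials are set.")
```

Passing a mapping explicitly keeps the check easy to test without touching the real environment.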

Usage

# Fetch table configuration documentation
tablassist docs-table-config

Entity Resolution

# Search for entity CURIEs by term
tablassist search-curies "breast cancer"

# Search gene CURIEs within an NCBI taxon
tablassist search-gene-curies "BRCA1" --ncbi-taxon 9606

# Resolve an NCBI Taxon ID from an organism name
tablassist resolve-taxon-id "Homo sapiens"

Biolink Reference

# List all supported categories, predicates, or qualifiers
tablassist list-categories
tablassist list-predicates
tablassist list-qualifiers

# Fetch documentation for a specific Biolink element
tablassist docs-category "Gene"
tablassist docs-predicate "interacts_with"
tablassist docs-qualifier "qualified_predicate"

YAML Validation

Full config validation requires template: as the top-level key; a sections: key is optional. Use validate-section-str only for individual section mappings, not for whole config files.

# Validate a full config file
tablassist validate-config-file config.yaml

# Validate a single section from a YAML string
tablassist validate-section-str '<yaml>'

# Validate a full config from a YAML string
tablassist validate-config-str '<yaml>'

# Get the Section JSON schema
tablassist section-schema
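
To make the split between whole configs and single sections concrete, the sketch below picks a validation command for an already-parsed YAML mapping. It only illustrates the rule stated above and is not Tablassist's validator: the function name choose_validator is hypothetical, and it checks nothing beyond the presence of the top-level template key.

```python
def choose_validator(document: dict) -> str:
    """Pick a validation subcommand for a parsed YAML mapping (hypothetical helper).

    A mapping with a top-level "template" key is treated as a full config;
    anything else is treated as a single section mapping.
    """
    if "template" in document:
        return "validate-config-str"
    return "validate-section-str"


if __name__ == "__main__":
    # Placeholder mappings; real configs carry Tablassert-specific fields.
    print(choose_validator({"template": {}, "sections": []}))
    print(choose_validator({"placeholder_section_field": "value"}))
```

The actual schema checks still happen inside the CLI; tablassist section-schema shows what a section is expected to contain.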

Data Preview

# List sheets in an Excel file
tablassist excel-sheets data.xlsx

# Preview rows from an Excel sheet
tablassist preview-excel data.xlsx "Sheet1" 10

# Preview rows from a CSV file
tablassist preview-csv data.csv 10

# Extract text from a document (PDF, DOCX, etc.)
tablassist extract-text document.pdf

# Extract semantic Markdown from a document with Docling
tablassist extract-text-semantic document.pdf

# Extract plain text and explicitly disable OCR
tablassist extract-text-semantic document.pdf text off

extract-text is optimized for fast, low-overhead text grabs.

extract-text-semantic runs IBM Docling directly from the CLI module. It is the better choice when reading order, headings, lists, or table-aware Markdown matter more than raw speed.

Arguments for extract-text-semantic:

  • file: local document path
  • output_format: markdown (default) or text
  • ocr: auto (default), off, or on

Use ocr=auto for the default balance. Use ocr=on for scans and image-heavy PDFs, and ocr=off when you know the source is born-digital and want the lightest path.
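
When the command is driven from a script, it can help to validate the two option values before launching it. The following is a minimal sketch, assuming the positional order file, output_format, ocr shown in the example above and assuming that passing the defaults explicitly behaves the same as omitting them. The function name build_semantic_command is hypothetical and not part of Tablassist.

```python
from __future__ import annotations

# Allowed values as listed in the argument description above.
OUTPUT_FORMATS = ("markdown", "text")
OCR_MODES = ("auto", "off", "on")


def build_semantic_command(
    file: str, output_format: str = "markdown", ocr: str = "auto"
) -> list[str]:
    """Build the argv list for extract-text-semantic (hypothetical helper)."""
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"output_format must be one of {OUTPUT_FORMATS}")
    if ocr not in OCR_MODES:
        raise ValueError(f"ocr must be one of {OCR_MODES}")
    # Positional order assumed from the documented example:
    # tablassist extract-text-semantic document.pdf text off
    return ["tablassist", "extract-text-semantic", file, output_format, ocr]


if __name__ == "__main__":
    # Print the command for a born-digital PDF with OCR disabled.
    print(" ".join(build_semantic_command("document.pdf", "text", "off")))
```

The returned list can be handed to subprocess.run(..., check=True) once tablassist is installed.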

PMC Archive Download

# Download and extract a PMC tar archive
tablassist download-pmc-tar 12345 --dest-dir ./output

Development

uv sync                              # install dependencies
uv run ruff check .                  # lint
uv run ruff check --fix .            # lint with auto-fix
uv run ruff format .                 # format
uv run pyright                       # type check
uv run --group dev python -m pytest  # run all tests

License

Apache License 2.0

Download files

Source Distribution

tablassist-0.4.0.tar.gz (164.8 kB)

Uploaded Source

Built Distribution

tablassist-0.4.0-py3-none-any.whl (14.9 kB)

Uploaded Python 3

File details

Details for the file tablassist-0.4.0.tar.gz.

File metadata

  • Download URL: tablassist-0.4.0.tar.gz
  • Upload date:
  • Size: 164.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tablassist-0.4.0.tar.gz
Algorithm Hash digest
SHA256 3ec880d65743ba412a81ed4f732b63a092dd4509c828ab4ac3168ae1f29f3265
MD5 b610506d829efd24aa426ddce4e3c988
BLAKE2b-256 502b5d597314c64811355842c55ad163f59e57f40d1d380197281da6a49572cc
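
A downloaded file can be checked against the SHA256 digest listed above with Python's standard hashlib module. This is a minimal sketch, assuming the archive was saved under its published name in the current directory. The helper names sha256_of and matches_published are illustrative only.

```python
import hashlib
from pathlib import Path


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Digest copied from the SHA256 row above; the local path is an assumption.
    expected = "3ec880d65743ba412a81ed4f732b63a092dd4509c828ab4ac3168ae1f29f3265"
    archive = "tablassist-0.4.0.tar.gz"
    if Path(archive).exists():
        print("digest matches:", matches_published(archive, expected))
    else:
        print(f"{archive} not found; download it first")
```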

Provenance

The following attestation bundles were made for tablassist-0.4.0.tar.gz:

Publisher: pypi.yml on SkyeAv/Tablassist

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file tablassist-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: tablassist-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 14.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for tablassist-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 66da69b9b21ca5951c6afbf892819c8b4d6b2b60d2b017d1a8a250e08961b962
MD5 d55f355ebe8f3f6204b25337fc08e0f8
BLAKE2b-256 70338dc42a8c2ae16fa5eec24454cbe37344aa41603c028e15e27b58c0532c37

Provenance

The following attestation bundles were made for tablassist-0.4.0-py3-none-any.whl:

Publisher: pypi.yml on SkyeAv/Tablassist

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
