
Table image parsing with cell detection models

Project description

cells2table

Parsing tables in document images with cell detection models

Implemented pipelines

PaddlePaddle models

  • Classification model (wired / wireless)
  • Cell detection model, with separate weights for each table class
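The two stages can be pictured with the following sketch. It is not cells2table's code: `TwoStageTablePipeline`, the `classify` callable and the `detectors` mapping are hypothetical names, and the stub lambdas stand in for the real classification and cell detection models.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical cell box type: (x0, y0, x1, y1) in image pixels.
Box = Tuple[float, float, float, float]


@dataclass
class TwoStageTablePipeline:
    """Hypothetical two-stage flow: classify the table image, then run the
    cell detector whose weights match the predicted class."""

    classify: Callable[[object], str]  # returns "wired" or "wireless"
    detectors: Dict[str, Callable[[object], List[Box]]]  # one detector per class

    def run(self, image: object) -> Tuple[str, List[Box]]:
        table_type = self.classify(image)
        if table_type not in self.detectors:
            raise ValueError(f"no cell detector for table type {table_type!r}")
        return table_type, self.detectors[table_type](image)


# Stub models standing in for the real ONNX sessions.
pipeline = TwoStageTablePipeline(
    classify=lambda img: "wired" if img["has_rulings"] else "wireless",
    detectors={
        "wired": lambda img: [(0, 0, 10, 10)],
        "wireless": lambda img: [(0, 0, 5, 5), (5, 0, 10, 5)],
    },
)
print(pipeline.run({"has_rulings": False}))
```

Keeping the detectors in a mapping keyed by class makes the "different weights per class" idea explicit: adding a class means adding one entry, not another branch.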

The models use ONNX weights, which are downloaded automatically on first use via huggingface_hub.
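A minimal sketch of that download-then-load pattern, assuming the `huggingface_hub` and `onnxruntime` packages are installed. `load_onnx_session` is a hypothetical helper, and the repository ID and file name in the comment are placeholders; the Hub repository cells2table actually uses is not stated here.

```python
from typing import Optional, Sequence


def load_onnx_session(repo_id: str, filename: str,
                      providers: Optional[Sequence[str]] = None):
    """Fetch an ONNX file from the Hugging Face Hub and open it with ONNX Runtime."""
    # Imported lazily so this module can be imported without either package.
    from huggingface_hub import hf_hub_download
    import onnxruntime as ort

    # hf_hub_download stores the file in the local Hugging Face cache and
    # returns its path; later calls reuse the cached copy when it is current.
    model_path = hf_hub_download(repo_id=repo_id, filename=filename)
    return ort.InferenceSession(
        model_path,
        providers=list(providers) if providers else ort.get_available_providers(),
    )


# Placeholder names, not the real cells2table repository:
# session = load_onnx_session("some-org/some-repo", "model.onnx")
```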

Installation

With uv, add it to your project with:

uv add git+https://github.com/jspast/cells2table

The ONNX models need an ONNX Runtime installed to run. You can install one yourself or use one of the preconfigured optional dependencies:

Extra       Description
cuda        For NVIDIA GPUs
openvino    For Intel GPUs and CPUs
cpu         Default CPU runtime
docling     For use with Docling
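Each runtime exposes different ONNX Runtime execution providers. The sketch below assumes a preference order of CUDA, then OpenVINO, then CPU; `pick_providers` is a hypothetical helper, not part of cells2table, while the provider strings are ONNX Runtime's own identifiers.

```python
from typing import List, Sequence

# Assumed preference order, mirroring the extras above.
PREFERRED_PROVIDERS = (
    "CUDAExecutionProvider",
    "OpenVINOExecutionProvider",
    "CPUExecutionProvider",
)


def pick_providers(available: Sequence[str]) -> List[str]:
    """Keep the preferred providers that are available, in preference order,
    and always end with the CPU provider as a fallback."""
    chosen = [p for p in PREFERRED_PROVIDERS if p in available]
    if "CPUExecutionProvider" not in chosen:
        chosen.append("CPUExecutionProvider")
    return chosen


# With ONNX Runtime installed, pass its real list instead:
# import onnxruntime as ort; pick_providers(ort.get_available_providers())
print(pick_providers(["OpenVINOExecutionProvider", "CPUExecutionProvider"]))
# ['OpenVINOExecutionProvider', 'CPUExecutionProvider']
```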

Usage

cells2table only extracts structural information from tables; another library is needed to extract the content of the cells.
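To illustrate the split between structure and content, the sketch below assumes the detected cells carry row/column indices and pixel boxes, and that a separate OCR or PDF-text library supplies word boxes. `Cell`, `Word` and `fill_cells` are hypothetical names; each word goes to the cell containing its center, and spanning cells are not handled.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)


@dataclass
class Cell:  # hypothetical structure output
    row: int
    col: int
    box: Box


@dataclass
class Word:  # hypothetical text-extraction output
    text: str
    box: Box


def _center(box: Box) -> Tuple[float, float]:
    x0, y0, x1, y1 = box
    return (x0 + x1) / 2, (y0 + y1) / 2


def _contains(box: Box, point: Tuple[float, float]) -> bool:
    x0, y0, x1, y1 = box
    x, y = point
    return x0 <= x <= x1 and y0 <= y <= y1


def fill_cells(cells: List[Cell], words: List[Word]) -> List[List[str]]:
    """Assign each word to the cell containing its center and build a text grid."""
    texts: Dict[Tuple[int, int], List[str]] = {(c.row, c.col): [] for c in cells}
    for word in words:
        center = _center(word.box)
        for cell in cells:
            if _contains(cell.box, center):
                texts[(cell.row, cell.col)].append(word.text)
                break  # first matching cell wins
    n_rows = max(c.row for c in cells) + 1
    n_cols = max(c.col for c in cells) + 1
    grid = [["" for _ in range(n_cols)] for _ in range(n_rows)]
    for (r, c), parts in texts.items():
        grid[r][c] = " ".join(parts)  # words kept in the order they were given
    return grid


cells = [Cell(0, 0, (0, 0, 50, 20)), Cell(0, 1, (50, 0, 100, 20)),
         Cell(1, 0, (0, 20, 50, 40)), Cell(1, 1, (50, 20, 100, 40))]
words = [Word("Name", (5, 5, 30, 15)), Word("Age", (55, 5, 80, 15)),
         Word("Ada", (5, 25, 30, 35)), Word("36", (55, 25, 70, 35))]
grid = fill_cells(cells, words)
print(grid)  # [['Name', 'Age'], ['Ada', '36']]
```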

Docling

A Docling plugin is provided for integrating cells2table into a complete document conversion pipeline.

Usage example:

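```python
from docling.datamodel.base_models import InputFormat
from docling.datamodel.pipeline_options import PdfPipelineOptions
from docling.document_converter import DocumentConverter, PdfFormatOption

from cells2table.docling import CustomDoclingTableStructureOptions

# Allow Docling to load third-party plugins, then select the
# cells2table table-structure model through its options class.
pipeline_options = PdfPipelineOptions(
    allow_external_plugins=True,
    table_structure_options=CustomDoclingTableStructureOptions(),
)

# Apply the same pipeline options to PDFs and images.
converter = DocumentConverter(
    format_options={
        InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options),
        InputFormat.IMAGE: PdfFormatOption(pipeline_options=pipeline_options),
    }
)

result = converter.convert("path/to/document.pdf")
print(result.document.export_to_markdown())
```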
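To work with individual tables instead of the Markdown export, the converted document's tables can be exported one by one. A minimal sketch, assuming a docling-core version where `result.document.tables` holds `TableItem` objects with `export_to_dataframe()`; `tables_to_dataframes` is a hypothetical helper name, and the `doc=` keyword is only accepted by newer docling-core releases, hence the fallback.

```python
from typing import Any, List


def tables_to_dataframes(document: Any) -> List[Any]:
    """Export every table of a converted Docling document to a pandas DataFrame."""
    frames = []
    for table in document.tables:
        try:
            # Newer docling-core releases take the parent document as `doc`.
            frames.append(table.export_to_dataframe(doc=document))
        except TypeError:
            # Older releases only offer the no-argument form.
            frames.append(table.export_to_dataframe())
    return frames


# Using `result` from the example above:
# for df in tables_to_dataframes(result.document):
#     print(df)
```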

Download files

Download the file for your platform.

Source Distribution

cells2table-0.2.0.tar.gz (9.5 kB)

Uploaded Source

Built Distribution


cells2table-0.2.0-py3-none-any.whl (16.5 kB)

Uploaded Python 3

File details

Details for the file cells2table-0.2.0.tar.gz.

File metadata

  • Download URL: cells2table-0.2.0.tar.gz
  • Upload date:
  • Size: 9.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.17 (uv publish, Fedora Linux 43)

File hashes

Hashes for cells2table-0.2.0.tar.gz
Algorithm Hash digest
SHA256 a342841c0d4dfda284516cd21f877bac3e8015bb08456527a50d98bb95c6d26a
MD5 2ae0e9d36b34e6c98d4d9ac443e6b86b
BLAKE2b-256 112f437d9ba9868d09b22e7e7f1b38857fd86e2208c131fa20ea82fc6fca089a
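To check a downloaded archive against the published digest, a minimal sketch using Python's standard `hashlib`; `sha256_of` and `verify` are hypothetical helper names, and the expected value is the SHA256 listed above for the source distribution.

```python
import hashlib
from pathlib import Path

# SHA256 published for cells2table-0.2.0.tar.gz.
EXPECTED_SDIST_SHA256 = "a342841c0d4dfda284516cd21f877bac3e8015bb08456527a50d98bb95c6d26a"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so it never has to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare the file's SHA256 with a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()


# verify(Path("cells2table-0.2.0.tar.gz"), EXPECTED_SDIST_SHA256)
```

pip can also enforce digests automatically in its hash-checking mode (`--require-hashes`).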


File details

Details for the file cells2table-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: cells2table-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 16.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.17 (uv publish, Fedora Linux 43)

File hashes

Hashes for cells2table-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 eab0c383381f6854422ee12ed361ff88bacee7afb1a33dfeea1a7ac2bffaa0f0
MD5 cec55cd99a96465f89b1788ed81a2bdb
BLAKE2b-256 6c9b457ee3c3e0ef7a9c4ccfac2b3b3951d9d09b5b40369210e0b9f08175bac4

