⚡ Tablers

A blazingly fast PDF table extraction library with a Python API, powered by Rust


Features

  • 🚀 Blazingly Fast - Core algorithms written in Rust for maximum performance
  • 🐍 Pythonic API - Easy-to-use Python interface with full type hints
  • 📄 Edge Detection - Accurate table detection using line and rectangle edge analysis
  • 📝 Text Extraction - Extract text content from table cells with configurable settings
  • 📤 Multiple Export Formats - Export tables to CSV, Markdown, and HTML
  • 🔐 Encrypted PDFs - Support for password-protected PDF documents
  • 💾 Memory Efficient - Lazy page loading for handling large PDF files
  • 🖥️ Cross-Platform - Works on Windows, Linux, and macOS

Why Tablers?

This project draws significant inspiration from the table extraction modules of pdfplumber and PyMuPDF. Compared with those two libraries, tablers offers the following advantages:

  • High Performance: Core PDF processing is implemented in Rust
  • Higher Accuracy: Tablers refines several table detection algorithms to address table extraction problems that other libraries have not fully solved
  • More Configurable: Supports customizable table filter settings such as min_rows, min_columns, and include_single_cell (for an example of why this matters, see this issue)
  • Clean Python Dependencies: No external Python dependencies required

Benchmark

The benchmark uses the ICDAR 2013 Table Competition dataset and evaluates both extraction speed and accuracy across tablers, PyMuPDF, pdfplumber, and camelot. All libraries use their default configuration for table extraction. PyMuPDF excludes tables that have only one row or only one column (see PyMuPDF#3987), and this behaviour is not configurable; among the compared libraries, only tablers allows configuring minimum row and column counts. For a fair comparison, the benchmark therefore includes both tablers (default) and tablers (min 2×2). The latter sets min_rows=2 and min_columns=2 so that single-row and single-column tables are filtered out, as they are in PyMuPDF. For more on the libraries and settings, see the Libraries compared section in tablers-benchmark.
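To make the effect of a min 2×2 filter concrete, here is a minimal sketch in plain Python. It does not call tablers: TableShape and filter_by_min_size are hypothetical names invented for this illustration, and the sample shapes are made up. It only shows which tables survive when a minimum row and column count is enforced.

```python
from typing import NamedTuple


class TableShape(NamedTuple):
    """Hypothetical stand-in for a detected table: only its grid size."""
    rows: int
    columns: int


def filter_by_min_size(tables, min_rows=1, min_columns=1):
    """Keep tables with at least min_rows rows and min_columns columns."""
    return [t for t in tables if t.rows >= min_rows and t.columns >= min_columns]


# Made-up detection results: a single-row table, a single-column table,
# and two ordinary grids.
detected = [TableShape(1, 4), TableShape(5, 1), TableShape(3, 3), TableShape(2, 2)]

unfiltered = filter_by_min_size(detected)
min_2x2 = filter_by_min_size(detected, min_rows=2, min_columns=2)

print(len(unfiltered), len(min_2x2))
```

With these sample shapes, the permissive call keeps all four tables, while the 2×2 call drops the single-row and single-column ones. That is the same class of tables the paragraph above says PyMuPDF excludes.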

[Figure: Table Extraction Benchmark]

For more details, please refer to the tablers-benchmark repository.

Note

This solution is primarily designed for text-based PDFs and does not support scanned PDFs.

Installation

pip install tablers

Quick Start

Basic Table Extraction

from tablers import Document, find_tables

# Open a PDF document
doc = Document("example.pdf")

# Extract tables from each page
for page in doc.pages():
    tables = find_tables(page, extract_text=True)
    for table in tables:
        print(f"Found table with {len(table.cells)} cells")
        for cell in table.cells:
            print(f"  Cell: {cell.text} at {cell.bbox}")

doc.close()
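The loop above prints each cell's text and bounding box. As a rough sketch of how those two values could be turned into a row-by-row grid, the helper below groups cells by their top coordinate and orders each row from left to right. It is an illustration only and rests on assumptions the README does not confirm: that a bbox is a 4-item sequence (x0, top, x1, bottom) with top increasing down the page, and that the table has no merged cells. cells_to_rows and rows_to_csv are hypothetical helpers, not part of tablers, and they work on plain (text, bbox) pairs so the example runs without a PDF.

```python
import csv
import io


def cells_to_rows(cells, tolerance=1.0):
    """Group (text, bbox) pairs into rows, top to bottom, left to right.

    Assumes bbox = (x0, top, x1, bottom); cells whose top coordinates lie
    within `tolerance` of a row's first cell are placed in that row.
    """
    rows = []  # each entry: (row_top, [(x0, text), ...])
    for text, bbox in sorted(cells, key=lambda c: (c[1][1], c[1][0])):
        x0, top = bbox[0], bbox[1]
        if rows and abs(top - rows[-1][0]) <= tolerance:
            rows[-1][1].append((x0, text))
        else:
            rows.append((top, [(x0, text)]))
    # Order each row by x0 and keep only the text.
    return [[text for _, text in sorted(items, key=lambda i: i[0])]
            for _, items in rows]


def rows_to_csv(rows):
    """Serialize a list of rows to a CSV string using the standard library."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Made-up cells in arbitrary order, with slightly noisy top coordinates.
sample_cells = [
    ("b", (50, 10.2, 100, 20)),
    ("a", (0, 10.0, 50, 20)),
    ("c", (0, 20.0, 50, 30)),
    ("d", (50, 20.0, 100, 30)),
]
grid = cells_to_rows(sample_cells)
print(rows_to_csv(grid))
```

If the bbox assumption holds, the same helper could be fed from a real table with cells_to_rows([(cell.text, cell.bbox) for cell in table.cells]). The library advertises its own CSV, Markdown, and HTML export, which is likely the better choice in practice; this sketch is only meant to show what the cell data looks like once arranged.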

Using Context Manager

from tablers import Document, find_tables

with Document("example.pdf") as doc:
    page = doc.get_page(0)  # Get first page
    tables = find_tables(page, extract_text=True)

    for table in tables:
        print(f"Table bbox: {table.bbox}")

For more advanced usage, please refer to the documentation.

Requirements

  • Python >= 3.10
  • Supported platforms: Windows (x64), Linux (x64) with glibc >= 2.28, macOS (ARM64)

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments
