A blazingly fast PDF table extraction library with a Python API, powered by Rust
⚡ Tablers
Features
- 🚀 Blazingly Fast - Core algorithms written in Rust for maximum performance
- 🐍 Pythonic API - Easy-to-use Python interface with full type hints
- 📄 Edge Detection - Accurate table detection using line and rectangle edge analysis
- 📝 Text Extraction - Extract text content from table cells with configurable settings
- 📤 Multiple Export Formats - Export tables to CSV, Markdown, and HTML
- 🔐 Encrypted PDFs - Support for password-protected PDF documents
- 💾 Memory Efficient - Lazy page loading for handling large PDF files
- 🖥️ Cross-Platform - Works on Windows, Linux, and macOS
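To make the CSV and Markdown formats concrete, the sketch below renders a small grid of cell strings using only the standard library. It is an independent illustration of what the output looks like, not tablers' own exporter. The function names are hypothetical, and `None` is assumed to stand for an empty cell.

```python
from __future__ import annotations

import csv
import io


def to_csv(rows: list[list[str | None]]) -> str:
    """Serialize a grid of cell strings as CSV; None becomes an empty field."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")  # csv handles quoting of commas/quotes
    for row in rows:
        writer.writerow(["" if c is None else c for c in row])
    return buf.getvalue()


def to_markdown(rows: list[list[str | None]]) -> str:
    """Render the first row as the header; escape pipes and flatten newlines."""
    def fmt(c: str | None) -> str:
        return ("" if c is None else c).replace("|", "\\|").replace("\n", " ")

    header, *body = rows
    lines = ["| " + " | ".join(fmt(c) for c in header) + " |",
             "|" + "---|" * len(header)]
    lines += ["| " + " | ".join(fmt(c) for c in row) + " |" for row in body]
    return "\n".join(lines)


rows = [["Name", "Qty"], ["apple", "3"], ["a|b", None]]
print(to_csv(rows))
print(to_markdown(rows))
```

Markdown has no standard way to express merged cells, which is one reason HTML is a useful third export target.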
Why Tablers?
This project draws significant inspiration from the table extraction modules of pdfplumber and PyMuPDF. Compared to pdfplumber and PyMuPDF, tablers has the following advantages:
- High Performance: Utilizes Rust for high-performance PDF processing
- Higher Accuracy: Tablers optimizes some table detection algorithms to address table extraction problems that other libraries have not fully solved, including:
  - Mixed strategies, where one of the vertical/horizontal strategies is text-based and the other is line-based (#8)
  - Tables whose edges are actually narrow closepath polylines (#13)
  - Extracting table content when the bottom border is absent (pdfplumber discussion #631)
  - Table recognition when outer lines are missing (pdfplumber issue #1296)
  - Excluding tables formed by invisible edges (pdfplumber issue #1357)
- More Configurable: Supports customizable table filter settings (min_rows, min_columns, include_single_cell; see, e.g., this issue)
- Clean Python Dependencies: No external Python dependencies required
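To give a feel for what line-based table detection involves, the sketch below follows the general intersection-and-rectangle approach popularized by pdfplumber: find where horizontal and vertical edges cross, then, from each crossing, take the nearest rectangle whose four sides are all backed by edges. It is a simplified illustration written for this page, not tablers' Rust implementation. The edge tuples and function names are hypothetical, and real implementations first snap and merge nearly collinear edges.

```python
from itertools import product

# Hypothetical inputs, in PDF points with y growing downward:
#   horizontal edge = (x0, x1, y), vertical edge = (x, top, bottom)


def intersections(h_edges, v_edges, tol=1.0):
    """Every point where a horizontal edge and a vertical edge cross."""
    points = set()
    for (hx0, hx1, y), (x, top, bottom) in product(h_edges, v_edges):
        if hx0 - tol <= x <= hx1 + tol and top - tol <= y <= bottom + tol:
            points.add((x, y))
    return points


def h_connected(p, q, h_edges, tol=1.0):
    """True if a single horizontal edge spans from p to q (same y)."""
    lo, hi = sorted((p[0], q[0]))
    return any(abs(y - p[1]) <= tol and x0 - tol <= lo and hi <= x1 + tol
               for x0, x1, y in h_edges)


def v_connected(p, q, v_edges, tol=1.0):
    """True if a single vertical edge spans from p to q (same x)."""
    lo, hi = sorted((p[1], q[1]))
    return any(abs(x - p[0]) <= tol and top - tol <= lo and hi <= bottom + tol
               for x, top, bottom in v_edges)


def cells_from_edges(h_edges, v_edges, tol=1.0):
    """Nearest edge-backed rectangle whose top-left corner is each intersection."""
    pts = intersections(h_edges, v_edges, tol)
    cells = []
    for x0, top in sorted(pts):
        belows = sorted(p for p in pts if p[0] == x0 and p[1] > top)
        rights = sorted(p for p in pts if p[1] == top and p[0] > x0)
        cell = None
        for below in belows:
            if not v_connected((x0, top), below, v_edges, tol):
                continue
            for right in rights:
                if not h_connected((x0, top), right, h_edges, tol):
                    continue
                corner = (right[0], below[1])
                # The bottom and right sides must also exist as edges.
                if (corner in pts
                        and h_connected(below, corner, h_edges, tol)
                        and v_connected(right, corner, v_edges, tol)):
                    cell = (x0, top, corner[0], corner[1])
                    break
            if cell is not None:
                break
        if cell is not None:
            cells.append(cell)
    return cells


# A 2x2 grid: three horizontal and three vertical rules.
h = [(0, 20, 0), (0, 20, 10), (0, 20, 20)]
v = [(0, 0, 20), (10, 0, 20), (20, 0, 20)]
print(cells_from_edges(h, v))
```

Dropping the bottom rule (`h[:2]`) leaves this naive version with only the top row of cells. That is the kind of missing-bottom-border case listed above, which needs extra handling beyond pure edge intersection.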
Benchmark
Benchmarked on the ICDAR 2013 Table Competition dataset, evaluating both extraction speed and accuracy across tablers, PyMuPDF, pdfplumber, and camelot. All libraries use their default configuration for table extraction. PyMuPDF excludes tables that have only one row or only one column (see PyMuPDF#3987), and this behaviour is not configurable; among the compared libraries, only tablers allows configuring minimum row/column counts. For a fair comparison, the benchmark therefore includes both tablers (default) and tablers (min 2×2) — the latter with min_rows=2 and min_columns=2 so that single-row/single-column tables are filtered out in the same way as in PyMuPDF. For more on the libraries and settings, see the Libraries compared section in tablers-benchmark.
For more details, please refer to the tablers-benchmark repository.
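The difference between the two tablers configurations in the benchmark comes down to a post-detection filter on each table's row and column counts. The sketch below models that idea in plain Python. `TableShape` and `keep_table` are hypothetical names, and the defaults and exact semantics are assumptions rather than tablers' documented behaviour.

```python
from dataclasses import dataclass


@dataclass
class TableShape:
    """Hypothetical stand-in for an extracted table: only its grid size."""
    rows: int
    cols: int


def keep_table(t: TableShape, min_rows: int = 1, min_columns: int = 1,
               include_single_cell: bool = True) -> bool:
    """Hypothetical filter modelled on the min_rows / min_columns /
    include_single_cell settings; the defaults here are assumptions."""
    if not include_single_cell and t.rows == 1 and t.cols == 1:
        return False
    return t.rows >= min_rows and t.cols >= min_columns


found = [TableShape(1, 5), TableShape(4, 1), TableShape(3, 3), TableShape(1, 1)]
default = [t for t in found if keep_table(t)]
min_2x2 = [t for t in found if keep_table(t, min_rows=2, min_columns=2)]
print(len(default), len(min_2x2))  # 4 1
```

A 1×5 strip and a 4×1 column survive the permissive settings but are dropped under min 2×2, which mirrors the PyMuPDF behaviour described above.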
Note
This solution is primarily designed for text-based PDFs and does not support scanned PDFs.
Installation
```bash
pip install tablers
```
Quick Start
Basic Table Extraction
```python
from tablers import Document, find_tables

# Open a PDF document
doc = Document("example.pdf")

# Extract tables from each page
for page in doc.pages():
    tables = find_tables(page, extract_text=True)
    for table in tables:
        print(f"Found table with {len(table.cells)} cells")
        for cell in table.cells:
            print(f" Cell: {cell.text} at {cell.bbox}")

doc.close()
```
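The loop above prints cells one at a time. If you need the table as rows of strings, one approach is to cluster cells by their top coordinate. The helper below is a hypothetical sketch: it assumes each bbox is a 4-tuple `(x0, top, x1, bottom)` with y increasing downward, as in pdfplumber. Confirm this against the tablers documentation before relying on it. Merged cells spanning several rows or columns are not handled.

```python
def cells_to_rows(cells, tol=2.0):
    """Group (bbox, text) pairs into rows of text, top to bottom, left to right.

    Assumes bbox = (x0, top, x1, bottom) with y growing downward; cells whose
    top edges differ by at most `tol` points are treated as the same row.
    """
    rows = []  # each entry: [row_top, [(x0, text), ...]]
    for (x0, top, _x1, _bottom), text in sorted(cells, key=lambda c: (c[0][1], c[0][0])):
        if rows and abs(rows[-1][0] - top) <= tol:
            rows[-1][1].append((x0, text))
        else:
            rows.append([top, [(x0, text)]])
    return [[text for _, text in sorted(row, key=lambda p: p[0])] for _, row in rows]


cells = [
    ((50, 0, 100, 10), "B1"),
    ((0, 0, 50, 10), "A1"),
    ((0, 10.5, 50, 20), "A2"),
    ((50, 11, 100, 20), "B2"),
]
print(cells_to_rows(cells))  # [['A1', 'B1'], ['A2', 'B2']]
```

With tablers objects this would be called as `cells_to_rows([(c.bbox, c.text) for c in table.cells])`, under the same bbox assumption.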
Using Context Manager
```python
from tablers import Document, find_tables

with Document("example.pdf") as doc:
    page = doc.get_page(0)  # Get first page
    tables = find_tables(page, extract_text=True)
    for table in tables:
        print(f"Table bbox: {table.bbox}")
```
For more advanced usage, see the documentation.
Requirements
- Python >= 3.10
- Supported platforms: Windows (x64), Linux (x64) with glibc >= 2.28, macOS (ARM64)
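As a rough pre-install check, the hypothetical helper below compares an interpreter and platform against the combinations listed under Requirements. It is not pip's wheel-tag matching; `packaging.tags.sys_tags()` from the `packaging` project, or `pip debug --verbose`, report the tags pip actually accepts. Note that an x86_64 Python running under Rosetta on Apple Silicon reports `x86_64` as its machine.

```python
import platform
import sys


def prebuilt_wheel_likely(system: str, machine: str,
                          libc: tuple[str, str], py: tuple[int, int]) -> bool:
    """Rough match against the platforms listed under Requirements.

    This is NOT pip's tag resolution; it only mirrors the listed combinations.
    """
    if py < (3, 10):
        return False
    machine = machine.lower()
    if system == "Windows":
        return machine in ("amd64", "x86_64")
    if system == "Darwin":
        return machine == "arm64"
    if system == "Linux":
        name, version = libc
        if name != "glibc" or not version or machine != "x86_64":
            return False
        # Compare only major.minor, e.g. "2.35" -> (2, 35)
        return tuple(int(part) for part in version.split(".")[:2]) >= (2, 28)
    return False


if __name__ == "__main__":
    print(prebuilt_wheel_likely(platform.system(), platform.machine(),
                                platform.libc_ver(), tuple(sys.version_info[:2])))
```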
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- pdfplumber - PDF parsing library
- PyMuPDF - PDF parsing library
- pdfium-render - Rust bindings for PDFium
- PyO3 - Rust bindings for Python
Download files
Built Distributions
File details
Details for the file tablers-0.7.3-cp310-abi3-win_amd64.whl.
File metadata
- Download URL: tablers-0.7.3-cp310-abi3-win_amd64.whl
- Upload date:
- Size: 3.7 MB
- Tags: CPython 3.10+, Windows x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: pdm/2.26.6 CPython/3.12.3 Linux/6.14.0-1017-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f1e8d83530724192336a4de58241b62d257a102557e6ffe1fee69d6df5e75f90 |
| MD5 | 00abcde916035d3d5b864c56d3b7026a |
| BLAKE2b-256 | fe2c6c3185314fc0ace45ed9addef6109e61059313c3646f6c27383bb8756bb2 |
File details
Details for the file tablers-0.7.3-cp310-abi3-manylinux_2_28_x86_64.whl.
File metadata
- Download URL: tablers-0.7.3-cp310-abi3-manylinux_2_28_x86_64.whl
- Upload date:
- Size: 4.2 MB
- Tags: CPython 3.10+, manylinux: glibc 2.28+ x86-64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: pdm/2.26.6 CPython/3.12.3 Linux/6.14.0-1017-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8a9f4cda3b44d3fde0ebd02068bbf39390c651f86d4cb35866c6a5e47719c226 |
| MD5 | 2a16b925261c09ad56a9536a5b51d392 |
| BLAKE2b-256 | f6bc8752cacdfb93fda74222888179fe80797b85ab9003c92b2860058c89e8bb |
File details
Details for the file tablers-0.7.3-cp310-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: tablers-0.7.3-cp310-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 3.8 MB
- Tags: CPython 3.10+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? Yes
- Uploaded via: pdm/2.26.6 CPython/3.12.3 Linux/6.14.0-1017-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 64a669f0fd919b4f51b8e3379686607cf617e4c60ef3ecc467ba66a2d800d9a2 |
| MD5 | 5ecb22a86a4d63bb421a2430fe5cd446 |
| BLAKE2b-256 | 562e2e5824228892cf556a6932a7c44be4eadb45f67280697451ef12d5a8b970 |
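To check a manually downloaded wheel against the SHA256 digests listed above, a standard-library sketch is enough. The helper names are illustrative. The file is streamed in chunks, so memory use stays flat regardless of wheel size.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str | Path, expected_hex: str) -> bool:
    """Compare against a published digest, ignoring case and surrounding spaces."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `verify("tablers-0.7.3-cp310-abi3-manylinux_2_28_x86_64.whl", "8a9f4cda3b44d3fde0ebd02068bbf39390c651f86d4cb35866c6a5e47719c226")` should return `True` for an intact download. pip can enforce the same check at install time through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).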