Colibrie
Colibrie is a blazing fast tool to extract tables from PDFs
Why Colibrie?
- :rocket: Efficient: Colibrie is multiple orders of magnitude faster than any existing solution
- :sparkles: Visually faithful: Colibrie can provide a 1:1 HTML representation of every table it finds
- :books: Reliable: Colibrie finds every valid table, without exception, as long as the PDF is compatible with Colibrie's core principle
- :memo: Output: each table can be exported to multiple formats, including:
  - pandas DataFrame
  - HTML
Benchmark
Some numbers comparing Camelot (a popular library for extracting tables from PDFs) with Colibrie:
Camelot time (s) | Colibrie time (s) | Camelot valid tables | Camelot false positives | Colibrie valid tables | Colibrie false positives | Page count | PDF file |
---|---|---|---|---|---|---|---|
0.53 | 0.00545 | 1 | 0 | 1 | 0 | 1 | small pdf |
5.95 | 0.02100 | 4 | 0 | 4 | 0 | 11 | medium pdf |
105.00 | 0.21900 | 62 | 1 | 62 | 0 | 167 | big pdf |
182.00 | 0.69000 | 175 | 1 | 177 | 0 | 269 | giant pdf |
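To make the timing gap easier to read, the ratios can be worked out directly from the figures in the table. The snippet below is only an illustrative calculation: the dictionary and variable names are chosen for this example, and the numbers are copied from the benchmark rows, not measured again.

```python
# Timings in seconds as (Camelot, Colibrie), copied from the benchmark table.
benchmark = {
    "small pdf": (0.53, 0.00545),
    "medium pdf": (5.95, 0.02100),
    "big pdf": (105.00, 0.21900),
    "giant pdf": (182.00, 0.69000),
}

# Speedup = Camelot time divided by Colibrie time for the same file.
speedups = {
    name: camelot / colibrie
    for name, (camelot, colibrie) in benchmark.items()
}

for name, ratio in speedups.items():
    print(f"{name}: about {ratio:.0f}x faster")
```

On these four files the ratio ranges from roughly 97x (small pdf) to roughly 479x (big pdf).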
Installation
using source
pip install poetry
git clone https://github.com/abitoun-42/colibrie.git
cd colibrie
poetry install
using pip
pip install colibrie
Usage
from colibrie.extract_tables import extract_table
tables = extract_table('pdf_path')
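The call above handles one file at a time. To process a whole folder, a small wrapper along these lines may help. This is a sketch and not part of Colibrie: `extract_from_directory` is a hypothetical helper, it takes the extraction function as an argument, and it makes no assumption about the type of object `extract_table` returns.

```python
from pathlib import Path


def extract_from_directory(directory, extract):
    """Run `extract` on every *.pdf file in `directory`.

    `extract` is any callable that takes a PDF path as a string, for
    example colibrie's `extract_table`. Returns a dict mapping each
    file name to whatever `extract` returned for that file.
    """
    results = {}
    # Sort the paths so files are processed in a predictable order.
    for pdf_path in sorted(Path(directory).glob("*.pdf")):
        results[pdf_path.name] = extract(str(pdf_path))
    return results


# Intended use with Colibrie (the folder name "pdfs" is a placeholder):
#   from colibrie.extract_tables import extract_table
#   all_tables = extract_from_directory("pdfs", extract_table)
```

Passing the extractor in as a parameter keeps the helper independent of Colibrie itself, so the same loop can be reused with another library when comparing results.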