
Colibrie is a blazing fast tool to extract tables from PDFs


Colibrie


Why Colibrie?

  • :rocket: Efficient: Colibrie is around two orders of magnitude faster than Camelot in the benchmark below
  • :sparkles: Visual fidelity: Colibrie can provide a 1:1 HTML representation of every table it finds
  • :books: Reliable: Colibrie finds every valid table, provided the PDF is compatible with Colibrie's core principle
  • :memo: Output: each table can be exported to multiple formats, including:
    • pandas DataFrame
    • HTML

Benchmark

Some numbers comparing Camelot (a popular library for extracting tables from PDFs) and Colibrie:

| Time (s), Camelot | Time (s), Colibrie | Tables, Camelot: valid | Tables, Camelot: false positive | Tables, Colibrie: valid | Tables, Colibrie: false positive | Page count | PDF file |
|---|---|---|---|---|---|---|---|
| 0.53 | 0.00545 | 1 | 0 | 1 | 0 | 1 | small pdf |
| 5.95 | 0.02100 | 4 | 0 | 4 | 0 | 11 | medium pdf |
| 105.00 | 0.21900 | 62 | 1 | 62 | 0 | 167 | big pdf |
| 182.00 | 0.69000 | 175 | 1 | 177 | 0 | 269 | giant pdf |
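
To make the comparison easier to read, the speedup implied by each benchmark row can be worked out by dividing the Camelot time by the Colibrie time. The short sketch below does only that arithmetic on the timings listed above; the `benchmark` dictionary and the `speedup` helper are illustrative names, not part of Colibrie.

```python
# Timings in seconds, copied from the benchmark figures above:
# (Camelot time, Colibrie time) per PDF file.
benchmark = {
    "small pdf": (0.53, 0.00545),
    "medium pdf": (5.95, 0.02100),
    "big pdf": (105.00, 0.21900),
    "giant pdf": (182.00, 0.69000),
}


def speedup(camelot_seconds, colibrie_seconds):
    """Return how many times faster Colibrie was than Camelot."""
    return camelot_seconds / colibrie_seconds


# Hypothetical helper structure: speedup per file, rounded to one decimal.
speedups = {
    name: round(speedup(camelot, colibrie), 1)
    for name, (camelot, colibrie) in benchmark.items()
}

for name, ratio in speedups.items():
    print(f"{name}: Colibrie is about {ratio}x faster than Camelot")
```

On these figures the ratio ranges from just under 100x (small pdf) to roughly 480x (big pdf).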

Installation

using source

pip install poetry

git clone https://github.com/abitoun-42/colibrie.git

cd colibrie

poetry install

using pip

pip install colibrie

Usage

from colibrie.extract_tables import extract_table

tables = extract_table('pdf_path')
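
If you want to reproduce timings like those in the benchmark on your own files, a small wrapper around the call above may help. This is a minimal sketch, not part of Colibrie: `timed_extract` is a hypothetical helper, and it takes the extraction function as a parameter so that it makes no assumption about what `extract_table` returns.

```python
import time
from pathlib import Path


def timed_extract(pdf_path, extractor):
    """Run `extractor` on `pdf_path` and return (result, elapsed_seconds).

    `extractor` is any callable that accepts a path string, for example
    colibrie's `extract_table`. The result is returned untouched.
    """
    path = Path(pdf_path)
    if not path.is_file():
        # Fail early with a clear message instead of inside the extractor.
        raise FileNotFoundError(f"No such PDF file: {path}")

    start = time.perf_counter()
    result = extractor(str(path))
    elapsed = time.perf_counter() - start
    return result, elapsed
```

With Colibrie installed, it could be used as `tables, seconds = timed_extract('pdf_path', extract_table)`, after importing `extract_table` as shown above.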
