Camelot: PDF Table Extraction for Humans

Camelot is a Python library that can help you extract tables from PDFs.


Extract tables from PDFs in just a few lines of code:

Try it yourself in our interactive quickstart notebook.

Or check out a simple example using this PDF.

>>> import camelot
>>> tables = camelot.read_pdf('foo.pdf')
>>> tables
<TableList n=1>
>>> tables.export('foo.csv', f='csv', compress=True) # json, excel, html, markdown, sqlite
>>> tables[0]
<Table shape=(7, 7)>
>>> tables[0].parsing_report
{
    'accuracy': 99.02,
    'whitespace': 12.24,
    'order': 1,
    'page': 1
}
>>> tables[0].to_csv('foo.csv') # to_json, to_excel, to_html, to_markdown, to_sqlite
>>> tables[0].df # get a pandas DataFrame!
Cycle Name  KI (1/km)  Distance (mi)  Percent Fuel Savings
                                      Improved Speed  Decreased Accel  Eliminate Stops  Decreased Idle
2012_2      3.30       1.3            5.9%            9.5%             29.2%            17.4%
2145_1      0.68       11.2           2.4%            0.1%             9.5%             2.7%
4234_1      0.59       58.7           8.5%            1.3%             8.5%             3.3%
2032_2      0.17       57.8           21.7%           0.3%             2.7%             1.2%
4171_1      0.07       173.9          58.1%           1.6%             2.1%             0.5%

Camelot also comes packaged with a command-line interface!

Refer to the QuickStart Guide to quickly get started with Camelot, extract tables from PDFs and explore some basic options.
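
As a rough illustration of the kind of basic options the QuickStart Guide covers, the sketch below wraps camelot.read_pdf with its pages and flavor arguments. The wrapper name extract_tables and its up-front flavor check are assumptions added for illustration, and the list of flavors reflects recent Camelot releases, so check the documentation for the version you have installed.

```python
# Flavors accepted by recent Camelot releases (assumption: verify for your version).
KNOWN_FLAVORS = ("lattice", "stream", "network", "hybrid")


def extract_tables(path, pages="1", flavor="lattice"):
    """Hypothetical wrapper around camelot.read_pdf that validates flavor first."""
    if flavor not in KNOWN_FLAVORS:
        raise ValueError(f"unknown flavor {flavor!r}; expected one of {KNOWN_FLAVORS}")

    # Imported lazily so the check above works even without Camelot installed.
    import camelot

    # pages is a comma-separated string such as "1,4-end" or "all".
    return camelot.read_pdf(path, pages=pages, flavor=flavor)
```

With Camelot installed, extract_tables('foo.pdf', pages='all', flavor='stream') would return a TableList like the one in the example above.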

Tip: Visit the parser-comparison-notebook to get an overview of all the packaged parsers and their features.

Note: Camelot only works with text-based PDFs and not scanned documents. (As Tabula explains, "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based".)

You can check out some frequently asked questions here.

Why Camelot?

  • Configurability: Camelot gives you control over the table extraction process with tweakable settings.
  • Metrics: You can discard bad tables based on metrics like accuracy and whitespace, without having to manually look at each table.
  • Output: Each table is extracted into a pandas DataFrame, which integrates directly into ETL and data analysis workflows. You can also export tables to CSV, JSON, Excel, HTML, Markdown, and SQLite.
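
The Metrics point can be sketched as a small filter over each table's parsing_report. This is a minimal sketch, not part of Camelot itself: the helper name keep_good_tables and the threshold values (90 for accuracy, 30 for whitespace) are assumptions chosen for illustration. Only the accuracy and whitespace keys come from the parsing report shown earlier.

```python
def keep_good_tables(tables, min_accuracy=90.0, max_whitespace=30.0):
    """Return the tables whose parsing_report passes both thresholds.

    Hypothetical helper; the default thresholds are illustrative assumptions.
    """
    good = []
    for table in tables:
        report = table.parsing_report
        # Keep a table only if it is accurate enough and not mostly empty cells.
        if report["accuracy"] >= min_accuracy and report["whitespace"] <= max_whitespace:
            good.append(table)
    return good
```

With Camelot installed, keep_good_tables(camelot.read_pdf('foo.pdf')) would keep the table from the example above (accuracy 99.02, whitespace 12.24) under these default thresholds.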

See comparison with similar libraries and tools.

Installation

Using conda

The easiest way to install Camelot is with conda, which is a package manager and environment management system for the Anaconda distribution.

conda install -c conda-forge camelot-py

Using pip

After installing the dependencies (tk and ghostscript), you can install Camelot with pip:

pip install "camelot-py[base]"

From the source code

After installing the dependencies, clone the repo using:

git clone https://github.com/camelot-dev/camelot.git

and install using pip:

cd camelot
pip install "."

Documentation

The documentation is available at http://camelot-py.readthedocs.io/.

Wrappers

Related projects

Contributing

The Contributor's Guide has detailed information about contributing issues, documentation, code, and tests.

Versioning

Camelot uses Semantic Versioning. For the available versions, see the tags on this repository. For the changelog, you can check out the releases page.

License

This project is licensed under the MIT License; see the LICENSE file for details.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

camelot_py-1.0.9.tar.gz (67.7 kB)

Uploaded Source

Built Distribution

camelot_py-1.0.9-py3-none-any.whl (66.8 kB)

Uploaded Python 3

File details

Details for the file camelot_py-1.0.9.tar.gz.

File metadata

  • Download URL: camelot_py-1.0.9.tar.gz
  • Upload date:
  • Size: 67.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for camelot_py-1.0.9.tar.gz
Algorithm    Hash digest
SHA256       d43d88766f7c3462803ff11464e7d3d15a9223e845972de2b61ef7c3f62d3200
MD5          946a50877753ebd8e3d057892b0991db
BLAKE2b-256  bf990598762cd03de80406ec2326665f9185f03b0f540ca552344e5679edd7e6

See more details on using hashes here.
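
To check a downloaded file against the SHA256 digest listed above, something like the following standard-library sketch could be used. The function names and the local file path in the comment are assumptions for illustration; the expected value is the SHA256 digest given above for camelot_py-1.0.9.tar.gz.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare the file's digest to expected_hex, ignoring case and outer whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 of camelot_py-1.0.9.tar.gz as listed above.
EXPECTED = "d43d88766f7c3462803ff11464e7d3d15a9223e845972de2b61ef7c3f62d3200"

# Example (assumes the source distribution was downloaded to the current directory):
# matches_digest("camelot_py-1.0.9.tar.gz", EXPECTED)
```

A False result would suggest that the download is incomplete or is not the file that was published.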

Provenance

The following attestation bundles were made for camelot_py-1.0.9.tar.gz:

Publisher: release.yml on camelot-dev/camelot

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file camelot_py-1.0.9-py3-none-any.whl.

File metadata

  • Download URL: camelot_py-1.0.9-py3-none-any.whl
  • Upload date:
  • Size: 66.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for camelot_py-1.0.9-py3-none-any.whl
Algorithm    Hash digest
SHA256       f4c78dc5eef879040219ee9cb8b0caeab6b95d52128eec96f6e0ae94fdd910fb
MD5          77dbd32f24480e9f88ffa52bd740454b
BLAKE2b-256  9401fffe89adaf51f7c2584d087aaafd260ed2d44b977bb795444b16d500ec5e

Provenance

The following attestation bundles were made for camelot_py-1.0.9-py3-none-any.whl:

Publisher: release.yml on camelot-dev/camelot
