Camelot: PDF Table Extraction for Humans

Camelot is a Python library that can help you extract tables from PDFs.


Extract tables from PDFs in just a few lines of code:

Try it yourself in our interactive quickstart notebook.

Or check out a simple example using this PDF.

>>> import camelot
>>> tables = camelot.read_pdf('foo.pdf')
>>> tables
<TableList n=1>
>>> tables.export('foo.csv', f='csv', compress=True) # json, excel, html, markdown, sqlite
>>> tables[0]
<Table shape=(7, 7)>
>>> tables[0].parsing_report
{
    'accuracy': 99.02,
    'whitespace': 12.24,
    'order': 1,
    'page': 1
}
>>> tables[0].to_csv('foo.csv') # to_json, to_excel, to_html, to_markdown, to_sqlite
>>> tables[0].df # get a pandas DataFrame!
Cycle Name  KI (1/km)  Distance (mi)  Percent Fuel Savings
                                      Improved Speed  Decreased Accel  Eliminate Stops  Decreased Idle
2012_2      3.30       1.3            5.9%            9.5%             29.2%            17.4%
2145_1      0.68       11.2           2.4%            0.1%             9.5%             2.7%
4234_1      0.59       58.7           8.5%            1.3%             8.5%             3.3%
2032_2      0.17       57.8           21.7%           0.3%             2.7%             1.2%
4171_1      0.07       173.9          58.1%           1.6%             2.1%             0.5%

Camelot also comes packaged with a command-line interface!

Refer to the QuickStart Guide to quickly get started with Camelot, extract tables from PDFs and explore some basic options.
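As a minimal sketch of the kind of basic options involved, the helper below forwards the `pages` and `flavor` arguments to `camelot.read_pdf`. The wrapper name `extract_tables` is a hypothetical convenience and not part of Camelot; `flavor="lattice"` is meant for tables with ruling lines and `flavor="stream"` for tables separated by whitespace.

```python
def extract_tables(pdf_path, pages="1", flavor="lattice"):
    """Hypothetical convenience wrapper around camelot.read_pdf."""
    # Imported lazily so this module can be loaded without Camelot installed.
    import camelot

    # pages is a string such as "1", "1,3" or "1-3";
    # flavor selects the parser, for example "lattice" or "stream".
    return camelot.read_pdf(pdf_path, pages=pages, flavor=flavor)
```

For instance, `extract_tables("foo.pdf", pages="1-3", flavor="stream")` would ask Camelot to look for whitespace-separated tables on the first three pages.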

Tip: Visit the parser-comparison-notebook to get an overview of all the packed parsers and their features.

Note: Camelot only works with text-based PDFs and not scanned documents. (As Tabula explains, "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based".)

You can check out some frequently asked questions here.

Why Camelot?

  • Configurability: Camelot gives you control over the table extraction process with tweakable settings.
  • Metrics: You can discard bad tables based on metrics like accuracy and whitespace, without having to manually look at each table.
  • Output: Each table is extracted into a pandas DataFrame, which integrates seamlessly into ETL and data analysis workflows. You can also export tables to multiple formats, including CSV, JSON, Excel, HTML, Markdown, and SQLite.
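
The metrics point can be sketched as a small filter over each table's `parsing_report`, using the `accuracy` and `whitespace` keys shown in the example above. The function name `keep_good_tables` and the default thresholds are illustrative assumptions; suitable cut-offs depend on your documents.

```python
def keep_good_tables(tables, min_accuracy=90.0, max_whitespace=30.0):
    """Return the tables whose parsing report passes both thresholds.

    The threshold defaults are assumptions chosen for illustration only.
    """
    good = []
    for table in tables:
        report = table.parsing_report
        # Keep a table only if it is accurate enough and not mostly empty cells.
        if report["accuracy"] >= min_accuracy and report["whitespace"] <= max_whitespace:
            good.append(table)
    return good
```

Calling `keep_good_tables(camelot.read_pdf('foo.pdf'))` would then return a plain list containing only the tables worth exporting.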

See comparison with similar libraries and tools.

Installation

Using conda

The easiest way to install Camelot is with conda, which is a package manager and environment management system for the Anaconda distribution.

conda install -c conda-forge camelot-py

Using pip

After installing the dependencies (tk and ghostscript), you can also just use pip to install Camelot:

pip install "camelot-py[base]"

From the source code

After installing the dependencies, clone the repo using:

git clone https://github.com/camelot-dev/camelot.git

and install using pip:

cd camelot
pip install "."
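
Whichever method you use, one way to confirm that the install succeeded is to ask Python's packaging metadata for the installed version. This is a small sketch using only the standard library; the helper name `installed_version` is hypothetical, and the distribution name `camelot-py` is taken from the install commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution="camelot-py"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "camelot-py is not installed")
```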

Documentation

The documentation is available at http://camelot-py.readthedocs.io/.

Wrappers

Related projects

Contributing

The Contributor's Guide has detailed information about contributing issues, documentation, code, and tests.

Versioning

Camelot uses Semantic Versioning. For the available versions, see the tags on this repository. For the changelog, you can check out the releases page.

License

This project is licensed under the MIT License; see the LICENSE file for details.
