A web interface for Camelot (PDF Table Extraction for Humans).

Excalibur: A web interface to extract tabular data from PDFs

(PDF Table Extraction for Humans)

Excalibur is a web interface to extract tabular data from PDFs, written in Python 3! It is powered by Camelot.

Note: Excalibur only works with text-based PDFs and not scanned documents. (As Tabula explains, "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based".)

Using Excalibur

After installation with pip, you can initialize the metadata database using:

$ excalibur initdb

And then start the webserver using:

$ excalibur webserver

That's it! Now you can go to http://localhost:5000 and extract data tables from your PDFs using the web interface! Check out the usage section of the documentation for step-by-step instructions.
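
If you launch the webserver from a script, it can help to wait until it answers requests before opening the browser. The following is a minimal sketch using only the Python standard library. `wait_for_server` is a hypothetical helper, not part of Excalibur, and its default URL is simply the address mentioned above.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://localhost:5000", timeout=30.0, interval=0.5):
    """Poll `url` until it answers an HTTP request or `timeout` seconds pass.

    Hypothetical helper for illustration; not part of Excalibur.
    Returns True if the server responded, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval + 1):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status code.
            return True
        except (urllib.error.URLError, OSError):
            # Nothing is listening yet; wait a little and retry.
            time.sleep(interval)
    return False
```

Calling `wait_for_server()` shortly after starting `excalibur webserver` should return `True` once the interface is reachable, and `False` if nothing responds within the timeout.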

Note: You can also download executables for Windows and Linux from the releases page!

Why Excalibur?

  • Excalibur gives you complete control over your data. All file storage and processing happens on your own local or remote machine.
  • Excalibur can be configured with MySQL and Celery for parallel and distributed workloads. By default, SQLite and multiprocessing are used for sequential workloads.
  • You can save table extraction rules as presets and apply them to different PDFs to extract tables with similar structures. (planned for v0.3.0)
  • You can extract tables from multiple PDFs in one go using an extraction rule by starting jobs. (planned for v0.4.0)

Excalibur uses Camelot under the hood. You can check out its comparison with other PDF table extraction libraries and tools.
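
For a rough idea of what happens under the hood, here is a minimal sketch of calling Camelot directly from Python. It assumes Camelot is installed. `extract_tables` is a hypothetical wrapper written for this example, and its `reader` parameter exists only so the sketch can be exercised without Camelot. This is not how Excalibur itself is implemented.

```python
def extract_tables(pdf_path, pages="1", flavor="lattice", reader=None):
    """Return one pandas DataFrame per table found in `pdf_path`.

    Hypothetical wrapper for illustration. `reader` defaults to
    `camelot.read_pdf`; it is a parameter only so the sketch can be
    tried out with a stand-in when Camelot is not installed.
    """
    if reader is None:
        import camelot  # imported lazily; requires Camelot to be installed

        reader = camelot.read_pdf
    # `flavor` selects Camelot's parsing method: "lattice" or "stream".
    tables = reader(pdf_path, pages=pages, flavor=flavor)
    return [table.df for table in tables]
```

With Camelot installed, `extract_tables("foo.pdf")` (where `foo.pdf` is a placeholder for a text-based PDF of your own) would return the tables found on the first page.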

Support us on Patreon

If Excalibur solves your PDF table extraction needs, please consider supporting its development by becoming a patron!

Installation

Using pip

After installing Ghostscript, which is one of the requirements for Camelot (see the install instructions), you can use pip to install Excalibur:

$ pip install excalibur-py
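
To confirm that both pieces are in place, a quick check along these lines may help. This is a sketch under assumptions: `check_install` is a hypothetical helper, the executable names are the usual Ghostscript ones, and finding Ghostscript on `PATH` is only a heuristic (Camelot may locate it differently on your platform).

```python
import shutil
from importlib import metadata

# Usual Ghostscript executable names: `gs` on Linux/macOS,
# `gswin64c` / `gswin32c` on Windows.
GHOSTSCRIPT_NAMES = ("gs", "gswin64c", "gswin32c")


def check_install(dist_name="excalibur-py"):
    """Report where Ghostscript was found on PATH (or None) and which
    version of `dist_name` is installed (or None if it is missing)."""
    gs_path = next(
        (path for path in map(shutil.which, GHOSTSCRIPT_NAMES) if path), None
    )
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None
    return {"ghostscript": gs_path, "version": version}


print(check_install())
```

A `None` value for either key suggests that the corresponding installation step still needs attention.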

From the source code

After installing Ghostscript, clone the repo using:

$ git clone https://www.github.com/camelot-dev/excalibur

and install Excalibur using pip:

$ cd excalibur
$ pip install .

Documentation

Great documentation is available at http://excalibur-py.readthedocs.io/.

Development

The Contributor's Guide has detailed information about contributing code, documentation, tests and more. We've included some basic information in this README.

Source code

You can check out the latest sources with:

$ git clone https://www.github.com/camelot-dev/excalibur

Setting up a development environment

You can install the development dependencies easily, using pip:

$ pip install "excalibur-py[dev]"

Testing (soon)

After installation, you can run tests using:

$ python setup.py test

Versioning

Excalibur uses Semantic Versioning. For the available versions, see the tags on this repository. For the changelog, you can check out HISTORY.md.

License

This project is licensed under the MIT License; see the LICENSE file for details.
