A web interface for Camelot (PDF Table Extraction for Humans).

Excalibur: A web interface for Camelot

(PDF Table Extraction for Humans)

Excalibur is a web interface to extract data tables from PDFs! It is powered by Camelot and works with Python 3.

Note: Excalibur only works with text-based PDFs and not scanned documents. (As Tabula explains, "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based".)

Using Excalibur

After installation, you need to initialize the Excalibur metadata database using:

$ excalibur initdb

And then start the webserver using:

$ excalibur webserver

Now you can go to http://localhost:5000 and extract data tables from your PDFs using the web interface! Check out the usage section of the documentation for instructions.
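If you start the webserver from a script, it can be handy to wait until it actually answers before opening the browser. The following is a minimal sketch using only the Python standard library; the helper name `wait_for_server` and its parameters are hypothetical and not part of Excalibur, and the default URL is the one mentioned above.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://localhost:5000", timeout=10.0,
                    interval=0.5, request_timeout=2.0):
    """Poll `url` until it answers or roughly `timeout` seconds elapse.

    Returns True if the server responded (with any HTTP status),
    False if it could not be reached in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=request_timeout):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status code.
            return True
        except OSError:
            # Connection refused, timed out, or unreachable: not up yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_server()` after launching `excalibur webserver` in another process returns `True` once the interface is reachable, or `False` if nothing answers within the timeout.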

Why Excalibur?

  • Your data remains with you. All file storage and processing happens on your own local or remote machine.
  • Table extraction rules can be saved as presets and applied to other PDFs to extract tables with similar structures. (in v0.2.0)
  • Jobs can apply a rule to extract tables from multiple PDFs in one go. (in v0.2.0)
  • Configurable with MySQL and Celery for heavy workloads. (in v0.2.0) By default, SQLite and multiprocessing are used for light workloads.
  • Job scheduling and incoming/outgoing webhooks. (in v0.3.0)

Excalibur uses Camelot under the hood. See the comparison with other PDF table extraction libraries and tools.
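For a sense of what happens behind the web interface, the same kind of extraction can be sketched with Camelot directly. This assumes the third-party `camelot` package is installed; the helper `extract_tables` and the file name `report.pdf` are hypothetical, and this is not necessarily how Excalibur itself calls Camelot.

```python
def extract_tables(pdf_path, pages="1", flavor="lattice"):
    """Return the tables Camelot finds on the given pages as DataFrames.

    `flavor` selects Camelot's parsing method: "lattice" for tables with
    ruling lines, "stream" for tables separated by whitespace.
    """
    # Imported lazily so this module can be loaded without Camelot installed.
    import camelot  # third-party

    tables = camelot.read_pdf(pdf_path, pages=pages, flavor=flavor)
    return [table.df for table in tables]


# Example usage (needs Camelot and a text-based PDF):
#   for df in extract_tables("report.pdf", pages="1-2"):
#       print(df.shape)
```

The web interface adds visual rule selection and saved results on top of this, so the sketch is only meant to show the underlying idea.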

Installation

Using pip

After installing the dependencies for Camelot (tk and ghostscript), you can simply use pip to install Excalibur:

$ pip install excalibur-py
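If you are unsure whether the Camelot dependencies are in place, a rough check can be sketched with the Python standard library. The function name `check_camelot_dependencies` is hypothetical, and the check is only a heuristic: it looks for the `tkinter` module and for a Ghostscript executable on the `PATH` under its common names, which is an assumption about how both were installed.

```python
import importlib.util
import shutil


def check_camelot_dependencies():
    """Report whether Tk and Ghostscript appear to be available."""
    # Tk is exposed to Python through the standard-library tkinter module.
    # Finding the module does not guarantee that Tk itself works.
    has_tk = importlib.util.find_spec("tkinter") is not None

    # Ghostscript is usually `gs`; on Windows it is gswin64c or gswin32c.
    gs_names = ("gs", "gswin64c", "gswin32c")
    has_gs = any(shutil.which(name) is not None for name in gs_names)

    return {"tk": has_tk, "ghostscript": has_gs}


if __name__ == "__main__":
    for name, found in check_camelot_dependencies().items():
        print(f"{name}: {'found' if found else 'not found'}")
```

A "not found" result does not always mean the dependency is missing (it may live outside the `PATH`), but it is a reasonable first thing to look at when installation fails.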

From the source code

After installing the dependencies for Camelot, clone the repo using:

$ git clone https://www.github.com/camelot-dev/excalibur

and install Excalibur using pip:

$ cd excalibur
$ pip install .

Documentation

Great documentation is available at http://excalibur-py.readthedocs.io/.

Development

The Contributor's Guide has detailed information about contributing code, documentation, tests and more. We've included some basic information in this README.

Source code

You can check the latest sources with:

$ git clone https://www.github.com/camelot-dev/excalibur

Setting up a development environment

You can install the development dependencies easily, using pip:

$ pip install "excalibur-py[dev]"

Testing (soon)

After installation, you can run tests using:

$ python setup.py test

Versioning

Excalibur uses Semantic Versioning. For the available versions, see the tags on this repository. For the changelog, you can check out HISTORY.md.

License

This project is licensed under the MIT License; see the LICENSE file for details.
