A web interface for Camelot (PDF Table Extraction for Humans).
Excalibur: A web interface for Camelot
(PDF Table Extraction for Humans)
Excalibur is a web interface to extract data tables from PDFs! It is powered by Camelot and works with Python 3.
Note: Excalibur only works with text-based PDFs and not scanned documents. (As Tabula explains, "If you can click and drag to select text in your table in a PDF viewer, then your PDF is text-based".)
Using Excalibur
After installation, you need to initialize the Excalibur metadata database using:
$ excalibur initdb
And then start the webserver using:
$ excalibur webserver
Now you can go to http://localhost:5000 and extract data tables from your PDFs using the web interface! Check out the usage section of the documentation for instructions.
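If you want to script the two steps above, for example to start Excalibur from another tool, the following is a minimal sketch using only the Python standard library. The helper names `start_excalibur` and `wait_for_server` are hypothetical and not part of Excalibur. The sketch assumes the `excalibur` command is on your `PATH` and that the web interface answers on http://localhost:5000 as described above.

```python
import subprocess
import time
import urllib.error
import urllib.request

# Address taken from the instructions above.
EXCALIBUR_URL = "http://localhost:5000"


def wait_for_server(url=EXCALIBUR_URL, timeout=30.0, interval=0.5):
    """Poll `url` until it answers an HTTP request or `timeout` seconds pass.

    Returns True if the server responded, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=2):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status code.
            return True
        except OSError:
            # Nothing is listening yet, or the request timed out.
            # (urllib.error.URLError is a subclass of OSError.)
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def start_excalibur():
    """Hypothetical helper: run the two documented commands in order.

    `excalibur initdb` runs to completion first; `excalibur webserver`
    is then launched in the background and its process handle returned.
    """
    subprocess.run(["excalibur", "initdb"], check=True)
    return subprocess.Popen(["excalibur", "webserver"])
```

A caller would typically run `server = start_excalibur()`, check `wait_for_server()` before opening the browser, and call `server.terminate()` when done.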
Why Excalibur?
- Your data remains with you. All file storage and processing happens on your own local or remote machine.
- Table extraction rules can be saved as presets and then applied to other PDFs to extract tables with similar structures. (in v0.2.0)
- Jobs can apply a rule to extract tables from multiple PDFs in one go. (in v0.2.0)
- Configurable with MySQL and Celery for heavy workloads. (in v0.2.0) By default, SQLite and multiprocessing are used for light workloads.
- Job scheduling and incoming/outgoing webhooks. (in v0.3.0)
Excalibur uses Camelot under the hood. See the comparison with other PDF table extraction libraries and tools.
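For a sense of what "under the hood" means, here is a minimal sketch of calling Camelot directly from Python. It is not Excalibur's own code: the function name `extract_tables` and its defaults are illustrative assumptions, and it requires the `camelot` package to be installed. Only `camelot.read_pdf` and the `.df` attribute of each returned table come from Camelot's API.

```python
def extract_tables(pdf_path, pages="1", flavor="lattice"):
    """Hypothetical helper: return the tables found in a PDF as DataFrames.

    `pages` is a page selection string such as "1" or "1,3-4".
    `flavor` picks Camelot's parsing method: "lattice" for tables with
    ruling lines, "stream" for tables separated by whitespace.
    """
    # Imported here so the module can be loaded without Camelot installed.
    import camelot

    tables = camelot.read_pdf(pdf_path, pages=pages, flavor=flavor)
    # Each Camelot table exposes its contents as a pandas DataFrame.
    return [table.df for table in tables]
```

As with Excalibur itself, this only works on text-based PDFs, not scanned documents.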
Installation
Using pip
After installing the dependencies for Camelot (tk and ghostscript), you can simply use pip to install Excalibur:
$ pip install excalibur-py
From the source code
After installing the dependencies for Camelot, clone the repo using:
$ git clone https://www.github.com/camelot-dev/excalibur
and install Excalibur using pip:
$ cd excalibur
$ pip install .
Documentation
Great documentation is available at http://excalibur-py.readthedocs.io/.
Development
The Contributor's Guide has detailed information about contributing code, documentation, tests and more. We've included some basic information in this README.
Source code
You can check out the latest sources with:
$ git clone https://www.github.com/camelot-dev/excalibur
Setting up a development environment
You can install the development dependencies easily, using pip:
$ pip install excalibur-py[dev]
Testing (soon)
After installation, you can run tests using:
$ python setup.py test
Versioning
Excalibur uses Semantic Versioning. For the available versions, see the tags on this repository. For the changelog, see HISTORY.md.
License
This project is licensed under the MIT License; see the LICENSE file for details.