
Smart text extraction from PDF documents

Project description

EDS-PDF

EDS-PDF provides a modular framework to extract text from PDF documents.

You can use it out of the box, or extend it to fit your use case.

Getting started

Install the library with pip:

$ pip install edspdf

Visit the documentation for more information!

Citation

If you use EDS-PDF, please cite us as below.

@software{edspdf,
  author  = {Dura, Basile and Wajsburt, Perceval and Calliger, Alice and Gérardin, Christel and Bey, Romain},
  doi     = {10.5281/zenodo.6902977},
  license = {BSD-3-Clause},
  title   = {{EDS-PDF: Smart text extraction from PDF documents}},
  url     = {https://github.com/aphp/edspdf}
}

Acknowledgement

We would like to thank Assistance Publique – Hôpitaux de Paris and the AP-HP Foundation for funding this project.

Download files

Download the file for your platform.

Source Distribution

edspdf-0.5.3.tar.gz (15.1 kB)

Built Distribution

edspdf-0.5.3-py3-none-any.whl (22.1 kB)

File details

Details for the file edspdf-0.5.3.tar.gz.

File metadata

  • Download URL: edspdf-0.5.3.tar.gz
  • Size: 15.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.0 CPython/3.8.10 Linux/5.15.0-1017-azure

File hashes

Hashes for edspdf-0.5.3.tar.gz
Algorithm     Hash digest
SHA256        fa2db49b5fbd3f42d360b79a8e17ad5e34689567547397e5c91d33604e69690f
MD5           4fd35a1674f251c07c95105bd7f5200b
BLAKE2b-256   a9cca16ae24cf6fa57b0740a2a7c479074e98e74ad95cb90c4f6209545d88c40

File details

Details for the file edspdf-0.5.3-py3-none-any.whl.

File metadata

  • Download URL: edspdf-0.5.3-py3-none-any.whl
  • Size: 22.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.2.0 CPython/3.8.10 Linux/5.15.0-1017-azure

File hashes

Hashes for edspdf-0.5.3-py3-none-any.whl
Algorithm     Hash digest
SHA256        0e5b3c3f568c2b48eca147a3a15f21bab2026b6cff9071b66531c8d2fd848a5c
MD5           17f64b24da9b9a231f40ae87f0829619
BLAKE2b-256   6413349e9078f9d836d21b446c31b2407f6ed484ee18166f567cfba7ff0b6087
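The hash tables list published digests for both edspdf 0.5.3 distribution files. The sketch below shows one way to check a downloaded file against them. It uses only Python's standard `hashlib` module and is not part of EDS-PDF itself. The helper names `file_digests` and `verify` are hypothetical, introduced only for illustration. The "BLAKE2b-256" entry corresponds to BLAKE2b with a 32-byte (256-bit) digest.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the published hash tables for edspdf 0.5.3.
PUBLISHED_SHA256 = {
    "edspdf-0.5.3.tar.gz": "fa2db49b5fbd3f42d360b79a8e17ad5e34689567547397e5c91d33604e69690f",
    "edspdf-0.5.3-py3-none-any.whl": "0e5b3c3f568c2b48eca147a3a15f21bab2026b6cff9071b66531c8d2fd848a5c",
}


def file_digests(path, chunk_size=1 << 16):
    """Hypothetical helper: compute SHA256, MD5 and BLAKE2b-256 of a file in one pass."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # "BLAKE2b-256" is BLAKE2b truncated to a 32-byte digest.
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as f:
        # Read in fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify(path):
    """Hypothetical helper: True if the file's SHA256 matches the digest published for its name."""
    name = Path(path).name
    expected = PUBLISHED_SHA256.get(name)
    if expected is None:
        raise KeyError(f"no published digest for {name}")
    return file_digests(path)["SHA256"] == expected
```

After downloading, call `verify("edspdf-0.5.3.tar.gz")`. A `False` result means the file does not match the published archive. In day-to-day use, pip can also enforce digests itself through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).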
