Information extraction and named-entity recognition for indexing PDFs

pdfner

Install NLP tools

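  1. Download the English model data for spaCy:
        $ python -m spacy download en
     (On spaCy 3.0 and later, the en shortcut is deprecated in favor of a full package name such as en_core_web_sm.)
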
  2. Download Stanford CoreNLP from https://stanfordnlp.github.io/CoreNLP/download.html and extract to {project root}/pdfner/tests/tools
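
The two steps above can be sanity-checked with a small script. This is a minimal sketch, not part of pdfner: the function names are hypothetical, the model package name en_core_web_sm is an assumption (it is the package the en shortcut normally resolves to), and the stanford-corenlp-* directory pattern assumes the archive was extracted with its default top-level folder name.

```python
import importlib.util
from pathlib import Path
from typing import Optional


def spacy_model_installed(package: str = "en_core_web_sm") -> bool:
    """Return True if the model package is importable (assumed package name)."""
    return importlib.util.find_spec(package) is not None


def find_corenlp_dir(tools_dir: Path) -> Optional[Path]:
    """Return the first extracted CoreNLP directory under tools_dir, or None.

    Assumes the extracted folder is named like 'stanford-corenlp-<version>'.
    """
    if not tools_dir.is_dir():
        return None
    candidates = sorted(
        p for p in tools_dir.glob("stanford-corenlp-*") if p.is_dir()
    )
    return candidates[0] if candidates else None


if __name__ == "__main__":
    # Assumption: the script is run from the project root.
    tools = Path.cwd() / "pdfner" / "tests" / "tools"
    print("spaCy English model installed:", spacy_model_installed())
    print("CoreNLP directory:", find_corenlp_dir(tools))
```

If either check reports a missing component, repeat the corresponding step before running the project's tests.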

Install OCRmyPDF

Follow the OCRmyPDF installation guide: https://ocrmypdf.readthedocs.io/en/latest/installation.html
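
Once OCRmyPDF is installed, a quick check that the command-line tool is reachable can save debugging later. The sketch below is illustrative only: the helper names are hypothetical, and the README does not say how pdfner itself invokes OCRmyPDF. It relies only on the basic ocrmypdf INPUT OUTPUT command form.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def ocrmypdf_available() -> bool:
    """Return True if the ocrmypdf executable is found on PATH."""
    return shutil.which("ocrmypdf") is not None


def build_ocr_command(input_pdf: Path, output_pdf: Path) -> List[str]:
    """Build the basic 'ocrmypdf INPUT OUTPUT' command line."""
    return ["ocrmypdf", str(input_pdf), str(output_pdf)]


def add_text_layer(input_pdf: Path, output_pdf: Path) -> None:
    """Run OCRmyPDF on input_pdf and write the searchable result to output_pdf."""
    if not ocrmypdf_available():
        raise RuntimeError("ocrmypdf was not found on PATH")
    # check=True raises CalledProcessError if OCRmyPDF exits with an error.
    subprocess.run(build_ocr_command(input_pdf, output_pdf), check=True)
```

Calling add_text_layer(Path("scan.pdf"), Path("scan_ocr.pdf")) with placeholder file names would then produce a PDF with a text layer that downstream extraction can read.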
