This package extracts important information from a PDF document, such as headings, paragraphs, and keywords.

Project description

Data extractor for PDF documents - pdf-info

A command line tool and Python library to support your analysis of PDF documents.

Extracts important features from a document, such as headers, paragraphs, keywords, and subscripts.

Returns a vector of the relevant details.

Installation

Install pdf-info using pip:

pip install pdf-info
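
To confirm that the install succeeded, one option is to ask Python's package metadata for the installed version. This is a minimal sketch using the standard library only (Python 3.8 or later); the helper name `installed_version` is hypothetical and not part of pdf-info.

```python
from importlib import metadata


def installed_version(dist_name="pdf-info"):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
if version is None:
    print("pdf-info is not installed in this environment")
else:
    print(f"pdf-info {version} is installed")
```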

Use as Python Library

You can easily add pdf-info to your own Python scripts as a library.

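from pdf_info import pdf_info_class

# Create the extractor object
ob = pdf_info_class()

# page_number and tag are placeholders: pass the page to read and one supported tag
result = ob.pdf_info('path/to/my/file.pdf', page_number, tag)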

Supported tags: "headers", "paragraphs", "keywords", "subscripts".
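
A slightly fuller sketch of the call shown above. It relies only on what this page states: that `pdf_info_class().pdf_info(path, page_number, tag)` accepts one of the four listed tags. The helper names `extract` and `extract_all` and the `SUPPORTED_TAGS` constant are hypothetical and not part of pdf-info. Whether page numbers start at 0 or 1 is not stated here, so the page value in the commented usage is an assumption.

```python
# Tags listed in the pdf-info description.
SUPPORTED_TAGS = ("headers", "paragraphs", "keywords", "subscripts")


def extract(extractor, path, page_number, tag):
    """Check the tag, then delegate to extractor.pdf_info(path, page_number, tag).

    `extractor` is expected to be a pdf_info_class() instance; it is passed in
    so the helper can be exercised without the package installed.
    """
    if tag not in SUPPORTED_TAGS:
        raise ValueError(f"unsupported tag {tag!r}; expected one of {SUPPORTED_TAGS}")
    return extractor.pdf_info(path, page_number, tag)


def extract_all(extractor, path, page_number):
    """Run every supported tag on one page and collect the results by tag."""
    return {tag: extract(extractor, path, page_number, tag) for tag in SUPPORTED_TAGS}


# Example use (requires pdf-info to be installed; page 1 is an assumed value):
# from pdf_info import pdf_info_class
# details = extract_all(pdf_info_class(), 'path/to/my/file.pdf', 1)
# print(details["headers"])
```

Rejecting an unknown tag before calling the library gives a clear error message early, since this page does not say how pdf-info itself reacts to an unsupported tag.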

Download files

Download the file for your platform.

Source Distribution

pdf_info-2.1.0.tar.gz (5.4 kB)

Uploaded Source

Built Distribution

pdf_info-2.1.0-py3-none-any.whl (6.5 kB)

Uploaded Python 3

File details

Details for the file pdf_info-2.1.0.tar.gz.

File metadata

  • Download URL: pdf_info-2.1.0.tar.gz
  • Size: 5.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.5.0 pkginfo/1.7.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.7.1

File hashes

Hashes for pdf_info-2.1.0.tar.gz
Algorithm    Hash digest
SHA256       d4f3b1f1187fe3ce65a021651f719a3f4c6db69d264652be268bd3b71905f473
MD5          61f41032fd420760a8268a9ff82d69ed
BLAKE2b-256  e627177c03a62c51a5d27f708bf3fedf623532a5ff121cae6e3ff995e8d79c3c
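
A downloaded file can be checked against a listed SHA256 digest before installing it. The following is a minimal sketch using only Python's standard `hashlib`; the helper names `sha256_of` and `matches` are hypothetical, and the local file path in the commented usage assumes the archive was saved in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example use with the digest listed for the source distribution:
# ok = matches(
#     "pdf_info-2.1.0.tar.gz",
#     "d4f3b1f1187fe3ce65a021651f719a3f4c6db69d264652be268bd3b71905f473",
# )
# print("digest matches" if ok else "digest mismatch, do not install")
```

The same check applies to the wheel by passing its file name and its own SHA256 digest.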

File details

Details for the file pdf_info-2.1.0-py3-none-any.whl.

File metadata

  • Download URL: pdf_info-2.1.0-py3-none-any.whl
  • Size: 6.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.5.0 pkginfo/1.7.1 requests/2.24.0 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.7.1

File hashes

Hashes for pdf_info-2.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       dd111a002486799e229884588299b5771a02c8c85ffdcd6088752b896142de45
MD5          f202279a02c2acde4f8bb70cd72141aa
BLAKE2b-256  add8960e8e95209110eced134472629c9304ddc7488a61ebbf725781def4c697
