
A set of extended tools to process pdf files


PdfExTools

This package contains tools for handling PDF files.

1. PageNumberExtractor:

Extracts physical page numbers, i.e., the page numbers printed on each page as part of its content, rather than the logical page numbers tracked by PDF readers or by tools such as pdfplumber and pymupdf. This is useful when a PDF was excerpted from a larger document: the printed page numbers may run from 135 to 145, while the file itself contains only 11 pages (logical pages 1-11).
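To make the distinction concrete, here is a minimal sketch in plain Python that involves no PDF library. It builds the mapping one would expect for the excerpt described above, assuming the printed numbers are consecutive and that logical pages are indexed from 0. The variable names are illustrative only and are not part of PdfExTools.

```python
# Hypothetical excerpt: 11 pages whose printed numbers run from 135 to 145.
first_printed = 135  # assumed printed number on the first page of the excerpt
page_count = 11      # number of pages actually present in the file

# Logical index (0-based position in the file) -> physical (printed) number.
logical_to_physical = {i: first_printed + i for i in range(page_count)}

print(logical_to_physical[0])   # printed number of the first page in the file
print(logical_to_physical[10])  # printed number of the last page in the file
```

Real documents may have unnumbered or irregularly numbered pages, so a constant offset like this is only an idealized case.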

1) to install:

pip install PdfExTools

2) to use, do the following:

from pdfextools import PageNumberExtractor

pdf_file = r"./sample-pdfs/2-col-pubmed.pdf"

print("pdf_file: " + pdf_file)

extractor = PageNumberExtractor()
page_numbers = extractor.process(pdf_file)

print(page_numbers)

The result is a dictionary mapping logical page numbers (0-based) to the physical ones. For example:

pdf_file: ./sample-pdfs/2-col-pubmed.pdf
{0: 11, 1: 12, 2: 13, 3: 14, 4: 15, 5: 16, 6: 17, 7: 18, 8: 19, 9: 20, 10: 21, 11: 22, 12: 23}
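A common follow-up is to go the other way: given a printed page number, find where that page sits in the file. The sketch below inverts the example dictionary shown above. It assumes every value is a unique integer, as in that sample. How the extractor reports pages with no detectable number is not documented here, so that case is not handled. `find_logical_page` is a hypothetical helper, not part of PdfExTools.

```python
# Sample result copied from the example output above.
page_numbers = {0: 11, 1: 12, 2: 13, 3: 14, 4: 15, 5: 16, 6: 17,
                7: 18, 8: 19, 9: 20, 10: 21, 11: 22, 12: 23}

# Invert the mapping: physical (printed) number -> logical (0-based) index.
# Assumes printed numbers are unique within the file.
physical_to_logical = {physical: logical
                       for logical, physical in page_numbers.items()}


def find_logical_page(printed_number):
    """Return the 0-based logical index for a printed page number, or None."""
    return physical_to_logical.get(printed_number)


print(find_logical_page(15))  # logical index of the page printed as "15"
print(find_logical_page(99))  # None: no page in this file is printed as "99"
```

The returned index can be passed to any tool that addresses pages by 0-based position.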

Download files


Source Distribution

pdfextools-0.2.0.tar.gz (4.8 kB view details)

Uploaded Source

Built Distribution


pdfextools-0.2.0-py3-none-any.whl (9.3 kB view details)

Uploaded Python 3

File details

Details for the file pdfextools-0.2.0.tar.gz.

File metadata

  • Download URL: pdfextools-0.2.0.tar.gz
  • Size: 4.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/50.3.1.post20201107 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.5

File hashes

Hashes for pdfextools-0.2.0.tar.gz
Algorithm Hash digest
SHA256 b4c92ce1190165c5232d6aac0ca2655f6a1c379a26626c1814755351e49f020b
MD5 b374af1ec2383afc760e61be16e07336
BLAKE2b-256 186ebcf330ad3740f7d80f8284c90211dc859ce06773e2e5f49980843db487d7


File details

Details for the file pdfextools-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: pdfextools-0.2.0-py3-none-any.whl
  • Size: 9.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/50.3.1.post20201107 requests-toolbelt/0.9.1 tqdm/4.51.0 CPython/3.8.5

File hashes

Hashes for pdfextools-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 1feebc461cc7a828563f563d500fe6876b9e23cc76bab67822e5bd902108c311
MD5 e5eed5b93da805f69635fa968e01a7fb
BLAKE2b-256 d4ef0c83bbb02fa3887a0945ea24cf6dbb5f6b6aa438eefb1e51bb777966c153

