
Simple access to the text and images in a PDF through a context manager

Project description

pakkan-pdf

Provides simple access to the text and images inside a PDF through a context manager. pakkan-pdf is a wrapper library around pdfminer/pdfminer.six.

install

pip install pakkanpdf

Usage

  • Pass the path of the PDF as PdfExtractor's pdf_path, and pass an existing directory as work_dir
    • A temporary directory for the extracted images is created inside work_dir
  • extractor.text returns the text of the PDF
  • extractor.image_file_paths returns the images in the PDF, as file paths
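
from pakkanpdf import PdfExtractor

def test_sample():
    # work_dir ("demo_work_dir") must already exist; images are written to work_images inside it
    with PdfExtractor(pdf_path="data/example.pdf", work_dir="demo_work_dir") as extractor:
        # The full text of the PDF ("これはサンプルのPDFです" = "This is a sample PDF")
        assert "これはサンプルのPDFです" in extractor.text
        # File paths of the images extracted from the PDF
        assert extractor.image_file_paths == ["demo_work_dir/work_images/X8.jpg"]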
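The README does not show how PdfExtractor manages work_dir internally. As a minimal sketch of the context-manager pattern it describes, here is a hypothetical `HypotheticalExtractor` class. It assumes (not confirmed by the source) that images go into `<work_dir>/work_images`, as the example path suggests, and that this directory is deleted when the `with` block exits. It performs no PDF parsing; a real wrapper would call pdfminer.six where the comment indicates.

```python
import os
import shutil
import tempfile
from typing import List


class HypotheticalExtractor:
    """Illustrative context manager; NOT pakkanpdf's actual implementation.

    Assumption: images live in <work_dir>/work_images, which is created on
    entry and removed on exit.
    """

    def __init__(self, pdf_path: str, work_dir: str) -> None:
        if not os.path.isdir(work_dir):
            # Mirrors the README requirement that work_dir must already exist
            raise FileNotFoundError(f"work_dir does not exist: {work_dir}")
        self.pdf_path = pdf_path
        self.image_dir = os.path.join(work_dir, "work_images")
        self.text = ""
        self.image_file_paths: List[str] = []

    def __enter__(self) -> "HypotheticalExtractor":
        # exist_ok=False so a pre-existing directory (and its files) is never
        # adopted and later deleted by __exit__
        os.makedirs(self.image_dir, exist_ok=False)
        # A real wrapper would parse self.pdf_path here (e.g. with pdfminer.six),
        # set self.text, write images into self.image_dir and record their paths.
        return self

    def __exit__(self, exc_type, exc, tb) -> bool:
        # Remove the temporary image directory even if the with-body raised
        shutil.rmtree(self.image_dir, ignore_errors=True)
        return False  # never suppress exceptions from the with-body


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work_dir:
        with HypotheticalExtractor("data/example.pdf", work_dir) as extractor:
            print(os.path.isdir(extractor.image_dir))  # True while inside the block
        print(os.path.exists(extractor.image_dir))  # False after the block exits
```

If the real library cleans up in the same way, any files listed in image_file_paths would need to be copied elsewhere before leaving the `with` block.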


Download files

Source Distribution

pakkanpdf-0.1.3.tar.gz (7.0 kB)

Uploaded Source

Built Distribution

pakkanpdf-0.1.3-py3-none-any.whl (7.2 kB)

Uploaded Python 3

File details

Details for the file pakkanpdf-0.1.3.tar.gz.

File metadata

  • Download URL: pakkanpdf-0.1.3.tar.gz
  • Upload date:
  • Size: 7.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.9.12 Darwin/20.6.0

File hashes

Hashes for pakkanpdf-0.1.3.tar.gz
Algorithm Hash digest
SHA256 5d8a5de2f9bbce11943443c2000cb02f02e827cac46c18ee7a99badf2b7734d0
MD5 a3b3d11b1f8ec57846ad79224b09dc49
BLAKE2b-256 4193aac8ad91c1e53599974db5d23f979c4dbb05490bf2dd9953861c99ef363c
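To check a downloaded file against the digests listed above, a short sketch using Python's standard hashlib module follows. The helper names `file_digests` and `verify_sha256` are illustrative and not part of pakkanpdf; BLAKE2b-256 is computed as BLAKE2b with a 32-byte (256-bit) digest.

```python
import hashlib

# SHA256 listed above for pakkanpdf-0.1.3.tar.gz
EXPECTED_SHA256 = "5d8a5de2f9bbce11943443c2000cb02f02e827cac46c18ee7a99badf2b7734d0"


def file_digests(path: str, chunk_size: int = 65536) -> dict:
    """Compute the SHA256, MD5 and BLAKE2b-256 hex digests of a file."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        # BLAKE2b-256: BLAKE2b truncated to a 32-byte digest
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once
        for chunk in iter(lambda: f.read(chunk_size), b""):
            for h in hashers.values():
                h.update(chunk)
    return {name: h.hexdigest() for name, h in hashers.items()}


def verify_sha256(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 matches the expected hex digest."""
    return file_digests(path)["SHA256"] == expected.lower()
```

For example, `verify_sha256("pakkanpdf-0.1.3.tar.gz")` should return True for an intact download of the source distribution; for the wheel, pass its SHA256 from the table below as `expected`.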

File details

Details for the file pakkanpdf-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: pakkanpdf-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 7.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.9.12 Darwin/20.6.0

File hashes

Hashes for pakkanpdf-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 bd2eb1803c7f9cbc5b46d6ccd0663727df9d43ce4f5ad0e8a4a59e1f86d2bdf9
MD5 1b14fa974a3cc41d22f5c63dd96422ab
BLAKE2b-256 f086319dcbd246a4495a00e73ee3a76610ab353147dc21649a3617535c8df618
