Simple access to the text and images inside a PDF, using a context manager
Project description
pakkan-pdf
pakkan-pdf provides simple access to the text and images inside a PDF through a context manager. It is a wrapper library around pdfminer/pdfminer.six.
install
pip install pakkanpdf
Usage
- Pass the path of the PDF as pdf_path to PdfExtractor, and set work_dir to a directory that already exists
- A temporary directory for writing out images is created inside work_dir
- extractor.text returns the text of the PDF
- extractor.image_file_paths returns the images in the PDF as a list of file paths
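from pakkanpdf import PdfExtractor


def test_sample():
    # work_dir must already exist; extracted images are written beneath it.
    with PdfExtractor(pdf_path="data/example.pdf", work_dir="demo_work_dir") as extractor:
        # The sample PDF contains the Japanese sentence "This is a sample PDF".
        assert "これはサンプルのPDFです" in extractor.text
        assert extractor.image_file_paths == ["demo_work_dir/work_images/X8.jpg"]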
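
The bullets above describe a scratch directory for images that lives inside work_dir for the duration of the with block. As a rough illustration of that pattern only, here is a minimal standard-library sketch. It is not pakkanpdf's implementation: the helper name temporary_image_dir, the work_images_ prefix, and the assumption that the directory is deleted on exit are all hypothetical.

```python
import tempfile
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def temporary_image_dir(work_dir):
    """Hypothetical helper: yield a scratch directory created inside work_dir."""
    base = Path(work_dir)
    if not base.is_dir():
        # Mirrors the documented requirement that work_dir already exists.
        raise FileNotFoundError(f"work_dir does not exist: {base}")
    # TemporaryDirectory deletes the directory when the with block exits.
    # That cleanup is an assumption here, not confirmed behaviour of pakkanpdf.
    with tempfile.TemporaryDirectory(dir=base, prefix="work_images_") as tmp:
        yield Path(tmp)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work_dir:
        with temporary_image_dir(work_dir) as image_dir:
            # Stand-in for an image extracted from a PDF.
            (image_dir / "X8.jpg").write_bytes(b"placeholder")
            print(sorted(p.name for p in image_dir.iterdir()))
```

If the image files are needed after the block ends, they would have to be copied elsewhere before leaving it, assuming the cleanup behaviour sketched here.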
Download files
Source Distribution
pakkanpdf-0.1.3.tar.gz (7.0 kB)
Built Distribution
pakkanpdf-0.1.3-py3-none-any.whl (7.2 kB)
File details
Details for the file pakkanpdf-0.1.3.tar.gz.
File metadata
- Download URL: pakkanpdf-0.1.3.tar.gz
- Upload date:
- Size: 7.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.13 CPython/3.9.12 Darwin/20.6.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5d8a5de2f9bbce11943443c2000cb02f02e827cac46c18ee7a99badf2b7734d0
MD5 | a3b3d11b1f8ec57846ad79224b09dc49
BLAKE2b-256 | 4193aac8ad91c1e53599974db5d23f979c4dbb05490bf2dd9953861c99ef363c
File details
Details for the file pakkanpdf-0.1.3-py3-none-any.whl.
File metadata
- Download URL: pakkanpdf-0.1.3-py3-none-any.whl
- Upload date:
- Size: 7.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.13 CPython/3.9.12 Darwin/20.6.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | bd2eb1803c7f9cbc5b46d6ccd0663727df9d43ce4f5ad0e8a4a59e1f86d2bdf9
MD5 | 1b14fa974a3cc41d22f5c63dd96422ab
BLAKE2b-256 | f086319dcbd246a4495a00e73ee3a76610ab353147dc21649a3617535c8df618
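
A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using only Python's standard hashlib module; the function names sha256_of_file and matches_published_hash are illustrative and not part of pakkanpdf or PyPI tooling.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 with a published digest, ignoring case and whitespace."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, after downloading the source distribution, matches_published_hash("pakkanpdf-0.1.3.tar.gz", "5d8a5de2f9bbce11943443c2000cb02f02e827cac46c18ee7a99badf2b7734d0") should return True if the file is intact.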