pdflex

Python tools for PDF automation.

These details have been verified by PyPI

Project links

Homepage

GitHub Statistics

Maintainers

zeroxeli

These details have not been verified by PyPI

Project description

What is PDFlex?

PDFlex is a PDF processing toolkit for Python. It provides tools for PDF validation, text extraction, merging (with custom separator pages), and filename-based searching, all built to streamline your PDF automation workflows.

Features

- PDF Validation: Quickly verify whether a file is a valid PDF.
- Text Extraction: Extract text from PDFs using either PyMuPDF or PyPDF.
- Directory Processing: Process entire directories of PDFs for text extraction.
- PDF Merging: Merge multiple PDF files into one, automatically inserting a custom separator page between documents.
  - The separator page displays the title (derived from the filename) with underscores and hyphens removed.
  - Supports both portrait and landscape separator pages (ideal for lecture slides).
- PDF Searching: Recursively search for PDFs in a directory based on filename patterns (e.g., numeric float prefixes).
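
The separator-page title rule can be pictured with a small sketch. This is not PDFlex's actual implementation: `separator_title` is a hypothetical helper, and turning underscores and hyphens into spaces (rather than deleting them outright) is an assumption.

```python
from pathlib import Path


def separator_title(pdf_path: str) -> str:
    """Hypothetical helper: derive a separator-page title from a filename."""
    stem = Path(pdf_path).stem  # drop the directory and the ".pdf" extension
    # Assumption: underscores and hyphens become spaces, then runs of
    # whitespace collapse to a single space.
    cleaned = stem.replace("_", " ").replace("-", " ")
    return " ".join(cleaned.split())


print(separator_title("/slides/1.2_intro-to_graphs.pdf"))  # 1.2 intro to graphs
```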

Quick Start

Installation

PDFlex is available on PyPI. To install using pip:

```
pip install -U pdflex
```

Alternatively, install in an isolated environment with pipx:

```
pipx install pdflex
```

For the fastest installation using uv:

```
uv tool install pdflex
```

Usage

Command-Line Interface (CLI)

PDFlex provides a convenient CLI for merging and searching PDFs. The CLI supports two primary commands: merge and search.

Merge Command

Merge multiple PDF files into a single document while automatically inserting a separator page before each document.

Usage:

```
pdflex merge /path/to/file1.pdf /path/to/file2.pdf -o merged_output.pdf
```

Add the --landscape flag to create separator pages in landscape orientation:

```
pdflex merge /path/to/file1.pdf /path/to/file2.pdf -o merged_output.pdf --landscape
```

Search and Merge Command

Search for PDF files in a directory based on filename filters (or search for lecture slides with numeric float prefixes) and merge them into one PDF.

Usage:

General Search:

```
pdflex search /path/to/search -o merged_output.pdf --prefix "Chapter" --suffix ".pdf"
```

Lecture Slides Merge: (Merges all PDFs whose filenames start with a numeric float prefix like 1.2_, 3.2_, etc., in sorted order. Separator pages will be in landscape orientation.)
```
pdflex search /path/to/algorithms-and-computation -o merged_lectures.pdf --lecture
```
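
Ordering by numeric prefix matters because a plain string sort would place `10.1_` before `2.1_`. The sketch below shows one way such an ordering could work. It is not taken from PDFlex's source: the regular expression, the `sort_lecture_pdfs` helper, and the choice to compare prefixes as floats are all assumptions made for illustration.

```python
import re
from pathlib import Path

# Assumed pattern: digits, a dot, digits, then an underscore (e.g. "1.2_").
NUMERIC_PREFIX = re.compile(r"^(\d+\.\d+)_")


def sort_lecture_pdfs(filenames: list[str]) -> list[str]:
    """Hypothetical helper: keep numeric-prefixed PDFs, ordered by prefix value."""
    keyed = []
    for name in filenames:
        match = NUMERIC_PREFIX.match(Path(name).name)
        if match and name.lower().endswith(".pdf"):
            keyed.append((float(match.group(1)), name))
    # Sort on the parsed float so "10.1_" lands after "2.1_".
    return [name for _, name in sorted(keyed)]


names = ["10.1_Graphs.pdf", "2.1_Sorting.pdf", "notes.pdf", "1.2_Intro.pdf"]
print(sort_lecture_pdfs(names))
# ['1.2_Intro.pdf', '2.1_Sorting.pdf', '10.1_Graphs.pdf']
```

Note that comparing as floats treats `1.10_` and `1.1_` as the same position; a scheme with more than nine sub-sections per chapter would need a tuple-of-integers key instead.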

Python API Usage

You can also use PDFlex directly from your Python code. Below are examples for some common tasks.

Merging PDFs with Separator Pages

```python
from pdflex.merge import merge_pdfs

# List of PDF file paths to merge
pdf_files = [
    "/path/to/document1.pdf",
    "/path/to/document2.pdf",
]

# Merge files, using landscape separator pages (ideal for lecture slides)
merge_pdfs(pdf_files, output_path="merged_output.pdf", landscape=True)
```
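
Before merging, it can help to filter out files that are not PDFs at all. The following is a minimal standard-library sketch of such a pre-check, based on the `%PDF-` marker that opens a PDF file. It is a heuristic only and is not PDFlex's own validation routine; `looks_like_pdf` is a hypothetical name.

```python
import tempfile
from pathlib import Path


def looks_like_pdf(path: Path) -> bool:
    """Hypothetical check: True if the file starts with the b"%PDF-" marker."""
    try:
        with path.open("rb") as handle:
            return handle.read(5) == b"%PDF-"
    except OSError:
        # Missing or unreadable files are treated as invalid.
        return False


# Demo with throwaway files: one PDF-like, one not, one missing.
with tempfile.TemporaryDirectory() as tmp:
    good = Path(tmp) / "good.pdf"
    bad = Path(tmp) / "bad.pdf"
    good.write_bytes(b"%PDF-1.7\n%%EOF\n")
    bad.write_bytes(b"not a pdf")
    candidates = [good, bad, Path(tmp) / "missing.pdf"]
    valid = [p.name for p in candidates if looks_like_pdf(p)]
    print(valid)  # ['good.pdf']
```

The surviving paths could then be passed on to a merge step; a header check alone does not guarantee the file is well-formed.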

Searching for PDFs by Filename

```python
from pdflex.search import search_pdfs, search_numeric_prefixed_pdfs

# General search: Find PDFs that start with a prefix and/or end with a suffix
pdf_list = search_pdfs("/path/to/search", prefix="Chapter", suffix=".pdf")
print("Found PDFs:", pdf_list)

# Lecture slides: Find PDFs with numeric float prefixes (e.g., "1.2_Intro.pdf")
lecture_slides = search_numeric_prefixed_pdfs("/path/to/algorithms-and-computation")
print("Found lecture slides:", lecture_slides)
```
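
To make the prefix/suffix filtering concrete, here is a rough standard-library approximation of a recursive filename search. It does not reproduce how `search_pdfs` is implemented; `find_pdfs` is a hypothetical stand-in, and case-sensitive matching on the bare filename is an assumption.

```python
import tempfile
from pathlib import Path


def find_pdfs(root: str, prefix: str = "", suffix: str = ".pdf") -> list[Path]:
    """Hypothetical stand-in: recursively collect files matching prefix/suffix."""
    matches = [
        path
        for path in Path(root).rglob("*")
        if path.is_file()
        and path.name.startswith(prefix)
        and path.name.endswith(suffix)
    ]
    return sorted(matches)


# Demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    for rel in ["Chapter1.pdf", "sub/Chapter2.pdf", "sub/Appendix.pdf", "Chapter3.txt"]:
        (Path(tmp) / rel).touch()
    found = [p.relative_to(tmp).as_posix() for p in find_pdfs(tmp, prefix="Chapter")]
    print(found)  # ['Chapter1.pdf', 'sub/Chapter2.pdf']
```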

Contributing

Contributions are welcome! Whether it's bug reports, feature requests, or code contributions, please feel free to:

- Open an issue
- Submit a pull request
- Improve documentation
- Share your ideas

Acknowledgments

This project is built upon several awesome PDF open-source projects:

License

PDFlex is released under the MIT license.
Copyright (c) 2020 to present PDFlex and contributors.

Release history

- 0.1.9: Mar 20, 2025
- 0.1.7: Feb 18, 2025
- 0.1.6: Feb 18, 2025
- 0.1.5: Feb 18, 2025
- 0.1.4: Feb 18, 2025 (this version)
- 0.1.3: Feb 18, 2025
- 0.1.0: Feb 18, 2025

Download files

Source Distribution

pdflex-0.1.4.tar.gz (340.5 kB view details)

Uploaded Feb 18, 2025 Source

Built Distribution

pdflex-0.1.4-py3-none-any.whl (14.3 kB view details)

Uploaded Feb 18, 2025 Python 3

File details

Details for the file pdflex-0.1.4.tar.gz.

File metadata

Download URL: pdflex-0.1.4.tar.gz
Upload date: Feb 18, 2025
Size: 340.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for pdflex-0.1.4.tar.gz
Algorithm	Hash digest
SHA256	`10617351bd256d62f91aa4f48cb4e68eac93979fd7d8d82c13493d9953270052`
MD5	`643f5d18780415d31d01181ef37a6804`
BLAKE2b-256	`b15e4066395ebcc9d89db075dba636c5269cc5bbcc1d239d620e2d9d5f6c7fd2`
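
A downloaded archive can be checked against the SHA256 digest listed above. The sketch below uses Python's `hashlib`; the `sha256_of` helper is hypothetical, and the script assumes the sdist has been saved under its original name in the current working directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed for pdflex-0.1.4.tar.gz
EXPECTED_SHA256 = "10617351bd256d62f91aa4f48cb4e68eac93979fd7d8d82c13493d9953270052"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hypothetical helper: stream a file through SHA256, return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read in chunks so large archives are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Assumed location: the sdist saved in the current working directory.
archive = Path("pdflex-0.1.4.tar.gz")
if archive.exists():
    print("match" if sha256_of(archive) == EXPECTED_SHA256 else "MISMATCH")
else:
    print(f"{archive} not found; download it first")
```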

Provenance

The following attestation bundles were made for pdflex-0.1.4.tar.gz:

Publisher: ci.yml on eli64s/pdflex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pdflex-0.1.4.tar.gz
- Subject digest: 10617351bd256d62f91aa4f48cb4e68eac93979fd7d8d82c13493d9953270052
- Sigstore transparency entry: 172177918
- Sigstore integration time: Feb 18, 2025
Source repository:
- Permalink: eli64s/pdflex@5f6e37b693c2a0b421f065fb038877b2c073687f
- Branch / Tag: refs/heads/main
- Owner: https://github.com/eli64s
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@5f6e37b693c2a0b421f065fb038877b2c073687f
- Trigger Event: push

File details

Details for the file pdflex-0.1.4-py3-none-any.whl.

File metadata

Download URL: pdflex-0.1.4-py3-none-any.whl
Upload date: Feb 18, 2025
Size: 14.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for pdflex-0.1.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`321a72940a2bb7cd70af2beab1fee11b7f71257ec3d932d0fbfbb8f0e8d5818d`
MD5	`85b7a05e3203dd58239fb2c05463f0f3`
BLAKE2b-256	`3b3b791d2510d392eec7dafcf52ef9f3e4a3b5c20134eb45d8e8da619e892b9a`

Provenance

The following attestation bundles were made for pdflex-0.1.4-py3-none-any.whl:

Publisher: ci.yml on eli64s/pdflex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pdflex-0.1.4-py3-none-any.whl
- Subject digest: 321a72940a2bb7cd70af2beab1fee11b7f71257ec3d932d0fbfbb8f0e8d5818d
- Sigstore transparency entry: 172177921
- Sigstore integration time: Feb 18, 2025
Source repository:
- Permalink: eli64s/pdflex@5f6e37b693c2a0b421f065fb038877b2c073687f
- Branch / Tag: refs/heads/main
- Owner: https://github.com/eli64s
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@5f6e37b693c2a0b421f065fb038877b2c073687f
- Trigger Event: push
