Host your own local PDF server that applies OCR and duplex-scan merging to your documents

Project description

pyPDFserver

pyPDFserver provides a bridge FTP server that accepts PDFs (for example, from your network printer) and applies OCR, image optimization, and/or merging of a duplex scan. The final PDF is then uploaded via FTP to your target machine (e.g. your NAS).

Installation

pyPDFserver is designed to run in a Docker container, but you can also host it manually. First, install Python (>= 3.10), then install pyPDFserver via pip:

pip install pyPDFserver

Next, install the external dependencies of OCRmyPDF (e.g. Tesseract, Ghostscript) by following the OCRmyPDF installation guide: https://ocrmypdf.readthedocs.io/en/latest/installation.html. You can then run pyPDFserver with

python -m pyPDFserver

After the first run, two configuration files named pyPDFserver.ini and profiles.ini are created in your system's configuration folder (the console output shows the exact paths). Edit them with your settings and restart pyPDFserver.

Docker

A Docker image is available that includes the most popular OCR languages.

Usage

Connect to the pyPDFserver FTP server and upload your files. Once processing has finished (OCR may take several minutes), the results are uploaded to your target server.
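
Any FTP client can be used for uploading. The following is a minimal sketch that uses Python's standard ftplib. The host, port, and credentials are placeholders: they must match local_ip/port in the [FTP] section and a username/password from profiles.ini. The ftp_factory parameter is a hypothetical addition that only exists so the function can be exercised without a live server.

```python
import ftplib
from pathlib import Path


def upload_scan(pdf_path, host, port=21, username="pyPDFserver",
                password="", ftp_factory=ftplib.FTP):
    """Upload a single PDF to pyPDFserver over FTP (illustrative sketch)."""
    path = Path(pdf_path)
    # ftplib.FTP supports the context-manager protocol: on exit it sends
    # QUIT and closes the connection.
    with ftp_factory() as ftp:
        ftp.connect(host, port)
        ftp.login(username, password)
        # Passive mode: the server offers one of its passive_ports for data.
        ftp.set_pasv(True)
        with path.open("rb") as fh:
            # The remote file name must match one of the profile's input
            # templates, e.g. SCAN_(*).pdf or DUPLEX1_(*).pdf.
            ftp.storbinary(f"STOR {path.name}", fh)
```

For example, upload_scan("SCAN_invoice.pdf", "192.168.1.10", 21, "pyPDFserver_en", "your-password") would send one file matching the EN profile's input_pdf_name template. The IP address and password here are hypothetical.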

OCR

pyPDFserver uses OCRmyPDF to apply OCR to your PDFs. To enable it, set ocr_enabled to True in your profile. For the best OCR results, also set ocr_language in profiles.ini.
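
Each OCR key in a profile corresponds to an OCRmyPDF option. The sketch below is a hypothetical mapping from raw profile values to keyword arguments for OCRmyPDF's Python function ocrmypdf.ocr(). It is not pyPDFserver's actual code, and its defaults mirror the [DEFAULT] profile listed under Configuration.

```python
def ocr_kwargs_from_profile(profile):
    """Map a profile's OCR settings to ocrmypdf.ocr() keyword arguments.

    Hypothetical mapping for illustration only; pyPDFserver's own code may
    differ. `profile` is a dict of raw string values from profiles.ini.
    """
    def as_bool(value):
        return str(value).strip().lower() in ("1", "true", "yes", "on")

    kwargs = {
        "deskew": as_bool(profile.get("ocr_deskew", "True")),              # --deskew
        "rotate_pages": as_bool(profile.get("ocr_rotate_pages", "True")),  # --rotate-pages
        "optimize": int(profile.get("ocr_optimize", "1")),                 # --optimize
        "tesseract_timeout": float(profile.get("ocr_tesseract_timeout", "60")),
    }
    language = str(profile.get("ocr_language", "")).strip()
    if language:
        # "deu+eng" -> ["deu", "eng"], one entry per Tesseract language pack
        kwargs["language"] = language.split("+")
    return kwargs


# Usage (requires OCRmyPDF and its external dependencies):
#   import ocrmypdf
#   profile = {"ocr_enabled": "True", "ocr_language": "deu+eng"}
#   if profile.get("ocr_enabled", "False").lower() == "true":
#       ocrmypdf.ocr("in.pdf", "out.pdf", **ocr_kwargs_from_profile(profile))
```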

Duplex scan

pyPDFserver allows you to automatically merge two scans of the front and back pages (i.e. duplex 1 and duplex 2) into a single file. This is intended to be used with an Automatic Document Feeder (ADF). Keep the following in mind:

The uploaded files must match the input_duplex1_name and input_duplex2_name templates in your profiles.ini
The back pages must be in reverse order in the PDF file (because you simply turn the stack over to scan them)
The page count of both files must match or the task is rejected
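
The page-order rules above can be illustrated with a small hypothetical helper that works on plain lists. It rejects mismatched page counts and interleaves the front pages with the un-reversed back pages. In a real implementation the elements would be page objects from a PDF library such as pypdf. That is an assumption for illustration and not necessarily what pyPDFserver uses.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def merge_duplex(front: Sequence[T], back_reversed: Sequence[T]) -> list[T]:
    """Interleave front pages with back pages that were scanned in reverse.

    front          (duplex 1): [F1, F2, F3]
    back_reversed  (duplex 2): [B3, B2, B1]
    result:                    [F1, B1, F2, B2, F3, B3]
    """
    if len(front) != len(back_reversed):
        # Mirrors the documented rule: mismatched page counts are rejected.
        raise ValueError(
            f"page count mismatch: {len(front)} front vs {len(back_reversed)} back pages"
        )
    merged: list[T] = []
    # Undo the reversal of the back pages, then pair each front with its back.
    for front_page, back_page in zip(front, reversed(back_reversed)):
        merged.extend((front_page, back_page))
    return merged
```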

Commands

At any time, you can check the progress in the console with:

tasks list: List all running and recently finished or failed tasks

Other useful commands are

exit: Terminate the server and clear temporary files
version: List the installed version

Some internal commands you don't usually need to use:

tasks force_clear: Clear all scheduled and finished tasks (does not abort the current task)
artifacts list: Internal command to list all artifacts
artifacts clean: Remove untracked artifacts to free up storage (usually not needed)

Configuration

pyPDFserver.ini

[SETTINGS]
# Set the desired log level (CRITICAL, ERROR, WARNING, INFO, DEBUG)
log_level = INFO
# If set to False, disable interactive console input
interactive_shell = False
# If set to True, enable colored console output
log_colors = True
# If set to True, create log files
log_to_file = True
# Time (in seconds) to wait for the back pages of a duplex scan after the
# front page upload before timing out. Set to zero to disable the timeout.
duplex_timeout = 600
# If set to True, pyPDFserver will search for old temporary files at startup
# and delete them
clean_old_temporary_files = True

[FTP]
local_ip = 127.0.0.1
port = 21
# If pyPDFserver is running behind a NAT, you may need to set the IP address
# that clients use to connect to the FTP server to prevent foreign address errors.
public_ip = 
# In FTP passive mode, clients open both control and data connections to bypass
# NATs on the client side. If pyPDFserver itself is running behind a NAT, you
# need to open the passive ports. By default, FTP servers use random ports, but
# you can define a custom list or range of ports.
# Write them as a comma-separated list (e.g. 6000,6010-6020,6030).
passive_ports = 23001-23010

[EXPORT_FTP_SERVER]
# Set the address and credentials for the external FTP server
host = 
port = 
username = 
password = 
# If pyPDFserver is running behind a NAT (e.g. in a Docker container), you may
# want to define control ports (the ports used to open connections to the
# external FTP server) and allow them in your firewall settings.
control_port = 23000

[WEBINTERFACE]
# If set to True, start a simple web interface to display currently scheduled,
# running, and finished tasks
enabled = True
# Set the port for the web server. If empty, it defaults to 80, or to 443 if TLS is enabled.
port =
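
pyPDFserver.ini uses standard INI syntax, so Python's configparser can read it. The sketch below uses hypothetical helper names. It loads a few settings and expands the passive_ports notation (e.g. 6000,6010-6020,6030) into individual port numbers. pyPDFserver's own parser may behave differently, for example in how it validates ranges.

```python
import configparser


def parse_port_list(spec: str) -> list[int]:
    """Expand a spec such as "6000,6010-6020,6030" into sorted port numbers."""
    ports: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # tolerate empty entries, e.g. an empty value
        if "-" in part:
            start_text, end_text = part.split("-", 1)
            start, end = int(start_text), int(end_text)
            if start > end:
                raise ValueError(f"invalid port range: {part!r}")
            ports.update(range(start, end + 1))  # ranges are inclusive
        else:
            ports.add(int(part))
    invalid = [p for p in ports if not 1 <= p <= 65535]
    if invalid:
        raise ValueError(f"ports out of range: {sorted(invalid)}")
    return sorted(ports)


def load_settings(text: str) -> dict:
    """Read a handful of pyPDFserver.ini values (illustrative subset only)."""
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(text)
    return {
        "log_level": config.get("SETTINGS", "log_level", fallback="INFO"),
        "duplex_timeout": config.getint("SETTINGS", "duplex_timeout", fallback=600),
        "ftp_port": config.getint("FTP", "port", fallback=21),
        "passive_ports": parse_port_list(config.get("FTP", "passive_ports", fallback="")),
        "web_enabled": config.getboolean("WEBINTERFACE", "enabled", fallback=False),
    }
```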

profiles.ini

# You can define multiple profiles to use different settings (e.g. different OCR languages,
# optimization levels, or file name templates). Each profile must have a unique username.
# Any fields not explicitly set will fall back to the DEFAULT profile.


[DEFAULT]
# Username for the FTP server
username = pyPDFserver
# Password for the FTP server. Note that after the first run it will be replaced with
# a hash value. To change the password later, remove its value and set a new password.
# After the next run, it will again be replaced with its hash value.
password = 

# OCR settings
# Refer to https://ocrmypdf.readthedocs.io/en/latest/optimizer.html for a more detailed explanation

ocr_enabled = False
# Set the three-letter language code for Tesseract OCR. You can provide multiple languages separated by a plus sign (e.g. deu+eng).
# You must install the corresponding Tesseract language pack first.
ocr_language = 
# Correct pages that were scanned at a skewed angle by rotating them into alignment
# (--deskew option for OCRmyPDF)
ocr_deskew = True
# Optimization level passed to OCRmyPDF
# (e.g. 0: no optimization, 1: lossless optimizations,
#  2: some lossy optimizations, 3: aggressive optimization)
ocr_optimize = 1
# Attempt to determine the correct orientation for each page and rotate it if necessary
# (--rotate-pages parameter for OCRmyPDF)
ocr_rotate_pages = True
# Timeout (in seconds) for Tesseract processing per page
# (--tesseract-timeout parameter for OCRmyPDF)
ocr_tesseract_timeout = 60

# File name settings
# When uploading a file to pyPDFserver, it is matched against the defined template strings
# and rejected if it does not match any of them. You can use tags (which pyPDFserver replaces
# with regular expression patterns) to capture groups.
# Available tags:
#   (lang): capture a three-letter language code. Multiple languages can be given (separated by commas)
#   (*): capture any content
# In export_duplex_name you can also use:
#   (*1): insert the (*) match from duplex1
#   (*2): insert the (*) match from duplex2

# If set to True, file name matching is case-sensitive
input_case_sensitive = True
# Template string for incoming PDF files
input_pdf_name = SCAN_(*).pdf
# Template string for exported PDF files
export_pdf_name = Scan_(*).pdf
# Template strings for duplex PDF files (1 = front pages, 2 = back pages)
input_duplex1_name = DUPLEX1_(*).pdf
input_duplex2_name = DUPLEX2_(*).pdf
# Template string for exported duplex PDF files
export_duplex_name = Scan_(*1)_(lang).pdf
# Target path on the external FTP server for uploaded files
export_path = 

# Two example profiles. You can define as many profiles as you like
[DE]
username = pyPDFserver_de
ocr_enabled = True
ocr_language = deu

[EN]
username = pyPDFserver_en
ocr_enabled = True
ocr_language = eng
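
Two mechanisms described in profiles.ini can be sketched with configparser and re. First, configparser's DEFAULT section already provides the documented fallback for unset keys. Second, a file name template can be turned into a regular expression by escaping the literal text and substituting the tags. The regex fragments chosen for (*) and (lang) and the helper names are assumptions for illustration only.

```python
import configparser
import re

# Hypothetical regex fragments for the documented tags; pyPDFserver's real
# patterns may differ.
TAG_PATTERNS = {
    "(*)": r"(.*)",                               # any content
    "(lang)": r"([A-Za-z]{3}(?:,[A-Za-z]{3})*)",  # deu or deu,eng,...
}
# Capturing group so re.split() keeps the tags as separate list items.
TAG_SPLIT = re.compile(r"(\(\*\)|\(lang\))")


def template_to_regex(template: str, case_sensitive: bool = True) -> re.Pattern:
    """Escape literal text and replace each tag with its capture group."""
    pieces = TAG_SPLIT.split(template)
    regex = "".join(TAG_PATTERNS.get(p, re.escape(p)) for p in pieces)
    return re.compile(regex, 0 if case_sensitive else re.IGNORECASE)


def load_profiles(text: str) -> configparser.ConfigParser:
    # interpolation=None keeps values that contain "%" intact
    config = configparser.ConfigParser(interpolation=None)
    config.read_string(text)
    return config


def find_profile(config: configparser.ConfigParser, username: str):
    """Return the profile section for a login name, or None if unknown.

    Reading a key from a section transparently falls back to [DEFAULT].
    """
    for name in config.sections():  # sections() excludes DEFAULT
        if config[name].get("username") == username:
            return config[name]
    if config.defaults().get("username") == username:
        return config[config.default_section]
    return None
```

With the profiles.ini shown above, find_profile(config, "pyPDFserver_de") returns the DE section, whose input_pdf_name is inherited from DEFAULT. Likewise, template_to_regex("SCAN_(*).pdf").fullmatch("SCAN_invoice.pdf").group(1) yields "invoice".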

Project details

Maintainers

andreasmz

Release history

1.3.1

Jan 12, 2026

1.3.0

Jan 11, 2026

1.2.3

Jan 8, 2026

1.2.2

Jan 5, 2026

1.2.1

Jan 4, 2026

1.2.0

Jan 2, 2026

1.1.4

Jan 2, 2026

1.1.3

Jan 2, 2026

1.1.2

Jan 1, 2026

1.1.1

Dec 31, 2025

1.1.0 (this version)

Dec 31, 2025

1.0.4

Dec 30, 2025

1.0.3

Dec 30, 2025

1.0.2

Dec 30, 2025

1.0.1

Dec 30, 2025

1.0.0

Dec 29, 2025

Download files

Source Distribution

pypdfserver-1.1.0.tar.gz (24.0 kB)

Uploaded Dec 31, 2025 Source

Built Distribution

pypdfserver-1.1.0-py3-none-any.whl (25.0 kB)

Uploaded Dec 31, 2025 Python 3

File details

Details for the file pypdfserver-1.1.0.tar.gz.

File metadata

Download URL: pypdfserver-1.1.0.tar.gz
Upload date: Dec 31, 2025
Size: 24.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pypdfserver-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`389d99e139ec95fe61427ed4e391c144e905e40389700e7aec7e3f6339dee183`
MD5	`545a9647dd95a1cf7a9b9762f9377bd4`
BLAKE2b-256	`aa3cef4e10cb8674a4f1b6dc42a09f283b25957b5476e9183b10f573e369f680`
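
To check a downloaded file against the SHA256 digest listed above, a short sketch based on Python's hashlib is enough. The helper names are hypothetical. Alternatively, pip can enforce hashes itself when it is run with --require-hashes and a requirements file that contains --hash entries.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """Compare a file against a published SHA256 digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For an untampered download, verify_download("pypdfserver-1.1.0.tar.gz", "389d99e139ec95fe61427ed4e391c144e905e40389700e7aec7e3f6339dee183") should return True.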

Provenance

The following attestation bundles were made for pypdfserver-1.1.0.tar.gz:

Publisher: build.yml on andreasmz/pyPDFserver

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pypdfserver-1.1.0.tar.gz
- Subject digest: 389d99e139ec95fe61427ed4e391c144e905e40389700e7aec7e3f6339dee183
- Sigstore transparency entry: 785694680
- Sigstore integration time: Dec 31, 2025
Source repository:
- Permalink: andreasmz/pyPDFserver@87372e3b7c6e497f36b9031f9e4eb984d657f816
- Branch / Tag: refs/tags/v1.1.0
- Owner: https://github.com/andreasmz
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: build.yml@87372e3b7c6e497f36b9031f9e4eb984d657f816
- Trigger Event: push

File details

Details for the file pypdfserver-1.1.0-py3-none-any.whl.

File metadata

Download URL: pypdfserver-1.1.0-py3-none-any.whl
Upload date: Dec 31, 2025
Size: 25.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pypdfserver-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`bf93a4e7e25ae4573864a2a4935c2b422eabd9593080fff56959a3f8f4e99b4b`
MD5	`bcc0b081e591c8c139164d0878c13490`
BLAKE2b-256	`88baeb86f649e353d92169fb1820879bf67ef88dd913a4177c2b09af5f26d53a`

Provenance

The following attestation bundles were made for pypdfserver-1.1.0-py3-none-any.whl:

Publisher: build.yml on andreasmz/pyPDFserver

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: pypdfserver-1.1.0-py3-none-any.whl
- Subject digest: bf93a4e7e25ae4573864a2a4935c2b422eabd9593080fff56959a3f8f4e99b4b
- Sigstore transparency entry: 785694681
- Sigstore integration time: Dec 31, 2025
Source repository:
- Permalink: andreasmz/pyPDFserver@87372e3b7c6e497f36b9031f9e4eb984d657f816
- Branch / Tag: refs/tags/v1.1.0
- Owner: https://github.com/andreasmz
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: build.yml@87372e3b7c6e497f36b9031f9e4eb984d657f816
- Trigger Event: push
