Plugin framework for pdf-autofillr — extend extractors, mappers, validators, fillers, and more
pdf-autofillr-plugins
Plugin framework for extending pdf-autofillr — zero runtime dependencies, pure Python.
Build custom extractors, mappers, validators, fillers, chunkers, embedders, and transformers and drop them into any pdf-autofillr module without touching its source code.
pip install pdf-autofillr-plugins
Why plugins?
pdf-autofillr ships with sensible defaults for every stage of the PDF pipeline. Plugins let you override any stage for your use case:
| Need | Plugin type |
|---|---|
| Parse a proprietary document format | ExtractorPlugin |
| Map fields to your internal schema | MapperPlugin |
| Validate phone, tax ID, IBAN fields | ValidatorPlugin |
| Use a custom PDF writing library | FillerPlugin |
| Split PDFs into domain-specific chunks | ChunkerPlugin |
| Use your own embedding model or provider | EmbedderPlugin |
| Normalise currencies, dates, addresses | TransformerPlugin |
Install
pip install pdf-autofillr-plugins
No extra dependencies. The package is pure Python and works on Python 3.9+.
Quick start
import re

from pdf_autofillr_plugins import plugin, PluginManager
from pdf_autofillr_plugins.interfaces import ValidatorPlugin, PluginMetadata


@plugin(category="validator", name="phone-validator", version="1.0.0")
class PhoneValidatorPlugin(ValidatorPlugin):
    _E164 = re.compile(r"^\+[1-9]\d{6,14}$")

    def get_metadata(self):
        return PluginMetadata(name="phone-validator", version="1.0.0",
                              author="You", description="E.164 phone validator",
                              category="validator")

    def supports_field_type(self, field_type):
        return field_type.lower() in {"phone", "telephone", "mobile"}

    def validate(self, field_name, field_value, rules=None, **kwargs):
        errors = [] if self._E164.match(str(field_value)) else [f"Invalid: {field_value!r}"]
        return {"valid": not errors, "errors": errors, "warnings": [], "validator": "phone-validator"}


# Register and use
manager = PluginManager()
manager.registry.register_plugin(PhoneValidatorPlugin, "validator", "phone-validator")
validator = manager.load_plugin("phone-validator", "validator")
print(validator.validate("phone", "+12125551234"))
# {"valid": True, "errors": [], "warnings": [], "validator": "phone-validator"}
→ See quickstart.md for more examples.
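To see which inputs the `_E164` pattern from the quick start accepts, the regex can be exercised on its own, without the plugin machinery. The sample numbers below are made up for illustration.

```python
import re

# Same pattern as PhoneValidatorPlugin._E164 in the quick start:
# a leading "+", a non-zero first digit, then 6 to 14 further digits.
E164 = re.compile(r"^\+[1-9]\d{6,14}$")

samples = [
    "+12125551234",     # well-formed
    "2125551234",       # missing leading "+"
    "+0125551234",      # first digit may not be 0
    "+1 212 555 1234",  # spaces are not allowed
    "+123456",          # too short: only 6 digits in total
]

results = {s: bool(E164.match(s)) for s in samples}
for number, ok in results.items():
    print(f"{number!r:20} -> {'valid' if ok else 'invalid'}")
```

Because the pattern rejects spaces and punctuation, callers may want to strip formatting characters before validating.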
Plugin interfaces
| Interface | Methods to implement | Use for |
|---|---|---|
| ExtractorPlugin | extract(), supports() | Reading data from PDFs or documents |
| MapperPlugin | map_fields(), supports_schema() | Mapping fields to a target schema |
| ValidatorPlugin | validate(), supports_field_type() | Validating field values |
| FillerPlugin | fill(), supports_pdf_type() | Writing data into PDFs |
| ChunkerPlugin | chunk() | Splitting PDF content for processing |
| EmbedderPlugin | embed(), check() | Embedding metadata into PDFs |
| TransformerPlugin | transform(), supports_type() | Transforming field values |
All interfaces extend BasePlugin which provides: initialize(), shutdown(), config, name, version, category, is_initialized.
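The lifecycle members listed above can be pictured with a small stand-in. `SketchBasePlugin` below is a hypothetical class written for this example so that it runs without the package installed; the real `BasePlugin` may differ in its constructor and in how it tracks state. The `transform(value, value_type)` and `supports_type(value_type)` signatures are likewise assumptions, since only the method names are documented here.

```python
class SketchBasePlugin:
    """Hypothetical stand-in for BasePlugin (not the real class)."""

    def __init__(self, config=None):
        self.config = config or {}
        self.is_initialized = False

    def initialize(self):
        self.is_initialized = True

    def shutdown(self):
        self.is_initialized = False


class UppercaseTransformer(SketchBasePlugin):
    """Toy transformer: upper-cases values of the types it supports."""

    name = "uppercase-transformer"
    version = "1.0.0"
    category = "transformer"

    def supports_type(self, value_type):
        # Assumed signature: decide by a type label such as "country_code".
        return value_type in self.config.get("types", {"country_code"})

    def transform(self, value, value_type):
        if not self.is_initialized:
            raise RuntimeError("call initialize() before transform()")
        return str(value).upper() if self.supports_type(value_type) else value


transformer = UppercaseTransformer(config={"types": {"country_code", "currency"}})
transformer.initialize()
print(transformer.transform("usd", "currency"))   # USD
print(transformer.transform("Jane", "name"))      # Jane
transformer.shutdown()
```

Guarding `transform()` on `is_initialized` is a design choice of this sketch: it makes a forgotten `initialize()` call fail early instead of silently running with missing setup.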
Built-in plugins
Three plugins ship with the package and are ready to use:
email-validator (category: validator)
Validates email addresses: format, length, disposable domain detection, and optional rules.
from pdf_autofillr_plugins.builtin.validators.email_validator import EmailValidatorPlugin
from pdf_autofillr_plugins import PluginManager
manager = PluginManager()
manager.registry.register_plugin(EmailValidatorPlugin, "validator", "email-validator")
v = manager.load_plugin("email-validator", "validator")
v.validate("email", "user@example.com")
# {"valid": True, "errors": [], "warnings": []}
v.validate("email", "test@tempmail.com")
# {"valid": True, "errors": [], "warnings": ["Disposable email domain detected: tempmail.com"]}
v.validate("email", "not-an-email")
# {"valid": False, "errors": ["Invalid email format: 'not-an-email'"]}
# Custom rules
v.validate("email", "user@gmail.com", rules={"allowed_domains": ["company.com"]})
# {"valid": False, "errors": ["Email domain not in allowed list: gmail.com"]}
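The validation calls above return a dictionary with `valid` and `errors`, and in most examples `warnings`. When several fields are validated, those dictionaries can be folded into one report. The helper below is a sketch: `summarize` and the field names in the sample are hypothetical and not part of the package. It reads `warnings` defensively because some of the outputs shown above omit that key.

```python
def summarize(results):
    """Combine per-field validation results into a single report.

    `results` maps field names to dicts shaped like the validator output:
    {"valid": bool, "errors": [...], "warnings": [...]}.
    """
    report = {"valid": True, "errors": {}, "warnings": {}}
    for field, result in results.items():
        if not result.get("valid", False):
            report["valid"] = False
        if result.get("errors"):
            report["errors"][field] = list(result["errors"])
        if result.get("warnings"):
            report["warnings"][field] = list(result["warnings"])
    return report


# Result dicts copied from the email-validator examples above.
sample = {
    "work_email": {"valid": True, "errors": [], "warnings": []},
    "backup_email": {
        "valid": True,
        "errors": [],
        "warnings": ["Disposable email domain detected: tempmail.com"],
    },
    "contact_email": {"valid": False, "errors": ["Invalid email format: 'not-an-email'"]},
}

report = summarize(sample)
print(report["valid"])             # False
print(sorted(report["errors"]))    # ['contact_email']
print(sorted(report["warnings"]))  # ['backup_email']
```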
passthrough-extractor (category: extractor)
Returns pre-configured fields unchanged. Useful for testing pipelines without a real PDF.
from pdf_autofillr_plugins.builtin.extractors.passthrough_extractor import PassthroughExtractorPlugin
fields = [
    {"name": "investor_name", "value": "Jane Smith", "confidence": 0.99},
    {"name": "email", "value": "jane@example.com", "confidence": 0.98},
]
extractor = PassthroughExtractorPlugin(config={"fields": fields})
extractor.initialize()
result = extractor.extract("blank_form.pdf")
# {"fields": [...], "extractor": "passthrough-extractor", "confidence": 1.0}
identity-mapper (category: mapper)
Maps extracted fields to a schema by exact match, then snake_case normalisation.
from pdf_autofillr_plugins.builtin.mappers.identity_mapper import IdentityMapperPlugin
mapper = IdentityMapperPlugin()
mapper.initialize()
result = mapper.map_fields(
    extracted_fields=[
        {"name": "Investor Name", "value": "Jane Smith", "confidence": 1.0},
        {"name": "email_address", "value": "jane@example.com", "confidence": 1.0},
    ],
    target_schema={
        "investor_name": "string",
        "email_address": "string",
    },
)
# {
#     "mapped_fields": {"investor_name": "Jane Smith", "email_address": "jane@example.com"},
#     "unmapped_fields": [],
#     "coverage": 1.0,
# }
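The two-step strategy described for this mapper, exact match first and snake_case normalisation second, can be approximated in a few lines. This is an illustrative sketch, not the package's implementation: `to_snake_case`, `map_by_identity`, and the definition of coverage as the share of schema keys that received a value are all assumptions.

```python
import re


def to_snake_case(name):
    """Lower-case a label and join its alphanumeric runs with underscores."""
    return "_".join(re.findall(r"[A-Za-z0-9]+", name)).lower()


def map_by_identity(extracted_fields, target_schema):
    mapped, unmapped = {}, []
    for field in extracted_fields:
        name = field["name"]
        if name in target_schema:                    # step 1: exact match
            mapped[name] = field["value"]
        elif to_snake_case(name) in target_schema:   # step 2: normalised match
            mapped[to_snake_case(name)] = field["value"]
        else:
            unmapped.append(name)
    # Assumed definition: fraction of schema keys that were filled.
    coverage = len(mapped) / len(target_schema) if target_schema else 0.0
    return {"mapped_fields": mapped, "unmapped_fields": unmapped, "coverage": coverage}


result = map_by_identity(
    [
        {"name": "Investor Name", "value": "Jane Smith", "confidence": 1.0},
        {"name": "email_address", "value": "jane@example.com", "confidence": 1.0},
        {"name": "Fax No.", "value": "n/a", "confidence": 0.4},
    ],
    {"investor_name": "string", "email_address": "string"},
)
print(result["mapped_fields"])
print(result["unmapped_fields"])  # ['Fax No.']
```

The extra "Fax No." field is added here to show the unmapped path; it normalises to `fax_no`, which is not in the schema.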
The @plugin decorator
@plugin(
    category="extractor",         # required: extractor | mapper | validator | filler | chunker | embedder | transformer
    name="my-extractor",          # optional: defaults to class name
    version="1.0.0",              # optional: default "1.0.0"
    author="Your Team",           # optional
    description="What it does",   # optional
    tags=["invoice", "finance"],  # optional: for filtering
    priority=200,                 # optional: higher loads first (default 100)
    enabled=True,                 # optional: can be disabled without removing
)
class MyExtractor(ExtractorPlugin):
    ...
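The `priority` and `enabled` options suggest an ordering rule: disabled plugins are skipped and higher priorities come first. The sketch below models that rule on plain records; `PluginRecord` and `load_order` are hypothetical names, and the tie-break by name is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PluginRecord:
    name: str
    priority: int = 100   # documented default
    enabled: bool = True


def load_order(records):
    """Return enabled records, highest priority first (ties broken by name)."""
    active = [r for r in records if r.enabled]
    return sorted(active, key=lambda r: (-r.priority, r.name))


records = [
    PluginRecord("generic-extractor"),
    PluginRecord("invoice-extractor", priority=200),
    PluginRecord("legacy-extractor", priority=300, enabled=False),
]
order = [r.name for r in load_order(records)]
print(order)  # ['invoice-extractor', 'generic-extractor']
```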
Plugin discovery
The registry can discover plugins automatically from a directory or a module path:
manager = PluginManager()
# From a file system directory — scans all .py files
manager.discover_plugins(["./my_plugins/", "./team_plugins/"])
# From an installed Python module
manager.discover_plugins(["my_company.pdf_plugins"])
# Filter by category
manager.discover_plugins(["./my_plugins/"], categories=["validator"])
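Directory discovery is described as scanning all `.py` files. One common way to do this in Python uses `importlib.util`; the sketch below shows that general technique and makes no claim about how `discover_plugins` is implemented internally. The helper name `import_python_files` and the choice to skip files starting with an underscore are assumptions.

```python
import importlib.util
import tempfile
from pathlib import Path


def import_python_files(directory):
    """Import each top-level .py file in `directory` and return the modules."""
    modules = []
    for path in sorted(Path(directory).glob("*.py")):
        if path.name.startswith("_"):
            continue  # assumed convention: skip private files such as __init__.py
        spec = importlib.util.spec_from_file_location(path.stem, path)
        module = importlib.util.module_from_spec(spec)
        spec.loader.exec_module(module)  # runs the file, defining its classes
        modules.append(module)
    return modules


# Demo: write two throwaway files, then discover the public one.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "demo_plugin.py").write_text("PLUGIN_NAME = 'demo'\n")
    Path(tmp, "_private.py").write_text("PLUGIN_NAME = 'hidden'\n")
    found = [m.PLUGIN_NAME for m in import_python_files(tmp)]

print(found)  # ['demo']
```

Importing a file executes it, so discovery paths should only contain trusted code.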
PluginManager API
from pdf_autofillr_plugins import PluginManager
manager = PluginManager(
    plugin_paths=["./my_plugins/"],                          # auto-discover on init
    enabled_plugins=["email-validator", "phone-validator"],  # allowlist (None = all)
    lazy_load=True,                                          # load on-demand vs at startup
)
# Register manually
manager.registry.register_plugin(MyValidator, "validator", "my-validator")
# Load a plugin (returns None if not found or not enabled)
validator = manager.load_plugin("my-validator", "validator")
# Get a loaded plugin (loads lazily if not yet loaded)
plugin = manager.get_plugin("my-validator", "validator")
# Auto-select the best extractor for a file (uses supports())
extractor = manager.find_extractor("path/to/invoice.pdf")
# Auto-select the best mapper for a schema (uses supports_schema())
mapper = manager.find_mapper({"investor_name": "str", "email": "str"})
# List all registered plugins
all_plugins = manager.list_plugins() # {"validator": [...], "extractor": [...]}
validators = manager.list_plugins("validator") # {"validator": [...]}
# Metadata without loading the plugin
info = manager.get_plugin_info("email-validator", "validator")
# {"name": "email-validator", "version": "1.0.0", "author": "...", "priority": 100, ...}
# Unload a single plugin
manager.unload_plugin("email-validator", "validator")
# Shutdown all — calls shutdown() on each loaded plugin
manager.shutdown()
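Because `load_plugin` returns `None` when a plugin is missing or not enabled, call sites usually want an explicit guard. The helper below is a sketch: `require_plugin` is a hypothetical name, and `StubManager` is a stand-in used only so the example runs without the package installed. With the real library, a `PluginManager` instance would be passed instead.

```python
class StubManager:
    """Stand-in that mimics only load_plugin(name, category) -> plugin or None."""

    def __init__(self, plugins):
        self._plugins = plugins  # {(name, category): plugin object}

    def load_plugin(self, name, category):
        return self._plugins.get((name, category))


def require_plugin(manager, name, category):
    """Load a plugin or fail loudly instead of passing None downstream."""
    plugin = manager.load_plugin(name, category)
    if plugin is None:
        raise LookupError(f"plugin {name!r} ({category}) is not registered or not enabled")
    return plugin


manager = StubManager({("my-validator", "validator"): object()})
validator = require_plugin(manager, "my-validator", "validator")

try:
    require_plugin(manager, "missing", "validator")
except LookupError as exc:
    message = str(exc)
    print(message)
```

Raising at the lookup site keeps the failure close to its cause, rather than surfacing later as an `AttributeError` on `None`.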
For developers
Run from source
git clone https://github.com/Engineersmind/pdf-autofillr.git
cd pdf-autofillr/packages/plugins
python -m venv .venv
source .venv/bin/activate # Windows: .venv\Scripts\activate
pip install -e ".[dev]"
Project layout
packages/plugins/
├── src/
│   └── pdf_autofillr_plugins/
│       ├── __init__.py              # public API: PluginManager, PluginRegistry, @plugin
│       ├── manager.py               # PluginManager
│       ├── registry.py              # PluginRegistry
│       ├── decorators.py            # @plugin, @requires
│       ├── interfaces/
│       │   ├── __init__.py          # re-exports all interfaces
│       │   ├── base_plugin.py       # BasePlugin, PluginMetadata
│       │   ├── extractor_plugin.py
│       │   ├── mapper_plugin.py
│       │   ├── validator_plugin.py
│       │   ├── filler_plugin.py
│       │   ├── chunker_plugin.py
│       │   ├── embedder_plugin.py
│       │   └── transformer_plugin.py
│       └── builtin/
│           ├── validators/
│           │   └── email_validator.py
│           ├── extractors/
│           │   └── passthrough_extractor.py
│           └── mappers/
│               └── identity_mapper.py
├── tests/
│   ├── conftest.py
│   ├── unit/
│   │   ├── test_builtin_plugins.py
│   │   └── test_registry_and_manager.py
│   └── integration/
│       └── test_plugin_lifecycle.py
├── examples/
│   ├── custom_validator.py
│   └── custom_extractor.py
├── requirements/
│   ├── base.txt
│   └── dev.txt
├── .env.example
├── CHANGELOG.md
├── LICENSE
├── MANIFEST.in
├── README.md
├── USAGE.md
├── quickstart.md
└── pyproject.toml
Run tests
pip install -e ".[dev]"
# All tests (64 tests, ~0.5s, no external deps)
pytest tests/ -v
# Unit only
pytest tests/unit/ -v
# Integration only
pytest tests/integration/ -v
# With coverage
pytest tests/ --cov=src/pdf_autofillr_plugins --cov-report=term-missing
Publish a new version
# 1. Bump version in pyproject.toml and src/pdf_autofillr_plugins/__init__.py
# 2. Add entry to CHANGELOG.md
# 3. Build
pip install build
python -m build
# 4. Upload
pip install twine
twine upload dist/*
License
MIT — see LICENSE.