Automatically convert a PDF into a fillable form

CommonForms

🪄 Automatically convert a PDF into a fillable form.

💻 Hosted Models (detect.semanticdocs.org) | 📄 CommonForms Paper | 🤗 Dataset | 🤗 FFDNet-L | 🤗 FFDNet-S

Pipeline

This repo contains three things:

  1. the pip-installable commonforms package, which has a CLI and API for converting PDFs into fillable forms
  2. the FFDNet-S and FFDNet-L models from the paper CommonForms: A Large, Diverse Dataset for Form Field Detection
  3. the preprocessing code for the CommonForms dataset, which is hosted on HuggingFace: https://huggingface.co/datasets/jbarrow/CommonForms

Installation

CommonForms can be installed with either uv or pip; use whichever package manager you prefer:

uv pip install commonforms

Once it's installed, you should be able to run the CLI command on almost any PDF.

CommonForms CLI

The simplest usage will run inference on your CPU using the default suggested settings:

commonforms <input.pdf> <output.pdf>

Command Line Arguments

| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| input | Path | Required | Path to the input PDF file |
| output | Path | Required | Path to save the output PDF file |
| --model | str | FFDNet-L | Model name (FFDNet-L/FFDNet-S) or path to custom .pt file |
| --keep-existing-fields | flag | False | Keep existing form fields in the PDF |
| --use-signature-fields | flag | False | Use signature fields instead of text fields for detected signatures |
| --device | str | cpu | Device for inference (e.g., cpu, cuda, 0) |
| --image-size | int | 1600 | Image size for inference |
| --confidence | float | 0.3 | Confidence threshold for detection |
| --fast | flag | False | If running on a CPU, you can trade off accuracy for speed and run in about half the time |
| --multiline | flag | False | If you want the detected textboxes to allow multiline inputs |
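
If you want to drive the CLI from a script (for example, to convert a whole folder of PDFs), the arguments above can be assembled into a command line. The following is a minimal sketch, not part of the commonforms package: the helper names build_command and convert_directory are hypothetical, and it assumes the CLI accepts options after the two positional paths and that passing a default value explicitly behaves the same as omitting it.

```python
import subprocess
from pathlib import Path


def build_command(input_pdf, output_pdf, *, model="FFDNet-L", device="cpu",
                  image_size=1600, confidence=0.3, fast=False, multiline=False,
                  keep_existing_fields=False, use_signature_fields=False):
    """Assemble the argv list for one `commonforms` invocation.

    Hypothetical helper: flag names and defaults are taken from the
    argument table above; nothing here is provided by commonforms itself.
    """
    cmd = [
        "commonforms", str(input_pdf), str(output_pdf),
        "--model", str(model),
        "--device", str(device),
        "--image-size", str(image_size),
        "--confidence", str(confidence),
    ]
    # Boolean options are bare flags: present means enabled.
    flags = {
        "--fast": fast,
        "--multiline": multiline,
        "--keep-existing-fields": keep_existing_fields,
        "--use-signature-fields": use_signature_fields,
    }
    cmd += [flag for flag, enabled in flags.items() if enabled]
    return cmd


def convert_directory(src_dir, dst_dir, **options):
    """Run the CLI once per PDF in src_dir, writing results to dst_dir."""
    dst = Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    for pdf in sorted(Path(src_dir).glob("*.pdf")):
        # check=True raises CalledProcessError if the CLI exits non-zero.
        subprocess.run(build_command(pdf, dst / pdf.name, **options), check=True)


if __name__ == "__main__":
    # Example: smaller model plus --fast for a quicker CPU run.
    print(" ".join(build_command("input.pdf", "output.pdf",
                                 model="FFDNet-S", fast=True)))
```

Building the command as a list (rather than a single shell string) avoids quoting problems with file names that contain spaces.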

CommonForms API

In addition to the CLI, you can call the prepare_form function from Python:

from commonforms import prepare_form

prepare_form(
    "path/to/input.pdf",
    "path/to/output.pdf"
)

All of the CLI arguments listed above can be passed as keyword arguments to the prepare_form function.
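
As an illustration of passing options through the API, here is a small sketch. It assumes the keyword names mirror the CLI flags with the leading dashes dropped and the remaining dashes turned into underscores (so --keep-existing-fields would become keep_existing_fields); this README does not spell out the exact keyword names, so check the function signature before relying on it. The helpers flag_to_keyword and convert are hypothetical and not part of commonforms.

```python
def flag_to_keyword(flag: str) -> str:
    """Map a CLI flag such as '--image-size' to a Python keyword name.

    Assumption: prepare_form uses the snake_case form of each CLI flag.
    """
    return flag.lstrip("-").replace("-", "_")


def convert(input_pdf, output_pdf, options=None):
    """Call prepare_form with options written in CLI-flag style."""
    # Imported lazily so this module loads even without commonforms installed.
    from commonforms import prepare_form

    kwargs = {flag_to_keyword(flag): value
              for flag, value in (options or {}).items()}
    return prepare_form(input_pdf, output_pdf, **kwargs)


if __name__ == "__main__":
    convert(
        "path/to/input.pdf",
        "path/to/output.pdf",
        {"--model": "FFDNet-S", "--confidence": 0.5, "--multiline": True},
    )
```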

Dataset Prep

🚧 Code for dataset prep exists in the dataset folder.

Citation

If you use the tool, models, or code in an academic paper, please cite the CommonForms paper:

@misc{barrow2025commonforms,
  title        = {CommonForms: A Large, Diverse Dataset for Form Field Detection},
  author       = {Barrow, Joe},
  year         = {2025},
  eprint       = {2509.16506},
  archivePrefix= {arXiv},
  primaryClass = {cs.CV},
  doi          = {10.48550/arXiv.2509.16506},
  url          = {https://arxiv.org/abs/2509.16506}
}

If you use it in a non-academic setting, please reach out to the author (joseph.d.barrow [at] gmail.com)! I love to hear when people are using my work!

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

commonforms-0.2.0.tar.gz (9.8 kB)

Uploaded Source

Built Distribution

commonforms-0.2.0-py3-none-any.whl (9.7 kB)

Uploaded Python 3

File details

Details for the file commonforms-0.2.0.tar.gz.

File metadata

  • Download URL: commonforms-0.2.0.tar.gz
  • Size: 9.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.16

File hashes

Hashes for commonforms-0.2.0.tar.gz
Algorithm Hash digest
SHA256 c75c3679aa0d40d72cab0f57a513eb1234824a0a8fd96ef1a012454136878ff1
MD5 439964eb091f610fcd11db909d1bd6c4
BLAKE2b-256 ebb58f67308f50fa9ab7f9871998d05103e9d9f60d1746ccf1213f4163afa0fe
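
To check a downloaded file against the digests listed above, you can compute its SHA256 locally and compare. This is a minimal sketch using only the Python standard library; the function names sha256_of_file and verify are hypothetical, and the expected value is the SHA256 listed for commonforms-0.2.0.tar.gz.

```python
import hashlib

# SHA256 digest listed above for commonforms-0.2.0.tar.gz.
EXPECTED_SHA256 = "c75c3679aa0d40d72cab0f57a513eb1234824a0a8fd96ef1a012454136878ff1"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF).
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 matches the expected hex digest."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    ok = verify("commonforms-0.2.0.tar.gz")
    print("hash matches" if ok else "hash mismatch")
```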

File details

Details for the file commonforms-0.2.0-py3-none-any.whl.

File hashes

Hashes for commonforms-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 2397a2d6954483b9b96f057a652da24e953a527463df3c8fa2794b74fc569f19
MD5 8d9285877b12b0b0500a9dfe80d1518d
BLAKE2b-256 5e89999e784ceb951c97663bd9e25e62a11d473f4839c3a83aadd998c92937d8
