Analyze PDF and web forms and fill in the forms

formalyzer

Motivation

I am happy to write a recommendation letter “by hand” for a student. But each graduate school then has its own lengthy form, which foists the school's data-entry job onto me. That is tedious, especially with many schools and several students, so I’ve wanted to automate the form filling for quite a while.

Description

Formalyzer scrapes the text from the PDF recommendation ("recc") letter and then, for each URL in url_list, it will:

  • launch a browser tab for that url
  • fill in the form using what the LLM has gleaned from the recc letter
  • attach the PDF via the form’s upload/attachment button

…and do no more.

The user will need to review the page and press the Submit button manually.
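The steps above can be sketched roughly as follows. This is a minimal illustration, not formalyzer's actual code: the function names, the `field_values` mapping (CSS selector to value, standing in for what the LLM produces), and the default `input[type=file]` upload selector are all assumptions. It uses pypdf's `PdfReader` and Playwright's sync API, both imported lazily so the pure form-filling helper can be exercised on its own.

```python
from typing import Mapping


def read_letter_text(pdf_path: str) -> str:
    """Scrape plain text from every page of the letter (uses pypdf)."""
    from pypdf import PdfReader  # lazy import: only needed when reading a PDF

    reader = PdfReader(pdf_path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def fill_form(page, field_values: Mapping[str, str], pdf_path: str,
              upload_selector: str = "input[type=file]") -> None:
    """Fill the fields and attach the PDF. Deliberately never clicks Submit.

    `page` is anything with Playwright's Page.fill / Page.set_input_files.
    `field_values` and `upload_selector` are hypothetical inputs.
    """
    for selector, value in field_values.items():
        page.fill(selector, value)
    page.set_input_files(upload_selector, pdf_path)


def open_and_fill(url: str, field_values: Mapping[str, str], pdf_path: str,
                  cdp_url: str = "http://localhost:9222") -> None:
    """Open `url` in a new tab of an already-running Chrome and fill its form."""
    from playwright.sync_api import sync_playwright  # lazy import

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(cdp_url)
        context = browser.contexts[0] if browser.contexts else browser.new_context()
        page = context.new_page()
        page.goto(url)
        fill_form(page, field_values, pdf_path)
        # No submit step here: reviewing and submitting is left to the user.
```

Keeping `fill_form` separate from the browser setup means it can be tried against a stub object before pointing it at a real form.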

Requirements:

  • Either ollama installed locally or ANTHROPIC_API_KEY environment variable set
  • beautifulsoup4, playwright, claudette, lisette, pypdf, fastcore
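A quick pre-flight check for the first requirement might look like the sketch below. The function name and the `"ollama"` / `"anthropic"` labels are made up for illustration; it only tests whether an `ollama` executable is on the PATH and whether `ANTHROPIC_API_KEY` is non-empty, which may differ from how formalyzer itself decides.

```python
import os
import shutil


def available_backends(env=None, which=shutil.which):
    """Return which LLM backends look usable, per the requirements above.

    `env` and `which` are injectable so the check can be tried without
    touching the real environment.
    """
    env = os.environ if env is None else env
    backends = []
    if which("ollama"):                 # ollama binary found on PATH
        backends.append("ollama")
    if env.get("ANTHROPIC_API_KEY"):    # key is set and non-empty
        backends.append("anthropic")
    return backends


if __name__ == "__main__":
    found = available_backends()
    print("Usable backends:", ", ".join(found) if found else "none")
```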

Usage

On macOS, start Chrome with remote debugging enabled on port 9222 by running this command in the terminal:

/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug
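To confirm that Chrome is actually listening before going further, one option is to query the DevTools HTTP endpoint `/json/version`, which Chrome serves when started with `--remote-debugging-port`. The helper names below are hypothetical and this check is not part of formalyzer; it uses only the standard library.

```python
import http.client
import json
import urllib.request


def cdp_version_url(port: int = 9222, host: str = "localhost") -> str:
    """Build the URL of Chrome's DevTools version endpoint."""
    return f"http://{host}:{port}/json/version"


def chrome_debugger_info(port: int = 9222, host: str = "localhost",
                         timeout: float = 2.0):
    """Return the parsed version info, or None if nothing usable answers."""
    try:
        with urllib.request.urlopen(cdp_version_url(port, host), timeout=timeout) as resp:
            return json.load(resp)
    except (OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error, or a non-JSON reply.
        return None


if __name__ == "__main__":
    info = chrome_debugger_info()
    print("Chrome debugger reachable" if info else "Nothing listening on port 9222")
```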

Then you can run this command:

formalyzer --debug <recc_info.txt> <recc_letter.pdf> <url_list.txt>

where recc_info.txt contains information about the recommender (name, title, address, phone number, and email) and url_list.txt is a file containing one URL per line.
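For reference, a one-URL-per-line file can be read with a few lines of standard-library Python. This is only a sketch: skipping blank lines and rejecting anything that is not an http(s) URL are assumptions here, not documented formalyzer behavior, and `read_url_list` is a hypothetical name.

```python
from urllib.parse import urlparse


def read_url_list(text: str) -> list[str]:
    """Parse one URL per line, ignoring blank lines (an assumption)."""
    urls = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue
        parsed = urlparse(line)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"line {lineno}: not an http(s) URL: {line!r}")
        urls.append(line)
    return urls


if __name__ == "__main__":
    sample = "https://example.edu/apply/form1\n\nhttp://localhost:8000/form.html\n"
    for url in read_url_list(sample):
        print(url)
```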

Installation

Install latest from the GitHub repository:

$ pip install git+https://github.com/drscotthawley/formalyzer.git

or from pypi:

$ pip install formalyzer

After installing, run playwright install chromium to download the browser binaries.

Demo

On macOS, run these commands in Terminal:

  1. /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug &
  2. cd example
  3. python -m http.server 8000 &
  4. export ANTHROPIC_API_KEY="__your_API_key_goes_here__"
  5. formalyzer --debug recc_info.txt sample_letter.pdf sample_urls.txt

Local LLM Execution

For FERPA compliance, running a local model is preferable so that student data is not broadcast elsewhere. I recommend using ollama and starting with something medium-small like qwen2.5:14b (9 GB). Start up ollama:

ollama serve & 
ollama pull qwen2.5:14b 
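To verify that the pull finished and the server is up, you can ask ollama's REST API for its local models via `GET /api/tags` on the default port 11434. The helper names below are illustrative only and not part of formalyzer.

```python
import json
import urllib.request


def pulled_models(payload: dict) -> list[str]:
    """Extract model names from an /api/tags response body."""
    return [m["name"] for m in payload.get("models", [])]


def is_pulled(model: str, base_url: str = "http://localhost:11434",
              timeout: float = 2.0) -> bool:
    """Ask a running ollama server whether `model` is available locally."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
        return model in pulled_models(json.load(resp))


if __name__ == "__main__":
    try:
        print("qwen2.5:14b pulled:", is_pulled("qwen2.5:14b"))
    except OSError as exc:
        print("Could not reach ollama:", exc)
```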

Then you can use the --model CLI flag, e.g. 

formalyzer --debug --model 'ollama/qwen2.5:14b' recc_info.txt sample_letter.pdf sample_urls.txt

The quality of the form filling will vary with the quality and size of the model you use. Smaller models like mistral (4 GB) may hallucinate many of the form field IDs, resulting in a mostly blank form in the end. For a huge (41 GB) model, try ollama/qwen2:72b.
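One way to see how much a given model is hallucinating is to compare the field IDs it proposes against the IDs that actually exist in the page's HTML. The sketch below is an assumption about how such a check could work, not a description of formalyzer's internals; it uses the standard library's `html.parser` rather than beautifulsoup4 so that it runs without extra installs, and the names are hypothetical.

```python
from html.parser import HTMLParser


class FormFieldCollector(HTMLParser):
    """Collect the id attribute of every input, select, and textarea tag."""

    FIELD_TAGS = {"input", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        if tag in self.FIELD_TAGS:
            field_id = dict(attrs).get("id")
            if field_id:
                self.ids.add(field_id)


def split_known_fields(html: str, proposed: dict):
    """Split LLM-proposed {field_id: value} into real fields and invented IDs."""
    collector = FormFieldCollector()
    collector.feed(html)
    known = {k: v for k, v in proposed.items() if k in collector.ids}
    unknown = sorted(set(proposed) - collector.ids)
    return known, unknown


if __name__ == "__main__":
    page = '<form><input id="first_name"><textarea id="comments"></textarea></form>'
    guess = {"first_name": "Ada", "recommender_fax": "n/a"}
    known, unknown = split_known_fields(page, guess)
    print("fillable:", known)
    print("hallucinated IDs:", unknown)
```

A high share of unknown IDs is a hint that a larger model is worth trying.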

Developer Guide

Install formalyzer in Development mode

# make sure formalyzer package is installed in development mode
$ pip install -e .

# make changes under nbs/ directory
# ...

# compile to have changes apply to formalyzer
$ nbdev_prepare

Documentation

Documentation is hosted on this GitHub repository’s pages. Package-manager-specific guidelines are available on conda and pypi.

Limitations

Sometimes the LLM will miss certain fields – that’s just the nature of the game – so you’ll still need to fill those in by hand. But it gets most of them!
