Analyze PDF and web forms and fill in the forms

Project description

formalyzer

Description:

Formalyzer scrapes the text from the PDF recommendation letter and, for each URL in url_list, it will:

  • launch a browser tab for that url
  • fill in the form using what the LLM has gleaned from the recc letter
  • attach the PDF via the form’s upload/attachment button

…and do no more.

The user will need to review the page and press the Submit button manually.
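
The per-URL steps above might look roughly like the following with Playwright's sync API. This is a minimal sketch under stated assumptions, not formalyzer's actual code: `fill_form`, `open_and_fill`, and the `field_values` mapping (CSS selector to value, standing in for whatever the LLM extracted from the letter) are hypothetical names, and the file-input selector is a guess about the target form.

```python
def fill_form(page, field_values, pdf_path, upload_selector='input[type="file"]'):
    """Fill form fields and attach the PDF on an already-loaded page.

    `page` is a Playwright Page (or any object with the same `fill` and
    `set_input_files` methods). Nothing here clicks Submit.
    """
    for selector, value in field_values.items():
        page.fill(selector, value)                    # enter one field value
    page.set_input_files(upload_selector, pdf_path)   # attach the letter


def open_and_fill(urls, field_values, pdf_path, cdp_url="http://localhost:9222"):
    """Open one tab per URL in an already-running Chrome and fill each form."""
    # Imported lazily so fill_form can be used without Playwright installed.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.connect_over_cdp(cdp_url)  # attach to running Chrome
        context = browser.contexts[0]                   # its existing profile
        for url in urls:
            page = context.new_page()                   # one tab per URL
            page.goto(url)
            fill_form(page, field_values, pdf_path)
        # No page.close() and no submit click: review is left to the user.
```

Keeping `fill_form` free of any submit action mirrors the "and do no more" rule: the script only prepares the page.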

Requirements:

  • Either ollama installed locally or ANTHROPIC_API_KEY environment variable set
  • beautifulsoup4, playwright, claudette, lisette, pypdf, fastcore
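
A quick way to check the first requirement before running anything is sketched below. The function name `llm_backend_available` and the order of preference (ollama first) are assumptions made for illustration; formalyzer itself may decide differently.

```python
import os
import shutil


def llm_backend_available(env=None, which=shutil.which):
    """Return 'ollama', 'anthropic', or None, depending on what is set up."""
    env = os.environ if env is None else env
    if which("ollama"):                  # ollama executable found on PATH
        return "ollama"
    if env.get("ANTHROPIC_API_KEY"):     # API key is set and non-empty
        return "anthropic"
    return None
```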

Usage

On macOS, start Chrome with remote debugging enabled on port 9222 by running this command in the terminal:

/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug
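
Before going further, it may be worth confirming that Chrome is actually listening on the debugging port. The sketch below queries the `/json/version` endpoint that Chrome's remote debugging interface serves over HTTP; the helper name `chrome_debug_info` is hypothetical and not part of formalyzer.

```python
import json
import urllib.error
import urllib.request


def chrome_debug_info(port=9222, host="localhost", timeout=2.0):
    """Return Chrome's version info as a dict, or None if the port is unreachable."""
    url = f"http://{host}:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, or a non-JSON reply.
        return None
```

A `None` result suggests that Chrome was not started with `--remote-debugging-port=9222` or that another process owns the port.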

Then you can run this command:

formalyzer --debug <recc_info.txt> <recc_letter.pdf> <url_list.txt>

where recc_info.txt contains information about the recommender (name, title, address, phone number, and email) and url_list.txt contains one URL per line.
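
For reference, reading a one-URL-per-line file could be done as in this sketch. `read_url_list` is a hypothetical helper, and skipping blank lines is an assumption about how such a file should be treated, not documented formalyzer behavior.

```python
from pathlib import Path


def read_url_list(path):
    """Return the URLs in `path`, one per line, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]
```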

Installation

Install latest from the GitHub repository:

$ pip install git+https://github.com/drscotthawley/formalyzer.git

or from pypi:

$ pip install formalyzer

After installing, users need to run playwright install chromium to download the browser binaries.

Demo

On macOS, run these commands in Terminal:

  1. /Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug &
  2. cd example
  3. python -m http.server 8000 &
  4. export ANTHROPIC_API_KEY="__your_API_key_goes_here__"
  5. formalyzer --debug recc_info.txt sample_letter.pdf sample_urls.txt

Local LLM Execution

For FERPA compliance, running a local model is preferable so that student data is not sent to a third-party service. I recommend using ollama and starting with a medium-small model such as qwen2.5:14b (9 GB). Start ollama and pull the model:

ollama serve & 
ollama pull qwen2.5:14b 

Then you can use the --model CLI flag, e.g. 

formalyzer --debug --model 'ollama/qwen2.5:14b' recc_info.txt sample_letter.pdf sample_urls.txt

The quality of the form-filling will vary with the quality and size of the model you choose. Smaller models like mistral (4 GB) may hallucinate many of the form field IDs, resulting in a mostly blank form. For a huge (41 GB) model, try ollama/qwen2:72b.
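
One way to see how much a model is hallucinating is to compare the field IDs it proposes against the IDs that actually exist in the page. The sketch below is an illustration only: it uses the standard library's `html.parser` rather than beautifulsoup4 to stay dependency-free, and `split_known_fields` and the `proposed` mapping (field ID to value) are hypothetical names.

```python
from html.parser import HTMLParser


class _FieldIdCollector(HTMLParser):
    """Collect the id attribute of every input, select, and textarea tag."""

    FORM_TAGS = {"input", "select", "textarea"}

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        if tag in self.FORM_TAGS:
            field_id = dict(attrs).get("id")
            if field_id:
                self.ids.add(field_id)


def split_known_fields(html, proposed):
    """Split proposed {field_id: value} pairs into real and unknown IDs.

    Returns (known, unknown): `known` keeps only IDs present in the form,
    `unknown` is a sorted list of IDs the page does not contain.
    """
    collector = _FieldIdCollector()
    collector.feed(html)
    known = {k: v for k, v in proposed.items() if k in collector.ids}
    unknown = sorted(set(proposed) - collector.ids)
    return known, unknown
```

A long `unknown` list relative to `known` is a hint that a larger model may be needed for that form.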

Developer Guide

Install formalyzer in Development mode

# make sure formalyzer package is installed in development mode
$ pip install -e .

# make changes under nbs/ directory
# ...

# compile to have changes apply to formalyzer
$ nbdev_prepare

Documentation

Documentation is hosted on this GitHub repository's Pages site. Package-manager-specific guidelines are available on conda and PyPI.

TODO:

  • Test with a less-than-superlative recommendation letter, to make sure it's not just always selecting the top rating(s).

Download files

Source Distribution

formalyzer-0.0.2.tar.gz (13.2 kB view details)

Uploaded Source

Built Distribution

formalyzer-0.0.2-py3-none-any.whl (12.0 kB view details)

Uploaded Python 3

File details

Details for the file formalyzer-0.0.2.tar.gz.

File metadata

  • Download URL: formalyzer-0.0.2.tar.gz
  • Upload date:
  • Size: 13.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for formalyzer-0.0.2.tar.gz
Algorithm Hash digest
SHA256 5c8c6b3efec4d46c33aa7e487e221273689839871ce0e5754c097502c1f16326
MD5 6bee4f6f9828d604b4fc7a4f6fed5b49
BLAKE2b-256 83a98ec6918966f16eb932bae1d64a4c297666fabac708557a395d0dcca9c15f

File details

Details for the file formalyzer-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: formalyzer-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 12.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.2

File hashes

Hashes for formalyzer-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 00ddedee39ddc07c01e9b8083bf5aa872726d44c787ff1f7155c88238180a856
MD5 67ac67afe3c63fad1ab53a43ef1c706b
BLAKE2b-256 fc3d197b72adaaad8d7bdcc9628e48fd091df03307225be4c66509b05f263416
