Analyze PDF and web forms and fill in the forms
formalyzer
Description:
Formalyzer scrapes the text from the PDF recommendation ("recc") letter and, for each URL in url_list, it will:
- launch a browser tab for that url
- fill in the form using what the LLM has gleaned from the recc letter
- attach the PDF via the form’s upload/attachment button
…and do no more.
The user will need to review the page and press the Submit button manually.
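The fill-and-attach step described above can be sketched roughly as follows. This is a minimal illustration, not formalyzer's actual code: `apply_actions`, `RecordingPage`, the selectors, and the field values are all hypothetical. The `page` argument is assumed to be any object exposing Playwright's `fill` and `set_input_files` methods, which Playwright's sync `Page` does. A stand-in page is used here so the sketch runs without a browser.

```python
from typing import Protocol


class PageLike(Protocol):
    """The two Playwright Page methods this sketch relies on."""

    def fill(self, selector: str, value: str) -> None: ...

    def set_input_files(self, selector: str, files: str) -> None: ...


def apply_actions(page: PageLike, values: dict[str, str],
                  upload_selector: str, pdf_path: str) -> int:
    """Fill each field, attach the PDF, and stop short of submitting.

    `values` maps (hypothetical) CSS selectors to the text to enter.
    Returns the number of fields filled.
    """
    filled = 0
    for selector, value in values.items():
        page.fill(selector, value)
        filled += 1
    page.set_input_files(upload_selector, pdf_path)
    # Deliberately no click on a submit button: the user reviews and submits.
    return filled


class RecordingPage:
    """Stand-in for a real page; it only records the calls it receives."""

    def __init__(self) -> None:
        self.calls: list[tuple[str, str, str]] = []

    def fill(self, selector: str, value: str) -> None:
        self.calls.append(("fill", selector, value))

    def set_input_files(self, selector: str, files: str) -> None:
        self.calls.append(("upload", selector, files))


page = RecordingPage()
count = apply_actions(
    page,
    {"#recommender_name": "Jane Doe", "#recommender_email": "jane@example.edu"},
    upload_selector="input[type=file]",
    pdf_path="recc_letter.pdf",
)
print(count, page.calls[-1])
```

Keeping the submit click out of the helper entirely mirrors the "and do no more" rule: the only way the form gets submitted is by the user.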
Requirements:
- Either ollama installed locally or the ANTHROPIC_API_KEY environment variable set
- beautifulsoup4, playwright, claudette, lisette, pypdf, fastcore
Usage
On macOS, start the Chrome browser with remote debugging enabled on port 9222 by running this command in the terminal:
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug
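Before going further, it can help to confirm that Chrome is actually listening on the debugging port. The sketch below is not part of formalyzer; `chrome_debug_info` is a hypothetical helper that queries the `/json/version` endpoint Chrome serves when remote debugging is enabled, using only the standard library.

```python
import http.client
import json
import urllib.error
import urllib.request
from typing import Optional


def chrome_debug_info(port: int = 9222, timeout: float = 2.0) -> Optional[dict]:
    """Return Chrome's /json/version payload, or None if nothing usable answers."""
    url = f"http://127.0.0.1:{port}/json/version"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return json.load(resp)
    except (urllib.error.URLError, OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, or a non-JSON reply all mean "not ready".
        return None


info = chrome_debug_info()
if info is None:
    print("No Chrome debugging endpoint on port 9222; start Chrome first.")
else:
    print("Connected to", info.get("Browser"))
```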
Then you can run this command:
formalyzer --debug <recc_info.txt> <recc_letter.pdf> <url_list.txt>
where recc_info.txt contains information about the recommender (name, title, address, phone number, and email) and url_list.txt is a file containing one URL per line.
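As an illustration of the one-URL-per-line format, a file like url_list.txt could be read as sketched below. `read_url_list` is a hypothetical helper, not formalyzer's own loader, and skipping blank lines is an assumption made here for convenience.

```python
import tempfile
from pathlib import Path


def read_url_list(path: str) -> list[str]:
    """Return the URLs in `path`, one per line, in file order."""
    urls = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line:  # assumption: blank lines are ignored
            urls.append(line)
    return urls


# Demo with a throwaway file; the URLs are placeholders.
with tempfile.TemporaryDirectory() as tmp:
    url_file = Path(tmp) / "url_list.txt"
    url_file.write_text(
        "http://localhost:8000/form_a.html\n\nhttp://localhost:8000/form_b.html\n",
        encoding="utf-8",
    )
    urls = read_url_list(str(url_file))

print(urls)
```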
Installation
Install latest from the GitHub repository:
$ pip install git+https://github.com/drscotthawley/formalyzer.git
or from conda
$ conda install -c drscotthawley formalyzer
or from pypi
$ pip install formalyzer
After installing, users need to run playwright install chromium to
download the browser binaries.
Demo
This demo uses the data in example/. On macOS, from the main formalyzer package
directory:
- Start up Chrome:
/Applications/Google\ Chrome.app/Contents/MacOS/Google\ Chrome --remote-debugging-port=9222 --user-data-dir=/tmp/chrome-debug
- Launch a local web server:
python -m http.server 8000 --directory example/
- Set your ANTHROPIC_API_KEY shell environment variable.
- Run the script:
formalyzer --debug example/recc_info.txt example/sample_letter.pdf example/sample_urls.txt
Local LLM Execution
For FERPA compliance, running a local model is preferable so that
student data is not broadcast elsewhere. I recommend using ollama and
starting with something medium-small like qwen2.5:14b (9 GB). Start up
ollama:
ollama serve &
ollama pull qwen2.5:14b
Then you can use the --model CLI flag, e.g.
formalyzer --debug --model 'ollama/qwen2.5:14b' example/recc_info.txt example/sample_letter.pdf example/sample_urls.txt
The quality of the form-filling will vary depending on the quality and
size of the model you get. Smaller models like mistral (4 GB) may
hallucinate many of the form field IDs, resulting in a mostly-blank form
in the end. For a huge (41 GB) model, try ollama/qwen2:72b.
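One way to see how much a model is hallucinating is to compare the field IDs it proposes against the IDs that actually exist in the form's HTML. The sketch below is an illustration under assumptions, not how formalyzer itself works: `FieldIdCollector` and `split_known_fields` are hypothetical names, the HTML and proposed values are made up, and the standard-library `html.parser` is used in place of beautifulsoup4 to keep the example dependency-free.

```python
from html.parser import HTMLParser


class FieldIdCollector(HTMLParser):
    """Collect the id attribute of every input, select, and textarea."""

    FIELD_TAGS = {"input", "select", "textarea"}

    def __init__(self) -> None:
        super().__init__()
        self.ids: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag in self.FIELD_TAGS:
            field_id = dict(attrs).get("id")
            if field_id:
                self.ids.add(field_id)


def split_known_fields(form_html: str, proposed: dict[str, str]):
    """Split an LLM's proposed {field_id: value} map into real and unknown IDs."""
    collector = FieldIdCollector()
    collector.feed(form_html)
    known = {k: v for k, v in proposed.items() if k in collector.ids}
    unknown = sorted(set(proposed) - collector.ids)
    return known, unknown


form_html = (
    '<form><input id="first_name"><input id="email">'
    '<textarea id="comments"></textarea></form>'
)
proposed = {
    "first_name": "Jane",
    "email": "jane@example.edu",
    "applicant_rank": "Top 5%",  # not present in the form above
}
known, unknown = split_known_fields(form_html, proposed)
print(known)
print(unknown)
```

A long `unknown` list relative to `known` is a sign that a larger model may be needed.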
Developer Guide
Install formalyzer in Development mode
# make sure formalyzer package is installed in development mode
$ pip install -e .
# make changes under nbs/ directory
# ...
# compile to have changes apply to formalyzer
$ nbdev_prepare
Documentation
Documentation can be found hosted on this GitHub repository’s pages. Additionally you can find package manager specific guidelines on conda and pypi respectively.
TODO:
- Test with a less-than-superlative recc letter – to make sure it’s not just always selecting the top rating(s).
- Enable switching from the Anthropic API to a local LLM and/or the Copilot API (if possible)
File details
Details for the file formalyzer-0.0.1.tar.gz.
File metadata
- Download URL: formalyzer-0.0.1.tar.gz
- Upload date:
- Size: 13.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bc1292bb1337a51709a09ce4bb040d3482242253bcc5d6d22b02de85498ce06c |
| MD5 | d7128ebb7db60a5fcef534c82010877d |
| BLAKE2b-256 | 5c7c4f920dabb798d18d09aa399649793e611eb7e9dce10ce287cb66018a2338 |
File details
Details for the file formalyzer-0.0.1-py3-none-any.whl.
File metadata
- Download URL: formalyzer-0.0.1-py3-none-any.whl
- Upload date:
- Size: 12.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 49d5c3eacfbc62e12a10eac0e81bd7179c91e8168b9706b5c25e7ae3cefcd6a2 |
| MD5 | 4c6b3b4841c2c1fbe0a280c6c126f3fe |
| BLAKE2b-256 | b6d6d59a29034af5f9b1d2e1da0df33e559035ad7522239cc3c91ea4573dc988 |