LLM-powered text linter

textaur

textaur takes an input document (plain text or PDF), extracts the text, prompts an LLM to lint, reformat, and clean it up, and saves the result.

It automagically turns this noisy OCR scan:

48

49
50
50a
ASL

51
AS2
52

REVISED 11/1/2017
-- 33-36.

David dives headfirst onto the small bed. He grabs the
pillow and hugs it tightly. A single tear runs down
his cheek.
DISSOLVE TO:
OMIT 43
OMIT 49
OMIT 50
OMIT SOA
EXT. HOTEL - MORNING ASL
David and Steve walk out of the hotel onto the sunny
street. David looks tired.
STEVE

It's not that big a city, David.

I'll bet there's an arcade at

every corner.
The boys look up the block then turn and look down
the other way.

STEVE
Let's try the next street.
cur To:

OMIT 51
OMIT AS2

into this properly formatted screenplay:

David dives headfirst onto the small bed. He grabs the pillow and hugs it tightly. A single tear runs down his cheek.

DISSOLVE TO:

EXT. HOTEL - MORNING

David and Steve walk out of the hotel onto the sunny street. David looks tired.

STEVE
It's not that big a city, David. I bet there's an arcade at every corner.

The boys look up the block then turn and look down the other way.

STEVE
Let's try the next street.

CUT TO:

The textaur pipeline was originally created to reformat messy text into proper screenplay format (Fountain), but the same process, with a much simpler linting prompt, works for general text. textaur also lets you supply your own linting prompt if you have a special use case.
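
The extract, lint, and save stages described above can be pictured with a minimal Python sketch. This is not textaur's actual code: `read_plain_text`, `collapse_blank_lines`, and `run_pipeline` are hypothetical names, and a trivial whitespace cleaner stands in for the LLM call so the example runs offline.

```python
import re
import tempfile
from pathlib import Path
from typing import Callable


def read_plain_text(path: Path) -> str:
    """Extraction stage for the plain-text case: just read the file."""
    return path.read_text(encoding="utf-8")


def collapse_blank_lines(text: str) -> str:
    """Stand-in for the LLM linting stage (hypothetical, runs offline).

    Strips trailing spaces and collapses runs of blank lines into one.
    """
    lines = [line.rstrip() for line in text.splitlines()]
    cleaned = re.sub(r"\n{3,}", "\n\n", "\n".join(lines))
    return cleaned.strip() + "\n"


def run_pipeline(src: Path, dst: Path, lint: Callable[[str], str]) -> str:
    """Extract text from src, lint it, save it to dst, and return the result."""
    extracted = read_plain_text(src)
    linted = lint(extracted)
    dst.write_text(linted, encoding="utf-8")
    return linted


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "scene.txt"
        src.write_text("STEVE   \n\n\n\nLet's try the next street.\n", encoding="utf-8")
        print(run_pipeline(src, Path(tmp) / "scene_linted.txt", collapse_blank_lines))
```

Passing the linter in as a function mirrors the idea of swapping prompts: the extraction and saving steps stay the same while the cleaning step changes.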

Installation

pip install textaur

OpenAI API Key

textaur requires an OpenAI API key to use the LLM to lint text. (Text extraction will work without it.) Set the key in your environment:

export OPENAI_API_KEY="very secret key"
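
Before starting a long linting run, it can help to confirm that the key is actually visible to Python. A minimal sketch, assuming only that the key is read from the `OPENAI_API_KEY` environment variable as described above; `has_openai_key` is a hypothetical helper, not part of textaur.

```python
import os


def has_openai_key(env=None) -> bool:
    """Return True if OPENAI_API_KEY is set to a non-blank value."""
    env = os.environ if env is None else env
    return bool(env.get("OPENAI_API_KEY", "").strip())


if __name__ == "__main__":
    if has_openai_key():
        print("OPENAI_API_KEY is set.")
    else:
        print("OPENAI_API_KEY is missing; linting will not work, extraction still should.")
```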

Optional Dependencies

If you want to use optical character recognition (OCR) to extract text from PDFs, the tesseract and poppler binaries must be installed on your system. They are not bundled with textaur, so you'll need to install them separately:

# macOS
brew install tesseract poppler

# Ubuntu/Debian
sudo apt install tesseract-ocr poppler-utils

# Windows
lol
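
To see whether the OCR binaries are already on your PATH, a quick check with Python's standard library is one option. This sketch assumes the executables are named `tesseract` and `pdftoppm` (one of the command-line tools shipped with poppler); `missing_ocr_binaries` is a hypothetical helper, and textaur itself may locate the tools differently.

```python
import shutil


def missing_ocr_binaries(names=("tesseract", "pdftoppm")):
    """Return the names from `names` that cannot be found on PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_ocr_binaries()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("tesseract and poppler tools found on PATH.")
```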

OCR is noticeably slower than direct text extraction, but it is usually necessary if you're starting with a scanned file. If you don't know whether you need OCR, extract the text without linting it by passing the --no-lint flag, then review that output manually to see if it's any good.

Usage

To extract and lint general text from a PDF or text file:

textaur ./path/to/my.pdf

This extracts the text and saves it to ./path/to/my_extracted_text.txt, then lints it and saves the result to ./path/to/my_linted_text.txt. The extracted text is saved as a separate file only if the input is a PDF, not if it's plain text.
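
The default file names can be pictured with a small Python helper. The naming rule here is inferred from the single example above (input stem plus `_extracted_text.txt` or `_linted_text.txt`, written next to the input), so treat it as an assumption; `default_output_paths` is a hypothetical function, not part of textaur.

```python
from pathlib import Path


def default_output_paths(input_path):
    """Guess the default output paths from the one documented example.

    Assumed rule: <stem>_extracted_text.txt and <stem>_linted_text.txt,
    placed in the same directory as the input. The extracted-text path is
    None for non-PDF input, since no separate file is saved in that case.
    """
    src = Path(input_path)
    linted = src.with_name(f"{src.stem}_linted_text.txt")
    if src.suffix.lower() == ".pdf":
        extracted = src.with_name(f"{src.stem}_extracted_text.txt")
    else:
        extracted = None
    return extracted, linted


if __name__ == "__main__":
    extracted, linted = default_output_paths("./path/to/my.pdf")
    print(extracted)
    print(linted)
```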

If it's a screenplay and you want to use OCR:

textaur ./path/to/my_scanned_screenplay.pdf --mode screenplay --ocr

Options

-m, --mode <text|t|screenplay|sp>: Type of input text (generic or screenplay). Defaults to generic if omitted.
-o, --output <file>: Save linted output to this file instead of the default
--extracted-text <file>: Save extracted unlinted text to this file instead of the default
--ocr: Use optical character recognition to extract text if it's a PDF (false by default; textaur will try to simply pull out the text if the input is a PDF)
--no-lint: Extract and save text only, without AI linting
--prompt <file>: File to use as custom AI linting prompt
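
For readers who think in code, the options above map naturally onto a Python `argparse` parser. The sketch below is a hypothetical reconstruction from this option list, not textaur's actual source; `build_parser`, the positional argument name, and the help strings are assumptions.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented options (hypothetical reconstruction)."""
    parser = argparse.ArgumentParser(prog="textaur")
    parser.add_argument("input", help="PDF or plain-text file to process")
    parser.add_argument("-m", "--mode", choices=["text", "t", "screenplay", "sp"],
                        default="text", help="type of input text")
    parser.add_argument("-o", "--output", metavar="FILE",
                        help="save linted output here instead of the default")
    parser.add_argument("--extracted-text", metavar="FILE",
                        help="save extracted, unlinted text here instead of the default")
    parser.add_argument("--ocr", action="store_true",
                        help="use OCR to extract text from a PDF")
    parser.add_argument("--no-lint", action="store_true",
                        help="extract and save text only, without linting")
    parser.add_argument("--prompt", metavar="FILE",
                        help="file containing a custom linting prompt")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["./path/to/my_scanned_screenplay.pdf", "--mode", "screenplay", "--ocr"]
    )
    print(args.mode, args.ocr, args.no_lint)
```

Both boolean flags default to off, which matches the documented behavior of trying direct extraction first and linting unless told otherwise.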

Additional Notes

Input files must exist and be readable.
Output and extracted text files will be created in the same directory as the input unless specified.

TODO

Support more LLMs
More prompts for other kinds of text input/formatting
Set of evals for existing and new prompts
Config to set:
- Default mode (text, screenplay, other text types)
- Preferred LLM once supported
- LLM API key

Release history

0.1.0 (this version): Aug 30, 2025
0.0.0: Aug 30, 2025

Download files

Source Distribution

textaur-0.1.0.tar.gz (25.8 kB), uploaded Aug 30, 2025, Source

Built Distribution

textaur-0.1.0-py3-none-any.whl (23.4 kB), uploaded Aug 30, 2025, Python 3

File details

Details for the file textaur-0.1.0.tar.gz.

File metadata

Download URL: textaur-0.1.0.tar.gz
Upload date: Aug 30, 2025
Size: 25.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for textaur-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`0c3f2b8395b1f03ff6a07d430286559fe9d2b2f8414cf466fe36e76bfc5feac6`
MD5	`6cd201d3dc02106bc32cb8acaeecaf00`
BLAKE2b-256	`4c49920d2f8a9c873f3de2cd20b00b6af679b27b9d8d2cd45830542501dc8247`

File details

Details for the file textaur-0.1.0-py3-none-any.whl.

File metadata

Download URL: textaur-0.1.0-py3-none-any.whl
Upload date: Aug 30, 2025
Size: 23.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.5

File hashes

Hashes for textaur-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0af4774bf705d2a8a392a78ac80bced59d82e57399a33a737b6dbcbb11674e40`
MD5	`51c0f31873a2a5fd988cba2886708e8c`
BLAKE2b-256	`81bc447ace15518c7a92ce4f952d12450496bcb4aa6a207b039f4f8d7ab27952`
