Read messy construction sub-quotes, bid packages & spec PDFs into clean structured data — and catch the scope gaps/exclusions vendors bury. Every value cited to its page.

📄 BidReader

Read messy construction sub-quotes, bid packages & spec PDFs into clean structured data — and catch the scope gaps and exclusions vendors bury in the fine print.

Every line item carries its page, the exact source text it came from, and an arithmetic check (qty × unit_price == amount) — verification on top of extraction, not just an LLM guess.
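
The arithmetic check can be sketched in a few lines. This is a minimal illustration, not BidReader's actual implementation: the helper name `check_line_item`, the one-cent tolerance, and the sample values are assumptions; only the rule `qty × unit_price == amount` comes from the description above.

```python
from decimal import Decimal

def check_line_item(item, tolerance=Decimal("0.01")):
    """Return True when qty * unit_price matches amount within tolerance.

    Hypothetical helper: the field names follow the README's wording,
    the tolerance is an assumed rounding allowance.
    """
    qty = Decimal(str(item["qty"]))
    unit_price = Decimal(str(item["unit_price"]))
    amount = Decimal(str(item["amount"]))
    return abs(qty * unit_price - amount) <= tolerance

# Invented sample rows: the second has two digits transposed in the amount.
ok_item = {"qty": 120, "unit_price": "4.25", "amount": "510.00"}
bad_item = {"qty": 120, "unit_price": "4.25", "amount": "501.00"}
```

`Decimal` is used instead of `float` so that currency values compare without binary rounding noise.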

"Manually typing numbers from a PDF into Excel because the formatting is a crime scene… hunting for the one line where a sub quietly excluded 'trash removal' in size-8 font." — r/Construction, 498 upvotes (source)

Most construction-AI effort chases autonomous takeoff. BidReader does something narrower and more concrete: it reads subcontractor quotes / estimates (PDF) and helps you level competing bids — surfacing the scope a sub quietly excluded — so you catch it during leveling, not after award.

It's an open-source bid-leveling assistant, not an autopilot: it proposes, cites its source, and you verify. MIT, pip install, runs on free LLMs (or fully local via Ollama), and callable from an AI agent over MCP.

Scope, honestly: built and tested on estimate-class docs (sub quotes, GC estimates, schedules of values). It's an assistant for a human estimator — line-item extraction can be incomplete (the tool flags when it is), and inferred scope-gaps are prompts to check, not contractual findings. Not built for multi-bidder DOT unit-price bid-tabs. See demo/REAL_EVIDENCE.md for honest real-document results.

Quickstart (copy-paste, ~30 seconds)

pip install bidreader

# Use any one — a FREE key works (see docs/FREE_MODELS.md):
export GEMINI_API_KEY=...        # free at aistudio.google.com
# or  export OPENROUTER_API_KEY=...   (has :free models)
# or  export REQUESTY_API_KEY=...

bidreader your_sub_quote.pdf

Or from Python:

from bidreader import read

doc = read("sub_quote.pdf")
doc.line_items     # [{section, description, qty, unit, amount, page}, ...]
doc.exclusions     # [{item, quote, page, risk}, ...]   <- the buried stuff
doc.scope_gaps     # trade-standard scope NOT in the doc — confirm before bidding
doc.to_json()
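
Once a document is read, the structured fields can be post-processed with ordinary Python. The sketch below works on plain dicts shaped like the `line_items` comment above. The sample rows are invented, the helper name `subtotal_by_section` is hypothetical, and whether `read()` returns dicts or objects is not confirmed here.

```python
from collections import defaultdict

def subtotal_by_section(line_items):
    """Sum `amount` per `section`, skipping rows that have no amount."""
    totals = defaultdict(float)
    for item in line_items:
        if item.get("amount") is not None:
            totals[item["section"]] += item["amount"]
    return dict(totals)

# Invented sample rows using the keys listed in the Quickstart comment.
sample_items = [
    {"section": "Drywall", "description": "5/8 in. Type X board",
     "qty": 400, "unit": "SF", "amount": 1200.0, "page": 2},
    {"section": "Drywall", "description": "Metal studs",
     "qty": 150, "unit": "LF", "amount": 900.0, "page": 2},
    {"section": "Doors", "description": "Door W/ Frame",
     "qty": 4, "unit": "EA", "amount": 2400.0, "page": 3},
]
totals = subtotal_by_section(sample_items)
```

Comparing these subtotals against the section totals printed in the quote is a quick sanity check before leveling.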

Private mode — bids never leave your machine

Sub bids are confidential. Run BidReader fully offline against a local Ollama model — no document text is sent to any cloud LLM, no API key:

ollama pull llama3.1
export BID_MODEL=ollama/llama3.1
bidreader your_sub_quote.pdf        # 100% local

Full guide + on-prem/shared-host options: docs/LOCAL_MODELS.md.

Real output

On a real $324,240.61 drywall estimate (72 line items, processed in seconds), BidReader's scope engine flagged three potentially expensive gaps:

!!  SCOPE GAPS TO CONFIRM:
  - Finishing (taping, mudding, sanding) -- the gypsum line items price the BOARD
    only, not the finishing labor to reach a paint-ready surface.
  - Door hardware -- "Door W/ Frame" lines don't include hinges/locks/closers.
  - Firestopping at rated assemblies -- life-safety scope, commonly omitted.

On a real 25-page multi-trade GC estimate, it parsed 959 line items across 16 CSI divisions (demolition → concrete → steel → finishes → plumbing → fire suppression), each page-cited. See docs/RESULTS.md and a full worked example in examples/.

Scanned PDFs

Lots of real bids are scans with no text layer. BidReader auto-detects those and falls back to local Tesseract OCR — same structured output, still private:

pip install "bidreader[ocr]"           # + tesseract binary: brew install tesseract
bidreader scanned_quote.pdf            # auto-OCR; or force with --ocr always

Verified on an image-only quote: recovered all line items, total, and exclusions purely from the page image.
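
One common way to decide when OCR is needed is to measure how much text the PDF's text layer yields per page. The sketch below assumes that heuristic. The function name `needs_ocr` and the 20-character threshold are hypothetical, and BidReader's actual detection may differ. It takes already-extracted page text, so it runs without any PDF library.

```python
def needs_ocr(page_texts, min_chars_per_page=20):
    """Guess whether a PDF is image-only from its extracted page text.

    Returns True when every page yields fewer than `min_chars_per_page`
    non-whitespace characters (assumed heuristic, assumed threshold).
    """
    if not page_texts:
        return True  # nothing extractable at all
    return all(
        len("".join(text.split())) < min_chars_per_page
        for text in page_texts
    )

# Invented inputs: one PDF with a text layer, one with empty pages.
text_pdf = ["ACME Drywall Quote #1042\nGypsum board 400 SF $1,200.00",
            "Exclusions: trash removal"]
scanned_pdf = ["", "  \n"]
```

A per-page variant of the same test would handle mixed documents where only some pages are scans.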

Bid leveling — compare subs side-by-side → Excel

The bid-day workflow: read every sub's quote and level them apples-to-apples.

pip install "bidreader[xlsx]"
bidreader level voltage_bros.pdf current_co.pdf sparky.pdf -o leveling.xlsx

It builds an Excel workbook (bidders as columns) with a scope/exclusion matrix that exposes the catch every estimator dreads — the apparent low bid that quietly carved out scope:

                  Voltage Bros   Current Co   Sparky
Bid total            $64,300      $108,890    $77,520
                     ◀ LOW
EXCLUSION MATRIX (filled = this bidder EXCLUDED it):
Fire alarm system    EXCL p1         —        EXCL p1
Temporary power      EXCL p1         —        EXCL p1
Permits                 —            —        EXCL p1

The apparent low bid of $64,300 excludes the fire alarm system and temporary power that the $108,890 bid includes, so it may not be the cheapest once scope is equalized. The workbook also contains per-bidder detail sheets with line items and arithmetic flags. (Try it: python examples/make_leveling_sample.py → examples/leveling_demo.xlsx.)
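
The exclusion matrix is essentially a pivot of per-bidder exclusion lists. A minimal sketch, using the three bidders from the example above: the function name `exclusion_matrix` is hypothetical, and matching on exact item names is a simplifying assumption, since real quotes word the same exclusion in different ways.

```python
def exclusion_matrix(bids):
    """Map each excluded item to {bidder: page or None}.

    `bids` maps bidder name -> list of {"item", "page"} dicts.
    None means the bidder did not exclude that item.
    """
    items = []
    for exclusions in bids.values():
        for exc in exclusions:
            if exc["item"] not in items:
                items.append(exc["item"])  # keep first-seen order
    return {
        item: {
            bidder: next((e["page"] for e in excl if e["item"] == item), None)
            for bidder, excl in bids.items()
        }
        for item in items
    }

bids = {
    "Voltage Bros": [{"item": "Fire alarm system", "page": 1},
                     {"item": "Temporary power", "page": 1}],
    "Current Co": [],
    "Sparky": [{"item": "Fire alarm system", "page": 1},
               {"item": "Temporary power", "page": 1},
               {"item": "Permits", "page": 1}],
}
matrix = exclusion_matrix(bids)
```

Each row of `matrix` corresponds to one row of the table above, with the page number standing in for "EXCL p1".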

Use it from an AI agent (MCP)

pip install "bidreader[mcp]"

{ "mcpServers": { "bidreader": {
    "command": "bidreader-mcp",
    "env": { "GEMINI_API_KEY": "..." }
}}}

Tools: read_document, catch_exclusions, extract_line_items. Now your agent can answer "which subs excluded fire-stopping across this bid folder?" Full guide: docs/MCP.md.
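
The folder-wide question above reduces to a search over each document's exclusions. A rough sketch, assuming exclusions are dicts with `item` and `page` keys as in the Quickstart. The helpers `normalize` and `who_excluded` and the file names are hypothetical and are not part of the MCP tool list.

```python
import re

def normalize(text):
    """Lowercase and drop non-alphanumerics so spelling variants compare equal."""
    return re.sub(r"[^a-z0-9]", "", text.lower())

def who_excluded(term, exclusions_by_sub):
    """Return {sub: [pages]} for subs whose exclusions mention `term`."""
    needle = normalize(term)
    hits = {}
    for sub, exclusions in exclusions_by_sub.items():
        pages = [e["page"] for e in exclusions if needle in normalize(e["item"])]
        if pages:
            hits[sub] = pages
    return hits

# Invented folder contents for illustration.
folder = {
    "sub_a.pdf": [{"item": "Firestopping at rated walls", "page": 3}],
    "sub_b.pdf": [{"item": "Trash removal", "page": 1}],
    "sub_c.pdf": [{"item": "Fire-stopping and caulking", "page": 2}],
}
hits = who_excluded("fire-stopping", folder)
```

Normalizing before matching lets "fire-stopping" and "Firestopping" land on the same key. Substring matching of this kind will still miss paraphrased exclusions.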

How it works

PDF (sub-quote / bid package / spec / schedule)
  → page-tagged text extraction (PyMuPDF)
  → chunk by page  (scales to 25+ page, 900+ line-item estimates)
  → LLM structured extraction  (line items · exclusions · assumptions · alternates · scope gaps)
  → merge + page-cited output (JSON / CLI / MCP)

Text-based, so it runs great on free models — see docs/FREE_MODELS.md.
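
The chunk-by-page and merge steps in the pipeline above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: `fake_extract` stands in for the LLM call, and the chunk size, function names, and duplicate-dropping rule are all hypothetical.

```python
def chunk_by_page(pages, pages_per_chunk=2):
    """Group (page_number, text) pairs into fixed-size chunks."""
    return [pages[i:i + pages_per_chunk]
            for i in range(0, len(pages), pages_per_chunk)]

def fake_extract(chunk):
    """Stand-in for the LLM step: one item per non-empty line, tagged with its page."""
    return [
        {"description": line.strip(), "page": page_no}
        for page_no, text in chunk
        for line in text.splitlines()
        if line.strip()
    ]

def merge(results):
    """Flatten per-chunk results in order, dropping exact duplicates."""
    seen, merged = set(), []
    for items in results:
        for item in items:
            key = (item["description"], item["page"])
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged

pages = [(1, "Demolition\nConcrete"), (2, "Steel"), (3, "Finishes\n")]
items = merge(fake_extract(chunk) for chunk in chunk_by_page(pages))
```

Because the page number travels with the text into every chunk, each merged item can still cite the page it came from.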

Evidence pack — see what it does on 14 messy bids

demo/EVIDENCE.md runs BidReader across 14 deliberately-messy synthetic bids (prose-buried exclusions, fine-print footnotes, two-column layouts, planted arithmetic errors, multi-page, scanned image-only docs) and reports honestly — wins and failures:

100% line-item recall · 97% exclusion-catch · 100% bid-total · 3/3 planted arithmetic errors caught · 2/2 scanned docs OCR'd
One honest miss documented: a low-DPI scan dropped 1 of 3 exclusions.
Two committed Excel leveling workbooks (electrical 4-sub, drywall 3-sub) showing the apparent-low-bid-that-carved-out-scope.

Reproduce: python demo/make_corpus.py && python demo/run_eval.py.

Benchmark

Reproducible ground-truth benchmark (benchmark/) — synthetic docs we author, so truth is exact and the PDFs ship in-repo:

metric	score
Line-item recall	100%
Exclusion-catch recall (incl. prose-buried)	100%
No-hallucination rate (clean docs)	100%
Bid-total accuracy (±2%)	100%
Arithmetic errors caught	2/2, 0 false positives

Honest caveat: synthetic docs are cleaner than real scans — these are an upper bound on well-structured input, not a claim about messy real bids. Uncontrolled real-document results are in docs/RESULTS.md. Reproduce: python benchmark/generate.py && python benchmark/run.py.
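
For readers who want to interpret the table, here is one plausible way to compute a recall score and the ±2% bid-total check. The exact matching rules used in benchmark/ are not stated here, so the set-based comparison, the function names, and the sample values below are assumptions.

```python
def recall(expected, found):
    """Fraction of expected items that also appear in `found`."""
    expected, found = set(expected), set(found)
    return len(expected & found) / len(expected) if expected else 1.0

def total_within(truth_total, extracted_total, tolerance=0.02):
    """True when the extracted total is within ±tolerance of the ground truth."""
    return abs(extracted_total - truth_total) <= tolerance * abs(truth_total)

# Invented ground truth and extraction result.
truth = {"Fire alarm system", "Temporary power", "Permits"}
found = {"Fire alarm system", "Permits", "Bond"}
score = recall(truth, found)  # two of the three expected exclusions were found
```

Recall ignores extra items such as "Bond", which is why the table reports a no-hallucination rate as a separate metric.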

Why this, and why now — the evidence

A full write-up (problem, market data, prior-art gap, method, results) is in PAPER.md. The short version:

Loudest, most-shared pain in construction-estimating communities (the 498-upvote thread above; more cited in the paper).
It works today — document extraction is LLM-native, unlike floor-plan symbol detection (academic SOTA tops out ~83% mAP).
Empty slot — bidreader, blueprint-parser, pytakeoff were all unclaimed on PyPI; the only adjacent tools are AGPL/non-commercial or abandoned toys.
Concrete wedge — not "do everything," just the bid-leveling step on bid day. Whether that is genuinely useful is unproven — this open release exists to find out. Feedback from real estimators welcome.

Roadmap

Shipped:
Multi-quote leveling → Excel (compare subs side-by-side): v0.6
Fully-local / private mode via Ollama: v0.7
Scanned-PDF OCR (local Tesseract): v0.8

Planned:
Source-grounded click-back review UI (data already carries source_text)
Revision/addendum diff ("what changed between Addendum 3 and 4")
CSI/UNIFORMAT mapping + UOM normalization for estimator-grade leveling
Region/trade notation packs (AISC, BS/IS, AUS)

Contributing

PRs welcome — see CONTRIBUTING.md. Good first issues: add a notation parser, a new export format, or a test fixture.

License

MIT.

Release history

version	released
0.9.5 (this version)	Jun 18, 2026
0.9.2	Jun 17, 2026
0.8.1	Jun 17, 2026
0.5.0	Jun 17, 2026
0.2.0	Jun 16, 2026

Download files

distribution	file	size	uploaded
Source	bidreader-0.9.5.tar.gz	25.6 kB	Jun 18, 2026
Built (Python 3)	bidreader-0.9.5-py3-none-any.whl	21.2 kB	Jun 18, 2026

File details

Details for the file bidreader-0.9.5.tar.gz.

File metadata

Download URL: bidreader-0.9.5.tar.gz
Upload date: Jun 18, 2026
Size: 25.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for bidreader-0.9.5.tar.gz
Algorithm	Hash digest
SHA256	`7b88ba515b7609d91a2ee52073fc337aed781977b4c7d255b8b043832d493787`
MD5	`8cc142bc84902f59cc43009672884341`
BLAKE2b-256	`584ca29db792bdda35f4520a3cdaa6145f4436a85d6dbd6fbfaae44941456d1c`


File details

Details for the file bidreader-0.9.5-py3-none-any.whl.

File metadata

Download URL: bidreader-0.9.5-py3-none-any.whl
Upload date: Jun 18, 2026
Size: 21.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.7

File hashes

Hashes for bidreader-0.9.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ae1b3a4135542aabec45743a8549637e5107a57a18b6a8a1e1a00686d760bf6e`
MD5	`6c2be701aae55a585ee85a3dfee4120c`
BLAKE2b-256	`be1179f2b7806259895d8a7ec3e2b85192a6d0560e47616192d1f8a5eac72a28`

