Skip to main content

Python parser for GAEB DA XML construction data exchange files, with LLM-powered item classification

Project description

pyGAEB

Python parser for GAEB DA XML construction data exchange files, with LLM-powered item classification.

Python 3.9+ License: MIT

pyGAEB parses, validates, classifies, and writes GAEB DA XML files (versions 2.0 through 3.3), producing a unified Pydantic v2 domain model from all inputs. An optional LLM classification layer enriches each item with a semantic construction element type via LiteLLM (100+ providers).

Installation

# Core parser + writer + export (zero LLM dependencies)
pip install pyGAEB

# With LLM classification (supports 100+ providers via LiteLLM)
pip install pyGAEB[llm]

Quick Start

Parse any GAEB file

from pygaeb import GAEBParser

doc = GAEBParser.parse("tender.X83")    # DA XML 3.x
doc = GAEBParser.parse("old.D83")       # DA XML 2.x — same call

print(doc.source_version)               # SourceVersion.DA_XML_33
print(doc.exchange_phase)               # ExchangePhase.X83
print(doc.grand_total)                  # Decimal("1234567.89")

Iterate items

for item in doc.award.boq.iter_items():
    print(item.oz)              # "01.02.0030"
    print(item.short_text)      # "Mauerwerk der Innenwand…"
    print(item.qty)             # Decimal("1170.000")
    print(item.unit)            # "m2"
    print(item.unit_price)      # Decimal("45.50")
    print(item.total_price)     # Decimal("53235.00")
    print(item.item_type)       # ItemType.NORMAL

Validation

from pygaeb import GAEBParser, ValidationMode

# Lenient (default) — collect warnings, keep parsing
doc = GAEBParser.parse("tender.X83")
for issue in doc.validation_results:
    print(issue.severity, issue.message)

# Strict — raise on first ERROR
doc = GAEBParser.parse("tender.X83", validation=ValidationMode.STRICT)

Write / Round-trip

from pygaeb import GAEBWriter, ExchangePhase
from decimal import Decimal

doc = GAEBParser.parse("tender.X83")
item = doc.award.boq.get_item("01.02.0030")
item.unit_price = Decimal("48.00")

GAEBWriter.write(doc, "bid.X84", phase=ExchangePhase.X84)

Export to JSON / CSV

from pygaeb.convert import to_json, to_csv

to_json(doc, "boq.json")     # full nested BoQ tree
to_csv(doc, "items.csv")     # flat item table with classification columns

LLM Classification

from pygaeb import LLMClassifier

# Default: in-memory cache (no disk I/O, session-scoped)
classifier = LLMClassifier(model="anthropic/claude-sonnet-4-6")
# classifier = LLMClassifier(model="gpt-4o")
# classifier = LLMClassifier(model="ollama/llama3")  # local, free, private

# Opt-in: persistent SQLite cache (survives across runs)
from pygaeb import SQLiteCache
classifier = LLMClassifier(model="anthropic/claude-sonnet-4-6", cache=SQLiteCache("~/.pygaeb/cache"))

# Check cost before running
estimate = await classifier.estimate_cost(doc)
print(f"Will classify {estimate.items_to_classify} items for ~${estimate.estimated_cost_usd:.2f}")

# Classify all items
await classifier.enrich(doc)

# Or synchronous
classifier.enrich_sync(doc)

for item in doc.award.boq.iter_items():
    if item.classification:
        print(item.oz, item.classification.element_type, item.classification.confidence)

Structured Extraction — Custom Schemas

After classification, extract typed attributes into your own Pydantic schema:

from pydantic import BaseModel, Field
from typing import Optional
from pygaeb import StructuredExtractor

class DoorSpec(BaseModel):
    door_type: str = Field("", description="single, double, sliding")
    width_mm: Optional[int] = Field(None, description="Width in mm")
    fire_rating: Optional[str] = Field(None, description="T30, T60, T90")
    glazing: bool = Field(False, description="Has glass panels")
    material: str = Field("", description="wood, steel, aluminium")

extractor = StructuredExtractor(model="anthropic/claude-sonnet-4-6")

# Extract from all items classified as "Door"
doors = await extractor.extract(doc, schema=DoorSpec, element_type="Door")
for item, spec in doors:
    print(item.oz, spec.door_type, spec.fire_rating, spec.width_mm)

# Filter by trade (broad) or sub_type (narrow)
pipes = await extractor.extract(doc, schema=PipeSpec, trade="MEP-Plumbing")
fire_doors = await extractor.extract(doc, schema=DoorSpec, sub_type="Fire Door")

# Or synchronous
doors = extractor.extract_sync(doc, schema=DoorSpec, element_type="Door")

Built-in starter schemas: DoorSpec, WindowSpec, WallSpec, PipeSpec — or define your own.

Custom Cache Backend

from pygaeb import CacheBackend, InMemoryCache, SQLiteCache

# Default: in-memory (no disk, session-scoped)
classifier = LLMClassifier()

# Persistent: SQLite
classifier = LLMClassifier(cache=SQLiteCache("~/.pygaeb/cache"))

# Share one backend between classifier and extractor
shared = SQLiteCache("/tmp/project-cache")
classifier = LLMClassifier(cache=shared)
extractor = StructuredExtractor(cache=shared)

# Bring your own: implement CacheBackend protocol
class RedisCache:
    def get(self, key: str) -> str | None: ...
    def put(self, key: str, value: str) -> None: ...
    def delete(self, key: str) -> None: ...
    def keys(self) -> list[str]: ...
    def clear(self) -> None: ...
    def close(self) -> None: ...

classifier = LLMClassifier(cache=RedisCache())

Cross-Phase Validation

from pygaeb import GAEBParser, CrossPhaseValidator

tender = GAEBParser.parse("tender.X83")
bid = GAEBParser.parse("bid.X84")

issues = CrossPhaseValidator.check(source=tender, response=bid)
for issue in issues:
    print(issue.severity, issue.message)

Supported Versions

Version Parser Track Status
DA XML 2.0 Track A (German elements) ✅ v1.0
DA XML 2.1 Track A (German elements) ✅ v1.0
DA XML 3.0 Track B (English elements) ✅ v1.0
DA XML 3.1 Track B (English elements) ✅ v1.0
DA XML 3.2 Track B (English elements) ✅ v1.0
DA XML 3.3 Track B (English elements) ✅ v1.0
GAEB 90 Track C (fixed-width) 🔜 v1.1

Configuration

# Environment variables
export PYGAEB_DEFAULT_MODEL=ollama/llama3
export PYGAEB_XSD_DIR=/opt/gaeb-schemas

# Or programmatic
from pygaeb import PyGAEBSettings
settings = PyGAEBSettings(default_model="gpt-4o", classifier_concurrency=10)

License

MIT — see LICENSE for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pygaeb-1.0.0.tar.gz (70.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pygaeb-1.0.0-py3-none-any.whl (64.5 kB view details)

Uploaded Python 3

File details

Details for the file pygaeb-1.0.0.tar.gz.

File metadata

  • Download URL: pygaeb-1.0.0.tar.gz
  • Upload date:
  • Size: 70.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pygaeb-1.0.0.tar.gz
Algorithm Hash digest
SHA256 8a3321c41411b9280d78d6572abb32bc145cb111c68a8521cd875b5b0d66fcb6
MD5 3a7e015c55b5bfe92b6a8372afb02674
BLAKE2b-256 775eed8f245c07de2e8c3378a73d41a990633c66fd0548fee2d8f050ccf57007

See more details on using hashes here.

Provenance

The following attestation bundles were made for pygaeb-1.0.0.tar.gz:

Publisher: publish.yml on frameIQ/pygaeb

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pygaeb-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: pygaeb-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 64.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pygaeb-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5ccdecc9b9b68f4cbab0a7a52ee8a8dfb5d7944b1efee65d20d9eb09f6ccaca2
MD5 49f3bf3bdacf18b2795cf9443d9c6c3d
BLAKE2b-256 b66d5a0d7ac9381975582300fc486479ff2b68465d5c8b7a53c12ef83cc4f09d

See more details on using hashes here.

Provenance

The following attestation bundles were made for pygaeb-1.0.0-py3-none-any.whl:

Publisher: publish.yml on frameIQ/pygaeb

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page