Extract geotechnical data from PDF reports and output DIGGS XML

Geotech Report Extraction

Extract geotechnical borehole data from PDF reports and output DIGGS 2.6 XML.

Features

  • Parse borehole logs from geotechnical PDF reports (Langan, Schnabel, and generic formats)
  • Extract soil layers, SPT blow counts, groundwater levels, and lab test results
  • Optional vision-based extraction using Anthropic Claude, or OpenAI GPT models via Palantir Funhouse
  • Output DIGGS 2.6 XML for interoperability
  • Geospatial utilities for coordinate conversion and boring location mapping
  • Confidence scoring for extracted data quality

Installation

pip install geotech-report-extraction

Optional extras

# OCR support (Tesseract)
pip install "geotech-report-extraction[ocr]"

# Vision LLM extraction (Anthropic Claude)
pip install "geotech-report-extraction[vision]"

# Geospatial utilities (coordinate conversion, mapping)
pip install "geotech-report-extraction[geo]"

# Everything (quotes keep shells such as zsh from expanding the brackets)
pip install "geotech-report-extraction[all]"

Quick Start

from geotech_report_extraction import extract_report

# Basic text-based extraction
result = extract_report("report.pdf")

# Vision-based extraction with Anthropic Claude
result = extract_report("report.pdf", use_vision=True, vision_api_key="sk-...")

# Vision-based extraction with Palantir Funhouse
from geotech_report_extraction import FunhouseBackend
result = extract_report("report.pdf", llm_backend=FunhouseBackend(model="gpt-4.1"))
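
The Quick Start passes the API key as a string literal. A minimal sketch of reading it from the environment instead is shown below. The environment variable name ANTHROPIC_API_KEY and the helper names vision_kwargs and extract are assumptions made for illustration and are not part of the package. The use_vision and vision_api_key keyword arguments are the ones shown in the Quick Start.

```python
import os


def vision_kwargs(env_var="ANTHROPIC_API_KEY"):
    """Build keyword arguments for extract_report from the environment.

    Hypothetical helper: returns an empty dict (plain text-based
    extraction) when the variable is unset or empty.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        return {}
    # Keyword names taken from the Quick Start example.
    return {"use_vision": True, "vision_api_key": api_key}


def extract(pdf_path):
    """Run extraction, using vision only when an API key is available."""
    # Imported lazily so vision_kwargs() works without the package installed.
    from geotech_report_extraction import extract_report

    return extract_report(pdf_path, **vision_kwargs())
```

With this arrangement the key stays out of source control, and the same script falls back to text-based extraction on machines where the variable is not set.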

CLI

geotech-extract report.pdf -o output.xml
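
To convert a folder of reports, the command above can be driven from a short script. The following is a sketch only: the function names are hypothetical, it assumes the CLI exits with a nonzero status on failure (not confirmed here), and the final check verifies only that the output is well-formed XML, not that it is valid against the DIGGS schema.

```python
import subprocess
import xml.etree.ElementTree as ET
from pathlib import Path


def build_command(pdf_path, out_dir):
    """Return the argv list for one geotech-extract call."""
    pdf_path = Path(pdf_path)
    out_path = Path(out_dir) / (pdf_path.stem + ".xml")
    # Same form as the documented command: geotech-extract report.pdf -o output.xml
    return ["geotech-extract", str(pdf_path), "-o", str(out_path)]


def is_well_formed_xml(path):
    """Return True if the file parses as XML (no schema validation)."""
    try:
        ET.parse(path)
    except ET.ParseError:
        return False
    return True


def convert_all(pdf_dir, out_dir):
    """Run the CLI on every PDF in pdf_dir and map file name to success."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    results = {}
    for pdf in sorted(Path(pdf_dir).glob("*.pdf")):
        cmd = build_command(pdf, out_dir)
        proc = subprocess.run(cmd)
        out_path = Path(cmd[-1])
        # Assumption: a zero exit status means the extraction succeeded.
        results[pdf.name] = (
            proc.returncode == 0
            and out_path.exists()
            and is_well_formed_xml(out_path)
        )
    return results
```

Calling convert_all("reports", "xml_out") returns a dictionary such as one keyed by PDF file name, which makes it easy to list the reports that need manual review.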

License

MIT
