Extract recipient address from AWB/shipping label PDF using Claude AI

Project description

AWB Extractor

Python SDK for extracting recipient, sender, and shipment information from AWB/shipping label PDF files using Claude AI.

Features

  • Extract from PDF bytes, local PDF files, or PDF URLs
  • Batch extraction from multiple URLs
  • Optional default HTTP headers for protected AWB URLs
  • Typed AWBResult dataclass output
  • Custom exceptions for API key, PDF download, and JSON parsing failures

Requirements

  • Python 3.9+
  • Anthropic API key

Installation

Install from PyPI:

pip install awb-extractor

For local development:

pip install -e ".[dev]"

Usage

from awb_extractor import AWBExtractor

extractor = AWBExtractor(api_key="sk-ant-...")
result = extractor.from_file("label.pdf")

print(result.recipient_name)
print(result.to_dict())

Example result:

{
    "tracking_number": "NHSVC972103440",
    "recipient_name": "Nguyen Van A",
    "recipient_phone": "(+84)03******37",
    "recipient_address": "237 Nguyen Trai",
    "recipient_ward": "Phuong Ben Thanh",
    "recipient_district": "Quan 1",
    "recipient_province": "TP. Ho Chi Minh",
    "sender_name": "Onflow",
    "sender_address": "TP. Ho Chi Minh",
    "cod": "0",
    "weight": "0.700 KG",
    "order_id": "584425059595159079",
}
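The address arrives as separate fields, so a caller that needs a single line has to join them. A minimal sketch, assuming the dictionary shape shown above; format_recipient_address is a hypothetical helper, not part of the package:

```python
from typing import Mapping, Optional

# Address keys, in the order they appear in the example result.
ADDRESS_KEYS = (
    "recipient_address",
    "recipient_ward",
    "recipient_district",
    "recipient_province",
)


def format_recipient_address(data: Mapping[str, Optional[str]]) -> str:
    """Join the non-empty address parts into one comma-separated line."""
    parts = [(data.get(key) or "").strip() for key in ADDRESS_KEYS]
    return ", ".join(part for part in parts if part)


sample = {
    "recipient_address": "237 Nguyen Trai",
    "recipient_ward": "Phuong Ben Thanh",
    "recipient_district": "Quan 1",
    "recipient_province": "TP. Ho Chi Minh",
}
print(format_recipient_address(sample))
```

Missing or empty fields are skipped, so a label without a ward still produces a clean line.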

Supported Inputs

PDF bytes

from awb_extractor import AWBExtractor

extractor = AWBExtractor(api_key="sk-ant-...")

with open("label.pdf", "rb") as file:
    result = extractor.from_bytes(file.read())
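When the bytes come from an upload or another service, it can be worth checking that they look like a PDF before spending an API call on them. The sketch below is an assumption about how a caller might do that; looks_like_pdf is a hypothetical helper, and nothing here says the package performs this check itself:

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True when the payload carries the PDF header marker near its start."""
    # PDF files begin with "%PDF-". Some producers prepend a few stray bytes,
    # so this looks inside a small leading window rather than only at offset 0.
    return b"%PDF-" in data[:1024]


print(looks_like_pdf(b"%PDF-1.7\n%binary..."))
print(looks_like_pdf(b"<html>Not found</html>"))
```

A payload that fails this check is often an HTML error page returned in place of the label.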

Local PDF file

from awb_extractor import AWBExtractor

extractor = AWBExtractor(api_key="sk-ant-...")
result = extractor.from_file("label.pdf")

PDF URL

from awb_extractor import AWBExtractor

extractor = AWBExtractor(
    api_key="sk-ant-...",
    http_headers={"Authorization": "Bearer token"},
)

result = extractor.from_url("https://example.com/awb.pdf")

You can pass request-specific headers with extra_headers:

result = extractor.from_url(
    "https://example.com/awb.pdf",
    extra_headers={"X-Request-ID": "request-123"},
)
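The text above does not say how the default http_headers and the per-request extra_headers are combined. One plausible behaviour, shown here purely as an assumption, is that request-specific values override the defaults. merge_headers is a hypothetical helper used only to illustrate that rule:

```python
from typing import Dict, Mapping, Optional


def merge_headers(
    defaults: Optional[Mapping[str, str]],
    extra: Optional[Mapping[str, str]],
) -> Dict[str, str]:
    """Combine two header mappings; values in `extra` win on conflicts."""
    merged = dict(defaults or {})
    merged.update(extra or {})
    return merged


headers = merge_headers(
    {"Authorization": "Bearer token"},
    {"X-Request-ID": "request-123"},
)
print(headers)
```

This sketch compares header names case-sensitively, whereas HTTP treats them as case-insensitive, so "authorization" and "Authorization" would both be kept.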

Multiple URLs

from_urls() returns a list of dictionaries, each with the keys url, data, and error. A URL that fails does not stop the rest of the batch.

from awb_extractor import AWBExtractor

extractor = AWBExtractor(api_key="sk-ant-...")
results = extractor.from_urls([
    "https://example.com/good.pdf",
    "https://example.com/bad.pdf",
])
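Because failures are reported inside the list rather than raised, the caller has to separate them. A minimal sketch, assuming that error is None (or empty) on success and a message on failure; the sample data and the split_batch helper are hypothetical:

```python
from typing import Any, Dict, List, Tuple

BatchItem = Dict[str, Any]


def split_batch(results: List[BatchItem]) -> Tuple[List[BatchItem], List[BatchItem]]:
    """Partition batch entries into (succeeded, failed) by their error field."""
    succeeded = [item for item in results if not item.get("error")]
    failed = [item for item in results if item.get("error")]
    return succeeded, failed


# Hand-written stand-in for what from_urls() might return.
sample_results = [
    {"url": "https://example.com/good.pdf", "data": {"tracking_number": "NHSVC972103440"}, "error": None},
    {"url": "https://example.com/bad.pdf", "data": None, "error": "download failed"},
]

ok, bad = split_batch(sample_results)
for item in bad:
    print(f"{item['url']}: {item['error']}")
```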

Result Fields

AWBResult includes:

  • tracking_number
  • recipient_name
  • recipient_phone
  • recipient_address
  • recipient_ward
  • recipient_district
  • recipient_province
  • sender_name
  • sender_address
  • cod
  • weight
  • order_id

Use to_dict() or to_json() to serialize the result.
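For readers unfamiliar with the pattern, a dataclass can offer both serializers in a few lines. The class below is a hypothetical stand-in trimmed to three fields, not the package's AWBResult, and its method bodies are an assumption about one reasonable implementation:

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Dict, Optional


@dataclass
class LabelResult:
    """Illustrative stand-in for a typed extraction result."""

    tracking_number: Optional[str] = None
    recipient_name: Optional[str] = None
    weight: Optional[str] = None

    def to_dict(self) -> Dict[str, Any]:
        # asdict() copies every dataclass field into a plain dict.
        return asdict(self)

    def to_json(self) -> str:
        # ensure_ascii=False keeps non-ASCII names readable in the output.
        return json.dumps(self.to_dict(), ensure_ascii=False)


demo = LabelResult(tracking_number="NHSVC972103440", recipient_name="Nguyen Van A")
print(demo.to_json())
```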

Exceptions

  • APIKeyError: missing API key
  • PDFDownloadError: PDF URL download failed
  • ExtractionError: Claude response could not be parsed as JSON
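A caller will usually want to treat these three failures differently. The sketch below assumes the exception classes can be imported from awb_extractor.exceptions (the module named under Package Structure) and falls back to local stand-ins so the example runs without the package installed. try_extract is a hypothetical wrapper, and where exactly the package raises APIKeyError is not stated here:

```python
from typing import Any, Callable, Optional, Tuple

try:
    from awb_extractor.exceptions import APIKeyError, ExtractionError, PDFDownloadError
except ImportError:
    # Stand-ins with the documented names, used only when the package is absent.
    class APIKeyError(Exception):
        pass

    class PDFDownloadError(Exception):
        pass

    class ExtractionError(Exception):
        pass


def try_extract(extract: Callable[[str], Any], source: str) -> Tuple[Optional[Any], Optional[str]]:
    """Call `extract(source)` and return (result, None) or (None, reason)."""
    try:
        return extract(source), None
    except APIKeyError:
        return None, "configuration: API key is missing"
    except PDFDownloadError as exc:
        return None, f"download: {exc}"
    except ExtractionError as exc:
        return None, f"parsing: {exc}"


def failing_download(url: str) -> Any:
    # Fake extractor that always fails, for demonstration.
    raise PDFDownloadError("HTTP 404")


print(try_extract(failing_download, "https://example.com/bad.pdf"))
```

In real use the callable would be something like extractor.from_url, and a download failure might be retried while a missing key should stop the program.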

Package Structure

  • awb_extractor/extractor.py: public AWBExtractor class
  • awb_extractor/models.py: AWBResult dataclass
  • awb_extractor/exceptions.py: package exceptions

Publishing

GitHub Actions builds and publishes the package to PyPI on every push to main.

The repository must define this GitHub secret:

PYPI_API_TOKEN

PyPI does not allow an existing version to be replaced. If a commit on main does not bump project.version in pyproject.toml, the publish step skips the upload because that distribution already exists.

Download files

Source Distribution

awb_extractor-0.1.3.tar.gz (6.2 kB view details)

Uploaded Source

Built Distribution

awb_extractor-0.1.3-py3-none-any.whl (5.7 kB view details)

Uploaded Python 3

File details

Details for the file awb_extractor-0.1.3.tar.gz.

File metadata

  • Download URL: awb_extractor-0.1.3.tar.gz
  • Upload date:
  • Size: 6.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for awb_extractor-0.1.3.tar.gz
Algorithm Hash digest
SHA256 9bccbe3f81e356e8317e7f5027fc4fa26ebad24f4b5040e193f442721c41a88b
MD5 f0d6cd5fee527a3fdfb07480c8f97acc
BLAKE2b-256 df617256d950c1f1efdc2e8ff8f1f99e6dc151d65b625a5fd799ec389c4f5613

File details

Details for the file awb_extractor-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: awb_extractor-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 5.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for awb_extractor-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 60c4d48a1a98ce2e2fdcc75c7f297027fb04645b03a5ce1f5d25e1bbe0080b3d
MD5 00e2a290513fd32ed6f014a61664ac4a
BLAKE2b-256 a792ffd2f99e1b54f8185bedee335a9f4d5e39fee79b22e0e45510db3d169939
