Extract recipient address from AWB/shipping label PDF using Claude AI
AWB Extractor
Python SDK for extracting receiver and shipment information from AWB/shipping label PDF files using Claude AI.
Features
- Extract from PDF bytes, local PDF files, or PDF URLs
- Batch extraction from multiple URLs
- Optional default HTTP headers for protected AWB URLs
- Typed AWBResult dataclass output
- Custom exceptions for API key, PDF download, and JSON parsing failures
Requirements
- Python 3.9+
- Anthropic API key
Installation
Install from PyPI:
pip install awb-extractor
For local development:
pip install -e ".[dev]"
Usage
from awb_extractor import AWBExtractor
extractor = AWBExtractor(api_key="sk-ant-...")
result = extractor.from_file("label.pdf")
print(result.recipient_name)
print(result.to_dict())
Example result:
{
    "tracking_number": "NHSVC972103440",
    "recipient_name": "Nguyen Van A",
    "recipient_phone": "(+84)03******37",
    "recipient_address": "237 Nguyen Trai",
    "recipient_ward": "Phuong Ben Thanh",
    "recipient_district": "Quan 1",
    "recipient_province": "TP. Ho Chi Minh",
    "sender_name": "Onflow",
    "sender_address": "TP. Ho Chi Minh",
    "cod": "0",
    "weight": "0.700 KG",
    "order_id": "584425059595159079"
}
Supported Inputs
PDF bytes
from awb_extractor import AWBExtractor
extractor = AWBExtractor(api_key="sk-ant-...")
with open("label.pdf", "rb") as file:
    result = extractor.from_bytes(file.read())
Local PDF file
from awb_extractor import AWBExtractor
extractor = AWBExtractor(api_key="sk-ant-...")
result = extractor.from_file("label.pdf")
PDF URL
from awb_extractor import AWBExtractor
extractor = AWBExtractor(
    api_key="sk-ant-...",
    http_headers={"Authorization": "Bearer token"},
)
result = extractor.from_url("https://example.com/awb.pdf")
You can pass request-specific headers with extra_headers:
result = extractor.from_url(
    "https://example.com/awb.pdf",
    extra_headers={"X-Request-ID": "request-123"},
)
Multiple URLs
from_urls() returns a list of dictionaries with url, data, and error.
Failed URLs do not stop the whole batch.
from awb_extractor import AWBExtractor
extractor = AWBExtractor(api_key="sk-ant-...")
results = extractor.from_urls([
    "https://example.com/good.pdf",
    "https://example.com/bad.pdf",
])
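The batch output described above can be post-processed with a small helper. This is only a sketch: it assumes that `error` is empty (for example `None`) on success and that `data` is `None` on failure, which the description above does not spell out. The `split_batch` helper and the sample records are hypothetical and written by hand, not produced by the library.

```python
from typing import Any, Dict, List, Tuple

Record = Dict[str, Any]


def split_batch(results: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Separate successful records from failed ones.

    Assumption: a record counts as failed when its "error" value is truthy.
    """
    succeeded = [r for r in results if not r.get("error")]
    failed = [r for r in results if r.get("error")]
    return succeeded, failed


# Hand-written sample shaped like the url/data/error records described above.
sample_results = [
    {
        "url": "https://example.com/good.pdf",
        "data": {"recipient_name": "Nguyen Van A"},
        "error": None,
    },
    {
        "url": "https://example.com/bad.pdf",
        "data": None,
        "error": "download failed",
    },
]

succeeded, failed = split_batch(sample_results)
for record in failed:
    print(f"failed: {record['url']} ({record['error']})")
```

In real use, `sample_results` would be replaced by the return value of `extractor.from_urls([...])`.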
Result Fields
AWBResult includes:
- tracking_number
- recipient_name
- recipient_phone
- recipient_address
- recipient_ward
- recipient_district
- recipient_province
- sender_name
- sender_address
- cod
- weight
- order_id
Use to_dict() or to_json() to serialize the result.
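To illustrate what dataclass-based serialization of this kind typically looks like, here is a minimal stand-in. It is not the package's actual code: `LabelResult` is a hypothetical class with only three of the fields, and the choice of `asdict` and `json.dumps` is an assumption about one reasonable way to implement `to_dict()` and `to_json()`.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class LabelResult:
    """Hypothetical stand-in for AWBResult, reduced to three fields."""

    tracking_number: Optional[str] = None
    recipient_name: Optional[str] = None
    cod: Optional[str] = None

    def to_dict(self) -> dict:
        # asdict() turns the dataclass fields into a plain dict.
        return asdict(self)

    def to_json(self) -> str:
        # ensure_ascii=False keeps non-ASCII characters readable in the output.
        return json.dumps(self.to_dict(), ensure_ascii=False)


result = LabelResult(
    tracking_number="NHSVC972103440",
    recipient_name="Nguyen Van A",
    cod="0",
)
print(result.to_json())
```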
Exceptions
- APIKeyError: missing API key
- PDFDownloadError: PDF URL download failed
- ExtractionError: Claude response could not be parsed as JSON
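The exceptions above can be handled separately so that each failure gets its own message. The sketch below defines local stand-in classes with the same names so that it runs without the package or an API key; in real code they would presumably be imported from the package (the exact import path is an assumption). `describe_failure` and `fake_extract` are hypothetical helpers.

```python
class APIKeyError(Exception):
    """Stand-in for the package exception: missing API key."""


class PDFDownloadError(Exception):
    """Stand-in for the package exception: PDF URL download failed."""


class ExtractionError(Exception):
    """Stand-in for the package exception: response was not valid JSON."""


def describe_failure(extract, source):
    """Call extract(source) and return (result, message).

    message is None on success; result is None on a handled failure.
    """
    try:
        return extract(source), None
    except APIKeyError:
        return None, "missing API key"
    except PDFDownloadError:
        return None, f"could not download {source}"
    except ExtractionError:
        return None, f"could not parse the model response for {source}"


def fake_extract(url):
    # Simulates a failed download so the handler can be exercised offline.
    raise PDFDownloadError(url)


result, message = describe_failure(fake_extract, "https://example.com/bad.pdf")
print(message)
```

With the real package, `extract` could be a bound method such as `extractor.from_url`.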
Package Structure
- awb_extractor/extractor.py: public AWBExtractor class
- awb_extractor/models.py: AWBResult dataclass
- awb_extractor/exceptions.py: package exceptions
Publishing
GitHub Actions builds and publishes the package to PyPI on every push to main.
The repository must define this GitHub secret:
PYPI_API_TOKEN
PyPI does not allow replacing an existing version. If a commit on main does not
bump project.version in pyproject.toml, the publish step skips the distribution
that already exists on PyPI, so no new release is published.
File details
Details for the file awb_extractor-0.1.3.tar.gz.
File metadata
- Download URL: awb_extractor-0.1.3.tar.gz
- Upload date:
- Size: 6.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9bccbe3f81e356e8317e7f5027fc4fa26ebad24f4b5040e193f442721c41a88b |
| MD5 | f0d6cd5fee527a3fdfb07480c8f97acc |
| BLAKE2b-256 | df617256d950c1f1efdc2e8ff8f1f99e6dc151d65b625a5fd799ec389c4f5613 |
File details
Details for the file awb_extractor-0.1.3-py3-none-any.whl.
File metadata
- Download URL: awb_extractor-0.1.3-py3-none-any.whl
- Upload date:
- Size: 5.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 60c4d48a1a98ce2e2fdcc75c7f297027fb04645b03a5ce1f5d25e1bbe0080b3d |
| MD5 | 00e2a290513fd32ed6f014a61664ac4a |
| BLAKE2b-256 | a792ffd2f99e1b54f8185bedee335a9f4d5e39fee79b22e0e45510db3d169939 |