Python client for the Kiwi Data API
hitchhikers 🐦
Don't panic. This is the guide.
The Kiwi Data Python reference implementation — a working example of how to integrate with the Kiwi Data document processing API.
What Kiwi Data does
Kiwi Data extracts structured data from unstructured documents. Contracts, purchase orders, leases, financial statements — the kind of PDFs that live in shared drives and quietly cause compliance problems.
AI extracts the fields. Validation layers check the AI's work. Human reviewers confirm what the machines aren't sure about.
"Looks right" and "is right" are different things, especially in procurement and compliance.
What this repo shows
- How to authenticate and initialize the client
- How to upload a document and link it to your own system's record ID
- How to poll document state through the processing pipeline
- How to retrieve extracted attributes and transform outputs
- How to handle retries, timeouts, and API errors correctly
- How to write typed, validated API interactions with Pydantic schemas
Prerequisites
- Python 3.11+
- A Kiwi Data API key — contact fred@kiwidata.com to get one
- uv or pip for package management
Quickstart
1. Install
pip install hitchhikers
Or with uv:
uv add hitchhikers
2. Initialize the client
from hitchhikers import KiwiClient
client = KiwiClient(api_key="your-api-key")
3. Upload a document
doc = client.upload_document(
    file_path="contract.pdf",
    doc_type="contract",
    external_id="CRM-12345",  # your internal record ID (recommended for idempotency)
)
print(doc.document_id)
4. Poll until processing is complete
from hitchhikers import DocumentState
import time
while True:
    detail = client.get_document(doc.document_id)
    state = DocumentState(detail.docstate_name)
    if state.is_done:
        break
    if state.is_error:
        raise RuntimeError(f"Processing failed: {state}")
    time.sleep(5)
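The loop above keeps polling forever if a document stalls. A minimal sketch of a bounded version follows. `wait_for`, its parameters, and the injected callables are hypothetical helpers written for illustration; they are not part of the hitchhikers package, and the default timeout is an arbitrary choice.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def wait_for(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    is_error: Callable[[T], bool],
    timeout: float = 300.0,
    interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Poll fetch() until is_done(value) is true; fail on error or timeout."""
    deadline = clock() + timeout
    while True:
        value = fetch()
        if is_done(value):
            return value
        if is_error(value):
            raise RuntimeError(f"Processing failed: {value!r}")
        if clock() >= deadline:
            # Give up instead of looping forever on a stalled document.
            raise TimeoutError(f"Not done after {timeout} seconds")
        sleep(interval)
```

With the client from step 2, this might be called as `wait_for(lambda: DocumentState(client.get_document(doc.document_id).docstate_name), lambda s: s.is_done, lambda s: s.is_error)`. Passing `sleep` and `clock` as parameters keeps the helper testable without real delays.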
5. Retrieve extracted attributes
attributes = client.get_document_attributes(doc.document_id)
for attr in attributes:
    print(attr)
6. Retrieve transform output
Transform output shape varies by document type and your configured transform. Parse it against your own schema.
output = client.get_transform_output(doc.document_id)
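As a rough illustration of parsing against your own schema, the sketch below validates a mapping and converts it to a typed record. It assumes the transform output can be treated as a dict-like mapping, which this README does not confirm. `InvoiceSummary`, its fields, and `parse_invoice_summary` are hypothetical names. A standard-library dataclass is used to keep the example dependency-free; a Pydantic model would serve the same purpose.

```python
from dataclasses import dataclass
from typing import Any, Mapping


@dataclass(frozen=True)
class InvoiceSummary:
    """Hypothetical target schema; replace the fields with your own."""

    vendor_name: str
    total_amount: float


def parse_invoice_summary(output: Mapping[str, Any]) -> InvoiceSummary:
    """Check required keys, then coerce values into an InvoiceSummary."""
    required = ("vendor_name", "total_amount")
    missing = [key for key in required if key not in output]
    if missing:
        raise ValueError(f"Transform output is missing fields: {missing}")
    return InvoiceSummary(
        vendor_name=str(output["vendor_name"]),
        total_amount=float(output["total_amount"]),  # raises ValueError if not numeric
    )
```

Failing loudly on a missing or malformed field is deliberate here: a silent default would hide a transform that changed shape.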
Best practices
Use the context manager. It closes the underlying HTTP connection cleanly.
with KiwiClient(api_key="your-api-key") as client:
    doc = client.upload_document("invoice.pdf", doc_type="invoice")
Always set external_id. It ties the document back to your system's record.
Handle errors explicitly. The client raises typed exceptions — catch what you care about.
from hitchhikers import AuthenticationError, NotFoundError, APIError
try:
    detail = client.get_document(document_id)
except NotFoundError:
    # document doesn't exist
    ...
except AuthenticationError:
    # bad or expired API key
    ...
except APIError as e:
    # unexpected 4xx or 5xx
    print(e.status_code, e)
Retries are on by default. KiwiClient retries on 5xx errors and network failures with exponential backoff (2 attempts by default). Set max_retries=0 to disable.
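To make the backoff behavior concrete, here is a standard-library sketch of retrying with exponential backoff. It is not the client's actual implementation, which relies on Tenacity. `TransientError`, `call_with_backoff`, and `base_delay` are hypothetical, and the sketch assumes `max_retries` counts retries after the first call.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a 5xx response or a network failure."""


def call_with_backoff(
    func: Callable[[], T],
    max_retries: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying TransientError with delays of base_delay * 2**n."""
    for attempt in range(max_retries + 1):
        try:
            return func()
        except TransientError:
            if attempt == max_retries:
                raise  # retries exhausted; surface the last error
            sleep(base_delay * 2**attempt)
    raise AssertionError("unreachable when max_retries >= 0")
```

With `max_retries=0` the function makes a single call and re-raises immediately, which mirrors the "disable" setting described above.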
Attributes are empty until processing completes. get_document_attributes returns an empty list while the document is still in NEW, EXTRACTED, or TRANSFORMED state. Always check DocumentState.is_done first.
How it works
KiwiClient wraps the Kiwi Data REST API with typed request/response models via Pydantic and automatic retry logic via Tenacity. Documents move through a state machine: NEW → EXTRACTED → TRANSFORMED → PUBLISHED, with HUMAN_REVIEW as a gate before final publish when the AI's confidence is low. Extraction attributes and transform outputs are available once the document clears its processing stage.
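The state machine described above can be sketched as a transition table. This is an illustration only: `State`, `TRANSITIONS`, and the helper functions are hypothetical and are not the package's `DocumentState`. It assumes HUMAN_REVIEW sits between TRANSFORMED and PUBLISHED, and it omits error states because this README does not name them.

```python
from enum import Enum


class State(str, Enum):
    NEW = "NEW"
    EXTRACTED = "EXTRACTED"
    TRANSFORMED = "TRANSFORMED"
    HUMAN_REVIEW = "HUMAN_REVIEW"
    PUBLISHED = "PUBLISHED"


# Allowed forward transitions (assumed placement of HUMAN_REVIEW).
TRANSITIONS: dict[State, frozenset[State]] = {
    State.NEW: frozenset({State.EXTRACTED}),
    State.EXTRACTED: frozenset({State.TRANSFORMED}),
    State.TRANSFORMED: frozenset({State.HUMAN_REVIEW, State.PUBLISHED}),
    State.HUMAN_REVIEW: frozenset({State.PUBLISHED}),
    State.PUBLISHED: frozenset(),
}


def can_transition(current: State, target: State) -> bool:
    """Return True if target is a permitted next state after current."""
    return target in TRANSITIONS[current]


def is_terminal(state: State) -> bool:
    """A state with no outgoing transitions ends the pipeline."""
    return not TRANSITIONS[state]
```

For example, `can_transition(State.TRANSFORMED, State.PUBLISHED)` holds when review is skipped, while `can_transition(State.NEW, State.PUBLISHED)` does not.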
Project structure
hitchhikers/
├── src/hitchhikers/
│   ├── client.py        # KiwiClient (all API methods live here)
│   ├── enums.py         # DocumentState with is_done / is_error helpers
│   ├── exceptions.py    # APIError, AuthenticationError, NotFoundError
│   └── schemas/
│       ├── v1.py        # Upload and document schemas (codegen from OpenAPI)
│       └── v2.py        # Detail, list, and attribute schemas (codegen from OpenAPI)
├── tests/               # Unit tests, mocked with respx
├── features/            # BDD feature specs (Gherkin)
└── pyproject.toml
Contributing
This is a reference implementation. PRs that fix real issues are welcome. Feature requests belong in a conversation with the Kiwi Data team.
Setup:
uv sync --extra dev
Run tests:
pytest
Lint and type-check:
black src tests
flake8 src tests
pyright src
Commit conventions. Use Conventional Commits:
| Prefix | When |
|---|---|
| feat: | New capability |
| fix: | Bug fix |
| chore: | Tooling, deps, CI |
| docs: | README, docstrings only |
| refactor: | No behavior change |
| test: | Tests only |
Keep commits atomic — one logical change per commit. Don't bundle unrelated fixes.
Branch naming: feat/short-description, fix/short-description.
License
MIT. So long, and thanks for all the fish.