Python client for the Kiwi Data API

hitchhikers 🐦

Don't panic. This is the guide.

The Kiwi Data Python reference implementation — a working example of how to integrate with the Kiwi Data document processing API.


What Kiwi Data does

Kiwi Data extracts structured data from unstructured documents. Contracts, purchase orders, leases, financial statements — the kind of PDFs that live in shared drives and quietly cause compliance problems.

AI extracts the fields. Validation layers check the AI's work. Human reviewers confirm what the machines aren't sure about.

"Looks right" and "is right" are different things, especially in procurement and compliance.


What this repo shows

  • How to authenticate and initialize the client
  • How to upload a document and link it to your own system's record ID
  • How to poll document state through the processing pipeline
  • How to retrieve extracted attributes and transform outputs
  • How to handle retries, timeouts, and API errors correctly
  • How to write typed, validated API interactions with Pydantic schemas

Prerequisites

  • Python 3.11+
  • A Kiwi Data API key — contact fred@kiwidata.com to get one
  • uv or pip for package management

Quickstart

1. Install

pip install hitchhikers

Or with uv:

uv add hitchhikers

2. Initialize the client

from hitchhikers import KiwiClient

client = KiwiClient(api_key="your-api-key")

3. Upload a document

doc = client.upload_document(
    file_path="contract.pdf",
    doc_type="contract",
    external_id="CRM-12345",  # your internal record ID — recommended for idempotency
)
print(doc.document_id)

4. Poll until processing is complete

from hitchhikers import DocumentState
import time

while True:
    detail = client.get_document(doc.document_id)
    state = DocumentState(detail.docstate_name)
    if state.is_done:
        break
    if state.is_error:
        raise RuntimeError(f"Processing failed: {state}")
    time.sleep(5)
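The loop above polls forever if a document never leaves an intermediate state. A minimal sketch of the same loop with a deadline is shown below. The helper poll_until and its parameters are hypothetical names introduced here, not part of hitchhikers; it takes plain callables so it can wrap the client calls from step 4.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def poll_until(
    fetch: Callable[[], T],
    is_done: Callable[[T], bool],
    is_error: Callable[[T], bool],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> T:
    """Call fetch() until is_done(result) is true; raise on error or timeout."""
    deadline = clock() + timeout_s
    while True:
        result = fetch()
        if is_done(result):
            return result
        if is_error(result):
            raise RuntimeError(f"Processing failed: {result!r}")
        if clock() >= deadline:
            raise TimeoutError(f"Not done after {timeout_s} seconds")
        sleep(interval_s)


# Possible use with the client from the steps above (not executed here):
# state = poll_until(
#     fetch=lambda: DocumentState(client.get_document(doc.document_id).docstate_name),
#     is_done=lambda s: s.is_done,
#     is_error=lambda s: s.is_error,
# )
```

The sleep and clock parameters exist only so the helper can be exercised in tests without real waiting.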

5. Retrieve extracted attributes

attributes = client.get_document_attributes(doc.document_id)
for attr in attributes:
    print(attr)

6. Retrieve transform output

Transform output shape varies by document type and your configured transform. Parse it against your own schema.

output = client.get_transform_output(doc.document_id)
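As a rough illustration of parsing against your own schema, the sketch below validates a dict-like output and converts it into a typed record. It assumes get_transform_output returns something that behaves like a dict, which this README does not confirm. The InvoiceSummary schema and its field names are hypothetical. It uses a standard-library dataclass to stay dependency-free; a Pydantic model would do the same job.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation
from typing import Any, Mapping


@dataclass(frozen=True)
class InvoiceSummary:
    """Hypothetical target schema; the field names are illustrative only."""

    vendor_name: str
    total_amount: Decimal
    currency: str


def parse_invoice_summary(output: Mapping[str, Any]) -> InvoiceSummary:
    """Validate a dict-like transform output and convert it to InvoiceSummary."""
    required = ("vendor_name", "total_amount", "currency")
    missing = [key for key in required if key not in output]
    if missing:
        raise ValueError(f"Transform output is missing fields: {missing}")
    try:
        # Go through str() so floats and numeric strings are handled alike.
        total = Decimal(str(output["total_amount"]))
    except InvalidOperation as exc:
        raise ValueError(
            f"total_amount is not numeric: {output['total_amount']!r}"
        ) from exc
    return InvoiceSummary(
        vendor_name=str(output["vendor_name"]),
        total_amount=total,
        currency=str(output["currency"]).upper(),
    )
```

Failing loudly on a missing or malformed field keeps a changed transform configuration from silently producing partial records downstream.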

Best practices

Use the context manager. It closes the underlying HTTP connection cleanly.

with KiwiClient(api_key="your-api-key") as client:
    doc = client.upload_document("invoice.pdf", doc_type="invoice", external_id="INV-12345")

Always set external_id. It ties the document back to your system's record.

Handle errors explicitly. The client raises typed exceptions — catch what you care about.

from hitchhikers import AuthenticationError, NotFoundError, APIError

try:
    detail = client.get_document(document_id)
except NotFoundError:
    # document doesn't exist
    ...
except AuthenticationError:
    # bad or expired API key
    ...
except APIError as e:
    # unexpected 4xx or 5xx
    print(e.status_code, e)

Retries are on by default. KiwiClient retries on 5xx errors and network failures with exponential backoff (2 attempts by default). Set max_retries=0 to disable.
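For readers unfamiliar with exponential backoff, here is a standalone sketch of the idea. It is not the client's implementation, which this README says is built on Tenacity. The base delay, the cap, the retryable exception types, and reading max_retries as "retries after the first call" are all assumptions made for illustration.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_delays(max_retries: int, base_s: float = 0.5, cap_s: float = 8.0) -> list[float]:
    """Delay before each retry: base, 2*base, 4*base, ... capped at cap_s."""
    return [min(cap_s, base_s * (2 ** i)) for i in range(max_retries)]


def call_with_retries(
    func: Callable[[], T],
    max_retries: int = 2,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func; on a retryable error wait and try again, up to max_retries times."""
    delays = backoff_delays(max_retries)
    for attempt in range(max_retries + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            sleep(delays[attempt])
    raise AssertionError("unreachable")
```

With max_retries=0 the function makes a single call and re-raises any failure immediately, which is the behavior you would expect from disabling retries.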

Attributes are empty until processing completes. get_document_attributes returns an empty list while the document is still in NEW, EXTRACTED, or TRANSFORMED state. Always check DocumentState.is_done first.


How it works

KiwiClient wraps the Kiwi Data REST API with typed request/response models via Pydantic and automatic retry logic via Tenacity. Documents move through a state machine: NEW → EXTRACTED → TRANSFORMED → PUBLISHED, with HUMAN_REVIEW as a gate before final publish when the AI's confidence is low. Extraction attributes and transform outputs are available once the document clears its processing stage.
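One way to picture the state machine is as an enum plus a transition table. The sketch below is an illustrative stand-in, not the library's DocumentState from enums.py. The ERROR member is a hypothetical name (this README does not list error states). Placing HUMAN_REVIEW between TRANSFORMED and PUBLISHED, and treating only PUBLISHED as done, are assumptions drawn from the description above.

```python
from enum import Enum


class DocState(Enum):
    """Illustrative stand-in for the library's DocumentState."""

    NEW = "NEW"
    EXTRACTED = "EXTRACTED"
    TRANSFORMED = "TRANSFORMED"
    HUMAN_REVIEW = "HUMAN_REVIEW"
    PUBLISHED = "PUBLISHED"
    ERROR = "ERROR"  # hypothetical name; the README does not list error states

    @property
    def is_done(self) -> bool:
        # Assumption: only PUBLISHED counts as finished.
        return self is DocState.PUBLISHED

    @property
    def is_error(self) -> bool:
        return self is DocState.ERROR


# Forward transitions: the happy path, with HUMAN_REVIEW as an optional gate.
NEXT_STATES: dict[DocState, set[DocState]] = {
    DocState.NEW: {DocState.EXTRACTED},
    DocState.EXTRACTED: {DocState.TRANSFORMED},
    DocState.TRANSFORMED: {DocState.HUMAN_REVIEW, DocState.PUBLISHED},
    DocState.HUMAN_REVIEW: {DocState.PUBLISHED},
    DocState.PUBLISHED: set(),
    DocState.ERROR: set(),
}


def can_transition(current: DocState, target: DocState) -> bool:
    """True if target is a direct successor of current in the table above."""
    return target in NEXT_STATES[current]
```

Constructing the enum from a string, as in DocState("PUBLISHED"), mirrors how the Quickstart builds DocumentState from detail.docstate_name.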


Project structure

hitchhikers/
├── src/hitchhikers/
│   ├── client.py          # KiwiClient — all API methods live here
│   ├── enums.py           # DocumentState with is_done / is_error helpers
│   ├── exceptions.py      # APIError, AuthenticationError, NotFoundError
│   └── schemas/
│       ├── v1.py          # Upload and document schemas (codegen from OpenAPI)
│       └── v2.py          # Detail, list, and attribute schemas (codegen from OpenAPI)
├── tests/                 # Unit tests, mocked with respx
├── features/              # BDD feature specs (Gherkin)
└── pyproject.toml

Contributing

This is a reference implementation. PRs that fix real issues are welcome. Feature requests belong in a conversation with the Kiwi Data team.

Setup:

uv sync --extra dev

Run tests:

pytest

Lint and type-check:

black src tests
flake8 src tests
pyright src

Commit conventions. Use Conventional Commits:

Prefix      When
feat:       New capability
fix:        Bug fix
chore:      Tooling, deps, CI
docs:       README, docstrings only
refactor:   No behavior change
test:       Tests only

Keep commits atomic — one logical change per commit. Don't bundle unrelated fixes.

Branch naming: feat/short-description, fix/short-description.


License

MIT. So long, and thanks for all the fish.
