Python client for the Dewey API


dewey

Python client for the Dewey API. It has no third-party dependencies and uses only the Python standard library. See the full API reference for details on all endpoints and types.

Installation

pip install meetdewey

Quick start

from pathlib import Path

from dewey import DeweyClient

client = DeweyClient(api_key="dwy_live_...")

# Create a collection
col = client.collections.create("My Docs")

# Upload a document
doc = client.documents.upload(col.id, Path("report.pdf"))

# Query
results = client.retrieval.query(col.id, "What is the refund policy?")
for r in results:
    print(r.score, r.chunk.content[:100])

# Research (SSE streaming)
for event in client.research.stream(col.id, "Summarise key findings"):
    if event.type == "chunk":
        print(event.content, end="", flush=True)
    elif event.type == "done":
        print("\nSources:", event.sources)

Constructor

DeweyClient(api_key: str, base_url: str = "https://api.meetdewey.com/v1")
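
Outside of quick experiments, the API key usually comes from the environment rather than from source code. The sketch below builds the constructor's keyword arguments from environment variables. The variable names DEWEY_API_KEY and DEWEY_BASE_URL are assumptions made for this example; the client itself is not documented to read them.

```python
import os


def client_kwargs_from_env(env=None):
    """Build keyword arguments for DeweyClient from environment variables.

    DEWEY_API_KEY and DEWEY_BASE_URL are hypothetical names chosen for this sketch.
    """
    env = os.environ if env is None else env
    api_key = env.get("DEWEY_API_KEY")
    if not api_key:
        raise RuntimeError("DEWEY_API_KEY is not set")
    kwargs = {"api_key": api_key}
    # Pass base_url only when it is configured, so the constructor default applies otherwise.
    base_url = env.get("DEWEY_BASE_URL")
    if base_url:
        kwargs["base_url"] = base_url
    return kwargs
```

With this helper, the client can be created as DeweyClient(**client_kwargs_from_env()).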

Resources

`client.collections`

Method	Description
`create(name, *, visibility, chunk_size, chunk_overlap, embedding_model)`	Create a collection
`list()`	List collections
`get(collection_id)`	Get by ID
`update(collection_id, *, name, visibility, ...)`	Update
`delete(collection_id)`	Delete
`stats(collection_id)`	Document count, storage, section/chunk/claim counts
`recompute_summaries(collection_id)`	Re-run AI section summarization
`recompute_captions(collection_id)`	Re-run AI captioning for images and tables
`recompute_claims(collection_id)`	Re-extract factual claims (clears existing)

update() accepts: name, visibility, chunk_size, chunk_overlap, description, enable_summarization, enable_captioning, enable_deduplication, llm_model, instructions. Pass None for llm_model or instructions to clear the field; omit the argument entirely to leave it unchanged.

# Set research instructions for a collection
client.collections.update(
    collection_id,
    instructions="All figures are in USD unless stated otherwise.",
)

# Clear instructions
client.collections.update(collection_id, instructions=None)

# Get collection statistics
stats = client.collections.stats(collection_id)
print(f"{stats.docCount} docs, {stats.totalClaimsCount} claims")
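
The difference between passing None and omitting an argument in update() is commonly implemented with a sentinel default. The sketch below illustrates that general pattern only; build_update_payload and _UNSET are hypothetical names, and this is not a copy of how the client builds its request.

```python
# Sentinel meaning "argument not supplied". It is distinct from None,
# which here means "clear the field".
_UNSET = object()


def build_update_payload(*, name=_UNSET, llm_model=_UNSET, instructions=_UNSET):
    """Return only the fields the caller passed, keeping explicit None values."""
    candidates = {"name": name, "llm_model": llm_model, "instructions": instructions}
    return {key: value for key, value in candidates.items() if value is not _UNSET}


print(build_update_payload(name="Reports"))     # {'name': 'Reports'}
print(build_update_payload(instructions=None))  # {'instructions': None}
print(build_update_payload())                   # {}
```

An omitted field never reaches the payload, so the stored value is left alone, while an explicit None is sent and can be interpreted as a request to clear it.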

`client.documents`

Method	Description
`upload(collection_id, file, *, filename, content_type, ...)`	Multipart upload
`upload_many(collection_id, files, *, concurrency, on_progress)`	Bulk upload via presigned S3 URLs
`request_upload_url(collection_id, filename, content_type, file_size_bytes, content_hash)`	Presigned URL
`confirm(collection_id, document_id)`	Confirm presigned upload
`list(collection_id)`	List documents
`get(collection_id, document_id)`	Get document
`get_markdown(collection_id, document_id)`	Get Markdown string
`retry(collection_id, document_id)`	Retry failed document
`delete(collection_id, document_id)`	Delete document

upload() accepts a pathlib.Path, bytes, or any binary file-like object.

upload_many() is the recommended approach for large datasets. Each file is uploaded directly to S3 (bypassing the API server), so there are no payload-size limits. Files that match an existing document's hash are deduplicated automatically.

from pathlib import Path

docs = client.documents.upload_many(
    collection_id,
    list(Path("./reports").glob("**/*.pdf")),
    concurrency=10,
    on_progress=lambda doc, n, total: print(f"{n}/{total} {doc.filename}"),
)
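
For anything beyond printing, on_progress can be a small callable object instead of a lambda. The sketch below follows the (doc, n, total) callback shape from the example above. ProgressLog is a hypothetical helper, and the lock reflects an assumption that the callback may be invoked from worker threads when concurrency is greater than 1.

```python
import threading
from types import SimpleNamespace


class ProgressLog:
    """Callable that records finished filenames and the latest completion percentage."""

    def __init__(self):
        self._lock = threading.Lock()  # assumption: callbacks may arrive from several threads
        self.completed = []
        self.percent = 0

    def __call__(self, doc, n, total):
        with self._lock:
            self.completed.append(doc.filename)
            self.percent = round(100 * n / total) if total else 100


# Simulate two callbacks with stand-in objects; a real run would pass document objects.
log = ProgressLog()
log(SimpleNamespace(filename="a.pdf"), 1, 2)
log(SimpleNamespace(filename="b.pdf"), 2, 2)
print(log.percent, log.completed)
```

Pass the instance as on_progress=log and inspect log.completed once upload_many() returns.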

Pass UploadManyItem instances when you need a custom filename or content type:

from io import BytesIO

from dewey.resources.documents import UploadManyItem

# `data` holds the raw file content as bytes (for example, read from disk or a network response)
items = [
    UploadManyItem(file=BytesIO(data), filename="custom-name.pdf", content_type="application/pdf"),
]
docs = client.documents.upload_many(collection_id, items)

`client.sections`

Method	Description
`list(collection_id, document_id)`	List sections
`get(section_id)`	Get section with content
`get_chunks(section_id)`	Get chunks
`scan(collection_id, query, *, top_k)`	Full-text section scan

`client.retrieval`

Method	Description
`query(collection_id, q, *, limit)`	Hybrid search
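
Query results are often post-processed before being shown or passed to another model. The sketch below filters and joins results using the score and chunk.content fields shown in the quick start. build_context is a hypothetical helper, and it assumes that a higher score means a better match.

```python
from types import SimpleNamespace


def build_context(results, *, min_score=0.0, max_chars=2000):
    """Join chunk texts from best to worst score.

    Results below min_score are dropped, and any chunk that would push the
    total length past max_chars is skipped.
    """
    parts, used = [], 0
    for r in sorted(results, key=lambda r: r.score, reverse=True):
        text = r.chunk.content
        if r.score < min_score or used + len(text) > max_chars:
            continue
        parts.append(text)
        used += len(text)
    return "\n\n".join(parts)


# Stand-in results for illustration; real ones come from client.retrieval.query().
fake_results = [
    SimpleNamespace(score=0.2, chunk=SimpleNamespace(content="Unrelated footer text.")),
    SimpleNamespace(score=0.9, chunk=SimpleNamespace(content="Refunds within 30 days.")),
    SimpleNamespace(score=0.6, chunk=SimpleNamespace(content="Contact support to start a refund.")),
]
context = build_context(fake_results, min_score=0.5)
```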

`client.research`

Method	Description
`stream(collection_id, q, *, depth, model)`	SSE research → `Generator[ResearchEvent]`

depth options: "quick", "balanced" (default), "deep", "exhaustive".
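
When the streamed answer is needed as a single string rather than printed incrementally, the events can be folded into one result. The sketch below relies only on the "chunk" and "done" event types and the content and sources attributes from the quick start. collect_research is a hypothetical helper, and other event types, if any exist, are ignored.

```python
from types import SimpleNamespace


def collect_research(events):
    """Consume a research event stream and return (answer_text, sources)."""
    chunks, sources = [], []
    for event in events:
        if event.type == "chunk":
            chunks.append(event.content)
        elif event.type == "done":
            sources = event.sources
    return "".join(chunks), sources


# Stand-in events for illustration; real ones come from client.research.stream().
fake_events = [
    SimpleNamespace(type="chunk", content="Revenue grew "),
    SimpleNamespace(type="chunk", content="10%."),
    SimpleNamespace(type="done", sources=["report.pdf"]),
]
answer, sources = collect_research(fake_events)
```

Because the helper accepts any iterable, the generator returned by stream() can be passed to it directly.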

`client.claims`

Method	Description
`map_stream(collection_id)`	SSE stream of all claims with UMAP coordinates
`list_by_document(document_id, *, min_importance)`	Claims extracted from a specific document

map_stream() yields raw event dicts. Check event["type"]: "progress", "done" (with claims list), or "error".

from dewey.types import ClaimMapItem

for event in client.claims.map_stream(collection_id):
    if event["type"] == "done":
        claims = [ClaimMapItem.from_dict(c) for c in event["claims"]]
        for claim in claims:
            print(f"[{claim.importance}] {claim.text}")

# Per-document claims (fast, no SSE)
result = client.claims.list_by_document(document_id, min_importance=3)
for claim in result.claims:
    print(claim.text)
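
The map_stream() loop above only handles "done". A slightly fuller sketch that also surfaces "progress" and "error" events might look like the following. claims_from_map_events is a hypothetical helper, and because the payload of progress and error events is not described here, the event dicts are passed through unchanged.

```python
def claims_from_map_events(events, on_progress=None):
    """Return the raw claim dicts from the "done" event.

    Raises RuntimeError if an "error" event arrives or the stream ends early.
    """
    for event in events:
        kind = event.get("type")
        if kind == "progress" and on_progress is not None:
            on_progress(event)
        elif kind == "error":
            # The fields of an error event are an unknown here, so report the whole dict.
            raise RuntimeError(f"claim map stream failed: {event!r}")
        elif kind == "done":
            return event["claims"]
    raise RuntimeError("stream ended without a 'done' event")


# Stand-in events for illustration; real ones come from client.claims.map_stream().
seen = []
raw_claims = claims_from_map_events(
    [{"type": "progress"}, {"type": "done", "claims": [{"text": "Revenue grew 10%."}]}],
    on_progress=seen.append,
)
```

The returned dicts can then be converted with ClaimMapItem.from_dict() as in the example above.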

`client.contradictions`

Method	Description
`list(collection_id, *, severity, status, limit)`	List detected contradictions
`detect(collection_id)`	Trigger async contradiction detection run
`get_latest_run(collection_id)`	Poll status of the latest detection run
`dismiss(collection_id, contradiction_id)`	Mark a contradiction as ignored
`apply_instruction(collection_id, contradiction_id, instruction)`	Apply resolution; appends to collection instructions

# Trigger detection
run = client.contradictions.detect(collection_id)
print("Run ID:", run.runId)

# Later: poll status
status = client.contradictions.get_latest_run(collection_id)
print(status.status, status.contradictionsFound)

# List active contradictions and apply resolutions
result = client.contradictions.list(collection_id, status="active")
for c in result.items:
    print(c.severity, c.explanation)
    # Apply the suggested resolution
    client.contradictions.apply_instruction(collection_id, c.id)
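
Detection runs are asynchronous, so "poll status" in practice means calling get_latest_run() in a loop. The sketch below is a generic polling helper with a timeout. poll_until is a hypothetical name, and the caller supplies the completion test because the possible status values are not listed here.

```python
import time


def poll_until(fetch, is_finished, *, interval=2.0, timeout=300.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until is_finished(result) is true or timeout seconds elapse."""
    deadline = clock() + timeout
    while True:
        result = fetch()
        if is_finished(result):
            return result
        if clock() >= deadline:
            raise TimeoutError("run did not finish before the timeout")
        sleep(interval)


# Stand-in status sequence for illustration; "running" and "completed" are assumed values.
statuses = iter(["running", "running", "completed"])
final = poll_until(lambda: next(statuses), lambda s: s == "completed", sleep=lambda _: None)
```

With the real client, fetch would be lambda: client.contradictions.get_latest_run(collection_id) and is_finished would inspect run.status. The same helper works for any resource that exposes a similar get_latest_run() method.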

`client.duplicates`

Fuzzy document deduplication. Identifies near-duplicate documents via MinHash signatures, marks one member of each cluster as canonical, and excludes near-duplicates from retrieval and contradiction detection. Must be enabled per-collection with client.collections.update(id, enable_deduplication=True).

Method	Description
`detect(collection_id)`	Trigger async dedup run across all ready documents
`get_latest_run(collection_id)`	Poll status of the latest dedup run
`list(collection_id, *, limit, offset)`	List duplicate groups with members and coverage percentages
`promote_canonical(collection_id, group_id, canonical_document_id)`	Promote a different member to canonical; old canonical becomes near_duplicate
`disband(collection_id, group_id)`	Disband a group; all former members rejoin retrieval as distinct

# Enable on a collection (one-time)
client.collections.update(collection_id, enable_deduplication=True)

# Trigger detection, then poll
run = client.duplicates.detect(collection_id)
status = client.duplicates.get_latest_run(collection_id)
print(status.status, status.duplicateGroupsCreated)

# Review groups
result = client.duplicates.list(collection_id)
for group in result.items:
    for m in group.members:
        if m.relationship == "near_duplicate":
            pct = round((m.coverageToCanonical or 0) * 100)
            print(f"{m.filename} covers {pct}% of canonical")
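
For readers unfamiliar with MinHash, the sketch below shows the general technique: each document becomes a set of word shingles, a signature keeps the minimum hash value per seeded hash function, and the fraction of matching signature positions estimates the Jaccard similarity of the two sets. This is a textbook illustration only. The shingle size, hash function, and number of hashes are assumptions, and Dewey's actual parameters and thresholds are not described here.

```python
import hashlib


def shingles(text, k=3):
    """Return the set of k-word shingles in text."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def minhash_signature(items, num_hashes=128):
    """For each seeded hash function, keep the minimum hash value over all items."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # a different salt gives a different hash function
        signature.append(min(
            int.from_bytes(hashlib.blake2b(item.encode(), digest_size=8, salt=salt).digest(), "big")
            for item in items
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching positions estimates the Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


original = "the quarterly report shows revenue grew ten percent year over year in all regions"
revised = "the quarterly report shows revenue grew ten percent year over year in most regions"
unrelated = "completely unrelated text about gardening tomatoes and watering schedules for summer"

sig_original = minhash_signature(shingles(original))
sim_near = estimated_jaccard(sig_original, minhash_signature(shingles(revised)))
sim_far = estimated_jaccard(sig_original, minhash_signature(shingles(unrelated)))
print(f"near-duplicate: {sim_near:.2f}, unrelated: {sim_far:.2f}")
```

The two report sentences share 10 of their 14 distinct shingles, so the near-duplicate estimate should land around 0.7, while the unrelated text shares none.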

`client.provider_keys`

Method	Description
`create(project_id, provider, key, name)`	Add provider key
`list(project_id)`	List keys
`delete(project_id, key_id)`	Delete key

Error handling

from dewey import DeweyClient, DeweyError

client = DeweyClient(api_key="dwy_live_...")

try:
    client.collections.get("unknown-id")
except DeweyError as e:
    print(e.status, e.message)  # e.g. 404 "Collection not found"
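
Since the error exposes an HTTP status, transient failures can be retried while client errors are raised immediately. The sketch below uses a stand-in exception class so that it runs on its own. call_with_retries and ApiError are hypothetical names, and treating 429 and 5xx as retryable is an assumption, not documented Dewey behavior.

```python
import time


class ApiError(Exception):
    """Stand-in for DeweyError in this sketch; it carries status and message."""

    def __init__(self, status, message):
        super().__init__(message)
        self.status = status
        self.message = message


def call_with_retries(fn, *, attempts=3, base_delay=0.5, sleep=time.sleep, error_type=ApiError):
    """Call fn(), retrying on 429 and 5xx statuses with exponential backoff."""
    for attempt in range(attempts):
        try:
            return fn()
        except error_type as exc:
            status = getattr(exc, "status", None)
            retryable = isinstance(status, int) and (status == 429 or 500 <= status < 600)
            if not retryable or attempt == attempts - 1:
                raise
            sleep(base_delay * 2 ** attempt)  # waits 0.5s, 1s, 2s, ...


# Simulated call that fails twice with 503 before succeeding.
calls = []

def flaky():
    calls.append(1)
    if len(calls) < 3:
        raise ApiError(503, "Service unavailable")
    return "ok"

result = call_with_retries(flaky, sleep=lambda _: None)
```

With the real client, pass error_type=DeweyError and wrap the call, for example lambda: client.collections.get(collection_id).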

Presigned upload flow

Use the low-level presigned URL flow for single files or when you need manual control over each step. For bulk ingestion, prefer upload_many(), which handles this flow automatically and uploads files concurrently.

import hashlib
import urllib.request
from pathlib import Path

data = Path("file.pdf").read_bytes()
content_hash = hashlib.sha256(data).hexdigest()

# 1. Request a presigned URL
resp = client.documents.request_upload_url(
    collection_id,
    filename="file.pdf",
    content_type="application/pdf",
    file_size_bytes=len(data),
    content_hash=content_hash,
)

# 2. PUT bytes directly to S3 (no auth header needed)
req = urllib.request.Request(resp.uploadUrl, data=data, method="PUT")
req.add_header("Content-Type", "application/pdf")
urllib.request.urlopen(req)

# 3. Confirm to trigger ingestion
doc = client.documents.confirm(collection_id, resp.documentId)
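
The example above reads the whole file into memory to hash it. For large files, the size and SHA-256 digest can be computed in chunks instead. The sketch below shows one way to do that with the standard library; size_and_sha256 is a hypothetical helper, and the temporary file exists only so the example runs on its own.

```python
import hashlib
import tempfile
from pathlib import Path


def size_and_sha256(path, chunk_size=1024 * 1024):
    """Return (size_in_bytes, sha256_hex) for path, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    size = 0
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps calling read() until it returns b"" at end of file.
        for block in iter(lambda: fh.read(chunk_size), b""):
            digest.update(block)
            size += len(block)
    return size, digest.hexdigest()


# Demo on a temporary file so the sketch runs without a real document.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"x" * 3_000_000)
    size, content_hash = size_and_sha256(sample)
```

The two values correspond to the file_size_bytes and content_hash arguments of request_upload_url().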

Release history

Version	Release date
2.5.0	May 12, 2026
2.4.0	May 7, 2026
2.3.0	Apr 26, 2026
2.2.0	Apr 24, 2026
2.1.0	Apr 23, 2026
2.0.0	Apr 17, 2026
1.4.0 (this version)	Apr 17, 2026
1.3.0	Apr 15, 2026
1.2.0	Apr 14, 2026
1.1.0	Apr 11, 2026
1.0.0	Mar 27, 2026


File details

Details for the file meetdewey-1.4.0.tar.gz.

File metadata

Download URL: meetdewey-1.4.0.tar.gz
Upload date: Apr 17, 2026
Size: 24.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for meetdewey-1.4.0.tar.gz
Algorithm	Hash digest
SHA256	`7a1501f35b08990598d376a7e8b65c640dba473eb5dbf0a4ea8f939a56ec88c3`
MD5	`83b8216f01f6a9624465f6e740aa658f`
BLAKE2b-256	`c31828b625c1f3e4c4a9af6e5ed3bae5e76215b8434ddbdcabe796ae0f8c0a11`


File details

Details for the file meetdewey-1.4.0-py3-none-any.whl.

File metadata

Download URL: meetdewey-1.4.0-py3-none-any.whl
Upload date: Apr 17, 2026
Size: 21.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for meetdewey-1.4.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`80af6118569f3cde071d2aa2c1e98ae91d580da7313f8787b4aacd3800073663`
MD5	`db5ff63d80b6c26e6b23c236f09f8497`
BLAKE2b-256	`9a050da05a4cb3ba7791a6e72c9acdd9b3e1ca2b604c5410078216e8e1b8ea9c`

