Your app's data rep — a local agent runtime that retrieves data from any source on behalf of consuming applications.

datarep

Your app's data rep.

A rep is someone you send to go get something on your behalf. You don't tell them how — you tell them what you need, and they figure it out. They show up, assess the situation, adapt to whatever they find, and come back with the goods.

That's what datarep does. Your app says "get me the user's Instagram DMs" and datarep handles it — asks the user how they access the data, extracts session cookies from their browser, calls the API, parses the response, and delivers structured data. No one wrote an Instagram integration. The rep figured one out at runtime.

And like a good rep, it learns. Working code is saved as a recipe along with its full access strategy, so the next request for the same data doesn't have to figure it out again. The first request takes seconds; later requests reuse the saved recipe and skip the discovery step.
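
The recipe idea can be pictured as a cache in front of a slow discovery step. The sketch below is illustrative only: `RecipeStore`, `fake_synthesize`, and the `instagram_dms` key are hypothetical names, and datarep's actual recipe format and storage are not described in this README.

```python
from typing import Callable, Dict

Recipe = Callable[[], list]  # assumption: a recipe is code that returns rows


class RecipeStore:
    """In-memory stand-in for saved recipes, keyed by a source name."""

    def __init__(self) -> None:
        self._recipes: Dict[str, Recipe] = {}
        self.synth_calls = 0  # counts how often the slow path ran

    def get(self, source: str, synthesize: Callable[[str], Recipe]) -> Recipe:
        if source not in self._recipes:
            # Slow path: the agent works out how to reach the source.
            self.synth_calls += 1
            self._recipes[source] = synthesize(source)
        # Fast path: reuse the recipe saved by an earlier request.
        return self._recipes[source]


def fake_synthesize(source: str) -> Recipe:
    """Placeholder for the agent's discovery step (hypothetical)."""
    return lambda: [{"source": source}]


store = RecipeStore()
first = store.get("instagram_dms", fake_synthesize)()
second = store.get("instagram_dms", fake_synthesize)()
```

Both calls return the same rows, but the discovery step runs only once; the second call goes straight to the saved recipe.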

Why this exists

Every app that needs user data today has to build and maintain its own integrations — or depend on a cloud service that proxies the user's data through someone else's servers. datarep is a different approach: a local agent runtime that synthesizes integrations on demand, runs on the user's machine, and never sends their data anywhere.

There isn't really a category for this yet. It's not a connector (those are pre-built by humans), not an ETL pipeline, not an SDK. It's an autonomous agent that becomes a connector — for any source, on the fly.

Quick start

pip install datarep
datarep init
export ANTHROPIC_API_KEY="sk-ant-..."
datarep start

Register your app and get an API key:

datarep app register my-app

Retrieve data:

# Via CLI (interactive — agent asks follow-up questions)
datarep get "i want my Instagram DMs"

# Via HTTP API
curl -X POST http://127.0.0.1:7080/get \
  -H "Authorization: Bearer dr_<your-api-key>" \
  -H "Content-Type: application/json" \
  -d '{"query": "get my recent iMessages"}'
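
For apps that call the HTTP API from Python rather than curl, the same request can be built with the standard library. This is a minimal sketch: the endpoint, headers, and JSON body mirror the curl example above, while the helper names `build_get_request` and `send` are made up for illustration. The response format is not documented in this README, so the sketch returns the body as raw text.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:7080"  # local datarep server from the curl example


def build_get_request(api_key: str, query: str) -> urllib.request.Request:
    """Build the POST /get request shown in the curl example (hypothetical helper)."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/get",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def send(request: urllib.request.Request) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With `datarep start` running, `send(build_get_request("dr_<your-api-key>", "get my recent iMessages"))` would issue the same call as the curl command.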

How it works

datarep uses a conversational agent that leads the data retrieval process:

  1. Asks how you access the data, for example: "How do you usually access your Instagram? In a browser, the app, or something else?"
  2. Explores the device: scans browser profiles, app databases, and local files based on your answer
  3. Extracts credentials programmatically: pulls session cookies from Safari, Chrome, Firefox, etc. using browser_cookie3
  4. Reports stats and gets approval: tells you what it found (record count, date range) before extracting
  5. Writes and validates retrieval code: runs a test extraction (~1000 rows), checks quality, and saves a recipe
  6. Streams data on demand: consuming apps call GET /data/{recipe_id} to stream the full dataset as NDJSON, piped directly from the sandbox, so dataset size is not bounded by memory
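
On the consuming side, the last step amounts to reading an NDJSON stream one line at a time. The sketch below shows one way to do that with the standard library. The `iter_ndjson` helper and the `id`/`text` fields are assumptions for illustration, and an in-memory buffer stands in for the HTTP response body.

```python
import io
import json
from typing import BinaryIO, Iterator


def iter_ndjson(stream: BinaryIO) -> Iterator[dict]:
    """Yield one parsed record per non-empty line of an NDJSON byte stream."""
    for raw_line in stream:
        line = raw_line.strip()
        if not line:
            continue  # skip blank lines between records
        yield json.loads(line)


# Stand-in for the response body. A real consumer would pass the file-like
# object returned by urllib.request.urlopen(...) for GET /data/{recipe_id}.
sample = io.BytesIO(b'{"id": 1, "text": "hi"}\n\n{"id": 2, "text": "bye"}\n')
records = list(iter_ndjson(sample))
```

Because the helper is a generator, a consumer can process each record as it arrives instead of collecting the whole dataset first; the `list(...)` call here is only for the demonstration.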

Recipes are fault-tolerant — per-row error handling ensures a single bad row never kills the stream. Failed rows are logged, and datarep's agent automatically fixes the recipe so the consuming app can retry just the missing rows.
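
The per-row error handling described above can be sketched as a generator that parses each row inside its own try/except. This is not datarep's recipe code: `stream_rows`, `parse_message`, and the row shape are hypothetical, and the automatic recipe repair step is left out.

```python
import logging
from typing import Callable, Iterable, Iterator, List, Tuple

logger = logging.getLogger("recipe_sketch")


def stream_rows(
    raw_rows: Iterable[object],
    parse: Callable[[object], dict],
    failed: List[Tuple[int, str]],
) -> Iterator[dict]:
    """Yield parsed rows; record failures instead of aborting the stream."""
    for index, raw in enumerate(raw_rows):
        try:
            row = parse(raw)
        except Exception as exc:  # one bad row must not stop the others
            logger.warning("row %d failed: %s", index, exc)
            failed.append((index, repr(exc)))
            continue
        yield row


def parse_message(raw: object) -> dict:
    """Hypothetical parser: expects a (sender, text) pair."""
    sender, text = raw  # raises TypeError/ValueError on malformed rows
    return {"sender": sender, "text": text}


failed_rows: List[Tuple[int, str]] = []
rows = list(stream_rows([("ann", "hi"), None, ("bob", "yo")], parse_message, failed_rows))
```

The malformed middle row is logged and its index is kept in `failed_rows`, which is the kind of information a consuming app would need to retry only the missing rows.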

The agent has full read-only filesystem access and open network access. It never asks you to manually extract data it can get programmatically; the only thing it may ask you to do is log into a service.

Interfaces

| Interface | Use case |
| --- | --- |
| HTTP API (localhost:7080) | Primary interface for all apps. Bearer token auth. Supports conversational sessions. |
| MCP server | Native interface for agentic/LLM-powered apps. |
| CLI (datarep) | Interactive retrieval, setup, source management, debugging. |

Integration guide

See docs/integration-guide.md for the full walkthrough: API reference, conversational sessions, authentication, MCP setup, recipes, and code examples.

Development

pip install -e ".[dev]"
pytest

Download files

Source Distribution

datarep-1.1.9.tar.gz (55.0 kB)

Built Distribution

datarep-1.1.9-py3-none-any.whl (44.8 kB)
