arkiv

Universal personal data format. JSONL in, SQL out, MCP to LLMs.

The Format

Every record is a JSON object, one per line. All fields are optional.

{"mimetype": "text/plain", "content": "I think the key insight is...", "uri": "https://chatgpt.com/c/abc", "timestamp": "2023-05-14T10:30:00Z", "metadata": {"role": "user", "conversation_id": "abc"}}
{"mimetype": "audio/wav", "uri": "file://media/podcast.wav", "timestamp": "2024-01-15", "metadata": {"transcript": "Welcome to...", "duration": 45.2}}
{"mimetype": "image/jpeg", "uri": "file://media/photo.jpg", "metadata": {"caption": "My talk at MIT"}}

The Stack

JSONL directory (human-readable, portable, durable)
       ⇅ arkiv convert
SQLite database (queryable, efficient, standard SQL)
       ↓ arkiv mcp
MCP server (tools → any LLM)

The two forms (directory and database) are isomorphic peers: either one can be rebuilt from the other. arkiv convert works in either direction and picks the direction automatically from the input type.
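
To make the "isomorphic peers" idea concrete, here is a rough sketch of a JSONL to SQLite round trip using Python's built-in `sqlite3`. The column layout (`mimetype`, `uri`, `content`, `timestamp`, plus `metadata` stored as JSON text) and the function names `jsonl_to_db` and `db_to_jsonl` are assumptions made for illustration; arkiv's actual schema and conversion logic may differ.

```python
import json
import sqlite3

FIELDS = ("mimetype", "uri", "content", "timestamp")  # assumed column set


def jsonl_to_db(lines, conn):
    """Load JSONL lines into an assumed `records` table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "mimetype TEXT, uri TEXT, content TEXT, timestamp TEXT, metadata TEXT)"
    )
    for line in lines:
        if not line.strip():
            continue
        rec = json.loads(line)
        meta = rec.get("metadata")
        conn.execute(
            "INSERT INTO records VALUES (?, ?, ?, ?, ?)",
            (*(rec.get(f) for f in FIELDS),
             json.dumps(meta) if meta is not None else None),
        )
    conn.commit()


def db_to_jsonl(conn):
    """Yield JSONL lines in insertion order, omitting NULL columns."""
    cols = FIELDS + ("metadata",)
    query = ("SELECT mimetype, uri, content, timestamp, metadata "
             "FROM records ORDER BY rowid")
    for row in conn.execute(query):
        rec = {k: v for k, v in zip(cols, row) if v is not None}
        if "metadata" in rec:
            rec["metadata"] = json.loads(rec["metadata"])
        yield json.dumps(rec)


source_lines = [
    '{"mimetype": "text/plain", "content": "hello", "metadata": {"role": "user"}}',
    '{"mimetype": "image/jpeg", "uri": "file://media/photo.jpg"}',
]

conn = sqlite3.connect(":memory:")
jsonl_to_db(source_lines, conn)
exported = list(db_to_jsonl(conn))
```

In this sketch a missing field becomes NULL on the way in and is dropped again on the way out, so the parsed records compare equal after the round trip. A field explicitly set to `null` would also be dropped, which is a simplification of this sketch.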

Quick Start

pip install arkiv

# Point at a directory and query. arkiv.db is auto-created on demand.
arkiv query ./my-archive/ "SELECT content FROM records WHERE metadata->>'role' = 'user' LIMIT 5"

# Serve to any LLM via MCP
arkiv mcp ./my-archive/

# Explicit conversion (either direction)
arkiv convert conversations.jsonl archive.db              # JSONL → database
arkiv convert archive.db ./exported/                      # database → directory
arkiv convert archive.db 2024/ --since 2024-01-01         # temporal slice
arkiv convert archive.db archive.zip                      # pack for transport
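
Once the data is in SQLite, the same kind of query can be run from Python with the standard `sqlite3` module. The sketch below builds a tiny in-memory stand-in for the `records` table (only the columns used here, which is an assumption), filters on a metadata key, and shows how a temporal slice such as `--since 2024-01-01` could be expressed as a plain string comparison on ISO 8601 timestamps. It uses `json_extract` rather than `->>` because `->>` needs SQLite 3.38 or newer; both rely on SQLite's JSON functions being available in your Python build.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed minimal layout; a real arkiv database may have more columns.
conn.execute("CREATE TABLE records (content TEXT, timestamp TEXT, metadata TEXT)")

rows = [
    ("I think the key insight is...", "2023-05-14T10:30:00Z", {"role": "user"}),
    ("Here is a summary.", "2024-02-01T09:00:00Z", {"role": "assistant"}),
    ("Follow-up question", "2024-03-10", {"role": "user"}),
]
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [(content, ts, json.dumps(meta)) for content, ts, meta in rows],
)

# Filter on a metadata key, like the Quick Start query.
user_content = [row[0] for row in conn.execute(
    "SELECT content FROM records "
    "WHERE json_extract(metadata, '$.role') = 'user' "
    "ORDER BY timestamp LIMIT 5"
)]

# Temporal slice: ISO 8601 strings sort chronologically as text.
recent = [row[0] for row in conn.execute(
    "SELECT content FROM records WHERE timestamp >= ? ORDER BY timestamp",
    ("2024-01-01",),
)]

print(user_content)
print(recent)
```

The string comparison works because ISO 8601 timestamps in the same time zone sort in chronological order, even when some carry only a date.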

MCP Tools

Read-only by default. Start with arkiv mcp --writable db to enable the write tool.

Tool                    | Description                                             | Mode
get_manifest()          | What collections exist, their descriptions and schemas  | read-only
get_schema(collection?) | What metadata keys can be queried                       | read-only
sql_query(query)        | Run read-only SQL                                       | read-only
write_record(...)       | Append a single record to a collection                  | writable

Why

  • Your data lives in silos (ChatGPT, email, bookmarks, photos, voice memos)
  • Source toolkits (memex, mtk, btk, ptk, ebk) export it as JSONL
  • arkiv gives you one format, one database, one query interface
  • Any LLM can query it via MCP
  • JSONL is human-readable and durable. SQLite is the most deployed database in history.

Spec and philosophy

Download files

Source Distribution

arkiv-0.3.0.tar.gz (127.8 kB view details)

Uploaded Source

Built Distribution

arkiv-0.3.0-py3-none-any.whl (35.0 kB view details)

Uploaded Python 3

File details

Details for the file arkiv-0.3.0.tar.gz.

File metadata

  • Download URL: arkiv-0.3.0.tar.gz
  • Upload date:
  • Size: 127.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for arkiv-0.3.0.tar.gz
Algorithm Hash digest
SHA256 fbc123cfd01b6f9b58e4a1b9306f42266e344be299d4603284d68e97336cbb37
MD5 3641fbe3d85d0058faaf62df019bffca
BLAKE2b-256 54407dd27856c641ca008e7d2e5c2939ba2e89e9ba3833563514b8c358ec6639

File details

Details for the file arkiv-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: arkiv-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 35.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for arkiv-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a7a80aa4fb3075901cc1a0c365fad734ee912efaa244d7a0b87ccd92f46577c8
MD5 0ac348e15dbd1a6367ec4d294750b72a
BLAKE2b-256 a53a82c9b6a54f7642f51679cc63067a1595027bac7490b65b44540e7b867b9e
