
arkiv

Universal personal data format. JSONL in, SQL out, MCP to LLMs.

The Format

Every record is one JSON object per line. All fields are optional.

{"mimetype": "text/plain", "content": "I think the key insight is...", "uri": "https://chatgpt.com/c/abc", "timestamp": "2023-05-14T10:30:00Z", "metadata": {"role": "user", "conversation_id": "abc"}}
{"mimetype": "audio/wav", "uri": "file://media/podcast.wav", "timestamp": "2024-01-15", "metadata": {"transcript": "Welcome to...", "duration": 45.2}}
{"mimetype": "image/jpeg", "uri": "file://media/photo.jpg", "metadata": {"caption": "My talk at MIT"}}
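Since every field is optional, a loader only needs to check that each line is a JSON object and that fields, when present, have sensible types. A minimal sketch (the field/type map below is inferred from the examples above, not an enforced schema):

```python
import json

# Field types inferred from the example records above; the format itself
# makes every field optional, so presence is never required.
EXPECTED_TYPES = {
    "mimetype": str,
    "content": str,
    "uri": str,
    "timestamp": str,
    "metadata": dict,
}

def parse_records(lines):
    """Yield one dict per non-empty JSONL line, rejecting malformed rows."""
    for n, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue
        record = json.loads(line)
        if not isinstance(record, dict):
            raise ValueError(f"line {n}: record must be a JSON object")
        for key, typ in EXPECTED_TYPES.items():
            if key in record and not isinstance(record[key], typ):
                raise ValueError(f"line {n}: {key!r} should be {typ.__name__}")
        yield record

sample = [
    '{"mimetype": "text/plain", "content": "hello", "metadata": {"role": "user"}}',
    '{"mimetype": "image/jpeg", "uri": "file://media/photo.jpg"}',
]
records = list(parse_records(sample))
print(len(records))  # 2
```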

The Stack

JSONL directory (human-readable, portable, durable)
       ⇅ arkiv convert
SQLite database (queryable, efficient, standard SQL)
       ↓ arkiv mcp
MCP server (tools → any LLM)

The two forms (directory and database) are isomorphic peers: arkiv convert goes in either direction, with the direction auto-detected from the input type.
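The JSONL → SQLite direction can be sketched in a few lines. This is an illustration of the idea, not arkiv's actual schema; the table name records matches the query examples below, but the column layout is an assumption:

```python
import json
import sqlite3

def jsonl_to_db(lines, conn):
    """Load JSONL records into a SQLite table (illustrative schema)."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS records (
               mimetype TEXT, content TEXT, uri TEXT,
               timestamp TEXT, metadata TEXT  -- metadata kept as JSON text
           )"""
    )
    for line in lines:
        r = json.loads(line)
        conn.execute(
            "INSERT INTO records VALUES (?, ?, ?, ?, ?)",
            (r.get("mimetype"), r.get("content"), r.get("uri"),
             r.get("timestamp"), json.dumps(r.get("metadata", {}))),
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
jsonl_to_db(
    ['{"mimetype": "text/plain", "content": "hi", "metadata": {"role": "user"}}'],
    conn,
)
# json_extract is the portable spelling; the ->> shorthand used in the
# Quick Start below requires SQLite 3.38+.
row = conn.execute(
    "SELECT content, json_extract(metadata, '$.role') FROM records"
).fetchone()
print(row)  # ('hi', 'user')
```

Storing metadata as JSON text keeps arbitrary keys queryable through SQLite's JSON1 functions without a fixed schema.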

Quick Start

pip install arkiv

# Point at a directory and query. arkiv.db is auto-created on demand.
arkiv query ./my-archive/ "SELECT content FROM records WHERE metadata->>'role' = 'user' LIMIT 5"

# Serve to any LLM via MCP
arkiv mcp ./my-archive/

# Explicit conversion (either direction)
arkiv convert conversations.jsonl archive.db              # JSONL → database
arkiv convert archive.db ./exported/                      # database → directory
arkiv convert archive.db 2024/ --since 2024-01-01         # temporal slice
arkiv convert archive.db archive.zip                      # pack for transport
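The temporal slice works because ISO-8601 timestamps sort lexicographically, so a plain string comparison selects a date range. A sketch of the kind of query --since implies (table layout assumed, as above):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (content TEXT, timestamp TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?)",
    [("old note", "2023-05-14T10:30:00Z"),
     ("new note", "2024-01-15T09:00:00Z")],
)
# ISO-8601 strings compare in chronological order, so >= on text
# is a valid date filter with no date parsing required.
rows = conn.execute(
    "SELECT content FROM records WHERE timestamp >= ? ORDER BY timestamp",
    ("2024-01-01",),
).fetchall()
print(rows)  # [('new note',)]
```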

MCP Tools

Tools are read-only by default. Start the server with arkiv mcp --writable db to enable the write tool.

| Tool | Description | Mode |
|------|-------------|------|
| get_manifest() | What collections exist, their descriptions and schemas | read-only |
| get_schema(collection?) | What metadata keys can be queried | read-only |
| sql_query(query) | Run read-only SQL | read-only |
| write_record(...) | Append a single record to a collection | writable |
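One way a read-only default like this can be enforced is at the connection level: open the database with SQLite's mode=ro URI flag so that sql_query physically cannot write. This sketches the mechanism, not arkiv's actual implementation:

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "archive.db")

# Create the database once with a normal read-write connection.
rw = sqlite3.connect(path)
rw.execute("CREATE TABLE records (content TEXT)")
rw.commit()
rw.close()

# Reopen read-only: SQLite itself now rejects any write statement.
ro = sqlite3.connect(f"file:{path}?mode=ro", uri=True)
rejected = False
try:
    ro.execute("INSERT INTO records VALUES ('nope')")
except sqlite3.OperationalError:
    rejected = True  # "attempt to write a readonly database"
ro.close()
print(rejected)  # True
```

Enforcing read-only at the database layer means even a maliciously crafted SQL string from an LLM cannot mutate the archive.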

Why

  • Your data lives in silos (ChatGPT, email, bookmarks, photos, voice memos)
  • Source toolkits (memex, mtk, btk, ptk, ebk) export it as JSONL
  • arkiv gives you one format, one database, one query interface
  • Any LLM can query it via MCP
  • JSONL is human-readable and durable. SQLite is the most deployed database in history.

Spec and philosophy
