arkiv
Universal personal data format. JSONL in, SQL out, MCP to LLMs.
The Format
Every record is a JSON object, one per line. All fields are optional.
{"mimetype": "text/plain", "content": "I think the key insight is...", "uri": "https://chatgpt.com/c/abc", "timestamp": "2023-05-14T10:30:00Z", "metadata": {"role": "user", "conversation_id": "abc"}}
{"mimetype": "audio/wav", "uri": "file://media/podcast.wav", "timestamp": "2024-01-15", "metadata": {"transcript": "Welcome to...", "duration": 45.2}}
{"mimetype": "image/jpeg", "uri": "file://media/photo.jpg", "metadata": {"caption": "My talk at MIT"}}
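Because every field is optional, consumers should read records defensively. The following is a minimal sketch of writing and reading such records with Python's standard library. The helper names `dump_jsonl` and `load_jsonl` are hypothetical and are not part of arkiv's API; the sample records are shortened from the ones above.

```python
import io
import json

# Sample records, shortened from the examples above.
records = [
    {"mimetype": "text/plain", "content": "I think the key insight is...",
     "timestamp": "2023-05-14T10:30:00Z", "metadata": {"role": "user"}},
    {"mimetype": "image/jpeg", "uri": "file://media/photo.jpg",
     "metadata": {"caption": "My talk at MIT"}},
]

def dump_jsonl(records, fp):
    """Hypothetical helper: write one JSON object per line."""
    for record in records:
        fp.write(json.dumps(record, ensure_ascii=False) + "\n")

def load_jsonl(fp):
    """Hypothetical helper: yield one record per non-blank line."""
    for line in fp:
        line = line.strip()
        if line:
            yield json.loads(line)

# Round-trip through an in-memory buffer instead of a real file.
buffer = io.StringIO()
dump_jsonl(records, buffer)
buffer.seek(0)
loaded = list(load_jsonl(buffer))

# Fields may be absent, so use .get() rather than indexing.
contents = [r.get("content") for r in loaded]
```

The image record has no `content` key, so `.get()` returns `None` for it instead of raising `KeyError`.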
The Stack
JSONL directory (human-readable, portable, durable)
⇅ arkiv convert
SQLite database (queryable, efficient, standard SQL)
↓ arkiv mcp
MCP server (tools → any LLM)
The two forms (directory and database) are isomorphic peers. arkiv convert goes in either direction and auto-detects the direction from the input type.
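To illustrate what "isomorphic" means here, the sketch below round-trips records through an in-memory SQLite table and back without loss. It is not arkiv's implementation: the `records` table name and the `content` and `metadata` columns match the Quick Start query, but the remaining columns, the storage of `metadata` as JSON text, and the function names `to_db` and `from_db` are assumptions made for illustration. See SPEC.md for the real schema.

```python
import json
import sqlite3

# Assumed column set, mirroring the record fields shown in The Format.
FIELDS = ("mimetype", "content", "uri", "timestamp", "metadata")

def to_db(records):
    """Hypothetical: load records into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE records "
        "(mimetype TEXT, content TEXT, uri TEXT, timestamp TEXT, metadata TEXT)"
    )
    for r in records:
        meta = r.get("metadata")
        conn.execute(
            "INSERT INTO records VALUES (?, ?, ?, ?, ?)",
            (r.get("mimetype"), r.get("content"), r.get("uri"),
             r.get("timestamp"),
             json.dumps(meta) if meta is not None else None),
        )
    conn.commit()
    return conn

def from_db(conn):
    """Hypothetical: rebuild records, omitting fields stored as NULL."""
    out = []
    query = ("SELECT mimetype, content, uri, timestamp, metadata "
             "FROM records ORDER BY rowid")
    for row in conn.execute(query):
        record = {k: v for k, v in zip(FIELDS, row) if v is not None}
        if "metadata" in record:
            record["metadata"] = json.loads(record["metadata"])
        out.append(record)
    return out

records = [
    {"mimetype": "audio/wav", "uri": "file://media/podcast.wav",
     "timestamp": "2024-01-15",
     "metadata": {"transcript": "Welcome to...", "duration": 45.2}},
    {"mimetype": "image/jpeg", "uri": "file://media/photo.jpg",
     "metadata": {"caption": "My talk at MIT"}},
]

round_tripped = from_db(to_db(records))
```

In this sketch an absent field becomes `NULL` and is dropped again on export, so optional fields survive the trip in both directions.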
Quick Start
pip install arkiv
# Point at a directory and query. arkiv.db is auto-created on demand.
arkiv query ./my-archive/ "SELECT content FROM records WHERE metadata->>'role' = 'user' LIMIT 5"
# Serve to any LLM via MCP
arkiv mcp ./my-archive/
# Explicit conversion (either direction)
arkiv convert conversations.jsonl archive.db # JSONL → database
arkiv convert archive.db ./exported/ # database → directory
arkiv convert archive.db 2024/ --since 2024-01-01 # temporal slice
arkiv convert archive.db archive.zip # pack for transport
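Since the database is plain SQLite, the same kind of query can also be run from Python's built-in `sqlite3` module. The sketch below uses a small in-memory table with made-up rows rather than a real arkiv database, and the three-column schema is an assumption. It uses `json_extract`, the function form of the `->>` operator in the query above, which also works on SQLite builds older than 3.38. The second query approximates a `--since` slice by comparing ISO 8601 timestamps as strings.

```python
import sqlite3

# Assumed, simplified schema with illustrative rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (content TEXT, timestamp TEXT, metadata TEXT)")
conn.executemany(
    "INSERT INTO records VALUES (?, ?, ?)",
    [
        ("I think the key insight is...", "2023-05-14T10:30:00Z",
         '{"role": "user"}'),
        ("Here is a summary.", "2023-05-14T10:31:00Z",
         '{"role": "assistant"}'),
        ("Follow-up question", "2024-02-01T09:00:00Z",
         '{"role": "user"}'),
    ],
)

# Filter on a metadata key, as in the Quick Start query.
user_rows = conn.execute(
    "SELECT content FROM records "
    "WHERE json_extract(metadata, '$.role') = ? "
    "ORDER BY rowid LIMIT 5",
    ("user",),
).fetchall()

# Temporal slice: ISO 8601 strings sort chronologically as text.
recent_rows = conn.execute(
    "SELECT content FROM records WHERE timestamp >= ? ORDER BY timestamp",
    ("2024-01-01",),
).fetchall()
```

String comparison only works as a date filter when timestamps are consistently ISO 8601 formatted, which the example records follow.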
MCP Tools
Read-only by default. Start with arkiv mcp --writable db to enable the write tool.
| Tool | Description | Mode |
|---|---|---|
| get_manifest() | What collections exist, their descriptions and schemas | read-only |
| get_schema(collection?) | What metadata keys can be queried | read-only |
| sql_query(query) | Run read-only SQL | read-only |
| write_record(...) | Append a single record to a collection | writable |
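One way a read-only SQL tool can be enforced is to let SQLite reject writes itself by opening the database with `mode=ro` in a URI. Whether arkiv does this is not stated here, so treat the following as a sketch of the general technique. The function name `sql_query_readonly` and the one-column table are hypothetical.

```python
import os
import sqlite3
import tempfile
from pathlib import Path

# Build a throwaway database with a normal read-write connection.
db_path = os.path.join(tempfile.mkdtemp(), "archive.db")
writer = sqlite3.connect(db_path)
writer.execute("CREATE TABLE records (content TEXT)")
writer.execute("INSERT INTO records VALUES ('hello')")
writer.commit()
writer.close()

def sql_query_readonly(path, query):
    """Hypothetical: run a query on a connection opened read-only."""
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()

# Reads succeed.
rows = sql_query_readonly(db_path, "SELECT content FROM records")

# Writes are refused by SQLite, not by string-matching the query.
try:
    sql_query_readonly(db_path, "DELETE FROM records")
    write_blocked = False
except sqlite3.OperationalError:
    write_blocked = True
```

Enforcing the restriction at the connection level avoids having to parse or filter the SQL text that an LLM sends.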
Why
- Your data lives in silos (ChatGPT, email, bookmarks, photos, voice memos)
- Source toolkits (memex, mtk, btk, ptk, ebk) export it as JSONL
- arkiv gives you one format, one database, one query interface
- Any LLM can query it via MCP
- JSONL is human-readable and durable. SQLite is the most deployed database in history.
Spec and philosophy
- SPEC.md: full technical specification
- docs/PHILOSOPHY.md: why arkiv exists and how it composes with longecho
File details
Details for the file arkiv-0.2.0.tar.gz.
File metadata
- Download URL: arkiv-0.2.0.tar.gz
- Upload date:
- Size: 97.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8d7b36f7255bdb6cd05102ff709c0930c981bea272c63b3d0cec8eef9e08f95f |
| MD5 | 96427568e7e8b9e6508bb10dfd0263cd |
| BLAKE2b-256 | 597e8a4cfca85663d40ca4cfe2fa852f07696201674041c53aec1d68882c0a86 |
File details
Details for the file arkiv-0.2.0-py3-none-any.whl.
File metadata
- Download URL: arkiv-0.2.0-py3-none-any.whl
- Upload date:
- Size: 26.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 294fe2a567f779c9f86d0abe8c8eab7ad5ed4f5e29c13d0eab95e084ac6cc4df |
| MD5 | 776d99c6707e4e0852cf9bc1a284c77a |
| BLAKE2b-256 | 9d1c17152728ef6e5d8413e987f2457174133aad3b524448ab957b40de6c76fa |