Arkiv

Universal capture → classify → route. Your AI-powered data pilot.

Arkiv takes any digital input — files, URLs, text — classifies it using AI, and routes it to the right destination. Think of it as an intelligent mail sorting facility for your digital life.

              ┌─────────┐
  File ──────►│         │──► Archiv/Rechnungen
  URL  ──────►│  ARKIV  │──► Leseliste/Artikel
  Text ──────►│         │──► Code/Snippets
  Mail ──────►│ classify │──► Review (uncertain)
              │  route   │──► Plugin: Webhook
              └─────────┘──► Plugin: Custom

Features

  • Universal Intake — Drop any file, paste text, pipe from stdin
  • AI Classification — LLM-powered content understanding (Ollama, OpenAI, Anthropic, HuggingFace)
  • Smart Routing — Category-based rules route items to folders, webhooks, or custom destinations
  • Full-Text Search — SQLite FTS5 search across all processed items
  • Plugin System — Extend with pip-installable plugins (powered by pluggy)
  • Local-First — Your data stays on your machine. No cloud required.
  • Filesystem Watcher — Auto-process files dropped into your inbox directory
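
As a rough illustration of what SQLite FTS5 provides for the Full-Text Search feature, the sketch below indexes two items in an in-memory database and queries them. The table name `items` and its columns are assumptions made for this example, not Arkiv's actual schema, and the code needs a Python build whose SQLite includes FTS5.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 index over (name, category, body) rows."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema; Arkiv's real tables may look different.
    conn.execute("CREATE VIRTUAL TABLE items USING fts5(name, category, body)")
    conn.executemany("INSERT INTO items VALUES (?, ?, ?)", rows)
    return conn


def search(conn, query):
    """Return the names of matching items, best match first (FTS5 rank)."""
    cur = conn.execute(
        "SELECT name FROM items WHERE items MATCH ? ORDER BY rank", (query,)
    )
    return [row[0] for row in cur]


conn = build_index([
    ("invoice.pdf", "rechnung", "Telekom invoice for March, total 39.95 EUR"),
    ("notes.txt", "artikel", "Reading notes on SQLite full-text search"),
])
print(search(conn, "invoice"))  # ['invoice.pdf']
```

The default tokenizer lowercases terms, so a query for "sqlite" also finds the second item.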

Quick Start

# Install
pipx install "kurier @ git+https://github.com/HerrStolzier/lotse.git"

# Make sure Ollama is running with a model
ollama pull qwen2.5:7b

# Start Arkiv (the command is named kurier)
kurier

That's it. kurier launches the interactive TUI where you can classify files, search, monitor your inbox, and more — all from one interface.

Alternative install methods:

# With pip (requires a virtual environment)
pip install "kurier @ git+https://github.com/HerrStolzier/lotse.git"

# With uv
uv pip install "kurier @ git+https://github.com/HerrStolzier/lotse.git"

CLI Commands

All features are also available as individual commands:

kurier                         # Interactive TUI (default)
kurier add invoice.pdf         # Classify and route a file
kurier watch                   # Auto-process files in inbox
kurier search "Rechnung"       # Hybrid keyword + semantic search
kurier status                  # Processing statistics
kurier undo                    # Undo last routing action
kurier export --format csv     # Export all items as CSV
kurier doctor                  # Check system health
kurier init                    # Interactive setup wizard

Configuration

Arkiv uses a TOML config file at ~/.config/arkiv/config.toml:

[llm]
provider = "ollama"
model = "qwen2.5:7b"
base_url = "http://localhost:11434"

[embeddings]
model = "BAAI/bge-small-en-v1.5"

[routes.archiv]
type = "folder"
path = "~/Documents/Arkiv/Archiv"
categories = ["rechnung", "vertrag", "brief"]
confidence_threshold = 0.7

[routes.artikel]
type = "folder"
path = "~/Documents/Arkiv/Artikel"
categories = ["artikel", "paper", "tutorial"]
confidence_threshold = 0.6

LLM Providers

Arkiv talks to language models through LiteLLM, so any provider that LiteLLM supports can be configured:

Provider         Config
Ollama (local)   provider = "ollama", model = "qwen2.5:7b"
OpenAI           provider = "openai", model = "gpt-4o-mini"
Anthropic        provider = "anthropic", model = "claude-sonnet-4-5-20250929"
HuggingFace      provider = "huggingface", model = "meta-llama/..."
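
LiteLLM addresses models with a "provider/model" string, so the provider and model settings above map onto it naturally. The helper below is a hypothetical sketch of that translation; `litellm_kwargs` is not part of Arkiv, and how Arkiv builds its LiteLLM call internally is not documented here.

```python
def litellm_kwargs(llm: dict) -> dict:
    """Translate an [llm] config table into keyword arguments for LiteLLM."""
    # LiteLLM selects the backend from the "provider/" prefix of the model name.
    kwargs = {"model": f"{llm['provider']}/{llm['model']}"}
    # A custom endpoint (for example a local Ollama server) goes in api_base.
    if "base_url" in llm:
        kwargs["api_base"] = llm["base_url"]
    return kwargs


llm_config = {
    "provider": "ollama",
    "model": "qwen2.5:7b",
    "base_url": "http://localhost:11434",
}
print(litellm_kwargs(llm_config))
```

With LiteLLM installed, the result could be passed along as litellm.completion(messages=[...], **litellm_kwargs(llm_config)).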

REST API

Start the API server for external integrations, webhooks, and mobile capture:

# Install API dependencies
uv pip install "kurier[api] @ git+https://github.com/HerrStolzier/lotse.git"

# Start the server
kurier serve
# → http://127.0.0.1:8790/docs (Swagger UI)

Endpoints:

Method  Endpoint       Description
GET     /health        Health check
POST    /ingest/file   Upload a file for classification
POST    /ingest/text   Submit text for classification
GET     /search?q=...  Hybrid keyword + semantic search
GET     /status        Processing statistics
GET     /recent        Recently processed items

# Example: ingest a file via curl
curl -X POST http://localhost:8790/ingest/file -F "file=@invoice.pdf"

# Example: search
curl "http://localhost:8790/search?q=Telefonkosten&mode=auto"
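
The same search request can be issued from Python with only the standard library. This is a small sketch rather than an official client: the helper names are made up, and it assumes the server answers with a JSON body, which this README does not specify.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8790"  # address used in the curl examples


def search_url(query: str, mode: str = "auto", base_url: str = BASE_URL) -> str:
    """Build the GET /search URL with properly escaped parameters."""
    return f"{base_url}/search?{urlencode({'q': query, 'mode': mode})}"


def search(query: str, mode: str = "auto", base_url: str = BASE_URL, timeout: float = 10.0):
    """Call GET /search and decode the reply (assumed to be JSON)."""
    with urlopen(search_url(query, mode, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the URL needs no running server.
print(search_url("Telefonkosten"))
# http://localhost:8790/search?q=Telefonkosten&mode=auto
```

Calling search("Telefonkosten") requires a running `kurier serve` instance.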

Plugins

Arkiv is built to be extended. Plugins can:

  • Pre-process content before classification
  • Post-process classification results
  • Add custom routes (Slack, Notion, webhooks, ...)
  • React to routing events (notifications, logging, ...)

Writing a Plugin

# my_arkiv_plugin.py
from arkiv.plugins.spec import hookimpl

@hookimpl
def on_routed(path: str, destination: str, route_name: str) -> None:
    """Send a notification when a file is routed."""
    print(f"Routed {path} → {destination}")

# pyproject.toml
[project.entry-points."arkiv.plugins"]
my-plugin = "my_arkiv_plugin"
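
As a variation on the `on_routed` hook, a plugin could keep an audit log instead of printing. The sketch below appends one JSON line per routed file. The log location, the `append_record` helper, and the import fallback are assumptions for illustration; the fallback only exists so the example can be run without Arkiv installed.

```python
import json
import time
from pathlib import Path

try:
    from arkiv.plugins.spec import hookimpl
except ImportError:
    # Fallback so this sketch also runs without Arkiv installed.
    def hookimpl(func):
        return func

# Hypothetical log location; any writable path works.
LOG_PATH = Path.home() / "arkiv-routed.jsonl"


def append_record(log_path: Path, path: str, destination: str, route_name: str) -> None:
    """Append a single JSON line describing one routing event."""
    record = {
        "time": time.time(),
        "path": path,
        "destination": destination,
        "route": route_name,
    }
    with log_path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


@hookimpl
def on_routed(path: str, destination: str, route_name: str) -> None:
    """Record every routed file in the audit log."""
    append_record(LOG_PATH, path, destination, route_name)
```

Such a module would be registered through the same "arkiv.plugins" entry-point group as the example above.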

See the Plugin Guide for details.

Architecture

src/arkiv/
├── cli.py              # Typer CLI interface
├── core/
│   ├── config.py       # TOML configuration
│   ├── classifier.py   # LLM classification engine
│   ├── router.py       # Route matching & execution
│   └── engine.py       # Main pipeline orchestrator
├── db/
│   └── store.py        # SQLite + FTS5 storage
├── inlets/
│   └── watch.py        # Filesystem watcher
├── plugins/
│   ├── spec.py         # Plugin hook specifications
│   └── manager.py      # Plugin discovery (pluggy)
└── routes/             # Built-in route handlers

Roadmap

  • Core pipeline: capture → classify → route
  • CLI interface
  • SQLite + FTS5 search
  • Plugin system (pluggy)
  • Filesystem watcher
  • REST API inlet (FastAPI with auto-docs)
  • Semantic search (FastEmbed + sqlite-vec hybrid search with RRF)
  • Web dashboard (HTMX + Tailwind, no build step)
  • Email inlet (IMAP fetch + .eml/.mbox import)
  • Webhook route plugin (Slack, Discord, generic)
  • OCR support (PyMuPDF + Tesseract)
  • Self-audit system (duplicates, misclassifications, orphaned files)
  • Interactive TUI (Textual)
  • Undo & Export commands
  • Retry logic with exponential backoff
  • Transaction safety (pending → routed/failed)
  • Custom categories via config
  • Browser extension
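
The semantic search item above mentions hybrid search with RRF (reciprocal rank fusion). For readers unfamiliar with the technique, here is a minimal sketch of how two ranked result lists can be fused. The function name and the constant k = 60 (a commonly used default) are assumptions; Arkiv's actual implementation and parameters may differ.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked ID lists: each appearance adds 1 / (k + rank), rank from 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically for stability.
    return sorted(scores, key=lambda item_id: (-scores[item_id], str(item_id)))


keyword_hits = ["invoice.pdf", "contract.pdf", "letter.pdf"]
semantic_hits = ["contract.pdf", "memo.txt", "invoice.pdf"]
print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
# ['contract.pdf', 'invoice.pdf', 'memo.txt', 'letter.pdf']
```

Items that rank well in both lists rise to the top, which is why the fusion needs only ranks and no score normalization between the keyword and vector backends.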

Development

# Clone
git clone https://github.com/HerrStolzier/lotse.git
cd lotse

# Create venv and install in development mode
uv venv && source .venv/bin/activate
uv pip install -e ".[dev,api,ocr,tui]"

# Run tests
pytest

# Lint + type check
ruff check src/
mypy src/

License

MIT — see LICENSE.
