Skip to main content

Universal capture → classify → route platform. Your AI-powered data pilot.

Project description

Arkiv

Universal capture → classify → route. Your AI-powered data pilot.

Arkiv takes any digital input — files, URLs, text — classifies it using AI, and routes it to the right destination. Think of it as an intelligent mail sorting facility for your digital life.

              ┌─────────┐
  File ──────►│         │──► Archiv/Rechnungen
  URL  ──────►│  ARKIV  │──► Leseliste/Artikel
  Text ──────►│         │──► Code/Snippets
  Mail ──────►│ classify │──► Review (unsicher)
              │  route   │──► Plugin: Webhook
              └─────────┘──► Plugin: Custom

Features

  • Universal Intake — Drop any file, paste text, pipe from stdin
  • AI Classification — LLM-powered content understanding (Ollama, OpenAI, Anthropic, HuggingFace)
  • Smart Routing — Category-based rules route items to folders, webhooks, or custom destinations
  • Full-Text Search — SQLite FTS5 search across all processed items
  • Plugin System — Extend with pip-installable plugins (powered by pluggy)
  • Local-First — Your data stays on your machine. No cloud required.
  • Filesystem Watcher — Auto-process files dropped into your inbox directory

Quick Start

# Install
pipx install "kurier @ git+https://github.com/HerrStolzier/lotse.git"

# Make sure Ollama is running with a model
ollama pull qwen2.5:7b

# Start Arkiv
arkiv

That's it. kurier launches the interactive TUI where you can classify files, search, monitor your inbox, and more — all from one interface.

Alternative install methods:

# With pip (requires a virtual environment)
pip install "kurier @ git+https://github.com/HerrStolzier/lotse.git"

# With uv
uv pip install "kurier @ git+https://github.com/HerrStolzier/lotse.git"

CLI Commands

All features are also available as individual commands:

kurier                         # Interactive TUI (default)
kurier add invoice.pdf         # Classify and route a file
kurier watch                   # Auto-process files in inbox
kurier search "Rechnung"       # Hybrid keyword + semantic search
kurier status                  # Processing statistics
kurier undo                    # Undo last routing action
kurier export --format csv     # Export all items as CSV
kurier doctor                  # Check system health
kurier init                    # Interactive setup wizard

Configuration

Arkiv uses a TOML config file at ~/.config/arkiv/config.toml:

[llm]
provider = "ollama"
model = "mistral"
base_url = "http://localhost:11434"

[embeddings]
model = "BAAI/bge-small-en-v1.5"

[routes.archiv]
type = "folder"
path = "~/Documents/Arkiv/Archiv"
categories = ["rechnung", "vertrag", "brief"]
confidence_threshold = 0.7

[routes.artikel]
type = "folder"
path = "~/Documents/Arkiv/Artikel"
categories = ["artikel", "paper", "tutorial"]
confidence_threshold = 0.6

LLM Providers

Arkiv supports any LLM provider via LiteLLM:

Provider Config
Ollama (local) provider = "ollama", model = "qwen2.5:7b"
OpenAI provider = "openai", model = "gpt-4o-mini"
Anthropic provider = "anthropic", model = "claude-sonnet-4-5-20250514"
HuggingFace provider = "huggingface", model = "meta-llama/..."

REST API

Start the API server for external integrations, webhooks, and mobile capture:

# Install API dependencies
uv pip install "kurier[api] @ git+https://github.com/HerrStolzier/lotse.git"

# Start the server
kurier serve
# → http://127.0.0.1:8790/docs (Swagger UI)

Endpoints:

Method Endpoint Description
GET /health Health check
POST /ingest/file Upload a file for classification
POST /ingest/text Submit text for classification
GET /search?q=... Hybrid keyword + semantic search
GET /status Processing statistics
GET /recent Recently processed items
# Example: ingest a file via curl
curl -X POST http://localhost:8790/ingest/file -F "file=@invoice.pdf"

# Example: search
curl "http://localhost:8790/search?q=Telefonkosten&mode=auto"

Plugins

Arkiv is built to be extended. Plugins can:

  • Pre-process content before classification
  • Post-process classification results
  • Add custom routes (Slack, Notion, webhooks, ...)
  • React to routing events (notifications, logging, ...)

Writing a Plugin

# my_arkiv_plugin.py
from arkiv.plugins.spec import hookimpl

@hookimpl
def on_routed(path: str, destination: str, route_name: str) -> None:
    """Send a notification when a file is routed."""
    print(f"Routed {path}{destination}")
# pyproject.toml
[project.entry-points."arkiv.plugins"]
my-plugin = "my_arkiv_plugin"

See the Plugin Guide for details.

Architecture

src/arkiv/
├── cli.py              # Typer CLI interface
├── core/
│   ├── config.py       # TOML configuration
│   ├── classifier.py   # LLM classification engine
│   ├── router.py       # Route matching & execution
│   └── engine.py       # Main pipeline orchestrator
├── db/
│   └── store.py        # SQLite + FTS5 storage
├── inlets/
│   └── watch.py        # Filesystem watcher
├── plugins/
│   ├── spec.py         # Plugin hook specifications
│   └── manager.py      # Plugin discovery (pluggy)
└── routes/             # Built-in route handlers

Roadmap

  • Core pipeline: capture → classify → route
  • CLI interface
  • SQLite + FTS5 search
  • Plugin system (pluggy)
  • Filesystem watcher
  • REST API inlet (FastAPI with auto-docs)
  • Semantic search (FastEmbed + sqlite-vec hybrid search with RRF)
  • Web dashboard (HTMX + Tailwind, no build step)
  • Email inlet (IMAP fetch + .eml/.mbox import)
  • Webhook route plugin (Slack, Discord, generic)
  • OCR support (PyMuPDF + Tesseract)
  • Self-audit system (duplicates, misclassifications, orphaned files)
  • Interactive TUI (Textual)
  • Undo & Export commands
  • Retry logic with exponential backoff
  • Transaction safety (pending → routed/failed)
  • Custom categories via config
  • Browser extension

Development

# Clone
git clone https://github.com/HerrStolzier/lotse.git
cd lotse

# Create venv and install in development mode
uv venv && source .venv/bin/activate
uv pip install -e ".[dev,api,ocr,tui]"

# Run tests
pytest

# Lint + type check
ruff check src/
mypy src/

License

MIT — see LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

kurier-0.2.0.tar.gz (300.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

kurier-0.2.0-py3-none-any.whl (83.5 kB view details)

Uploaded Python 3

File details

Details for the file kurier-0.2.0.tar.gz.

File metadata

  • Download URL: kurier-0.2.0.tar.gz
  • Upload date:
  • Size: 300.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.12 {"installer":{"name":"uv","version":"0.10.12","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for kurier-0.2.0.tar.gz
Algorithm Hash digest
SHA256 9626b5f78cd0cca7a60777389595cd968c4f140b69f980425e92935096c43a5b
MD5 6e8c86bafc85ffcd1cd5cf510681302a
BLAKE2b-256 06b95b3281a5f19b3d3dd10738881cf7a586226869c9677a928ddebee18bc347

See more details on using hashes here.

File details

Details for the file kurier-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: kurier-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 83.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.12 {"installer":{"name":"uv","version":"0.10.12","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"macOS","version":null,"id":null,"libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for kurier-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 560bc4cee46063d7b9aa5d029431dac82c15923392148f623b771780e3262c62
MD5 280ca83e706e96fabc12608a62e06328
BLAKE2b-256 8b8e87f5424d5b604d3332df2f6e717aa10a33b6b6869e1dac7c9ef8efd02694

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page