Arkiv

Universal capture → classify → route. Your AI-powered data pilot.

Arkiv takes any digital input — files, URLs, text — classifies it using AI, and routes it to the right destination. Think of it as an intelligent mail sorting facility for your digital life.

              ┌──────────┐
  File ──────►│          │──► Archiv/Rechnungen
  URL  ──────►│  ARKIV   │──► Leseliste/Artikel
  Text ──────►│          │──► Code/Snippets
  Mail ──────►│ classify │──► Review (uncertain)
              │  route   │──► Plugin: Webhook
              └──────────┘──► Plugin: Custom

Features

  • Universal Intake — Drop any file, paste text, pipe from stdin
  • AI Classification — LLM-powered content understanding (Ollama, OpenAI, Anthropic, HuggingFace)
  • Smart Routing — Category-based rules route items to folders, webhooks, or custom destinations
  • Full-Text Search — SQLite FTS5 search across all processed items
  • Plugin System — Extend with pip-installable plugins (powered by pluggy)
  • Local-First — Your data stays on your machine. No cloud required.
  • Filesystem Watcher — Auto-process files dropped into your inbox directory
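
To give a feel for the Full-Text Search feature, here is a minimal sketch of an FTS5 index built with Python's standard sqlite3 module. The table name `items` and its columns `path` and `content` are hypothetical and chosen for illustration; Arkiv's actual schema may look different. It also assumes your SQLite build includes the FTS5 extension, which most Python distributions do.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 index from (path, content) pairs."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one FTS5 virtual table with two indexed columns.
    conn.execute("CREATE VIRTUAL TABLE items USING fts5(path, content)")
    conn.executemany("INSERT INTO items (path, content) VALUES (?, ?)", rows)
    return conn


def search(conn, query, limit=10):
    """Return matching paths, best match first (FTS5's built-in rank)."""
    cur = conn.execute(
        "SELECT path FROM items WHERE items MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in cur]


conn = build_index([
    ("invoice.pdf", "Rechnung Telekom Januar"),
    ("notes.txt", "tutorial on sqlite full-text search"),
])
print(search(conn, "Rechnung"))
```

The default FTS5 tokenizer is case-insensitive, so a query for "rechnung" would find the same document.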

Quick Start

# Install
pipx install "kurier @ git+https://github.com/HerrStolzier/lotse.git"

# Make sure Ollama is running with a model
ollama pull qwen2.5:7b

# Start Arkiv
kurier

That's it. kurier launches the interactive TUI where you can classify files, search, monitor your inbox, and more — all from one interface.

Alternative install methods:

# With pip (requires a virtual environment)
pip install "kurier @ git+https://github.com/HerrStolzier/lotse.git"

# With uv
uv pip install "kurier @ git+https://github.com/HerrStolzier/lotse.git"

CLI Commands

All features are also available as individual commands:

kurier                         # Interactive TUI (default)
kurier add invoice.pdf         # Classify and route a file
kurier watch                   # Auto-process files in inbox
kurier search "Rechnung"       # Hybrid keyword + semantic search
kurier status                  # Processing statistics
kurier undo                    # Undo last routing action
kurier export --format csv     # Export all items as CSV
kurier doctor                  # Check system health
kurier init                    # Interactive setup wizard

Configuration

Kurier uses a TOML config file at ~/.config/kurier/config.toml:

[llm]
provider = "ollama"
model = "mistral"
base_url = "http://localhost:11434"

[embeddings]
model = "BAAI/bge-small-en-v1.5"

[routes.archiv]
type = "folder"
path = "~/Documents/Arkiv/Archiv"
categories = ["rechnung", "vertrag", "brief"]
confidence_threshold = 0.7

[routes.artikel]
type = "folder"
path = "~/Documents/Arkiv/Artikel"
categories = ["artikel", "paper", "tutorial"]
confidence_threshold = 0.6

LLM Providers

Arkiv supports any LLM provider via LiteLLM:

Provider        Config
Ollama (local)  provider = "ollama", model = "qwen2.5:7b"
OpenAI          provider = "openai", model = "gpt-4o-mini"
Anthropic       provider = "anthropic", model = "claude-sonnet-4-5-20250514"
HuggingFace     provider = "huggingface", model = "meta-llama/..."
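
LiteLLM addresses models with a "provider/model" string and accepts an `api_base` argument for self-hosted endpoints. The sketch below shows how an `[llm]` config table could be translated into keyword arguments for `litellm.completion()`. The helper `litellm_kwargs` is hypothetical, and how Arkiv performs this mapping internally is an assumption.

```python
def litellm_kwargs(llm_config):
    """Build keyword arguments for litellm.completion() from an [llm] table."""
    # LiteLLM model identifiers take the form "<provider>/<model>".
    kwargs = {"model": f'{llm_config["provider"]}/{llm_config["model"]}'}
    # base_url is optional; hosted providers usually do not need it.
    if "base_url" in llm_config:
        kwargs["api_base"] = llm_config["base_url"]
    return kwargs


local = {
    "provider": "ollama",
    "model": "qwen2.5:7b",
    "base_url": "http://localhost:11434",
}
print(litellm_kwargs(local))
```

With LiteLLM installed, the result could be passed along as `litellm.completion(messages=[...], **litellm_kwargs(local))`.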

REST API

Start the API server for external integrations, webhooks, and mobile capture:

# Install API dependencies
uv pip install "kurier[api] @ git+https://github.com/HerrStolzier/lotse.git"

# Start the server
kurier serve
# → http://127.0.0.1:8790/docs (Swagger UI)

Endpoints:

Method  Endpoint       Description
GET     /health        Health check
POST    /ingest/file   Upload a file for classification
POST    /ingest/text   Submit text for classification
GET     /search?q=...  Hybrid keyword + semantic search
GET     /status        Processing statistics
GET     /recent        Recently processed items

# Example: ingest a file via curl
curl -X POST http://localhost:8790/ingest/file -F "file=@invoice.pdf"

# Example: search
curl "http://localhost:8790/search?q=Telefonkosten&mode=auto"
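
The same search call can be made from Python using only the standard library. This is a minimal sketch that mirrors the curl example above; it assumes the server answers with JSON (likely for a FastAPI app, but not confirmed here), and the helper names `search_url` and `search` are hypothetical.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://127.0.0.1:8790"  # default address shown by `kurier serve`


def search_url(query, mode="auto", base_url=BASE_URL):
    """Build the /search URL with a safely encoded query string."""
    params = urllib.parse.urlencode({"q": query, "mode": mode})
    return f"{base_url}/search?{params}"


def search(query, mode="auto", base_url=BASE_URL):
    """Call the search endpoint; assumes the response body is JSON."""
    with urllib.request.urlopen(search_url(query, mode, base_url), timeout=10) as resp:
        return json.load(resp)


print(search_url("Telefonkosten"))
```

Building the URL with `urlencode` keeps queries containing spaces or umlauts valid without manual escaping.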

Plugins

Arkiv is built to be extended. Plugins can:

  • Pre-process content before classification
  • Post-process classification results
  • Add custom routes (Slack, Notion, webhooks, ...)
  • React to routing events (notifications, logging, ...)

Writing a Plugin

# my_arkiv_plugin.py
from arkiv.plugins.spec import hookimpl

@hookimpl
def on_routed(path: str, destination: str, route_name: str) -> None:
    """Send a notification when a file is routed."""
    print(f"Routed {path} → {destination}")

# pyproject.toml
[project.entry-points."arkiv.plugins"]
my-plugin = "my_arkiv_plugin"

See the Plugin Guide for details.

Architecture

src/arkiv/
├── cli.py              # Typer CLI interface
├── core/
│   ├── config.py       # TOML configuration
│   ├── classifier.py   # LLM classification engine
│   ├── router.py       # Route matching & execution
│   └── engine.py       # Main pipeline orchestrator
├── db/
│   └── store.py        # SQLite + FTS5 storage
├── inlets/
│   └── watch.py        # Filesystem watcher
├── plugins/
│   ├── spec.py         # Plugin hook specifications
│   └── manager.py      # Plugin discovery (pluggy)
└── routes/             # Built-in route handlers

Roadmap

  • Core pipeline: capture → classify → route
  • CLI interface
  • SQLite + FTS5 search
  • Plugin system (pluggy)
  • Filesystem watcher
  • REST API inlet (FastAPI with auto-docs)
  • Semantic search (FastEmbed + sqlite-vec hybrid search with RRF)
  • Web dashboard (HTMX + Tailwind, no build step)
  • Email inlet (IMAP fetch + .eml/.mbox import)
  • Webhook route plugin (Slack, Discord, generic)
  • OCR support (PyMuPDF + Tesseract)
  • Self-audit system (duplicates, misclassifications, orphaned files)
  • Interactive TUI (Textual)
  • Undo & Export commands
  • Retry logic with exponential backoff
  • Transaction safety (pending → routed/failed)
  • Custom categories via config
  • Browser extension
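
The roadmap mentions hybrid search with RRF (reciprocal rank fusion), a way to merge a keyword ranking and a semantic ranking without comparing their raw scores. The sketch below shows the general technique, not Arkiv's implementation: each item earns 1 / (k + rank) from every list it appears in, and the sums decide the final order. The constant k = 60 is a common default and an assumption here, as is the alphabetical tie-break.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of item IDs into one ranking."""
    scores = {}
    for ranking in rankings:
        # Ranks start at 1, so the top hit contributes 1 / (k + 1).
        for rank, item in enumerate(ranking, start=1):
            scores[item] = scores.get(item, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties are broken alphabetically.
    return sorted(scores, key=lambda item: (-scores[item], item))


keyword_hits = ["a", "b", "c"]   # e.g. from a full-text query
semantic_hits = ["b", "d", "a"]  # e.g. from a vector similarity query
print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
```

Items found by both searches rise to the top, while items found by only one still appear further down.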

Development

# Clone
git clone https://github.com/HerrStolzier/lotse.git
cd lotse

# Create venv and install in development mode
uv venv && source .venv/bin/activate
uv pip install -e ".[dev,api,ocr,tui]"

# Run tests
pytest

# Lint + type check
ruff check src/
mypy src/

License

MIT — see LICENSE.
