
Universal capture → classify → route platform. Your AI-powered data pilot.


Arkiv

Universal capture → classify → route. Your AI-powered data pilot.

Arkiv takes any digital input — files, URLs, text — classifies it using AI, and routes it to the right destination. Think of it as an intelligent mail sorting facility for your digital life.

              ┌──────────┐
  File ──────►│          │──► Archiv/Rechnungen
  URL  ──────►│  ARKIV   │──► Leseliste/Artikel
  Text ──────►│          │──► Code/Snippets
  Mail ──────►│ classify │──► Review (uncertain)
              │  route   │──► Plugin: Webhook
              └──────────┘──► Plugin: Custom

Features

  • Universal Intake — Drop any file, paste text, pipe from stdin
  • AI Classification — LLM-powered content understanding (Ollama, OpenAI, Anthropic, HuggingFace)
  • Smart Routing — Category-based rules route items to folders, webhooks, or custom destinations
  • Full-Text Search — SQLite FTS5 search across all processed items
  • Plugin System — Extend with pip-installable plugins (powered by pluggy)
  • Local-First — Your data stays on your machine. No cloud required.
  • Filesystem Watcher — Auto-process files dropped into your inbox directory
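
As an illustration of the Full-Text Search feature, here is a minimal sketch of an FTS5 index built with Python's standard sqlite3 module. The table name `items`, its columns, and the helper names are assumptions made for this example; Arkiv's actual schema is not shown in this README. It also assumes your SQLite build includes the FTS5 extension.

```python
import sqlite3


def build_index(rows):
    """Create an in-memory FTS5 index from (name, body) pairs."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: one virtual table with two indexed text columns.
    conn.execute("CREATE VIRTUAL TABLE items USING fts5(name, body)")
    conn.executemany("INSERT INTO items (name, body) VALUES (?, ?)", rows)
    return conn


def search(conn, query, limit=10):
    """Return matching item names, best match first (FTS5 built-in ranking)."""
    cur = conn.execute(
        "SELECT name FROM items WHERE items MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [name for (name,) in cur]


conn = build_index([
    ("invoice.pdf", "Rechnung Telefonkosten March"),
    ("notes.txt", "reading list for the weekend"),
])
print(search(conn, "Rechnung"))
```

The query string is passed to FTS5 as-is, so input containing FTS5 operators or punctuation may need quoting before it reaches `MATCH`.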

Quick Start

# Install
pipx install "kurier @ git+https://github.com/HerrStolzier/lotse.git"

# Make sure Ollama is running with a model
ollama pull qwen2.5:7b

# Start Arkiv
kurier

That's it. kurier launches the interactive TUI where you can classify files, search, monitor your inbox, and more — all from one interface.

Alternative install methods:

# With pip (requires a virtual environment)
pip install "kurier @ git+https://github.com/HerrStolzier/lotse.git"

# With uv
uv pip install "kurier @ git+https://github.com/HerrStolzier/lotse.git"

CLI Commands

All features are also available as individual commands:

kurier                         # Interactive TUI (default)
kurier add invoice.pdf         # Classify and route a file
kurier watch                   # Auto-process files in inbox
kurier search "Rechnung"       # Hybrid keyword + semantic search
kurier status                  # Processing statistics
kurier undo                    # Undo last routing action
kurier export --format csv     # Export all items as CSV
kurier doctor                  # Check system health
kurier init                    # Interactive setup wizard
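
The search command is described as hybrid keyword + semantic search, and the roadmap names RRF (reciprocal rank fusion) as the way the two result lists are combined. The sketch below shows the general RRF technique only; the function name, the constant `k = 60` (a commonly used default), and the tie-breaking rule are assumptions, not Arkiv's actual implementation.

```python
def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of item ids with reciprocal rank fusion.

    Each item scores 1 / (k + position) for every list it appears in,
    where position starts at 1. Higher total score ranks first.
    """
    scores = {}
    for ranking in rankings:
        for position, item_id in enumerate(ranking, start=1):
            scores[item_id] = scores.get(item_id, 0.0) + 1.0 / (k + position)
    # Sort by descending score; break ties by id so the output is stable.
    return sorted(scores, key=lambda item_id: (-scores[item_id], str(item_id)))


keyword_hits = ["a", "b", "c"]   # e.g. from a full-text index
semantic_hits = ["b", "d", "a"]  # e.g. from a vector index
print(rrf_fuse([keyword_hits, semantic_hits]))
```

Items found by both searches ("a" and "b") rise above items found by only one, which is the main reason to fuse ranks rather than compare raw scores from two different systems.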

Configuration

Arkiv uses a TOML config file at ~/.config/arkiv/config.toml:

[llm]
provider = "ollama"
model = "qwen2.5:7b"
base_url = "http://localhost:11434"

[embeddings]
model = "BAAI/bge-small-en-v1.5"

[routes.archiv]
type = "folder"
path = "~/Documents/Arkiv/Archiv"
categories = ["rechnung", "vertrag", "brief"]
confidence_threshold = 0.7

[routes.artikel]
type = "folder"
path = "~/Documents/Arkiv/Artikel"
categories = ["artikel", "paper", "tutorial"]
confidence_threshold = 0.6
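
To make the role of `categories` and `confidence_threshold` concrete, here is a minimal sketch of how a route could be selected from the two routes configured above. The `Route` class and `pick_route` function are hypothetical names for this example. The rule itself (first route that lists the category and whose threshold is met, otherwise no route) is an assumption based on the config keys and on the Review destination in the diagram, not a description of Arkiv's router.

```python
from dataclasses import dataclass


@dataclass
class Route:
    name: str
    path: str
    categories: list
    confidence_threshold: float


# Mirrors the [routes.archiv] and [routes.artikel] tables from the config.
ROUTES = [
    Route("archiv", "~/Documents/Arkiv/Archiv",
          ["rechnung", "vertrag", "brief"], 0.7),
    Route("artikel", "~/Documents/Arkiv/Artikel",
          ["artikel", "paper", "tutorial"], 0.6),
]


def pick_route(category, confidence, routes=ROUTES):
    """Return the first matching route, or None if the item needs review."""
    for route in routes:
        if category in route.categories and confidence >= route.confidence_threshold:
            return route
    return None  # assumed fallback: leave the item for manual review


match = pick_route("rechnung", 0.9)
print(match.name if match else "review")
```

Under this rule, raising a route's threshold sends more borderline items to review instead of filing them automatically.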

LLM Providers

Arkiv can use any LLM provider supported by LiteLLM:

Provider         Config
Ollama (local)   provider = "ollama", model = "qwen2.5:7b"
OpenAI           provider = "openai", model = "gpt-4o-mini"
Anthropic        provider = "anthropic", model = "claude-sonnet-4-5-20250514"
HuggingFace      provider = "huggingface", model = "meta-llama/..."

REST API

Start the API server for external integrations, webhooks, and mobile capture:

# Install API dependencies
uv pip install "kurier[api] @ git+https://github.com/HerrStolzier/lotse.git"

# Start the server
kurier serve
# → http://127.0.0.1:8790/docs (Swagger UI)

Endpoints:

Method   Endpoint        Description
GET      /health         Health check
POST     /ingest/file    Upload a file for classification
POST     /ingest/text    Submit text for classification
GET      /search?q=...   Hybrid keyword + semantic search
GET      /status         Processing statistics
GET      /recent         Recently processed items

# Example: ingest a file via curl
curl -X POST http://localhost:8790/ingest/file -F "file=@invoice.pdf"

# Example: search
curl "http://localhost:8790/search?q=Telefonkosten&mode=auto"
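
The same search request can be made from Python using only the standard library. This is a sketch under assumptions: the `q` and `mode` parameters and the port come from the curl example above, but the helper names are made up for this example and the response is assumed to be JSON, which this README does not confirm.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8790"  # address used in the curl examples


def build_search_url(query, mode="auto", base_url=BASE_URL):
    """Build the GET /search URL with properly escaped query parameters."""
    return f"{base_url}/search?{urlencode({'q': query, 'mode': mode})}"


def search(query, mode="auto", base_url=BASE_URL, timeout=10):
    """Call GET /search and decode the body, assuming the server returns JSON."""
    url = build_search_url(query, mode, base_url)
    with urlopen(url, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Printing the URL needs no running server; search() does.
    print(build_search_url("Telefonkosten"))
```

Building the URL with `urlencode` keeps queries with spaces or umlauts valid, which a hand-formatted string would not.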

Plugins

Arkiv is built to be extended. Plugins can:

  • Pre-process content before classification
  • Post-process classification results
  • Add custom routes (Slack, Notion, webhooks, ...)
  • React to routing events (notifications, logging, ...)

Writing a Plugin

# my_arkiv_plugin.py
from arkiv.plugins.spec import hookimpl

@hookimpl
def on_routed(path: str, destination: str, route_name: str) -> None:
    """Send a notification when a file is routed."""
    print(f"Routed {path} → {destination}")

# pyproject.toml
[project.entry-points."arkiv.plugins"]
my-plugin = "my_arkiv_plugin"
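
As a slightly fuller variation on the plugin above, the sketch below appends each routing event to a JSON Lines file instead of printing it. It reuses the `on_routed` signature from the example. The `MY_PLUGIN_LOG` environment variable, the `format_event` helper, and the no-op fallback for `hookimpl` are assumptions added so the module can be imported and tested without Arkiv installed; none of them are Arkiv features.

```python
# my_logging_plugin.py (hypothetical module name)
import json
import os
from pathlib import Path

try:
    from arkiv.plugins.spec import hookimpl  # marker used in the example above
except ImportError:
    def hookimpl(func):
        """No-op stand-in so this module also imports without Arkiv."""
        return func

# Hypothetical setting: where events are appended. Not an Arkiv option.
LOG_PATH = Path(os.environ.get("MY_PLUGIN_LOG", "routed.jsonl"))


def format_event(path, destination, route_name):
    """Serialize one routing event as a single JSON line (without newline)."""
    return json.dumps(
        {"path": path, "destination": destination, "route": route_name}
    )


@hookimpl
def on_routed(path: str, destination: str, route_name: str) -> None:
    """Append a record for every routed file."""
    with LOG_PATH.open("a", encoding="utf-8") as fh:
        fh.write(format_event(path, destination, route_name) + "\n")
```

Keeping the formatting in a plain function separate from the hook makes the plugin's behavior easy to unit test without triggering a real routing run.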

See the Plugin Guide for details.

Architecture

src/arkiv/
├── cli.py              # Typer CLI interface
├── core/
│   ├── config.py       # TOML configuration
│   ├── classifier.py   # LLM classification engine
│   ├── router.py       # Route matching & execution
│   └── engine.py       # Main pipeline orchestrator
├── db/
│   └── store.py        # SQLite + FTS5 storage
├── inlets/
│   └── watch.py        # Filesystem watcher
├── plugins/
│   ├── spec.py         # Plugin hook specifications
│   └── manager.py      # Plugin discovery (pluggy)
└── routes/             # Built-in route handlers

Roadmap

  • Core pipeline: capture → classify → route
  • CLI interface
  • SQLite + FTS5 search
  • Plugin system (pluggy)
  • Filesystem watcher
  • REST API inlet (FastAPI with auto-docs)
  • Semantic search (FastEmbed + sqlite-vec hybrid search with RRF)
  • Web dashboard (HTMX + Tailwind, no build step)
  • Email inlet (IMAP fetch + .eml/.mbox import)
  • Webhook route plugin (Slack, Discord, generic)
  • OCR support (PyMuPDF + Tesseract)
  • Self-audit system (duplicates, misclassifications, orphaned files)
  • Interactive TUI (Textual)
  • Undo & Export commands
  • Retry logic with exponential backoff
  • Transaction safety (pending → routed/failed)
  • Custom categories via config
  • Browser extension

Development

# Clone
git clone https://github.com/HerrStolzier/lotse.git
cd lotse

# Create venv and install in development mode
uv venv && source .venv/bin/activate
uv pip install -e ".[dev,api,ocr,tui]"

# Run tests
pytest

# Lint + type check
ruff check src/
mypy src/

License

MIT — see LICENSE.

