Local-first file scanner and labeler for AI agent governance. Dutch-first, GDPR-aware.
Filenthropist
The only local-first tool that scans your files for Dutch PII, labels them for AI agent access, and stores only redacted previews — so your governance tool never becomes a data breach.
Filenthropist is a file scanner and labeler built for Dutch SMEs navigating GDPR. It runs entirely on your machine, never sends data to the cloud, and gives AI agents a single API endpoint to check before touching any file.
Why Filenthropist?
- Local-only processing: files never leave your machine. No cloud APIs, no data uploads, no telemetry.
- Dutch-first PII detection: built for BSN, IBAN, Dutch addresses, KvK numbers, and NL-specific identifiers. Not a US-centric tool with a language pack bolted on.
- Redacted previews only: the database stores redacted text, not raw PII. Even if someone accesses the SQLite file, they get [NAAM] and [BSN], never the personal data.
- AI agent gateway: one API call (/api/can-access) tells any agent whether a file is safe to use.
- GDPR Article 30 RoPA export: generates a verwerkingsregister directly from your scanned files, with Dutch legal bases pre-filled per document type.
- Advisory-only: Filenthropist labels and advises. It never deletes, modifies, or moves your files.
System requirements
- Python 3.11 or newer
- Tesseract 5 with the Dutch language pack (needed for OCR on scanned PDFs and images)
- About 1 GB of free disk space for a PII NER model (downloaded on first run)
Install Tesseract
macOS (Homebrew):
brew install tesseract tesseract-lang
Debian / Ubuntu:
sudo apt install tesseract-ocr tesseract-ocr-nld
For a step-by-step guide covering Python, Tesseract, Windows, and common errors, see docs/INSTALL.md.
Install
We recommend pipx so Filenthropist lives in its own isolated environment:
pipx install "filenthropist[all]"
The base install is intentionally minimal. Pick the extras that match what you need:
pipx install filenthropist # CLI only, no OCR/NER/web
pipx install "filenthropist[ocr]" # + Tesseract OCR
pipx install "filenthropist[ner]" # + Dutch/multilingual PII NER
pipx install "filenthropist[web]" # + local dashboard
pipx install "filenthropist[all]" # everything (recommended)
Quick start
Four commands take you from a fresh install to a usable dashboard:
filenthropist doctor # verify environment
filenthropist init # interactive model picker
filenthropist scan ~/Documents # run first scan
filenthropist serve # open web dashboard
- doctor: checks the Python version, the Tesseract install (including the Dutch pack), and that the config directory is writable, then reports anything missing so you can fix it before scanning.
- init: walks you through selecting a PII NER model based on your language and priority (speed, balanced, accuracy), then downloads the model and writes your config.
- scan <path>: walks the directory, extracts text (including OCR for scanned PDFs), classifies each document, detects PII, and stores redacted labels in ~/.filenthropist/filenthropist.db.
- serve: starts the local dashboard at http://localhost:8080 for reviewing labels and making retention decisions.
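The kind of environment checks that doctor performs can be approximated in a few lines of Python. The sketch below is not Filenthropist's own implementation: every function name is hypothetical, and it assumes the config directory is ~/.filenthropist (where the database lives) and that the Dutch pack appears under Tesseract's language code nld.

```python
from __future__ import annotations

import os
import shutil
import subprocess
import sys
from pathlib import Path


def python_ok(version=sys.version_info) -> bool:
    """True when the interpreter is Python 3.11 or newer."""
    return tuple(version[:2]) >= (3, 11)


def parse_tesseract_langs(output: str) -> set[str]:
    """Parse `tesseract --list-langs` output; the first line is a header."""
    return {line.strip() for line in output.splitlines()[1:] if line.strip()}


def tesseract_langs() -> set[str] | None:
    """Return installed language codes, or None if tesseract is not on PATH."""
    exe = shutil.which("tesseract")
    if exe is None:
        return None
    result = subprocess.run(
        [exe, "--list-langs"], capture_output=True, text=True, check=False
    )
    # Older Tesseract releases print the list on stderr instead of stdout.
    return parse_tesseract_langs(result.stdout or result.stderr)


def config_dir_writable(path: Path) -> bool:
    """Check the directory itself, or its parent if it does not exist yet."""
    probe = path if path.exists() else path.parent
    return os.access(probe, os.W_OK)


def run_checks() -> dict[str, bool]:
    langs = tesseract_langs()
    return {
        "python >= 3.11": python_ok(),
        "tesseract installed": langs is not None,
        "dutch pack (nld)": langs is not None and "nld" in langs,
        # Assumed location, inferred from the database path.
        "config dir writable": config_dir_writable(Path.home() / ".filenthropist"),
    }


if __name__ == "__main__":
    for name, ok in run_checks().items():
        print(f"{'OK  ' if ok else 'FAIL'} {name}")
```

If any line reports FAIL, fix that dependency before running a scan; filenthropist doctor remains the authoritative check.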
For three persona-based walkthroughs (freelancer, SME, multilingual org), see docs/QUICKSTART.md.
Choosing a PII model
Filenthropist supports multiple PII NER models so you can trade off language coverage, accuracy, and speed.
- Recommended for Dutch documents: LokaalHub/nl-lokaal-middel, with F1 0.84 on the ai4privacy Dutch validation set. It detects Dutch names, addresses, BSN-in-context, IBAN, phone, email, and more. A faster, smaller sibling, LokaalHub/nl-lokaal-klein (F1 0.78, ~180 MB), is the default for laptops and the combined provider.
- Multilingual and English-only options are available for mixed-language or English-only corpora.
- Speed vs. accuracy: each model is tagged with a priority tier (fast, balanced, accuracy) so you can match it to your hardware.
Browse and pick interactively:
filenthropist init # wizard: language + priority
filenthropist models list # show every model in the registry
filenthropist models info <model-id> # details for one model
Full decision tree and registry docs: docs/MODELS.md.
Scanning & querying
# Show all files with sensitive PII
filenthropist query --access-level sensitive_restricted
# Export all labels as JSON
filenthropist export --format json --output labels.json
# Export a GDPR Article 30 verwerkingsregister
filenthropist ropa --format csv --output verwerkingsregister.csv
The web dashboard (filenthropist serve) exposes the same data plus a review workflow for non-technical users.
AI agents integrate via a local HTTP API — GET /api/can-access?path=... returns an allow/deny decision, and GET /api/redacted?path=... returns the document text with PII replaced by type labels ([NAAM], [BSN], [IBAN]). See docs/AGENT_INTEGRATION.md for the full endpoint list and integration patterns.
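An agent-side check against the can-access endpoint might look like the following sketch. It uses only the Python standard library. Two things here are assumptions rather than documented behaviour: that the API is served on the same address as the dashboard (http://localhost:8080), and that the response is a JSON object with a boolean field named allowed. That field name is hypothetical; docs/AGENT_INTEGRATION.md describes the real response format.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the API shares the dashboard address printed by `filenthropist serve`.
BASE_URL = "http://localhost:8080"


def can_access_url(path: str, base_url: str = BASE_URL) -> str:
    """Build the GET /api/can-access URL with the file path safely encoded."""
    query = urllib.parse.urlencode({"path": path})
    return f"{base_url}/api/can-access?{query}"


def parse_decision(body: bytes) -> bool:
    """Interpret the response body.

    Assumes a hypothetical schema {"allowed": true|false}. Anything that
    does not match is treated as a deny, so the agent fails closed.
    """
    try:
        payload = json.loads(body)
    except ValueError:
        return False
    return isinstance(payload, dict) and payload.get("allowed") is True


def can_access(path: str, base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Ask the local service whether `path` may be used; deny on any error."""
    try:
        with urllib.request.urlopen(can_access_url(path, base_url), timeout=timeout) as resp:
            return parse_decision(resp.read())
    except OSError:  # covers connection errors, timeouts, and HTTP error statuses
        return False


if __name__ == "__main__":
    target = "/home/user/Documents/contract.pdf"  # hypothetical file
    print("allowed" if can_access(target) else "denied")
```

Failing closed is a deliberate choice in this sketch: if the service is not running or the response is unreadable, the agent treats the file as off limits instead of guessing.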
Configuration
On first run, Filenthropist writes ~/.filenthropist/config.yaml. Edit it to tune scan behaviour, PII provider, and retention:
scan:
  ignore_patterns: [".git", "node_modules", "__pycache__", ".venv"]
  max_file_size_mb: 100
pii:
  provider: "combined" # "regex", "ner", "combined", "http", or "stub"
  ner_model_id: "LokaalHub/nl-lokaal-middel"
labeling:
  retention_policy_years: 2
classification:
  zeroshot_enabled: true
The combined provider runs regex detectors (BSN, IBAN, phone, email, postcode) alongside the NER model — structured PII that NER often misses is still caught.
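To illustrate why regex detectors suit structured identifiers, here is a minimal sketch of checksum-validated matching for BSN and Dutch IBAN. It is not Filenthropist's detector code; the function names and patterns are illustrative assumptions. The checks themselves are the standard ones: the BSN "elfproef" (weights 9 down to 2, then -1, with the sum divisible by 11) and the IBAN mod-97 test.

```python
import re

# Illustrative patterns; a real detector may also accept spaced or dotted forms.
IBAN_NL_RE = re.compile(r"\bNL\d{2}[A-Z]{4}\d{10}\b", flags=re.ASCII)
BSN_RE = re.compile(r"\b\d{9}\b", flags=re.ASCII)
BSN_WEIGHTS = (9, 8, 7, 6, 5, 4, 3, 2, -1)


def is_valid_bsn(digits: str) -> bool:
    """Elfproef: the weighted digit sum must be divisible by 11."""
    if len(digits) != 9 or not digits.isdigit() or digits == "000000000":
        return False
    total = sum(w * int(d) for w, d in zip(BSN_WEIGHTS, digits))
    return total % 11 == 0


def is_valid_iban(iban: str) -> bool:
    """Mod-97 check: move the first four characters to the end,
    map letters to numbers (A=10 ... Z=35), and expect remainder 1."""
    rearranged = iban[4:] + iban[:4]
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


def redact(text: str) -> str:
    """Replace checksum-valid IBANs and BSNs with type labels."""
    text = IBAN_NL_RE.sub(
        lambda m: "[IBAN]" if is_valid_iban(m.group()) else m.group(), text
    )
    text = BSN_RE.sub(
        lambda m: "[BSN]" if is_valid_bsn(m.group()) else m.group(), text
    )
    return text


if __name__ == "__main__":
    sample = "BSN 111222333, rekening NL91ABNA0417164300, ordernr 123456789"
    print(redact(sample))
    # prints: BSN [BSN], rekening [IBAN], ordernr 123456789
```

The checksum step keeps false positives down: the nine-digit order number in the sample fails the elfproef and is left untouched, while the valid BSN and IBAN are replaced.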
Privacy & security
- Fully local. No network calls except model downloads, which you can pre-cache for offline use.
- Advisory-only. Filenthropist never deletes, moves, or modifies your files.
- Redacted previews. The database stores redacted text, so compromising the DB does not leak raw PII.
For the threat model and hardening recommendations, see SECURITY.md.
License
MIT