Audit PEA-eligibility of ETF KID documents with a vision LLM. French PEA (Plan d'Épargne en Actions) rules built in.
pea-audit
Audit French PEA (Plan d'Épargne en Actions) eligibility of ETFs by reading their KID (Key Information Document) with a vision LLM. Tells you whether a fund is actually eligible for a French PEA account — with verbatim citations from the document.
What is a PEA? France's tax-sheltered stock account (€150k contribution cap; gains are exempt from income tax after 5 years, though social levies still apply). It only accepts EU/EEA-domiciled equities, or UCITS funds that synthetically replicate non-EU indexes (S&P 500, MSCI World, Nasdaq, …) via a swap on an EU-equity basket. Physical-replication funds of non-EU indexes (most iShares Core / Vanguard ETFs) don't qualify. This library tells you which side of that line your fund is on.
What's in this repo? Two things:
- pea-audit: the library you pip install (lives in pea_audit/).
- ETFTracker: a reference app that consumes it (Streamlit dashboard + CLI + FastAPI at the repo root, plus etftracker/ helper code).
Most of this README is about the library; see ETFTracker.md for the app side (French).
Not a developer? Three options
The rest of this README is for Python devs adopting the library. If you just want to check your own PEA holdings:
- Run the dashboard locally: git clone, cp .env.example .env (add your Ollama key), then docker compose up -d web and open http://localhost:8502. Point-and-click verdicts; no code.
- Upload via the HTTP API: docker compose up -d api, then POST /audit/upload with a PDF (Swagger docs at http://localhost:8080/docs).
- Hire a dev or use a managed service: honestly the most realistic option for non-technical PEA holders. The library exists so a hosted version of this is buildable in a weekend.
Streamlit "Portefeuille" tab — 4 holdings with live yfinance prices and PEA-eligibility badges (✅ from the audit cache).
$ python audit_cli.py samples/amundi_pea_monde_kid.pdf
📄 Audit de : samples/amundi_pea_monde_kid.pdf
✅ ÉLIGIBLE PEA (confiance : high)
Émetteur : Amundi
ISIN : FR001400U5Q4
Indice : MSCI World Index EUR
Réplication : synthetic_swap
Le fonds est éligible au PEA car il utilise une réplication synthétique
via swap (IFT) avec un panier d'actions européennes ≥75%.
Preuves :
p.1 — « Le Fonds est éligible au Plan d'Épargne en Actions français (PEA) ... »
p.1 — « La performance sera échangée contre celle de l'Indice de Référence ... »
Why
PEA eligibility is opaque and can change without notice: issuers re-domicile funds, change swap counterparties, switch to ESG-screened variants, and rename funds (e.g. Amundi PEA Nasdaq-100 silently became "Amundi PEA US Tech Screened" under the same ticker). Brokers don't always flag this. pea-audit reads each fund's KID directly and tells you what the document actually says, with quotes you can verify.
Install
pip install pea-audit
Optional extras:
pip install 'pea-audit[observability]' # adds Langfuse for LLM tracing
pip install 'pea-audit[evals]' # adds pyyaml for the eval suite
pip install 'pea-audit[dev]' # everything above + python-dotenv
Quickstart
Get an Ollama Cloud key at https://ollama.com/settings/keys, then:
from pathlib import Path
from pea_audit import audit_pdf, VerdictCache
from pea_audit.llm import OllamaCloudClient
# Ollama Cloud keys look like "<32-hex-char id>.<24-char secret>"
# (not "sk-..." — that's the OpenAI format)
llm = OllamaCloudClient(api_key="abcdef0123456789abcdef0123456789.EXAMPLE-KEY-DO-NOT-USE")
# Cache is opt-in. Library never writes to disk unless you supply one.
cache = VerdictCache(Path("./cache"))
verdict = audit_pdf("path/to/kid.pdf", llm=llm, cache=cache)
print(verdict.eligible) # "yes" | "no" | "uncertain"
print(verdict.replication) # "physical" | "synthetic_swap" | "unknown"
print(verdict.isin) # deterministic — extracted from PDF text + Luhn-validated
for c in verdict.evidence:
    print(f" p.{c.page}: « {c.quote} »")
Don't have a KID PDF handy? The repo ships samples/amundi_pea_monde_kid.pdf — clone or download it to try the example end-to-end on a real (PEA-eligible) Amundi fund.
Audit by ticker (built-in URL registry)
from pathlib import Path
from pea_audit import audit_ticker, VerdictCache
from pea_audit.llm import OllamaCloudClient
llm = OllamaCloudClient(api_key="<your-ollama-cloud-key>")
cache = VerdictCache(Path("./cache"))
result = audit_ticker("EWLD.PA", llm=llm, kid_dir=Path("./kids"), cache=cache)
print(result.verdict.eligible) # "yes"
Built-ins ship for the most common French ETFs (Amundi PEA range, BNP Paribas Easy). Add more:
from pea_audit.sources import register_source, KIDSource
register_source(KIDSource(
    ticker="LYX.PA",
    isin="FR0010411884",
    url="https://www.lyxoretf.fr/.../kid.pdf",
    issuer="Lyxor",
))
Architecture
flowchart LR
A[KID PDF] --> B[pypdfium2<br/>rasterize pages]
A --> C[pypdfium2<br/>text layer]
C --> D[ISIN regex<br/>+ Luhn check]
B --> E[VisionLLM<br/>analyze_images]
D -.-> E
E --> F[PeaVerdict<br/>eligible / replication<br/>isin / evidence]
F --> G[VerdictCache<br/>sha256-keyed]
G --> H[Your app:<br/>CLI / Streamlit / FastAPI / …]
style E fill:#dbeafe,stroke:#1e40af
style D fill:#dcfce7,stroke:#166534
style F fill:#fef3c7,stroke:#854d0e
The LLM judges what the document says; deterministic regex + Luhn reconciles the ISIN string (vision is fuzzy on alphanumerics). The cache is opt-in — pass cache=None for a stateless library.
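The deterministic half of that split can be sketched in a few lines. This is a minimal illustration, not the library's actual code: the names `ISIN_RE`, `isin_is_valid`, and `extract_isins` are hypothetical, and it assumes the PDF text layer has already been extracted to a plain string.

```python
import re

# Shape of an ISIN: 2-letter country code, 9 alphanumerics, 1 check digit.
ISIN_RE = re.compile(r"\b[A-Z]{2}[A-Z0-9]{9}[0-9]\b")


def isin_is_valid(isin: str) -> bool:
    """Return True if the string has ISIN shape and passes the Luhn check."""
    if not ISIN_RE.fullmatch(isin):
        return False
    # Letters expand to two digits (A=10 ... Z=35); digits stay as they are.
    digits = "".join(str(int(ch, 36)) for ch in isin)
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def extract_isins(text: str) -> list[str]:
    """Return the distinct, Luhn-valid ISIN candidates found in the text."""
    found: list[str] = []
    for candidate in ISIN_RE.findall(text):
        if isin_is_valid(candidate) and candidate not in found:
            found.append(candidate)
    return found


print(extract_isins("ISIN : FR0010411884 / FR0010411885"))
```

Only the first candidate survives, because changing the last digit breaks the checksum. A single misread character in the vision output fails the same check, which is what makes the text-layer value a reasonable tie-breaker.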
Two protocols make it extensible without forking:
VisionLLM — swap the model
from typing import Any, Protocol

class VisionLLM(Protocol):
    def analyze_images(
        self,
        images: list[bytes],
        prompt: str,
        schema: dict[str, Any],
        system: str | None = None,
    ) -> dict[str, Any]: ...
The default OllamaCloudClient wraps Gemma 4 via Ollama Cloud with tenacity retries on transient errors and optional Langfuse tracing. Anyone can implement this protocol to plug in Claude vision, GPT-4o, Gemini, a local Ollama instance, etc.
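Because the protocol is structural, any object with a matching `analyze_images` method should satisfy it, with no inheritance needed. The sketch below shows a canned test double. `CannedVisionLLM` is a hypothetical name, the response keys are assumed for illustration, and the protocol is re-declared locally (with `Optional[str]` in place of `str | None`) so the snippet runs without the library installed.

```python
from typing import Any, Optional, Protocol, runtime_checkable


@runtime_checkable
class VisionLLM(Protocol):
    """Local copy of the protocol, so this sketch runs standalone."""

    def analyze_images(
        self,
        images: list[bytes],
        prompt: str,
        schema: dict[str, Any],
        system: Optional[str] = None,
    ) -> dict[str, Any]: ...


class CannedVisionLLM:
    """Hypothetical test double: returns a fixed answer and records each call."""

    def __init__(self, response: dict[str, Any]) -> None:
        self.response = response
        self.calls: list[int] = []  # number of page images seen per call

    def analyze_images(self, images, prompt, schema, system=None):
        self.calls.append(len(images))
        return dict(self.response)  # copy, so callers cannot mutate the canned answer


fake = CannedVisionLLM({"eligible": "yes", "replication": "synthetic_swap"})
result = fake.analyze_images([b"page-1", b"page-2"], prompt="...", schema={})
print(isinstance(fake, VisionLLM), result, fake.calls)
```

A double like this can stand in wherever the examples above pass `llm=`, which keeps unit tests offline and free. A real backend would send the images to its provider and return JSON conforming to `schema`.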
KIDSource — add issuers
from pea_audit.sources import register_source, KIDSource, get_source, all_sources
A registry of ticker → KID URL mappings. It ships built-ins for Amundi (URL pattern) and BNP Paribas (per-fund UUIDs). URL helpers for BlackRock/iShares and Vanguard are importable but don't auto-register: most of their funds are PEA-ineligible, so they exist for testing the negative path.
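For readers who want a mental model of such a registry, here is a minimal sketch. It is not the library's implementation: `SourceSketch`, `register`, and `lookup` are hypothetical names, the case-insensitive key is a choice made for this example, and the URL is a placeholder.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SourceSketch:
    """Hypothetical stand-in for KIDSource, with the four fields used above."""

    ticker: str
    isin: str
    url: str
    issuer: str


_REGISTRY: dict[str, SourceSketch] = {}


def register(source: SourceSketch) -> None:
    """Add or overwrite the entry for a ticker (keys are upper-cased)."""
    _REGISTRY[source.ticker.upper()] = source


def lookup(ticker: str) -> Optional[SourceSketch]:
    """Return the registered source for a ticker, or None if unknown."""
    return _REGISTRY.get(ticker.upper())


register(SourceSketch("demo.pa", "FR0010411884", "https://example.com/kid.pdf", "Example"))
print(lookup("DEMO.PA"))
print(lookup("MISSING"))
```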
Eval baseline
The repo ships 13 regression cases under evals/cases/*.yaml — 7 PEA-eligible synthetic-swap, 6 ineligible physical non-EEA — covering Amundi, BNP, BlackRock/iShares, Vanguard. Current baseline on Gemma 4 31b-cloud: 13/13 (100%). Run before any prompt or model change:
python evals/run.py
What does it cost?
Default backend is Gemma 4 31b-cloud via Ollama Cloud:
| Operation | Approx. cost | Notes |
|---|---|---|
| One audit (cold cache) | ~$0.02 | 1 PDF, ~3 pages, vision model |
| One audit (cache hit) | $0 | sha256 lookup, no LLM call |
| Full eval suite (13 cases, cold) | ~$0.25 | Once per prompt/model change |
| Monthly portfolio re-audit (4 funds, force-refresh) | ~$0.10 | One scheduled run per month |
If you bring your own LLM via the VisionLLM protocol (Claude vision, GPT-4o, local Ollama, …), substitute that provider's per-image pricing — the library doesn't add overhead beyond one call per audit.
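The $0 cache-hit row in the table comes from content-addressed lookup: an unchanged PDF maps to the same key, so no model call is needed. A minimal sketch of that idea follows, assuming the key is the SHA-256 of the PDF bytes and the verdict is stored as JSON; the README does not spell out either detail, and `Sha256CacheSketch` is a hypothetical name, not the library's `VerdictCache`.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Optional


class Sha256CacheSketch:
    """Hypothetical on-disk cache keyed by the SHA-256 of the PDF bytes."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def _path_for(self, pdf_bytes: bytes) -> Path:
        key = hashlib.sha256(pdf_bytes).hexdigest()
        return self.root / f"{key}.json"

    def get(self, pdf_bytes: bytes) -> Optional[dict]:
        """Return the stored verdict, or None on a cache miss."""
        path = self._path_for(pdf_bytes)
        if not path.exists():
            return None
        return json.loads(path.read_text(encoding="utf-8"))

    def put(self, pdf_bytes: bytes, verdict: dict) -> None:
        """Store a verdict under the hash of the document it was derived from."""
        self._path_for(pdf_bytes).write_text(json.dumps(verdict), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    cache = Sha256CacheSketch(Path(tmp))
    pdf = b"%PDF-1.7 fake bytes"
    miss = cache.get(pdf)                    # nothing stored yet
    cache.put(pdf, {"eligible": "yes"})
    hit = cache.get(pdf)                     # same bytes, same key
    changed = cache.get(pdf + b" revised")   # any byte change is a miss
```

Keying on content rather than filename means a re-issued KID with different bytes is audited afresh, even when it is saved under the same name.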
Production niceties
- Retries on transient errors: tenacity with exponential backoff (1s → 4s → 16s), only on network/timeout/5xx (not on 4xx or schema errors that won't self-resolve).
- Optional observability: Langfuse traces per LLM call (model, input/output, tokens, latency). Activates when LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY are set; silent no-op otherwise.
- Deterministic ISINs: vision misreads of the 12-char ISIN string are corrected by regex-extracting candidates from the PDF text layer and validating with the Luhn check digit.
- Versioned prompts: pea_audit/prompts/audit_v{N}.md files, selected via the prompt_version= parameter; rollback is a config change, not a code edit.
- Hard vs soft fields in diffs: compare_verdicts() defaults to comparing only categorical fields (eligible, replication, isin), so the monthly re-audit doesn't false-fire on LLM rephrasing of free-text issuer/index names.
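The retry policy in the first bullet can be illustrated without any dependency. The sketch below uses only the standard library rather than tenacity. `call_with_backoff` and `TRANSIENT` are hypothetical names, and it assumes the 1s → 4s → 16s schedule means up to three waits before a final attempt.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Exception types treated as transient in this sketch (an assumption).
TRANSIENT = (TimeoutError, ConnectionError)


def call_with_backoff(
    fn: Callable[[], T],
    delays: tuple[float, ...] = (1.0, 4.0, 16.0),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on transient errors after each delay in turn."""
    for delay in delays:
        try:
            return fn()
        except TRANSIENT:
            sleep(delay)  # wait, then try again
    return fn()  # final attempt: any error now propagates to the caller


attempts: list[int] = []
waits: list[float] = []


def flaky() -> str:
    """Fail twice with a simulated timeout, then succeed."""
    attempts.append(1)
    if len(attempts) < 3:
        raise TimeoutError("simulated timeout")
    return "ok"


# Inject waits.append as the sleep function so the demo runs instantly.
result = call_with_backoff(flaky, sleep=waits.append)
print(result, waits)
```

Anything outside `TRANSIENT`, such as a validation error, is not caught and surfaces on the first attempt, mirroring the "not on 4xx or schema errors" rule.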
Reference app: ETFTracker
The repo also ships a personal-tool app that consumes the library: a French ETF portfolio tracker with a Streamlit dashboard, monthly re-audit cron, FastAPI service, and Docker compose deployment. See ETFTracker.md (French) for that side.
To run it: cp positions.csv.example positions.csv, edit with your own holdings, cp .env.example .env with your Ollama key, then docker compose up -d web or streamlit run dashboard.py.
Contributing
See CONTRIBUTING.md. Maintainer? See PUBLISHING.md.
License
MIT.
Disclaimer
This is a personal-finance tool. The LLM-judged eligibility verdict is informational, not regulatory advice — always cross-check against the actual DIC/KID before buying.