finamt
An agentic Python library for extracting structured data from receipts and invoices and preparing essential German tax return statements.
Features
- German Tax Alignment — Category taxonomy and VAT handling aligned with German fiscal practice for managing receipts
- Local-First — Everything runs completely offline, with data stored in a local database
- 4-Agent Pipeline — Sequential specialised agents for metadata, counterparty, amounts, and line items; short focused prompts for reliable local model performance
- Web UI — Full browser interface for uploading, reviewing, editing, and
Tech Stack
Backend
Python — package language
FastAPI — backend for the web UI
PaddleOCR — OCR for scanned PDFs
Tesseract — OCR for scanned PDFs and images when PaddleOCR fails or times out
Ollama — local LLMs for structured extraction of information from receipts and invoices
Qwen – laptop-compatible LLMs; qwen2.5:7b-instruct-q4_K_M is currently the preferred default for text-based extraction
SQLite – local database for original receipts and extracted data
Frontend
React — interactive frontend
Vite — fast dev server and production bundler
Tailwind CSS — utility-first styling
TypeScript — type-safe component and API code
CLI
Typer — CLI with coloured progress output
Packaging
PyPI — distributed as an installable Python package
Installation
pip install finamt
For CLI usage, installing via pipx is recommended. It places finamt in its own dedicated virtual environment, so its dependencies never interfere with your other projects, while still exposing the finamt command globally without requiring you to activate a virtualenv:
pipx install finamt
System Requirements
- Python 3.10+
- Ollama running locally with a supported model pulled
- Tesseract OCR (optional fallback when PaddleOCR times out)
Ollama
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh
# Pull a model — qwen2.5 7B is the recommended default
ollama pull qwen2.5:7b-instruct-q4_K_M
Other models that work well: qwen3:8b, llama3.2, llama3.1.
Tesseract OCR (optional fallback from PaddleOCR)
Ubuntu / Debian
sudo apt-get install tesseract-ocr tesseract-ocr-deu
macOS
brew install tesseract tesseract-lang
Windows
Download the installer from https://github.com/UB-Mannheim/tesseract/wiki and add it to your PATH.
Quick Start
Interactive UI
finamt serve
This starts the interactive UI, where you can upload receipts and manage tax statements.
Python API
Process a single receipt (expense)
from finamt import FinanceAgent
agent = FinanceAgent()
result = agent.process_receipt("receipt.pdf")
if result.success:
    data = result.data
    print(f"Counterparty: {data.vendor}")
    print(f"Date: {data.receipt_date}")
    print(f"Total: {data.total_amount} EUR")
    print(f"VAT: {data.vat_percentage}% ({data.vat_amount} EUR)")
    print(f"Net: {data.net_amount} EUR")
    print(f"Category: {data.category}")
    print(f"Items: {len(data.items)}")
    # Serialise to JSON
    with open("extracted.json", "w", encoding="utf-8") as f:
        f.write(data.to_json())
else:
    print(f"Extraction failed: {result.error_message}")
Sale invoices (outgoing)
result = agent.process_receipt("invoice_to_client.pdf", receipt_type="sale")
Batch processing
from pathlib import Path
from finamt import FinanceAgent
agent = FinanceAgent()
results = agent.batch_process(list(Path("receipts/").glob("*.pdf")))
for path, result in results.items():
    if result.success:
        print(f"{path}: {result.data.total_amount} EUR")
    else:
        print(f"{path}: ERROR — {result.error_message}")
Configuration
Settings are read in priority order from: environment variables → .env file → built-in defaults.
# .env
# OCR and general settings
FINAMT_OLLAMA_BASE_URL=http://localhost:11434
FINAMT_OCR_LANGUAGE=german
FINAMT_OCR_TIMEOUT=60
FINAMT_TESSERACT_CMD=tesseract
FINAMT_OCR_PREPROCESS=true
FINAMT_PDF_DPI=150
# Extraction agents — all 4 agents use this model
FINAMT_AGENT_MODEL=qwen2.5:7b-instruct-q4_K_M
FINAMT_AGENT_TIMEOUT=60
FINAMT_AGENT_NUM_CTX=4096
FINAMT_AGENT_MAX_RETRIES=2
You can also pass config objects directly:
from finamt import FinanceAgent
from finamt.agents.config import Config, AgentsConfig
agent = FinanceAgent(
    config=Config(ocr_language="deu+eng", pdf_dpi=150),
    agents_cfg=AgentsConfig(agent_model="qwen3:8b"),
)
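The priority order described in this section (environment variables, then the .env file, then built-in defaults) can be sketched in a few lines. This is only an illustration, not finamt's actual loader: `parse_dotenv`, `resolve_setting`, and the `DEFAULTS` dict are hypothetical names, and the .env parsing is deliberately naive (no quoting or `export` handling).

```python
from __future__ import annotations

import os

# Hypothetical defaults for illustration; values taken from the .env example.
DEFAULTS = {
    "FINAMT_OCR_TIMEOUT": "60",
    "FINAMT_PDF_DPI": "150",
}


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve_setting(name: str, dotenv: dict[str, str], environ=None) -> str | None:
    """Return the first match: environment variable, then .env, then default."""
    environ = os.environ if environ is None else environ
    if name in environ:
        return environ[name]
    if name in dotenv:
        return dotenv[name]
    return DEFAULTS.get(name)


dotenv = parse_dotenv("# .env\nFINAMT_PDF_DPI=200\n")
fake_env = {"FINAMT_OCR_TIMEOUT": "30"}  # stands in for os.environ

print(resolve_setting("FINAMT_OCR_TIMEOUT", dotenv, fake_env))  # environment wins
print(resolve_setting("FINAMT_PDF_DPI", dotenv, fake_env))      # .env beats the default
```

Passing the environment as a parameter keeps the sketch testable without touching the real process environment.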
API Reference
FinanceAgent
class FinanceAgent:
    def __init__(
        self,
        config: Config | None = None,
        db_path: str | Path | None = "~/.finamt/default/finamt.db",
        agents_cfg: AgentsConfig | None = None,
    ) -> None: ...
    def process_receipt(
        self,
        pdf_path: str | Path | bytes,
        receipt_type: str = "purchase",  # "purchase" or "sale"
    ) -> ExtractionResult: ...
    def batch_process(
        self,
        pdf_paths: list[str | Path],
        receipt_type: str = "purchase",
    ) -> dict[str, ExtractionResult]: ...
ExtractionResult
Always check success before accessing data.
@dataclass
class ExtractionResult:
    success: bool
    data: ReceiptData | None
    error_message: str | None
    duplicate: bool  # True if already in the database
    existing_id: str | None  # ID of the original if duplicate
    processing_time: float | None  # seconds
    def to_dict(self) -> dict: ...
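A caller will usually want to branch on the three outcomes these fields allow: failure, duplicate, and newly stored receipt. The sketch below uses a stand-in dataclass (`ResultStub`, a hypothetical name) that mirrors the fields listed here, so it runs without finamt installed. Whether `success` is set on a duplicate is not stated in this README, so the sketch checks `duplicate` first as an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ResultStub:
    """Stand-in mirroring the ExtractionResult fields; not the real class."""
    success: bool
    data: object | None = None
    error_message: str | None = None
    duplicate: bool = False
    existing_id: str | None = None
    processing_time: float | None = None


def describe(result: ResultStub) -> str:
    """Summarise one result in a single line."""
    if result.duplicate:  # assumption: checked before success
        return f"duplicate of {result.existing_id}"
    if not result.success:
        return f"failed: {result.error_message}"
    return "new receipt stored"


print(describe(ResultStub(success=True, duplicate=True, existing_id="abc123")))
print(describe(ResultStub(success=False, error_message="OCR produced no text")))
print(describe(ResultStub(success=True)))
```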
ReceiptData
@dataclass
class ReceiptData:
    id: str  # SHA-256 of OCR text — stable dedup key
    receipt_type: ReceiptType  # "purchase" or "sale"
    counterparty: Counterparty | None  # vendor (purchase) or client (sale)
    receipt_number: str | None
    receipt_date: datetime | None
    total_amount: Decimal | None
    currency: str  # defaults to "EUR"
    vat_percentage: Decimal | None  # e.g. Decimal("19.0")
    vat_amount: Decimal | None
    net_amount: Decimal | None  # computed: total - vat
    category: ReceiptCategory
    items: list[ReceiptItem]
    vat_splits: list[dict]  # for mixed-rate invoices
    vendor: str | None  # alias for counterparty.name
    def to_dict(self) -> dict: ...
    def to_json(self) -> str: ...
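The `id` field is described as a SHA-256 of the OCR text. A content-derived key of that kind can be sketched with the standard library as follows. The function name `receipt_id`, the UTF-8 encoding, and the hex-digest form are assumptions for illustration; the README does not say whether finamt normalises the text before hashing.

```python
import hashlib


def receipt_id(ocr_text: str) -> str:
    """Derive a deterministic ID from the OCR text (illustrative only)."""
    return hashlib.sha256(ocr_text.encode("utf-8")).hexdigest()


first = receipt_id("Rechnung Nr. 42\nSumme 119,00 EUR")
second = receipt_id("Rechnung Nr. 42\nSumme 119,00 EUR")
other = receipt_id("Rechnung Nr. 43\nSumme 119,00 EUR")

print(first == second)  # identical text gives the same ID
print(first == other)   # any change in the text gives a different ID
```

One consequence of hashing the text rather than the file: two scans of the same receipt only share an ID if OCR produces exactly the same text for both.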
Counterparty
@dataclass
class Counterparty:
    id: str  # UUID assigned by the database
    name: str | None
    vat_id: str | None  # EU format, e.g. DE123456789
    tax_number: str | None  # German Steuernummer, e.g. 123/456/78901
    address: Address
    verified: bool  # manually confirmed in the UI
ReceiptItem
@dataclass
class ReceiptItem:
    position: int | None
    description: str
    quantity: Decimal | None
    unit_price: Decimal | None
    total_price: Decimal | None
    vat_rate: Decimal | None
    vat_amount: Decimal | None
    category: ReceiptCategory
    def to_dict(self) -> dict: ...
ReceiptCategory
A validated string subclass. Invalid values are silently normalised to "other".
from finamt.agents.prompts import RECEIPT_CATEGORIES # list[str]
from finamt.models import ReceiptCategory
cat = ReceiptCategory("software") # valid
cat = ReceiptCategory("unknown_value") # normalised to "other"
cat = ReceiptCategory.other() # explicit fallback
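For readers curious how a validated string subclass with silent normalisation can work, here is a minimal sketch. It is not finamt's implementation: `CategorySketch` and `VALID` are hypothetical names, the category set is a small subset, and the strip/lower-case step is an added assumption.

```python
VALID = frozenset({"software", "travel", "office", "other"})  # subset for illustration


class CategorySketch(str):
    """A str subclass that maps unknown values to "other" on construction."""

    def __new__(cls, value: str = "other"):
        normalised = str(value).strip().lower()  # assumption: case-insensitive input
        if normalised not in VALID:
            normalised = "other"
        return super().__new__(cls, normalised)

    @classmethod
    def other(cls) -> "CategorySketch":
        """Explicit fallback constructor."""
        return cls("other")


print(CategorySketch("software"))
print(CategorySketch("unknown_value"))
print(isinstance(CategorySketch.other(), str))
```

Because the instance is still a `str`, it compares equal to plain strings and serialises to JSON without extra handling, which is the main appeal of this pattern over an `Enum`.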
Exceptions
All exceptions inherit from FinanceAgentError.
| Exception | Raised when |
|---|---|
| OCRProcessingError | PDF cannot be opened or text extraction fails |
| LLMExtractionError | Ollama is unreachable or returns invalid JSON after all retries |
| InvalidReceiptError | Extracted data fails business-logic validation |
from finamt.exceptions import FinanceAgentError, OCRProcessingError
try:
    result = agent.process_receipt("scan.pdf")
except OCRProcessingError as e:
    print(e)
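The README does not spell out which business-logic checks trigger InvalidReceiptError. One plausible check, given that net is computed as total minus VAT, is that the VAT amount matches the stated percentage of the net. The sketch below illustrates that idea only; `InvalidReceiptSketch`, `check_amounts`, and the two-cent tolerance are hypothetical.

```python
from decimal import Decimal


class InvalidReceiptSketch(ValueError):
    """Hypothetical stand-in for finamt's InvalidReceiptError."""


def check_amounts(total, vat_percentage, vat_amount, tolerance=Decimal("0.02")):
    """Return the net amount, or raise if the VAT figures disagree."""
    net = total - vat_amount  # same rule as ReceiptData.net_amount: total - vat
    expected_vat = (net * vat_percentage / Decimal("100")).quantize(Decimal("0.01"))
    if abs(expected_vat - vat_amount) > tolerance:
        raise InvalidReceiptSketch(
            f"VAT {vat_amount} does not match {vat_percentage}% of net {net}"
        )
    return net


# 119.00 gross at 19% VAT: net 100.00, VAT 19.00
print(check_amounts(Decimal("119.00"), Decimal("19.0"), Decimal("19.00")))

try:
    check_amounts(Decimal("119.00"), Decimal("19.0"), Decimal("7.00"))
except InvalidReceiptSketch as exc:
    print(f"rejected: {exc}")
```

Using `Decimal` throughout avoids the rounding drift that binary floats would introduce into cent-level comparisons.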
Extraction Pipeline
Each receipt goes through four sequential LLM calls, each with a short focused prompt:
| Agent | Extracts |
|---|---|
| Agent 1 | Receipt number, date, category |
| Agent 2 | Counterparty name, VAT ID, Steuernummer, address |
| Agent 3 | Total amount, VAT percentage, VAT amount |
| Agent 4 | Line items (description, VAT rate, VAT amount, price) |
Results are merged in Python — no additional LLM validation step. Debug output for every agent (prompt, raw response, parsed JSON) is saved to ~/.finamt/debug/<receipt_id>/.
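The sequential run-and-merge flow can be sketched as below. The four functions are stubs that return fixed dictionaries in place of real LLM calls, and every name in the block (`metadata_agent`, `run_pipeline`, `PIPELINE`, and so on) is hypothetical; only the division of responsibilities follows the table above.

```python
from __future__ import annotations

from typing import Callable


# Each stub stands in for one LLM call with its own short prompt.
def metadata_agent(ocr_text: str) -> dict:
    return {"receipt_number": "R-2024-001", "category": "software"}


def counterparty_agent(ocr_text: str) -> dict:
    return {"counterparty": {"name": "Example GmbH"}}


def amounts_agent(ocr_text: str) -> dict:
    return {"total_amount": "119.00", "vat_percentage": "19.0", "vat_amount": "19.00"}


def items_agent(ocr_text: str) -> dict:
    return {"items": [{"description": "Licence", "total_price": "119.00"}]}


PIPELINE: list[Callable[[str], dict]] = [
    metadata_agent,
    counterparty_agent,
    amounts_agent,
    items_agent,
]


def run_pipeline(ocr_text: str, agents=PIPELINE) -> dict:
    """Run the agents in order and merge their outputs into one dict."""
    merged: dict = {}
    for agent in agents:
        merged.update(agent(ocr_text))  # plain Python merge, no extra LLM step
    return merged


receipt = run_pipeline("Rechnung Nr. R-2024-001")
print(sorted(receipt))
```

Because each agent owns a disjoint set of keys, a simple `dict.update` is enough here; overlapping keys would need an explicit precedence rule.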
Categories and Subcategories
Every receipt is tagged with a category and optional subcategory. Categories map directly to line items in the German ELSTER tax forms (EÜR / UStVA), so the correct totals land in the right fields without manual re-sorting.
| Category | Subcategories |
|---|---|
| services | freelance, consulting, legal, accounting, notary |
| products | physical_goods, digital_goods, merchandise, samples |
| material | consumables, raw_materials, packaging, merchandise |
| equipment | low_value_asset, computer, machinery, furniture, tools |
| software | subscriptions, pay_as_you_go, licenses, hosting, domains |
| licensing | software_licenses, media_licenses, other_ip |
| telecommunication | phone, internet, bundled |
| travel | transport, accommodation, meals, per_diem, incidental |
| car | fuel, parking, garage, repair, maintenance, insurance, leasing, rental |
| education | courses, books, conferences, certifications |
| utilities | electricity, heating, water, waste |
| insurance | liability, health, vehicle, property |
| financial | bank_fees, interest, loan_costs, payment_fees |
| office | rent, coworking, storage, cleaning, security |
| marketing | advertising, print_media, trade_fairs, sponsorship, gifts |
| donations | charitable, political, church |
| other | membership_fees, sundry |
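Once receipts carry a category, per-category totals are a simple aggregation. The sketch below assumes a list of `(category, net_amount)` pairs already pulled from extracted receipts; `totals_by_category` is a hypothetical helper, and the mapping from categories to specific form fields is deliberately left out.

```python
from collections import defaultdict
from decimal import Decimal


def totals_by_category(receipts):
    """Sum net amounts per category; receipts is an iterable of (category, net)."""
    totals = defaultdict(Decimal)  # Decimal() is zero, so new keys start at 0
    for category, net in receipts:
        totals[category] += net
    return dict(totals)


sample = [
    ("software", Decimal("9.99")),
    ("travel", Decimal("89.00")),
    ("software", Decimal("19.99")),
]

for category, total in sorted(totals_by_category(sample).items()):
    print(f"{category}: {total} EUR")
```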
TODO
- Receipt parsing
- Tax calculation engine
- ELSTER field mapper
- XML generator
- XSD validator
Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/my-change)
- Make your changes
- Run the test suite: pytest --cov=src --cov-report=term-missing
- Submit a pull request
License
MIT — see LICENSE for details.