An agentic Python library for extracting key information from receipts and preparing essential German tax return statements.

finamt

An agentic Python library for extracting structured data from receipts and invoices and preparing essential German tax return statements.

Features

German Tax Alignment — Category taxonomy and VAT handling aligned with German fiscal practice
Local-First — Everything runs completely offline; models are auto-downloaded from HuggingFace and cached locally, data is stored in a local database
4-Agent Pipeline — Sequential specialised agents for metadata, counterparty, amounts, and line items; short focused prompts for reliable local model performance
Web UI — Full browser interface for uploading, reviewing, editing, and managing receipts and invoices and preparing tax returns

Tech Stack

Backend

Python — package language
FastAPI — backend for the web UI
PaddleOCR — OCR for scanned PDFs
Tesseract — OCR for scanned PDFs and images when PaddleOCR fails or times out
HuggingFace Hub — models downloaded and cached automatically on first use; no separate server required
- Mistral – mistral:7b is the recommended default (maps to mlx-community/Mistral-7B-Instruct-v0.3-4bit on Apple Silicon)
- Qwen – qwen2.5:7b-instruct-q4_K_M is a good alternative (maps to mlx-community/Qwen2.5-7B-Instruct-4bit on Apple Silicon)
- mlx-lm — 4-bit inference on Apple Silicon (M-series) via the MLX framework; ~13 % faster than Ollama
- transformers — cross-platform inference fallback (Linux / Windows / Intel Mac)
SQLite – local database for original receipts and extracted data

Frontend

React — interactive frontend
Vite — fast dev server and production bundler
Tailwind CSS — utility-first styling
TypeScript — type-safe component and API code

CLI

Typer — CLI with coloured progress output

Packaging

PyPI — distributed as an installable Python package

Installation

pip install finamt

For CLI usage, install via pipx. It places finamt in its own virtual environment, so its dependencies never interfere with your other projects, and it exposes the finamt command globally without requiring you to activate a virtualenv:

pipx install finamt

Note for Python 3.14+ users: finamt does not yet run on Python 3.14. If your system Python is 3.14 or newer, use uv to install Python 3.13 and pass the resolved interpreter path to pipx:
uv python install 3.13
pipx install finamt --python $(uv python find 3.13)

System Requirements

Python 3.10+
Tesseract OCR (optional fallback when PaddleOCR times out)

No Ollama required. LLM models are downloaded automatically from HuggingFace on first use (~4 GB per model) and cached at ~/.cache/huggingface/hub. On Apple Silicon the mlx-lm backend is used; on other platforms transformers is used.

Tesseract OCR (optional fallback from PaddleOCR)

Ubuntu / Debian

sudo apt-get install tesseract-ocr tesseract-ocr-deu

macOS

brew install tesseract tesseract-lang

Windows

Download the installer from https://github.com/UB-Mannheim/tesseract/wiki and add it to your PATH.

Quick Start

Interactive UI

finamt serve

This starts the interactive web UI, where you can upload receipts and manage tax statements.

Python API

Process a single receipt (expense)

from finamt import FinanceAgent

agent = FinanceAgent()
result = agent.process_receipt("receipt.pdf")

if result.success:
    data = result.data
    print(f"Counterparty: {data.vendor}")
    print(f"Date:         {data.receipt_date}")
    print(f"Total:        {data.total_amount} EUR")
    print(f"VAT:          {data.vat_percentage}% ({data.vat_amount} EUR)")
    print(f"Net:          {data.net_amount} EUR")
    print(f"Category:     {data.category}")
    print(f"Items:        {len(data.items)}")

    # Serialise to JSON
    with open("extracted.json", "w", encoding="utf-8") as f:
        f.write(data.to_json())
else:
    print(f"Extraction failed: {result.error_message}")

Sale invoices (outgoing)

result = agent.process_receipt("invoice_to_client.pdf", receipt_type="sale")

Batch processing

from pathlib import Path
from finamt import FinanceAgent

agent = FinanceAgent()
results = agent.batch_process(list(Path("receipts/").glob("*.pdf")))

for path, result in results.items():
    if result.success:
        print(f"{path}: {result.data.total_amount} EUR")
    else:
        print(f"{path}: ERROR — {result.error_message}")
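
A batch result can also be reduced to a grand total and a list of failed files. The sketch below relies only on the attributes documented in this README (`success`, `data.total_amount`, `error_message`); the helper name `summarise` is hypothetical, and `SimpleNamespace` objects stand in for real results so the example runs without finamt installed.

```python
from decimal import Decimal
from types import SimpleNamespace


def summarise(results: dict) -> tuple[Decimal, list[str]]:
    """Sum the totals of successful results and collect failed paths."""
    total = Decimal("0")
    failed: list[str] = []
    for path, result in results.items():
        if not result.success:
            failed.append(str(path))
        elif result.data.total_amount is not None:
            # total_amount is documented as Decimal | None
            total += result.data.total_amount
    return total, failed


# Stand-in objects that mimic the documented result shape.
demo = {
    "a.pdf": SimpleNamespace(
        success=True,
        data=SimpleNamespace(total_amount=Decimal("119.00")),
        error_message=None,
    ),
    "b.pdf": SimpleNamespace(success=False, data=None, error_message="OCR failed"),
}

total, failed = summarise(demo)
print(f"Total: {total} EUR, failed: {failed}")
```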

Configuration

Settings are read in priority order from: environment variables → .env file → built-in defaults.

# .env

# OCR and general settings
FINAMT_OCR_LANGUAGE=german
FINAMT_OCR_TIMEOUT=60
FINAMT_TESSERACT_CMD=tesseract
FINAMT_OCR_PREPROCESS=true
FINAMT_PDF_DPI=150

# Extraction agents — all 4 agents use this model
FINAMT_AGENT_MODEL=mistral:7b
FINAMT_AGENT_TIMEOUT=60
FINAMT_AGENT_NUM_CTX=4096
FINAMT_AGENT_MAX_RETRIES=2

You can also pass config objects directly:

from finamt import FinanceAgent
from finamt.agents.config import Config, AgentsConfig

agent = FinanceAgent(
    config=Config(ocr_language="deu+eng", pdf_dpi=150),
    agents_cfg=AgentsConfig(agent_model="mistral:7b"),
)
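
The lookup order stated at the top of this section (environment variables, then the .env file, then built-in defaults) can be sketched in a few lines. This is an illustration of the precedence rule, not finamt's loader: `parse_dotenv`, `resolve`, and the `DEFAULTS` table are hypothetical, and the default values shown are placeholders copied from the example .env above rather than confirmed built-in defaults.

```python
import os

# Placeholder defaults for illustration; the real built-in values may differ.
DEFAULTS = {"FINAMT_PDF_DPI": "150", "FINAMT_AGENT_MODEL": "mistral:7b"}


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve(key: str, dotenv: dict[str, str], environ=None) -> str:
    """Environment variable wins, then .env, then the default."""
    environ = os.environ if environ is None else environ
    if key in environ:
        return environ[key]
    if key in dotenv:
        return dotenv[key]
    return DEFAULTS[key]


dotenv = parse_dotenv("# .env\nFINAMT_PDF_DPI=300\n")
print(resolve("FINAMT_PDF_DPI", dotenv, environ={}))
print(resolve("FINAMT_PDF_DPI", dotenv, environ={"FINAMT_PDF_DPI": "200"}))
```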

API Reference

FinanceAgent

class FinanceAgent:
    def __init__(
        self,
        config:     Config | None = None,
        db_path:    str | Path | None = "~/.finamt/default/finamt.db",
        agents_cfg: AgentsConfig | None = None,
    ) -> None: ...

    def process_receipt(
        self,
        pdf_path:     str | Path | bytes,
        receipt_type: str = "purchase",   # "purchase" or "sale"
    ) -> ExtractionResult: ...

    def batch_process(
        self,
        pdf_paths:    list[str | Path],
        receipt_type: str = "purchase",
    ) -> dict[str, ExtractionResult]: ...

ExtractionResult

Always check success before accessing data.

@dataclass
class ExtractionResult:
    success:         bool
    data:            ReceiptData | None
    error_message:   str | None
    duplicate:       bool                  # True if already in the database
    existing_id:     str | None            # ID of the original if duplicate
    processing_time: float | None          # seconds

    def to_dict(self) -> dict: ...
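
The `duplicate` and `existing_id` fields make it possible to tell a re-upload apart from a new receipt or a failure. The helper below is a sketch that uses only the fields listed above; the name `describe` is hypothetical. Because this README does not say whether `success` is True for duplicates, the sketch checks `duplicate` first. `SimpleNamespace` objects stand in for real results.

```python
from types import SimpleNamespace


def describe(result) -> str:
    """Turn a result into a one-line status message."""
    # Assumption: check duplicate first, since the README does not state
    # how success is set for receipts already in the database.
    if result.duplicate:
        return f"duplicate of {result.existing_id}"
    if not result.success:
        return f"failed: {result.error_message}"
    return f"new receipt {result.data.id}"


new = SimpleNamespace(success=True, duplicate=False, existing_id=None,
                      error_message=None, data=SimpleNamespace(id="abc123"))
dup = SimpleNamespace(success=True, duplicate=True, existing_id="abc123",
                      error_message=None, data=None)
bad = SimpleNamespace(success=False, duplicate=False, existing_id=None,
                      error_message="OCR failed", data=None)

for r in (new, dup, bad):
    print(describe(r))
```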

ReceiptData

@dataclass
class ReceiptData:
    id:               str                  # SHA-256 of OCR text — stable dedup key
    receipt_type:     ReceiptType          # "purchase" or "sale"
    counterparty:     Counterparty | None  # vendor (purchase) or client (sale)
    receipt_number:   str | None
    receipt_date:     datetime | None
    total_amount:     Decimal | None
    currency:         str                  # defaults to "EUR"
    vat_percentage:   Decimal | None       # e.g. Decimal("19.0")
    vat_amount:       Decimal | None
    net_amount:       Decimal | None       # computed: total - vat
    category:         ReceiptCategory
    items:            list[ReceiptItem]
    vat_splits:       list[dict]           # for mixed-rate invoices

    vendor: str | None                     # alias for counterparty.name

    def to_dict(self) -> dict: ...
    def to_json(self) -> str: ...
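
Two of the fields above are derived values: `id` is a SHA-256 hash of the OCR text, and `net_amount` is the total minus the VAT. A minimal sketch of both follows. The function names are hypothetical, and the sketch assumes the OCR text is hashed as UTF-8 without any normalisation, which this README does not specify.

```python
import hashlib
from decimal import Decimal


def sketch_receipt_id(ocr_text: str) -> str:
    """Hex SHA-256 digest of the OCR text (assumed UTF-8, unnormalised)."""
    return hashlib.sha256(ocr_text.encode("utf-8")).hexdigest()


def sketch_net_amount(total: Decimal, vat: Decimal) -> Decimal:
    """Net amount as documented: total minus VAT."""
    return total - vat


first = sketch_receipt_id("Rechnung Nr. 42\nGesamt 119,00 EUR")
second = sketch_receipt_id("Rechnung Nr. 42\nGesamt 119,00 EUR")
print(first == second)  # identical text gives the same id
print(sketch_net_amount(Decimal("119.00"), Decimal("19.00")))
```

Using `Decimal` rather than `float` keeps cent amounts exact, which matches the `Decimal` types in the dataclass above.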

Counterparty

@dataclass
class Counterparty:
    id:          str           # UUID assigned by the database
    name:        str | None
    vat_id:      str | None    # EU format, e.g. DE123456789
    tax_number:  str | None    # German Steuernummer, e.g. 123/456/78901
    address:     Address
    verified:    bool          # manually confirmed in the UI

ReceiptItem

@dataclass
class ReceiptItem:
    position:    int | None
    description: str
    quantity:    Decimal | None
    unit_price:  Decimal | None
    total_price: Decimal | None
    vat_rate:    Decimal | None
    vat_amount:  Decimal | None
    category:    ReceiptCategory

    def to_dict(self) -> dict: ...

ReceiptCategory

A validated string subclass. Invalid values are silently normalised to "other".

from finamt.agents.prompts import RECEIPT_CATEGORIES   # list[str]
from finamt.models import ReceiptCategory

cat = ReceiptCategory("software")       # valid
cat = ReceiptCategory("unknown_value")  # normalised to "other"
cat = ReceiptCategory.other()           # explicit fallback
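
For readers curious how a validated string subclass of this kind can work, here is a self-contained sketch. `SketchCategory` is a hypothetical stand-in, not finamt's `ReceiptCategory`; its `VALID` set contains only a few sample names, and the lower-casing and whitespace stripping are assumptions of the sketch.

```python
class SketchCategory(str):
    """A str subclass that falls back to "other" for unknown values."""

    # Sample subset for illustration; the real list is RECEIPT_CATEGORIES.
    VALID = frozenset({"software", "travel", "other"})

    def __new__(cls, value: str = "other") -> "SketchCategory":
        cleaned = str(value).strip().lower()
        if cleaned not in cls.VALID:
            # Silent normalisation instead of raising an error
            cleaned = "other"
        return super().__new__(cls, cleaned)

    @classmethod
    def other(cls) -> "SketchCategory":
        return cls("other")


print(SketchCategory("software"))
print(SketchCategory("unknown_value"))
print(isinstance(SketchCategory.other(), str))
```

Because the instance is still a plain `str`, it serialises to JSON and compares with string literals without extra conversion.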

Exceptions

All exceptions inherit from FinanceAgentError.

Exception	Raised when
`OCRProcessingError`	PDF cannot be opened or text extraction fails
`LLMExtractionError`	Model returns invalid JSON after all retries
`InvalidReceiptError`	Extracted data fails business-logic validation

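from finamt import FinanceAgent
from finamt.exceptions import FinanceAgentError, OCRProcessingError

agent = FinanceAgent()

try:
    result = agent.process_receipt("scan.pdf")
except OCRProcessingError as e:
    print(f"OCR failed: {e}")
except FinanceAgentError as e:
    # Base class: catches any other finamt error
    print(f"Processing failed: {e}")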

Extraction Pipeline

Each receipt goes through four sequential LLM calls, each with a short focused prompt:

Agent	Extracts
Agent 1	Receipt number, date, category
Agent 2	Counterparty name, VAT ID, Steuernummer, address
Agent 3	Total amount, VAT percentage, VAT amount
Agent 4	Line items (description, VAT rate, VAT amount, price)

Results are merged in Python — no additional LLM validation step. Debug output for every agent (prompt, raw response, parsed JSON) is saved to ~/.finamt/debug/<receipt_id>/.
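
The shape of this pipeline, four sequential calls whose outputs are merged in plain Python, can be sketched as follows. The agent functions here are hypothetical stubs that return fixed placeholder values instead of calling a model, and the key names are illustrative rather than finamt's internal schema.

```python
from typing import Callable


# Stub agents: each would normally send a short prompt to the local model.
def agent_metadata(text: str) -> dict:
    return {"receipt_number": "R-001", "category": "software"}


def agent_counterparty(text: str) -> dict:
    return {"counterparty_name": "Example GmbH"}


def agent_amounts(text: str) -> dict:
    return {"total_amount": "119.00", "vat_percentage": "19.0", "vat_amount": "19.00"}


def agent_items(text: str) -> dict:
    return {"items": [{"description": "License", "total_price": "119.00"}]}


def run_pipeline(text: str, agents: list[Callable[[str], dict]]) -> dict:
    """Call each agent in order and merge the partial results."""
    merged: dict = {}
    for agent in agents:
        merged.update(agent(text))  # plain dict merge, no extra LLM step
    return merged


receipt = run_pipeline(
    "OCR text of the receipt",
    [agent_metadata, agent_counterparty, agent_amounts, agent_items],
)
print(sorted(receipt))
```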

Categories and Subcategories

Every receipt is tagged with a category and optional subcategory. Categories map directly to line items in the German ELSTER tax forms (EÜR / UStVA), so the correct totals land in the right fields without manual re-sorting.

Category	Subcategories
`services`	`freelance` `consulting` `legal` `accounting` `notary`
`products`	`physical_goods` `digital_goods` `merchandise` `samples`
`material`	`consumables` `raw_materials` `packaging` `merchandise`
`equipment`	`low_value_asset` `computer` `machinery` `furniture` `tools`
`software`	`subscriptions` `pay_as_you_go` `licenses` `hosting` `domains`
`licensing`	`software_licenses` `media_licenses` `other_ip`
`telecommunication`	`phone` `internet` `bundled`
`travel`	`transport` `accommodation` `meals` `per_diem` `incidental`
`car`	`fuel` `parking` `garage` `repair` `maintenance` `insurance` `leasing` `rental`
`education`	`courses` `books` `conferences` `certifications`
`utilities`	`electricity` `heating` `water` `waste`
`insurance`	`liability` `health` `vehicle` `property`
`financial`	`bank_fees` `interest` `loan_costs` `payment_fees`
`office`	`rent` `coworking` `storage` `cleaning` `security`
`marketing`	`advertising` `print_media` `trade_fairs` `sponsorship` `gifts`
`donations`	`charitable` `political` `church`
`public_fees`	`broadcasting_fee` `ihk_hwk` `berufsgenossenschaft` `other_public_fee`
`other`	`membership_fees` `sundry`

TODO

Receipt processing

OCR pipeline (PaddleOCR + Tesseract fallback)
4-agent extraction (metadata, counterparty, amounts, line items)
Deduplication, database storage, batch processing

Tax calculation

UStVA — VAT pre-return (monthly / quarterly)
UStE — annual VAT return
EÜR — income-surplus statement
KSt 1 — corporate income tax return
GewSt — trade tax return
Jahresabschluss — annual accounts (Bilanz + GuV, § 267a HGB)

ELSTER transmission

UStVA — ELSTER XML builder + Kennzahlen mapper + RSA signing + HTTP submission
E-Bilanz — XBRL instance document (HGB taxonomy v6, MicroBilG schema)
E-Bilanz — ERiC ctypes bridge for actual transmission
KSt, GewSt, UStVA, USt — ELSTER XML builder
EÜR, ESt — ELSTER XML builder

Validation

XSD validation of generated XBRL against HGB taxonomy
ELSTER dry-run / test-server validation before live submission

Contributing

Fork the repository
Create a feature branch (git checkout -b feature/my-change) and make your changes

Run the test suite:

pytest --cov=src --cov-report=term-missing

Lint and format with Ruff:

ruff check --fix src/ tests/
ruff format src/ tests/

Update the documentation
Submit a pull request

License

AGPL-3.0 — see LICENSE for details.

Commercial Licensing

finamt is available under the AGPL-3.0 license.

If you wish to use finamt in a proprietary setting, without the obligations of the AGPL (e.g. without releasing source code or for use in a commercial SaaS product), a commercial license is available.

For inquiries, contact: info@spaceoctahedron.com

Third-Party Components and Models

This software depends on external libraries and services, including:

PaddleOCR (Apache License 2.0)
Tesseract OCR (Apache License 2.0)
HuggingFace Hub / transformers (Apache License 2.0)
mlx-lm (MIT License, Apple Silicon only)

finamt downloads language models from HuggingFace (e.g. Mistral, Qwen) on first use and caches them locally.

These models are not distributed with this software and are subject to their own licenses. Users are responsible for complying with the respective terms when downloading and using such models.

Disclaimer

This software is provided for informational and automation purposes only.

It does not constitute tax, legal, or accounting advice.

While finamt is designed to assist with the preparation of German tax-related data (e.g. VAT returns, EÜR, ELSTER submissions), no guarantee is made regarding:

correctness of extracted data
completeness of financial records
compliance with applicable tax laws and regulations
acceptance by tax authorities

Users are solely responsible for verifying all outputs before submission to any authority.

Always consult a qualified tax advisor (Steuerberater) for legally binding guidance.

To the maximum extent permitted by law, Space Octahedron GmbH assumes no liability for:

errors in OCR or LLM-based extraction
incorrect classifications or calculations
rejected or incorrect tax filings
financial losses or penalties arising from use of this software

Product Information (ELSTER)

Produktname: Space Octahedron® finamt
Hersteller: Space Octahedron GmbH
Kontakt: info@spaceoctahedron.com

