uk-bank-statement-parser

License: MIT. Requires Python 3.14+.

Parse bank statement PDFs, extract structured transaction data, validate financial information through checks and balances, and persist results to Parquet files and a SQLite star-schema data mart. Export reports as Excel workbooks or CSV files.

Features

  • PDF extraction — configurable pattern-based parsing of bank statement PDFs using pdfplumber.
  • Checks and balances — automatic validation of opening/closing balances, payment totals, and running balances against statement header values.
  • Dual persistence — write results to Parquet files, a SQLite database, or both.
  • Star-schema data mart — automatically builds dimension and fact tables (DimTime, DimAccount, DimStatement, FactTransaction, FactBalance) plus a GapReport for detecting missing statements.
  • Dual report backends — read the same report classes from either Parquet or SQLite, with identical schemas.
  • Export — single flat transactions table (default) or separate star-schema tables, as Excel and/or CSV.
  • PDF anonymisation — redact personally identifiable information from statement PDFs using a user-supplied mapping file. Transaction descriptions are scrambled so merchant names cannot be recovered.
  • Parallel processing — async + multiprocess batch mode for large PDF sets.
  • Cross-platform — pure Python with no OS-specific dependencies.
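
The checks-and-balances idea can be illustrated with a short, standalone sketch. This is not the package's own implementation: the function names check_balances and running_balance_errors, the row layout, and the sign convention (credits positive, debits negative) are all assumptions made for illustration.

```python
from decimal import Decimal


def check_balances(opening, closing, amounts):
    """Return True when opening + sum(amounts) equals closing.

    Hypothetical helper: values are Decimals, credits positive, debits negative.
    """
    return opening + sum(amounts, Decimal("0")) == closing


def running_balance_errors(opening, rows):
    """Return indexes of rows whose stated balance disagrees with the running total.

    Hypothetical helper: each row is (amount, stated_balance).
    """
    errors = []
    balance = opening
    for index, (amount, stated) in enumerate(rows):
        balance += amount
        if balance != stated:
            errors.append(index)
            balance = stated  # resynchronise so one bad row is reported once
    return errors


# Made-up example data, not taken from a real statement
rows = [
    (Decimal("-25.00"), Decimal("975.00")),
    (Decimal("100.00"), Decimal("1075.00")),
    (Decimal("-75.50"), Decimal("999.50")),
]
opening = Decimal("1000.00")
closing = Decimal("999.50")

ok = check_balances(opening, closing, [amount for amount, _ in rows])
bad_rows = running_balance_errors(opening, rows)
```

Decimal is used rather than float so that pence values compare exactly.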

Installation

Using uv (recommended)

uv manages its own Python installations, so no system Python 3.14 is required. If uv is not already installed, follow the uv installation guide.

uv tool install uk-bank-statement-parser

This creates an isolated environment and puts bsp on your $PATH.

To upgrade later:

uv tool upgrade uk-bank-statement-parser

Debian / Ubuntu (.deb)

Download the .deb from the latest GitHub Release, then install:

sudo dpkg -i uk-bank-statement-parser_*_all.deb

This installs a self-contained virtualenv to /opt/uk-bank-statement-parser/ and a bsp wrapper to /usr/bin/bsp. No system Python is required. Uninstall with sudo dpkg -r uk-bank-statement-parser.

Fedora / RHEL (.rpm)

Download the .rpm from the latest GitHub Release, then install:

sudo rpm -i uk-bank-statement-parser-*-1.noarch.rpm

No system Python is required. Uninstall with sudo rpm -e uk-bank-statement-parser.

From source

git clone https://github.com/boscorat/bank_statement_parser.git
cd bank_statement_parser
uv sync

Prefer not to use uv? See Alternative installation (pipx / venv) for instructions using pipx or a manually created virtual environment.

Quick Start

Command line

Process all PDFs in a folder and export an Excel workbook and CSV file:

bsp process --pdfs ~/statements/

This creates a bsp_project/ directory in your current working directory containing the SQLite database, Parquet files, and exported reports.

Python API

import bank_statement_parser as bsp
from pathlib import Path

# Process a batch of PDFs
batch = bsp.StatementBatch(pdfs=sorted(Path("~/statements").expanduser().glob("*.pdf")))

# Persist to Parquet + SQLite
batch.update_data()

# Export a flat transactions table as Excel and CSV
batch.export(filetype="both")

# Copy source PDFs into the project tree
batch.copy_statements_to_project()

# Clean up temporary files
batch.delete_temp_files()

Read reports directly:

import bank_statement_parser as bsp

# From the SQLite backend
flat = bsp.db.FlatTransaction().all.collect()

# From the Parquet backend
flat = bsp.parquet.FlatTransaction().all.collect()

In both backends, .all is a Polars LazyFrame with an identical schema; calling .collect() materialises it as a DataFrame.
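
Because the data mart is a star schema stored in SQLite, it should also be possible to query it with plain SQL. The sketch below uses the standard-library sqlite3 module against in-memory stand-in tables. The table names DimAccount and FactTransaction come from the feature list, but every column name and value here is hypothetical, and the real schema may differ (see the Project Structure guide).

```python
import sqlite3

# In-memory stand-in for the project database; column names are assumptions
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DimAccount (account_id INTEGER PRIMARY KEY, account_name TEXT);
    CREATE TABLE FactTransaction (
        transaction_id INTEGER PRIMARY KEY,
        account_id INTEGER REFERENCES DimAccount(account_id),
        amount_pence INTEGER
    );
    INSERT INTO DimAccount VALUES (1, 'Current'), (2, 'Savings');
    INSERT INTO FactTransaction VALUES
        (1, 1, -2500), (2, 1, 10000), (3, 2, 5000);
    """
)

# Join the fact table to its account dimension and total the amounts per account
totals = conn.execute(
    """
    SELECT a.account_name, SUM(t.amount_pence) AS net_pence
    FROM FactTransaction AS t
    JOIN DimAccount AS a ON a.account_id = t.account_id
    GROUP BY a.account_name
    ORDER BY a.account_name
    """
).fetchall()
conn.close()
```

Amounts are stored as integer pence in this sketch to avoid floating-point rounding; the package itself may store them differently.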

Documentation

Full documentation is available at boscorat.github.io/bank_statement_parser.

Guides

  • Adding a New Bank — TOML configuration for parsing statements from a new bank.
  • Anonymisation — redacting PII from statement PDFs, config setup, and output review.
  • Project Structure — directory layout, SQLite schema, and Parquet file organisation.
  • Export Options — simple vs. full export presets, CSV and Excel output.

Reference

  • CLI Reference — all bsp process and bsp anonymise options with examples.
  • Python API Reference — StatementBatch, report backends, export helpers, and database utilities.

Contributing

Developer guidelines, architecture notes, code style rules, and test commands are documented in AGENTS.md.

# Run the test suite
pytest -v

# Lint and format
ruff check .
ruff format .

Releasing a new version

  1. Bump the version in pyproject.toml (the single source of truth).
  2. Commit and tag:
    # Include uv.lock only if dependencies changed since the last commit
    git add pyproject.toml
    git add uv.lock  # omit if no dependency changes
    git commit -m "release: v0.2.0"
    git tag v0.2.0
    git push origin master --tags
    
  3. The release.yml workflow runs automatically — builds and publishes to PyPI, builds .deb and .rpm packages, and creates a GitHub Release with all assets attached.

Alternative installation (pipx / venv)

This package requires Python 3.14 or later. Python 3.14 is not yet bundled by most system package managers, so you will need to install it separately before using pipx or a plain virtual environment.

Installing Python 3.14

The easiest cross-platform options are python-build-standalone via pyenv, or a direct download from python.org.

On Ubuntu/Debian you can use the deadsnakes PPA:

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.14 python3.14-venv

On Fedora/RHEL, check whether your release ships Python 3.14 via dnf; otherwise, build from source or use pyenv.

Using pipx

Once Python 3.14 is available on your system:

pipx install uk-bank-statement-parser --python python3.14

To upgrade later:

pipx upgrade uk-bank-statement-parser

Using a virtual environment manually

python3.14 -m venv ~/.venvs/bsp
~/.venvs/bsp/bin/pip install uk-bank-statement-parser

Then either activate the environment or invoke bsp directly:

# Activate (adds bsp to PATH for the session)
source ~/.venvs/bsp/bin/activate
bsp --help

# Or run without activating
~/.venvs/bsp/bin/bsp --help

To upgrade:

~/.venvs/bsp/bin/pip install --upgrade uk-bank-statement-parser

License

MIT
