uk-bank-statement-parser

License: MIT | Python 3.14+

Parse bank statement PDFs, extract structured transaction data, validate financial information through checks and balances, and persist results to Parquet files and a SQLite star-schema data mart. Export reports as Excel workbooks or CSV files.

Features

  • PDF extraction — configurable pattern-based parsing of bank statement PDFs using pdfplumber.
  • Checks and balances — automatic validation of opening/closing balances, payment totals, and running balances against statement header values.
  • Dual persistence — write results to Parquet files, a SQLite database, or both.
  • Star-schema data mart — automatically builds dimension and fact tables (DimTime, DimAccount, DimStatement, FactTransaction, FactBalance) plus a GapReport for detecting missing statements.
  • Dual report backends — read the same report classes from either Parquet or SQLite, with identical schemas.
  • Export — single flat transactions table (default) or separate star-schema tables, as Excel and/or CSV.
  • PDF anonymisation — redact personally identifiable information from statement PDFs using a user-supplied mapping file. Transaction descriptions are scrambled so merchant names cannot be recovered.
  • Parallel processing — async + multiprocess batch mode for large PDF sets.
  • Cross-platform — pure Python with no OS-specific dependencies.
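
To make the checks-and-balances idea concrete, here is a minimal reconciliation sketch. It is not the package's implementation: the `Txn` type, the `reconciles` function, and the sign convention (credits positive, debits negative) are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Txn:
    """Hypothetical transaction record: amount is signed (credit > 0, debit < 0)."""

    amount: Decimal
    running_balance: Decimal


def reconciles(opening: Decimal, closing: Decimal, txns: list[Txn]) -> bool:
    """Return True when every running balance and the closing balance agree."""
    balance = opening
    for txn in txns:
        balance += txn.amount
        # Each printed running balance must match the recomputed one.
        if balance != txn.running_balance:
            return False
    # The final recomputed balance must match the statement header value.
    return balance == closing


txns = [
    Txn(Decimal("-25.50"), Decimal("74.50")),
    Txn(Decimal("100.00"), Decimal("174.50")),
]
print(reconciles(Decimal("100.00"), Decimal("174.50"), txns))  # True
```

`Decimal` is used rather than `float` so that pence-level comparisons are exact.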

Scope

Bank Statement Parser focuses exclusively on UK bank statements. See VISION.md for what we do and don't support. Planning to request a feature? Check the vision first to avoid out-of-scope requests.

Installation

Using uv (recommended)

uv manages its own Python installations, so no system Python 3.14 is required. If uv is not already installed, follow the uv installation guide.

uv tool install uk-bank-statement-parser

This creates an isolated environment and puts bsp on your $PATH.

To upgrade later:

uv tool upgrade uk-bank-statement-parser

Debian / Ubuntu (.deb)

Download the .deb from the latest GitHub Release, then install:

sudo dpkg -i uk-bank-statement-parser_*_all.deb

This installs a self-contained virtualenv to /opt/uk-bank-statement-parser/ and a bsp wrapper to /usr/bin/bsp. No system Python is required. Uninstall with sudo dpkg -r uk-bank-statement-parser.

Fedora / RHEL (.rpm)

Download the .rpm from the latest GitHub Release, then install:

sudo rpm -i uk-bank-statement-parser-*-1.noarch.rpm

No system Python is required. Uninstall with sudo rpm -e uk-bank-statement-parser.

From source

git clone https://github.com/boscorat/bank_statement_parser.git
cd bank_statement_parser
uv sync

Prefer not to use uv? See Alternative installation (pipx / venv) for instructions using pipx or a manually created virtual environment.

Quick Start

Command line

Process all PDFs in a folder and export an Excel workbook and CSV file:

bsp process --pdfs ~/statements/

This creates a bsp_project/ directory in your current working directory containing the SQLite database, Parquet files, and exported reports.
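
If you want to trigger the same command from a script (for example a scheduled job), the sketch below wraps it with the standard library. It uses only the `bsp process --pdfs` invocation shown above; the helper names `build_process_command` and `run_process` are hypothetical, and treating a non-zero exit code as failure is an assumption rather than documented behaviour.

```python
import shutil
import subprocess
from pathlib import Path


def build_process_command(pdf_dir: str) -> list[str]:
    """Build the argv list for `bsp process --pdfs <dir>`."""
    return ["bsp", "process", "--pdfs", str(Path(pdf_dir).expanduser())]


def run_process(pdf_dir: str) -> int:
    """Run bsp if it is on PATH; return its exit code, or -1 when bsp is missing."""
    if shutil.which("bsp") is None:
        return -1
    return subprocess.run(build_process_command(pdf_dir), check=False).returncode


# Show the command that would be executed, without running it.
print(build_process_command("statements"))
```

Passing the arguments as a list avoids shell quoting problems with folder names that contain spaces.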

Python API

import bank_statement_parser as bsp
from pathlib import Path

# Process a batch of PDFs
batch = bsp.StatementBatch(pdfs=sorted(Path("~/statements").expanduser().glob("*.pdf")))

# Persist to Parquet + SQLite
batch.update_data()

# Export a flat transactions table as Excel and CSV
batch.export(filetype="both")

# Copy source PDFs into the project tree
batch.copy_statements_to_project()

# Clean up temporary files
batch.delete_temp_files()

Read reports directly:

import bank_statement_parser as bsp

# From the SQLite backend
flat = bsp.db.FlatTransaction().all.collect()

# From the Parquet backend
flat = bsp.parquet.FlatTransaction().all.collect()

Both backends expose Polars LazyFrames with identical schemas; calling .collect(), as above, materialises them as DataFrames.

Documentation

Full documentation is available at boscorat.github.io/bank_statement_parser.

Guides

  • Adding a New Bank — TOML configuration for parsing statements from a new bank.
  • Anonymisation — redacting PII from statement PDFs, config setup, and output review.
  • Project Structure — directory layout, SQLite schema, and Parquet file organisation.
  • Export Options — simple vs. full export presets, CSV and Excel output.
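
As a rough picture of what a star-schema layout looks like in SQLite, the sketch below builds a tiny in-memory database and joins a fact table to a dimension table. The table names DimAccount and FactTransaction come from the feature list; every column name and value here is hypothetical, so consult the Project Structure guide for the real schema.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript(
    """
    -- Table names come from the feature list; every column is hypothetical.
    CREATE TABLE DimAccount (account_id INTEGER PRIMARY KEY, account_name TEXT);
    CREATE TABLE FactTransaction (
        transaction_id INTEGER PRIMARY KEY,
        account_id INTEGER REFERENCES DimAccount(account_id),
        amount_pence INTEGER
    );
    INSERT INTO DimAccount VALUES (1, 'Current'), (2, 'Savings');
    INSERT INTO FactTransaction VALUES (1, 1, -2550), (2, 1, 10000), (3, 2, 5000);
    """
)

# Join the fact table to its dimension and aggregate per account.
totals = con.execute(
    """
    SELECT a.account_name, SUM(t.amount_pence)
    FROM FactTransaction AS t
    JOIN DimAccount AS a USING (account_id)
    GROUP BY a.account_name
    ORDER BY a.account_name
    """
).fetchall()
print(totals)  # [('Current', 7450), ('Savings', 5000)]
```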

Reference

  • CLI Reference — all bsp process and bsp anonymise options with examples.
  • Python API Reference — StatementBatch, report backends, export helpers, and database utilities.

Contributing

We welcome contributions! Whether you're adding a new bank, fixing bugs, improving documentation, or submitting test data, please see CONTRIBUTING.md and the sections below.

Quick Start for Development

# Install dependencies
uv sync

# Run the test suite
pytest -v

# Lint and format
ruff check .
ruff format .

Contributing a New Bank Configuration

If you want to add support for a bank not currently supported:

  1. Read CONTRIBUTING.md to understand the workflow
  2. Follow Adding a New Bank for technical steps
  3. Test locally with Local Testing Guide
  4. Prepare 3+ anonymised test PDFs (see Test Data Submission Guide)
  5. Submit a PR with your config files

Your test PDFs will become permanent regression tests that protect your configuration against future changes.

Releasing a new version

  1. Bump the version in pyproject.toml (the single source of truth).
  2. Commit and tag:
    # Include uv.lock only if dependencies changed since the last commit
    git add pyproject.toml
    git add uv.lock  # omit if no dependency changes
    git commit -m "release: v0.2.0"
    git tag v0.2.0
    git push origin master --tags
    
  3. The release.yml workflow runs automatically — builds and publishes to PyPI, builds .deb and .rpm packages, and creates a GitHub Release with all assets attached.

Alternative installation (pipx / venv)

This package requires Python 3.14 or later. Python 3.14 is not yet bundled by most system package managers, so you will need to install it separately before using pipx or a plain virtual environment.
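
To confirm that a given interpreter meets this requirement before installing, a check like the one below can be run with that interpreter. The helper name `meets_minimum` is hypothetical; only the 3.14 threshold comes from this README.

```python
import sys

MINIMUM = (3, 14)  # from the "Python 3.14 or later" requirement


def meets_minimum(version: tuple[int, ...] = tuple(sys.version_info[:2])) -> bool:
    """Return True when the given (major, minor, ...) version satisfies the minimum."""
    return tuple(version[:2]) >= MINIMUM


print(meets_minimum((3, 14)))  # True
print(meets_minimum((3, 13)))  # False
```

Calling `meets_minimum()` with no argument checks the interpreter that is running the script.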

Installing Python 3.14

The easiest cross-platform options are a python-build-standalone build installed via pyenv, or an installer downloaded directly from python.org.

On Ubuntu/Debian you can use the deadsnakes PPA:

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.14 python3.14-venv

On Fedora/RHEL, check whether your version ships 3.14 via dnf, otherwise build from source or use pyenv.

Using pipx

Once Python 3.14 is available on your system:

pipx install uk-bank-statement-parser --python python3.14

To upgrade later:

pipx upgrade uk-bank-statement-parser

Using a virtual environment manually

python3.14 -m venv ~/.venvs/bsp
~/.venvs/bsp/bin/pip install uk-bank-statement-parser

Then either activate the environment or invoke bsp directly:

# Activate (adds bsp to PATH for the session)
source ~/.venvs/bsp/bin/activate
bsp --help

# Or run without activating
~/.venvs/bsp/bin/bsp --help

To upgrade:

~/.venvs/bsp/bin/pip install --upgrade uk-bank-statement-parser

License

MIT
