Parse bank statement PDFs, extract transactions, and persist to Parquet and SQLite.

uk-bank-statement-parser

License: MIT Python 3.14+

Parse bank statement PDFs, extract structured transaction data, validate financial information through checks and balances, and persist results to Parquet files and a SQLite star-schema data mart. Export reports as Excel workbooks or CSV files.

Features

  • PDF extraction — configurable pattern-based parsing of bank statement PDFs using pdfplumber.
  • Checks and balances — automatic validation of opening/closing balances, payment totals, and running balances against statement header values.
  • Dual persistence — write results to Parquet files, a SQLite database, or both.
  • Star-schema data mart — automatically builds dimension and fact tables (DimTime, DimAccount, DimStatement, FactTransaction, FactBalance) plus a GapReport for detecting missing statements.
  • Dual report backends — read the same report classes from either Parquet or SQLite, with identical schemas.
  • Export — single flat transactions table (default) or separate star-schema tables, as Excel and/or CSV.
  • PDF anonymisation — redact personally identifiable information from statement PDFs using a user-supplied mapping file. Transaction descriptions are scrambled so merchant names cannot be recovered.
  • Parallel processing — async + multiprocess batch mode for large PDF sets.
  • Cross-platform — pure Python with no OS-specific dependencies.
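
The checks-and-balances idea can be illustrated with a minimal sketch. This is not the package's own implementation; `balances_reconcile` and the sample figures are hypothetical, and the sketch only shows the arithmetic of reconciling a statement's opening balance, transaction amounts, and closing balance using exact decimal values.

```python
from decimal import Decimal


def balances_reconcile(opening, closing, amounts):
    """Return True if opening + sum(amounts) equals closing exactly.

    Hypothetical helper for illustration only; amounts are signed strings
    (negative for payments out, positive for payments in).
    """
    total = sum((Decimal(a) for a in amounts), Decimal("0"))
    return Decimal(opening) + total == Decimal(closing)


# Hypothetical statement: two payments out and one payment in.
print(balances_reconcile("100.00", "62.50", ["-25.00", "-32.50", "20.00"]))  # True
```

Decimal is used rather than float so that pence-level comparisons do not fail because of binary rounding.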

Installation

Using uv (recommended)

uv manages its own Python installations, so no system Python 3.14 is required. If uv is not already installed, follow the uv installation guide.

uv tool install uk-bank-statement-parser

This creates an isolated environment and puts bsp on your $PATH.

To upgrade later:

uv tool upgrade uk-bank-statement-parser

Debian / Ubuntu (.deb)

Download the .deb from the latest GitHub Release, then install:

sudo dpkg -i uk-bank-statement-parser_*_all.deb

This installs a self-contained virtualenv to /opt/uk-bank-statement-parser/ and a bsp wrapper to /usr/bin/bsp. No system Python is required. Uninstall with sudo dpkg -r uk-bank-statement-parser.

Fedora / RHEL (.rpm)

Download the .rpm from the latest GitHub Release, then install:

sudo rpm -i uk-bank-statement-parser-*-1.noarch.rpm

No system Python is required. Uninstall with sudo rpm -e uk-bank-statement-parser.

From source

git clone https://github.com/boscorat/bank_statement_parser.git
cd bank_statement_parser
uv sync

Prefer not to use uv? See Alternative installation (pipx / venv) for instructions using pipx or a manually created virtual environment.

Quick Start

Command line

Process all PDFs in a folder and export an Excel workbook and CSV file:

bsp process --pdfs ~/statements/

This creates a bsp_project/ directory in your current working directory containing the SQLite database, Parquet files, and exported reports.
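
To get a quick picture of what a run produced, something like the sketch below can count the files under the project directory by extension. It relies only on the standard library; `summarise_outputs` is a hypothetical helper, not part of the package, and the internal layout of bsp_project/ is not assumed beyond it being a directory in the current working directory.

```python
from collections import Counter
from pathlib import Path


def summarise_outputs(project_dir):
    """Count files under project_dir by lower-cased extension (hypothetical helper)."""
    root = Path(project_dir)
    return Counter(p.suffix.lower() for p in root.rglob("*") if p.is_file())


# Only report if the directory created by `bsp process` is present.
project = Path.cwd() / "bsp_project"
if project.is_dir():
    for suffix, count in sorted(summarise_outputs(project).items()):
        print(f"{suffix or '(no extension)'}: {count}")
```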

Python API

import bank_statement_parser as bsp
from pathlib import Path

# Process a batch of PDFs
batch = bsp.StatementBatch(pdfs=sorted(Path("~/statements").expanduser().glob("*.pdf")))

# Persist to Parquet + SQLite
batch.update_data()

# Export a flat transactions table as Excel and CSV
batch.export(filetype="both")

# Copy source PDFs into the project tree
batch.copy_statements_to_project()

# Clean up temporary files
batch.delete_temp_files()
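
If one of the steps above raises, the temporary files would otherwise be left behind. A possible wrapper is sketched below. `run_batch` is a hypothetical helper, not part of the package, and it assumes that `delete_temp_files()` is safe to call after a failed step; it uses only the `StatementBatch` methods shown above.

```python
def run_batch(batch, filetype="both"):
    """Persist, export, and copy a batch; always clean up temporary files.

    Hypothetical wrapper: `batch` is expected to be a StatementBatch (or any
    object with the same four methods used in the example above).
    """
    try:
        batch.update_data()
        batch.export(filetype=filetype)
        batch.copy_statements_to_project()
    finally:
        # Assumption: cleanup is safe even when an earlier step failed.
        batch.delete_temp_files()


# Usage (requires the package to be installed):
#   import bank_statement_parser as bsp
#   run_batch(bsp.StatementBatch(pdfs=[...]))
```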

Read reports directly:

import bank_statement_parser as bsp

# From the SQLite backend
flat = bsp.db.FlatTransaction().all.collect()

# From the Parquet backend
flat = bsp.parquet.FlatTransaction().all.collect()

Both backends return Polars LazyFrames with identical schemas; calling .collect(), as above, materialises the result as a DataFrame.

Documentation

Full documentation is available at boscorat.github.io/bank_statement_parser.

Guides

  • Adding a New Bank — TOML configuration for parsing statements from a new bank.
  • Anonymisation — redacting PII from statement PDFs, config setup, and output review.
  • Project Structure — directory layout, SQLite schema, and Parquet file organisation.
  • Export Options — simple vs. full export presets, CSV and Excel output.

Reference

  • CLI Reference — all bsp process and bsp anonymise options with examples.
  • Python API Reference — StatementBatch, report backends, export helpers, and database utilities.

Contributing

Developer guidelines, architecture notes, code style rules, and test commands are documented in AGENTS.md.

# Run the test suite
pytest -v

# Lint and format
ruff check .
ruff format .

Releasing a new version

  1. Bump the version in pyproject.toml (the single source of truth).
  2. Commit and tag:
    # Include uv.lock only if dependencies changed since the last commit
    git add pyproject.toml
    git add uv.lock  # omit if no dependency changes
    git commit -m "release: v0.2.0"
    git tag v0.2.0
    git push origin master --tags
    
  3. The release.yml workflow runs automatically — builds and publishes to PyPI, builds .deb and .rpm packages, and creates a GitHub Release with all assets attached.

Alternative installation (pipx / venv)

This package requires Python 3.14 or later. Python 3.14 is not yet bundled by most system package managers, so you will need to install it separately before using pipx or a plain virtual environment.

Installing Python 3.14

The easiest cross-platform options are python-build-standalone via pyenv, or a direct download from python.org.

On Ubuntu/Debian you can use the deadsnakes PPA:

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.14 python3.14-venv

On Fedora/RHEL, check whether your version ships 3.14 via dnf, otherwise build from source or use pyenv.

Using pipx

Once Python 3.14 is available on your system:

pipx install uk-bank-statement-parser --python python3.14

To upgrade later:

pipx upgrade uk-bank-statement-parser

Using a virtual environment manually

python3.14 -m venv ~/.venvs/bsp
~/.venvs/bsp/bin/pip install uk-bank-statement-parser

Then either activate the environment or invoke bsp directly:

# Activate (adds bsp to PATH for the session)
source ~/.venvs/bsp/bin/activate
bsp --help

# Or run without activating
~/.venvs/bsp/bin/bsp --help

To upgrade:

~/.venvs/bsp/bin/pip install --upgrade uk-bank-statement-parser

License

MIT
