Parse bank statement PDFs, extract transactions, and persist to Parquet and SQLite.

uk-bank-statement-parser

License: MIT. Requires Python 3.14+.

Parse bank statement PDFs, extract structured transaction data, validate financial information through checks and balances, and persist results to Parquet files and a SQLite star-schema data mart. Export reports as Excel workbooks or CSV files.

Features

  • PDF extraction — configurable pattern-based parsing of bank statement PDFs using pdfplumber.
  • Checks and balances — automatic validation of opening/closing balances, payment totals, and running balances against statement header values.
  • Dual persistence — write results to Parquet files, a SQLite database, or both.
  • Star-schema data mart — automatically builds dimension and fact tables (DimTime, DimAccount, DimStatement, FactTransaction, FactBalance) plus a GapReport for detecting missing statements.
  • Dual report backends — read the same report classes from either Parquet or SQLite, with identical schemas.
  • Export — single flat transactions table (default) or separate star-schema tables, as Excel and/or CSV.
  • PDF anonymisation — redact personally identifiable information from statement PDFs using a user-supplied mapping file. Transaction descriptions are scrambled so merchant names cannot be recovered.
  • Parallel processing — async + multiprocess batch mode for large PDF sets.
  • Cross-platform — pure Python with no OS-specific dependencies.
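The checks-and-balances idea can be pictured as a reconciliation: starting from the opening balance, each transaction amount should reproduce the running balance printed on the statement, and the final figure should equal the closing balance. The sketch below is illustrative only. `Transaction`, `reconcile`, and the field names are hypothetical and are not taken from the package's API.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    """Hypothetical transaction row; the field names are assumptions."""
    amount: Decimal           # positive = money in, negative = money out
    running_balance: Decimal  # balance printed on the statement after this row


def reconcile(opening: Decimal, closing: Decimal, rows: list[Transaction]) -> list[str]:
    """Return a list of problems; an empty list means the statement balances."""
    problems: list[str] = []
    balance = opening
    for index, row in enumerate(rows, start=1):
        balance += row.amount
        if balance != row.running_balance:
            problems.append(
                f"row {index}: expected running balance {balance}, "
                f"statement shows {row.running_balance}"
            )
            # Resynchronise so one bad row does not cascade into later rows.
            balance = row.running_balance
    if balance != closing:
        problems.append(f"closing balance {closing} does not match computed {balance}")
    return problems


rows = [
    Transaction(Decimal("-25.00"), Decimal("975.00")),
    Transaction(Decimal("100.50"), Decimal("1075.50")),
]
print(reconcile(Decimal("1000.00"), Decimal("1075.50"), rows))
```

`Decimal` is used rather than `float` so that pence add up exactly; whether the package itself stores amounts this way is not stated here.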

Installation

Using uv (recommended)

uv manages its own Python installations, so no system Python 3.14 is required. If uv is not already installed, follow the uv installation guide.

uv tool install uk-bank-statement-parser

This creates an isolated environment and puts bsp on your $PATH.

To upgrade later:

uv tool upgrade uk-bank-statement-parser

Debian / Ubuntu (.deb)

Download the .deb from the latest GitHub Release, then install:

sudo dpkg -i uk-bank-statement-parser_*_all.deb

This installs a self-contained virtualenv to /opt/uk-bank-statement-parser/ and a bsp wrapper to /usr/bin/bsp. No system Python is required. Uninstall with sudo dpkg -r uk-bank-statement-parser.

Fedora / RHEL (.rpm)

Download the .rpm from the latest GitHub Release, then install:

sudo rpm -i uk-bank-statement-parser-*-1.noarch.rpm

No system Python is required. Uninstall with sudo rpm -e uk-bank-statement-parser.

From source

git clone https://github.com/boscorat/bank_statement_parser.git
cd bank_statement_parser
uv sync

Prefer not to use uv? See Alternative installation (pipx / venv) for instructions using pipx or a manually created virtual environment.

Quick Start

Command line

Process all PDFs in a folder and export an Excel workbook and CSV file:

bsp process --pdfs ~/statements/

This creates a bsp_project/ directory in your current working directory containing the SQLite database, Parquet files, and exported reports.
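Because the data mart is a plain SQLite database, it can in principle be queried with any SQLite client. The sketch below uses Python's standard `sqlite3` module against an in-memory stand-in. The table names `DimAccount` and `FactTransaction` come from the feature list, but every column name and the sample data are hypothetical; consult the Project Structure guide for the real schema and the database file's location.

```python
import sqlite3

# In-memory stand-in for the project database. Column names and rows are
# hypothetical; only the table names appear in the project description.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE DimAccount (account_id INTEGER PRIMARY KEY, account_name TEXT);
    CREATE TABLE FactTransaction (
        transaction_id INTEGER PRIMARY KEY,
        account_id INTEGER REFERENCES DimAccount(account_id),
        amount_pence INTEGER
    );
    INSERT INTO DimAccount VALUES (1, 'Current'), (2, 'Savings');
    INSERT INTO FactTransaction VALUES
        (1, 1, -2500), (2, 1, 10050), (3, 2, 50000);
    """
)

# A typical star-schema query: join the fact table to a dimension and aggregate.
totals = conn.execute(
    """
    SELECT a.account_name, SUM(f.amount_pence) AS net_pence
    FROM FactTransaction AS f
    JOIN DimAccount AS a ON a.account_id = f.account_id
    GROUP BY a.account_name
    ORDER BY a.account_name
    """
).fetchall()
print(totals)
conn.close()
```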

Python API

import bank_statement_parser as bsp
from pathlib import Path

# Process a batch of PDFs
batch = bsp.StatementBatch(pdfs=sorted(Path("~/statements").expanduser().glob("*.pdf")))

# Persist to Parquet + SQLite
batch.update_data()

# Export a flat transactions table as Excel and CSV
batch.export(filetype="both")

# Copy source PDFs into the project tree
batch.copy_statements_to_project()

# Clean up temporary files
batch.delete_temp_files()
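
The `glob("*.pdf")` call above looks only at the top level of the folder and, on case-sensitive filesystems, may miss files named `.PDF`. If your statements are spread across subfolders, a small helper along these lines could build the list instead. `find_pdfs` is a hypothetical name and not part of the package; it uses only the standard library.

```python
from pathlib import Path
import tempfile


def find_pdfs(folder: Path) -> list[Path]:
    """Return every PDF under `folder`, matching .pdf and .PDF, in sorted order."""
    return sorted(
        path
        for path in folder.expanduser().rglob("*")
        if path.is_file() and path.suffix.lower() == ".pdf"
    )


# Demonstrate on a throwaway directory so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "2024").mkdir()
    (root / "jan.pdf").touch()
    (root / "2024" / "FEB.PDF").touch()
    (root / "notes.txt").touch()
    found = [p.relative_to(root).as_posix() for p in find_pdfs(root)]

print(found)
```

The resulting list of paths has the same shape as the argument passed to `pdfs=` in the example above.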

Read reports directly:

import bank_statement_parser as bsp

# From the SQLite backend
flat = bsp.db.FlatTransaction().all.collect()

# From the Parquet backend
flat = bsp.parquet.FlatTransaction().all.collect()

Both backends return Polars LazyFrames with identical schemas.

Documentation

Full documentation is available at boscorat.github.io/bank_statement_parser.

Guides

  • Adding a New Bank — TOML configuration for parsing statements from a new bank.
  • Anonymisation — redacting PII from statement PDFs, config setup, and output review.
  • Project Structure — directory layout, SQLite schema, and Parquet file organisation.
  • Export Options — simple vs. full export presets, CSV and Excel output.

Reference

  • CLI Reference — all bsp process and bsp anonymise options with examples.
  • Python API Reference — StatementBatch, report backends, export helpers, and database utilities.

Contributing

Developer guidelines, architecture notes, code style rules, and test commands are documented in AGENTS.md.

# Run the test suite
pytest -v

# Lint and format
ruff check .
ruff format .

Releasing a new version

  1. Bump the version in pyproject.toml (the single source of truth).
  2. Commit and tag:
    # Include uv.lock only if dependencies changed since the last commit
    git add pyproject.toml
    git add uv.lock  # omit if no dependency changes
    git commit -m "release: v0.2.0"
    git tag v0.2.0
    git push origin master --tags
    
  3. The release.yml workflow then runs automatically: it builds and publishes the package to PyPI, builds the .deb and .rpm packages, and creates a GitHub Release with all assets attached.

Alternative installation (pipx / venv)

This package requires Python 3.14 or later. Python 3.14 is not yet bundled by most system package managers, so you will need to install it separately before using pipx or a plain virtual environment.
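
To confirm that a given interpreter meets the requirement stated above before creating an environment with it, a check like the following can be run with that interpreter. `meets_minimum` is a hypothetical helper written for illustration.

```python
import sys

MINIMUM = (3, 14)  # minimum version stated in this README


def meets_minimum(version: tuple[int, ...], minimum: tuple[int, int] = MINIMUM) -> bool:
    """Compare only the major and minor components against the minimum."""
    return tuple(version[:2]) >= minimum


# Prints True when the running interpreter is new enough, False otherwise.
print(meets_minimum(sys.version_info))
```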

Installing Python 3.14

The easiest cross-platform options are to install a python-build-standalone build via pyenv or to download an installer directly from python.org.

On Ubuntu/Debian you can use the deadsnakes PPA:

sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.14 python3.14-venv

On Fedora/RHEL, check whether your version ships 3.14 via dnf, otherwise build from source or use pyenv.

Using pipx

Once Python 3.14 is available on your system:

pipx install uk-bank-statement-parser --python python3.14

To upgrade later:

pipx upgrade uk-bank-statement-parser

Using a virtual environment manually

python3.14 -m venv ~/.venvs/bsp
~/.venvs/bsp/bin/pip install uk-bank-statement-parser

Then either activate the environment or invoke bsp directly:

# Activate (adds bsp to PATH for the session)
source ~/.venvs/bsp/bin/activate
bsp --help

# Or run without activating
~/.venvs/bsp/bin/bsp --help

To upgrade:

~/.venvs/bsp/bin/pip install --upgrade uk-bank-statement-parser

License

MIT
