Parse bank statement PDFs, extract transactions, and persist to Parquet and SQLite.
uk-bank-statement-parser
Parse bank statement PDFs, extract structured transaction data, validate financial information through checks and balances, and persist results to Parquet files and a SQLite star-schema data mart. Export reports as Excel workbooks or CSV files.
Features
- PDF extraction — configurable pattern-based parsing of bank statement PDFs using pdfplumber.
- Checks and balances — automatic validation of opening/closing balances, payment totals, and running balances against statement header values.
- Dual persistence — write results to Parquet files, a SQLite database, or both.
- Star-schema data mart — automatically builds dimension and fact tables (DimTime, DimAccount, DimStatement, FactTransaction, FactBalance) plus a GapReport for detecting missing statements.
- Dual report backends — read the same report classes from either Parquet or SQLite, with identical schemas.
- Export — single flat transactions table (default) or separate star-schema tables, as Excel and/or CSV.
- PDF anonymisation — redact personally identifiable information from statement PDFs using a user-supplied mapping file. Transaction descriptions are scrambled so merchant names cannot be recovered.
- Parallel processing — async + multiprocess batch mode for large PDF sets.
- Cross-platform — pure Python with no OS-specific dependencies.
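The checks-and-balances idea listed above can be illustrated with a minimal sketch. This is not the package's own implementation: the `Transaction` type, its field names, and the `check_statement` helper are hypothetical, and the sketch assumes signed amounts (credits positive, debits negative).

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    """Hypothetical transaction row: signed amount plus the printed running balance."""

    amount: Decimal           # credits positive, debits negative (assumption)
    running_balance: Decimal  # balance printed next to the transaction


def check_statement(opening: Decimal, closing: Decimal,
                    transactions: list[Transaction]) -> list[str]:
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems: list[str] = []
    balance = opening
    for index, txn in enumerate(transactions):
        balance += txn.amount
        # Each printed running balance should match the recomputed one.
        if balance != txn.running_balance:
            problems.append(
                f"row {index}: expected running balance {balance}, "
                f"found {txn.running_balance}"
            )
            # Resynchronise so one bad row does not flag every later row.
            balance = txn.running_balance
    # The final balance should equal the closing balance in the statement header.
    if balance != closing:
        problems.append(f"closing balance {closing} does not match computed {balance}")
    return problems


rows = [
    Transaction(Decimal("-25.00"), Decimal("75.00")),
    Transaction(Decimal("10.50"), Decimal("85.50")),
]
print(check_statement(Decimal("100.00"), Decimal("85.50"), rows))  # []
```

Using `Decimal` rather than `float` keeps the comparisons exact, which matters when the check is a strict equality on currency amounts.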
Installation
Using uv (recommended)
uv manages its own Python installations, so no system Python 3.14 is required. If uv is not already installed, follow the uv installation guide.
uv tool install uk-bank-statement-parser
This creates an isolated environment and puts bsp on your $PATH.
To upgrade later:
uv tool upgrade uk-bank-statement-parser
Debian / Ubuntu (.deb)
Download the .deb from the latest GitHub Release, then install:
sudo dpkg -i uk-bank-statement-parser_*_all.deb
This installs a self-contained virtualenv to /opt/uk-bank-statement-parser/
and a bsp wrapper to /usr/bin/bsp. No system Python is required. Uninstall
with sudo dpkg -r uk-bank-statement-parser.
Fedora / RHEL (.rpm)
Download the .rpm from the latest GitHub Release, then install:
sudo rpm -i uk-bank-statement-parser-*-1.noarch.rpm
No system Python is required. Uninstall with
sudo rpm -e uk-bank-statement-parser.
From source
git clone https://github.com/boscorat/bank_statement_parser.git
cd bank_statement_parser
uv sync
Prefer not to use uv? See Alternative installation (pipx / venv) for instructions using pipx or a manually created virtual environment.
Quick Start
Command line
Process all PDFs in a folder and export an Excel workbook and CSV file:
bsp process --pdfs ~/statements/
This creates a bsp_project/ directory in your current working directory
containing the SQLite database, Parquet files, and exported reports.
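If you want to trigger the same command from a Python script (for example, a scheduled job), one option is to call the CLI through `subprocess`. The sketch below uses only the `bsp process --pdfs` invocation shown above; the `build_command` and `run_bsp` helper names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(pdf_dir: Path) -> list[str]:
    """Build the argument list for the `bsp process --pdfs <dir>` call shown above."""
    return ["bsp", "process", "--pdfs", str(pdf_dir)]


def run_bsp(pdf_dir: Path) -> int:
    """Run the CLI and return its exit code; raise if bsp is not installed."""
    if shutil.which("bsp") is None:
        raise FileNotFoundError("bsp was not found on PATH")
    # Passing a list (not a shell string) avoids quoting problems in paths.
    return subprocess.run(build_command(pdf_dir), check=False).returncode


print(build_command(Path("statements")))
```

Because no shell is involved, `~` is not expanded automatically; call `Path("~/statements").expanduser()` before passing the directory to `run_bsp`.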
Python API
import bank_statement_parser as bsp
from pathlib import Path
# Process a batch of PDFs
batch = bsp.StatementBatch(pdfs=sorted(Path("~/statements").expanduser().glob("*.pdf")))
# Persist to Parquet + SQLite
batch.update_data()
# Export a flat transactions table as Excel and CSV
batch.export(filetype="both")
# Copy source PDFs into the project tree
batch.copy_statements_to_project()
# Clean up temporary files
batch.delete_temp_files()
Read reports directly:
import bank_statement_parser as bsp
# From the SQLite backend
flat = bsp.db.FlatTransaction().all.collect()
# From the Parquet backend
flat = bsp.parquet.FlatTransaction().all.collect()
Both backends return Polars LazyFrames with identical schemas.
Documentation
Full documentation is available at boscorat.github.io/bank_statement_parser.
Guides
- Adding a New Bank — TOML configuration for parsing statements from a new bank.
- Anonymisation — redacting PII from statement PDFs, config setup, and output review.
- Project Structure — directory layout, SQLite schema, and Parquet file organisation.
- Export Options — simple vs. full export presets, CSV and Excel output.
Reference
- CLI Reference — all bsp process and bsp anonymise options with examples.
- Python API Reference — StatementBatch, report backends, export helpers, and database utilities.
Contributing
Developer guidelines, architecture notes, code style rules, and test commands are documented in AGENTS.md.
# Run the test suite
pytest -v
# Lint and format
ruff check .
ruff format .
Releasing a new version
- Bump the version in pyproject.toml (the single source of truth).
- Commit and tag:
# Include uv.lock only if dependencies changed since the last commit
git add pyproject.toml
git add uv.lock  # omit if no dependency changes
git commit -m "release: v0.2.0"
git tag v0.2.0
git push origin master --tags
- The release.yml workflow runs automatically: it builds and publishes to PyPI, builds .deb and .rpm packages, and creates a GitHub Release with all assets attached.
Alternative installation (pipx / venv)
This package requires Python 3.14 or later. Python 3.14 is not yet bundled by most system package managers, so you will need to install it separately before using pipx or a plain virtual environment.
Installing Python 3.14
The easiest cross-platform option is python-build-standalone via pyenv; alternatively, download Python directly from python.org.
On Ubuntu/Debian you can use the deadsnakes PPA:
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt update
sudo apt install python3.14 python3.14-venv
On Fedora/RHEL, check whether your release ships Python 3.14 via dnf; otherwise,
build from source or use pyenv.
Using pipx
Once Python 3.14 is available on your system:
pipx install uk-bank-statement-parser --python python3.14
To upgrade later:
pipx upgrade uk-bank-statement-parser
Using a virtual environment manually
python3.14 -m venv ~/.venvs/bsp
~/.venvs/bsp/bin/pip install uk-bank-statement-parser
Then either activate the environment or invoke bsp directly:
# Activate (adds bsp to PATH for the session)
source ~/.venvs/bsp/bin/activate
bsp --help
# Or run without activating
~/.venvs/bsp/bin/bsp --help
To upgrade:
~/.venvs/bsp/bin/pip install --upgrade uk-bank-statement-parser
License