Anonymise UK bank statement PDFs by scrambling personal data while preserving document structure.
uk-bank-statement-anonymiser
Anonymise UK bank statement PDFs by scrambling personal data while preserving the document's visual structure and layout. All letters in transaction descriptions are replaced with random alternatives; dates, payment codes, protected phrases, and numeric identifiers (sort codes, account numbers, IBANs, card numbers) are handled deterministically so the anonymised output remains internally consistent across pages.
Supported statement types
- HSBC UK current account
- HSBC UK savings account
- NatWest current account
- TSB Spend & Save account
- TSB credit card
Other UK bank PDFs may work, but have not been tested.
Requirements
- Python 3.14+
- pikepdf (installed automatically)
Installation
```bash
pip install uk-bank-statement-anonymiser
```
Quick start
By default, the library automatically detects sort codes, account numbers, card numbers, and other sensitive patterns and replaces them, while leaving dates intact. For custom rules, such as forcing specific replacements or protecting additional phrases, see User config files below.
```python
from bank_statement_anonymiser import anonymise_pdf

# Minimal: the output is written alongside the input as "anonymised_statement.pdf"
anonymise_pdf("statement.pdf")

# Explicit output path (recommended, as it avoids exposing the original filename)
anonymise_pdf("statement.pdf", "safe_output_name.pdf")
```
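Following that recommendation across several statements is easier with an output name that carries no information about the source file. A minimal sketch, assuming a random token in the name is all you need; `random_output_path` is a hypothetical helper written for illustration and is not part of this package:

```python
from __future__ import annotations

import secrets
from pathlib import Path


def random_output_path(input_path: str | Path, prefix: str = "anonymised_") -> Path:
    """Return a path beside input_path whose name reveals nothing about the original.

    Hypothetical helper for illustration; not part of uk-bank-statement-anonymiser.
    """
    source = Path(input_path)
    # 8 random bytes -> 16 hex characters after the prefix.
    token = secrets.token_hex(8)
    return source.with_name(f"{prefix}{token}{source.suffix}")


print(random_output_path("statements/jason_farrar_march.pdf"))
```

You would then pass its result as the second argument, for example `anonymise_pdf("statement.pdf", random_output_path("statement.pdf"))`. Keep a private record of which output came from which input if you need to trace them later.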
User config files
The library ships two system config files (bundled in the package, committed to source control) that cover common protected phrases and known numeric patterns:
| File | Purpose |
|---|---|
| `always_anonymise_system.toml` | Force specific strings to a known replacement value |
| `never_anonymise_system.toml` | Protect specific phrases from being scrambled |
You can supplement these with your own files passed as arguments to anonymise_pdf:
```python
anonymise_pdf(
    "statement.pdf",
    "output.pdf",
    always_anonymise_path="my_always_anonymise.toml",
    never_anonymise_path="my_never_anonymise.toml",
)
```
The system config provides the defaults, and your own config extends or overrides them. For always_anonymise, your rules win on any clash; for never_anonymise, the system and user lists are combined (union).
User config files should not be committed to source control: they typically contain the real account numbers, sort codes, or names you are trying to protect.
always_anonymise.toml format
```toml
# Force exact string replacements before the scramble pass.
# User file wins over system file on a clash.
"40-37-28" = "00-00-00"
"12345678" = "00000000"
"Jason Farrar" = "John Doe"
```
never_anonymise.toml format
```toml
# Phrases listed here are left exactly as-is during the scramble pass.
# Matching is case-insensitive and whitespace-insensitive.
exclude = [
    "My Bank",
    "My Employer Ltd",
]
```
API reference
anonymise_pdf
```python
def anonymise_pdf(
    input_path: str | Path,
    output_path: str | Path | None = None,
    always_anonymise_path: str | Path | None = None,
    never_anonymise_path: str | Path | None = None,
    debug: bool = False,
) -> Path
```
Anonymises a single PDF and returns the path to the output file.
| Parameter | Description |
|---|---|
| `input_path` | Path to the input PDF |
| `output_path` | Path for the output PDF. If omitted, writes `anonymised_<stem><suffix>` in the same directory as the input |
| `always_anonymise_path` | Path to a user `always_anonymise.toml` (optional) |
| `never_anonymise_path` | Path to a user `never_anonymise.toml` (optional) |
| `debug` | Print diagnostic information to stdout when `True` |

Returns: `Path` to the output PDF file.
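The default naming rule for `output_path` can be reproduced with `pathlib`, which is handy for checking in advance whether a run would overwrite an existing file. `default_output_path` is a hypothetical helper written to illustrate the documented rule; the library computes this path itself.

```python
from __future__ import annotations

from pathlib import Path


def default_output_path(input_path: str | Path) -> Path:
    # anonymised_<stem><suffix>, placed in the same directory as the input.
    source = Path(input_path)
    return source.with_name(f"anonymised_{source.stem}{source.suffix}")


target = default_output_path("statements/march.pdf")
print(target)  # statements/anonymised_march.pdf on Linux/macOS
if target.exists():
    print(f"{target} already exists and may be overwritten")
```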
How it works
The anonymiser works in three steps:
1. Identify sensitive data: detects sort codes, account numbers, IBANs, card numbers, and other patterns defined in config. Each gets a deterministic fake replacement (e.g. `40-37-28` → `28-28-28`, the last two digits repeated), so the same data point is always replaced with the same fake value, even across multiple pages.
2. Protect structural text: dates, payment type codes, bank URLs, and any phrases in your `never_anonymise` config are left unchanged, which preserves the document's readability and structure.
3. Scramble remaining text: all other letters are scrambled (e.g. `Barclays` → `Dqhyqbvd`), while digits and symbols stay intact. The PDF's layout, fonts, images, and line breaks remain unchanged.
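The three steps can be sketched on a single line of plain text. Everything here is an assumption for illustration: the sort-code regex, the helper names (`fake_sort_code`, `scramble_char`, `anonymise_line`), and the per-letter random choice are not taken from the library, which works on the PDF itself (it depends on pikepdf) rather than on plain strings. Protected phrases are matched case-insensitively only, for brevity.

```python
from __future__ import annotations

import random
import re
import string

# Hypothetical pattern for UK sort codes written as NN-NN-NN.
SORT_CODE = re.compile(r"\b(\d{2})-(\d{2})-(\d{2})\b")


def fake_sort_code(match: re.Match[str]) -> str:
    # Deterministic: the fake depends only on the real value,
    # e.g. 40-37-28 -> 28-28-28, so repeats on later pages match.
    return "-".join([match.group(3)] * 3)


def scramble_char(ch: str, rng: random.Random) -> str:
    # Swap a letter for a random letter of the same case;
    # digits, spaces and symbols are returned unchanged.
    if ch in string.ascii_uppercase:
        return rng.choice(string.ascii_uppercase)
    if ch in string.ascii_lowercase:
        return rng.choice(string.ascii_lowercase)
    return ch


def anonymise_line(text: str, protected: list[str], rng: random.Random) -> str:
    # Step 1: deterministic fake for each sort code.
    text = SORT_CODE.sub(fake_sort_code, text)

    # Step 2: mark characters covered by a protected phrase (case-insensitive).
    keep = [False] * len(text)
    for phrase in protected:
        for m in re.finditer(re.escape(phrase), text, re.IGNORECASE):
            keep[m.start():m.end()] = [True] * (m.end() - m.start())

    # Step 3: scramble every letter that is not protected.
    return "".join(
        ch if keep[i] else scramble_char(ch, rng) for i, ch in enumerate(text)
    )


rng = random.Random(0)  # seeded only so repeated runs print the same thing
print(anonymise_line("Barclays 40-37-28 Direct Debit £12.50", ["direct debit"], rng))
```

In this sketch, two different sort codes that end in the same two digits collapse to the same fake value. That is harmless for anonymisation, but it means the output cannot be used to tell those accounts apart.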
Licence
MIT — see LICENSE.
File details
Details for the file uk_bank_statement_anonymiser-0.1.3.tar.gz.
File metadata
- Download URL: uk_bank_statement_anonymiser-0.1.3.tar.gz
- Upload date:
- Size: 23.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.19 (CI, Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 841bca0c65cefde46db76959760f5f366548869d8ab7a83a7579272fd5a1643d |
| MD5 | 080c73f08226f0fe4f2a5b99cdc605fd |
| BLAKE2b-256 | 8887b02f729f65aeb5f1745ff92ef98422550e5e48b3e8bf5a715052cfea0697 |
File details
Details for the file uk_bank_statement_anonymiser-0.1.3-py3-none-any.whl.
File metadata
- Download URL: uk_bank_statement_anonymiser-0.1.3-py3-none-any.whl
- Upload date:
- Size: 26.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.19 (CI, Ubuntu 24.04)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3050ea3669ad1a70f83009207809a85b0bc09a0335e73a85be6a5e958ff84532 |
| MD5 | d96ca331f85fd28bbe497857426fd499 |
| BLAKE2b-256 | bf4a49e09e0d696dd90b3d2f03013ad261f5b36c45ffac5f0f5f8fb239562fb8 |