Skip to main content

Anonymise UK bank statement PDFs by scrambling personal data while preserving document structure.

Project description

uk-bank-statement-anonymiser

Anonymise UK bank statement PDFs by scrambling personal data while preserving the document's visual structure and layout. All letters in transaction descriptions are replaced with random alternatives; dates, payment codes, protected phrases, and numeric identifiers (sort codes, account numbers, IBANs, card numbers) are handled deterministically so the anonymised output remains internally consistent across pages.

Supported statement types

  • HSBC UK current account
  • HSBC UK savings account
  • Natwest current account
  • TSB Spend & Save account
  • TSB credit card

Other UK bank PDFs may work, but have not been tested.

Requirements

  • Python 3.14+
  • pikepdf (installed automatically)

Installation

pip install uk-bank-statement-anonymiser

Quick start

By default, the library automatically detects and anonymises dates, sort codes, account numbers, card numbers, and other sensitive patterns. For custom rules—to force specific replacements or protect additional phrases—see User config files below.

from bank_statement_anonymiser import anonymise_pdf

# Minimal — output written alongside input as "anonymised_<original_name>.pdf"
anonymise_pdf("statement.pdf")

# Explicit output path (recommended — avoids exposing the original filename)
anonymise_pdf("statement.pdf", "safe_output_name.pdf")

User config files

The library ships two system config files (bundled in the package, committed to source control) that cover common protected phrases and known numeric patterns:

File Purpose
always_anonymise_system.toml Force specific strings to a known replacement value
never_anonymise_system.toml Protect specific phrases from being scrambled

You can supplement these with your own files passed as arguments to anonymise_pdf:

anonymise_pdf(
    "statement.pdf",
    "output.pdf",
    always_anonymise_path="my_always_anonymise.toml",
    never_anonymise_path="my_never_anonymise.toml",
)

System config provides defaults; your custom config overrides or extends them. For always_anonymise: your rules win on any clash. For never_anonymise: both system and user lists are combined (union).

User config files should not be committed to source control — they will typically contain real account numbers, sort codes, or names that you are trying to protect.

always_anonymise.toml format

# Force exact string replacements before the scramble pass.
# User file wins over system file on a clash.

"40-37-28" = "00-00-00"
"12345678" = "00000000"
"Jason Farrar" = "John Doe"

never_anonymise.toml format

# Phrases listed here are left exactly as-is during the scramble pass.
# Matching is case-insensitive and whitespace-insensitive.

exclude = [
    "My Bank",
    "My Employer Ltd",
]

API reference

anonymise_pdf

def anonymise_pdf(
    input_path: str | Path,
    output_path: str | Path | None = None,
    always_anonymise_path: str | Path | None = None,
    never_anonymise_path: str | Path | None = None,
    debug: bool = False,
) -> Path

Anonymises a single PDF and returns the path to the output file.

Parameter Description
input_path Path to the input PDF
output_path Path for the output PDF. If omitted, writes anonymised_<stem><suffix> in the same directory as the input
always_anonymise_path Path to a user always_anonymise.toml (optional)
never_anonymise_path Path to a user never_anonymise.toml (optional)
debug Print diagnostic information to stdout when True
Returns Path to the output PDF file

How it works

The anonymiser works in three steps:

  1. Identify sensitive data — Detects sort codes, account numbers, IBANs, card numbers, and other patterns defined in config. Each gets a deterministic fake replacement (e.g. 40-37-2828-28-28 — last two digits repeated). This ensures the same data point is always replaced with the same fake value, even across multiple pages.

  2. Protect structural text — Dates, payment type codes, bank URLs, and any phrases in your never_anonymise config are left unchanged. This preserves the document's readability and structure.

  3. Scramble remaining text — All other letters are scrambled (e.g. BarclaysDqhyqbvd), while digits and symbols stay intact. The PDF's layout, fonts, images, and line breaks remain unchanged.

Licence

MIT — see LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

uk_bank_statement_anonymiser-0.1.2.tar.gz (23.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

uk_bank_statement_anonymiser-0.1.2-py3-none-any.whl (26.3 kB view details)

Uploaded Python 3

File details

Details for the file uk_bank_statement_anonymiser-0.1.2.tar.gz.

File metadata

  • Download URL: uk_bank_statement_anonymiser-0.1.2.tar.gz
  • Upload date:
  • Size: 23.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.19 {"installer":{"name":"uv","version":"0.11.19","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for uk_bank_statement_anonymiser-0.1.2.tar.gz
Algorithm Hash digest
SHA256 57ecbe6ad1e29d93ed66b9b211f1e329f7fcc29a99c27f0a593a843dd21ce083
MD5 048d4f677c7dcd2e6dfa0c20884e6515
BLAKE2b-256 bd470748fac248476fa2b6d3734b4814425f3512146f1b92745328317b7e3e5e

See more details on using hashes here.

File details

Details for the file uk_bank_statement_anonymiser-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: uk_bank_statement_anonymiser-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 26.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.19 {"installer":{"name":"uv","version":"0.11.19","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for uk_bank_statement_anonymiser-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 17d86d51bec1988caec6bf6e9ec00a53eb28800e72fc5fa30dc1ad18bc276960
MD5 1b2aeb4c96ec3a828dfc6a6c92e98a04
BLAKE2b-256 e9b8b456593a9a8f19e64a9c2fd0269432a2258a74f200806628de5f2c486463

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page