BankStatementParser is your essential tool for easy bank statement management. Designed with finance and treasury experts in mind, it offers a simple way to handle CAMT (ISO 20022) formats and more. Get quick, accurate insights from your financial data and spend less time on processing. It's the smart, hassle-free way to stay on top of your transactions.

Bank Statement Parser

Parse bank statements across six formats (CAMT, PAIN.001, CSV, OFX, QFX, and MT940) into structured DataFrames. Process ZIP archives safely. Redact PII by default. Stream files of any size.

Built for finance teams, treasury analysts, and fintech developers who need reliable, auditable extraction from ISO 20022 and legacy banking formats without sending data to external services.

Key Features

Feature            Description
6 formats          CAMT.053, PAIN.001, CSV, OFX, QFX, MT940
Auto-detection     detect_statement_format() identifies the format; create_parser() returns the right parser
Deduplication      Deduplicator detects exact duplicates and suspected matches across sources with explainable confidence scores
PII redaction      Names, IBANs, and addresses masked by default (opt in to full output with --show-pii)
Streaming          parse_streaming() at 27,000+ tx/s (CAMT) and 52,000+ tx/s (PAIN.001) with bounded memory
Parallel           parse_files_parallel() for multi-file batch processing across CPU cores
Secure ZIP         iter_secure_xml_entries() rejects zip bombs, encrypted entries, and suspicious compression ratios
In-memory parsing  from_string() and from_bytes() parse XML without touching disk
Export             CSV, JSON, Excel (.xlsx), and optional Polars DataFrames
100% coverage      467 tests, 100% branch coverage, property-based fuzzing with Hypothesis

Requirements

  • Python 3.9 through 3.14
  • Poetry (for local development)

Install

pip install bankstatementparser

Local Development

Clone and install on macOS, Linux, or WSL:

git clone https://github.com/sebastienrousseau/bankstatementparser.git
cd bankstatementparser
python3 -m venv .venv
source .venv/bin/activate
pip install poetry
poetry install --with dev

Quick Start

Parse a CAMT statement

from bankstatementparser import CamtParser

parser = CamtParser("statement.xml")
transactions = parser.parse()
print(transactions)
   Amount Currency DrCr  Debtor Creditor      ValDt      AccountId
 105678.5      SEK CRDT MUELLER          2010-10-18 50000000054910
-200000.0      SEK DBIT                  2010-10-18 50000000054910
  30000.0      SEK CRDT                  2010-10-18 50000000054910

Parse a PAIN.001 payment file

from bankstatementparser import Pain001Parser

parser = Pain001Parser("payment.xml")
payments = parser.parse()
print(payments)
  PmtInfId PmtMtd  InstdAmt Currency  CdtrNm         EndToEndId
  PMT-001  TRF     1500.00  EUR       ACME Corp      E2E-001
  PMT-001  TRF     2300.50  EUR       Global Ltd     E2E-002

Auto-detect the format

from bankstatementparser import create_parser, detect_statement_format

fmt = detect_statement_format("transactions.ofx")
parser = create_parser("transactions.ofx", fmt)
records = parser.parse()

Works with .xml, .csv, .ofx, .qfx, and .mt940 files.

Parse from memory (no disk I/O)

from bankstatementparser import CamtParser

xml_bytes = download_from_sftp()  # your own function
parser = CamtParser.from_bytes(xml_bytes, source_name="daily.xml")
transactions = parser.parse()

Pass only decompressed XML to from_string() or from_bytes(). For ZIP archives, use iter_secure_xml_entries().

Parse XML files inside a ZIP archive

from bankstatementparser import CamtParser, iter_secure_xml_entries

for entry in iter_secure_xml_entries("statements.zip"):
    parser = CamtParser.from_bytes(entry.xml_bytes, source_name=entry.source_name)
    transactions = parser.parse()
    print(entry.source_name, len(transactions), "transactions")

The iterator enforces size limits, blocks encrypted entries, and rejects suspicious compression ratios before any XML parsing occurs.
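
To make these checks concrete, here is a minimal sketch built only on Python's standard zipfile module. It is not the library's implementation: the function name inspect_zip_entries and the limits MAX_ENTRY_BYTES and MAX_RATIO are assumptions chosen for illustration. It shows that all three checks can be made from the archive's metadata alone, before any entry is decompressed.

```python
import io
import zipfile

# Hypothetical limits for illustration; the library's real thresholds may differ.
MAX_ENTRY_BYTES = 50 * 1024 * 1024  # cap on the declared uncompressed size
MAX_RATIO = 100                     # cap on uncompressed / compressed size


def inspect_zip_entries(zip_source):
    """Return the names of XML entries that pass all checks; raise ValueError otherwise."""
    accepted = []
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir() or not info.filename.lower().endswith(".xml"):
                continue
            # Bit 0 of the general-purpose flags marks an encrypted entry.
            if info.flag_bits & 0x1:
                raise ValueError(f"encrypted entry: {info.filename}")
            if info.file_size > MAX_ENTRY_BYTES:
                raise ValueError(f"entry too large: {info.filename}")
            if info.compress_size and info.file_size / info.compress_size > MAX_RATIO:
                raise ValueError(f"suspicious compression ratio: {info.filename}")
            accepted.append(info.filename)
    return accepted


# Usage: build a tiny in-memory archive and inspect it.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("statement.xml", "<Document/>")
buffer.seek(0)
print(inspect_zip_entries(buffer))
```

The sizes in the central directory are declared by whoever created the archive, so a stricter variant would also count bytes while reading each entry.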

PII Redaction

PII (names, IBANs, addresses) is redacted by default in console output and streaming mode.

# Redacted by default
for tx in parser.parse_streaming(redact_pii=True):
    print(tx)  # Names and addresses show as ***REDACTED***

# Opt in to see full data
for tx in parser.parse_streaming(redact_pii=False):
    print(tx)

File exports (CSV, JSON, Excel) always contain the full unredacted data.
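
As a rough illustration of field-level masking, the sketch below redacts selected keys of a transaction dict. The key set PII_FIELDS and the helper redact() are hypothetical and are not part of the library's API; only the ***REDACTED*** placeholder is taken from the output described above.

```python
# Hypothetical set of sensitive keys; the library's actual field list may differ.
PII_FIELDS = {"Debtor", "Creditor", "IBAN", "Address"}
MASK = "***REDACTED***"


def redact(transaction, show_pii=False):
    """Return a copy of the transaction with non-empty PII fields masked."""
    if show_pii:
        return dict(transaction)
    return {
        key: MASK if key in PII_FIELDS and value else value
        for key, value in transaction.items()
    }


tx = {"Amount": 105678.5, "Currency": "SEK", "Debtor": "MUELLER", "Creditor": ""}
print(redact(tx))
```

Empty fields are left empty in this sketch, so the masked output still shows which parties were present in the source record.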

Streaming

Process large files incrementally. Memory stays bounded regardless of file size — tested at 50,000 transactions with sub-2x memory scaling.

from bankstatementparser import CamtParser

parser = CamtParser("large_statement.xml")
for transaction in parser.parse_streaming():
    process(transaction)  # each transaction is a dict

Works with both CamtParser and Pain001Parser. PAIN.001 files over 50 MB use chunk-based namespace stripping via a temporary file — the full document is never loaded into memory.
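
The general technique behind incremental XML processing can be sketched with the standard library's xml.etree.ElementTree.iterparse. This is an illustration under simplifying assumptions (no namespaces, a hand-written sample document, a hypothetical stream_entries() helper), not the parser's actual code.

```python
import io
import xml.etree.ElementTree as ET


def stream_entries(source):
    """Yield one dict per <Ntry> element without waiting for the whole document."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "Ntry":
            amount = elem.find("Amt")
            yield {
                "Amount": float(amount.text),
                "Currency": amount.get("Ccy"),
                "DrCr": elem.findtext("CdtDbtInd"),
            }
            elem.clear()  # release the children of the processed entry


sample = io.BytesIO(
    b"<Stmt>"
    b"<Ntry><Amt Ccy='SEK'>105678.50</Amt><CdtDbtInd>CRDT</CdtDbtInd></Ntry>"
    b"<Ntry><Amt Ccy='SEK'>200000.00</Amt><CdtDbtInd>DBIT</CdtDbtInd></Ntry>"
    b"</Stmt>"
)
for entry in stream_entries(sample):
    print(entry)
```

Clearing each element after it is consumed is what keeps memory from growing with the number of entries. A production version would also detach the emptied elements from their parent.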

Performance

Metric                   CAMT               PAIN.001
Throughput               27,000+ tx/s       52,000+ tx/s
Per-transaction latency  37 µs              19 µs
Time to first result     < 1 ms             < 2 ms
Memory scaling           Constant (1K–50K)  Constant (1K–50K)

Performance is flat from 1,000 to 50,000 transactions. CI enforces minimum TPS and latency thresholds.
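
The throughput and latency rows describe the same measurement from two angles. Assuming transactions are processed one after another on a single core (an assumption, not something the benchmark description states), the mean latency is the reciprocal of the throughput:

```python
def latency_us(tx_per_second):
    """Convert throughput (transactions per second) to mean latency in microseconds."""
    return 1_000_000 / tx_per_second


print(round(latency_us(27_000)))  # 37
print(round(latency_us(52_000)))  # 19
```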

Parallel Parsing

Process multiple files simultaneously across CPU cores:

from bankstatementparser import parse_files_parallel

results = parse_files_parallel([
    "statements/jan.xml",
    "statements/feb.xml",
    "statements/mar.xml",
])

for r in results:
    print(r.path, r.status, len(r.transactions), "rows")

Uses ProcessPoolExecutor to bypass the GIL. Each file is parsed in its own worker process. The format is auto-detected per file; pass format_name="camt" to force a specific one.

Command Line

# Parse and display
python -m bankstatementparser.cli --type camt --input statement.xml

# Export to CSV
python -m bankstatementparser.cli --type camt --input statement.xml --output transactions.csv

# Stream with PII visible
python -m bankstatementparser.cli --type camt --input statement.xml --streaming --show-pii

Supports --type camt and --type pain001.

Deduplication

Detect duplicate transactions across multiple sources:

from bankstatementparser import CamtParser, Deduplicator

parser = CamtParser("statement.xml")
dedup = Deduplicator()
result = dedup.deduplicate(dedup.from_dataframe(parser.parse()))

print(f"Unique: {len(result.unique_transactions)}")
print(f"Exact duplicates: {len(result.exact_duplicates)}")
print(f"Suspected matches: {len(result.suspected_matches)}")

The Deduplicator uses deterministic hashing for exact matches and configurable similarity thresholds for suspected matches. Each match group includes a confidence score and reason for auditability.
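
The idea of deterministic hashing for exact matches can be sketched as follows. The helper names, the choice of key fields, and the use of SHA-256 are assumptions made for illustration; the Deduplicator's real fingerprint may be built differently.

```python
import hashlib
import json


def transaction_fingerprint(transaction, keys=("Amount", "Currency", "ValDt", "AccountId")):
    """Hash a fixed, ordered subset of fields so equal transactions get equal digests."""
    canonical = json.dumps([str(transaction.get(k, "")) for k in keys])
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def split_exact_duplicates(transactions):
    """Partition transactions into first occurrences and exact repeats."""
    seen, unique, duplicates = set(), [], []
    for tx in transactions:
        digest = transaction_fingerprint(tx)
        (duplicates if digest in seen else unique).append(tx)
        seen.add(digest)
    return unique, duplicates


rows = [
    {"Amount": 30000.0, "Currency": "SEK", "ValDt": "2010-10-18", "AccountId": "50000000054910"},
    {"AccountId": "50000000054910", "ValDt": "2010-10-18", "Currency": "SEK", "Amount": 30000.0},
    {"Amount": -200000.0, "Currency": "SEK", "ValDt": "2010-10-18", "AccountId": "50000000054910"},
]
unique, duplicates = split_exact_duplicates(rows)
print(len(unique), len(duplicates))
```

Because the fields are serialized in a fixed order, the digest does not depend on dict key order or on which source file a record came from.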

Export

from bankstatementparser import CamtParser

parser = CamtParser("statement.xml")
parser.parse()

# CSV
parser.export_csv("output.csv")

# JSON (includes summary + transactions)
parser.export_json("output.json")

# Excel
parser.camt_to_excel("output.xlsx")

Polars (optional)

Convert any parser output to a Polars DataFrame:

polars_df = parser.to_polars()
lazy_df = parser.to_polars_lazy()

Install with pip install bankstatementparser[polars].

Examples

See examples/ for 14 runnable scripts:

Example                    What it demonstrates
parse_camt_basic.py        Load a CAMT.053 file and print transactions
parse_camt_from_string.py  Parse CAMT from an in-memory XML string
inspect_camt.py            Extract balances, stats, and summaries
export_camt.py             Export to CSV and JSON
export_camt_excel.py       Export to Excel workbook
stream_camt.py             Stream transactions incrementally
parse_camt_zip.py          Secure ZIP archive processing
parse_detected_formats.py  Auto-detect CSV, OFX, MT940, and XML formats
parse_pain001_basic.py     Parse a PAIN.001 payment file
export_pain001.py          Export PAIN.001 to CSV and JSON
stream_pain001.py          Stream payments incrementally
validate_input.py          Validate file paths with InputValidator
compatibility_wrappers.py  Legacy API wrappers
cli_examples.sh            CLI commands for CAMT and PAIN.001

XML Tag Mapping

See docs/MAPPING.md for a complete reference that maps ISO 20022 XML tags to DataFrame columns across all six formats. Use it when integrating with ERP systems or building reconciliation pipelines.

Project Layout

bankstatementparser/   Source code (13 modules, 100% branch coverage)
docs/compliance/       ISO 13485 validation, risk register, traceability
examples/              14 runnable example scripts
scripts/               SBOM generation, checksums, signature verification
tests/                 467 tests (unit, integration, property-based, security)

Security

Bank statement files contain sensitive financial and personal data. This library is designed with security as a primary constraint:

  • XXE protection: resolve_entities=False, no_network=True, load_dtd=False
  • ZIP bomb protection: compression ratio limits, entry size caps, encrypted entry rejection
  • Path traversal prevention: dangerous pattern blocklist, symlink resolution
  • PII redaction: default masking of names, IBANs, and addresses
  • Signed commits: enforced in CI via GitHub API verification
  • Supply chain: SHA-256 hash-locked dependencies, CycloneDX SBOM, build provenance attestation
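
As an illustration of the symlink-resolution half of path traversal prevention, the sketch below resolves a user-supplied path and checks that it stays under a base directory. The helper resolve_inside() is hypothetical and is not the library's InputValidator; the pattern blocklist is not shown.

```python
import tempfile
from pathlib import Path


def resolve_inside(base_dir, user_path):
    """Resolve user_path (following symlinks) and require it to stay under base_dir."""
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"path escapes base directory: {user_path}")
    return candidate


with tempfile.TemporaryDirectory() as tmp:
    print(resolve_inside(tmp, "statements/jan.xml"))
```

Resolving before comparing matters: a check on the raw string would miss ".." segments and symlinks that point outside the base directory.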

For vulnerability reports, see SECURITY.md.

For the full compliance suite, see docs/compliance/.

Verify the Repository

Run the full validation suite locally:

ruff check bankstatementparser tests examples scripts
python -m mypy bankstatementparser
python -m pytest
bandit -r bankstatementparser examples scripts -q

Contributing

Signed commits required. See CONTRIBUTING.md.

License

Apache License 2.0. See LICENSE.

FAQ

What formats are supported? CAMT.053, PAIN.001, CSV, OFX, QFX, and MT940.

Does any data leave my infrastructure? No. Zero network calls. XML parsers enforce no_network=True. No cloud, no telemetry.

Is PII redacted automatically? Yes. Names, IBANs, and addresses are masked by default in console output and streaming. File exports retain full data.

Is the extraction deterministic? Yes. Same input produces byte-identical output. Critical for financial auditing.

Can it handle large files? Yes. parse_streaming() is tested at 50,000 transactions (~25 MB) with bounded memory. Files over 50 MB use chunk-based streaming.

See FAQ.md for the complete FAQ covering data privacy, technical specs, and treasury workflows.


THE ARCHITECT ᛫ Sebastien Rousseau ᛫ https://sebastienrousseau.com THE ENGINE ᛞ EUXIS ᛫ Enterprise Unified Execution Intelligence System ᛫ https://euxis.co
