Command-line tool for converting PDF bank statements into CSV

PDF Bank Statement Parser

Command-line tool for robustly converting PDF bank statements into clean, usable CSV. It currently works only for statements from First National Bank (FNB) South Africa (please let me know if you want me to expand the scope).

Install

pip install pdf-bank-statement-parser

Example Usage

# parse a single PDF bank statement #
parse-bank-statement-pdf \
  --input_filepath 'bank_statements/2024_03_27 - 2024_06_28.pdf' \
  --output_path 'bank_statements/csv/2024_03_27 - 2024_06_28.csv'

# parse all PDF bank statements in a given directory #
parse-bank-statement-pdf \
  --input_dir 'bank_statements/' \
  --output_path 'bank_statements/csv/' \
  --csv_sep_char ';'

PDF is the only format FNB offers for downloading historical bank statements, and it is poorly suited to any downstream data task other than reading.

This tool uses pypdfium2 to extract text from the PDF and pure Python for everything else. Transactions are extracted using regular expressions.
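
As a rough illustration of regex-based transaction extraction, here is a minimal sketch in pure Python. It is not the tool's actual pattern: the line layout (day, month, description, amount, balance, with a `Cr` suffix marking credits), the sample text, and the names `LINE_PATTERN`, `to_decimal`, and `parse_line` are all assumptions made for this example. The text would come from a PDF text extractor such as pypdfium2; here it is a hard-coded string.

```python
import re
from decimal import Decimal

# ASSUMPTION: hypothetical line layout, not taken from the tool's source:
#   "<day> <month> <description> <amount>[Cr] <balance>[Cr]"
LINE_PATTERN = re.compile(
    r"^(?P<day>\d{2}) (?P<month>[A-Z][a-z]{2}) "
    r"(?P<description>.+?) "
    r"(?P<amount>[\d,]+\.\d{2})(?P<amount_cr>Cr)? "
    r"(?P<balance>[\d,]+\.\d{2})(?P<balance_cr>Cr)?$"
)


def to_decimal(number_text, is_credit):
    """Convert '1,234.56' to a Decimal; amounts without 'Cr' are treated as debits."""
    value = Decimal(number_text.replace(",", ""))
    return value if is_credit else -value


def parse_line(line):
    """Return a transaction dict, or None if the line is not a transaction."""
    match = LINE_PATTERN.match(line.strip())
    if match is None:
        return None
    return {
        "date": f"{match['day']} {match['month']}",
        "description": match["description"],
        "amount": to_decimal(match["amount"], match["amount_cr"] is not None),
        "balance": to_decimal(match["balance"], match["balance_cr"] is not None),
    }


# Stand-in for text extracted from a statement page (invented sample data).
sample_text = """\
Transactions in RAND (ZAR)
01 Apr Salary Payment 15,000.00Cr 16,200.50Cr
03 Apr POS Purchase Grocer 450.25 15,750.25Cr
"""

# Non-transaction lines (headers, footers) simply fail to match and are skipped.
transactions = [t for t in map(parse_line, sample_text.splitlines()) if t is not None]
```

Using `Decimal` rather than `float` keeps the amounts exact, which matters once balances are compared for equality.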

The parsed results are verified as follows:

  1. For every extracted transaction, the reported balance must equal the previous balance plus the transaction amount.

  2. The opening balance reported on the statement plus the sum of all extracted transactions must equal the closing balance reported on the statement.
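
The two checks can be sketched as follows. This is an illustration under stated assumptions, not the tool's own code: the function name `verify_statement`, the representation of a transaction as an `(amount, balance)` pair, and the sample figures are all hypothetical.

```python
from decimal import Decimal


def verify_statement(opening_balance, closing_balance, transactions):
    """Run both checks; return a list of problems (empty if the statement is consistent).

    `transactions` is a list of (amount, balance_after) pairs in statement order.
    This structure is an assumption made for the sketch.
    """
    problems = []

    # Check 1: each reported balance equals the previous balance plus the amount.
    previous_balance = opening_balance
    for index, (amount, balance) in enumerate(transactions):
        if previous_balance + amount != balance:
            problems.append(
                f"row {index}: {previous_balance} + {amount} != {balance}"
            )
        previous_balance = balance

    # Check 2: opening balance plus the sum of all amounts equals the closing balance.
    total = sum((amount for amount, _ in transactions), Decimal("0"))
    if opening_balance + total != closing_balance:
        problems.append(
            f"statement: {opening_balance} + {total} != {closing_balance}"
        )

    return problems


# Invented sample data for illustration only.
opening = Decimal("1200.50")
closing = Decimal("15750.25")
rows = [
    (Decimal("15000.00"), Decimal("16200.50")),
    (Decimal("-450.25"), Decimal("15750.25")),
]
ok_problems = verify_statement(opening, closing, rows)

# A mis-parsed amount (450.00 instead of 450.25) trips both checks.
bad_rows = [rows[0], (Decimal("-450.00"), Decimal("15750.25"))]
bad_problems = verify_statement(opening, closing, bad_rows)
```

The checks are complementary: the first pinpoints the row where parsing went wrong, while the second also catches a transaction that was dropped entirely, since a missing row shifts the total away from the reported closing balance.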

Download files

Source Distribution

pdf_bank_statement_parser-0.1.1.tar.gz (27.0 kB)

Uploaded Source

Built Distribution

pdf_bank_statement_parser-0.1.1-py3-none-any.whl (21.4 kB)

Uploaded Python 3

File details

Details for the file pdf_bank_statement_parser-0.1.1.tar.gz.

File hashes

Hashes for pdf_bank_statement_parser-0.1.1.tar.gz
Algorithm    Hash digest
SHA256       1c19bd7ed723ac5a29a91af880901e07afad2b4b0e15ed699da0d37fed5bb1ac
MD5          10cc37ad3897a01420cf746494414a7a
BLAKE2b-256  aa43d10dbf8b7faf80842fcd2c14031f5acea813a31c91b63556cf0b648b7044

File details

Details for the file pdf_bank_statement_parser-0.1.1-py3-none-any.whl.

File hashes

Hashes for pdf_bank_statement_parser-0.1.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       8bc7f6dffafc09ccbed6f28fddb60a3df1e9d7e7ed1e13dabdcf7215f8835d9d
MD5          3a3e7e4559b144d47145d70814679a7b
BLAKE2b-256  d5aabc3a920e275848096efbadf41dc55f098f76089a305c00a713b2b6abdbb9
