library for interacting with Czech Raiffeisen Bank's text bank statements
rbcz is a Python library for parsing the plain-text bank statements that Raiffeisen Bank sends out by email. It exposes a simple API to either parse statements stored on your local filesystem or to search through your email and retrieve them via IMAP.
Either install it from PyPI using pip:
$ pip install rbcz
or clone this repo and install it using setup.py:
$ git clone https://github.com/smcl/rbcz.py
$ cd rbcz.py
$ python setup.py install
There are three simple functions - read_statement, read_statements and read_statements_from_imap. To parse a single statement we can use the read_statement function, which takes a single parameter - the path to the bank statement on the local filesystem - and returns a Statement object:
import rbcz

statement = rbcz.read_statement("/path/to/stmt_january_czk.txt")
If we have a number of statements locally we can use read_statements, which accepts a list of filenames to parse and returns a list of Statement objects:
import rbcz

statement_filenames = [
    "stmt_jan_czk.txt",
    "stmt_feb_czk.txt",
    "stmt_mar_czk.txt",
]
statements = rbcz.read_statements(statement_filenames)
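Rather than listing every filename by hand, you could build the list from a folder. The sketch below assumes all your statements are saved as `.txt` files in one directory; `statement_files` is a hypothetical helper for illustration, not part of rbcz.

```python
from pathlib import Path


def statement_files(directory):
    """Return the paths of all .txt files in `directory`, sorted by name.

    Hypothetical helper: it assumes every .txt file in the folder is a
    Raiffeisen text statement.
    """
    return sorted(str(p) for p in Path(directory).glob("*.txt") if p.is_file())


# usage (assuming rbcz is installed):
# import rbcz
# statements = rbcz.read_statements(statement_files("./rb"))
```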
If we don’t have all our statements stored locally we can use read_statements_from_imap to connect to an IMAP server, search it for emails from the “email@example.com” address, download and parse the attachments, and return a list of Statement objects.
import rbcz

statements = rbcz.read_statements_from_imap("imap.gmail.com", "firstname.lastname@example.org", "password123", "inbox")
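read_statements_from_imap does all of this for you, but if you are curious what it involves, here is a rough sketch using only the standard library's imaplib and email modules. It is not rbcz's implementation: the function names are hypothetical, it assumes the statements arrive as `.txt` attachments, and the sender address is passed in as a parameter rather than hard-coded. It returns the raw attachment bytes and leaves parsing to you.

```python
import email
import email.policy
import imaplib


def text_attachments(raw_message):
    """Return (filename, bytes) for every .txt attachment in a raw RFC 822 message."""
    msg = email.message_from_bytes(raw_message, policy=email.policy.default)
    found = []
    for part in msg.walk():
        filename = part.get_filename()
        if filename and filename.lower().endswith(".txt"):
            # decode=True undoes the transfer encoding (e.g. base64)
            found.append((filename, part.get_payload(decode=True)))
    return found


def fetch_statement_attachments(host, username, password, mailbox, sender):
    """Download the .txt attachments of every email in `mailbox` sent by `sender`."""
    attachments = []
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(username, password)
        conn.select(mailbox, readonly=True)  # don't mark messages as read
        _, data = conn.search(None, "FROM", f'"{sender}"')
        for num in data[0].split():
            _, msg_data = conn.fetch(num, "(RFC822)")
            attachments.extend(text_attachments(msg_data[0][1]))
    return attachments
```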
There are two types - Statement and Movement.
A Statement represents a monthly statement:
- account_name - (string) the name of the main account holder (your name!)
- account_number - (string) your account number
- iban - (string) the IBAN of your account
- currency - (string) the currency the account holds
- number - (int) the number of the statement (your first statement will be 1)
- from_date - (datetime) the opening date of the statement
- to_date - (datetime) the closing date of the statement
- opening_balance - (Decimal) the balance at the opening date of the statement
- income - (Decimal) the income you’ve received during the statement’s reporting period
- expenses - (Decimal) the expenses you’ve paid out during the statement’s reporting period
- closing_balance - (Decimal) the balance at the closing date of the statement
- blocked - (Decimal) the amount ring-fenced for outgoing payments
- receivable - (Decimal) the amount received but not yet cleared/settled
- available_balance - (Decimal) amount of money available to withdraw at the closing date of the statement
- movements - (List of Movement) the individual cash movements (payments in or out) during the reporting period
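One way to sanity-check a parsed Statement is to confirm that its movements take the opening balance to the closing balance. The sketch below assumes Movement.amount is signed (positive for money in, negative for money out), which is also what the high/low-water script further down relies on. The two dataclasses are minimal stand-ins carrying only the attributes used; because the check only reads attributes, it should work the same on a real rbcz Statement.

```python
from dataclasses import dataclass, field
from decimal import Decimal
from typing import List


@dataclass
class Movement:
    """Minimal stand-in for rbcz's Movement (only the field used here)."""
    amount: Decimal


@dataclass
class Statement:
    """Minimal stand-in for rbcz's Statement (only the fields used here)."""
    opening_balance: Decimal
    closing_balance: Decimal
    movements: List[Movement] = field(default_factory=list)


def movements_reconcile(stmt):
    """True if opening_balance plus all movement amounts equals closing_balance.

    Assumes amounts are signed: positive for money in, negative for money out.
    """
    total = sum((m.amount for m in stmt.movements), Decimal("0"))
    return stmt.opening_balance + total == stmt.closing_balance
```

Using Decimal throughout (as rbcz does for amounts) keeps the comparison exact, which a float-based check would not guarantee.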
A Movement is an individual transaction, for example an ATM withdrawal or a debit card payment. Each Statement has a list of Movement objects called movements, covering all the transactions during the reporting period. Each Movement has the following:
- number - (int) id of the movement within the current statement
- amount - (Decimal) the amount of the transaction
- date_deducted - (datetime) the date the transaction was originally submitted
- date_completed - (datetime) the date and time at which the transaction was finalised
- counterparty_account_number - (string) the account the payment was sent to or received from
- counterparty_details - (string) information about the account the payment was sent to or received from, if available
- narrative - (string) additional information about the transaction
- transaction_type - (string) the type of transaction that occurred
- specific_symbol - (string) the specific symbol for the movement
- variable_symbol - (string) the variable symbol for the movement
- constant_symbol - (string) the constant symbol for the movement
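Since every Movement carries a transaction_type, a quick way to see where the money went in a given period is to total the amounts per type. This is a sketch, not part of rbcz: `totals_by_type` is a hypothetical helper, and the Movement dataclass is a minimal stand-in with only the two attributes it reads. With rbcz installed you would call it as `totals_by_type(statement.movements)`.

```python
from collections import defaultdict
from dataclasses import dataclass
from decimal import Decimal


@dataclass
class Movement:
    """Minimal stand-in for rbcz's Movement (only the fields used here)."""
    amount: Decimal
    transaction_type: str


def totals_by_type(movements):
    """Sum movement amounts per transaction_type, returning a plain dict."""
    totals = defaultdict(Decimal)  # Decimal() == Decimal("0")
    for m in movements:
        totals[m.transaction_type] += m.amount
    return dict(totals)
```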
The following script will attempt to parse all the statements in the ./rb directory, then take the closing balance and the high/low-water marks (highest and lowest running balance) of each period and plot them on a graph.
#!/usr/bin/env python
# system/lib imports
import os

import matplotlib.pyplot as plt
from matplotlib.dates import YearLocator, MonthLocator, DateFormatter

# rbcz library
import rbcz

# load the statements from ./rb and sort them by opening date
statement_dir = "./rb"
statements = sorted(
    rbcz.read_statements([os.path.join(statement_dir, f) for f in os.listdir(statement_dir)]),
    key=lambda stmt: stmt.from_date)

# determine the low/high-water marks (lowest/highest running balance) of a statement
def high_low_water(stmt):
    bal = stmt.opening_balance
    lwm = hwm = bal
    for m in stmt.movements:
        bal += m.amount
        hwm = max(hwm, bal)
        lwm = min(lwm, bal)
    return (lwm, hwm)

# extract low/high-water marks (converted to float for matplotlib)
water_marks = [high_low_water(s) for s in statements]
low_water_marks = [float(lwm) for lwm, _ in water_marks]
high_water_marks = [float(hwm) for _, hwm in water_marks]

# extract closing balances and the statement dates
closing_balances = [float(s.closing_balance) for s in statements]
dates = [s.from_date for s in statements]

# plot the data: highest in green, closing in black, lowest in red
fig, ax = plt.subplots()
ax.set_prop_cycle(color=['green', 'black', 'red'])
ax.plot(dates, high_water_marks, "o-")
ax.plot(dates, closing_balances, "o-")
ax.plot(dates, low_water_marks, "o-")

# fix up the axes
ax.xaxis.set_major_locator(YearLocator())
ax.xaxis.set_minor_locator(MonthLocator())
ax.xaxis.set_major_formatter(DateFormatter('%Y-%m-%d'))
ax.fmt_xdata = DateFormatter('%Y-%m-%d')
fig.autofmt_xdate()

# add a legend
ax.legend(['highest', 'closing', 'lowest'], loc='upper left')
plt.show()
Depending on the content of the bank statements this will generate a graph like the following:
- get coverage to 100%
- decide whether an error while parsing an IMAP statement should be swallowed, printed or raised as an exception
- check if it’s possible to improve the parsing - there are a LOT of regexes that I throw around and it’s not pretty…
- check if anyone I know gets Czech statements, and see if we can parse them too. Are there statements in any other languages, such as German?
- check if it works for Raiffeisen banks outside the Czech Republic