data-patterns
Package for generating and evaluating data-patterns in quantitative reports
Free software: MIT/X license
Documentation: https://data-patterns.readthedocs.io.
Features
Here is what the package does:
Generating and evaluating patterns in structured datasets and exporting to Excel and JSON
Transforming generated patterns into XBRL validation rules and Pandas code
Evaluating reporting data with data quality rules published by De Nederlandsche Bank (to be provided)
Quick overview
To install the package:
pip install data_patterns
To introduce the features of this package, define the following Pandas DataFrame:
import pandas as pd
import data_patterns

df = pd.DataFrame(
    columns=['Name', 'Type', 'Assets', 'TV-life', 'TV-nonlife', 'Own funds', 'Excess'],
    data=[
        ['Insurer 1', 'life insurer', 1000, 800, 0, 200, 200],
        ['Insurer 2', 'non-life insurer', 4000, 0, 3200, 800, 800],
        ['Insurer 3', 'non-life insurer', 800, 0, 700, 100, 100],
        ['Insurer 4', 'life insurer', 2500, 1800, 0, 700, 700],
        ['Insurer 5', 'non-life insurer', 2100, 0, 2200, 200, 200],
        ['Insurer 6', 'life insurer', 9000, 8800, 0, 200, 200],
        ['Insurer 7', 'life insurer', 9000, 0, 8800, 200, 200],
        ['Insurer 8', 'life insurer', 9000, 8800, 0, 200, 200],
        ['Insurer 9', 'non-life insurer', 9000, 0, 8800, 200, 200],
        ['Insurer 10', 'non-life insurer', 9000, 0, 8800, 200, 199.99],
    ],
)
# Use the insurer name as the row index
df.set_index('Name', inplace=True)
Start by defining a PatternMiner:
miner = data_patterns.PatternMiner(df)
To generate patterns, use the find function of this object:
df_patterns = miner.find({'name' : 'equal values', 'pattern' : '=', 'parameters': {"min_confidence": 0.5, "min_support" : 2}})
The result is a DataFrame with the patterns that were found. The first part of the DataFrame now contains:
id | pattern_id | P columns | relation type | Q columns | support | exceptions | confidence |
---|---|---|---|---|---|---|---|
0 | equal values | [Own funds] | = | [Excess] | 9 | 1 | 0.9 |
1 | equal values | [Excess] | = | [Own funds] | 9 | 1 | 0.9 |
The miner finds two patterns. The first states that the ‘Own funds’ column equals the ‘Excess’ column in 9 of the 10 cases, a confidence of 90%; in one case the pattern does not hold. The second pattern is the same as the first with the columns reversed.
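The counts in the table can be reproduced with plain pandas. The following is a minimal sketch, not the package's implementation: the helper `find_equal_patterns` is hypothetical, and it assumes that confidence is support divided by the sum of support and exceptions, which agrees with the values shown above (9 / (9 + 1) = 0.9).

```python
from itertools import permutations

import pandas as pd

# Numeric columns of the example data (the 'Type' column is left out here)
df = pd.DataFrame(
    columns=['Name', 'Assets', 'TV-life', 'TV-nonlife', 'Own funds', 'Excess'],
    data=[
        ['Insurer 1', 1000, 800, 0, 200, 200],
        ['Insurer 2', 4000, 0, 3200, 800, 800],
        ['Insurer 3', 800, 0, 700, 100, 100],
        ['Insurer 4', 2500, 1800, 0, 700, 700],
        ['Insurer 5', 2100, 0, 2200, 200, 200],
        ['Insurer 6', 9000, 8800, 0, 200, 200],
        ['Insurer 7', 9000, 0, 8800, 200, 200],
        ['Insurer 8', 9000, 8800, 0, 200, 200],
        ['Insurer 9', 9000, 0, 8800, 200, 200],
        ['Insurer 10', 9000, 0, 8800, 200, 199.99],
    ],
).set_index('Name')


def find_equal_patterns(df, min_confidence=0.5, min_support=2):
    """Hypothetical helper: test every ordered column pair for equality."""
    rows = []
    for p, q in permutations(df.columns, 2):
        holds = df[p] == df[q]             # row-wise check of the '=' relation
        support = int(holds.sum())         # rows where the pattern holds
        exceptions = int((~holds).sum())   # rows where it does not
        confidence = support / (support + exceptions)
        if support >= min_support and confidence >= min_confidence:
            rows.append({'P columns': [p], 'relation type': '=',
                         'Q columns': [q], 'support': support,
                         'exceptions': exceptions, 'confidence': confidence})
    return pd.DataFrame(rows)


patterns = find_equal_patterns(df)
print(patterns)
```

With the thresholds used in the example, only the two ‘Own funds’/‘Excess’ pairs pass the filter in this sketch.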
We can also find patterns using expressions:
df_patterns = miner.find({'name' : 'equal values', 'expression' : '{.*.*}={.*.*}', 'parameters': {"min_confidence": 0.5, "min_support" : 2}})
This will give the same result.
Expressions are written as follows:
Pass the expression in a dictionary with the same structure as above
Columns are given with ‘{}’, example: ‘{Assets} > 0’
To match several columns use a pattern such as ‘{.*.*}’ (this matches all columns), example: ‘{.*TV.*} > 0’ (matches TV-life and TV-nonlife)
Conditional statements use IF and THEN together with & and | (and/or), example: ‘IF ({.*TV-life.*} = 0) THEN ({.*TV-nonlife.*} = 8800) & ({.*As.*} > 0)’ Note: AND is only used when you want the reverse of this statement, such as ‘IF ({.*TV-life.*} = 0) THEN ({.*TV-nonlife.*} = 8800) & ({.*As.*} > 0) AND IF ({.*TV-life.*} = 0) THEN ~({.*TV-nonlife.*} = 8800) & ({.*As.*} > 0)’
Use "@" if you do not have a specific value, example: ‘IF ({.*Ty.*} = "@") THEN ({.*As.*} = "@")’
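To see which columns a placeholder selects, the matching can be imitated with Python's `re` module. This is a sketch under an assumption: it treats the text between the braces as a regular expression that must match the whole column name. The helpers `placeholders` and `matching_columns` are hypothetical and are not part of the package.

```python
import re

columns = ['Type', 'Assets', 'TV-life', 'TV-nonlife', 'Own funds', 'Excess']


def placeholders(expression):
    """Return the text found inside each {...} of an expression."""
    return re.findall(r'\{(.*?)\}', expression)


def matching_columns(regex, columns):
    """Return the column names that the regex matches completely."""
    return [c for c in columns if re.fullmatch(regex, c)]


for expression in ['{.*TV.*} > 0', '{Assets} > 0', '{.*.*}={.*.*}']:
    for regex in placeholders(expression):
        print(expression, '|', regex, '->', matching_columns(regex, columns))
```

Under this assumption ‘{.*TV.*}’ selects TV-life and TV-nonlife, ‘{Assets}’ selects a single column, and ‘{.*.*}’ selects every column.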
To analyze data with the generated set of data-patterns, call the analyze function with the DataFrame that holds the data:
df_results = miner.analyze(df)
The result is a DataFrame with the results. If we select result_type = False then the first part of the output contains:
index | result_type | pattern_id | P columns | relation type | Q columns | P values | Q values |
---|---|---|---|---|---|---|---|
Insurer 10 | False | equal values | [Own funds] | = | [Excess] | [200] | [199.99] |
Insurer 10 | False | equal values | [Excess] | = | [Own funds] | [199.99] | [200] |
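The exception reported for Insurer 10 can be checked directly in pandas. The sketch below does not use the package; it only selects the rows of the example data where ‘Own funds’ and ‘Excess’ differ.

```python
import pandas as pd

# 'Own funds' and 'Excess' columns of the example data
df = pd.DataFrame(
    {
        'Own funds': [200, 800, 100, 700, 200, 200, 200, 200, 200, 200],
        'Excess': [200, 800, 100, 700, 200, 200, 200, 200, 200, 199.99],
    },
    index=[f'Insurer {i}' for i in range(1, 11)],
)

# Rows where the 'equal values' pattern does not hold
exceptions = df[df['Own funds'] != df['Excess']]
print(exceptions)
```

Only Insurer 10 is selected, with 200 in ‘Own funds’ and 199.99 in ‘Excess’.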
Other patterns you can use are ‘>’, ‘<’, ‘<=’, ‘>=’, ‘!=’, ‘sum’, and ‘-->’.
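For the comparison relations, the same support and confidence counts can be illustrated in plain pandas. This is a sketch only: the `OPERATORS` table and the `evaluate` helper are hypothetical, the sum and conditional patterns are not covered, and confidence is assumed to be the share of rows in which the relation holds.

```python
import operator

import pandas as pd

# Map each relation symbol to a Python comparison function
OPERATORS = {
    '=': operator.eq, '!=': operator.ne,
    '>': operator.gt, '<': operator.lt,
    '>=': operator.ge, '<=': operator.le,
}

# 'Assets' and 'TV-nonlife' columns of the example data
df = pd.DataFrame(
    {
        'Assets': [1000, 4000, 800, 2500, 2100, 9000, 9000, 9000, 9000, 9000],
        'TV-nonlife': [0, 3200, 700, 0, 2200, 0, 8800, 0, 8800, 8800],
    },
    index=[f'Insurer {i}' for i in range(1, 11)],
)


def evaluate(df, p, relation, q):
    """Hypothetical helper: count where 'p relation q' holds row by row."""
    holds = OPERATORS[relation](df[p], df[q])
    support = int(holds.sum())
    return {
        'support': support,
        'exceptions': len(df) - support,
        'confidence': support / len(df),
        'exception rows': list(df.index[~holds]),
    }


result = evaluate(df, 'Assets', '>=', 'TV-nonlife')
print(result)
```

In the example data, ‘Assets’ >= ‘TV-nonlife’ holds for every insurer except Insurer 5, whose assets of 2100 are below its TV-nonlife value of 2200.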
Read the documentation for more features.
History
0.1.0 (2019-10-27)
Development release.
0.1.11 (2019-11-6)
First release on PyPI.