Skip to main content

Package for generating and evaluating patterns in quantitative reports

Project description

data-patterns

Pypi Version Build Status Documentation Status License

Package for generating and evaluating data-patterns in quantitative reports

Features

Here is what the package does:

  • Generating and evaluating patterns in structured datasets and exporting to Excel and JSON

  • Transforming generated patterns into XBRL validation rules and Pandas code

  • Evaluating reporting data with data quality rules published by De Nederlandsche Bank (to be provided)

Quick overview

To install the package

pip install data_patterns

To introduce the features of the this package define the following Pandas DataFrame:

df = pd.DataFrame(columns = ['Name',       'Type',             'Assets', 'TV-life', 'TV-nonlife' , 'Own funds', 'Excess'],
                  data   = [['Insurer  1', 'life insurer',     1000,     800,       0,             200,         200],
                            ['Insurer  2', 'non-life insurer', 4000,     0,         3200,          800,         800],
                            ['Insurer  3', 'non-life insurer', 800,      0,         700,           100,         100],
                            ['Insurer  4', 'life insurer',     2500,     1800,      0,             700,         700],
                            ['Insurer  5', 'non-life insurer', 2100,     0,         2200,          200,         200],
                            ['Insurer  6', 'life insurer',     9000,     8800,      0,             200,         200],
                            ['Insurer  7', 'life insurer',     9000,     0,         8800,          200,         200],
                            ['Insurer  8', 'life insurer',     9000,     8800,      0,             200,         200],
                            ['Insurer  9', 'non-life insurer', 9000,     0,         8800,          200,         200],
                            ['Insurer 10', 'non-life insurer', 9000,     0,         8800,          200,         199.99]])
df.set_index('Name', inplace = True)

Start by defining a PatternMiner:

miner = data_patterns.PatternMiner(df)

To generate patterns use the find-function of this object:

df_patterns = miner.find({'name'      : 'equal values',
                          'pattern'   : '=',
                          'parameters': {"min_confidence": 0.5,
                                         "min_support"   : 2}})

The result is a DataFrame with the patterns that were found. The first part of the DataFrame now contains

id

pattern_id

P columns

relation type

Q columns

support

exceptions

confidence

0

equal values

[Own funds]

=

[Excess]

9

1

0.9

1

equal values

[Excess]

=

[Own funds]

9

1

0.9

The miner finds two patterns; the first states that the ‘Own funds’-column is identical to the ‘Excess’-column in 9 of the 10 cases (with a confidence of 90 %, there is one case where the equal-pattern does not hold), and the second pattern is identical to the first but with the columns reversed.

We can also find patterns using expressions:

df_patterns = miner.find({'name'      : 'equal values',
                          'expression'   : '{.*.*}={.*.*}',
                          'parameters': {"min_confidence": 0.5,
                                         "min_support"   : 2}})

This will give the same result.

Expressions can be written as followed:

  1. Put it in a structure like above

  2. Columns are given with ‘{}’, example: ‘{Assests} > 0’

  3. If you want to find matches with columns you can do ‘{.*.*}’ (this will match all columns), example: ‘{.*TV.*} > 0’ (will match TV-life and TV-nonlife)

  4. Conditional statements go with IF, THEN together with & and | (and/or), example: ‘IF ({.*TV-life.*} = 0) THEN ({.*TV-nonlife.*} = 8800) & {.*As.*} > 0)’ Note: AND is only used when you want the reverse of this statement, such as ‘IF ({.*TV-life.*} = 0) THEN ({.*TV-nonlife.*} = 8800) & {.*As.*} > 0) AND IF ({.*TV-life.*} = 0) THEN ~({.*TV-nonlife.*} = 8800) & {.*As.*} > 0)’

  5. Use “@” if you do not have a specific value, example: ‘IF ({.*Ty.*} = “@”) THEN ({.*As.*} = “@”)’

To analyze data with the generated set of data-patterns use the analyze function with the dataframe with the data as input:

df_results = miner.analyze(df)

The result is a DataFrame with the results. If we select result_type = False then the first part of the output contains

index

result_type

pattern_id

P columns

relation type

Q columns

P values

Q values

Insurer 10

False

equal values

[Own funds]

=

[Excess]

[200]

[199.99]

Insurer 10

False

equal values

[Excess]

=

[Own funds]

[199.99]

[200]

Other patterns you can use are ‘>’, ‘<’, ‘<=’, ‘>=’, ‘!=’, ‘sum’, and ‘–>’.

Read the documentation for more features.

History

0.1.0 (2019-10-27)

  • Development release.

0.1.11 (2019-11-6)

  • First release on PyPI.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

data_patterns-0.1.14.tar.gz (30.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

data_patterns-0.1.14-py2.py3-none-any.whl (20.8 kB view details)

Uploaded Python 2Python 3

File details

Details for the file data_patterns-0.1.14.tar.gz.

File metadata

  • Download URL: data_patterns-0.1.14.tar.gz
  • Upload date:
  • Size: 30.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/40.2.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.0

File hashes

Hashes for data_patterns-0.1.14.tar.gz
Algorithm Hash digest
SHA256 df836da35e60b961bea6098f42418269b2b6f501b32fc63e710ae7078bcbd835
MD5 001194c068e0d404b925759a7c32c053
BLAKE2b-256 2fd814ef3dca9d2fe89ddd705778e49f77fb9a629a6887de6dbf172a0e43bc17

See more details on using hashes here.

File details

Details for the file data_patterns-0.1.14-py2.py3-none-any.whl.

File metadata

  • Download URL: data_patterns-0.1.14-py2.py3-none-any.whl
  • Upload date:
  • Size: 20.8 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/49.2.0 requests-toolbelt/0.9.1 tqdm/4.48.0 CPython/3.8.0

File hashes

Hashes for data_patterns-0.1.14-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 dfbeb2795d5d3960c6d101b9725f652e02689b84af6bd342d227aeb7a2275ccf
MD5 77c72bafe014da3bddb49b0a3229b529
BLAKE2b-256 3b5ee264f8cc3a0c0c8ce441efdc85a2edfd46b4c2142b07655a9e8dbaef3fd1

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page