Package for generating and evaluating patterns in quantitative reports

Project description

data-patterns

Package for generating and evaluating data-patterns in quantitative reports

Features

Here is what the package does:

  • Generating and evaluating patterns in structured datasets and exporting to Excel and JSON

  • Transforming generated patterns into Pandas code

Quick overview

To install the package:

pip install data_patterns

To introduce the features of this package, define the following Pandas DataFrame:

import pandas as pd
import data_patterns

# Example dataset: ten insurers with balance sheet items; 'Name' becomes the index
df = pd.DataFrame(columns = ['Name',       'Type',             'Assets', 'TV-life', 'TV-nonlife' , 'Own funds', 'Excess'],
                  data   = [['Insurer  1', 'life insurer',     1000,     800,       0,             200,         200],
                            ['Insurer  2', 'non-life insurer', 4000,     0,         3200,          800,         800],
                            ['Insurer  3', 'non-life insurer', 800,      0,         700,           100,         100],
                            ['Insurer  4', 'life insurer',     2500,     1800,      0,             700,         700],
                            ['Insurer  5', 'non-life insurer', 2100,     0,         2200,          200,         200],
                            ['Insurer  6', 'life insurer',     9000,     8800,      0,             200,         200],
                            ['Insurer  7', 'life insurer',     9000,     0,         8800,          200,         200],
                            ['Insurer  8', 'life insurer',     9000,     8800,      0,             200,         200],
                            ['Insurer  9', 'non-life insurer', 9000,     0,         8800,          200,         200],
                            ['Insurer 10', 'non-life insurer', 9000,     0,         8800,          200,         199.99]])
df.set_index('Name', inplace = True)

Start by defining a PatternMiner:

miner = data_patterns.PatternMiner(df)

To generate patterns use the find-function of this object:

df_patterns = miner.find({'name'      : 'equal values',
                          'pattern'   : '=',
                          'parameters': {"min_confidence": 0.5,
                                         "min_support"   : 2,
                                         "decimal" : 8}})

The result is a DataFrame with the patterns that were found. The first part of the DataFrame now contains:

id  pattern_id    pattern_def             support  exceptions  confidence
0   equal values  {Own funds} = {Excess}  9        1           0.9

The miner finds one pattern: the ‘Own funds’ column equals the ‘Excess’ column in 9 of the 10 cases, a confidence of 90 %. In the one remaining case the equal-pattern does not hold.
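
The numbers in this row can be reproduced by hand. The sketch below is not the package's implementation. It assumes that support counts the rows where both columns agree after rounding to "decimal" places, that exceptions counts the remaining rows, and that confidence is support / (support + exceptions), which is consistent with the 9, 1 and 0.9 reported above.

```python
# Values copied from the 'Own funds' and 'Excess' columns of the example DataFrame.
own_funds = [200, 800, 100, 700, 200, 200, 200, 200, 200, 200]
excess = [200, 800, 100, 700, 200, 200, 200, 200, 200, 199.99]

DECIMAL = 8  # same precision as the "decimal" parameter passed to find

# Assumption: a row supports the pattern when both values match after rounding.
matches = [round(p, DECIMAL) == round(q, DECIMAL) for p, q in zip(own_funds, excess)]

support = sum(matches)                  # rows where the pattern holds
exceptions = len(matches) - support     # rows where it does not
confidence = support / (support + exceptions)

print(support, exceptions, confidence)
```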

To analyze data with the generated set of data-patterns, call the analyze function with the DataFrame that contains the data as input:

df_results = miner.analyze(df)

The result is a DataFrame with the results. If we select result_type = False then the first part of the output contains:

index       result_type  pattern_id    pattern_def             support  exceptions  confidence  P values  Q values
Insurer 10  False        equal values  {Own funds} = {Excess}  9        1           0.9         200       199.99
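
To see where this row comes from, the exception can be located directly in the data. The following is an illustrative sketch and not the analyze function itself. It assumes that "P values" holds the left-hand side of the pattern and "Q values" the right-hand side, which matches the 200 and 199.99 shown above. The helper name find_exceptions is hypothetical.

```python
# ('Own funds', 'Excess') per insurer, copied from the example DataFrame.
rows = {
    "Insurer  1": (200, 200),
    "Insurer  2": (800, 800),
    "Insurer  3": (100, 100),
    "Insurer  4": (700, 700),
    "Insurer  5": (200, 200),
    "Insurer  6": (200, 200),
    "Insurer  7": (200, 200),
    "Insurer  8": (200, 200),
    "Insurer  9": (200, 200),
    "Insurer 10": (200, 199.99),
}


def find_exceptions(rows, decimal=8):
    """Return {name: (p, q)} for rows where p and q differ after rounding."""
    return {
        name: (p, q)
        for name, (p, q) in rows.items()
        if round(p, decimal) != round(q, decimal)
    }


violations = find_exceptions(rows)
print(violations)
```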

Other patterns you can use are ‘>’, ‘<’, ‘<=’, ‘>=’, ‘!=’, ‘sum’, and ‘-->’.
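
As an illustration of what a comparison pattern measures, the sketch below computes the confidence of a few comparisons on columns of the example DataFrame. The mapping from pattern symbols to Python operators and the helper name confidence are assumptions made for this example; the package's own implementation is not shown here.

```python
import operator

# Hypothetical mapping from pattern symbols to Python comparison functions.
COMPARISONS = {
    "=": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
}

# Columns copied from the example DataFrame.
assets = [1000, 4000, 800, 2500, 2100, 9000, 9000, 9000, 9000, 9000]
own_funds = [200, 800, 100, 700, 200, 200, 200, 200, 200, 200]
tv_nonlife = [0, 3200, 700, 0, 2200, 0, 8800, 0, 8800, 8800]


def confidence(symbol, p_values, q_values):
    """Fraction of rows for which 'p <symbol> q' holds."""
    compare = COMPARISONS[symbol]
    holds = [compare(p, q) for p, q in zip(p_values, q_values)]
    return sum(holds) / len(holds)


print(confidence(">", assets, own_funds))    # holds for every insurer
print(confidence(">=", assets, tv_nonlife))  # fails only for Insurer 5
```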

Read the documentation for more features.

Upload to Pypi (for developers)

  1. Change the version in setup.py and setup.cfg

  2. Go to github.com and navigate to the repository. Next, click on the tab “releases” and then on “Create a new release”. Now, define a Tag version (it is best to use the same number as in the version field of your setup.py: v0.1.15 for example). Then click on “publish release”.

  3. Make a Pypi account here: https://pypi.org/manage/projects/

  4. Download twine by typing in your command prompt:

    pip install twine

  5. Get admin rights from the owner of the data_patterns package.

  6. Delete the old files in the dist folder

  7. Open your command prompt and go to the folder of data_patterns. Then type

    python setup.py sdist

    twine upload dist/*

A good reference is here: https://medium.com/@joel.barmettler/how-to-upload-your-python-package-to-pypi-65edc5fe9c56

History

0.1.0 (2019-10-27)

  • Development release.

0.1.11 (2019-11-6)

  • First release on PyPI.

< 0.1.17 (2020-10-6)

Expression

You can now use expressions to find patterns. An expression is a string such as ‘{.*}={.*}’ (this one finds columns that are equal to each other). See the example in usage for how to do this, also with unknown values.

Patterns of the form IF THEN are evaluated through a pandas expression, and quantitative patterns are found using numpy (quicker). A quantitative expression is split up into parts.
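
To make the ‘{.*}={.*}’ notation concrete, the sketch below expands such an expression into one candidate pattern per combination of columns. It assumes that each {...} placeholder is a regular expression matched against the full column name and that a column is not compared with itself. The function expand_expression is hypothetical, not part of the package.

```python
import itertools
import re


def expand_expression(expression, columns):
    """Fill every {regex} placeholder with each matching column name."""
    placeholders = re.findall(r"\{(.*?)\}", expression)
    candidates = [
        [col for col in columns if re.fullmatch(regex, col)]
        for regex in placeholders
    ]
    results = []
    for combo in itertools.product(*candidates):
        if len(set(combo)) < len(combo):
            continue  # skip combinations that use the same column twice
        names = iter(combo)
        # Replace the placeholders from left to right with the chosen columns.
        results.append(
            re.sub(r"\{.*?\}", lambda m: "{" + next(names) + "}", expression)
        )
    return results


columns = ["Assets", "TV-life", "TV-nonlife", "Own funds", "Excess"]
print(expand_expression("{Assets}>{TV.*}", columns))
print(len(expand_expression("{.*}={.*}", columns)))  # every ordered pair of distinct columns
```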

Function

Added the function correct_data. This corrects data based on the most common value if grouped with another column, e.g. changes the names in a column if there are multiple names per LEI code.
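
The idea behind this kind of correction can be sketched in a few lines. The code below is not the signature or implementation of correct_data; it is a minimal illustration with made-up LEI codes and names, and the helper correct_by_most_common is hypothetical.

```python
from collections import Counter, defaultdict


def correct_by_most_common(rows, key, target):
    """Replace rows[i][target] with the most common target value per key."""
    counts = defaultdict(Counter)
    for row in rows:
        counts[row[key]][row[target]] += 1
    return [
        {**row, target: counts[row[key]].most_common(1)[0][0]}
        for row in rows
    ]


# Made-up data: one LEI code appears with two different spellings of the name.
rows = [
    {"LEI": "LEI-A", "Name": "Insurer One"},
    {"LEI": "LEI-A", "Name": "Insurer One"},
    {"LEI": "LEI-A", "Name": "Insurer 1 Ltd"},
    {"LEI": "LEI-B", "Name": "Insurer Two"},
]

corrected = correct_by_most_common(rows, key="LEI", target="Name")
print([row["Name"] for row in corrected])
```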

Other

  1. Added P and Q values to analyze

  2. highest_conf option to find the pattern with the highest conf based on P value.

  3. Possible to use with EVA2 rules

0.1.17 (2020-10-6)

Parameters

  1. ‘window’ (boolean): Only compares columns in a window of n, so [column-n, column+n].

  2. ‘disable’ (boolean): If you set this to True, it will disable all tqdm progress bars for finding and analyzing patterns.

  3. ‘expres’ (boolean): If you use an expression, it will only directly work with the expression if it is an IF THEN statement. Otherwise it is a quantitative pattern and it will be split up in parts and it uses numpy to find the patterns (this is quicker). However sometimes you want to work with an expression directly, such as the difference between two columns is lower than 5%. If you set expres to True, it will work directly with the expression.

    Expression

  1. You can use ABS in expressions. This calculates the absolute value, for example ‘ABS({‘X’} - {‘Y’}) = {‘Z’}’

    cluster

  1. You can now add the column name on which you want to cluster

    Function

  1. Convert_to_time: merge periods together by adding the suffixes (t-1) and (t) to the columns.

  2. convert_columns_to_time: Make the periods into columns so that you have years as columns.
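
The (t-1) and (t) suffixes can be pictured with a small example. This is a sketch under assumptions, not the package's Convert_to_time: it takes periods to be integer years, pairs each period with the one directly before it, and uses made-up values. The helper merge_consecutive_periods is hypothetical.

```python
def merge_consecutive_periods(records):
    """Pair each (entity, period) with the previous period of the same entity.

    records maps (entity, period) to a dict of column values. The result has
    one row per pair, with columns suffixed ' (t-1)' and ' (t)'.
    """
    merged = {}
    for (entity, period), current in records.items():
        previous = records.get((entity, period - 1))
        if previous is None:
            continue  # no earlier period to pair with
        row = {f"{col} (t-1)": val for col, val in previous.items()}
        row.update({f"{col} (t)": val for col, val in current.items()})
        merged[(entity, period)] = row
    return merged


# Made-up yearly data for one insurer.
records = {
    ("Insurer 1", 2018): {"Assets": 900},
    ("Insurer 1", 2019): {"Assets": 1000},
}

merged = merge_consecutive_periods(records)
print(merged)
```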

    Other

  1. Add tqdm progress bars

0.1.18 (16-11-2020)

variables to miner

You can now pass a boolean to the miner. If you pass True, the miner removes all “ and ‘ characters from the string data. This is needed for data in which names contain those characters, which would otherwise cause errors later on.
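
The cleaning step described here amounts to something like the following. This is a minimal sketch, assuming the characters in question are the plain double and single quote; the helper strip_quotes is hypothetical and the miner's own code may differ.

```python
def strip_quotes(value):
    """Remove double and single quote characters from string values only."""
    if isinstance(value, str):
        return value.replace('"', "").replace("'", "")
    return value  # numbers and other types are left untouched


print(strip_quotes('O\'Neil "Re" Ltd'))
print(strip_quotes(199.99))
```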

Function to read overzicht (overview)

Changed the IF THEN expression so that decimals can be used when the values are numeric

Parameters

  1. ‘notNaN’ (boolean): Only uses columns that are not NaN

    Function changes

  1. Convert_to_time: add boolean set_year. If True, only the years are used (this is for yearly data); otherwise the whole date is kept. Defaults to True.

  2. update_statistics: Remove patterns that contain columns which are not in the data. This is necessary for some insurers so that they do not get errors
