data-patterns·PyPI

Package for generating and evaluating patterns in quantitative reports

data-patterns

Package for generating and evaluating data-patterns in quantitative reports

Free software: MIT/X license
Documentation: https://data-patterns.readthedocs.io.

Features

Here is what the package does:

Generating and evaluating patterns in structured datasets and exporting to Excel and JSON
Transforming generated patterns into Pandas code

Quick overview

To install the package

pip install data_patterns

To introduce the features of this package, define the following Pandas DataFrame:

import pandas as pd

df = pd.DataFrame(columns = ['Name',       'Type',             'Assets', 'TV-life', 'TV-nonlife' , 'Own funds', 'Excess'],
                  data   = [['Insurer  1', 'life insurer',     1000,     800,       0,             200,         200],
                            ['Insurer  2', 'non-life insurer', 4000,     0,         3200,          800,         800],
                            ['Insurer  3', 'non-life insurer', 800,      0,         700,           100,         100],
                            ['Insurer  4', 'life insurer',     2500,     1800,      0,             700,         700],
                            ['Insurer  5', 'non-life insurer', 2100,     0,         2200,          200,         200],
                            ['Insurer  6', 'life insurer',     9000,     8800,      0,             200,         200],
                            ['Insurer  7', 'life insurer',     9000,     0,         8800,          200,         200],
                            ['Insurer  8', 'life insurer',     9000,     8800,      0,             200,         200],
                            ['Insurer  9', 'non-life insurer', 9000,     0,         8800,          200,         200],
                            ['Insurer 10', 'non-life insurer', 9000,     0,         8800,          200,         199.99]])
df.set_index('Name', inplace = True)

Start by defining a PatternMiner:

import data_patterns

miner = data_patterns.PatternMiner(df)

To generate patterns, use the find function of this object:

df_patterns = miner.find({'name'      : 'equal values',
                          'pattern'   : '=',
                          'parameters': {"min_confidence": 0.5,
                                         "min_support"   : 2,
                                         "decimal" : 8}})

The result is a DataFrame with the patterns that were found. The first part of the DataFrame now contains:

id	pattern_id	pattern_def	support	exceptions	confidence
0	equal values	{Own funds} = {Excess}	9	1	0.9

The miner finds one pattern: the ‘Own funds’ column is identical to the ‘Excess’ column in 9 of the 10 cases. The confidence is therefore 90 %, and there is one case where the equality does not hold.
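The numbers in this table can be reproduced by hand. The following is a minimal sketch in plain Python, not the package's own implementation; it assumes that support counts the rows where the pattern holds, exceptions counts the rows where it does not, and confidence is support divided by the total number of rows. The threshold check at the end is likewise an assumption about how min_confidence and min_support are applied.

```python
# 'Own funds' and 'Excess' columns of the example DataFrame, in row order.
own_funds = [200, 800, 100, 700, 200, 200, 200, 200, 200, 200]
excess = [200, 800, 100, 700, 200, 200, 200, 200, 200, 199.99]

# Count the rows where the two columns are equal (support) and where not.
support = sum(1 for p, q in zip(own_funds, excess) if p == q)
exceptions = len(own_funds) - support
confidence = support / len(own_funds)

# Assumed reading of the thresholds passed to find() in the example.
min_confidence, min_support = 0.5, 2
keep_pattern = confidence >= min_confidence and support >= min_support

print(support, exceptions, confidence, keep_pattern)
```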

To analyze data with the generated set of patterns, call the analyze function with the DataFrame containing the data as input:

df_results = miner.analyze(df)

The result is a DataFrame with one row per evaluated pattern and data row. If we select the rows where result_type = False, the first part of the output contains:

index	result_type	pattern_id	pattern_def	support	exceptions	confidence	P values	Q values
Insurer 10	False	equal values	{Own funds} = {Excess}	9	1	0.9	200	199.99
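To illustrate what this row means, here is a small sketch in plain Python that lists the rows violating the pattern together with the two compared values. It is an illustration only and does not use or describe the package's internals; reading the P and Q values as the left-hand and right-hand values of the pattern is an assumption based on the table above.

```python
# Index -> value for the two columns of the example DataFrame.
names = ["Insurer  %d" % i for i in range(1, 10)] + ["Insurer 10"]
own_funds = dict(zip(names, [200, 800, 100, 700, 200, 200, 200, 200, 200, 200]))
excess = dict(zip(names, [200, 800, 100, 700, 200, 200, 200, 200, 200, 199.99]))

# Collect (index, P value, Q value) for every row where '=' does not hold.
violations = [
    (name, own_funds[name], excess[name])
    for name in names
    if own_funds[name] != excess[name]
]

print(violations)
```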

Other patterns you can use are ‘>’, ‘<’, ‘<=’, ‘>=’, ‘!=’, ‘sum’, and ‘-->’.
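As a rough illustration of how the comparison patterns differ, the sketch below computes the confidence of each comparison operator between ‘Own funds’ and ‘Excess’ in plain Python. The helper name confidence and the OPERATORS table are hypothetical and not part of the package; the ‘sum’ and ‘-->’ patterns are left out because their exact semantics are described in the documentation rather than here.

```python
import operator

# Hypothetical mapping from pattern symbol to a Python comparison.
OPERATORS = {
    "=": operator.eq, "!=": operator.ne,
    ">": operator.gt, "<": operator.lt,
    ">=": operator.ge, "<=": operator.le,
}

own_funds = [200, 800, 100, 700, 200, 200, 200, 200, 200, 200]
excess = [200, 800, 100, 700, 200, 200, 200, 200, 200, 199.99]

def confidence(p_values, pattern, q_values):
    """Fraction of rows for which `p <pattern> q` holds."""
    compare = OPERATORS[pattern]
    hits = sum(1 for p, q in zip(p_values, q_values) if compare(p, q))
    return hits / len(p_values)

for pattern in OPERATORS:
    print(pattern, confidence(own_funds, pattern, excess))
```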

Read the documentation for more features.

Upload to PyPI (for developers)

Change the version in setup.py and setup.cfg.
Go to github.com and navigate to the repository. Next, click on the “releases” tab and then on “Create a new release”. Define a tag version (it is best to use the same number as in the version field of your setup.py, for example v0.1.15). Then click on “publish release”.
Create a PyPI account here: https://pypi.org/manage/projects/
Install twine by typing in your command prompt:
```
pip install twine
```
Ask the owner of the data_patterns package for admin rights.
Delete the old files in the dist folder.
Open your command prompt and go to the data_patterns folder. Then type:

python setup.py sdist

twine upload dist/*

A good reference is here: https://medium.com/@joel.barmettler/how-to-upload-your-python-package-to-pypi-65edc5fe9c56
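Because the version has to be changed in two files and repeated in the release tag, a small consistency check before uploading can help. The sketch below is an optional helper, not part of the package or of the steps above; the file contents shown are hypothetical examples, and the regular expression assumes a simple `version = ...` style assignment.

```python
import re

# Matches e.g. "version='0.1.15'" or "current_version = 0.1.15".
VERSION_RE = re.compile(r"""version\s*=\s*['"]?([0-9]+(?:\.[0-9]+)*)""")

def extract_version(text):
    """Return the first version number found in `text`, or None."""
    match = VERSION_RE.search(text)
    return match.group(1) if match else None

# Hypothetical file contents; in practice read setup.py and setup.cfg.
setup_py = "setup(name='data_patterns', version='0.1.15')"
setup_cfg = "[bumpversion]\ncurrent_version = 0.1.15\n"

py_version = extract_version(setup_py)
cfg_version = extract_version(setup_cfg)
versions_match = py_version is not None and py_version == cfg_version
release_tag = "v" + py_version if versions_match else None

print(py_version, cfg_version, versions_match, release_tag)
```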

History

0.1.0 (2019-10-27)

Development release.

0.1.11 (2019-11-6)

First release on PyPI.

< 0.1.17 (2020-10-6)

Expression

You can now use expressions to find patterns. An expression is a string such as ‘{.*}={.*}’ (this one finds columns that are equal to each other). See the example in the usage documentation for how to do this, also with unknown values.

Patterns of the form IF THEN are evaluated through a pandas expression, and quantitative patterns are found using numpy (which is quicker). A quantitative expression is split up into parts.
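One way to picture an expression such as ‘{.*}={.*}’ is as a template whose braces hold regular expressions over column names. The sketch below expands such a template into concrete column pairs in plain Python. It is an assumption-laden illustration, not the package's implementation: the function name expand_expression is hypothetical, and whether the package treats pairs as ordered is not stated here.

```python
import re
from itertools import permutations

def expand_expression(expression, columns):
    """Expand '{regex}={regex}' into ordered pairs of matching columns."""
    left, right = re.fullmatch(r"\{(.*)\}=\{(.*)\}", expression).groups()
    return [
        (p, q)
        for p, q in permutations(columns, 2)
        if re.fullmatch(left, p) and re.fullmatch(right, q)
    ]

columns = ["Assets", "Own funds", "Excess"]

all_pairs = expand_expression("{.*}={.*}", columns)
own_pairs = expand_expression("{Own.*}={.*}", columns)

print(len(all_pairs))
print(own_pairs)
```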

Function

Added the function correct_data. This corrects data based on the most common value if grouped with another column, e.g. changes the names in a column if there are multiple names per LEI code.
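The idea behind this kind of correction can be sketched in plain Python as follows. This is not the correct_data function itself, whose signature is not shown here; the helper correct_by_most_common, the column names, and the sample records are hypothetical.

```python
from collections import Counter, defaultdict

def correct_by_most_common(records, key, target):
    """Replace `target` by its most common value within each `key` group."""
    counts = defaultdict(Counter)
    for rec in records:
        counts[rec[key]][rec[target]] += 1
    # On ties, Counter.most_common keeps the value that was seen first.
    most_common = {k: c.most_common(1)[0][0] for k, c in counts.items()}
    return [{**rec, target: most_common[rec[key]]} for rec in records]

# Hypothetical sample: one LEI code appears with two spellings of the name.
records = [
    {"LEI": "LEI-001", "Name": "Insurer A"},
    {"LEI": "LEI-001", "Name": "Insurer A"},
    {"LEI": "LEI-001", "Name": "Insurer A Ltd"},
    {"LEI": "LEI-002", "Name": "Insurer B"},
]

corrected = correct_by_most_common(records, key="LEI", target="Name")
print([rec["Name"] for rec in corrected])
```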

Other

Added P and Q values to analyze
Added the highest_conf option, which finds the pattern with the highest confidence based on the P value.
Possible to use with EVA2 rules

0.1.17 (2020-10-6)

Parameters

‘window’ (boolean): Only compares columns in a window of n, so [column-n, column+n].
‘disable’ (boolean): If you set this to True, it will disable all tqdm progress bars for finding and analyzing patterns.
‘expres’ (boolean): If you use an expression, it will only directly work with the expression if it is an IF THEN statement. Otherwise it is a quantitative pattern and it will be split up in parts and it uses numpy to find the patterns (this is quicker). However sometimes you want to work with an expression directly, such as the difference between two columns is lower than 5%. If you set expres to True, it will work directly with the expression.
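The effect of restricting comparisons to a window can be sketched as follows. This plain-Python illustration assumes the window size n is an integer number of column positions and that a column is compared with its neighbours at positions column-n up to column+n; the function window_pairs is hypothetical and not part of the package.

```python
def window_pairs(columns, n):
    """Ordered column pairs whose positions differ by at most n."""
    pairs = []
    for i, p in enumerate(columns):
        lo, hi = max(0, i - n), min(len(columns), i + n + 1)
        pairs.extend((p, columns[j]) for j in range(lo, hi) if j != i)
    return pairs

columns = ["Assets", "TV-life", "TV-nonlife", "Own funds", "Excess"]

pairs = window_pairs(columns, 1)
print(len(pairs))
print(pairs)
```

With five columns, all ordered pairs would number 20; a window of 1 reduces this to 8, which is the point of the parameter on wide datasets.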

Expression

You can use ABS in expressions to calculate the absolute value, for example ‘ABS({‘X’} - {‘Y’}) = {‘Z’}’.

cluster

You can now add the column name on which you want to cluster

Function

Convert_to_time: merge periods together by adding the suffixes (t-1) and (t) to the columns.
convert_columns_to_time: Make the periods into columns so that you have years as columns.
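To make the (t-1) and (t) suffixes concrete, the sketch below merges consecutive periods of the same entity into one row in plain Python. It only illustrates the idea; the helper merge_periods, the column names, and the sample values are hypothetical and do not reflect the actual signature of Convert_to_time.

```python
from collections import defaultdict

def merge_periods(records, key="Name", period="Period"):
    """Pair each record with the previous period of the same entity."""
    by_entity = defaultdict(list)
    for rec in records:
        by_entity[rec[key]].append(rec)
    merged = []
    for entity, recs in by_entity.items():
        recs.sort(key=lambda r: r[period])
        for prev, curr in zip(recs, recs[1:]):
            row = {key: entity, period: curr[period]}
            for col in curr:
                if col not in (key, period):
                    row[f"{col} (t-1)"] = prev.get(col)
                    row[f"{col} (t)"] = curr[col]
            merged.append(row)
    return merged

# Hypothetical yearly data for one insurer.
records = [
    {"Name": "Insurer 1", "Period": 2019, "Assets": 1000},
    {"Name": "Insurer 1", "Period": 2018, "Assets": 900},
]

merged = merge_periods(records)
print(merged)
```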

Other

Add tqdm progress bars

0.1.18 (16-11-2020)

variables to miner

You can now pass a boolean to the miner. If you pass True, the miner removes all “ and ‘ characters from the string data. This is needed for data where names contain those characters, because they cause errors later on if they are not removed.
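The clean-up described here amounts to stripping quote characters from string values while leaving other values alone. A minimal sketch in plain Python, with a hypothetical helper name and sample values, could look like this; it is not the miner's own code.

```python
def strip_quotes(value):
    """Remove double and single quotes from strings; pass other types through."""
    if isinstance(value, str):
        return value.replace('"', "").replace("'", "")
    return value

# Hypothetical row with a quoted name and a numeric value.
row = ['Insurer "A" d\'Or', 1000]
cleaned = [strip_quotes(v) for v in row]
print(cleaned)
```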

Function to read overzicht

Changed the IF THEN expression so that we can use decimals when numeric

Parameters

‘notNaN’ (boolean): Only takes not NaN columns

Function changes

Convert_to_time: added the boolean set_year. If True, only the year is used (this is for yearly data); otherwise the whole date is kept. The default is True.
update_statistics: Remove patterns that contain columns which are not in the data. This is necessary for some insurers so that they do not get errors
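The filtering step described for update_statistics can be illustrated as follows. This plain-Python sketch assumes pattern definitions reference columns in braces, as in ‘{Own funds} = {Excess}’ from the quick overview; the helper names and the second pattern are hypothetical, and the actual implementation may differ.

```python
import re

def columns_in_pattern(pattern_def):
    """Column names referenced as {name} in a pattern definition."""
    return set(re.findall(r"\{(.*?)\}", pattern_def))

def drop_unusable(pattern_defs, available_columns):
    """Keep only patterns whose columns all exist in the data."""
    available = set(available_columns)
    return [p for p in pattern_defs if columns_in_pattern(p) <= available]

pattern_defs = ["{Own funds} = {Excess}", "{Assets} > {Reinsurance}"]
columns = ["Assets", "Own funds", "Excess"]

usable = drop_unusable(pattern_defs, columns)
print(usable)
```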

0.1.19 (10-2-2021)

Bug fixes with expressions including regex

0.1.20 (29-4-2021)

Suppress Pandas slice error in some cases

Deleted logging.basicConfig (to avoid that initial config is overwritten)

Release history

0.1.24 (Jan 11, 2022)
0.1.23 (Nov 17, 2021)
0.1.22 (Nov 17, 2021)
0.1.21 (Nov 14, 2021)
0.1.20 (Apr 29, 2021)
0.1.19 (Feb 10, 2021)
0.1.18 (Nov 16, 2020)
0.1.17 (Oct 6, 2020)
0.1.16 (Sep 1, 2020)
0.1.15 (Jul 30, 2020)
0.1.14 (Jul 30, 2020)
0.1.13 (Nov 20, 2019)
0.1.12 (Nov 12, 2019)
0.1.11 (Nov 6, 2019)
0.1.6 (Nov 8, 2019)

Download files

Source Distributions

No source distribution files are available for this release.

Built Distribution

data_patterns-0.1.24-py2.py3-none-any.whl (27.9 kB view details)

Uploaded Jan 11, 2022; Python 2, Python 3

File details

Details for the file data_patterns-0.1.24-py2.py3-none-any.whl.

File metadata

Download URL: data_patterns-0.1.24-py2.py3-none-any.whl
Upload date: Jan 11, 2022
Size: 27.9 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.3.0 pkginfo/1.5.0.1 requests/2.24.0 setuptools/45.2.0.post20200210 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.6

File hashes

Hashes for data_patterns-0.1.24-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`be0bade28fd4b55458f1ba85c0db432d857485cc7f7119f30486c0acbd1d7acb`
MD5	`54bff2f13cbdad21b61915a73441de6b`
BLAKE2b-256	`a27c753ef9bd64dbdc706a2aa56211965bea0a71204fba31ffb7ba8bf77935be`
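A downloaded wheel can be checked against the SHA256 digest listed above with Python's standard hashlib module. The sketch below is a generic verification helper, not something shipped with the package; the function names are hypothetical and the local file path is up to the reader.

```python
import hashlib
import hmac

def sha256_of_bytes(data):
    """Hex SHA256 digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()

def sha256_of_file(path, chunk_size=65536):
    """Hex SHA256 digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def digest_matches(actual, expected):
    """Case-insensitive, constant-time comparison of two hex digests."""
    return hmac.compare_digest(actual.lower(), expected.lower())

# Digest published for data_patterns-0.1.24-py2.py3-none-any.whl.
EXPECTED = "be0bade28fd4b55458f1ba85c0db432d857485cc7f7119f30486c0acbd1d7acb"
```

For example, `digest_matches(sha256_of_file("data_patterns-0.1.24-py2.py3-none-any.whl"), EXPECTED)` should return True for an intact download, assuming the file sits in the current directory.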
