th2_data_services_utils


Since v0.4, this repository uses the DS-core v2 structure and can be used only with DS-core v2.

Table of Contents
1. Introduction
2. Getting started
3. API

1. Introduction

This repository is a set of auxiliary functions for the th2-data-services library that help solve the most common tasks.

Common tasks:

Aggregating data
Creating graphs
Searching
And more.

2. Getting started

2.1. Installation

From PyPI (pip)

This package can be found on PyPI.

pip install th2-data-services-utils

From Source

git clone https://github.com/th2-net/th2-data-services-utils
cd th2-data-services-utils
pip install .

2.2. Example

2.2.1. Aggregating

For our examples, we use the following data.

import datetime

raw_data = [
    {'attachedMessageIds': False, 'eventName': 'test run 1', 'successful': True,
        'time': datetime.datetime(2021, 1, 1, 1, 1, 1), 'type': 'Test Run'},
    {'attachedMessageIds': True, 'eventName': 'heartbeat', 'successful': True,
        'time': datetime.datetime(2021, 1, 1, 1, 10, 2), 'type': 'Heartbeat'},
    {'attachedMessageIds': False, 'eventName': 'test run 2', 'successful': False,
        'time': datetime.datetime(2021, 1, 1, 1, 2, 12), 'type': 'Test Run'},
    # . . . . .
]

aggregate_by_fields

Aggregates records by the given fields. It uses the Vaex library, so computations are evaluated lazily.

Here we group the data by the 'type' and 'successful' fields and count the records in each group.

>> data = aggregate_by_fields(raw_data, "type", "successful")
                                count
type            successful
Heartbeat       False           2
                True            1
Receive message False           2
                True            3
Send message    False           1
                True            4
Test Case       False           1
                True            4
Test Run        False           2
                True            1
Verification    False           1
                True            2
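To make the result concrete, here is a minimal plain-Python sketch of the same group-and-count idea using `collections.Counter`. It is not the library's implementation (which uses Vaex); the helper `count_by_fields` and the sample records are hypothetical.

```python
from collections import Counter


def count_by_fields(records, *fields):
    """Count records per unique combination of the given fields (hypothetical helper)."""
    # Build a tuple key from the requested fields for every record
    return Counter(tuple(record[f] for f in fields) for record in records)


# Hypothetical sample records with the same shape as raw_data
sample = [
    {"type": "Test Run", "successful": True},
    {"type": "Heartbeat", "successful": True},
    {"type": "Test Run", "successful": False},
    {"type": "Test Run", "successful": False},
]

counts = count_by_fields(sample, "type", "successful")
# Print one line per group, sorted by (type, successful)
for (type_, successful), n in sorted(counts.items()):
    print(f"{type_:<12}{successful!s:<8}{n}")
```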

We can add a total row to the table.

>> data_total = aggregate_by_fields(raw_data, "type", "successful", total_row=True)
                                count
type            successful
Heartbeat       False           2.0
                True            1.0
Receive message False           2.0
                True            3.0
Send message    False           1.0
                True            4.0
Test Case       False           1.0
                True            4.0
Test Run        False           2.0
                True            1.0
Verification    False           1.0
                True            2.0
Total           Total           24.0
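The total row is the sum of all group counts. A minimal sketch, assuming the group counts are held in a plain dict keyed by `(type, successful)` tuples; the `add_total_row` helper is hypothetical and not part of the library.

```python
def add_total_row(counts):
    """Return a copy of `counts` with a ("Total", "Total") entry holding the sum."""
    result = dict(counts)
    # The total is the number of records across every group
    result[("Total", "Total")] = sum(counts.values())
    return result


group_counts = {("Heartbeat", False): 2, ("Heartbeat", True): 1, ("Test Run", False): 2}
with_total = add_total_row(group_counts)
print(with_total[("Total", "Total")])  # 5
```

Returning a copy keeps the original counts unchanged, so the same counts can still be pivoted or displayed without the total.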

Or we can pivot one of the fields into columns for easier reading.

>> data_pivot = aggregate_by_fields(raw_data, "type", "successful", pivot="successful")

successful       False  True
type
Heartbeat            2      1
Receive message      2      3
Send message         1      4
Test Case            1      4
Test Run             2      1
Verification         1      2
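Pivoting turns the values of one field into columns. The sketch below reproduces that reshaping in plain Python, assuming counts keyed by `(type, successful)` tuples; the `pivot_counts` helper is hypothetical, and it fills missing combinations with 0.

```python
def pivot_counts(counts):
    """Reshape {(row, col): n} into {row: {col: n}}, filling gaps with 0."""
    rows = sorted({row for row, _ in counts})
    cols = sorted({col for _, col in counts})
    # Every row gets every column so the result forms a full table
    return {row: {col: counts.get((row, col), 0) for col in cols} for row in rows}


counts = {("Heartbeat", False): 2, ("Heartbeat", True): 1, ("Test Run", False): 2}
table = pivot_counts(counts)
print(table["Test Run"])  # {False: 2, True: 0}
```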

aggregate_several_group:

Aggregates each field in the data separately. It uses the Pandas library, so performance may suffer on large datasets. The result can be returned as a DataFrame or displayed as HTML.

Here we aggregate all fields in the data and return the result as a DataFrame.

>> output = aggregate_several_group(raw_data, display_html_df=False, receive_df=True).fillna("-")

                   time count             type  ... count attachedMessageIds count
0   2021-01-01 01:01:01   1.0        Heartbeat  ...     9              False    10
1   2021-01-01 01:01:59   1.0  Receive message  ...    15               True    14
2   2021-01-01 01:02:12   1.0     Send message  ...    24              Total    24
3   2021-01-01 01:03:54   1.0        Test Case  ...     -                  -     -
4   2021-01-01 01:04:30   1.0         Test Run  ...     -                  -     -
5   2021-01-01 01:10:02   1.0     Verification  ...     -                  -     -
6   2021-01-01 01:11:11   1.0            Total  ...     -                  -     -
7   2021-01-01 01:13:40   1.0                -  ...     -                  -     -
8   2021-01-01 01:23:23   1.0                -  ...     -                  -     -
9   2021-01-01 01:32:42   1.0                -  ...     -                  -     -
10  2021-01-01 01:33:12   1.0                -  ...     -                  -     -
11  2021-01-01 01:33:33   1.0                -  ...     -                  -     -
12  2021-01-01 01:40:10   1.0                -  ...     -                  -     -
13  2021-01-01 01:41:19   1.0                -  ...     -                  -     -
14  2021-01-01 01:43:43   1.0                -  ...     -                  -     -
15  2021-01-01 01:44:44   1.0                -  ...     -                  -     -
16  2021-01-01 01:45:22   1.0                -  ...     -                  -     -
17  2021-01-01 01:54:52   1.0                -  ...     -                  -     -
18  2021-01-01 01:55:55   1.0                -  ...     -                  -     -
19  2021-01-01 01:56:32   1.0                -  ...     -                  -     -
20  2021-01-01 02:10:01   1.0                -  ...     -                  -     -
21  2021-01-01 02:12:11   1.0                -  ...     -                  -     -
22  2021-01-01 02:12:32   1.0                -  ...     -                  -     -
23  2021-01-01 02:33:01   1.0                -  ...     -                  -     -
24                Total  24.0                -  ...     -                  -     -
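Conceptually, each field is counted independently and gets its own Total entry, which is why the columns above have different lengths. A minimal plain-Python sketch of that idea; the `count_each_field` helper is hypothetical and ignores the DataFrame/HTML output options.

```python
from collections import Counter


def count_each_field(records):
    """Count the values of every field independently, adding a Total per field."""
    fields = {key for record in records for key in record}
    result = {}
    for field in sorted(fields):
        counter = Counter(record[field] for record in records if field in record)
        # Total is computed before it is inserted, so it counts only real values
        counter["Total"] = sum(counter.values())
        result[field] = counter
    return result


sample = [
    {"type": "Test Run", "attachedMessageIds": False},
    {"type": "Heartbeat", "attachedMessageIds": True},
    {"type": "Test Run", "attachedMessageIds": False},
]
per_field = count_each_field(sample)
print(per_field["type"])
```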

aggregate_by_intervals_lazy:

Counts all records in fixed-size time intervals. It uses the Vaex library, so computations are evaluated lazily.

Here we aggregate records at 30-minute intervals.

# resolution is the time unit of an interval; every is the number of units per interval.
# m - Minute
# h - Hour
# D - Day
# W - Week
# M - Month
# Y - Year
>> output = aggregate_by_intervals_lazy(data_for_analyzing, "time", resolution="m", every=30)

                 time  count
0 2021-01-01 01:02:00      7
1 2021-01-01 01:32:00     11
2 2021-01-01 02:02:00      3
3 2021-01-01 02:32:00      1
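The idea behind interval aggregation is to map each timestamp to the start of its bin and count per bin. A minimal standard-library sketch, assuming bins aligned to the Unix epoch; Vaex may align bins differently (the output above starts at 01:02), so the bin edges will not necessarily match. The helpers `floor_time` and `count_by_interval` are hypothetical.

```python
import datetime
from collections import Counter

EPOCH = datetime.datetime(1970, 1, 1)


def floor_time(ts, step):
    """Round `ts` down to a multiple of `step`, counted from the Unix epoch."""
    return ts - (ts - EPOCH) % step


def count_by_interval(timestamps, step):
    """Count timestamps per bin of width `step`, ordered by bin start."""
    counts = Counter(floor_time(ts, step) for ts in timestamps)
    return sorted(counts.items())


times = [
    datetime.datetime(2021, 1, 1, 1, 1, 1),
    datetime.datetime(2021, 1, 1, 1, 10, 2),
    datetime.datetime(2021, 1, 1, 1, 32, 42),
    datetime.datetime(2021, 1, 1, 2, 12, 11),
]
# 30-minute bins, comparable to resolution="m", every=30
for start, n in count_by_interval(times, datetime.timedelta(minutes=30)):
    print(start, n)
```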

aggregate_by_intervals:

Aggregates records by the specified fields within time intervals. It uses the Pandas library, so performance may suffer on large datasets.

Here we aggregate records at one-minute intervals. All interval aliases are listed in the pandas offset aliases documentation. Note that the interval formats accepted by aggregate_by_intervals_lazy and aggregate_by_intervals are different.

>> data = aggregate_by_intervals(data_for_analyzing, "time", "eventName", "type", intervals="1min")
                                                        count
time                eventName      type
2021-01-01 01:01:00 test case 3    Test Case            1
                    test run 1     Test Run             1
2021-01-01 01:02:00 test run 2     Test Run             1
2021-01-01 01:03:00 message        Send message         1
2021-01-01 01:04:00 test case 1    Test Case            1
2021-01-01 01:10:00 heartbeat      Heartbeat            1
2021-01-01 01:11:00 message 444    Receive message      1
2021-01-01 01:13:00 message123     Receive message      1
2021-01-01 01:23:00 message 333    Receive message      1
2021-01-01 01:32:00 test run 3     Test Case            1
2021-01-01 01:33:00 heartbeat      Heartbeat            1
                    test run 4     Test Run             1
2021-01-01 01:40:00 test case 4    Test Case            1
2021-01-01 01:41:00 message122     Receive message      1
2021-01-01 01:43:00 message 444    Send message         1
2021-01-01 01:44:00 message122     Send message         1
2021-01-01 01:45:00 verification32 Verification         1
2021-01-01 01:54:00 verification33 Verification         1
2021-01-01 01:55:00 message 333    Send message         1
2021-01-01 01:56:00 message 444    Receive message      1
2021-01-01 02:10:00 test case 2    Test Case            1
2021-01-01 02:12:00 heartbeat      Heartbeat            1
                    message123     Send message         1
2021-01-01 02:33:00 verification   Verification         1
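Roughly, this groups records by the timestamp floored to the interval plus the extra fields. A standard-library sketch for one-minute intervals only; the helper `count_per_minute` and the sample records are hypothetical, and the library itself uses Pandas and accepts other interval aliases.

```python
import datetime
from collections import Counter


def count_per_minute(records, time_field, *fields):
    """Count records per (minute, field values...) combination."""
    counts = Counter()
    for record in records:
        # Floor the timestamp to the start of its minute
        minute = record[time_field].replace(second=0, microsecond=0)
        counts[(minute, *(record[f] for f in fields))] += 1
    return counts


records = [
    {"time": datetime.datetime(2021, 1, 1, 1, 1, 1), "eventName": "test run 1", "type": "Test Run"},
    {"time": datetime.datetime(2021, 1, 1, 1, 1, 59), "eventName": "test case 3", "type": "Test Case"},
    {"time": datetime.datetime(2021, 1, 1, 1, 2, 12), "eventName": "test run 2", "type": "Test Run"},
]
counts = count_per_minute(records, "time", "eventName", "type")
for key in sorted(counts):
    print(*key, counts[key])
```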

Here we can also add a total row.

>> data = aggregate_by_intervals(data_for_analyzing, "time", "eventName", "type", intervals="min", total_row=True)
                                                        count
time                eventName      type
2021-01-01 01:01:00 test case 3    Test Case            1
                    test run 1     Test Run             1
2021-01-01 01:02:00 test run 2     Test Run             1
2021-01-01 01:03:00 message        Send message         1
2021-01-01 01:04:00 test case 1    Test Case            1
2021-01-01 01:10:00 heartbeat      Heartbeat            1
2021-01-01 01:11:00 message 444    Receive message      1
2021-01-01 01:13:00 message123     Receive message      1
2021-01-01 01:23:00 message 333    Receive message      1
2021-01-01 01:32:00 test run 3     Test Case            1
2021-01-01 01:33:00 heartbeat      Heartbeat            1
                    test run 4     Test Run             1
2021-01-01 01:40:00 test case 4    Test Case            1
2021-01-01 01:41:00 message122     Receive message      1
2021-01-01 01:43:00 message 444    Send message         1
2021-01-01 01:44:00 message122     Send message         1
2021-01-01 01:45:00 verification32 Verification         1
2021-01-01 01:54:00 verification33 Verification         1
2021-01-01 01:55:00 message 333    Send message         1
2021-01-01 01:56:00 message 444    Receive message      1
2021-01-01 02:10:00 test case 2    Test Case            1
2021-01-01 02:12:00 heartbeat      Heartbeat            1
                    message123     Send message         1
2021-01-01 02:33:00 verification   Verification         1
Total               Total          Total             24.0

Or we can pivot one of the fields into columns for easier reading.

>> data = aggregate_by_intervals(data_for_analyzing, "time", "eventName", "type", intervals="min", pivot="type")

type                                Heartbeat  ...  Verification
time                eventName                  ...
2021-01-01 01:01:00 test case 3           NaN  ...           NaN
                    test run 1            NaN  ...           NaN
2021-01-01 01:02:00 test run 2            NaN  ...           NaN
2021-01-01 01:03:00 message               NaN  ...           NaN
2021-01-01 01:04:00 test case 1           NaN  ...           NaN
2021-01-01 01:10:00 heartbeat             1.0  ...           NaN
2021-01-01 01:11:00 message 444           NaN  ...           NaN
2021-01-01 01:13:00 message123            NaN  ...           NaN
2021-01-01 01:23:00 message 333           NaN  ...           NaN
2021-01-01 01:32:00 test run 3            NaN  ...           NaN
2021-01-01 01:33:00 heartbeat             1.0  ...           NaN
                    test run 4            NaN  ...           NaN
2021-01-01 01:40:00 test case 4           NaN  ...           NaN
2021-01-01 01:41:00 message122            NaN  ...           NaN
2021-01-01 01:43:00 message 444           NaN  ...           NaN
2021-01-01 01:44:00 message122            NaN  ...           NaN
2021-01-01 01:45:00 verification32        NaN  ...           1.0
2021-01-01 01:54:00 verification33        NaN  ...           1.0
2021-01-01 01:55:00 message 333           NaN  ...           NaN
2021-01-01 01:56:00 message 444           NaN  ...           NaN
2021-01-01 02:10:00 test case 2           NaN  ...           NaN
2021-01-01 02:12:00 heartbeat             1.0  ...           NaN
                    message123            NaN  ...           NaN
2021-01-01 02:33:00 verification          NaN  ...           1.0

2.3. Links

3. API

For function descriptions, see the API Documentation.
