Utilities for IBIS applications in data science and engineering

i38e-utils

i38e-utils is a collection of utility functions and classes that I use in my BI projects. It is a work in progress and will be updated as I add more functionality.

The utilities are designed to work with Django, OpenStreetMap, and NetworkX.

Currently, it includes the following:

DfHelper: A class designed to facilitate data handling and operations within a Django project, particularly focusing on loading data from both parquet files and a database, and potentially saving data to parquet format.
GeoLocationService: A class that provides a set of utility functions for working with GeoPy and Nominatim.
OsmxHelper: A class that provides a set of utility functions for working with OSMnx maps.
data_utils: A set of utility functions/classes for working with data.
date_utils: A set of utility functions for working with dates.
df_utils: A set of utility functions for working with pandas DataFrames.
file_utils: A set of utility functions for working with files.
log_utils: A set of utility functions for working with logs.

Installation

Install the package from PyPI with pip:

pip install i38e-utils
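
To confirm that the install succeeded, the standard library can report the installed version. This is a minimal sketch; `installed_version` is a hypothetical helper written for this example and is not part of i38e-utils.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# "i38e-utils" is the distribution name used in the pip command above.
print(installed_version("i38e-utils"))
```

The helper returns `None` instead of raising when the distribution is missing, which keeps the check usable in setup scripts.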

Usage

DfHelper: Dataframe Helper Class

DfHelper is designed to be subclassed. For example, the following use case connects to a table containing GPS transactions and encapsulates the data cleaning operations. The resulting object can be queried through its "load" method using Django's query language syntax. The object can also be instantiated in debug and verbose mode.
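
The subclassing pattern relies on class-level defaults that callers can override at instantiation time. The sketch below isolates that idea with a hypothetical stand-in base class (`BaseHelper`), so it runs without Django or a database; it says nothing about what DfHelper itself does with the options it receives.

```python
class BaseHelper:
    """Hypothetical stand-in for DfHelper that only records its options."""

    def __init__(self, **options):
        self.options = options


class ExampleCube(BaseHelper):
    # Class-level defaults shared by every instance.
    config = {"connection_name": "replica", "debug": False}

    def __init__(self, **opts):
        # Later keys win, so caller-supplied options override the defaults.
        super().__init__(**{**self.config, **opts})


default_cube = ExampleCube()
debug_cube = ExampleCube(debug=True, verbose=True)
print(default_cube.options)
print(debug_cube.options)
```

Merging into a new dict leaves the class-level `config` untouched, so one instance's overrides never leak into another.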

The object returns DataFrames either as pandas (the default) or as dask. Dask is recommended for large datasets that can benefit from its parallel execution. Scenarios:

Connect to a database table through a Django ORM connection, then query, transform, and convert the data to a pandas DataFrame.

import pandas as pd
import numpy as np
from i38e_utils.df_helper import DfHelper

phone_mobile_gps_fields = {
    'id_tracking': 'id',
    'id_producto': 'product_id',
    'pk_empleado': 'associate_id',
    'latitud': 'latitude',
    'longitud': 'longitude',
    'fecha_hora_servidor': 'server_dt',
    'fecha_hora': 'date_time',
    'accion': 'action',
    'descripcion': 'description',
    'imei': 'imei'
}


class GpsCube(DfHelper):
    df: pd.DataFrame = None
    live: bool = False
    save_parquet = True
    
    config={
        'connection_name': 'replica',
        'table': 'asm_tracking_movil_gps',
        'field_map': phone_mobile_gps_fields,
        'legacy_filters': True,
    }

    def __init__(self, **opts):
        config = {**self.config, **opts}
        super().__init__(**config)
        
    def load(self, **kwargs):
        self.df = super().load(**kwargs)
        self.fix_data()
        return self.df

    def fix_data(self):
        self.df['latitude'] = self.df['latitude'].astype(np.float64)
        self.df['longitude'] = self.df['longitude'].astype(np.float64)


gps_cube = GpsCube(live=True, debug=False, df_as_dask=True)
df = gps_cube.load(date_time__date='2023-03-04').compute()
# to save to a parquet file
gps_cube.save_to_parquet(df, parquet_full_path='gpscube.parquet')
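
The example filters on `date_time` and cleans `latitude` and `longitude`, which suggests that `field_map` renames source columns to the aliases on its right-hand side. The sketch below illustrates that assumed renaming on plain dictionaries; `apply_field_map` is a hypothetical helper, not the library's implementation, and the sample row is made up.

```python
# Subset of the phone_mobile_gps_fields mapping shown above.
field_map = {
    'id_tracking': 'id',
    'pk_empleado': 'associate_id',
    'latitud': 'latitude',
    'longitud': 'longitude',
}


def apply_field_map(rows, mapping):
    """Rename the keys of each row dict; keys missing from the mapping are kept."""
    return [
        {mapping.get(key, key): value for key, value in row.items()}
        for row in rows
    ]


# Made-up sample row, for illustration only.
raw_rows = [
    {'id_tracking': 1, 'pk_empleado': 27, 'latitud': '9.93', 'longitud': '-84.08', 'imei': 'n/a'},
]
renamed = apply_field_map(raw_rows, field_map)
print(renamed)
```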

Use a parquet storage file or folder structure to load data and perform some transformations.

import pandas as pd
from i38e_utils.df_helper import DfHelper

class GpsParquetCube(DfHelper):
    df: pd.DataFrame = None
    
    config={
        'use_parquet': True,
        'df_as_dask': True,
        'parquet_storage_path': '/storage/data/parquet/gps',
        'parquet_start_date': '2024-01-01',
        'parquet_end_date': '2024-03-31',
    }

    def __init__(self, **opts):
        config = {**self.config, **opts}
        super().__init__(**config)
        
    def load(self, **kwargs):
        self.df = super().load(**kwargs)
        return self.df


# The following example loads every parquet file under parquet_storage_path that falls within the configured date range
# and returns a single dask dataframe for associate_id 27 for the month of March.
# The class converts Django-style filters to dask-compatible filters.
# The class also loads the parquet files into a dask dataframe for faster processing.

params = {
    'associate_id': 27,
    'date_time__date__range': ['2024-03-01','2024-03-31']
}

dask_df = GpsParquetCube().load(**params)
# to convert to a pandas dataframe
df = dask_df.compute()
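
To make the filter conversion concrete, here is a minimal sketch of how a few Django-style lookups could be translated into `(column, operator, value)` tuples, the predicate format that parquet readers such as `dask.dataframe.read_parquet` accept in their `filters` argument. The function `django_to_tuples` and its handling of the `date` transform are assumptions made for illustration; the library's actual conversion may differ.

```python
from datetime import date, datetime, timedelta

# Comparison lookups mapped to the operator strings used in filter tuples.
_OPS = {"exact": "==", "gt": ">", "gte": ">=", "lt": "<", "lte": "<=", "in": "in"}


def _day_start(value):
    """Return midnight at the start of an ISO date string or date object."""
    day = date.fromisoformat(value) if isinstance(value, str) else value
    return datetime(day.year, day.month, day.day)


def django_to_tuples(**lookups):
    """Translate a few Django-style lookups into (column, op, value) tuples."""
    filters = []
    for key, value in lookups.items():
        column, *parts = key.split("__")
        as_date = bool(parts) and parts[0] == "date"
        if as_date:
            parts = parts[1:]
        lookup = parts[0] if parts else "exact"

        if lookup == "range":
            low, high = value
            if as_date:
                # Whole days: include everything up to the end of `high`.
                filters.append((column, ">=", _day_start(low)))
                filters.append((column, "<", _day_start(high) + timedelta(days=1)))
            else:
                filters.append((column, ">=", low))
                filters.append((column, "<=", high))
        elif as_date and lookup == "exact":
            start = _day_start(value)
            filters.append((column, ">=", start))
            filters.append((column, "<", start + timedelta(days=1)))
        elif lookup in _OPS and not as_date:
            filters.append((column, _OPS[lookup], value))
        else:
            raise ValueError(f"unsupported lookup: {key}")
    return filters


filters = django_to_tuples(
    associate_id=27,
    date_time__date__range=["2024-03-01", "2024-03-31"],
)
print(filters)
```

Expressing a date range as a half-open interval on the timestamp column keeps rows from late on the last day, which a plain `<=` comparison against midnight would drop.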

Usage

osmnx_helper: Base Map and Utilities

Use case: Create a heat map with time using a DfHelper cube with gps data

from i38e_utils.osmnx_helper import BaseOsmMap
from i38e_utils.osmnx_helper.utils import get_graph
import folium
import folium.plugins  # import the plugins submodule explicitly so folium.plugins resolves

options = {
    'ox_files_save_path': 'path/to/pbf/files',
    'network_type': 'all',
    'place': 'Costa Rica',
    'files_prefix': 'costa-rica-',
    'rebuild': False,
    'verbose': False
}

class ActivityHeatMapWithTime(BaseOsmMap):
    def __init__(self, df, **kwargs):
        kwargs.setdefault('dt_field', 'date_time')
        G, _, _ = get_graph(**options)
        self.heat_time_index = []
        super().__init__(G, df, **kwargs)

    def process_map(self):
        # One animation frame per hour of the day present in the data.
        self.heat_time_index = sorted(self.df[self.dt_field].dt.hour.unique())
        heat_data_time = [
            [[row[self.lat_col], row[self.lon_col]]
             for _, row in self.df[self.df[self.dt_field].dt.hour == hour].iterrows()]
            for hour in self.heat_time_index
        ]

        hm = folium.plugins.HeatMapWithTime(heat_data_time, index=self.heat_time_index)
        hm.add_to(self.osm_map)

To create a heat map from a DataFrame of GPS data:

df = GpsCube().load(date_time__date="2024-06-30")
map_options = {
    "map_html_title": "Activity Heatmap",
    "dt_field": "date_time",
    "max_bounds": False,
}
heat_map = ActivityHeatMapWithTime(df, **map_options)
heat_map.generate_map()
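
The data structure built in `process_map` is a list with one entry per time step, where each entry is a list of `[lat, lon]` points. The sketch below reproduces that grouping with the standard library only, so the shape is easy to inspect; `frames_by_hour` is a hypothetical helper and the sample points are made up.

```python
from collections import defaultdict
from datetime import datetime


def frames_by_hour(points):
    """Group (timestamp, lat, lon) tuples into one [lat, lon] list per hour."""
    buckets = defaultdict(list)
    for timestamp, lat, lon in points:
        buckets[timestamp.hour].append([lat, lon])
    hours = sorted(buckets)
    return hours, [buckets[hour] for hour in hours]


# Made-up sample points, for illustration only.
sample = [
    (datetime(2024, 6, 30, 8, 15), 9.93, -84.08),
    (datetime(2024, 6, 30, 9, 5), 9.94, -84.09),
    (datetime(2024, 6, 30, 8, 45), 9.95, -84.10),
]
hours, frames = frames_by_hour(sample)
print(hours)
print(frames)
```

Grouping in a single pass avoids rescanning the whole dataset once per hour, which matters as the number of GPS points grows.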

Release history

1.0.48: Oct 17, 2024 (this version)
1.0.47: Oct 17, 2024
1.0.46: Oct 16, 2024
1.0.45: Oct 16, 2024
1.0.44: Aug 23, 2024
1.0.43: Aug 9, 2024
1.0.42: Aug 7, 2024
1.0.41: Jul 26, 2024
1.0.40: Apr 24, 2024
1.0.39: Apr 24, 2024
1.0.38: Apr 22, 2024
1.0.37: Apr 19, 2024
1.0.36: Apr 18, 2024
1.0.35: Apr 16, 2024
1.0.34: Apr 6, 2024
1.0.33: Apr 5, 2024
1.0.32: Apr 5, 2024
1.0.31: Apr 5, 2024
1.0.30: Apr 5, 2024
1.0.29: Apr 1, 2024
1.0.28: Mar 19, 2024
1.0.27: Mar 19, 2024
1.0.26: Mar 19, 2024
1.0.25: Mar 19, 2024
1.0.24: Mar 19, 2024
1.0.23: Mar 18, 2024
1.0.22: Mar 18, 2024
1.0.21: Mar 18, 2024
1.0.20: Mar 18, 2024
1.0.19: Mar 15, 2024
1.0.18: Mar 15, 2024
1.0.17: Mar 15, 2024
1.0.16: Mar 14, 2024
1.0.15: Mar 14, 2024
1.0.14: Mar 12, 2024
1.0.13: Mar 12, 2024
1.0.12: Mar 12, 2024
1.0.11: Mar 11, 2024
1.0.10: Mar 11, 2024
1.0.9: Mar 11, 2024
1.0.8: Mar 11, 2024
1.0.7: Mar 11, 2024
1.0.6: Mar 8, 2024
1.0.5: Mar 8, 2024
1.0.4: Mar 7, 2024
1.0.3: Mar 7, 2024
1.0.2: Mar 5, 2024
1.0.1: Jan 19, 2024
