Various utilities for IBIS applications in data science and engineering

Project description

i38e-utils

i38e-utils is a collection of utility functions and classes that I use in my projects. It is a work in progress and will be updated as I add more functionality.

Currently, it includes the following:

  1. DfHelper: A class designed to facilitate data handling and operations within a Django project, particularly focusing on loading data from both parquet files and a database, and potentially saving data to parquet format.
  2. GeoPyHelper: A class that provides a set of utility functions for working with GeoPy.
  3. OsmxHelper: A class that provides a set of utility functions for working with OSMnx.
  4. data_utils: A set of utility functions/classes for working with data.
  5. date_utils: A set of utility functions for working with dates.
  6. df_utils: A set of utility functions for working with pandas DataFrames.
  7. file_utils: A set of utility functions for working with files.
  8. log_utils: A set of utility functions for working with logs.

Installation

Install the package from PyPI with pip:

pip install i38e-utils
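
After installing, one way to confirm that the distribution is visible to the current interpreter is to query its metadata with the standard library. This is a minimal sketch: the helper name `installed_version` is hypothetical and is not part of i38e-utils.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the installed version of i38e-utils, or None if it is not installed
print(installed_version("i38e-utils"))
```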

Usage

DfHelper: Dataframe Helper Class

Scenarios:

  • Connect to a database table through Django's ORM connection, then query, transform, and convert the data to a pandas DataFrame.
import pandas as pd
import numpy as np
from i38e_utils.df_helper import DfHelper

phone_mobile_gps_fields = {
    'id_tracking': 'id',
    'id_producto': 'product_id',
    'pk_empleado': 'associate_id',
    'latitud': 'latitude',
    'longitud': 'longitude',
    'fecha_hora_servidor': 'server_dt',
    'fecha_hora': 'date_time',
    'accion': 'action',
    'descripcion': 'description',
    'imei': 'imei'
}


class GpsCube(DfHelper):
    df: pd.DataFrame = None
    live: bool = False
    save_parquet = True
    
    config={
        'connection_name': 'replica',
        'table': 'asm_tracking_movil_gps',
        'field_map': phone_mobile_gps_fields,
        'legacy_filters': True,
    }

    def __init__(self, **opts):
        config = {**self.config, **opts}
        super().__init__(**config)
        
    def load(self, **kwargs):
        self.df = super().load(**kwargs)
        self.fix_data()
        return self.df

    def fix_data(self):
        self.df['latitude'] = self.df['latitude'].astype(np.float64)
        self.df['longitude'] = self.df['longitude'].astype(np.float64)


gps_cube = GpsCube(live=True, debug=False)
df = gps_cube.load(date_time__date='2023-03-04')
# to save to a parquet file
gps_cube.save_to_parquet(df, parquet_full_path='gpscube.parquet')
  • Use a parquet storage file or folder structure to load data and perform some transformations.
import pandas as pd
from i38e_utils.df_helper import DfHelper

class GpsParquetCube(DfHelper):
    df: pd.DataFrame = None
    
    config={
        'use_parquet': True,
        'df_as_dask': True,
        'parquet_storage_path': '/storage/data/parquet/gps',
        'parquet_start_date': '2024-01-01',
        'parquet_end_date': '2024-03-31',
    }

    def __init__(self, **opts):
        config = {**self.config, **opts}
        super().__init__(**config)
        
    def load(self, **kwargs):
        self.df = super().load(**kwargs)
        return self.df


# The example below loads every parquet file under parquet_storage_path that matches
# the date range and returns a single Dask DataFrame for associate_id 27 for March 2024.
# The class converts Django-style filters to Dask-compatible filters.
# The class also loads the parquet files as a Dask DataFrame for faster processing.

params = {
    'associate_id': 27,
    'date_time__date__range': ['2024-03-01','2024-03-31']
}

dask_df = GpsParquetCube().load(**params)
# to convert to a pandas dataframe
df = dask_df.compute()
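
To give a rough idea of what converting Django-style filters can involve, the sketch below translates keyword lookups into `(column, operator, value)` tuples, the form accepted by the `filters` argument of `dask.dataframe.read_parquet`. It is an illustration under stated assumptions, not DfHelper's actual implementation: the function name `django_to_tuple_filters`, the set of supported lookups, and the handling of the `__date` transform are all hypothetical.

```python
from datetime import date

# Hypothetical mapping from Django-style lookup suffixes to comparison operators
_LOOKUP_OPS = {
    "exact": "==",
    "gt": ">",
    "gte": ">=",
    "lt": "<",
    "lte": "<=",
    "in": "in",
}


def _to_date(value):
    """Parse an ISO date string into a date; leave other values untouched."""
    return date.fromisoformat(value) if isinstance(value, str) else value


def django_to_tuple_filters(**lookups):
    """Translate Django-style keyword lookups into (column, op, value) tuples."""
    filters = []
    for key, value in lookups.items():
        column, *modifiers = key.split("__")

        # Treat '__date' as a transform: parse the values as dates
        if "date" in modifiers:
            modifiers.remove("date")
            if isinstance(value, (list, tuple)):
                value = [_to_date(v) for v in value]
            else:
                value = _to_date(value)

        if len(modifiers) > 1:
            raise ValueError(f"unsupported lookup: {key}")
        lookup = modifiers[0] if modifiers else "exact"

        if lookup == "range":
            # A range becomes an inclusive lower and upper bound
            start, end = value
            filters.append((column, ">=", start))
            filters.append((column, "<=", end))
        elif lookup in _LOOKUP_OPS:
            filters.append((column, _LOOKUP_OPS[lookup], value))
        else:
            raise ValueError(f"unsupported lookup: {key}")
    return filters


filters = django_to_tuple_filters(
    associate_id=27,
    date_time__date__range=["2024-03-01", "2024-03-31"],
)
print(filters)
```

In this sketch, the upper bound of a date range is compared as a plain date. A real implementation filtering a timestamp column would need to decide how to include the whole final day.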
