Utility functions for data processing

dataprocess

dataprocess is a Python package that provides simple, efficient utilities for processing and cleaning data.

Features

  • Data processing: transform data with dedicated functions.
  • Data cleaning: remove null values and prepare data for analysis.
  • Modular structure that is easy to extend.
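
To make the cleaning feature concrete, here is a minimal plain-Python sketch of the general idea: trim white space and drop rows that contain a null value. The `clean_rows` helper and the `NULL_MARKERS` set are hypothetical names chosen for illustration. They are not part of dataprocess, which operates on DataFrames and may behave differently.

```python
# Hypothetical helper for illustration only; not part of the dataprocess API.
NULL_MARKERS = {"", "nan", "none", "null"}  # assumed set of values treated as null


def clean_rows(rows):
    """Strip white space from every value and drop rows that contain a null."""
    cleaned = []
    for row in rows:
        # Cast every value to str and trim surrounding white space.
        stripped = {key: str(value).strip() for key, value in row.items()}
        # Keep the row only if none of its values is a null marker.
        if all(value.lower() not in NULL_MARKERS for value in stripped.values()):
            cleaned.append(stripped)
    return cleaned


rows = [
    {"name": "  Ana ", "age": "31"},
    {"name": "Bruno", "age": None},
    {"name": "Carla", "age": " 27"},
]
print(clean_rows(rows))
```

The row for Bruno is dropped because its `age` is missing. The remaining rows keep their values as trimmed strings.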

Installation

Install the package from PyPI:

pip install etl-dataprocess

or clone the GitHub repository and install from source:

git clone https://github.com/botlorien/dataprocess.git
cd dataprocess
pip install .
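
After either install route, one way to confirm that the package is available is to ask Python for the installed version. This sketch uses only the standard library and assumes the distribution name is `etl-dataprocess`, as in the pip command above. `installed_version` is a hypothetical helper, not something the package provides.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version("etl-dataprocess")
print(found if found is not None else "etl-dataprocess is not installed")
```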

Usage example

from dataprocess import dataprocessing as hd

# Assumed placeholder: replace with the path to a file, or to the folder that contains it.
PATH_DOWNLOADS = 'downloads'


def process_something_here():
    """A single example of how to use dataprocess."""
    # import_file handles .xlsx, .csv, .xls, .json and .txt files.
    # It returns a DataFrame for .xlsx, .csv and .xls, a dict for .json and a str for .txt.
    # If only a folder is passed, it reads the first file in that folder.
    table = hd.import_file(PATH_DOWNLOADS)

    # clear_table removes white space and other junk from the whole table
    # and returns a DataFrame with every column cast with astype('str').
    table = hd.clear_table(table)

    # After cleaning, convert the columns to the appropriate types.
    # The "dtypes" mapping lists the columns to cast to 'datetime' and 'time'.
    # Other common types ('int', 'float' and 'str') are handled
    # automatically by analysing the column values.
    dtypes = {
        'datetime': [
            'date_name_column'  # replace with the name of the column to cast to 'datetime'
        ],
        'time': [
            'hour_and_minute_name_column'  # replace with the name of the column to cast to 'time'
        ]
    }
    table = hd.convert_table_types(
        table,
        dtypes=dtypes
    )
    print(table)
    table.info()  # info() prints its own summary and returns None
    return table


if __name__ == '__main__':
    process_something_here()
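
The type-conversion step combines an explicit mapping for 'datetime' and 'time' columns with automatic inference for everything else. The following plain-Python sketch illustrates that idea on a single row of strings. It is not how `convert_table_types` is implemented: `infer_scalar`, `convert_row`, the column names, and both format strings are assumptions made for this example. It also decides per value, whereas a DataFrame-based implementation would more likely decide per column.

```python
from datetime import datetime

# Assumed formats for this sketch; the formats dataprocess accepts are not stated here.
DATETIME_FORMAT = "%Y-%m-%d %H:%M:%S"
TIME_FORMAT = "%H:%M"


def infer_scalar(value):
    """Cast a string to int or float when possible, otherwise keep it as str."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            continue
    return value


def convert_row(row, dtypes):
    """Convert one row of strings using the explicit mapping plus inference."""
    converted = {}
    for column, value in row.items():
        if column in dtypes.get("datetime", []):
            converted[column] = datetime.strptime(value, DATETIME_FORMAT)
        elif column in dtypes.get("time", []):
            converted[column] = datetime.strptime(value, TIME_FORMAT).time()
        else:
            converted[column] = infer_scalar(value)
    return converted


dtypes = {"datetime": ["created_at"], "time": ["starts_at"]}
row = {
    "created_at": "2024-01-15 08:30:00",
    "starts_at": "14:45",
    "quantity": "3",
    "price": "9.90",
    "city": "Recife",
}
print(convert_row(row, dtypes))
```

Only the columns named in the mapping are parsed as dates or times. This is why the usage example passes them explicitly: a string such as "14:45" cannot be told apart from ordinary text by inference alone.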

Source Distribution

etl_dataprocess-0.2.5.tar.gz (17.1 kB)

File metadata

  • Size: 17.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.2

File hashes

Hashes for etl_dataprocess-0.2.5.tar.gz

Algorithm    Hash digest
SHA256       c90ab0cc449af0355649161f62e76c77ef01e7e9adeae3177e81a8b20329cd3e
MD5          b132ae6952414b9461f281d60db143b0
BLAKE2b-256  bbb042bddcff54356e151215b1c9bcae0c9d8e0034d80322bb3da9b4df2fa0a8
