
Utility functions to perform data processing

Project description

dataprocess

dataprocess is a Python package that provides simple, efficient utilities for processing and cleaning data.

Features

  • Data processing: transform data with dedicated functions.
  • Data cleaning: remove null values and prepare data for analysis.
  • Modular structure for easy extension.
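As a rough illustration of the kind of cleaning described above, here is a minimal pure-Python sketch that trims whitespace, casts every value to str and drops rows that end up empty. The helper name `clean_rows` and its exact rules are hypothetical; this is not the package's own `clear_table`, which works on pandas DataFrames.

```python
from typing import Any, Iterable


def clean_rows(rows: Iterable[dict[str, Any]]) -> list[dict[str, str]]:
    """Hypothetical helper: normalize records to stripped strings and drop empty rows."""
    cleaned: list[dict[str, str]] = []
    for row in rows:
        # Treat None as missing; cast everything else to str and trim whitespace
        # (column names are trimmed too).
        normalized = {
            key.strip(): ("" if value is None else str(value).strip())
            for key, value in row.items()
        }
        # Skip rows where every value is empty after normalization.
        if any(normalized.values()):
            cleaned.append(normalized)
    return cleaned


if __name__ == "__main__":
    raw = [
        {" name ": "  Ana ", "age": 31},
        {" name ": None, "age": "   "},
        {" name ": "Bruno", "age": None},
    ]
    print(clean_rows(raw))
```

Casting everything to text first mirrors the README's description of `clear_table` returning all columns as `str`, which leaves type conversion to a separate, explicit step.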

Installation

Install the package from PyPI:

pip install etl-dataprocess

or install it directly from the GitHub repository:

git clone https://github.com/botlorien/dataprocess.git
cd dataprocess
pip install .
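To confirm that either installation route worked, one option is to ask Python's standard `importlib.metadata` for the installed version. This is a small sketch: `installed_version` is a hypothetical helper name, and it assumes the distribution is registered under the name `etl-dataprocess` used in the pip command.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "etl-dataprocess") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # The distribution's metadata was not found in the current environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "etl-dataprocess is not installed")
```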

Usage example

from dataprocess import dataprocessing as hd

# Placeholder: path to the file (or folder) to import. Replace it with your own.
PATH_DOWNLOADS = 'downloads'


def process_something_here():
    """A single example of how to use dataprocess."""
    # Import a file, choosing the reader by its extension (.xlsx, .csv, .xls, .json, .txt).
    # The content is returned as a 'DataFrame' for .xlsx, .csv and .xls,
    # a 'dict' for .json and a 'str' for .txt.
    # If a directory is passed instead of a file, the first file in it is used.
    table = hd.import_file(PATH_DOWNLOADS)

    # Clean the whole table, removing surrounding whitespace and other junk,
    # and return a 'DataFrame' with every column cast with astype('str').
    table = hd.clear_table(table)

    # After cleaning, convert the columns to the appropriate types.
    # The "dtypes" mapping lists the columns to be cast to 'datetime' and 'time'.
    # Other common types such as 'int', 'float' and 'str' are
    # detected automatically by analysing the column values.
    dtype = {
        'datetime': [
            'date_name_column'  # replace with the name of the column to cast to 'datetime'
        ],
        'time': [
            'hour_and_minute_name_column'  # replace with the name of the column to cast to 'time'
        ]
    }
    table = hd.convert_table_types(
        table,
        dtypes=dtype
    )
    print(table)
    table.info()  # DataFrame.info() prints its summary itself and returns None
    return table


if __name__ == '__main__':
    process_something_here()
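The comments in the example say that 'int', 'float' and 'str' columns are detected automatically by inspecting their values. The package's exact rules are not documented here, so the following is only a sketch of one common approach, assuming the values arrive as strings (as they do after `clear_table`): a column is `int` if every non-blank value parses as an integer, `float` if every one parses as a float, and `str` otherwise. `infer_column_type` is a hypothetical name.

```python
from typing import Callable, Sequence


def infer_column_type(values: Sequence[str]) -> str:
    """Hypothetical helper: guess 'int', 'float' or 'str' for a column of strings.

    Blank strings are treated as missing and ignored.
    """
    present = [v.strip() for v in values if v.strip()]
    if not present:
        # Nothing to analyse: keep the column as text.
        return "str"

    def all_parse(cast: Callable[[str], object]) -> bool:
        # True only if every non-blank value can be converted by `cast`.
        try:
            for value in present:
                cast(value)
        except ValueError:
            return False
        return True

    if all_parse(int):
        return "int"
    if all_parse(float):
        return "float"
    return "str"


if __name__ == "__main__":
    columns = {
        "quantity": ["1", " 2 ", ""],
        "price": ["9.90", "10"],
        "city": ["Recife", "42"],
    }
    for name, values in columns.items():
        print(name, infer_column_type(values))
```

Trying `int` before `float` matters because every integer string also parses as a float. A real implementation would also need rules for thousands separators and decimal commas (for example "1,5"), which this sketch simply classifies as `str`.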



Download files


Source Distribution

etl_dataprocess-0.2.4.tar.gz (17.1 kB)

Uploaded Source

File details

Details for the file etl_dataprocess-0.2.4.tar.gz.

File metadata

  • Download URL: etl_dataprocess-0.2.4.tar.gz
  • Upload date:
  • Size: 17.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.2

File hashes

Hashes for etl_dataprocess-0.2.4.tar.gz
Algorithm     Hash digest
SHA256        59c6d700fc5dc8cfd5915e9471dc7b45f949972e1cdabd2c9b59ed97128e64aa
MD5           1f96cdc38d2841dda00b1798ab24be32
BLAKE2b-256   bee75d54da732704e13c03a5e30449a5340140b033b2d3a367ddb0e5ddb54a83
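To check a manually downloaded archive against the published SHA256 digest for etl_dataprocess-0.2.4.tar.gz, Python's standard `hashlib` module is enough. This is a minimal sketch; the helper names and the assumption that the archive sits in the current directory are illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for etl_dataprocess-0.2.4.tar.gz.
EXPECTED_SHA256 = "59c6d700fc5dc8cfd5915e9471dc7b45f949972e1cdabd2c9b59ed97128e64aa"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 matches the expected digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("etl_dataprocess-0.2.4.tar.gz")  # assumed to be in the current directory
    if archive.exists():
        print("OK" if verify_download(archive) else "MISMATCH")
    else:
        print(f"{archive} not found")
```

Reading in chunks keeps memory use constant regardless of file size; for a 17.1 kB archive it makes no practical difference, but the same helper works for large files.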

