Utility functions for data processing

dataprocess

dataprocess is a Python package that provides simple, efficient utilities for processing and cleaning data.

Features

  • Data processing: transform data using dedicated functions.
  • Data cleaning: remove null values and prepare data for analysis.
  • Modular structure that is easy to extend.
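
As a rough illustration of the kind of cleaning listed above, the following sketch uses plain pandas rather than dataprocess itself. The helper name `clean_frame` and the sample data are hypothetical, and the package's own functions may behave differently.

```python
import pandas as pd


def clean_frame(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper (not part of dataprocess): drop empty rows, trim text."""
    # Remove rows in which every value is null.
    cleaned = df.dropna(how="all").copy()
    # Trim surrounding whitespace in string cells; leave other values untouched.
    for column in cleaned.columns:
        cleaned[column] = cleaned[column].map(
            lambda value: value.strip() if isinstance(value, str) else value
        )
    return cleaned.reset_index(drop=True)


# Made-up sample data for the demonstration.
raw = pd.DataFrame(
    {"name": ["  Ana ", None, "Bruno"], "city": [" Recife", None, "Natal  "]}
)
cleaned = clean_frame(raw)
print(cleaned)
```

Only rows that are entirely null are dropped here; whether partially empty rows should also go depends on the analysis, so that choice is left to the caller.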

Installation

Install the package from PyPI:

pip install etl-dataprocess

or install it from a clone of the GitHub repository:

git clone https://github.com/botlorien/dataprocess.git
cd dataprocess
pip install .
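
To confirm that the installation worked, one option is to ask Python for the installed version. Note that the distribution name used with pip (`etl-dataprocess`) differs from the import name (`dataprocess`). The helper `installed_version` below is a hypothetical convenience wrapper around the standard library, not something the package provides.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# "etl-dataprocess" is the distribution name passed to pip above.
print(installed_version("etl-dataprocess"))
```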

Usage example

from dataprocess import dataprocessing as hd

# Placeholder (not defined in the original example): replace with the path
# to a file, or to the folder that contains it.
PATH_DOWNLOADS = 'path/to/downloads'


def process_something_here():
    """A single example of how to use dataprocess."""
    # import_file handles .xlsx, .csv, .xls, .json, and .txt files.
    # It returns a 'DataFrame' for .xlsx, .csv, and .xls, a 'dict' for .json,
    # and a 'str' for .txt. If only a folder is passed as the argument,
    # it reads the first file in that folder.
    table = hd.import_file(PATH_DOWNLOADS)

    # Clean the whole table, removing whitespace and other debris.
    # Returns a 'DataFrame' with every column cast with astype('str').
    table = hd.clear_table(table)

    # After cleaning, convert the columns to the appropriate types.
    # The "dtypes" mapping lists the columns to cast to 'datetime' and 'time'.
    # Other common types such as 'int', 'float', and 'str' are
    # handled automatically by analysing the values.
    dtype = {
        'datetime': [
            'date_name_column'  # replace with the name of the column to cast to 'datetime'
        ],
        'time': [
            'hour_and_minute_name_column'  # replace with the name of the column to cast to 'time'
        ]
    }
    table = hd.convert_table_types(table, dtypes=dtype)
    print(table)
    table.info()  # DataFrame.info() prints its own summary and returns None
    return table


if __name__ == '__main__':
    process_something_here()
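
The comments in the example describe a loader that picks its reader from the file extension and, when given a folder, falls back to the first file in it. A minimal sketch of that dispatch idea follows, using only the standard library. It is not the package's implementation: `load_any` is a hypothetical name, CSV rows come back as a list of dicts instead of a DataFrame, Excel files are left out, and "first file" is assumed to mean first in sorted name order.

```python
import csv
import json
import tempfile
from pathlib import Path


def load_any(path):
    """Hypothetical loader: choose a reader based on the file extension."""
    target = Path(path)
    if target.is_dir():
        # Assumption: "first file" means first in sorted name order.
        files = sorted(p for p in target.iterdir() if p.is_file())
        if not files:
            raise FileNotFoundError(f"No files found in {target}")
        target = files[0]

    suffix = target.suffix.lower()
    if suffix == ".json":
        with target.open(encoding="utf-8") as handle:
            return json.load(handle)
    if suffix == ".txt":
        return target.read_text(encoding="utf-8")
    if suffix == ".csv":
        with target.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    raise ValueError(f"Unsupported file type: {suffix}")


# Small demonstration with temporary files.
with tempfile.TemporaryDirectory() as folder:
    json_path = Path(folder) / "a_config.json"
    json_path.write_text('{"rows": 2}', encoding="utf-8")
    csv_path = Path(folder) / "b_data.csv"
    csv_path.write_text("id,name\n1,Ana\n2,Bruno\n", encoding="utf-8")

    from_json = load_any(json_path)
    from_csv = load_any(csv_path)
    from_folder = load_any(folder)  # resolves to a_config.json

print(from_json, from_csv, from_folder)
```

Raising on an unknown extension, rather than guessing a format, keeps a mistyped path from silently producing the wrong kind of object.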

Download files

Source Distribution

etl_dataprocess-0.1.3.tar.gz (17.1 kB)

File details

Details for the file etl_dataprocess-0.1.3.tar.gz.

File metadata

  • Download URL: etl_dataprocess-0.1.3.tar.gz
  • Upload date:
  • Size: 17.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.2

File hashes

Hashes for etl_dataprocess-0.1.3.tar.gz

Algorithm    Hash digest
SHA256       16db593858efcf3917815b11b355be39be80fbd5d322c20751840544e63a507e
MD5          49c39c7585be44d69ebbb51337959cb0
BLAKE2b-256  aba561f153a1d56916c1619b5014b2cc0ab7c6c8f516031fa3f3f5693ab91469
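
The SHA256 digest listed above can be used to check a downloaded archive before installing it. A minimal sketch with the standard library's `hashlib` follows. The helper `sha256_of` is a hypothetical name, and the archive is assumed to sit in the current working directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for etl_dataprocess-0.1.3.tar.gz.
EXPECTED_SHA256 = "16db593858efcf3917815b11b355be39be80fbd5d322c20751840544e63a507e"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


archive = Path("etl_dataprocess-0.1.3.tar.gz")  # assumed download location
if archive.exists():
    print("match" if sha256_of(archive) == EXPECTED_SHA256 else "MISMATCH")
else:
    print(f"{archive} not found; download it first")
```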
