
Utility functions to perform data processing

Project description

dataprocess

dataprocess is a Python package that provides simple, efficient utilities for processing and cleaning data.

Features

  • Data processing: transform data using dedicated functions.
  • Data cleaning: remove null values and prepare data for analysis.
  • Modular structure for easy extension.
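As an illustration of what the cleaning step typically involves, the following minimal sketch uses only the standard library and is not the package's own implementation. It assumes the table is a list of row dicts, strips surrounding white space from string values, treats empty strings as nulls, and drops rows in which every value is null. The helper name `clean_rows` is hypothetical.

```python
from typing import Any

Row = dict[str, Any]


def clean_rows(rows: list[Row]) -> list[Row]:
    """Hypothetical helper: strip strings, map blanks to None, drop all-null rows."""
    cleaned: list[Row] = []
    for row in rows:
        new_row: Row = {}
        for key, value in row.items():
            if isinstance(value, str):
                value = value.strip()
                # An empty string after stripping is treated as a missing value
                if value == "":
                    value = None
            new_row[key] = value
        # Keep the row only if at least one value is not null
        if any(v is not None for v in new_row.values()):
            cleaned.append(new_row)
    return cleaned


rows = [
    {"name": "  Ana ", "city": "Curitiba"},
    {"name": "   ", "city": None},
    {"name": "Bruno", "city": "  "},
]
print(clean_rows(rows))
# [{'name': 'Ana', 'city': 'Curitiba'}, {'name': 'Bruno', 'city': None}]
```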

Installation

Install the package from PyPI:

pip install etl-dataprocess

or directly from the GitHub repository:

git clone https://github.com/botlorien/dataprocess.git
cd dataprocess
pip install .

Usage example

from dataprocess import dataprocessing as hd

# Path to the file (or folder) to import; replace it with your own location
PATH_DOWNLOADS = 'path/to/downloads'


def process_something_here():
    """A simple example of how to use dataprocess."""
    # Import a .xlsx, .csv, .xls, .json or .txt file and return its content:
    # a 'DataFrame' for .xlsx, .csv and .xls, a 'dict' for .json and a 'str' for .txt.
    # If a folder is passed instead of a file, the first file in that folder is used.
    table = hd.import_file(PATH_DOWNLOADS)

    # Clean the whole table, removing white space and other junk,
    # and return a 'DataFrame' with every column cast with astype('str')
    table = hd.clear_table(table)

    # After cleaning, convert the columns to the appropriate types.
    # The "dtypes" mapping lists the columns to cast to 'datetime' and 'time';
    # other common types such as 'int', 'float' and 'str' are inferred
    # automatically from the column values.
    dtypes = {
        'datetime': [
            'date_name_column',  # replace with the name of the column to cast to 'datetime'
        ],
        'time': [
            'hour_and_minute_name_column',  # replace with the name of the column to cast to 'time'
        ],
    }
    table = hd.convert_table_types(table, dtypes=dtypes)

    print(table)
    table.info()  # DataFrame.info() prints its own summary and returns None
    return table


if __name__ == '__main__':
    process_something_here()
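The comments on `hd.import_file` describe two behaviours: choosing a reader from the file extension and, when given a folder, using the first file inside it. The sketch below mirrors that idea with the standard library only, so it is an approximation rather than the package's code: it returns a list of row dicts (via `csv`) where the package returns a `DataFrame`, and it does not handle `.xlsx`/`.xls`, which need a third-party reader. The name `load_file` and the "first file in name order" rule are assumptions; the package may order folder contents differently.

```python
import csv
import json
from pathlib import Path
from typing import Any, Union


def load_file(path: Union[str, Path]) -> Any:
    """Hypothetical loader: .json -> parsed JSON, .txt -> str, .csv -> list of row dicts."""
    path = Path(path)
    if path.is_dir():
        # Assumption: "first file" means the first regular file in name order
        files = sorted(p for p in path.iterdir() if p.is_file())
        if not files:
            raise FileNotFoundError(f"no files in {path}")
        path = files[0]

    suffix = path.suffix.lower()
    if suffix == ".json":
        with path.open(encoding="utf-8") as fh:
            return json.load(fh)  # a dict when the file holds a JSON object
    if suffix == ".txt":
        return path.read_text(encoding="utf-8")
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as fh:
            return list(csv.DictReader(fh))
    raise ValueError(f"unsupported extension: {suffix!r}")


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "a.csv").write_text("id,name\n1,Ana\n", encoding="utf-8")
        Path(tmp, "b.json").write_text('{"ok": true}', encoding="utf-8")
        print(load_file(tmp))                  # [{'id': '1', 'name': 'Ana'}]
        print(load_file(Path(tmp, "b.json")))  # {'ok': True}
```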

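The example's comments state that `convert_table_types` infers 'int', 'float' and 'str' from the values themselves. One plausible way to do that, shown here as a standard-library sketch and not as the package's actual logic, is to try the narrowest type first and fall back: a column becomes `int` only if every non-blank value parses as an integer, `float` if every such value parses as a float, and stays `str` otherwise. The helper names `infer_column_type` and `convert_column` are hypothetical.

```python
from typing import Callable, Optional


def _all_parse(values: list[str], parser: Callable[[str], object]) -> bool:
    """Return True if every value can be parsed by `parser`."""
    try:
        for v in values:
            parser(v)
    except ValueError:
        return False
    return True


def infer_column_type(values: list[Optional[str]]) -> type:
    """Pick int, then float, then str, ignoring missing or blank values."""
    present = [v for v in values if v is not None and v.strip() != ""]
    if not present:
        return str  # nothing to infer from; keep the column as text
    if _all_parse(present, int):
        return int
    if _all_parse(present, float):
        return float
    return str


def convert_column(values: list[Optional[str]]) -> list[object]:
    """Cast each non-blank value to the inferred type; blanks become None."""
    target = infer_column_type(values)
    return [
        target(v) if v is not None and v.strip() != "" else None
        for v in values
    ]


print(infer_column_type(["1", "2", ""]))    # <class 'int'>
print(infer_column_type(["1.5", "2"]))      # <class 'float'>
print(infer_column_type(["1", "abc"]))      # <class 'str'>
print(convert_column(["10", None, " 3 "]))  # [10, None, 3]
```

Trying `int` before `float` keeps integer columns from being widened unnecessarily; note that Python's `float` also accepts strings such as "nan" and "inf", which a stricter implementation might want to reject.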
Download files


Source Distribution

etl_dataprocess-0.2.7.tar.gz (17.4 kB)

Uploaded Source

Built Distribution


etl_dataprocess-0.2.7-py3-none-any.whl (16.1 kB)

Uploaded Python 3

File details

Details for the file etl_dataprocess-0.2.7.tar.gz.

File metadata

  • Download URL: etl_dataprocess-0.2.7.tar.gz
  • Upload date:
  • Size: 17.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.2

File hashes

Hashes for etl_dataprocess-0.2.7.tar.gz
Algorithm     Hash digest
SHA256        30456021376db4f45e8de4a52399819429291df40fb11de0eabee881c2470cc5
MD5           ef7f28e8f1eec7d7bdd4627041c069c5
BLAKE2b-256   b56aaf8108192d66e8a1c48670e1b9445bca75c193eccf18732a4ea442c58b41
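These digests let you check that a downloaded archive is intact. A minimal sketch with Python's `hashlib`, assuming the source distribution has been saved to the current directory under its published file name: it hashes the file in chunks and compares the result with the SHA256 value listed above. The helper name `sha256_of` is illustrative.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for etl_dataprocess-0.2.7.tar.gz
EXPECTED_SHA256 = "30456021376db4f45e8de4a52399819429291df40fb11de0eabee881c2470cc5"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in fixed-size chunks so large files are not read into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("etl_dataprocess-0.2.7.tar.gz")  # assumed download location
    if archive.exists():
        ok = sha256_of(archive) == EXPECTED_SHA256
        print("hash OK" if ok else "hash MISMATCH")
    else:
        print(f"{archive} not found")
```

pip can also enforce this check itself through its hash-checking mode (`--require-hashes`), in which each requirement lists the accepted `--hash=sha256:...` values.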


File details

Details for the file etl_dataprocess-0.2.7-py3-none-any.whl.

File metadata

File hashes

Hashes for etl_dataprocess-0.2.7-py3-none-any.whl
Algorithm     Hash digest
SHA256        2be9a56394fc3b08cbbdbbce94bbaa5f07b03e3915e357f2521f72e014a55107
MD5           7e817f3336f5ccf1a617ee785f46430f
BLAKE2b-256   dc4cb13e092cd5490d0a85824a2359eea8eb2faa012cc6315664bc617830e36d

