Cuoco is a tool for automatic data preprocessing. The name is Italian for "chef".

Project description

CUOCO

Cuoco is a tool for automatic processing of data.

Example

Import the library

import cuoco

from cuoco import dataPipeline

Use the dataPipeline

dataPipeline.readJson('/content/biostats.csv', '/content/jsonTESTFILE.json')
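
The second argument to readJson is the path to the JSON configuration file. As a rough sketch, such a file could be generated from Python as shown below. The keys come from the Documentation section; every value, the file name cuoco_config.json, the column names, and the exact shape of "normalize" (a list) and "balance_params" (a nested object) are assumptions made for illustration and are not confirmed by the project.

```python
import json
import os
import tempfile

# Keys are taken from the Documentation section. All values are
# illustrative assumptions, including the list shape of "normalize"
# and the nested-object shape of "balance_params".
config = {
    "input_format": "csv",
    "output_format": "csv",
    "new_fileName": "biostats_clean",   # hypothetical output name
    "new_file_route": "/content/",      # hypothetical output location
    "index": "no",
    "header": "yes",
    "separator": ",",
    "num_nans": "mean",
    "str_nans": "no",
    "caps": "lower",
    "normalize_method": "min_max",
    "normalize": ["Age"],               # hypothetical column name
    "balance_data": "no",
    "balance_params": {
        "balance_method": "random",
        "y_col": "Sex",                 # hypothetical target column
    },
}

# Write the configuration to a temporary location (hypothetical file name).
config_path = os.path.join(tempfile.gettempdir(), "cuoco_config.json")
with open(config_path, "w", encoding="utf-8") as fh:
    json.dump(config, fh, indent=2)

# The file could then be passed as the second argument:
# dataPipeline.readJson('/content/biostats.csv', config_path)
```

Building the dictionary in Python and serializing it with json.dump avoids hand-written syntax errors such as trailing commas or single quotes.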

Documentation

How it works: Cuoco reads a JSON file written by the user and automatically applies the data-processing functions it describes to the target dataset. The JSON file accepts the following keys:

  • input_format: format of the input dataset. Can be csv, parquet, orc, or txt
  • output_format: format of the resulting dataset. Can be csv, parquet, orc, or txt
  • new_fileName: name of the new dataset that Cuoco will write
  • new_file_route: path where the new data file will be stored
  • index: whether the final dataset should have a row index. Can be:
    • yes
    • no
  • header: whether your dataset has a header. Can be yes or none
  • separator: the separator used in your dataset. Only applies if the format is csv or txt.
  • num_nans: method to apply to numerical NaNs (including empty values). Can be:
    • drop: drop rows that contain NaNs
    • yes: leave rows that contain NaNs unchanged
    • mean: fill NaNs with the mean value of the column
    • median: fill NaNs with the median value of the column
    • mode: fill NaNs with the mode value of the column
  • str_nans: method to apply to string NaNs (including empty values). Can be:
    • yes: keep columns that contain NaNs
    • no: drop columns that contain NaNs
  • caps: how to handle strings that mix uppercase and lowercase letters. Can be:
    • no: do nothing
    • upper: convert all values in string columns to uppercase
    • lower: convert all values in string columns to lowercase
  • normalize_method: method used to normalize numerical columns. Can be:
    • no: do not normalize
    • max_abs: normalizes by the maximum absolute value
    • min_max: normalizes with the min-max method
    • z_score: normalizes with the z-score method
  • normalize:
    • the names of the columns you want to normalize
    • Note: if your dataset does not have a header, give the columns as numbers; if it has a header, give the column names in double quotes ("")
  • balance_data: whether to balance your data (recommended for AI datasets). Can be:
    • yes
    • no
  • Inside balance_params there are two items:
    • balance_method: oversampling method to use. Can be:
      • random: random oversampling
      • smote: oversampling with the SMOTE technique
    • y_col: column of the dataset to use as the target for balancing
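
To make the three normalize_method options concrete, here is a minimal sketch of the textbook definitions of max-abs, min-max, and z-score scaling in plain Python. It is not Cuoco's implementation: the function names are hypothetical, the z-score uses the population standard deviation as an assumption, and the column is assumed to be non-constant and not all zeros so that no division by zero occurs.

```python
from statistics import mean, pstdev


def max_abs(values):
    """Divide by the largest absolute value, giving results in [-1, 1]."""
    largest = max(abs(v) for v in values)
    return [v / largest for v in values]


def min_max(values):
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Subtract the mean and divide by the population standard deviation
    (assumption: a library may use the sample standard deviation instead)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


column = [2.0, 4.0, 6.0, 8.0]  # made-up numerical column
print(max_abs(column))  # [0.25, 0.5, 0.75, 1.0]
print(min_max(column))
print(z_score(column))
```

Max-abs scaling preserves the sign and the zero of the data, min-max maps the column onto the interval [0, 1], and z-score centers the column at 0 with unit spread.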

Download files

Source Distribution

cuoco-0.1.5.tar.gz (18.0 kB)

Built Distribution

cuoco-0.1.5-py3-none-any.whl (18.8 kB, Python 3)
