Cuoco is a tool for automatic data preprocessing. The name is Italian: cuoco means chef.

Project description

DataChef

DataChef is a tool for automatic data processing.

Documentation

To install: pip install datachef

How it works: DataChef reads a JSON file created by the user and uses it to automatically apply data-processing functions to the desired dataset. The JSON file has the following keys:

  • input_format: format of the input dataset. Can be csv, parquet, orc or txt
  • output_format: format of the resulting dataset. Can be csv, parquet, orc or txt
  • new_fileName: name of the new dataset file that DataChef will write
  • new_file_route: path where the new data file will be stored
  • header: whether your dataset has a header. Can be yes or none
  • separator: the field separator used in your dataset. Only applies to the csv and txt formats.
  • num_nans: method used to handle NaNs in numerical columns (including empty values). Can be:
    • drop: drop rows that contain NaNs
    • yes: leave rows that contain NaNs unchanged
    • mean: fill NaNs with the mean of the column
    • median: fill NaNs with the median of the column
    • mode: fill NaNs with the mode of the column
  • str_nans: method used to handle NaNs in string columns (including empty values). Can be:
    • yes: keep string columns that contain NaNs
    • no: drop string columns that contain NaNs
  • caps: how to handle strings that mix uppercase and lowercase letters. Can be:
    • no: don't change anything
    • upper: convert all values in string columns to uppercase
    • lower: convert all values in string columns to lowercase
  • normalize_method: method used to normalize numerical columns. Can be:
    • no: don't normalize
    • max_abs: divide each value by the column's maximum absolute value
    • min_max: min-max scaling
    • z_score: z-score standardization
  • normalize:
    • the names of the columns you want to normalize
    • Note: if your dataset does not have a header, you must write the columns you want to normalize as numbers (their column positions); if it has a header, you must write the column names between quotes ("")
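As an illustration, the following Python sketch writes a configuration file that uses every key listed above and checks the enumerated keys against the documented values. The values (file name, route, column names) are made up, writing normalize as a JSON list of column names is an assumption, and validate_config is a hypothetical helper, not part of DataChef. How the file is then passed to DataChef is not covered by this description.

```python
import json

# Allowed values for the enumerated keys, as listed in the documentation.
ALLOWED_VALUES = {
    "input_format": {"csv", "parquet", "orc", "txt"},
    "output_format": {"csv", "parquet", "orc", "txt"},
    "header": {"yes", "none"},
    "num_nans": {"drop", "yes", "mean", "median", "mode"},
    "str_nans": {"yes", "no"},
    "caps": {"no", "upper", "lower"},
    "normalize_method": {"no", "max_abs", "min_max", "z_score"},
}


def validate_config(config: dict) -> None:
    """Hypothetical helper: raise ValueError if a key holds an undocumented value."""
    for key, allowed in ALLOWED_VALUES.items():
        if config.get(key) not in allowed:
            raise ValueError(f"{key}={config.get(key)!r} is not one of {sorted(allowed)}")


# Example values only: file name, route and column names are made up.
config = {
    "input_format": "csv",
    "output_format": "parquet",
    "new_fileName": "sales_clean",
    "new_file_route": "output/",
    "header": "yes",
    "separator": ",",
    "num_nans": "median",
    "str_nans": "yes",
    "caps": "lower",
    "normalize_method": "min_max",
    "normalize": ["price", "quantity"],  # assumption: a list of column names
}

validate_config(config)

# Write the configuration to disk as JSON.
with open("config.json", "w", encoding="utf-8") as f:
    json.dump(config, f, indent=4)
```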
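The num_nans options can be pictured with a small pure-Python sketch. It is not DataChef's implementation: handle_num_nans and is_missing are hypothetical names, rows are assumed to be dictionaries, and only one numeric column is handled per call.

```python
import math
import statistics
from typing import Any, Dict, List

Row = Dict[str, Any]


def is_missing(value: Any) -> bool:
    """Treat None, empty strings and float NaN as missing ("nans, including empties")."""
    return value is None or value == "" or (isinstance(value, float) and math.isnan(value))


def handle_num_nans(rows: List[Row], column: str, method: str) -> List[Row]:
    """Hypothetical sketch of the num_nans options for a single numeric column."""
    if method == "yes":
        # Leave rows that contain NaNs unchanged.
        return list(rows)
    if method == "drop":
        # Keep only the rows where the column has a value.
        return [row for row in rows if not is_missing(row[column])]

    present = [float(row[column]) for row in rows if not is_missing(row[column])]
    if method == "mean":
        fill = statistics.mean(present)
    elif method == "median":
        fill = statistics.median(present)
    elif method == "mode":
        fill = statistics.mode(present)
    else:
        raise ValueError(f"unknown num_nans method: {method!r}")

    # Build new row dicts with the missing entries replaced by the fill value.
    return [
        {**row, column: fill if is_missing(row[column]) else row[column]}
        for row in rows
    ]
```

For example, with rows [{"price": 1.0}, {"price": None}, {"price": 3.0}], the mean option fills the missing price with 2.0, while drop keeps only the first and third rows.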
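The three normalization methods correspond to standard formulas: max_abs computes x / max(|x|), min_max computes (x - min) / (max - min), and z_score computes (x - mean) / std. The sketch below applies them to one column. The function name, the handling of constant columns, and the use of the population standard deviation for z_score are assumptions, since the description does not say which variant DataChef uses.

```python
import statistics
from typing import List


def normalize(values: List[float], method: str) -> List[float]:
    """Hypothetical sketch of the normalize_method options for one numeric column."""
    if method == "no":
        return list(values)
    if method == "max_abs":
        # x / max(|x|): results fall in [-1, 1].
        max_abs = max(abs(v) for v in values)
        return [v / max_abs for v in values] if max_abs else [0.0 for _ in values]
    if method == "min_max":
        # (x - min) / (max - min): results fall in [0, 1].
        low, high = min(values), max(values)
        if high == low:
            return [0.0 for _ in values]  # assumption: a constant column maps to 0
        return [(v - low) / (high - low) for v in values]
    if method == "z_score":
        # (x - mean) / std, using the population standard deviation (assumption).
        mu = statistics.mean(values)
        sigma = statistics.pstdev(values)
        if sigma == 0:
            return [0.0 for _ in values]  # assumption: a constant column maps to 0
        return [(v - mu) / sigma for v in values]
    raise ValueError(f"unknown normalize_method: {method!r}")
```

For instance, normalize([1.0, 3.0, 5.0], "min_max") returns [0.0, 0.5, 1.0].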
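A similar sketch covers the string options. It assumes that str_nans set to no removes every listed string column containing at least one missing value, and that caps is then applied to the remaining string values. handle_string_columns is a hypothetical helper, not part of DataChef.

```python
import math
from typing import Any, Dict, List

Row = Dict[str, Any]


def _is_missing(value: Any) -> bool:
    """Treat None, empty strings and float NaN as missing."""
    return value is None or value == "" or (isinstance(value, float) and math.isnan(value))


def handle_string_columns(rows: List[Row], columns: List[str], str_nans: str, caps: str) -> List[Row]:
    """Hypothetical sketch of the str_nans and caps options."""
    if str_nans not in ("yes", "no"):
        raise ValueError(f"unknown str_nans value: {str_nans!r}")
    if caps not in ("no", "upper", "lower"):
        raise ValueError(f"unknown caps value: {caps!r}")

    # str_nans = "no": drop every string column that has at least one missing value.
    dropped = set()
    if str_nans == "no":
        dropped = {c for c in columns if any(_is_missing(row.get(c)) for row in rows)}

    result = []
    for row in rows:
        new_row = {}
        for key, value in row.items():
            if key in dropped:
                continue
            # caps: change the letter case of string values in the string columns.
            if key in columns and isinstance(value, str):
                if caps == "upper":
                    value = value.upper()
                elif caps == "lower":
                    value = value.lower()
            new_row[key] = value
        result.append(new_row)
    return result
```

With rows [{"id": 1, "city": "Roma", "note": "ok"}, {"id": 2, "city": "milano", "note": None}], str_nans set to no and caps set to upper, the note column is dropped and the city values become ROMA and MILANO.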

Download files

Source Distribution

cuoco-0.0.1.tar.gz (17.0 kB)

Built Distribution

cuoco-0.0.1-py3-none-any.whl (18.0 kB)
