
Cuoco is a tool for automatic data preprocessing. The name is the Italian word for chef.

Project description

CUOCO

Cuoco is a tool for automatic processing of data.

Example

JSON example:

{
  "input_format": "csv",
  "output_format": "csv",
  "new_fileName": "new file",
  "new_file_route": "path/you/want/to/save/the/file",
  "index": "True",
  "header": "yes",
  "separator": ",",
  "num_nans": "mean",
  "str_nans": "yes",
  "caps": "lower",
  "normalize_method": "min_max",
  "normalize": [
    "Age"
  ],
  "balance_data": "yes",
  "balance_params": {
    "balance_method": "random",
    "y_col": "Age"
  }
}
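The same configuration can also be generated from Python with the standard json module instead of being written by hand. This is a minimal sketch, not part of the cuoco package: the helper names `config_to_json` and `write_config` are hypothetical, and every value is kept as a string exactly as in the example above.

```python
import json

# Configuration mirroring the JSON example; values stay strings, as in the file.
config = {
    "input_format": "csv",
    "output_format": "csv",
    "new_fileName": "new file",
    "new_file_route": "path/you/want/to/save/the/file",
    "index": "True",
    "header": "yes",
    "separator": ",",
    "num_nans": "mean",
    "str_nans": "yes",
    "caps": "lower",
    "normalize_method": "min_max",
    "normalize": ["Age"],
    "balance_data": "yes",
    "balance_params": {
        "balance_method": "random",
        "y_col": "Age",
    },
}


def config_to_json(cfg: dict) -> str:
    """Serialize the configuration dict as indented JSON text."""
    return json.dumps(cfg, indent=2)


def write_config(cfg: dict, path: str) -> None:
    """Write the configuration to `path` (a hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(config_to_json(cfg))
```

For example, `write_config(config, "cuoco_config.json")` (a hypothetical file name) produces a file whose path can then be passed as the JSON argument of `dataPipeline.readJson`.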

Import the library

import cuoco
from cuoco import dataPipeline

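Run the pipeline with dataPipeline.readJson, passing the path to the input dataset followed by the path to the JSON configuration: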

dataPipeline.readJson('/content/biostats.csv', '/content/jsonTESTFILE.json')

Documentation

How it works: Cuoco reads a JSON file written by the user and automatically applies the data-processing functions it specifies to the chosen dataset. The JSON accepts the following keys:

  • input_format: format of the input dataset. Can be csv, parquet, orc or txt.
  • output_format: format of the resulting dataset. Can be csv, parquet, orc or txt.
  • new_fileName: name of the new dataset file that Cuoco will write.
  • new_file_route: path where the new data file will be stored.
  • index: whether the final dataset should include a row index. Can be:
    • True
    • False
  • header: whether your dataset has a header row. Can be yes or none.
  • separator: the field separator used in your dataset. Only applies to csv and txt formats.
  • num_nans: how to handle NaNs (including empty values) in numeric columns. Can be:
    • drop: drop rows that contain NaNs
    • yes: leave rows that contain NaNs unchanged
    • mean: fill NaNs with the mean of the column
    • median: fill NaNs with the median of the column
    • mode: fill NaNs with the mode of the column
  • str_nans: how to handle NaNs (including empty values) in string columns. Can be:
    • yes: keep columns that contain NaNs
    • no: drop columns that contain NaNs
  • caps: how to handle letter case in string columns. Can be:
    • no: leave strings unchanged
    • upper: convert all values in string columns to uppercase
    • lower: convert all values in string columns to lowercase
  • normalize_method: method used to normalize numeric columns. Can be:
    • no: don't normalize
    • max_abs: scale by the column's maximum absolute value
    • min_max: min-max scaling
    • z_score: z-score standardization
  • normalize:
    • list of the columns to normalize
    • Note: if your dataset has no header, give the columns as numeric indices; if it has a header, give the column names as quoted strings (for example "Age").
  • balance_data: whether to balance your data (recommended for AI datasets). Can be:
    • yes
    • no
  • balance_params: contains two items:
    • balance_method: oversampling method. Can be:
      • random: random oversampling
      • smote: SMOTE (Synthetic Minority Over-sampling Technique) oversampling
    • y_col: target column of the dataset used for balancing
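Because a typo in one of the keys above (for example "lowercase" instead of "lower") is easy to make, a configuration can be checked before it is handed to Cuoco. The sketch below is a hypothetical pre-flight helper, not part of the cuoco package; the allowed values are copied from the list above, and keys missing from the config are skipped because the documentation does not say which ones are required.

```python
from typing import Dict, List, Set

# Allowed values as documented for each key (hypothetical checker, not cuoco API).
ALLOWED: Dict[str, Set[str]] = {
    "input_format": {"csv", "parquet", "orc", "txt"},
    "output_format": {"csv", "parquet", "orc", "txt"},
    "index": {"True", "False"},
    "header": {"yes", "none"},
    "num_nans": {"drop", "yes", "mean", "median", "mode"},
    "str_nans": {"yes", "no"},
    "caps": {"no", "upper", "lower"},
    "normalize_method": {"no", "max_abs", "min_max", "z_score"},
    "balance_data": {"yes", "no"},
}
BALANCE_METHODS = {"random", "smote"}


def validate_config(cfg: dict) -> List[str]:
    """Return a list of problems found in `cfg`; an empty list means it passed."""
    problems: List[str] = []
    for key, allowed in ALLOWED.items():
        # Absent keys are not reported; only present values are checked.
        if key in cfg and cfg[key] not in allowed:
            problems.append(f"{key}: {cfg[key]!r} is not one of {sorted(allowed)}")
    if "normalize" in cfg and not isinstance(cfg["normalize"], list):
        problems.append("normalize: expected a list of columns")
    # balance_params only matters when balancing is switched on.
    if cfg.get("balance_data") == "yes":
        params = cfg.get("balance_params", {})
        if params.get("balance_method") not in BALANCE_METHODS:
            problems.append("balance_params.balance_method must be random or smote")
        if "y_col" not in params:
            problems.append("balance_params.y_col is missing")
    return problems
```

Running it on the JSON example returns an empty list, while a value such as "caps": "title" produces one problem message naming the allowed options.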
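The num_nans options correspond to common missing-value strategies. As an illustration only, the sketch below applies them to one numeric column of a table held as a list of dicts, using just the Python standard library; it is not Cuoco's implementation, and the function name `handle_numeric_nans` is hypothetical. Both None (an empty cell) and float NaN are treated as missing.

```python
import math
import statistics
from typing import Dict, List, Optional

Row = Dict[str, Optional[float]]


def _is_missing(value: Optional[float]) -> bool:
    # An empty cell (None) and a float NaN both count as missing.
    return value is None or (isinstance(value, float) and math.isnan(value))


def handle_numeric_nans(rows: List[Row], column: str, method: str) -> List[Row]:
    """Apply one num_nans strategy to a single numeric column (illustrative)."""
    if method == "yes":
        return rows  # leave rows with NaNs unchanged
    if method == "drop":
        return [r for r in rows if not _is_missing(r.get(column))]
    fillers = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }
    if method not in fillers:
        raise ValueError(f"unknown num_nans method: {method!r}")
    # Compute the fill value from the non-missing entries only.
    present = [r[column] for r in rows if not _is_missing(r.get(column))]
    fill = fillers[method](present)
    return [{**r, column: fill} if _is_missing(r.get(column)) else r for r in rows]
```

With an "Age" column of 20, empty, 40, NaN and 40, "drop" keeps three rows, while "median" and "mode" both fill the two gaps with 40.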
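The three normalize_method options name standard scaling formulas. The sketch below shows those formulas on a plain list of numbers; it is not taken from Cuoco, and two choices are assumptions: z_score divides by the population standard deviation (libraries differ on population versus sample), and a constant column is mapped to zeros instead of raising a division error.

```python
import statistics
from typing import List


def normalize(values: List[float], method: str) -> List[float]:
    """Scale a numeric column with one normalize_method option (illustrative)."""
    if method == "no":
        return list(values)
    if method == "max_abs":
        # Divide by the largest absolute value; results fall in [-1, 1].
        peak = max(abs(v) for v in values)
        return [v / peak if peak else 0.0 for v in values]
    if method == "min_max":
        # Map the column minimum to 0 and the maximum to 1.
        lo, hi = min(values), max(values)
        span = hi - lo
        return [(v - lo) / span if span else 0.0 for v in values]
    if method == "z_score":
        # Subtract the mean and divide by the population standard deviation.
        mu = statistics.mean(values)
        sigma = statistics.pstdev(values)
        return [(v - mu) / sigma if sigma else 0.0 for v in values]
    raise ValueError(f"unknown normalize_method: {method!r}")
```

For the column [-4, 0, 2], max_abs divides by 4 and min_max maps -4 to 0 and 2 to 1; after z_score the values have mean 0 and standard deviation 1.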
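The random balance_method refers to random oversampling: rows of the under-represented classes in y_col are duplicated until every class is as large as the biggest one. The sketch below shows that general technique with the standard library; it is not Cuoco's code, the function name `random_oversample` is hypothetical, and SMOTE (which synthesizes new rows by interpolating between neighbours instead of copying) is not covered.

```python
import random
from typing import Any, Dict, List, Optional


def random_oversample(rows: List[Dict[str, Any]], y_col: str,
                      seed: Optional[int] = None) -> List[Dict[str, Any]]:
    """Duplicate randomly chosen rows of smaller classes in `y_col` until
    every class has as many rows as the largest class (illustrative)."""
    rng = random.Random(seed)
    by_class: Dict[Any, List[Dict[str, Any]]] = {}
    for row in rows:
        by_class.setdefault(row[y_col], []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = list(rows)
    for group in by_class.values():
        # Sample with replacement from this class's existing rows.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced
```

Because each distinct y_col value is treated as a class, this technique is meant for a categorical target column.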



Download files


Source Distribution

cuoco-1.0.0.tar.gz (18.6 kB)

Uploaded Source

Built Distribution


cuoco-1.0.0-py3-none-any.whl (19.2 kB)

Uploaded Python 3

File details

Details for the file cuoco-1.0.0.tar.gz.

File metadata

  • Download URL: cuoco-1.0.0.tar.gz
  • Upload date:
  • Size: 18.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.6

File hashes

Hashes for cuoco-1.0.0.tar.gz
Algorithm Hash digest
SHA256 6fb861a7ce90631347a58a6bc09d8175579f5ce04b55664be9e845a43271ac2d
MD5 fc318befb0739ec144b0a23b1da07771
BLAKE2b-256 b2a07dc39fe237cab71d6184ad1e336ad019a09be984a74edc2f5d74458ed27f


File details

Details for the file cuoco-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: cuoco-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 19.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.6

File hashes

Hashes for cuoco-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3e1c5c72d5ec16a439456f1c24f2bdd173f477bda6a49cbeca727f881db14a4c
MD5 bd55982595f05f90c1ecf94128fcc4fa
BLAKE2b-256 16eadffb2c1ec0edcd226d06f357018319710ecd5c9fc803dd29ecdcad19ae77

