Cuoco is a tool for automatic data preprocessing. The name comes from Italian and means "chef".
Project description
CUOCO
Cuoco is a tool for automatic processing of data.
Example
JSON example:
{
  "input_format": "csv",
  "output_format": "csv",
  "new_fileName": "new file",
  "new_file_route": "path/you/want/to/save/the/file",
  "index": "True",
  "header": "yes",
  "separator": ",",
  "num_nans": "mean",
  "str_nans": "yes",
  "caps": "lower",
  "normalize_method": "min_max",
  "normalize": [
    "Age"
  ],
  "balance_data": "yes",
  "balance_params": {
    "balance_method": "random",
    "y_col": "Age"
  }
}
Import the library
import cuoco
from cuoco import dataPipeline
Use the dataPipeline
dataPipeline.readJson('/content/biostats.csv', '/content/jsonTESTFILE.json')
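A runnable sketch of the full workflow, assuming only what is shown above: `readJson` takes the dataset path and the config path, and the config keys mirror the JSON example. The config file location used here is a placeholder, not something Cuoco requires:

```python
import json
import os
import tempfile

# The configuration from the JSON example above, as a Python dict.
config = {
    "input_format": "csv",
    "output_format": "csv",
    "new_fileName": "new file",
    "new_file_route": "path/you/want/to/save/the/file",
    "index": "True",
    "header": "yes",
    "separator": ",",
    "num_nans": "mean",
    "str_nans": "yes",
    "caps": "lower",
    "normalize_method": "min_max",
    "normalize": ["Age"],
    "balance_data": "yes",
    "balance_params": {"balance_method": "random", "y_col": "Age"},
}

# Write the config somewhere readJson can find it (placeholder path).
config_path = os.path.join(tempfile.gettempdir(), "cuoco_config.json")
with open(config_path, "w") as f:
    json.dump(config, f, indent=2)

# With cuoco installed, the pipeline is then run as shown above:
# from cuoco import dataPipeline
# dataPipeline.readJson("/content/biostats.csv", config_path)
```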
Documentation
How it works: Cuoco uses a JSON file created by the user to automatically apply data-processing functions to the desired dataset. The JSON has the following keys:
- input_format: format of the input dataset. Can be csv, parquet, orc, or txt
- output_format: format of the resulting dataset. Can be csv, parquet, orc, or txt
- new_fileName: name of the new dataset file that the DataChef will write
- new_file_route: path where the new data file will be stored
- index: whether you want your final dataset to have a row index. Can be:
  - True
  - False
- header: whether your dataset has a header. Can be yes or none
- separator: the separator of your dataset. Only applies to the csv and txt formats.
- num_nans: method you want to use against possible numerical NaNs (including empty values). Can be:
  - drop: drop rows that contain NaNs
  - yes: leave rows containing NaNs unchanged
  - mean: fill NaNs with the mean value of the column
  - median: fill NaNs with the median value of the column
  - mode: fill NaNs with the mode value of the column
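The num_nans options can be illustrated with a small stand-in. This is not Cuoco's internal code; `fill_numeric_nans` is a hypothetical helper operating on a single column, with `None` standing in for NaN:

```python
import statistics

def fill_numeric_nans(values, method):
    """Hypothetical helper mirroring the num_nans options above."""
    present = [v for v in values if v is not None]
    if method == "drop":
        # Drop rows that contain NaNs.
        return present
    if method == "yes":
        # Leave rows containing NaNs unchanged.
        return list(values)
    # mean / median / mode: fill NaNs with the column statistic.
    fillers = {"mean": statistics.mean,
               "median": statistics.median,
               "mode": statistics.mode}
    fill = fillers[method](present)
    return [fill if v is None else v for v in values]

ages = [20, None, 30, 40]
print(fill_numeric_nans(ages, "mean"))   # NaN replaced by the column mean
print(fill_numeric_nans(ages, "drop"))   # row with the NaN removed
```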
- str_nans: method you want to use against possible string NaNs (including empty values). Can be:
  - yes: keep columns containing NaNs
  - no: drop columns containing NaNs
- caps: method to apply to strings that contain upper- and lowercase letters:
  - no: leave strings unchanged
  - upper: convert all strings in string columns to uppercase
  - lower: convert all strings in string columns to lowercase
- normalize_method: method used to normalize numerical columns. Can be:
  - no: do not normalize
  - max_abs: uses the maximum absolute value to normalize
  - min_max: uses the min-max method to normalize
  - z_score: uses the z-score method to normalize
- normalize:
  - write the names of the columns you want to normalize
  - Note: if your dataset does not have a header, write the column names in number format (positions); if it has a header, write the column names between ""
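The three normalize_method options can be sketched as follows. This is an illustrative stand-in, not Cuoco's internals; `normalize_column` is a hypothetical helper working on one numerical column:

```python
def normalize_column(values, method):
    """Hypothetical helper mirroring the normalize_method options above."""
    if method == "no":
        return list(values)
    if method == "max_abs":
        # Divide by the maximum absolute value: result lies in [-1, 1].
        m = max(abs(v) for v in values)
        return [v / m for v in values]
    if method == "min_max":
        # Rescale linearly so the column spans [0, 1].
        lo, hi = min(values), max(values)
        return [(v - lo) / (hi - lo) for v in values]
    if method == "z_score":
        # Centre on the mean, divide by the population standard deviation.
        mean = sum(values) / len(values)
        std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
        return [(v - mean) / std for v in values]
    raise ValueError(f"unknown method: {method}")

print(normalize_column([20, 30, 40], "min_max"))   # [0.0, 0.5, 1.0]
```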
- balance_data: whether you want to balance your data (recommended for AI datasets). Can be:
  - yes
  - no
- Inside balance_params there are two items:
  - balance_method: method you want to use for oversampling. Can be:
    - random: random oversampling
    - smote: perform the SMOTE technique for oversampling
  - y_col: column of the dataset to use as the target for balancing
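The random oversampling option can be sketched without any external library. This is a hypothetical illustration of the idea, not Cuoco's implementation: minority-class rows are duplicated (sampled with replacement) until every class matches the largest one:

```python
import random

def random_oversample(rows, y_col, seed=0):
    """Hypothetical sketch of the 'random' balance_method."""
    rng = random.Random(seed)
    # Group rows by their target value.
    by_class = {}
    for row in rows:
        by_class.setdefault(row[y_col], []).append(row)
    # Every class is grown to the size of the largest class.
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Resample with replacement to make up the difference.
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

rows = [{"label": "a"}] * 3 + [{"label": "b"}]
balanced = random_oversample(rows, "label")
print(len(balanced))   # 6: three rows of each class
```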
File details
Details for the file cuoco-1.0.0.tar.gz.
File metadata
- Download URL: cuoco-1.0.0.tar.gz
- Upload date:
- Size: 18.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6fb861a7ce90631347a58a6bc09d8175579f5ce04b55664be9e845a43271ac2d |
| MD5 | fc318befb0739ec144b0a23b1da07771 |
| BLAKE2b-256 | b2a07dc39fe237cab71d6184ad1e336ad019a09be984a74edc2f5d74458ed27f |
File details
Details for the file cuoco-1.0.0-py3-none-any.whl.
File metadata
- Download URL: cuoco-1.0.0-py3-none-any.whl
- Upload date:
- Size: 19.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3e1c5c72d5ec16a439456f1c24f2bdd173f477bda6a49cbeca727f881db14a4c |
| MD5 | bd55982595f05f90c1ecf94128fcc4fa |
| BLAKE2b-256 | 16eadffb2c1ec0edcd226d06f357018319710ecd5c9fc803dd29ecdcad19ae77 |