MLData cleans data before it is used for machine learning.
# MLData
MLData is a project for cleaning and normalizing data for machine learning.
## How to install
```pip install mldata```
## Usage Example
The following example reads a raw CSV dataset, normalizes it, and saves the result to a new file:
```python
from mldata import Processor

new_file_path = "outputs/new.csv"

# Configure the processor for the raw dataset and describe its columns.
processor = Processor(
    "resource/raw_dataset.csv",
    target_column="APPROVE/NOT",
    exclude_column_list=["id"],
    category_list=["Work Class", "FnlWgt", "Education", "Maried Status", "Occupation",
                   "Relationship", "Race", "Gender", "Native Country", "Flag"],
    invalid_values=["?", "", "null", None],
    positive_tag=1,
)

# Normalize the data, then write the result to a new CSV file.
processor.normalize()
processor.save_to_file(new_file_path)
```
## API Description
1. Init function
```
Processor(csv_file_path, target_column, exclude_column_list=None, category_list=None, positive_tag=1,
csv_header=0, invalid_values=None)
```
Parameters:
- `csv_file_path`: Path to the original CSV file.
- `target_column`: Name of the target column.
- `exclude_column_list`: Columns that should not be normalized.
- `category_list`: Names of the columns that hold categorical values.
- `positive_tag`: The value in the target column that counts as positive. Defaults to 1.
- `invalid_values`: Values in the CSV file to treat as invalid, such as "?", "", "null", or None.
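To make the roles of `invalid_values` and `positive_tag` more concrete, here is a minimal standard-library sketch of one plausible interpretation: a cell that matches an invalid value is treated as missing, and a target value is mapped to 1 when it equals the positive tag and to 0 otherwise. This is an assumption for illustration only, not MLData's actual implementation. The helper names `clean_cell` and `encode_target` are hypothetical and do not exist in the package.

```python
# Illustrative sketch only: hypothetical helpers, not part of MLData.

def clean_cell(value, invalid_values):
    """Return None for cells considered invalid, else the value unchanged."""
    return None if value in invalid_values else value


def encode_target(value, positive_tag):
    """Map the target value to 1 if it equals positive_tag, else 0."""
    return 1 if value == positive_tag else 0


invalid_values = ["?", "", "null", None]
row = {"Work Class": "?", "Education": "Bachelors", "APPROVE/NOT": 1}

# Replace invalid cells with None so later steps can treat them as missing.
cleaned = {name: clean_cell(value, invalid_values) for name, value in row.items()}
label = encode_target(row["APPROVE/NOT"], positive_tag=1)

print(cleaned)
print(label)
```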
2. Normalize the data
```python
Processor.normalize()
```
This function normalizes the data read from the CSV file.
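This description does not say which normalization MLData applies. Two techniques commonly used for this purpose, shown here purely as an assumption, are min-max scaling for numeric columns and one-hot encoding for category columns. The functions `min_max_scale` and `one_hot` below are hypothetical illustrations and are not part of MLData.

```python
# Illustrative sketch only: common normalization techniques, not MLData's code.

def min_max_scale(values):
    """Scale numbers linearly into [0, 1]; a constant column maps to 0.0."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def one_hot(values):
    """Encode each category as a 0/1 vector over the sorted distinct categories."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


ages = [20, 30, 40]
genders = ["Male", "Female", "Male"]

scaled_ages = min_max_scale(ages)
encoded_genders = one_hot(genders)

print(scaled_ages)       # [0.0, 0.5, 1.0]
print(encoded_genders)   # [[0, 1], [1, 0], [0, 1]]
```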
3. Save the result to a CSV file
```python
Processor.save_to_file(new_file_name)
```
This function saves the normalized output to a CSV file.
Parameters:
- `new_file_name`: Name of the file in which to save the normalized data.
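The usage example saves to a path inside an `outputs` directory. Whether `save_to_file` creates missing directories is not stated here, so one cautious option is to create the parent directory before saving. The helper `ensure_parent_dir` below is a hypothetical sketch that uses only the standard library.

```python
import os
import tempfile


def ensure_parent_dir(file_path):
    """Create the parent directory of file_path if it does not exist yet."""
    parent = os.path.dirname(file_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return file_path


# Demonstrated in a temporary directory so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    new_file_path = ensure_parent_dir(os.path.join(tmp, "outputs", "new.csv"))
    parent_exists = os.path.isdir(os.path.dirname(new_file_path))
    # With MLData installed, the next step would be:
    # processor.save_to_file(new_file_path)

print(parent_exists)
```

Creating the directory up front is harmless if the library already handles it, because `exist_ok=True` makes the call a no-op when the directory exists.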