Skip to main content

MLData is used to clean data before machine learning process!

Project description

#MLData
MLData, is a project to clean and normalize data for machine learning process.


## How to install
```pip install mldata```




## Usage Example
Usage Example:
```
from mldata import Processor
new_file_path = "outputs/new.csv"
processor = Processor("resource/raw_dataset.csv", target_column="APPROVE/NOT", exclude_column_list=["id"],
category_list=["Work Class", "FnlWgt", "Education", "Maried Status", "Occupation",
"Relationship", "Race", "Gender", "Native Country", "Flag"],
invalid_values=["?", "", "null", None],
positive_tag=1)
processor.normalize()
processor.save_to_file(new_file_path)
```


## API Description
1, Init function
```
Processor(csv_file_path, target_column, exclude_column_list=None, category_list=None, positive_tag=1,
csv_header=0, invalid_values=None)

```
Parameters:
csv_file_path: The origin csv file path
target_column: The column name of the target
exclude_column_list: Columns no need to normalize
category_list: A column name list which are category based columns
positive_tag: The positive tag for the target column value, default value is 1
invalid_values: values in csv not valid, such as "?", "", "null", None

2, Norm the list
```buildoutcfg
Processor.normalize()
```
This function is used to do norm to the csv file.


3, Save result to csv file.
```buildoutcfg
Processor.save_to_file(new_file_name)
```
This function is used to save normalized output to csv file.
Parameters:
new_file_name: The new file name to save the normalized data

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Files for MLData, version 2.0.0
Filename, size File type Python version Upload date Hashes
Filename, size MLData-2.0.0-py2.py3-none-any.whl (9.4 kB) File type Wheel Python version py2.py3 Upload date Hashes View

Supported by

Pingdom Pingdom Monitoring Google Google Object Storage and Download Analytics Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN DigiCert DigiCert EV certificate StatusPage StatusPage Status page