CSV and JSON file preprocessor
Preprocessor
Preprocessor is a Python library for preprocessing CSV files and flattening JSON files:
- Preprocess CSV files to handle and replace missing values
- Preprocess CSV files that contain textual columns (text preprocessing and word normalization)
- Automatically detect the data type of each CSV column and preprocess it accordingly
- Flatten JSON files nested to any depth
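The page does not say how column data types are detected. One plausible rule, sketched here as an assumption rather than the library's actual logic, is to treat a column as numeric when every non-null cell parses as a number and as textual otherwise. The function name `infer_column_kind` is hypothetical.

```python
from typing import Optional, Sequence


def infer_column_kind(values: Sequence[Optional[str]]) -> str:
    """Classify raw CSV cells as "numeric", "textual" or "empty" (hypothetical rule).

    Nulls (None or empty strings) are ignored; the column counts as numeric
    only if every remaining cell can be parsed by float().
    """
    non_null = [v for v in values if v not in (None, "")]
    if not non_null:
        return "empty"
    for value in non_null:
        try:
            float(value)
        except ValueError:
            return "textual"
    return "numeric"


print(infer_column_kind(["1.5", "", "3"]))         # numeric
print(infer_column_kind(["good product", None]))   # textual
```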
Documentation
Preprocessor class:
Preprocessor.preprocessor(file, filetype=None, encoding=None)
Parameters:
- file : str, csv, dict
File to be preprocessed
- filetype : str
Type of the input file. Valid options are "dataframe" or "json".
- encoding : str
Encoding used to read the file. Default is ISO-8859-1.
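The ISO-8859-1 default has one practical consequence worth knowing: that encoding maps every byte value to a character, so decoding never fails, but UTF-8 files containing non-ASCII text are silently mis-decoded. A minimal sketch using only the standard `csv` module; `read_csv_rows` is a hypothetical helper, not part of the package.

```python
import csv
import io
from typing import Dict, List


def read_csv_rows(raw: bytes, encoding: str = "ISO-8859-1") -> List[Dict[str, str]]:
    """Decode raw CSV bytes and parse them into one dict per row.

    ISO-8859-1 decodes any byte sequence without raising UnicodeDecodeError,
    so UTF-8 input with non-ASCII characters turns into mojibake instead.
    """
    text = raw.decode(encoding)
    return list(csv.DictReader(io.StringIO(text, newline="")))


raw = "name,city\nJosé,Zürich\n".encode("utf-8")
print(read_csv_rows(raw)[0]["city"])                    # ZÃ¼rich (mis-decoded)
print(read_csv_rows(raw, encoding="utf-8")[0]["city"])  # Zürich
```

For UTF-8 files, passing the matching encoding explicitly avoids the silent corruption.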
Methods:
Preprocessor.preprocessor.csv_preprocessor(threshold_4_delete_null=0.5, no_null_columns=None, numeric_null_replace=None, textual_column_word_tokenize=False, textual_column_word_normalize=False)
Parameters:
- threshold_4_delete_null : float
Columns whose ratio of null values to number of rows exceeds this threshold are deleted.
- no_null_columns :list
List of columns that must not contain any null values
- numeric_null_replace : dict
Strategy for replacing null values in numeric columns. When None, null values
in every numeric column are replaced by the column mean. A dict should have
the format {"mean": [list of column names], "median": [list of column
names], "mode": [list of column names]}.
When a dict is given, the three keys mean, median and mode must together list
every numeric column exhaustively.
- textual_column_word_tokenize : bool
Whether to tokenize the words in textual columns
- textual_column_word_normalize : str
Type of word normalization to apply to textual columns: "stem" for word
stemming or "lemma" for word lemmatization.
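The null-handling parameters above can be illustrated with a small standard-library sketch. It is not the package's code: the table is modelled as a dict of column lists with None for nulls, the "exceeds the threshold" comparison and the row-dropping reading of no_null_columns are assumptions, and the helper names are hypothetical.

```python
import statistics
from typing import Dict, List, Optional

# Hypothetical table model: column name -> list of cells, None marks a null.
Table = Dict[str, List[Optional[float]]]


def drop_sparse_columns(table: Table, threshold: float = 0.5) -> Table:
    """Drop columns whose null ratio (nulls / rows) exceeds `threshold`."""
    kept: Table = {}
    for name, values in table.items():
        nulls = sum(v is None for v in values)
        ratio = nulls / len(values) if values else 0.0
        if ratio <= threshold:
            kept[name] = values
    return kept


def drop_rows_with_nulls(table: Table, columns: List[str]) -> Table:
    """Remove every row that has a null in any of `columns` (assumed no_null_columns behaviour)."""
    n_rows = len(next(iter(table.values()), []))
    keep = [i for i in range(n_rows) if all(table[c][i] is not None for c in columns)]
    return {name: [values[i] for i in keep] for name, values in table.items()}


def fill_numeric_nulls(table: Table, strategy: Optional[Dict[str, List[str]]] = None) -> Table:
    """Fill nulls with the column mean, median or mode.

    With strategy=None every column is mean-filled; otherwise only columns
    listed under "mean", "median" or "mode" are filled, so any column left
    out keeps its nulls (hence the need for an exhaustive dict).
    """
    fillers = {"mean": statistics.mean, "median": statistics.median, "mode": statistics.mode}
    if strategy is None:
        strategy = {"mean": list(table)}
    filled = dict(table)
    for how, columns in strategy.items():
        for name in columns:
            present = [v for v in table[name] if v is not None]
            if not present:
                continue  # no values to compute a statistic from
            value = fillers[how](present)
            filled[name] = [value if v is None else v for v in table[name]]
    return filled


table = {
    "age": [20.0, None, 40.0, 30.0],
    "score": [1.0, 1.0, None, 4.0],
    "visits": [None, None, None, 2.0],
}
table = drop_sparse_columns(table, threshold=0.7)  # "visits" is 75% null, so it is dropped
table = drop_rows_with_nulls(table, ["score"])     # removes the row where score is null
table = fill_numeric_nulls(table, {"mean": ["age"], "mode": ["score"]})
print(table)  # {'age': [20.0, 25.0, 30.0], 'score': [1.0, 1.0, 4.0]}
```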
Preprocessor.preprocessor.json_preprocessing()
Parameters:
- None
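json_preprocessing() flattens the loaded JSON. The page does not document how flattened keys are named, so this standard-library sketch assumes parent and child keys joined with an underscore and list elements keyed by their index. `flatten_json` is a hypothetical helper, not the package's implementation.

```python
import json
from typing import Any, Dict


def flatten_json(obj: Any, parent_key: str = "", sep: str = "_") -> Dict[str, Any]:
    """Recursively flatten nested dicts and lists into a single-level dict.

    Dict keys are joined with `sep`; list items use their index as the key.
    Scalars (str, int, float, bool, None) become the leaf values.
    """
    items: Dict[str, Any] = {}
    if isinstance(obj, dict):
        for key, value in obj.items():
            new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
            items.update(flatten_json(value, new_key, sep))
    elif isinstance(obj, list):
        for index, value in enumerate(obj):
            new_key = f"{parent_key}{sep}{index}" if parent_key else str(index)
            items.update(flatten_json(value, new_key, sep))
    else:
        items[parent_key] = obj
    return items


doc = json.loads('{"id": 1, "user": {"name": "Ana", "tags": ["a", "b"]}}')
print(flatten_json(doc))
# {'id': 1, 'user_name': 'Ana', 'user_tags_0': 'a', 'user_tags_1': 'b'}
```

Because the sketch recurses, extremely deep documents are bounded by Python's recursion limit; empty nested dicts or lists produce no keys.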
Code Samples
CSV file preprocessing using a file path
from Pre_processor import preprocessor
pps = preprocessor(file="example.csv")
data = pps.csv_preprocessor(threshold_4_delete_null=0.7,textual_column_word_tokenize=True)
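The sample above enables textual_column_word_tokenize. The page does not say which tokenizer the package uses, so the following only illustrates what tokenizing a text column means, using a plain regular expression plus lower-casing (both assumptions). `tokenize_column` is a hypothetical name.

```python
import re
from typing import List, Optional

# Assumed token rule: runs of word characters, optionally followed by an
# apostrophe part such as "don't".
WORD_RE = re.compile(r"\w+(?:'\w+)?")


def tokenize_column(texts: List[Optional[str]]) -> List[List[str]]:
    """Lower-case each cell and split it into word tokens; nulls become []."""
    return [WORD_RE.findall(text.lower()) if text else [] for text in texts]


reviews = ["Great product, would buy again!", None, "Don't like it"]
print(tokenize_column(reviews))
# [['great', 'product', 'would', 'buy', 'again'], [], ["don't", 'like', 'it']]
```

A library tokenizer (for example one from an NLP toolkit) may split punctuation and contractions differently.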
Download files
Source distribution: Pre_processor-0.0.4.tar.gz (4.5 kB)
Built distribution: Pre_processor-0.0.4-py3-none-any.whl (5.5 kB)
File details for Pre_processor-0.0.4.tar.gz
File metadata
- Download URL: Pre_processor-0.0.4.tar.gz
- Upload date:
- Size: 4.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.52.0 CPython/3.8.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4afd28c42e424340613fb4a6e67cf424c3e5750fa49e7c098627fc0d2a05a3d0
MD5 | 0b5737cd2badc41658e6a8b7ae316949
BLAKE2b-256 | 4dcc7fbc78b5eee883563263c0f8867bfee668d6b188044a7e255197047e9b25
File details for Pre_processor-0.0.4-py3-none-any.whl
File metadata
- Download URL: Pre_processor-0.0.4-py3-none-any.whl
- Upload date:
- Size: 5.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.52.0 CPython/3.8.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8abb60a9ccbd0b9832091e207e2c80c8d25fac5344bc7dee84c6474114fcfe35
MD5 | f78624ab460ceef76c41403e28f9bdd0
BLAKE2b-256 | 7278d0eda489c4573ad32cf207a976a9b1e6b4be4048e3b69b0a683a386e5fef
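The SHA256 digests listed above can be used to check a downloaded file before installing it. A minimal sketch with Python's standard `hashlib` module; the helper names `sha256_of` and `matches_published_hash` are hypothetical, and the code assumes the file has already been downloaded to a local path.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file hash tables on this page.
EXPECTED = {
    "Pre_processor-0.0.4.tar.gz": "4afd28c42e424340613fb4a6e67cf424c3e5750fa49e7c098627fc0d2a05a3d0",
    "Pre_processor-0.0.4-py3-none-any.whl": "8abb60a9ccbd0b9832091e207e2c80c8d25fac5344bc7dee84c6474114fcfe35",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected_hex: str) -> bool:
    """Compare the file's SHA256 hex digest with a published one (case-insensitive)."""
    return sha256_of(path) == expected_hex.lower()


# Usage, assuming the sdist sits in the current directory:
# matches_published_hash(Path("Pre_processor-0.0.4.tar.gz"), EXPECTED["Pre_processor-0.0.4.tar.gz"])
```

pip can perform the same check itself in hash-checking mode, using `--hash=sha256:...` entries in a requirements file.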