CSV and JSON file preprocessor
Project description
Preprocessor
Preprocessor is a Python library for preprocessing CSV files and flattening JSON files.
- Preprocesses CSV files for missing value handling and missing value replacement
- Preprocesses CSV files that have textual columns, applying text preprocessing and word normalization
- Automatically detects the data type of each CSV column and applies the matching preprocessing
- Flattens JSON files with any level of nesting
Documentation
Preprocessor Class :
Pre_processor.preprocessor.Preprocessor(file,filetype=None,encoding=None)
Parameters:
- file : str, csv, dict
File to be preprocessed
- filetype : str
Type of the input file. Valid options are dataframe or json
- encoding : str
Encoding scheme used for reading the file. Default is ISO-8859-1
Methods :
preprocessor.df_preprocessor(threshold_4_delete_null=0.5,no_null_columns=None, numeric_null_replace=None,textual_column_word_tokenize=False,textual_column_word_normalize=False)
Parameters:
- threshold_4_delete_null : float
Threshold ratio of null values to number of rows used to decide which columns are deleted.
- no_null_columns : list
List of columns that must not contain any null values
- numeric_null_replace : dict
Logic for replacing null values in numeric columns. When None, null
values in all numeric columns are replaced by the mean. The dict format
should be {"mean": [list of column names], "median": [list of
column names], "mode": [list of column names]}
When passing a dict, users need to provide an exhaustive list of
columns across all three keys: mean, median and mode.
- textual_column_word_tokenize : bool
Whether word tokenization is needed for textual columns
- textual_column_word_normalize : str
Type of word normalization needed for textual columns. Either stem
or lemma, for word stemming and word lemmatization respectively.
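To make the null-handling parameters more concrete, here is a minimal sketch of threshold-based column deletion followed by mean/median/mode replacement, written with the standard library only. It is not the library's implementation: the helper names `drop_sparse_columns` and `fill_numeric_nulls`, the dict-of-lists table layout, and the choice to delete a column only when its null ratio is strictly greater than the threshold are all assumptions made for illustration.

```python
from statistics import mean, median, mode

# Maps the keys of the numeric_null_replace dict to the statistic they name.
STRATEGIES = {"mean": mean, "median": median, "mode": mode}


def drop_sparse_columns(table, threshold=0.5):
    """Keep only the columns whose null-to-row ratio does not exceed threshold."""
    kept = {}
    for name, values in table.items():
        null_ratio = sum(v is None for v in values) / len(values)
        if null_ratio <= threshold:  # assumption: strictly greater means delete
            kept[name] = values
    return kept


def fill_numeric_nulls(table, numeric_null_replace):
    """Replace None entries using the statistic assigned to each column."""
    filled = dict(table)
    for strategy, columns in numeric_null_replace.items():
        for name in columns:
            if name not in filled:  # column may already have been deleted
                continue
            present = [v for v in filled[name] if v is not None]
            replacement = STRATEGIES[strategy](present)
            filled[name] = [replacement if v is None else v for v in filled[name]]
    return filled


# Illustrative table: column name -> list of values, None marks a null.
table = {
    "height": [150, None, 170, 160],
    "weight": [60, 80, None, 100],
    "shoe_size": [None, None, None, 42],
}
kept = drop_sparse_columns(table, threshold=0.5)
result = fill_numeric_nulls(
    kept, {"mean": ["height"], "median": ["weight"], "mode": []}
)
print(result)
```

Here `shoe_size` is removed because three of its four values are null, while the remaining columns are filled with their mean and median respectively.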
preprocessor.json_preprocessor()
Parameters:
- No parameters needed
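The README does not specify what the flattened output looks like. One common approach, sketched below under that caveat, is to walk the nested structure recursively and join the keys along each path into a single key. The function name `flatten`, the dot separator, and the use of list indices as key parts are assumptions for illustration, not the documented behaviour of `json_preprocessor()`.

```python
import json


def flatten(value, parent_key="", sep="."):
    """Recursively flatten nested dicts and lists into a single-level dict."""
    items = {}
    if isinstance(value, dict):
        for key, child in value.items():
            child_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
            items.update(flatten(child, child_key, sep))
    elif isinstance(value, list):
        # List positions become part of the key, e.g. "tags.0", "tags.1".
        for index, child in enumerate(value):
            child_key = f"{parent_key}{sep}{index}" if parent_key else str(index)
            items.update(flatten(child, child_key, sep))
    else:
        # Leaf value: store it under the accumulated key path.
        items[parent_key] = value
    return items


document = json.loads(
    '{"user": {"name": "Ann", "tags": ["a", "b"]}, "active": true}'
)
flat = flatten(document)
print(flat)
```

Because the function recurses on every dict and list it meets, the nesting depth of the input does not matter, which matches the "any level" goal stated above.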
Code Samples
CSV file preprocessing using a file path
from Pre_processor.preprocessor import Preprocessor as pps
p = pps(file="example.csv")
data = p.df_preprocessor(threshold_4_delete_null=0.7, textual_column_word_tokenize=True)