CSV and JSON file preprocessor
Project description
Preprocessor
Preprocessor is a Python library for preprocessing CSV files and flattening JSON files.
- Preprocesses CSV files for missing value handling and missing value replacement
- Preprocesses textual columns in CSV files with text preprocessing and word normalization
- Automatically detects the data type of each CSV column and applies the matching preprocessing
- Flattens JSON files with any level of nesting
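The automatic data type detection mentioned in the list above could work roughly as follows. This is a minimal sketch under stated assumptions, not the library's actual logic: `infer_column_type` is a hypothetical helper, and the rule "every non-missing cell parses as a float" is only one possible heuristic.

```python
def infer_column_type(values):
    """Classify a column as 'numeric', 'textual' or 'empty' from its string cells.

    Hypothetical heuristic: None and empty strings count as missing and are ignored.
    """
    present = [v for v in values if v not in (None, "")]
    if not present:
        return "empty"
    for v in present:
        try:
            float(v)
        except (TypeError, ValueError):
            # At least one cell is not a number, so treat the column as text.
            return "textual"
    return "numeric"


# Illustrative data, not taken from the library.
columns = {
    "age": ["31", "", "45", "27"],
    "comment": ["good product", "", "arrived late", "ok"],
}
detected = {name: infer_column_type(cells) for name, cells in columns.items()}
print(detected)
```

Once each column is classified, numeric columns can go through null replacement and textual columns through text preprocessing.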
Documentation
Preprocessor class:
Pre_processor.preprocessor.Preprocessor(file, filetype=None, encoding=None)
Parameters:
- file : str, csv, dict
File to be preprocessed.
- filetype : str
Type of the input file. Valid options are either dataframe or json.
- encoding : str
Encoding scheme for reading the file. Default is ISO-8859-1.
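To illustrate why the encoding parameter matters, the sketch below reads ISO-8859-1 bytes with the standard library only. It does not call the Preprocessor class, and the sample data is made up for illustration.

```python
import csv
import io

# Bytes as they would sit on disk for a CSV saved in ISO-8859-1 (Latin-1).
raw = "name,city\nJosé,Málaga\n".encode("iso-8859-1")

# Decoding with the matching encoding recovers the original text.
text = raw.decode("iso-8859-1")
rows = list(csv.DictReader(io.StringIO(text)))
print(rows[0]["name"], rows[0]["city"])

# The same bytes are not valid UTF-8, so a strict UTF-8 read fails.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print("valid UTF-8:", utf8_ok)
```

ISO-8859-1 maps every byte to a character, so decoding with it never raises an error, although a file that is really in another encoding would still come out garbled.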
Methods:
preprocessor.df_preprocessor(threshold_4_delete_null=0.5, no_null_columns=None, numeric_null_replace=None, textual_column_word_tokenize=False, textual_column_word_normalize=None)
Parameters:
- threshold_4_delete_null : float
Ratio of null values to the number of rows at which a column is deleted.
- no_null_columns : list
List of columns that must not contain any null values.
- numeric_null_replace : dict
Logic for replacing null values in numeric columns. When None, null
values in all numeric columns are replaced by the mean. The dict format
is {"mean": [list of column names], "median": [list of column names],
"mode": [list of column names]}.
When a dict is given, the lists under the three keys mean, median and
mode must together form an exhaustive list of the columns.
- textual_column_word_tokenize : bool
Whether word tokenization is needed for textual columns.
- textual_column_word_normalize : str
Type of word normalization needed for textual columns. Either stem
or lemma, for word stemming and word lemmatization respectively.
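The null handling described by threshold_4_delete_null and numeric_null_replace can be sketched in plain Python as below. This is an illustration only, not the library's implementation: the helper names are hypothetical, columns are modelled as lists with None for nulls, and the choice to drop a column only when its null ratio is strictly greater than the threshold is an assumption.

```python
from statistics import mean, median, mode


def drop_sparse_columns(table, threshold=0.5):
    """Keep only columns whose share of None values is at most `threshold` (assumed rule)."""
    kept = {}
    for name, cells in table.items():
        null_ratio = sum(c is None for c in cells) / max(len(cells), 1)
        if null_ratio <= threshold:
            kept[name] = cells
    return kept


def fill_numeric_nulls(table, rules):
    """Replace None using rules shaped like {"mean": [...], "median": [...], "mode": [...]}."""
    funcs = {"mean": mean, "median": median, "mode": mode}
    filled = {name: list(cells) for name, cells in table.items()}
    for how, names in rules.items():
        for name in names:
            present = [c for c in filled[name] if c is not None]
            value = funcs[how](present)
            filled[name] = [value if c is None else c for c in filled[name]]
    return filled


# Illustrative data: "notes" is 75% null and is dropped at threshold 0.5.
table = {
    "price": [10.0, None, 30.0, 20.0],
    "rooms": [2, 3, None, 3],
    "notes": [None, None, None, "x"],
}
table = drop_sparse_columns(table, threshold=0.5)
table = fill_numeric_nulls(table, {"mean": ["price"], "mode": ["rooms"]})
print(table)
```

The rules dict mirrors the documented numeric_null_replace format, with each column listed under exactly one strategy.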
preprocessor.json_preprocessor()
Parameters:
- No parameters needed
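As a rough picture of what flattening a nested JSON document means, here is a small recursive sketch. It is not the output format of json_preprocessor, which the documentation does not specify: the `flatten` helper, the dot separator and the use of list indexes as key parts are all assumptions made for illustration.

```python
def flatten(obj, parent_key="", sep="."):
    """Flatten nested dicts and lists into a single-level dict with joined keys."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        # Scalars end the recursion and are stored under the accumulated key.
        return {parent_key: obj}
    flat = {}
    for key, value in items:
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        flat.update(flatten(value, full_key, sep))
    return flat


# Illustrative record, not taken from the library.
record = {
    "id": 7,
    "user": {"name": "Ana", "tags": ["a", "b"]},
    "geo": {"pos": {"lat": 1.5}},
}
flat = flatten(record)
print(flat)
```

In this sketch, empty dicts and lists contribute no keys, so a real implementation would need to decide how to represent them.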
Code Samples
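CSV file preprocessing using a file path
from Pre_processor.preprocessor import Preprocessor as pps
# Load the CSV file from its path
p = pps(file="example.csv")
# Preprocess with a null threshold of 0.7 and word tokenization of textual columns
data = p.csv_preprocessor(threshold_4_delete_null=0.7, textual_column_word_tokenize=True)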
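The sample above enables textual_column_word_tokenize. As a rough idea of what word tokenization and stem-style normalization produce, the sketch below uses only the standard library. It is a toy stand-in, not the library's method: `tokenize` and `crude_stem` are hypothetical helpers, and real stemming or lemmatization is normally done with a dedicated NLP library.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(word):
    """Strip a few common English suffixes; a toy stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


tokens = tokenize("The parcels arrived late, packing was damaged!")
stems = [crude_stem(t) for t in tokens]
print(tokens)
print(stems)
```

Stems such as "arriv" are not dictionary words, which is the usual trade-off of stemming compared with lemmatization.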
Download files
Source Distribution
Pre_processor-0.0.6.tar.gz (4.5 kB)
Built Distribution
Pre_processor-0.0.6-py3-none-any.whl (5.5 kB)
File details
Details for the file Pre_processor-0.0.6.tar.gz.
File metadata
- Download URL: Pre_processor-0.0.6.tar.gz
- Size: 4.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.52.0 CPython/3.8.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 413983b991090d7a7d4547de24b2d4bf6d5b1ac4ffd2d621b202a4957735d2bf
MD5 | fafc5b987cf7597c2cca1a9cf88cd9b0
BLAKE2b-256 | 444ba309b375631cbaa1e67833feeacada42685ba30768fdd23549295968a3e1
File details
Details for the file Pre_processor-0.0.6-py3-none-any.whl.
File metadata
- Download URL: Pre_processor-0.0.6-py3-none-any.whl
- Size: 5.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.52.0 CPython/3.8.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 55c6367d10e3531da77a1369fdb48727feef59dd262787129e4d48a0b7a9d4f8
MD5 | f6711de9092424e11cea6d6f618692d7
BLAKE2b-256 | 5de64888741f60f44dc5e7401dc4ca82968a750c3798634780cbac365e11f1e3