CSV and JSON file preprocessor
Project description
Preprocessor
Preprocessor is a Python library for preprocessing CSV files and flattening JSON files.
- Preprocesses CSV files for missing-value handling and missing-value replacement
- Preprocesses CSV files with textual columns, applying text preprocessing and word normalization
- Automatically detects each column's data type in a CSV file and applies the corresponding preprocessing
- Flattens JSON files of any nesting depth
Documentation
Preprocessor Class :
Pre_processor.preprocessor.Preprocessor(file, filetype=None, encoding=None)
Parameters:
- file : str, csv, or dict
  File to be preprocessed.
- filetype : str
  Type of the input file. Valid options are either dataframe or json.
- encoding : str
  Encoding scheme used to read the file. Default is ISO-8859-1.
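A minimal constructor sketch based on the parameters above; the file name, dict contents, and encoding value are illustrative placeholders only.

from Pre_processor.preprocessor import Preprocessor

# From a csv file path, with an explicit encoding (ISO-8859-1 is the default)
csv_pp = Preprocessor(file="example.csv", encoding="ISO-8859-1")

# From an already-parsed json-like dict, declaring the filetype explicitly
record = {"user": {"name": "Ann", "address": {"city": "Pune"}}}
json_pp = Preprocessor(file=record, filetype="json")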
Methods :
preprocessor.df_preprocessor(threshold_4_delete_null=0.5, no_null_columns=None, numeric_null_replace=None, textual_column_word_tokenize=False, textual_column_word_normalize=None)
Parameters:
- threshold_4_delete_null : float
  Ratio of null values to number of rows above which a column is deleted.
- no_null_columns : list
  List of columns that must not contain any null values.
- numeric_null_replace : dict
  Logic for replacing null values in numeric columns. When None, every numeric
  column's null values are replaced by the mean. The dict format should be
  {"mean": [list of column names], "median": [list of column names], "mode": [list of column names]}.
  When passing a dict, provide an exhaustive list of numeric columns combined
  across the three keys mean, median and mode.
- textual_column_word_tokenize : Boolean
  Whether word tokenization is needed for textual columns.
- textual_column_word_normalize : str
  Type of word normalization needed for textual columns: either stem or lemma
  for word stemming and word lemmatization respectively.
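A hedged call sketch for df_preprocessor using only the parameters documented above; the column names, ratio, and replacement split are placeholders, and the numeric_null_replace dict is expected to cover every numeric column across its mean, median and mode keys.

from Pre_processor.preprocessor import Preprocessor

p = Preprocessor(file="example.csv")
data = p.df_preprocessor(
    threshold_4_delete_null=0.5,                # drop columns whose null ratio exceeds 0.5
    no_null_columns=["order_id"],               # columns that must not contain nulls
    numeric_null_replace={"mean": ["price"],    # exhaustive split of numeric columns
                          "median": ["quantity"],
                          "mode": ["rating"]},
    textual_column_word_tokenize=True,          # tokenize words in textual columns
    textual_column_word_normalize="lemma",      # "stem" or "lemma"
)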
preprocessor.json_preprocessor()
Parameters:
- No parameters needed.
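A short sketch of flattening a nested structure with json_preprocessor, assuming the method returns the flattened output as in the CSV sample below; the nested dict is illustrative.

from Pre_processor.preprocessor import Preprocessor

nested = {"id": 1, "profile": {"name": "Ann", "contact": {"email": "ann@example.com"}}}
p = Preprocessor(file=nested, filetype="json")
flat = p.json_preprocessor()   # takes no parameters; flattens every nesting level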
Code Samples
CSV file preprocessing using a file path
from Pre_processor.preprocessor import Preprocessor as pps
p = pps(file="example.csv")
data = p.df_preprocessor(threshold_4_delete_null=0.7, textual_column_word_tokenize=True)
File details
Details for the file Pre_processor-0.0.7.tar.gz.
File metadata
- Download URL: Pre_processor-0.0.7.tar.gz
- Upload date:
- Size: 4.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.52.0 CPython/3.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5900340ee851be45c0bbde2f9b68f026853e1d692804715e706b7f140b646fd2 |
| MD5 | ca4e1dd0954db93da232ae40468f1109 |
| BLAKE2b-256 | c6f2a9124770cb2ccecd41e58ea56aad0734fa7d469fad03c10a69cca11c4f07 |
File details
Details for the file Pre_processor-0.0.7-py3-none-any.whl.
File metadata
- Download URL: Pre_processor-0.0.7-py3-none-any.whl
- Upload date:
- Size: 5.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.52.0 CPython/3.8.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 312e46a097b5c5ef8feaa320657a48b24d7c0ce84fb1a2a8dbf69090e9bc59b6 |
| MD5 | 1516ef5b26e0e4c42a23e62f9c81ce99 |
| BLAKE2b-256 | a2055e96f807a00321e134ae45a3a054d2e73919f903282edd82ccdcbd27d103 |