csv and json file preprocessor

Preprocessor

Preprocessor is a Python library for preprocessing CSV files and flattening JSON files.

  • Handles and replaces missing values in CSV files
  • Applies text preprocessing and word normalization to textual columns in CSV files
  • Automatically detects the data type of each CSV column and preprocesses it accordingly
  • Flattens JSON files nested to any depth
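
The automatic column type detection mentioned above can be pictured with a minimal sketch. This is not the library's implementation; the helper names `infer_column_types` and `_is_number` are hypothetical, and the rule used here (a column is numeric if every non-empty value parses as a float) is an assumption made for illustration.

```python
import csv
import io


def _is_number(value):
    """Return True if the value parses as a float (hypothetical helper)."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def infer_column_types(rows):
    """Label each column 'numeric' or 'textual'; empty cells are ignored."""
    if not rows:
        return {}
    types = {}
    for column in rows[0].keys():
        values = [row[column] for row in rows if row[column] not in ("", None)]
        is_numeric = bool(values) and all(_is_number(v) for v in values)
        types[column] = "numeric" if is_numeric else "textual"
    return types


# Small in-memory CSV with one missing value per column
sample = io.StringIO("age,comment\n31,great product\n,arrived late\n45,\n")
rows = list(csv.DictReader(sample))
column_types = infer_column_types(rows)
print(column_types)
```

Once each column is labelled, numeric columns can be routed to null replacement and textual columns to tokenization or normalization.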

Documentation

Preprocessor class:

Pre_processor.preprocessor.Preprocessor(file,filetype=None,encoding=None)

Parameters:
- file : str, csv, dict
        File to be preprocessed
- filetype : str
        Type of the input file. Valid options are either dataframe or json
- encoding : str
        Encoding scheme for reading the file. Default is ISO-8859-1
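
To show why the encoding parameter matters, here is a small standard-library sketch that does not use the Preprocessor class. ISO-8859-1 maps every byte to a character, so decoding with it never raises an error, whereas the same bytes may be invalid UTF-8. The sample text is made up for illustration.

```python
# Bytes as they would appear in a file saved with ISO-8859-1
raw = "café,naïve\n".encode("iso-8859-1")

# Decoding with the matching encoding recovers the original text
text = raw.decode("iso-8859-1")

# The same bytes are not valid UTF-8, so a UTF-8 read fails
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(repr(text), utf8_ok)
```

If a file was actually written as UTF-8, pass that encoding explicitly; otherwise accented characters will be decoded as the wrong symbols even though no error is raised.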
Methods :

preprocessor.df_preprocessor(threshold_4_delete_null=0.5, no_null_columns=None, numeric_null_replace=None, textual_column_word_tokenize=False, textual_column_word_normalize=False)

Parameters:
- threshold_4_delete_null : float
                    Threshold ratio of null values to number of rows for a column to be deleted.
- no_null_columns : list
                    List of columns which must not have any null values
- numeric_null_replace : dict
                    Logic for replacing null values in numeric columns. When None, null
                    values in all numeric columns are replaced by the mean. Dict format
                    should be {"mean":[list of column names],"median":[list of
                    column names],"mode":[list of column names]}
                    When a dict is given, the three keys mean, median and mode
                    together must list every numeric column exhaustively.

- textual_column_word_tokenize : Boolean
                    Whether textual columns should be tokenized into words
- textual_column_word_normalize : str
                    Type of word normalization to apply to textual columns. Either stem
                    or lemma, for word stemming and word lemmatization respectively.
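
The null-handling parameters above can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the functions `drop_sparse_columns` and `fill_numeric_nulls` are hypothetical, a table is modelled as a dict of column lists with `None` marking a null, and a column is assumed to be dropped when its null ratio is strictly greater than the threshold.

```python
import statistics

# Maps the documented dict keys to standard-library functions
REPLACERS = {
    "mean": statistics.mean,
    "median": statistics.median,
    "mode": statistics.mode,
}


def drop_sparse_columns(table, threshold=0.5):
    """Keep only columns whose null ratio does not exceed the threshold."""
    kept = {}
    for name, values in table.items():
        null_ratio = sum(v is None for v in values) / len(values)
        if null_ratio <= threshold:
            kept[name] = values
    return kept


def fill_numeric_nulls(table, numeric_null_replace):
    """Replace None in each listed column using its assigned strategy."""
    filled = dict(table)
    for strategy, columns in numeric_null_replace.items():
        for name in columns:
            present = [v for v in table[name] if v is not None]
            fill_value = REPLACERS[strategy](present)
            filled[name] = [fill_value if v is None else v for v in table[name]]
    return filled


table = {
    "age": [30, None, 50, 40],
    "score": [1, 1, None, 4],
    "notes": [None, None, None, "ok"],
}
table = drop_sparse_columns(table, threshold=0.5)
result = fill_numeric_nulls(
    table, {"mean": ["age"], "median": [], "mode": ["score"]}
)
print(result)
```

Note that the dict passed to `fill_numeric_nulls` covers both remaining numeric columns across its three keys, mirroring the exhaustive-list requirement described for `numeric_null_replace`.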

preprocessor.json_preprocessor()

Parameters:
- No parameters needed
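
As a rough picture of what flattening nested JSON involves, the sketch below walks a parsed document recursively and joins nested keys with a separator. It is not the output format of json_preprocessor, which is not documented here; the function name `flatten`, the dot separator, and the use of list indices as key parts are all assumptions.

```python
import json


def flatten(value, parent_key="", sep="."):
    """Flatten nested dicts and lists into a single-level dict."""
    items = {}
    if isinstance(value, dict):
        for key, child in value.items():
            child_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
            items.update(flatten(child, child_key, sep))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            child_key = f"{parent_key}{sep}{index}" if parent_key else str(index)
            items.update(flatten(child, child_key, sep))
    else:
        # Leaf value: store it under the accumulated key path
        items[parent_key] = value
    return items


doc = json.loads('{"user": {"name": "Ada", "tags": ["x", "y"]}, "active": true}')
flat = flatten(doc)
print(flat)
```

In this sketch empty dicts and empty lists produce no keys at all, which may or may not match how the library treats them.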

Code Samples

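CSV file preprocessing using a file path:

from Pre_processor.preprocessor import Preprocessor as pps

# Load the CSV file from its path
p = pps(file="example.csv")
# Use a null-ratio threshold of 0.7 for column deletion and tokenize textual columns
data = p.df_preprocessor(threshold_4_delete_null=0.7, textual_column_word_tokenize=True)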

Download files

Source Distribution

Pre_processor-0.0.5.tar.gz (4.5 kB)

Built Distribution

Pre_processor-0.0.5-py3-none-any.whl (5.5 kB, Python 3)
