CSV and JSON file preprocessor

Preprocessor

Preprocessor is a Python library for preprocessing CSV files and flattening JSON files.

  • Preprocesses CSV files by handling and replacing missing values
  • Preprocesses textual columns in CSV files with text preprocessing and word normalization
  • Automatically detects the data type of each column in a CSV file and applies the matching preprocessing
  • Flattens JSON files nested to any depth
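
To make the column type detection step concrete, here is a minimal sketch using only the standard library. It is not the library's actual logic, which this page does not describe. The helper name infer_column_type and the rule "a column is numeric if every non-missing cell parses as a float" are assumptions made for illustration.

```python
def infer_column_type(values):
    """Classify a column as 'numeric', 'textual' or 'empty' (hypothetical helper).

    None and blank strings count as missing and are ignored.
    """
    present = [v for v in values if v is not None and str(v).strip() != ""]
    if not present:
        return "empty"
    for v in present:
        try:
            float(v)  # assumption: anything float() accepts is numeric
        except (TypeError, ValueError):
            return "textual"
    return "numeric"


# Columns as they might look after reading a CSV file: every cell is a string.
columns = {
    "age": ["31", "", "45", "27"],
    "review": ["great product", "too slow", None, "ok"],
}
column_types = {name: infer_column_type(cells) for name, cells in columns.items()}
print(column_types)
```

Once each column has a type, numeric columns can be routed to missing value replacement and textual columns to tokenization and word normalization.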

Documentation

Preprocessor class:

Pre_processor.preprocessor.Preprocessor(file, filetype=None, encoding=None)

Parameters:
- file : str, csv, dict
        File to be preprocessed.
- filetype : str
        Type of the input file. Valid options are dataframe or json.
- encoding : str
        Encoding scheme used for reading the file. Default is ISO-8859-1.

Methods:

preprocessor.df_preprocessor(threshold_4_delete_null=0.5, no_null_columns=None, numeric_null_replace=None, textual_column_word_tokenize=False, textual_column_word_normalize=None)

Parameters:
- threshold_4_delete_null : float
        Ratio of null values to number of rows at which a column is deleted.
- no_null_columns : list
        List of columns which must not have any null values.
- numeric_null_replace : dict
        Logic for replacing null values in numeric columns. When None, null
        values in all numeric columns are replaced by the mean. The dict format
        is {"mean": [list of column names], "median": [list of column names],
        "mode": [list of column names]}.
        When passing a dict, users need to provide an exhaustive list of
        columns across the three keys mean, median and mode.

- textual_column_word_tokenize : bool
        Whether word tokenization is needed for textual columns.
- textual_column_word_normalize : str
        Type of word normalization needed for textual columns. Either stem
        or lemma, for word stemming and word lemmatization respectively.
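
The two null handling parameters can be illustrated with a short sketch using only the standard library. This is an approximation of the described behavior, not the library's implementation. The helper names drop_sparse_columns and fill_numeric_nulls are hypothetical, rows are modeled as plain dicts with None for nulls, and it is assumed that a column is dropped when its null ratio is strictly greater than the threshold.

```python
import statistics


def drop_sparse_columns(rows, threshold):
    """Drop columns whose ratio of None values to rows exceeds `threshold`."""
    if not rows:
        return []
    n_rows = len(rows)
    keep = [
        col for col in rows[0]
        if sum(row[col] is None for row in rows) / n_rows <= threshold
    ]
    return [{col: row[col] for col in keep} for row in rows]


def fill_numeric_nulls(rows, strategy_by_column):
    """Replace None in each listed column by its mean, median or mode."""
    reducers = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }
    filled = [dict(row) for row in rows]  # copy so the input stays untouched
    for strategy, cols in strategy_by_column.items():
        for col in cols:
            observed = [row[col] for row in rows if row[col] is not None]
            fill_value = reducers[strategy](observed)
            for row in filled:
                if row[col] is None:
                    row[col] = fill_value
    return filled


rows = [
    {"age": 30, "income": 100, "notes": None},
    {"age": None, "income": 300, "notes": None},
    {"age": 50, "income": None, "notes": None},
    {"age": 40, "income": 200, "notes": "vip"},
]

# "notes" is 75% null, above the 0.5 threshold, so it is dropped.
trimmed = drop_sparse_columns(rows, threshold=0.5)

# Same shape as numeric_null_replace: every numeric column appears under a key.
cleaned = fill_numeric_nulls(
    trimmed, {"mean": ["age"], "median": ["income"], "mode": []}
)
print(cleaned)
```

The dict passed to fill_numeric_nulls mirrors the documented numeric_null_replace format, where every numeric column is listed under exactly one of the three keys.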

preprocessor.json_preprocessor()

Parameters:
- No parameters needed
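
As an illustration of what flattening a nested JSON document can look like, the sketch below walks dicts and lists recursively and joins the keys with a dot. The output format of json_preprocessor is not documented here, so the function name flatten, the dotted key convention and the use of list indices as key parts are all assumptions.

```python
import json


def flatten(node, parent_key="", sep="."):
    """Flatten nested dicts and lists into one dict with joined keys."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)  # list positions become key parts
    else:
        return {parent_key: node}  # scalar leaf

    flat = {}
    for key, value in items:
        child_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        flat.update(flatten(value, child_key, sep))
    if not flat and parent_key:
        flat[parent_key] = node  # keep empty containers instead of losing them
    return flat


record = json.loads(
    '{"id": 7, "user": {"name": "Ada", "tags": ["a", "b"]},'
    ' "meta": {"geo": {"lat": 1.5}}}'
)
flat = flatten(record)
print(flat)
```

Because the function recurses on every container it meets, the nesting depth of the input does not matter.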

Code Samples

CSV file preprocessing using a file path:
from Pre_processor.preprocessor import Preprocessor as pps
p = pps(file="example.csv")
data = p.df_preprocessor(threshold_4_delete_null=0.7, textual_column_word_tokenize=True)
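
The sample above enables textual_column_word_tokenize. The tokenizer the library uses is not stated on this page, so the following is only a rough sketch of word tokenization for a text column, built on a regular expression. The function name tokenize and the token pattern are assumptions. Stemming and lemmatization are left out because they usually rely on a dedicated NLP package.

```python
import re


def tokenize(text):
    """Lowercase the text and return its word tokens.

    A token is a run of letters, digits or apostrophes (assumed pattern).
    """
    return re.findall(r"[a-z0-9']+", text.lower())


reviews = ["Great product, works well!", "Too slow; wouldn't buy again."]
tokens = [tokenize(review) for review in reviews]
print(tokens)
```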
