CSV and JSON file preprocessor

Project description

Preprocessor

Preprocessor is a Python library for preprocessing CSV files and flattening JSON files.

  • Handles missing values in CSV files, including deletion and replacement
  • Preprocesses textual columns in CSV files with text preprocessing and word normalization
  • Automatically detects the data type of each column in a CSV file and applies the matching preprocessing
  • Flattens JSON files of any nesting depth
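To make the idea of automatic column type detection concrete, here is a minimal standard-library sketch. It is not the library's implementation: the helper name `infer_column_type` and the rule "a column is numeric if every non-empty cell parses as a float" are assumptions chosen for illustration.

```python
def infer_column_type(values):
    """Classify a column from its string cells (hypothetical helper)."""
    # Ignore empty cells, which stand in for missing values here.
    non_empty = [v for v in values if v is not None and v.strip() != ""]
    if not non_empty:
        return "empty"
    for v in non_empty:
        try:
            float(v)
        except ValueError:
            # One non-numeric cell is enough to treat the column as text.
            return "textual"
    return "numeric"


columns = {
    "age": ["31", "", "45"],
    "review": ["great product", "too slow", ""],
}
types = {name: infer_column_type(cells) for name, cells in columns.items()}
print(types)
```

A type map like this is what would let a preprocessor route numeric columns to null replacement and textual columns to tokenization.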

Documentation

Preprocessor class:

Pre_processor.preprocessor.Preprocessor(file,filetype=None,encoding=None)

Parameters:
- file : str, csv, dict
        File to be preprocessed.
- filetype : str
            Type of the input file. Valid options are dataframe or json.
- encoding : str
            Encoding scheme for reading the file. Default is ISO-8859-1.
Methods:
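The encoding parameter matters when a CSV file contains accented characters that are not stored as UTF-8. The following sketch uses only the standard library, not Preprocessor itself, and the sample data is made up for illustration. It shows the same bytes failing to decode as UTF-8 but reading cleanly as ISO-8859-1.

```python
import csv
import io

# Hypothetical CSV content, stored as ISO-8859-1 bytes.
raw = "name,city\nJosé,Málaga\n".encode("ISO-8859-1")

# The byte for "é" (0xE9) followed by "," is not a valid UTF-8 sequence.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding with the matching encoding recovers the text.
rows = list(csv.DictReader(io.StringIO(raw.decode("ISO-8859-1"))))
print(utf8_ok, rows)
```

ISO-8859-1 maps every byte to a character, so decoding with it never raises, although it can produce wrong characters when the file is really in another encoding.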

preprocessor.df_preprocessor(threshold_4_delete_null=0.5,no_null_columns=None, numeric_null_replace=None,textual_column_word_tokenize=False,textual_column_word_normalize=None)

Parameters:
- threshold_4_delete_null : float
                    Null-to-row ratio used as the cutoff for deleting a column.
- no_null_columns : list
                    List of columns which must not have any null values.
- numeric_null_replace : dict
                    Logic for replacing null values in numeric columns. When None, null
                    values in all numeric columns are replaced by the mean. The dict format
                    should be {"mean": [list of column names], "median": [list of
                    column names], "mode": [list of column names]}.
                    When passing a dict, users need to provide an exhaustive list of
                    columns across the three keys mean, median and mode.

- textual_column_word_tokenize : bool
                    Whether to tokenize words in textual columns.
- textual_column_word_normalize : str
                    Type of word normalization to apply to textual columns. Either stem
                    or lemma, for word stemming and word lemmatization respectively.
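The null-handling parameters above can be pictured with a small standard-library sketch. It illustrates the idea and is not the library's code: the helpers `drop_sparse_columns` and `fill_nulls` are hypothetical, and treating "ratio strictly greater than the threshold" as the deletion rule is an assumption.

```python
import statistics


def drop_sparse_columns(table, threshold):
    """Keep columns whose null ratio (nulls / rows) is at most the threshold."""
    kept = {}
    for name, cells in table.items():
        null_ratio = sum(c is None for c in cells) / len(cells)
        if null_ratio <= threshold:  # assumption: delete only above the cutoff
            kept[name] = cells
    return kept


def fill_nulls(cells, strategy):
    """Replace None with the mean, median or mode of the non-null cells."""
    present = [c for c in cells if c is not None]
    pick = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }[strategy]
    fill = pick(present)
    return [fill if c is None else c for c in cells]


table = {
    "height": [150, None, 170, 160],   # 25% null
    "weight": [None, None, None, 70],  # 75% null
    "rooms": [2, 2, None, 3],          # 25% null
}
# Same shape as the documented numeric_null_replace dict.
numeric_null_replace = {"mean": ["height"], "median": [], "mode": ["rooms"]}

cleaned = drop_sparse_columns(table, threshold=0.5)
for strategy, names in numeric_null_replace.items():
    for name in names:
        if name in cleaned:
            cleaned[name] = fill_nulls(cleaned[name], strategy)
print(cleaned)
```

With a threshold of 0.5 the weight column is dropped, while height and rooms keep their rows and have their gaps filled by the mean and the mode respectively.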

preprocessor.json_preprocessor()

Parameters:
- No parameters needed
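As an illustration of what flattening a nested JSON document involves, here is a minimal recursive sketch. It is not the library's implementation: the function name `flatten`, the underscore separator, and the use of list indices as key parts are all assumptions, and the output of json_preprocessor may be shaped differently.

```python
def flatten(obj, parent_key="", sep="_"):
    """Flatten nested dicts and lists into a single-level dict (hypothetical helper)."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        # List positions become part of the key, e.g. tags_0, tags_1.
        items = ((str(i), v) for i, v in enumerate(obj))
    else:
        # Scalars end the recursion and are stored under the accumulated key.
        return {parent_key: obj}

    flat = {}
    for key, value in items:
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        flat.update(flatten(value, full_key, sep))
    return flat


nested = {"id": 1, "user": {"name": "Ann", "tags": ["a", "b"]}}
flat = flatten(nested)
print(flat)
```

Because the function recurses on every dict and list it meets, the nesting depth of the input does not need to be known in advance.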

Code Samples

CSV file preprocessing using a file path
from Pre_processor.preprocessor import Preprocessor as pps
p = pps(file="example.csv")
data = p.df_preprocessor(threshold_4_delete_null=0.7, textual_column_word_tokenize=True)

Download files

Source Distribution

Pre_processor-0.0.7.tar.gz (4.5 kB)

Uploaded Source

Built Distribution

Pre_processor-0.0.7-py3-none-any.whl (5.5 kB)

Uploaded Python 3

File details

Details for the file Pre_processor-0.0.7.tar.gz.

File metadata

  • Download URL: Pre_processor-0.0.7.tar.gz
  • Size: 4.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.52.0 CPython/3.8.3

File hashes

Hashes for Pre_processor-0.0.7.tar.gz
Algorithm    Hash digest
SHA256       5900340ee851be45c0bbde2f9b68f026853e1d692804715e706b7f140b646fd2
MD5          ca4e1dd0954db93da232ae40468f1109
BLAKE2b-256  c6f2a9124770cb2ccecd41e58ea56aad0734fa7d469fad03c10a69cca11c4f07

File details

Details for the file Pre_processor-0.0.7-py3-none-any.whl.

File metadata

  • Download URL: Pre_processor-0.0.7-py3-none-any.whl
  • Size: 5.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.52.0 CPython/3.8.3

File hashes

Hashes for Pre_processor-0.0.7-py3-none-any.whl
Algorithm    Hash digest
SHA256       312e46a097b5c5ef8feaa320657a48b24d7c0ce84fb1a2a8dbf69090e9bc59b6
MD5          1516ef5b26e0e4c42a23e62f9c81ce99
BLAKE2b-256  a2055e96f807a00321e134ae45a3a054d2e73919f903282edd82ccdcbd27d103
