
CSV and JSON file preprocessor

Project description

Preprocessor

Preprocessor is a Python library for preprocessing CSV files and flattening JSON files.

  • Preprocesses CSV files for missing-value handling and missing-value replacement
  • Preprocesses textual columns in CSV files for text cleaning and word normalization
  • Automatically detects the data type of each CSV column and applies the appropriate preprocessing
  • Flattens JSON files of any nesting depth

Documentation

Preprocessor class:

Preprocessor.preprocessor(file, filetype=None, encoding=None)

Parameters:
- file : str, csv, or dict
        File to be preprocessed.
- filetype : str
        Type of the input file. Valid options are "dataframe" or "json".
- encoding : str
        Encoding scheme used to read the file. Default is ISO-8859-1.
Methods:

Preprocessor.preprocessor.csv_preprocessor(threshold_4_delete_null=0.5, no_null_columns=None, numeric_null_replace=None, textual_column_word_tokenize=False, textual_column_word_normalize=False)

Parameters:
- threshold_4_delete_null : float
                    Columns whose ratio of null values to number of rows exceeds
                    this threshold are deleted.
- no_null_columns : list
                    List of columns that must not contain any null values.
- numeric_null_replace : dict
                    Logic for replacing null values in numeric columns. When None,
                    null values in every numeric column are replaced by the mean.
                    The dict format is {"mean": [list of column names], "median":
                    [list of column names], "mode": [list of column names]}.
                    When passing a dict, provide an exhaustive list of numeric
                    columns across the three keys mean, median, and mode.

- textual_column_word_tokenize : bool
                    Whether word tokenization is needed for textual columns.
- textual_column_word_normalize : str
                    Type of word normalization for textual columns: either "stem"
                    or "lemma", for word stemming and word lemmatization respectively.
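For intuition, the numeric_null_replace logic described above can be sketched in plain pandas. This is an approximation of the documented behavior, not the library's own code; the DataFrame and column names are hypothetical.

```python
import pandas as pd

def replace_numeric_nulls(df, logic):
    """Fill nulls per a {"mean": [...], "median": [...], "mode": [...]} dict,
    mirroring the numeric_null_replace format documented above."""
    df = df.copy()
    for col in logic.get("mean", []):
        df[col] = df[col].fillna(df[col].mean())
    for col in logic.get("median", []):
        df[col] = df[col].fillna(df[col].median())
    for col in logic.get("mode", []):
        # mode() can return several values; take the first, as is conventional
        df[col] = df[col].fillna(df[col].mode().iloc[0])
    return df

df = pd.DataFrame({"a": [1.0, None, 3.0],
                   "b": [2.0, 4.0, None],
                   "c": [5, 5, None]})
filled = replace_numeric_nulls(df, {"mean": ["a"], "median": ["b"], "mode": ["c"]})
# "a" gets its mean (2.0), "b" its median (3.0), "c" its mode (5.0)
```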

Preprocessor.preprocessor.json_preprocessing()

Parameters:
- None
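"Flattening any level of nested JSON" can be illustrated with a short plain-Python sketch. This is not the library's internal implementation; the underscore key separator and list-index naming are assumptions.

```python
def flatten_json(obj, parent_key="", sep="_"):
    """Recursively flatten nested dicts and lists into a single-level dict,
    joining keys with `sep` and using list indices as key segments."""
    items = {}
    if isinstance(obj, dict):
        for k, v in obj.items():
            key = f"{parent_key}{sep}{k}" if parent_key else str(k)
            items.update(flatten_json(v, key, sep))
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            key = f"{parent_key}{sep}{i}" if parent_key else str(i)
            items.update(flatten_json(v, key, sep))
    else:
        items[parent_key] = obj
    return items

nested = {"user": {"name": "a", "tags": ["x", "y"]}}
flat = flatten_json(nested)
# → {"user_name": "a", "user_tags_0": "x", "user_tags_1": "y"}
```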

Code Samples

CSV file preprocessing using a file path:

from Pre_processor import preprocessor
pps = preprocessor(file="example.csv")
data = pps.csv_preprocessor(threshold_4_delete_null=0.7, textual_column_word_tokenize=True)
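The threshold_4_delete_null behavior used in the sample above can be sketched in plain pandas (an approximation under the documented semantics, not the library's code):

```python
import pandas as pd

def drop_mostly_null_columns(df, threshold=0.5):
    """Drop columns whose fraction of null values exceeds `threshold`."""
    null_ratio = df.isna().mean()  # per-column fraction of null values
    keep = null_ratio[null_ratio <= threshold].index
    return df[keep]

df = pd.DataFrame({"a": [1, None, None, None],   # 75% null
                   "b": [1, 2, None, 4]})        # 25% null
cleaned = drop_mostly_null_columns(df, threshold=0.5)
# column "a" exceeds the 0.5 threshold and is dropped; "b" is kept
```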

Download files

Download the file for your platform.

Source Distribution

Pre_processor-0.0.4.tar.gz (4.5 kB)

Uploaded Source

Built Distribution

Pre_processor-0.0.4-py3-none-any.whl (5.5 kB)

Uploaded Python 3

File details

Details for the file Pre_processor-0.0.4.tar.gz.

File metadata

  • Download URL: Pre_processor-0.0.4.tar.gz
  • Upload date:
  • Size: 4.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.52.0 CPython/3.8.3

File hashes

Hashes for Pre_processor-0.0.4.tar.gz
Algorithm Hash digest
SHA256 4afd28c42e424340613fb4a6e67cf424c3e5750fa49e7c098627fc0d2a05a3d0
MD5 0b5737cd2badc41658e6a8b7ae316949
BLAKE2b-256 4dcc7fbc78b5eee883563263c0f8867bfee668d6b188044a7e255197047e9b25


File details

Details for the file Pre_processor-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: Pre_processor-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 5.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.52.0 CPython/3.8.3

File hashes

Hashes for Pre_processor-0.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 8abb60a9ccbd0b9832091e207e2c80c8d25fac5344bc7dee84c6474114fcfe35
MD5 f78624ab460ceef76c41403e28f9bdd0
BLAKE2b-256 7278d0eda489c4573ad32cf207a976a9b1e6b4be4048e3b69b0a683a386e5fef

