CSV and JSON file preprocessor

Project description

Preprocessor

Preprocessor is a Python library for preprocessing CSV files and flattening JSON files.

  • Preprocesses CSV files: missing-value handling and missing-value replacement
  • Preprocesses CSV files with textual columns: text preprocessing and word normalization
  • Automatically detects the data type of each column in a CSV file and preprocesses it accordingly
  • Flattens JSON files of any nesting depth
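
To illustrate what flattening a nested JSON document means, here is a minimal sketch in plain Python. It is not the library's implementation: the helper name flatten_json, the "." separator, and the use of list indices as key parts are all assumptions made for this example.

```python
from typing import Any, Dict


def flatten_json(data: Any, parent_key: str = "", sep: str = ".") -> Dict[str, Any]:
    """Flatten nested dicts and lists into a single-level dict.

    Hypothetical helper for illustration; not part of the Preprocessor API.
    """
    if isinstance(data, dict):
        items = list(data.items())
    elif isinstance(data, list):
        # List elements are keyed by their position.
        items = [(str(i), v) for i, v in enumerate(data)]
    else:
        # A bare scalar has nothing to flatten.
        return {parent_key or "value": data}

    flat: Dict[str, Any] = {}
    for key, value in items:
        new_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, (dict, list)) and value:
            # Recurse into non-empty containers.
            flat.update(flatten_json(value, new_key, sep))
        else:
            # Scalars and empty containers are stored as-is.
            flat[new_key] = value
    return flat


nested = {"user": {"name": "Ada", "langs": ["en", "fr"]}, "active": True}
flat = flatten_json(nested)
print(flat)
```

Each key in the result encodes the path to the original value, so a document of any depth becomes a single row of columns.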

Tech

Preprocessor class: Preprocessor.preprocessor(file, filetype=None)

Parameters:
- file : str, csv, dict
        File to be preprocessed
- filetype : str
            Type of the input file. Valid options are dataframe or json

Methods:

Preprocessor.preprocessor.csv_preprocessor(threshold_4_delete_null=0.5, no_null_columns=None, numeric_null_replace=None, textual_column_word_tokenize=False, textual_column_word_normalize=False)

Parameters:
- threshold_4_delete_null : float
                            Ratio of null values to the number of rows, used as the threshold for deleting nulls (default 0.5)

- no_null_columns : list
                    List of columns that must not contain any null values

- numeric_null_replace : str
                        Strategy for replacing null values in numeric columns. Valid options are mean, median and mode

- textual_column_word_tokenize : bool
                                Whether to tokenize the words in textual columns

- textual_column_word_normalize : str (default False)
                                    Type of word normalization to apply to textual columns
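
The null-handling parameters above can be read as three steps: drop data that is too sparse, enforce the no-null columns, and fill the remaining numeric nulls. The sketch below shows one plausible interpretation in plain Python, with a table represented as a dict of column lists and None standing for null. The function names are hypothetical, and whether the library drops columns (as assumed here) or rows when the threshold is exceeded is not stated in this description.

```python
from statistics import mean, median, mode
from typing import Dict, List, Optional

# Column name -> values; None stands for a null cell.
Table = Dict[str, List[Optional[float]]]


def drop_sparse_columns(table: Table, threshold: float = 0.5) -> Table:
    """Assumed meaning of threshold_4_delete_null: drop columns whose
    null ratio (nulls / rows) exceeds the threshold."""
    kept: Table = {}
    for name, values in table.items():
        ratio = sum(v is None for v in values) / len(values) if values else 0.0
        if ratio <= threshold:
            kept[name] = values
    return kept


def drop_rows_with_nulls(table: Table, no_null_columns: List[str]) -> Table:
    """Assumed meaning of no_null_columns: drop rows that have a null
    in any of the listed columns."""
    n_rows = len(next(iter(table.values()), []))
    keep = [
        i for i in range(n_rows)
        if all(table[col][i] is not None for col in no_null_columns)
    ]
    return {name: [values[i] for i in keep] for name, values in table.items()}


def fill_numeric_nulls(values: List[Optional[float]], strategy: str = "mean") -> list:
    """Replace nulls with the mean, median or mode of the non-null values."""
    strategies = {"mean": mean, "median": median, "mode": mode}
    present = [v for v in values if v is not None]
    if not present:
        return list(values)  # nothing to compute a fill value from
    fill = strategies[strategy](present)
    return [fill if v is None else v for v in values]


table = {
    "age": [30, None, 50, 40],
    "score": [None, None, None, 1.0],  # 75% null, above the 0.5 threshold
    "id": [1, 2, None, 4],
}
step1 = drop_sparse_columns(table, threshold=0.5)
step2 = drop_rows_with_nulls(step1, no_null_columns=["id"])
filled_age = fill_numeric_nulls(step2["age"], strategy="mean")
print(filled_age)
```

Running the steps in this order means the fill value is computed only from rows that survive the earlier filters; a different order would give different fill values.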

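For textual columns, tokenization and normalization can be sketched as follows. The valid normalization types are not listed in this description, so the example assumes a single hypothetical type, "lower" (lowercasing); the function names and the regular expression are also assumptions, not the library's API.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Split text into word tokens, keeping apostrophes inside words."""
    return re.findall(r"[A-Za-z0-9']+", text)


def normalize(tokens: List[str], kind: str = "lower") -> List[str]:
    """Normalize tokens. Only the hypothetical 'lower' type is sketched here."""
    if kind == "lower":
        return [t.lower() for t in tokens]
    raise ValueError(f"unsupported normalization type: {kind!r}")


tokens = tokenize("The cats aren't RUNNING, are they?")
normalized = normalize(tokens)
print(normalized)
```
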
Download files

Source Distribution

Pre_processor-0.0.3.tar.gz (3.8 kB)

Uploaded Source

Built Distribution

Pre_processor-0.0.3-py3-none-any.whl (4.9 kB)

Uploaded Python 3

File details

Details for the file Pre_processor-0.0.3.tar.gz.

File metadata

  • Download URL: Pre_processor-0.0.3.tar.gz
  • Size: 3.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.52.0 CPython/3.8.3

File hashes

Hashes for Pre_processor-0.0.3.tar.gz
Algorithm     Hash digest
SHA256        cd5e2025443c6e2eb37430ecda07ac51e5b82336f22413014bca8b39b903fed6
MD5           e036282e4e91f74c1549f9d2a74b070e
BLAKE2b-256   fcf471616002a693122fae95d724b983e9d4fd12a92b132eaac02eb2358f9ebb

File details

Details for the file Pre_processor-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: Pre_processor-0.0.3-py3-none-any.whl
  • Size: 4.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.52.0 CPython/3.8.3

File hashes

Hashes for Pre_processor-0.0.3-py3-none-any.whl
Algorithm     Hash digest
SHA256        618e1319d6dfd2111ce821f4fd868bd9d79ced1cdb7bf7200c4b7e9f924a093a
MD5           8260015394306c036ead2b9443d93ef0
BLAKE2b-256   f9f02b5f2aeaf70beac8d2af618f11efe9af4cebd603389e09d407f2eee5ec3c