
An easy-to-use tool for data preprocessing, especially text preprocessing.

Project description




Installation

Install the latest stable release
For Windows
$ pip install -U data-preprocessors

For Linux/WSL2
$ pip3 install -U data-preprocessors
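To confirm that the install (or the -U upgrade) actually worked, the installed version can be queried from Python. This is a minimal sketch using the standard library's importlib.metadata (Python 3.8+), not a feature of data-preprocessors itself; the helper name installed_version is hypothetical.

```python
# Minimal install check; assumes Python 3.8+, where importlib.metadata is in the stdlib.
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("data-preprocessors")
print(found if found is not None else "data-preprocessors is not installed")
```

From the shell, pip show data-preprocessors (or pip3 show data-preprocessors on Linux/WSL2) reports the same information.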

Quick Start

from data_preprocessors import text_preprocessor as tp
sentence = "bla! bla- ?bla ?bla."
sentence = tp.remove_punc(sentence)
print(sentence)

>> bla bla bla bla
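The quick start only shows the input and output of tp.remove_punc. As a rough illustration of what punctuation removal means here, the sketch below deletes every ASCII punctuation character with str.translate. This is an assumed, hypothetical stand-in (remove_punc_sketch is not part of the library), and the real function may treat Unicode punctuation, hyphens or whitespace differently.

```python
import string

# Translation table that deletes every character in string.punctuation (ASCII only).
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def remove_punc_sketch(text: str) -> str:
    """Hypothetical stand-in for tp.remove_punc: strip ASCII punctuation, keep spaces."""
    return text.translate(_PUNCT_TABLE)


print(remove_punc_sketch("bla! bla- ?bla ?bla."))  # bla bla bla bla
```

Note that a table-based approach like this joins hyphenated words ("data-driven" becomes "datadriven") and does not collapse repeated spaces, which may or may not match the library's behavior.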

Functions

Split Textfile

from data_preprocessors import text_preprocessor as tp
tp.split_textfile(
    main_file_path="example.txt",
    train_file_path="splitted/train.txt",
    val_file_path="splitted/val.txt",
    test_file_path="splitted/test.txt",
    train_size=0.6,
    val_size=0.2,
    test_size=0.2,
    shuffle=True,
    seed=42
)

# Total lines:  500
# Train set size:  300
# Validation set size:  100
# Test set size:  100
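The printed sizes follow from splitting the file line by line: 500 × 0.6 = 300 train lines, 500 × 0.2 = 100 validation lines, and the remaining 100 lines go to test. The sketch below reproduces that arithmetic with a seeded shuffle. It is a hypothetical illustration, not the library's implementation: split_lines and write_lines are made-up names, and the rounding rule, shuffle algorithm and directory handling of split_textfile may differ.

```python
import math
import random
from pathlib import Path
from typing import List, Optional, Tuple


def split_lines(
    lines: List[str],
    train_size: float = 0.6,
    val_size: float = 0.2,
    test_size: float = 0.2,
    shuffle: bool = True,
    seed: Optional[int] = 42,
) -> Tuple[List[str], List[str], List[str]]:
    """Hypothetical line-level train/val/test split (not the library's code)."""
    if not math.isclose(train_size + val_size + test_size, 1.0):
        raise ValueError("train_size + val_size + test_size must sum to 1")
    lines = list(lines)  # copy so the caller's list is not shuffled in place
    if shuffle:
        random.Random(seed).shuffle(lines)  # same seed -> same order every run
    n_train = round(len(lines) * train_size)
    n_val = round(len(lines) * val_size)
    train = lines[:n_train]
    val = lines[n_train:n_train + n_val]
    test = lines[n_train + n_val:]  # whatever remains goes to the test set
    return train, val, test


def write_lines(path: str, lines: List[str]) -> None:
    """Write lines (with their newlines) to path, creating e.g. 'splitted/' if needed."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("".join(lines), encoding="utf-8")


demo = [f"line {i}\n" for i in range(500)]
train, val, test = split_lines(demo)
print(len(train), len(val), len(test))  # 300 100 100
```

Giving the remainder to the test set guarantees that every input line lands in exactly one output, even when the fractions do not divide the line count evenly.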



Download files


Source Distribution

data-preprocessors-0.24.0.tar.gz (8.0 kB)

Built Distribution

data_preprocessors-0.24.0-py3-none-any.whl (7.4 kB, Python 3)

File details

Details for the file data-preprocessors-0.24.0.tar.gz.

File metadata

  • Download URL: data-preprocessors-0.24.0.tar.gz
  • Size: 8.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.8.10 Linux/5.10.60.1-microsoft-standard-WSL2

File hashes

Hashes for data-preprocessors-0.24.0.tar.gz
Algorithm    Hash digest
SHA256       cc24e4260e051abee170e22558654e7cb19d73bab21366e1267614a12e57ee61
MD5          a5cab25c0fe7b843450b6cbee3351474
BLAKE2b-256  2b2b5ebf55274672c77c45eb163761131ed9529f3efced4fa77e9b8a00f9f4b7


File details

Details for the file data_preprocessors-0.24.0-py3-none-any.whl.

File metadata

  • Download URL: data_preprocessors-0.24.0-py3-none-any.whl
  • Size: 7.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.8.10 Linux/5.10.60.1-microsoft-standard-WSL2

File hashes

Hashes for data_preprocessors-0.24.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       5878fc0aeeef013df4cfa7df3098ecd8ebf4168e6fa79370189d81ce91571ca0
MD5          852543384426cd14b50b01bcfe5485ad
BLAKE2b-256  5e3c670c6c4adbc8af32954cebb3c7a93e10d4329d42f58955ec7fdcc1c7d83f
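The SHA256 digests listed for release 0.24.0 can be used to check that a downloaded file was not corrupted or altered. This is a minimal sketch using Python's standard hashlib module; the helper names sha256_of and matches are hypothetical, and the expected digests are copied from the hash listings for the two files.

```python
import hashlib

# SHA256 digests as listed for the 0.24.0 release files.
EXPECTED_SHA256 = {
    "data-preprocessors-0.24.0.tar.gz":
        "cc24e4260e051abee170e22558654e7cb19d73bab21366e1267614a12e57ee61",
    "data_preprocessors-0.24.0-py3-none-any.whl":
        "5878fc0aeeef013df4cfa7df3098ecd8ebf4168e6fa79370189d81ce91571ca0",
}


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 with an expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches("data-preprocessors-0.24.0.tar.gz", EXPECTED_SHA256["data-preprocessors-0.24.0.tar.gz"]) should return True for an intact download. pip can enforce the same check at install time through its hash-checking mode (a requirements entry with --hash=sha256:<digest>).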

