An easy-to-use tool for data preprocessing, especially text preprocessing

Installation

Install the latest stable release
For Windows
$ pip install -U data-preprocessors

For Linux/WSL2
$ pip3 install -U data-preprocessors

Quick Start

from data_preprocessors import text_preprocessor as tp

# Remove punctuation marks from a sentence
sentence = "bla! bla- ?bla ?bla."
sentence = tp.remove_punc(sentence)
print(sentence)

>> bla bla bla bla
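
The output above can be reproduced without the package. The following is a minimal sketch using only the Python standard library. It is not the implementation behind tp.remove_punc; the helper name strip_punctuation and the choice of string.punctuation (ASCII punctuation only) are assumptions made for illustration.

```python
import string

# Translation table that deletes every ASCII punctuation character.
# Assumption: only the characters in string.punctuation are removed.
_PUNCTUATION_TABLE = str.maketrans("", "", string.punctuation)


def strip_punctuation(text: str) -> str:
    """Return text with ASCII punctuation removed (hypothetical helper)."""
    return text.translate(_PUNCTUATION_TABLE)


print(strip_punctuation("bla! bla- ?bla ?bla."))  # bla bla bla bla
```

Whitespace is left untouched, so the words stay separated exactly as they were in the input.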

Functions

Split Textfile

from data_preprocessors import text_preprocessor as tp

# Split example.txt into train, validation, and test files (60/20/20)
tp.split_textfile(
    main_file_path="example.txt",
    train_file_path="splitted/train.txt",
    val_file_path="splitted/val.txt",
    test_file_path="splitted/test.txt",
    train_size=0.6,
    val_size=0.2,
    test_size=0.2,
    shuffle=True,
    seed=42
)

# Total lines:  500
# Train set size:  300
# Validation set size:  100
# Test set size:  100
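
To make the reported sizes concrete, here is a minimal in-memory sketch of a seeded three-way split. It is an assumption about how such a split can be done, not the code behind tp.split_textfile: the helper name split_lines is hypothetical, file reading and writing are left out, and the rounding rule (round the train and validation counts, give the remainder to the test set) is a design choice of this sketch.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple


def split_lines(
    lines: Sequence[str],
    train_size: float = 0.6,
    val_size: float = 0.2,
    test_size: float = 0.2,
    shuffle: bool = True,
    seed: Optional[int] = 42,
) -> Tuple[List[str], List[str], List[str]]:
    """Split lines into train/validation/test lists (hypothetical helper)."""
    if not math.isclose(train_size + val_size + test_size, 1.0):
        raise ValueError("train_size, val_size and test_size must sum to 1")

    items = list(lines)  # copy so the caller's sequence is not modified
    if shuffle:
        # A dedicated, seeded generator makes the shuffle reproducible.
        random.Random(seed).shuffle(items)

    n_train = round(len(items) * train_size)
    n_val = round(len(items) * val_size)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the test set takes whatever remains
    return train, val, test


lines = [f"line {i}" for i in range(500)]
train, val, test = split_lines(lines, 0.6, 0.2, 0.2, shuffle=True, seed=42)
print(len(lines), len(train), len(val), len(test))  # 500 300 100 100
```

Using random.Random(seed) instead of the module-level random.seed keeps the shuffle reproducible without changing global random state elsewhere in a program.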

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

data-preprocessors-0.14.0.tar.gz (6.8 kB)

Uploaded Source

Built Distribution

data_preprocessors-0.14.0-py3-none-any.whl (6.8 kB)

Uploaded Python 3

File details

Details for the file data-preprocessors-0.14.0.tar.gz.

File metadata

  • Download URL: data-preprocessors-0.14.0.tar.gz
  • Upload date:
  • Size: 6.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.8.10 Linux/5.10.60.1-microsoft-standard-WSL2

File hashes

Hashes for data-preprocessors-0.14.0.tar.gz
Algorithm     Hash digest
SHA256        132a1e3f496fb5c685e8f8643ae139228aa8b9addf0b9993bd12a5d2a2ec1eca
MD5           3cd3766c296d83dfa209208798c54bdd
BLAKE2b-256   9738845e8d2ab0dc37d335a6e0a71edb36eb53bea089a3828d8bfed0ab445136

See more details on using hashes here.
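
As one way to use the digests listed above, a downloaded file can be hashed locally and compared with the published SHA256 value. The sketch below uses only the standard library's hashlib; the helper names sha256_of_file and matches_digest are hypothetical, and the usage lines assume the archive was saved to the current directory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of the file at path (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage, assuming the archive was downloaded to the current directory:
# expected = "132a1e3f496fb5c685e8f8643ae139228aa8b9addf0b9993bd12a5d2a2ec1eca"
# print(matches_digest("data-preprocessors-0.14.0.tar.gz", expected))
```

A mismatch means the local file differs from the one the digest was computed for, so it should be downloaded again rather than installed.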

File details

Details for the file data_preprocessors-0.14.0-py3-none-any.whl.

File metadata

  • Download URL: data_preprocessors-0.14.0-py3-none-any.whl
  • Upload date:
  • Size: 6.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.8.10 Linux/5.10.60.1-microsoft-standard-WSL2

File hashes

Hashes for data_preprocessors-0.14.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        c89d2ff5546351a11e284ff5a5e12f05703548f0539a1794d010f261ac622f9a
MD5           52380512a38a450ce7fc680b9680f25f
BLAKE2b-256   3724af294da2d0046ea10ae946e924dcda21088131be35582a74a560b34300bc
