An easy-to-use tool for data preprocessing, especially text preprocessing
Installation
Install the latest stable release
For Windows
$ pip install -U data-preprocessors
For Linux/WSL2
$ pip3 install -U data-preprocessors
Quick Start
from data_preprocessors import text_preprocessor as tp
sentence = "bla! bla- ?bla ?bla."
sentence = tp.remove_punc(sentence)
print(sentence)
>> bla bla bla bla
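To make the effect of punctuation removal concrete, here is a minimal standard-library sketch that produces the same output for the sentence above. The function name `strip_punctuation` is hypothetical and is not part of `data_preprocessors`. It only assumes that `remove_punc` drops ASCII punctuation characters; the library's actual behaviour may differ, for example with non-ASCII punctuation.

```python
import string


def strip_punctuation(text):
    """Hypothetical stand-in (not the library's code): delete ASCII punctuation."""
    # str.maketrans with a third argument maps each listed character to None,
    # so translate() removes those characters from the string.
    return text.translate(str.maketrans("", "", string.punctuation))


cleaned = strip_punctuation("bla! bla- ?bla ?bla.")
print(cleaned)  # bla bla bla bla
```

Existing whitespace is left untouched, so the words stay separated by the original single spaces.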
Functions
Split Textfile
from data_preprocessors import text_preprocessor as tp
tp.split_textfile(
    main_file_path="example.txt",
    train_file_path="splitted/train.txt",
    val_file_path="splitted/val.txt",
    test_file_path="splitted/test.txt",
    train_size=0.6,
    val_size=0.2,
    test_size=0.2,
    shuffle=True,
    seed=42
)
# Total lines: 500
# Train set size: 300
# Validation set size: 100
# Test set size: 100
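The set sizes reported above follow from the ratios: 60%, 20% and 20% of 500 lines. The sketch below illustrates that arithmetic with a seeded shuffle, working on an in-memory list instead of files. `split_lines` is a hypothetical helper written for illustration, not the library's implementation. It assumes the split is a shuffle followed by proportional slicing, which the package may or may not do internally.

```python
import random


def split_lines(lines, train_size=0.6, val_size=0.2, shuffle=True, seed=42):
    """Hypothetical helper (an assumption, not library code): split a list
    of lines into train/validation/test parts by ratio."""
    items = list(lines)
    if shuffle:
        # A dedicated Random instance keeps the shuffle reproducible
        # without touching the global random state.
        random.Random(seed).shuffle(items)
    n_train = round(len(items) * train_size)
    n_val = round(len(items) * val_size)
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    # The test set takes whatever remains, so no line is lost to rounding.
    test = items[n_train + n_val:]
    return train, val, test


lines = [f"line {i}" for i in range(500)]
train, val, test = split_lines(lines)
print(len(train), len(val), len(test))  # 300 100 100
```

Giving the remainder to the test set is one design choice among several. It guarantees that every input line ends up in exactly one of the three sets.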