Pre-processing library for ML models
Project description
lib-ml
This Python library preprocesses text data for machine learning. It provides functions for tokenizing text, padding sequences, and encoding labels, the usual steps before training a model. It also stores data to disk and loads it back in several formats. The library is available on PyPI and can be installed with pip.
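To make the three preprocessing steps concrete, here is a minimal plain-Python sketch of tokenizing, padding, and label encoding. It is an illustration only, not the library's implementation: the function names (`build_vocab`, `tokenize`, `pad`, `encode_labels`), the whitespace tokenizer, and the reserved ids 0 and 1 are all assumptions made for this example.

```python
from collections import Counter

PAD_ID = 0  # assumption: id 0 is reserved for padding
OOV_ID = 1  # assumption: id 1 marks tokens missing from the vocabulary


def build_vocab(texts):
    """Map each whitespace-separated token to an integer id, most frequent first."""
    counts = Counter(token for text in texts for token in text.split())
    # Ids start at 2 because 0 and 1 are reserved above.
    return {token: i for i, (token, _) in enumerate(counts.most_common(), start=2)}


def tokenize(texts, vocab):
    """Convert each text into a list of integer ids."""
    return [[vocab.get(token, OOV_ID) for token in text.split()] for text in texts]


def pad(sequences, length):
    """Truncate or right-pad every sequence to exactly `length` ids."""
    return [seq[:length] + [PAD_ID] * (length - len(seq)) for seq in sequences]


def encode_labels(labels):
    """Replace each label with its index in the sorted list of distinct labels."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels], classes


texts = ["the cat sat", "the dog"]
labels = ["pos", "neg"]

vocab = build_vocab(texts)
padded = pad(tokenize(texts, vocab), length=4)
encoded, classes = encode_labels(labels)
```

After padding, every sequence has the same length, so the batch can be handed to a model as a rectangular array. Returning `classes` alongside the encoded labels keeps the mapping available for decoding predictions later.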
Features
- Data Tokenization: Convert text into sequences of integers.
- Sequence Padding: Pad sequences to a fixed length.
- Label Encoding: Convert labels into numerical format.
- Data Storage: Store data at a given path in a selected format.
- Data Loading: Load data from disk in a selected format.
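The storage and loading features can be pictured as a pair of functions that dispatch on a format name. The sketch below uses only the Python standard library. The `store` and `load` helpers and the two formats shown (`json` and `pickle`) are assumptions for illustration, since the formats the library actually supports are not listed here.

```python
import json
import pickle
import tempfile
from pathlib import Path


def store(data, path, fmt):
    """Write `data` to `path` in the requested format (hypothetical helper)."""
    path = Path(path)
    if fmt == "json":
        path.write_text(json.dumps(data), encoding="utf-8")
    elif fmt == "pickle":
        path.write_bytes(pickle.dumps(data))
    else:
        raise ValueError(f"unsupported format: {fmt}")


def load(path, fmt):
    """Read data back from `path`, using the same format name it was stored with."""
    path = Path(path)
    if fmt == "json":
        return json.loads(path.read_text(encoding="utf-8"))
    if fmt == "pickle":
        # Only unpickle files you trust: pickle can execute code on load.
        return pickle.loads(path.read_bytes())
    raise ValueError(f"unsupported format: {fmt}")


# Round trip through a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "tokens.json"
    store({"ids": [[2, 3, 0], [4, 0, 0]]}, target, "json")
    restored = load(target, "json")
```

Keeping the format as an explicit argument, rather than guessing it from the file extension, makes a mismatch between writer and reader fail loudly instead of silently misreading a file.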
Installation
Install the latest version of the library from PyPI with:
pip install remla-preprocess
Usage
Example of how to use lib-ml for text processing:
from remla_preprocess.pre_processing import MLPreprocessor
# Instantiate the MLPreprocessor class
preprocessor = MLPreprocessor()
# Now you can use the functions of the MLPreprocessor class.
# train_data, validation_data, and test_data must be defined beforehand.
preprocessor.tokenize_pad_encode_data(train_data, validation_data, test_data)
Testing
To run the tests for the pre-processing library use:
pytest
To run the tests with coverage for the pre-processing library use:
coverage run -m pytest
To generate the coverage report use:
coverage report -m
To generate an HTML version of the coverage report use:
coverage html
Support
If you encounter any problems or bugs with lib-ml, feel free to open an issue on the project repository.
Source Distribution
Built Distribution
File details
Details for the file remla_preprocess-0.1.2.tar.gz.
File metadata
- Download URL: remla_preprocess-0.1.2.tar.gz
- Upload date:
- Size: 3.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 265ed854d0a66bf2c52a51203c105a4e349c5bbd36802bc18f23988847b58556
MD5 | 53af50575bf43d2cc6b1b4c7b075eabc
BLAKE2b-256 | 0dceec2ed77dab774b44e8cdc9119cea978e4c8054233ddd04ac75a59bddf4bf
File details
Details for the file remla_preprocess-0.1.2-py3-none-any.whl.
File metadata
- Download URL: remla_preprocess-0.1.2-py3-none-any.whl
- Upload date:
- Size: 4.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3119a01acc8ab4a06481375a91d2fbd9ccdb1d93124779e5b7f4f27a81aedf59
MD5 | 7f4a1a46faaf5759738c484025a7d437
BLAKE2b-256 | d931a2bc7fff1e34c53df026c113004af33b00c6da31f262766f04de03c4be9e