Pre-processing library for ML models

Project description

lib-ml

This Python library preprocesses text data for machine learning. It provides functions for tokenizing text, padding sequences, and encoding labels, which are standard steps in preparing data to train ML models. It also supports storing data to disk and loading it back in a selected format. The library is published on PyPI as remla-preprocess and can be installed into any Python project.
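The tokenizing, padding, and label-encoding steps can be illustrated with a minimal stand-alone sketch in plain Python. It is not lib-ml's implementation. The helper names (build_vocab, tokenize, pad, encode_labels), whitespace tokenization, reserving id 0 for padding, and padding at the end of each sequence are all assumptions made for this example.

```python
# Hypothetical stand-alone sketch; this is NOT the remla_preprocess implementation.
from typing import Dict, List, Sequence


def build_vocab(texts: Sequence[str]) -> Dict[str, int]:
    """Map each distinct lowercase word to an integer id, starting at 1.

    Id 0 is kept free so it can serve as the padding value (an assumption).
    """
    vocab: Dict[str, int] = {}
    for text in texts:
        for word in text.lower().split():
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


def tokenize(texts: Sequence[str], vocab: Dict[str, int], oov_id: int) -> List[List[int]]:
    """Turn each text into a list of word ids; words missing from vocab get oov_id."""
    return [[vocab.get(word, oov_id) for word in text.lower().split()] for text in texts]


def pad(sequences: Sequence[List[int]], length: int, value: int = 0) -> List[List[int]]:
    """Truncate or right-pad every sequence so it has exactly `length` items."""
    padded = []
    for seq in sequences:
        trimmed = list(seq[:length])
        padded.append(trimmed + [value] * (length - len(trimmed)))
    return padded


def encode_labels(labels: Sequence[str]) -> List[int]:
    """Give each distinct label an integer id, assigned in sorted label order."""
    mapping = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [mapping[label] for label in labels]


if __name__ == "__main__":
    texts = ["good movie", "bad plot and bad acting"]
    vocab = build_vocab(texts)  # {'good': 1, 'movie': 2, 'bad': 3, 'plot': 4, 'and': 5, 'acting': 6}
    sequences = tokenize(texts, vocab, oov_id=len(vocab) + 1)
    print(sequences)  # [[1, 2], [3, 4, 5, 3, 6]]
    print(pad(sequences, length=4))  # [[1, 2, 0, 0], [3, 4, 5, 3]]
    print(encode_labels(["positive", "negative", "positive"]))  # [1, 0, 1]
```

Padding at the end is only one common choice; Keras's pad_sequences, for example, pads at the front by default (padding="pre").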

Features

  • Data Tokenization: Convert text into sequences of integers.
  • Sequence Padding: Pad sequences to a fixed length.
  • Label Encoding: Convert labels into a numerical format.
  • Data Storage: Store data at a given path in a selected format.
  • Data Loading: Load data from disk in a selected format.
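The Data Storage and Data Loading features can be pictured as a pair of functions that dispatch on a format name. The sketch below is hypothetical: store_data, load_data, and the three formats it handles (JSON, pickle, and CSV, all from the Python standard library) are assumptions, since this README does not list the formats lib-ml actually supports.

```python
# Hypothetical sketch: the function names and supported formats are assumptions,
# not the remla_preprocess API.
import csv
import json
import pickle
import tempfile
from pathlib import Path
from typing import Any


def store_data(data: Any, path: str, fmt: str) -> None:
    """Write data to path in the selected format, creating parent directories."""
    target = Path(path)
    target.parent.mkdir(parents=True, exist_ok=True)
    if fmt == "json":
        target.write_text(json.dumps(data), encoding="utf-8")
    elif fmt == "pickle":
        target.write_bytes(pickle.dumps(data))
    elif fmt == "csv":
        # CSV expects an iterable of rows, e.g. a list of lists.
        with target.open("w", newline="", encoding="utf-8") as handle:
            csv.writer(handle).writerows(data)
    else:
        raise ValueError(f"unsupported format: {fmt!r}")


def load_data(path: str, fmt: str) -> Any:
    """Read data from path, interpreting it in the selected format."""
    source = Path(path)
    if fmt == "json":
        return json.loads(source.read_text(encoding="utf-8"))
    if fmt == "pickle":
        return pickle.loads(source.read_bytes())
    if fmt == "csv":
        with source.open(newline="", encoding="utf-8") as handle:
            return list(csv.reader(handle))
    raise ValueError(f"unsupported format: {fmt!r}")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store_data([[1, 2, 0], [3, 4, 5]], f"{tmp}/train.json", "json")
        print(load_data(f"{tmp}/train.json", "json"))  # [[1, 2, 0], [3, 4, 5]]
```

A CSV round trip returns every value as a string, and pickle files should only be loaded from trusted sources, because unpickling can execute arbitrary code.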

Installation

Install the latest version of the library from PyPI:

pip install remla-preprocess

Usage

The following example shows how to use lib-ml for text preprocessing:

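from remla_preprocess.pre_processing import MLPreprocessor

# Instantiate the MLPreprocessor class
preprocessor = MLPreprocessor()

# train_data, validation_data and test_data must be defined beforehand;
# their expected type and structure are determined by lib-ml.
# Tokenize, pad, and encode the training, validation, and test splits
preprocessor.tokenize_pad_encode_data(train_data, validation_data, test_data)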
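The call above takes the training, validation, and test splits together. One common reason for such a design is to fit the vocabulary on the training split only and reuse it for the other two, so integer ids stay consistent across splits and no information leaks from the evaluation data. Whether tokenize_pad_encode_data works this way is not documented here; the sketch below does not call lib-ml, and its names (fit_vocab, transform, tokenize_pad_splits) and reserved ids are hypothetical.

```python
# Hypothetical sketch of a "fit on train, apply to every split" pipeline.
# It does not call remla_preprocess; the names and reserved ids are assumptions.
from typing import Dict, List, Sequence, Tuple

PAD_ID = 0  # assumed padding id
OOV_ID = 1  # assumed id for words never seen in the training split


def fit_vocab(train_texts: Sequence[str]) -> Dict[str, int]:
    """Build the vocabulary from the training split only (ids 0 and 1 are reserved)."""
    vocab: Dict[str, int] = {}
    for text in train_texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab) + 2)
    return vocab


def transform(texts: Sequence[str], vocab: Dict[str, int], length: int) -> List[List[int]]:
    """Tokenize with the fitted vocabulary, then truncate or right-pad to `length`."""
    result = []
    for text in texts:
        ids = [vocab.get(word, OOV_ID) for word in text.lower().split()][:length]
        result.append(ids + [PAD_ID] * (length - len(ids)))
    return result


def tokenize_pad_splits(
    train: Sequence[str], validation: Sequence[str], test: Sequence[str], length: int
) -> Tuple[List[List[int]], List[List[int]], List[List[int]]]:
    """Fit on the training split, then apply the same vocabulary to all three splits."""
    vocab = fit_vocab(train)
    return (
        transform(train, vocab, length),
        transform(validation, vocab, length),
        transform(test, vocab, length),
    )


if __name__ == "__main__":
    train_ids, val_ids, test_ids = tokenize_pad_splits(
        ["good movie", "bad movie"], ["good plot"], ["terrible"], length=3
    )
    print(train_ids)  # [[2, 3, 0], [4, 3, 0]]
    print(val_ids)  # [[2, 1, 0]]
    print(test_ids)  # [[1, 0, 0]]
```

Words that appear only in the validation or test split ("plot", "terrible") map to the out-of-vocabulary id instead of receiving new ids.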

Testing

To run the tests for the pre-processing library, use:

pytest

To run the tests with coverage measurement, use:

coverage run -m pytest

To print the coverage report, including the line numbers of statements that were not executed, use:

coverage report -m

To generate an HTML version of the coverage report (written to the htmlcov/ directory by default), use:

coverage html

Support

If you encounter any problems or bugs with lib-ml, feel free to open an issue on the project repository.

Download files

Source Distribution

remla_preprocess-0.1.2.tar.gz (3.6 kB)

Uploaded Source

Built Distribution

remla_preprocess-0.1.2-py3-none-any.whl (4.2 kB)

Uploaded Python 3

File details

Details for the file remla_preprocess-0.1.2.tar.gz.

File metadata

  • Download URL: remla_preprocess-0.1.2.tar.gz
  • Size: 3.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.12

File hashes

Hashes for remla_preprocess-0.1.2.tar.gz
Algorithm    Hash digest
SHA256       265ed854d0a66bf2c52a51203c105a4e349c5bbd36802bc18f23988847b58556
MD5          53af50575bf43d2cc6b1b4c7b075eabc
BLAKE2b-256  0dceec2ed77dab774b44e8cdc9119cea978e4c8054233ddd04ac75a59bddf4bf

File details

Details for the file remla_preprocess-0.1.2-py3-none-any.whl.

File hashes

Hashes for remla_preprocess-0.1.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       3119a01acc8ab4a06481375a91d2fbd9ccdb1d93124779e5b7f4f27a81aedf59
MD5          7f4a1a46faaf5759738c484025a7d437
BLAKE2b-256  d931a2bc7fff1e34c53df026c113004af33b00c6da31f262766f04de03c4be9e