REMLA Team 6's ML Preprocessing Library

This is the phishing detection preprocessing library for CS4295 Release Engineering for Machine Learning Applications (Team 6) at TU Delft (Q4, 2024). The package implements the preprocessing (tokenisation and label encoding) for the phishing detection dataset so that the same logic can be reused at both training and inference time. To support this, the library ships with a tokenizer and a label encoder that were fitted on the entire dataset. See the PreprocessingUtil class for details on how the tokenizer and label encoder are created.
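The idea of fitting once and reusing the same fitted objects at training and inference time can be illustrated with a small, self-contained sketch. Everything in it is hypothetical: `CharTokenizer`, `SimpleLabelEncoder`, and the sample data are stand-ins written for this example. They are not the library's PreprocessingUtil implementation, and the character-level scheme is an assumption chosen only for brevity.

```python
class CharTokenizer:
    """Hypothetical character-level tokenizer (not the library's implementation)."""

    def __init__(self):
        self.index = {}  # character -> integer id; ids start at 1

    def fit(self, texts):
        # Assign the next free id to every character not seen before.
        for text in texts:
            for ch in text:
                if ch not in self.index:
                    self.index[ch] = len(self.index) + 1
        return self

    def transform(self, texts):
        # Characters that were never seen during fit map to 0.
        return [[self.index.get(ch, 0) for ch in text] for text in texts]


class SimpleLabelEncoder:
    """Hypothetical label encoder: maps each class name to its sorted position."""

    def __init__(self):
        self.classes = []

    def fit(self, labels):
        self.classes = sorted(set(labels))
        return self

    def transform(self, labels):
        return [self.classes.index(label) for label in labels]


# Made-up sample data for illustration only.
urls = ["ab.com", "ba.org"]
labels = ["legitimate", "phishing"]

# Fit once on the whole dataset ...
tokenizer = CharTokenizer().fit(urls)
encoder = SimpleLabelEncoder().fit(labels)

# ... then reuse the same fitted objects for training data and for inference.
train_x = tokenizer.transform(urls)
train_y = encoder.transform(labels)
inference_x = tokenizer.transform(["ab.org"])
```

Because the vocabulary is fixed at fit time, a URL seen at inference time is encoded with exactly the same character ids as during training, which is the property the packaged tokenizer and label encoder are meant to guarantee.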

The package release workflow is triggered automatically when a new Git tag is pushed. The library follows semantic versioning, with tags in the format v<major>.<minor>.<patch>. The package is published on PyPI.
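As a rough illustration of the tag format, the sketch below checks whether a string matches v<major>.<minor>.<patch> and extracts the three numbers. The name `parse_release_tag` is hypothetical, and rejecting leading zeros follows the semantic versioning specification rather than anything stated about this project's workflow, so treat that rule as an assumption.

```python
import re

# v<major>.<minor>.<patch>, each part a non-negative integer without leading zeros.
TAG_PATTERN = re.compile(r"v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse_release_tag(tag):
    """Return (major, minor, patch) for a well-formed tag, otherwise None."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


print(parse_release_tag("v1.1.1"))
print(parse_release_tag("1.1.1"))
```

The first call prints the parsed tuple, while the second prints None because the leading "v" is missing.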

Installation

pip install ml-lib-remla

Usage

To tokenize a single URL with this package, run the following in Python:

from ml_lib_remla.preprocessing import Preprocessing
pp = Preprocessing()
print(pp.tokenize_single("www.google.com"))

The Preprocessing class provides the following methods:

  • tokenize_batch: Tokenizes a list of URLs. Use this to preprocess the entire dataset.
  • tokenize_single: Tokenizes a single URL. Use this to preprocess one URL at inference time.
  • encode_label_batch: Encodes a list of labels. Use this to preprocess the target labels of the entire dataset.
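The two batch methods are typically used together when preparing a training set. The helper below is a minimal sketch of that pairing. `preprocess_dataset` and `SupportsPreprocessing` are hypothetical names introduced here; the sketch only assumes that the object passed in exposes `tokenize_batch` and `encode_label_batch` taking a list, as described above. The return types of those methods are not documented here, so the helper passes them through unchanged.

```python
from typing import Any, Protocol, Sequence, Tuple


class SupportsPreprocessing(Protocol):
    """Hypothetical interface: the two batch methods described above."""

    def tokenize_batch(self, urls: Sequence[str]) -> Any: ...

    def encode_label_batch(self, labels: Sequence[Any]) -> Any: ...


def preprocess_dataset(
    pp: SupportsPreprocessing,
    urls: Sequence[str],
    labels: Sequence[Any],
) -> Tuple[Any, Any]:
    """Tokenize the URLs and encode the labels of one dataset split."""
    if len(urls) != len(labels):
        raise ValueError("urls and labels must have the same length")
    features = pp.tokenize_batch(list(urls))
    targets = pp.encode_label_batch(list(labels))
    return features, targets
```

With the package installed, passing a `Preprocessing()` instance as `pp` should be enough, since the helper relies only on the methods listed above.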
