REMLA Team 6's ML Preprocessing Library

This is the phishing detection preprocessing library for CS4295 Release Engineering for Machine Learning Applications (Team 6) at TU Delft (Q4, 2024). The package implements the preprocessing steps (tokenization and label encoding) for the phishing detection dataset so that the same logic can be reused at both training and inference time. To support this, the library ships with a tokenizer and a label encoder that are fit on the entire dataset. Refer to the PreprocessingUtil class for more information on how the tokenizer and label encoder are created.
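The idea of fitting preprocessing state once and reusing it at inference can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: the character-level vocabulary, the helper names (fit_char_vocab, fit_label_map, tokenize), the reserved ids, and the JSON artifact are all assumptions made for the example.

```python
import json
from typing import Dict, List

PAD, UNK = 0, 1  # hypothetical reserved ids: padding and unseen characters


def fit_char_vocab(urls: List[str]) -> Dict[str, int]:
    """Assign an integer id to every character seen in the dataset."""
    chars = sorted({ch for url in urls for ch in url})
    return {ch: i + 2 for i, ch in enumerate(chars)}


def fit_label_map(labels: List[str]) -> Dict[str, int]:
    """Map each distinct label to a stable integer id."""
    return {label: i for i, label in enumerate(sorted(set(labels)))}


def tokenize(url: str, vocab: Dict[str, int], max_len: int = 16) -> List[int]:
    """Encode a URL as a fixed-length id sequence, truncating or padding."""
    ids = [vocab.get(ch, UNK) for ch in url[:max_len]]
    return ids + [PAD] * (max_len - len(ids))


# Training time: fit on the whole dataset once and serialise the state.
urls = ["www.google.com", "bad.example/login"]
labels = ["legitimate", "phishing"]
artifact = json.dumps(
    {"vocab": fit_char_vocab(urls), "labels": fit_label_map(labels)}
)

# Inference time: load the same state, so the encoding is identical.
state = json.loads(artifact)
print(tokenize("www.google.com", state["vocab"]))
```

Shipping the fitted state with the package, rather than refitting it in each consumer, is what keeps the training and inference encodings from drifting apart.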

The package release workflow is triggered automatically when a new Git tag is pushed. The library follows semantic versioning, with tags in the format v<major>.<minor>.<patch>. Releases are published to PyPI.
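A tag in this format can be checked before it is pushed. The sketch below is one possible validator; the function name parse_release_tag and the rule against leading zeros are assumptions for illustration, and the actual workflow's tag filter is not shown in this description.

```python
import re
from typing import Optional, Tuple

# v<major>.<minor>.<patch>, each part a non-negative integer without leading zeros
TAG_PATTERN = re.compile(r"v(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)\.(0|[1-9][0-9]*)")


def parse_release_tag(tag: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) for a valid tag, or None otherwise."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch


for candidate in ["v1.1.0", "1.1.0", "v1.1"]:
    print(candidate, parse_release_tag(candidate))
```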

Installation

pip install ml-lib-remla

Usage

To tokenize a single URL with this package, run the following lines in Python:

from ml_lib_remla.preprocessing import Preprocessing
pp = Preprocessing()
print(pp.tokenize_single("www.google.com"))

The Preprocessing class provides the following methods:

  • tokenize_batch: Tokenizes a list of URLs. This can be used to preprocess the entire dataset.
  • tokenize_single: Tokenizes a single URL. This can be used to preprocess one URL at inference time.
  • encode_label_batch: Encodes a list of labels. This can be used to preprocess the target labels of the entire dataset.
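The three methods can be combined into training and inference helpers along these lines. This is a sketch under assumptions: the helper names (prepare_training_data, prepare_inference_input) and the SupportsPreprocessing protocol are hypothetical, and the return types of the library's methods are not documented here, so they are typed as Any.

```python
from typing import Any, Protocol, Sequence, Tuple


class SupportsPreprocessing(Protocol):
    """Structural type for the three methods listed above (return types assumed)."""

    def tokenize_batch(self, urls: Sequence[str]) -> Any: ...

    def tokenize_single(self, url: str) -> Any: ...

    def encode_label_batch(self, labels: Sequence[Any]) -> Any: ...


def prepare_training_data(
    pp: SupportsPreprocessing, urls: Sequence[str], labels: Sequence[Any]
) -> Tuple[Any, Any]:
    """Tokenize all URLs and encode all labels for training."""
    if len(urls) != len(labels):
        raise ValueError("urls and labels must have the same length")
    return pp.tokenize_batch(urls), pp.encode_label_batch(labels)


def prepare_inference_input(pp: SupportsPreprocessing, url: str) -> Any:
    """Tokenize one URL with the same object used for training."""
    return pp.tokenize_single(url)
```

With the library installed, pp would be an instance of Preprocessing imported from ml_lib_remla.preprocessing. Passing the same object to both helpers keeps training and inference on one code path.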

Source Distribution

ml_lib_remla-1.1.0.tar.gz (5.8 kB)

Uploaded Source

Built Distribution

ml_lib_remla-1.1.0-py3-none-any.whl (6.5 kB)

Uploaded Python 3

File details

Details for the file ml_lib_remla-1.1.0.tar.gz.

File metadata

  • Download URL: ml_lib_remla-1.1.0.tar.gz
  • Upload date:
  • Size: 5.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.11.9 Linux/6.5.0-1021-azure

File hashes

Hashes for ml_lib_remla-1.1.0.tar.gz
Algorithm Hash digest
SHA256 8b4a7b6406a3b808b34cdfd6b998cdf1046be25efc256c5be01207eacdecd35a
MD5 9e2831b76c1adececc9e1afd9793f332
BLAKE2b-256 44fe32d9f8aa816ad768d1a987650d7e9bb2db0b431c331bdd0002a1ada8610c

File details

Details for the file ml_lib_remla-1.1.0-py3-none-any.whl.

File metadata

  • Download URL: ml_lib_remla-1.1.0-py3-none-any.whl
  • Upload date:
  • Size: 6.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.3 CPython/3.11.9 Linux/6.5.0-1021-azure

File hashes

Hashes for ml_lib_remla-1.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 34b0584da65130615eaf609f6b782e33df09e574ec2db4faade62862692c0812
MD5 d53b880e3323ad52e420bc1177f881f4
BLAKE2b-256 dc32a9b872ecf8ec993914643f8eefe4eb5e274c28ef694f0efcd41dd9e7c1f0
