REMLA Team 6's ML Preprocessing Library

This is the phishing detection preprocessing library for CS4295 Release Engineering for Machine Learning Applications (Team 6) at TU Delft (Q4, 2024). The package implements the preprocessing (tokenisation and label encoding) for the phishing detection dataset so that the same logic can be reused at both training and inference time. To support this, the library is packaged with a tokenizer and a label encoder that are fit on the entire dataset. Refer to the PreprocessingUtil class for more information on how the tokenizer and label encoder are created.
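
The reason for shipping a pre-fitted tokenizer and label encoder can be illustrated with a small, self-contained sketch. It is not the library's implementation: `fit_char_vocab` and `encode_url` are hypothetical helpers, and a character-level vocabulary is assumed purely for illustration. The point is only that the mapping is built once and that the same mapping is then applied at training and at inference time.

```python
from typing import Dict, Iterable, List


def fit_char_vocab(urls: Iterable[str]) -> Dict[str, int]:
    """Build a character-to-index mapping once, from the full dataset.

    Index 0 is reserved for characters never seen during fitting.
    """
    chars = sorted({ch for url in urls for ch in url})
    return {ch: i for i, ch in enumerate(chars, start=1)}


def encode_url(url: str, vocab: Dict[str, int]) -> List[int]:
    """Encode one URL with an already-fitted vocabulary (unknown chars map to 0)."""
    return [vocab.get(ch, 0) for ch in url]


# Fit once on a (hypothetical) dataset ...
dataset = ["www.google.com", "example.org/login"]
vocab = fit_char_vocab(dataset)

# ... then reuse the same mapping at training time and at inference time.
train_ids = [encode_url(u, vocab) for u in dataset]
inference_ids = encode_url("www.google.com", vocab)
```

If the vocabulary were refitted at inference time on different data, the same character could receive a different index, and the model would see inputs that no longer match what it was trained on.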

The package release workflow is triggered automatically when a new Git tag is pushed. We follow semantic versioning for the library, with tags in the format v<major>.<minor>.<patch>. The package is published on PyPI.
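
As a rough illustration of the tag format, the following sketch checks whether a tag matches v<major>.<minor>.<patch> and splits it into integers. The helper name `parse_release_tag` is hypothetical, and the actual tag filter used by the release workflow is not shown here, so treat the pattern as an assumption.

```python
import re
from typing import Optional, Tuple

# Assumed pattern: a literal "v" followed by three dot-separated integers.
TAG_PATTERN = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def parse_release_tag(tag: str) -> Optional[Tuple[int, int, int]]:
    """Return (major, minor, patch) for a well-formed tag, else None."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch
```

Returning a tuple of integers means versions compare numerically rather than as strings, so a tag such as v1.10.0 sorts after v1.9.9.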

Installation

pip install ml-lib-remla
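
To confirm which version was installed, the standard library's `importlib.metadata` can be queried for the distribution name used in the `pip install` command above. This is a minimal sketch; the helper name `installed_version` is hypothetical, and it assumes Python 3.8 or later.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "ml-lib-remla") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None
```

Calling `installed_version()` after installation should return a version string; a result of `None` means the package is not visible in the current environment.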

Usage

To tokenize a single URL with this package, run the following lines in Python:

from ml_lib_remla.preprocessing import Preprocessing
pp = Preprocessing()
print(pp.tokenize_single("www.google.com"))

The Preprocessing class provides the following methods:

  • tokenize_batch: Tokenizes a list of URLs. This can be used to preprocess the entire dataset.
  • tokenize_single: Tokenizes a single URL. This can be used to preprocess one URL at inference time.
  • encode_label_batch: Encodes a list of labels. This can be used to preprocess the target labels of the entire dataset.
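
The two batch methods listed above could be combined for dataset preparation roughly as follows. This is a sketch under assumptions: `preprocess_dataset` is a hypothetical helper, each method is assumed to accept a plain list as its only argument, and the return types are left as `Any` because they are not documented here.

```python
from typing import Any, Sequence, Tuple


def preprocess_dataset(
    pp: Any, urls: Sequence[str], labels: Sequence[str]
) -> Tuple[Any, Any]:
    """Tokenize URLs and encode labels with the same preprocessing object.

    `pp` is expected to expose tokenize_batch and encode_label_batch,
    as the Preprocessing class does.
    """
    if len(urls) != len(labels):
        raise ValueError("urls and labels must have the same length")
    tokens = pp.tokenize_batch(list(urls))          # assumed: takes a list of URLs
    encoded = pp.encode_label_batch(list(labels))   # assumed: takes a list of labels
    return tokens, encoded
```

With the package installed, the helper would be called as `preprocess_dataset(Preprocessing(), urls, labels)`, where the label values must be ones the packaged label encoder was fit on.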
