REMLA Team 6's ML Preprocessing Library
This is the implementation of the phishing detection pre-processing repository for CS4295 Release Engineering for Machine Learning Applications (Team 6) at TU Delft (Q4, 2024).
This package supports the preprocessing (tokenization and label encoding) for the phishing detection dataset so that the same logic can be re-used at both training and inference time. To support this, the library ships with a tokenizer and a label encoder that are fit on the entire dataset. See the PreprocessingUtil class for details on how the tokenizer and label encoder are created.
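The idea of fitting the tokenizer and label encoder once and then reusing them can be illustrated with a minimal, standard-library-only sketch. Everything in it is hypothetical: `CharTokenizer`, `LabelEncoder`, and the toy data are stand-ins chosen for illustration and are not the classes shipped in this package or its PreprocessingUtil.

```python
class CharTokenizer:
    """Hypothetical character-level tokenizer (not the library's implementation)."""

    def __init__(self):
        self.index = {}

    def fit(self, urls):
        # Give each distinct character an integer id, starting at 1.
        # Id 0 is reserved for characters never seen during fitting.
        for ch in sorted({c for url in urls for c in url}):
            self.index[ch] = len(self.index) + 1
        return self

    def tokenize(self, url):
        return [self.index.get(ch, 0) for ch in url]


class LabelEncoder:
    """Hypothetical label encoder mapping class names to integers."""

    def __init__(self):
        self.classes = []

    def fit(self, labels):
        self.classes = sorted(set(labels))
        return self

    def encode(self, labels):
        return [self.classes.index(label) for label in labels]


# Fit once on the (toy) dataset...
urls = ["abc.com", "bad.cc"]
labels = ["legitimate", "phishing"]
tokenizer = CharTokenizer().fit(urls)
encoder = LabelEncoder().fit(labels)

# ...then reuse the same fitted objects for training and for inference.
train_x = [tokenizer.tokenize(u) for u in urls]
train_y = encoder.encode(labels)
inference_x = tokenizer.tokenize("abc.com")
```

Because the same fitted objects are used in both places, a URL seen at inference time maps to exactly the ids it would have received during training.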
The package release workflow is triggered automatically when a new Git tag is pushed. We follow semantic versioning for the library in the format v<major>.<minor>.<patch>.
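As a rough illustration of the tag format, the sketch below parses a tag such as `v1.1.4` into its numeric parts. The helper name `parse_release_tag` and the regular expression are assumptions made for this example; the actual trigger pattern used by the release workflow is not shown here, and this simplified check deliberately ignores pre-release or build suffixes.

```python
import re

# Matches tags of the form v<major>.<minor>.<patch>, e.g. "v1.1.4".
TAG_PATTERN = re.compile(r"v(\d+)\.(\d+)\.(\d+)")


def parse_release_tag(tag):
    """Return (major, minor, patch) as integers, or None if the tag does not match."""
    match = TAG_PATTERN.fullmatch(tag)
    if match is None:
        return None
    major, minor, patch = (int(part) for part in match.groups())
    return major, minor, patch
```

Returning a tuple of integers makes versions comparable in the expected order, so `(1, 10, 0)` sorts after `(1, 9, 0)`.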
The package is published on PyPI as ml-lib-remla.
Installation
pip install ml-lib-remla
Usage
To tokenize a single URL with this package, run the following lines in Python:
from ml_lib_remla.preprocessing import Preprocessing
pp = Preprocessing()
print(pp.tokenize_single("www.google.com"))
The Preprocessing class supports the following functionalities:
- tokenize_batch: Tokenizes a list of URLs. Use it to pre-process the entire dataset.
- tokenize_single: Tokenizes a single URL. Use it to pre-process one URL at inference time.
- encode_label_batch: Encodes a list of labels. Use it to pre-process the target labels of the entire dataset.
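The three methods above might be combined as in the following sketch. The helper names `prepare_training_data` and `prepare_inference_input` are hypothetical, and the sketch assumes only what the list states: the batch methods take a list and `tokenize_single` takes one URL string. The return types are not documented here, so the helpers pass them through unchanged.

```python
def prepare_training_data(pp, urls, labels):
    """Tokenize all URLs and encode all labels with the same preprocessing object."""
    if len(urls) != len(labels):
        raise ValueError("urls and labels must have the same length")
    return pp.tokenize_batch(urls), pp.encode_label_batch(labels)


def prepare_inference_input(pp, url):
    """Tokenize one URL with the same preprocessing object used for training."""
    return pp.tokenize_single(url)
```

Here `pp` is expected to be a `Preprocessing()` instance created as in the usage example; passing it in as an argument keeps the helpers easy to test with a stand-in object.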
Download files
File details
Details for the file ml_lib_remla-1.1.4.tar.gz.
File metadata
- Download URL: ml_lib_remla-1.1.4.tar.gz
- Size: 5.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.11.9 Linux/6.5.0-1022-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6d2084d2b7fbfb26d9baf6f872a1a15a1d169f8a409c4f9dc7cbde147b62c9f5
MD5 | 5ecb2da92e8eb8ab14fa658e8315e00d
BLAKE2b-256 | 1941a1b9018b96b899ea7c09b38d8c037e90e35153fc30a2144a922d41468204
File details
Details for the file ml_lib_remla-1.1.4-py3-none-any.whl.
File metadata
- Download URL: ml_lib_remla-1.1.4-py3-none-any.whl
- Size: 6.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.11.9 Linux/6.5.0-1022-azure
File hashes
Algorithm | Hash digest
---|---
SHA256 | 00904b09260fdefa547b226f64988371772161b6f5770fdaaf2b6aa6df234fbe
MD5 | adc06615bd34456296322809ff7b4d1a
BLAKE2b-256 | 168b079bd51db8512875b4fd29c1948ca3591a172c46357ba4db21ed3ee62a5a
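The published digests can be used to check a downloaded file. The sketch below computes a file's SHA256 digest with Python's standard `hashlib` module and compares it to an expected value. The helper names `sha256_of_file` and `matches_published_digest` are hypothetical, and the local file path is whatever location the file was downloaded to.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published one, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, calling `matches_published_digest` with the path of the downloaded wheel and the SHA256 value from the table above should return `True` for an intact download.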