Machine Learning Toolbox
Machine Learning Toolbox 2 - MLTB2
A box of machine learning tools.
The main components are:
from mltb2.somajo import SoMaJoSentenceSplitter
Splits text into sentences using the SoMaJo tool.
German and English are supported.
from mltb2.transformers import TransformersTokenCounter
Counts the tokens that a Transformers tokenizer produces for a text.
from mltb2.somajo_transformers import TextSplitter
Splits text into sections with a specified maximum token length.
It splits only at sentence boundaries, so words and sentences are never divided.
from mltb2.optuna import SignificanceRepeatedTrainingPruner
An Optuna pruner that uses statistical significance (a t-test, which serves as a heuristic) to stop
unpromising trials early, avoiding unnecessary repeated training during cross-validation.
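The idea behind significance-based pruning can be illustrated with a minimal sketch. This is not MLTB2's implementation and does not use its API. The function names, the `critical_value` and `min_folds` parameters, and the assumption that higher scores are better are all hypothetical. The sketch compares the fold scores of the current trial with those of the best trial so far using Welch's t statistic, and it uses a fixed cutoff where a full t-test would derive the threshold from the t distribution, the degrees of freedom, and a chosen significance level.

```python
import math
from statistics import mean, variance
from typing import Sequence


def welch_t_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Welch's t statistic for the difference of means (a minus b)."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        # No variance in either sample: the statistic is 0 or infinite.
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def should_prune(
    trial_scores: Sequence[float],
    best_scores: Sequence[float],
    critical_value: float = 2.0,  # hypothetical fixed cutoff, a rough heuristic
    min_folds: int = 3,           # hypothetical: wait for a few folds first
) -> bool:
    """Return True if the trial looks significantly worse than the best trial."""
    if len(trial_scores) < min_folds or len(best_scores) < min_folds:
        return False
    # Assumes higher scores are better.
    return welch_t_statistic(best_scores, trial_scores) > critical_value


best = [0.90, 0.91, 0.89, 0.92]
print(should_prune([0.70, 0.72, 0.71], best))  # clearly worse trial
print(should_prune([0.90, 0.88, 0.91], best))  # comparable trial
```

With such a check, a clearly worse trial can be stopped after a few folds instead of completing every repetition.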
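The sentence-preserving splitting described for TextSplitter can be sketched as a greedy packing of whole sentences. This is an illustration only, not MLTB2's code or API. `pack_sentences` and `whitespace_token_count` are hypothetical names, and the whitespace counter merely stands in for a real Transformers tokenizer. The sketch also assumes that a section's token count is the sum of its sentences' counts, which is only approximately true for real tokenizers.

```python
from typing import Callable, List


def pack_sentences(
    sentences: List[str],
    max_tokens: int,
    count_tokens: Callable[[str], int],
) -> List[str]:
    """Greedily join whole sentences into sections of at most max_tokens tokens."""
    sections: List[str] = []
    current: List[str] = []
    current_tokens = 0
    for sentence in sentences:
        n = count_tokens(sentence)
        # Start a new section when adding this sentence would exceed the limit.
        if current and current_tokens + n > max_tokens:
            sections.append(" ".join(current))
            current, current_tokens = [], 0
        # A single sentence longer than max_tokens stays whole in its own section.
        current.append(sentence)
        current_tokens += n
    if current:
        sections.append(" ".join(current))
    return sections


def whitespace_token_count(text: str) -> int:
    # Stand-in for a real tokenizer: counts whitespace-separated words.
    return len(text.split())


sentences = ["This is one.", "Here is another sentence.", "Short."]
print(pack_sentences(sentences, max_tokens=5, count_tokens=whitespace_token_count))
# → ['This is one.', 'Here is another sentence. Short.']
```

In this sketch a sentence that exceeds the limit on its own is kept intact as its own section, since dividing it would break the whole-sentence guarantee.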
Installation
MLTB2 is available on the Python Package Index (PyPI) and can be installed with pip:
pip install mltb2
Depending on the components you use, optional dependencies may be required. You can install all of them with:
pip install mltb2[optional]
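To confirm that the installation worked, the installed version can be queried from Python. The following is a small sketch using only the standard library (Python 3.8 or later). The helper name `installed_version` is hypothetical and not part of MLTB2.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string if mltb2 is installed, otherwise None.
print(installed_version("mltb2"))
```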
Licensing
Copyright (c) 2023 Philip May
Copyright (c) 2023 Philip May, Deutsche Telekom AG
Licensed under the MIT License (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License by reviewing the file LICENSE in the repository.