Python package to sanitize ML-related labels in a standard way.

Development Status
- 5 - Production/Stable
License
- OSI Approved :: MIT License
Programming Language
- Python :: 3

Project description

Sanitize ML Labels

Sanitize ML Labels is a Python package designed to standardize and sanitize ML-related labels. It currently supports over 100 labels, including metric and model names.

If you have ML-related labels and find yourself repeatedly renaming them by hand to get consistent names and proper capitalization, this package ensures they are always sanitized in a standard way.

How do I install this package?

You can install it using pip:

pip install sanitize_ml_labels

Usage examples

Here are some common use cases for normalizing labels:

Example for metrics

from sanitize_ml_labels import sanitize_ml_labels

labels = [
    "acc",
    "loss",
    "auroc",
    "lr"
]

assert sanitize_ml_labels(labels) == [
    "Accuracy",
    "Loss",
    "AUROC",
    "Learning rate"
]

Example for models

from sanitize_ml_labels import sanitize_ml_labels

labels = [
    "mlp",
    "cnn",
    "ffNN",
    "Feed-forward neural network",
    "perceptron",
    "recurrent neural network",
    "LStM"
]

assert sanitize_ml_labels(labels) == [
    "MLP",
    "CNN",
    "FFNN",
    "FFNN",
    "Perceptron",
    "RNN",
    "LSTM"
]

assert sanitize_ml_labels("vanilla mlp") == "MLP"
assert sanitize_ml_labels("vanilla cnn") == "CNN"

assert sanitize_ml_labels([
    "Large Language Model",
    "transe",
    "Generative Pre-trained Transformer",
    "Graph Convolutional Neural Network",
    "Convolutional Graph Neural Network",
    "Graph Neural Network",
    "Graph Attention Network",
    "Graph Attention Neural Network",
]) == ["LLM","TransE","GPT","GCN","GCN","GNN","GAT","GAT"]

Model names are sometimes prefixed with qualifiers such as "vanilla", "simple", or "basic". This package removes these prefixes.

from sanitize_ml_labels import sanitize_ml_labels

labels = [
    "vanilla mlp",
    "vanilla cnn",
    "vanilla ffnn",
    "vanilla perceptron"
]

assert sanitize_ml_labels(labels) == ["MLP", "CNN", "FFNN", "Perceptron"]
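
The prefix removal above can be pictured with a minimal sketch. This is an illustration only, not the package's actual implementation: the function name strip_generic_prefix and the PREFIXES tuple are hypothetical, and the sketch only drops the prefix without normalizing the remaining label.

```python
# Hypothetical sketch: NOT the package's actual implementation.
PREFIXES = ("vanilla", "simple", "basic")


def strip_generic_prefix(label: str) -> str:
    """Drop one leading generic qualifier such as "vanilla", if present."""
    words = label.split()
    # Only strip when something remains after the prefix.
    if len(words) > 1 and words[0].lower() in PREFIXES:
        words = words[1:]
    return " ".join(words)


print(strip_generic_prefix("vanilla mlp"))  # mlp
print(strip_generic_prefix("Simple cnn"))   # cnn
print(strip_generic_prefix("basic"))        # basic
```

In the package itself, the stripped label is additionally mapped to its standard form (for example "MLP"), as the assertions above show.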

Corner cases

Sometimes, you might encounter hyphenated terms that need to be correctly identified and normalized. We use a heuristic approach based on an extended list of over 45K hyphenated English words, originally from the Metadata consulting website.

The lookup heuristic, written by Tommaso Fontana, ensures efficient and accurate hyphenated word recognition.

from sanitize_ml_labels import sanitize_ml_labels

# The hyphen in the known word "non-existent" is kept; the other hyphens become spaces
assert sanitize_ml_labels("non-existent-edges-in-graph") == "Non-existent edges in graph"
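
As a rough illustration of the idea, the sketch below keeps a hyphen only when the two tokens around it form a known hyphenated word. It is an assumption-laden toy, not the heuristic shipped with the package: KNOWN_HYPHENATED is a three-word stand-in for the real word list, split_hyphens is a hypothetical name, and only two-part hyphenated words are handled.

```python
# Hypothetical sketch: a tiny stand-in for the word list used by the package.
KNOWN_HYPHENATED = {"non-existent", "feed-forward", "pre-trained"}


def split_hyphens(label: str) -> str:
    """Keep hyphens inside known hyphenated words; turn the others into spaces."""
    tokens = label.split("-")
    words = []
    i = 0
    while i < len(tokens):
        # Try to merge the current token with the next one.
        if i + 1 < len(tokens):
            candidate = f"{tokens[i]}-{tokens[i + 1]}"
            if candidate.lower() in KNOWN_HYPHENATED:
                words.append(candidate)
                i += 2
                continue
        words.append(tokens[i])
        i += 1
    sentence = " ".join(words)
    # Capitalize only the first character, leaving the rest untouched.
    return sentence[:1].upper() + sentence[1:]


print(split_hyphens("non-existent-edges-in-graph"))  # Non-existent edges in graph
```

A set lookup keeps each membership check constant-time on average, which is one simple way to stay efficient with a large word list.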

Extra utilities

In addition to label sanitization, the package provides helpers that report a metric's range and whether it should be maximized:

Is normalized metric

Checks whether a metric's values fall within the range [0, 1].

from sanitize_ml_labels import is_normalized_metric

assert not is_normalized_metric("MSE")
assert is_normalized_metric("acc")
assert is_normalized_metric("accuracy")
assert is_normalized_metric("AUROC")
assert is_normalized_metric("auprc")

Is absolutely normalized metric

Checks whether a metric's values fall within the range [-1, 1].

from sanitize_ml_labels import is_absolutely_normalized_metric

assert not is_absolutely_normalized_metric("auprc")
assert is_absolutely_normalized_metric("MCC")
assert is_absolutely_normalized_metric("Markedness")
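
One possible use of these two checks is choosing axis limits when plotting a metric. The sketch below is a suggestion rather than part of the package: axis_limits is a hypothetical helper, and the two demo predicates are simplified stand-ins so that the example runs without the package installed.

```python
from typing import Callable, Optional, Tuple


def axis_limits(
    metric: str,
    is_normalized: Callable[[str], bool],
    is_absolutely_normalized: Callable[[str], bool],
) -> Optional[Tuple[float, float]]:
    """Return fixed axis limits for bounded metrics, or None to autoscale."""
    if is_normalized(metric):
        return (0.0, 1.0)
    if is_absolutely_normalized(metric):
        return (-1.0, 1.0)
    return None


# Simplified stand-ins (assumptions) for the package's predicates.
def demo_is_normalized(metric: str) -> bool:
    return metric.lower() in {"acc", "accuracy", "auroc", "auprc"}


def demo_is_absolutely_normalized(metric: str) -> bool:
    return metric.lower() in {"mcc", "markedness"}


print(axis_limits("AUROC", demo_is_normalized, demo_is_absolutely_normalized))
print(axis_limits("MCC", demo_is_normalized, demo_is_absolutely_normalized))
print(axis_limits("MSE", demo_is_normalized, demo_is_absolutely_normalized))
```

With the package installed, is_normalized_metric and is_absolutely_normalized_metric can be passed in place of the stand-ins.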

Should be maximized

Reports whether a metric should be maximized rather than minimized. Unknown metrics raise a NotImplementedError.

from sanitize_ml_labels import should_be_maximized

assert not should_be_maximized("MSE")
assert should_be_maximized("AUROC")
assert should_be_maximized("accuracy")
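
A typical use is picking the best run for a given metric without hard-coding the direction. The following is a minimal sketch under stated assumptions: best_run is a hypothetical helper, and demo_should_be_maximized is a simplified stand-in for the package's function so that the example is self-contained.

```python
from typing import Callable, Dict


def best_run(
    scores: Dict[str, float],
    metric: str,
    maximize: Callable[[str], bool],
) -> str:
    """Return the name of the run with the best score for the given metric."""
    # Use max for metrics to maximize, min for metrics to minimize.
    pick = max if maximize(metric) else min
    return pick(scores, key=scores.get)


# Simplified stand-in (assumption) for the package's should_be_maximized.
def demo_should_be_maximized(metric: str) -> bool:
    return metric.lower() not in {"mse", "loss"}


runs = {"run_a": 0.9, "run_b": 0.7}
print(best_run(runs, "AUROC", demo_should_be_maximized))  # run_a
print(best_run(runs, "MSE", demo_should_be_maximized))    # run_b
```

With the package installed, should_be_maximized can be passed as the maximize argument; since it raises NotImplementedError for unknown metrics, callers may want to handle that case explicitly.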

License

This software is licensed under the MIT license. See the LICENSE file.

Release history

This version

1.1.5

Jan 7, 2026

1.1.4

Oct 28, 2024

1.1.3

Oct 28, 2024

1.1.2

Oct 2, 2024

1.1.0

Aug 2, 2024

1.0.51

Sep 27, 2023

1.0.50

Nov 11, 2022

1.0.49

Aug 23, 2022

1.0.48

Aug 21, 2022

1.0.47

Aug 21, 2022

1.0.46

Aug 21, 2022

1.0.45

Aug 19, 2022

1.0.44

Aug 19, 2022

1.0.43

Jul 1, 2022

1.0.42

Jun 8, 2022

1.0.41

Jun 1, 2022

1.0.40

May 24, 2022

1.0.39

May 23, 2022

1.0.38

May 5, 2022

1.0.37

May 1, 2022

1.0.36

May 1, 2022

1.0.35

Apr 30, 2022

1.0.33

Apr 19, 2022

1.0.32

Apr 19, 2022

1.0.31

Apr 14, 2022

1.0.30

Mar 28, 2022

1.0.29

Nov 22, 2021

1.0.28

Nov 22, 2021

1.0.27

Nov 18, 2021

1.0.26

Apr 8, 2021

1.0.25

Apr 8, 2021

1.0.24

Jan 5, 2021

1.0.23

Dec 6, 2020

1.0.22

Nov 30, 2020

1.0.21

Nov 6, 2020

1.0.20

Nov 6, 2020

1.0.19

Nov 5, 2020

1.0.18

Oct 31, 2020

1.0.17

Oct 31, 2020

1.0.16

Oct 19, 2020

1.0.15

Oct 18, 2020

1.0.14

Oct 18, 2020

1.0.13

Sep 27, 2020

1.0.12

Jul 30, 2020

1.0.11

Jul 11, 2020

1.0.10

Jul 11, 2020

1.0.9

Mar 9, 2020

1.0.8

Feb 23, 2020

1.0.7

Jan 19, 2020

1.0.6

Nov 17, 2019

1.0.5

Nov 17, 2019

1.0.4

Nov 17, 2019

1.0.3

Nov 17, 2019

1.0.2

Nov 17, 2019

1.0.1

Nov 17, 2019

1.0.0

Nov 17, 2019

Download files

Source Distribution

sanitize_ml_labels-1.1.5.tar.gz (327.4 kB)

Uploaded Jan 7, 2026 Source

File details

Details for the file sanitize_ml_labels-1.1.5.tar.gz.

File metadata

Download URL: sanitize_ml_labels-1.1.5.tar.gz
Upload date: Jan 7, 2026
Size: 327.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for sanitize_ml_labels-1.1.5.tar.gz
Algorithm	Hash digest
SHA256	`af777b269aac26270faf501224836a145a3a0bdb028b688c45d777645098f77c`
MD5	`d6578e03e9d6e85f2289c5c1c4e4ddb7`
BLAKE2b-256	`6b43dc0d941b7476ac62884ae21d1c02dee2c36bf357859e9a429de0871d7855`
