Training Data Quality Analyzer — analyze labeled text classification datasets for quality issues

title: LabelLens
emoji: 🔍
colorFrom: blue
colorTo: purple
sdk: docker
app_file: app.py
pinned: false
license: mit

Label Lens

Training data quality analyzer for text classification datasets. Upload a CSV with text and label columns, get an automated quality report with actionable recommendations.

Features

Auto-detect columns — Automatically identifies text and label columns in your CSV
Class distribution analysis — Imbalance ratio, effective number of classes, long-tail detection, suggested focal loss weights
Duplicate detection — Exact duplicates and near-duplicates via TF-IDF cosine similarity, with cross-class conflicts flagged as critical
Label noise scoring — Cross-validated confidence scoring to surface likely mislabels
Actionable report — Severity ratings (Critical/Warning/Info) with specific recommendations
Interactive visualizations — Plotly charts for exploring your data

Quick Start

As a web app

pip install "label-lens[app]"
streamlit run app.py

Or with uv:

uv sync
uv run streamlit run app.py

A sample dataset is included for demo purposes.

As a library

pip install label-lens

import pandas as pd
from label_lens import (
    analyze_distribution,
    find_exact_duplicates,
    find_near_duplicates,
    score_label_noise,
    generate_report,
)

df = pd.read_csv("your_dataset.csv")  # must have 'text' and 'label' columns

dist = analyze_distribution(df)
dups = find_exact_duplicates(df)
near_dups = find_near_duplicates(df)
noise = score_label_noise(df)

report = generate_report(dist, dups, near_dups, noise)
print(report["overall_severity"])  # "Critical", "Warning", or "Info"
print(report["recommendations"])

If your CSV uses different column names, use prepare_dataframe to standardize them:

from label_lens import prepare_dataframe

df = prepare_dataframe(raw_df, text_col="content", label_col="category")

Installation

Requires Python 3.13+.

# Library only (pandas, numpy, scikit-learn)
pip install label-lens

# With Streamlit app and Plotly charts
pip install "label-lens[app]"

# Development
pip install "label-lens[dev]"

How It Works

Distribution analysis computes the imbalance ratio and the entropy-based effective class count, and identifies long-tail classes (those with under 1% representation). It also calculates inverse-frequency focal loss alpha values.
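
The metrics described above can be sketched roughly as follows. This is a minimal illustration, not the library's implementation: the function name sketch_distribution, the returned keys, the max/min definition of the imbalance ratio, and the normalization of the alpha values to sum to 1 are all assumptions.

```python
import numpy as np
import pandas as pd


def sketch_distribution(labels: pd.Series, tail_threshold: float = 0.01) -> dict:
    """Illustrative class-distribution metrics (hypothetical helper)."""
    # value_counts() lists only classes that occur, so every probability is > 0
    # (a categorical dtype with unused categories would need filtering first).
    counts = labels.value_counts()
    probs = counts / counts.sum()

    # Imbalance ratio, assumed here to be largest class size / smallest class size.
    imbalance_ratio = float(counts.max() / counts.min())

    # Effective number of classes: exponential of the Shannon entropy.
    entropy = -(probs * np.log(probs)).sum()
    effective_classes = float(np.exp(entropy))

    # Long-tail classes: share of the dataset below the threshold (1% by default).
    long_tail_classes = probs[probs < tail_threshold].index.tolist()

    # Inverse-frequency alpha values, normalized to sum to 1 (assumed normalization).
    inverse = 1.0 / probs
    focal_alpha = (inverse / inverse.sum()).to_dict()

    return {
        "imbalance_ratio": imbalance_ratio,
        "effective_classes": effective_classes,
        "long_tail_classes": long_tail_classes,
        "focal_alpha": focal_alpha,
    }


# Small demo with a skewed three-class label column.
demo = sketch_distribution(pd.Series(["spam"] * 90 + ["ham"] * 9 + ["other"] * 1))
print(demo["imbalance_ratio"], round(demo["effective_classes"], 2))
print(demo["focal_alpha"])
```

An effective class count well below the nominal number of classes is a quick signal that a few classes dominate the dataset.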

Duplicate detection finds exact text matches and uses TF-IDF vectorization with chunked cosine similarity to find near-duplicates. Cross-class duplicates (same text, different labels) are flagged as critical since they represent definite labeling errors.
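
One way to picture chunked cosine similarity is sketched below. It is an assumption-laden illustration rather than the code in duplicates.py: the function name sketch_near_duplicates, the 0.9 threshold, the chunk size, and the output format are hypothetical. It relies on the fact that scikit-learn's TfidfVectorizer L2-normalizes rows by default, so a dot product between rows equals their cosine similarity.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer


def sketch_near_duplicates(texts, labels, threshold=0.9, chunk_size=1000):
    """Return near-duplicate pairs (i < j) with a cross-class flag (hypothetical helper)."""
    tfidf = TfidfVectorizer().fit_transform(texts)
    n = tfidf.shape[0]
    pairs = []
    for start in range(0, n, chunk_size):
        stop = min(start + chunk_size, n)
        # Compare one chunk of rows against all rows: a (chunk, n) block
        # is held in memory instead of the full (n, n) similarity matrix.
        sims = (tfidf[start:stop] @ tfidf.T).toarray()
        rows, cols = np.nonzero(sims >= threshold)
        for r, c in zip(rows, cols):
            i, j = int(start + r), int(c)
            if i < j:  # keep each unordered pair once and skip self-matches
                pairs.append({
                    "i": i,
                    "j": j,
                    "similarity": float(sims[r, c]),
                    # Same or similar text with different labels is a conflict.
                    "cross_class": bool(labels[i] != labels[j]),
                })
    return pairs


demo_texts = [
    "refund my order please",
    "refund my order please",
    "the weather is lovely today",
]
demo_labels = ["billing", "shipping", "smalltalk"]
for pair in sketch_near_duplicates(demo_texts, demo_labels):
    print(pair)
```

Chunking trades a little extra looping for bounded memory, which matters once the dataset has tens of thousands of rows.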

Noise scoring trains a logistic regression on TF-IDF features using stratified k-fold cross-validation. For each sample, it records the model's confidence in the given label. Samples whose confidence falls in the bottom 5th percentile are flagged as mislabel suspects.
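
The scoring idea can be approximated with standard scikit-learn pieces, as in the sketch below. It is not the code in noise.py: the function name sketch_noise_scores, the fold count, the random seed, and the choice to refit TF-IDF inside each fold via a pipeline are assumptions made for illustration.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import LabelEncoder


def sketch_noise_scores(texts, labels, n_splits=5, percentile=5.0, seed=0):
    """Return (confidence per sample, indices of suspects) (hypothetical helper)."""
    # Encode labels as 0..K-1 so they index the probability columns directly.
    y = LabelEncoder().fit_transform(labels)
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)

    # Out-of-fold probabilities: each sample is scored by a model that never saw it.
    proba = cross_val_predict(model, texts, y, cv=cv, method="predict_proba")

    # Confidence in the given label is the probability assigned to that label.
    confidence = proba[np.arange(len(y)), y]

    # Flag samples at or below the chosen percentile of confidence.
    cutoff = np.percentile(confidence, percentile)
    suspects = np.flatnonzero(confidence <= cutoff)
    return confidence, suspects


# Tiny synthetic demo: index 0 carries a deliberately flipped label.
demo_texts = [f"great product number {i}" for i in range(10)]
demo_texts += [f"terrible service number {i}" for i in range(10)]
demo_labels = ["pos"] * 10 + ["neg"] * 10
demo_labels[0] = "neg"
conf, suspects = sketch_noise_scores(demo_texts, demo_labels, n_splits=3)
print(suspects, conf[suspects])
```

Each class needs at least n_splits samples for the stratified split to work, and low confidence only marks a sample for review; hard or ambiguous examples also score low.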

Project Structure

label_lens/
├── ingest.py         # Column detection, validation, DataFrame prep
├── distribution.py   # Class distribution analysis + visualization
├── duplicates.py     # Exact and near-duplicate detection
├── noise.py          # Label noise scoring via cross-validated confidence
├── report.py         # Aggregate findings and generate recommendations
└── utils.py          # Shared helpers

Development

# Install dev dependencies
uv sync --all-extras

# Run tests
pytest tests/ -v

# Lint and format
ruff check .
ruff format .

Deployment

Label Lens is designed for deployment on Hugging Face Spaces using Docker. The Dockerfile at the repo root handles the build.

Tech Stack

Python 3.13+
Streamlit
pandas / numpy
scikit-learn (TF-IDF, logistic regression, cross-validation)
Plotly

License

MIT

Built by Mike Noe

Project details

Release history

This version: 0.1.0 (May 18, 2026)

Download files

Source Distribution

label_lens-0.1.0.tar.gz (13.7 kB)

Uploaded May 18, 2026 Source

Built Distribution

label_lens-0.1.0-py3-none-any.whl (12.0 kB)

Uploaded May 18, 2026 Python 3

File details

Details for the file label_lens-0.1.0.tar.gz.

File metadata

Download URL: label_lens-0.1.0.tar.gz
Upload date: May 18, 2026
Size: 13.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.2 (uv publish, macOS)

File hashes

Hashes for label_lens-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`6d07edddf9272a0fd0126dc3c63531de61cc1b61eba93f7f1280e07dbfe8a51e`
MD5	`d1203037bfbb8c399a4787fdaae43976`
BLAKE2b-256	`3fd9259602b2cd51106c158fdb16f9549d529b844228f8aa107191e571ecb150`

File details

Details for the file label_lens-0.1.0-py3-none-any.whl.

File metadata

Download URL: label_lens-0.1.0-py3-none-any.whl
Upload date: May 18, 2026
Size: 12.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.2 (uv publish, macOS)

File hashes

Hashes for label_lens-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2e0d1befe7716fdea22d44612d89133f55ac47eb1f412380febcb6232269e1f8`
MD5	`ce782cf0934c71f099bc4c77d891884f`
BLAKE2b-256	`d4e05db21e8fa7188488ad0a00af33c881488147667301c263596ccc0200996e`
