A dictionary-based text classification library

Clear BOW 📚

License: MIT

Clear BOW is a lightweight dictionary-based classifier that converts word frequencies into label probabilities, using softmax for multi-class and sigmoid for multi-label classification. It is well suited to bootstrapping classifications from terminology lists.
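To make the word-frequency step concrete, here is a minimal sketch of counting dictionary terms per label. It is not the library's implementation: the function name `count_terms`, the case-insensitive matching, and the whole-word regex are all assumptions made for illustration.

```python
import re


def count_terms(text: str, label_dictionary: dict[str, list[str]]) -> dict[str, int]:
    """Count occurrences of each label's terms in the text (hypothetical helper)."""
    lowered = text.lower()
    counts = {}
    for label, terms in label_dictionary.items():
        total = 0
        for term in terms:
            # Word boundaries keep "tax" from matching inside "taxation";
            # multi-word terms such as "after tax" are matched as a phrase.
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            total += len(re.findall(pattern, lowered))
        counts[label] = total
    return counts


super_dict = {
    "regulation": ["asic", "government", "federal", "tax"],
    "contribution": ["contribution", "concession", "personal", "after tax"],
    "fund": ["unisuper", "aus super", "sun super", "qsuper"],
}

counts = count_terms("A 10% contribution to your super fund", super_dict)
print(counts)
```

Under these assumptions only the "contribution" label registers a hit for this sentence, because "super fund" is not one of the listed fund names.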

Features

  • 🔍 Dictionary-based classification
  • 📊 Multi-class (softmax) support
  • 🏷️ Multi-label (sigmoid) support
  • 📝 Simple terminology lists
  • 🔢 Probability outputs
  • 💾 Model save/load functionality
  • 🎯 93% test coverage

Installation

# Via pip
pip install clear-bow

# Or from source
git clone https://github.com/samhardyhey/clear-bow
cd clear-bow
pip install -e .

Usage

from clear_bow.classifier import DictionaryClassifier

# Define your dictionary
super_dict = {
    "regulation": ["asic", "government", "federal", "tax"],
    "contribution": ["contribution", "concession", "personal", "after tax"],
    "fund": ["unisuper", "aus super", "sun super", "qsuper"],
}

# Create classifier (multi-class by default)
dc = DictionaryClassifier(label_dictionary=super_dict)

# Or for multi-label classification
dc = DictionaryClassifier(
    label_dictionary=super_dict,
    classifier_type="multi_label"
)

# Make predictions
result = dc.predict_single("A 10% contribution to your super fund")
# Returns probability distribution across labels

# Batch predictions
results = dc.predict_batch([
    "A 10% contribution to your super fund",
    "Government regulation of super funds"
])

# Save model to disk
dc.to_disk("path/to/model")

# Load model from disk
dc = DictionaryClassifier()
dc.from_disk("path/to/model")

Development

# Setup development environment
make setup-local-dev
source venv/bin/activate

# Run tests
make test-local

# Run tests with coverage
make test-coverage

# Multi-environment testing
make test-tox

# Build distribution
make dist-bundle-build

# Clean build artifacts
make clean

# Upload to PyPI
make publish

Project Structure

clear-bow/
├── src/
│   └── clear_bow/
│       ├── __init__.py
│       └── classifier.py
├── tests/
│   ├── conftest.py
│   └── test_classifier.py
├── pyproject.toml    # Project configuration
├── tox.ini          # Multi-environment testing
└── makefile         # Development commands

Features in Detail

Multi-class Classification

  • Uses softmax transformation
  • Outputs sum to 1.0
  • Best for mutually exclusive categories
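
As a rough illustration of the softmax step, the sketch below turns per-label term counts into probabilities that sum to 1.0. The input counts and the function name are assumptions; the library may scale or smooth its counts differently before applying softmax.

```python
import math


def softmax(counts: dict[str, float]) -> dict[str, float]:
    """Map per-label scores to probabilities that sum to 1.0."""
    # Subtracting the maximum keeps exp() numerically stable
    # without changing the result.
    peak = max(counts.values())
    exps = {label: math.exp(value - peak) for label, value in counts.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Assumed term counts for a single document.
probs = softmax({"regulation": 0, "contribution": 1, "fund": 0})
print(probs)
print(sum(probs.values()))
```

Because the outputs compete for a fixed total, raising one label's count lowers every other label's probability, which is why this mode suits mutually exclusive categories.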

Multi-label Classification

  • Uses sigmoid transformation
  • Each label gets independent probability
  • Best for non-exclusive categories
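
For comparison, a minimal sigmoid sketch scores each label on its own, so the outputs need not sum to 1.0. The function names and input counts are assumptions, and the library may shift or scale counts before applying the sigmoid; with a plain sigmoid, as here, a count of zero maps to 0.5.

```python
import math


def sigmoid(value: float) -> float:
    """Squash a single score into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-value))


def multi_label_scores(counts: dict[str, float]) -> dict[str, float]:
    """Score each label independently of the others."""
    return {label: sigmoid(value) for label, value in counts.items()}


# Assumed term counts for a single document.
scores = multi_label_scores({"regulation": 2, "contribution": 1, "fund": 0})
print(scores)
```

Since no normalisation is shared across labels, several labels can score highly at once, which is the behaviour wanted for non-exclusive categories.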

Error Handling

  • Validates classifier types
  • Handles missing/invalid files
  • Provides informative error messages
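
The kind of classifier-type validation described above could look like the sketch below. This is illustrative only: the helper name, the exception type, and the `"multi_class"` value are assumptions (only `"multi_label"` appears in the usage example), so check the tests for the library's actual behaviour.

```python
# "multi_label" comes from the usage example; "multi_class" is an assumed name.
VALID_CLASSIFIER_TYPES = ("multi_class", "multi_label")


def check_classifier_type(classifier_type: str) -> str:
    """Return the type unchanged, or raise with a message listing valid options."""
    if classifier_type not in VALID_CLASSIFIER_TYPES:
        raise ValueError(
            f"Unknown classifier_type {classifier_type!r}; "
            f"expected one of {VALID_CLASSIFIER_TYPES}"
        )
    return classifier_type


print(check_classifier_type("multi_label"))
```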

File Operations

  • Save model configuration
  • Save label dictionaries
  • Load models from disk
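
One plausible way to persist a model of this kind is to write the configuration and the label dictionary as JSON files in a directory. The sketch below assumes that layout; the file names and helper functions are hypothetical and do not describe the on-disk format used by `to_disk` and `from_disk`.

```python
import json
from pathlib import Path


def save_model(directory: str, label_dictionary: dict, classifier_type: str) -> None:
    """Write the configuration and label dictionary as two JSON files."""
    target = Path(directory)
    target.mkdir(parents=True, exist_ok=True)
    config = {"classifier_type": classifier_type}
    (target / "config.json").write_text(json.dumps(config, indent=2))
    (target / "label_dictionary.json").write_text(json.dumps(label_dictionary, indent=2))


def load_model(directory: str) -> tuple[dict, str]:
    """Read both files back, failing early if the directory is missing."""
    target = Path(directory)
    if not target.is_dir():
        raise FileNotFoundError(f"Model directory not found: {target}")
    config = json.loads((target / "config.json").read_text())
    label_dictionary = json.loads((target / "label_dictionary.json").read_text())
    return label_dictionary, config["classifier_type"]
```

Keeping the dictionary in a plain JSON file means the terminology lists can be reviewed and edited by hand between runs.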

Note: See tests for additional usage examples and edge cases.

License

MIT License - See LICENSE file for details.
