A dictionary-based text classification library
Clear BOW 📚
A lightweight dictionary-based classifier that converts word frequencies into label probabilities, using a softmax function for multi-class output or a sigmoid function for multi-label output. It is useful for bootstrapping classifications from terminology lists.
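To make "word frequencies" concrete, here is a minimal sketch of the counting step. It illustrates the idea and is not the library's implementation: the function name `count_label_terms` and the case-insensitive, word-boundary matching rule are assumptions.

```python
import re


def count_label_terms(text, label_dictionary):
    """Count how often each label's terms occur in the text (hypothetical helper)."""
    lowered = text.lower()
    counts = {label: 0 for label in label_dictionary}
    for label, terms in label_dictionary.items():
        for term in terms:
            # Word boundaries stop "tax" from matching inside "taxonomy";
            # multi-word terms such as "after tax" are matched as a phrase.
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            counts[label] += len(re.findall(pattern, lowered))
    return counts


terms = {
    "regulation": ["asic", "government", "federal", "tax"],
    "contribution": ["contribution", "concession", "personal", "after tax"],
}
counts = count_label_terms(
    "After tax contribution rules set by the federal government", terms
)
print(counts)  # {'regulation': 3, 'contribution': 2}
```

In this sketch a term that appears under several labels (or inside a longer phrase, like "tax" in "after tax") counts toward each label that lists it.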
Features
- 🔍 Dictionary-based classification
- 📊 Multi-class (softmax) support
- 🏷️ Multi-label (sigmoid) support
- 📝 Simple terminology lists
- 🔢 Probability outputs
- 💾 Model save/load functionality
- 🎯 93% test coverage
Installation
# Via pip
pip install clear-bow
# Or from source
git clone https://github.com/samhardyhey/clear-bow
cd clear-bow
pip install -e .
Usage
from clear_bow.classifier import DictionaryClassifier
# Define your dictionary
super_dict = {
    "regulation": ["asic", "government", "federal", "tax"],
    "contribution": ["contribution", "concession", "personal", "after tax"],
    "fund": ["unisuper", "aus super", "sun super", "qsuper"],
}
# Create classifier (multi-class by default)
dc = DictionaryClassifier(label_dictionary=super_dict)
# Or for multi-label classification
dc = DictionaryClassifier(
    label_dictionary=super_dict,
    classifier_type="multi_label",
)
# Make predictions
result = dc.predict_single("A 10% contribution to your super fund")
# Returns probability distribution across labels
# Batch predictions
results = dc.predict_batch([
    "A 10% contribution to your super fund",
    "Government regulation of super funds",
])
# Save model to disk
dc.to_disk("path/to/model")
# Load model from disk
dc = DictionaryClassifier()
dc.from_disk("path/to/model")
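Once you have a prediction, you will usually want to reduce it to one or more labels. The sketch below assumes a prediction is a plain dict mapping each label to its probability; the usage above only says that a probability distribution is returned, so check the actual return type first. The helper names and the example values are made up for illustration and are not real library output.

```python
def top_label(probabilities):
    """Return the label with the highest probability (multi-class use)."""
    return max(probabilities, key=probabilities.get)


def labels_above(probabilities, threshold):
    """Return every label whose probability exceeds the threshold (multi-label use)."""
    return [label for label, p in probabilities.items() if p > threshold]


# Illustrative values only, not produced by the library.
example = {"regulation": 0.21, "contribution": 0.58, "fund": 0.21}

print(top_label(example))          # contribution
print(labels_above(example, 0.5))  # ['contribution']
```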
Development
# Setup development environment
make setup-local-dev
source venv/bin/activate
# Run tests
make test-local
# Run tests with coverage
make test-coverage
# Multi-environment testing
make test-tox
# Build distribution
make dist-bundle-build
# Clean build artifacts
make clean
# Upload to PyPI
make publish
Project Structure
clear-bow/
├── src/
│   └── clear_bow/
│       ├── __init__.py
│       └── classifier.py
├── tests/
│   ├── conftest.py
│   └── test_classifier.py
├── pyproject.toml    # Project configuration
├── tox.ini           # Multi-environment testing
└── makefile          # Development commands
Features in Detail
Multi-class Classification
- Uses softmax transformation
- Outputs sum to 1.0
- Best for mutually exclusive categories
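As a rough illustration of the softmax step, the sketch below turns per-label term counts into probabilities that sum to 1.0. It is a standalone example, not the library's code: the function name `softmax` and the idea that raw counts are fed in directly (with no scaling) are assumptions.

```python
import math


def softmax(counts):
    """Map label -> count to label -> probability; the outputs sum to 1.0."""
    # Subtracting the largest count keeps exp() numerically stable.
    peak = max(counts.values())
    exps = {label: math.exp(c - peak) for label, c in counts.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


probs = softmax({"regulation": 0, "contribution": 1, "fund": 0})
# "contribution" is roughly 0.576; the other two labels are roughly 0.212 each.
print(probs)
```

Because the outputs compete for a fixed total, raising one label's count lowers every other label's probability, which is why this mode suits mutually exclusive categories.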
Multi-label Classification
- Uses sigmoid transformation
- Each label gets independent probability
- Best for non-exclusive categories
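For comparison, here is a minimal sketch of the sigmoid step, where each label is scored on its own. As before, this is an assumption-laden illustration rather than the library's implementation: the function name `sigmoid_scores` is hypothetical, and the library may scale or shift counts before applying the sigmoid.

```python
import math


def sigmoid_scores(counts):
    """Map label -> count to label -> independent probability."""
    return {label: 1.0 / (1.0 + math.exp(-c)) for label, c in counts.items()}


scores = sigmoid_scores({"regulation": 1, "contribution": 0, "fund": 2})
# Roughly: regulation 0.731, contribution 0.5, fund 0.881.
print(scores)
```

The scores do not need to sum to 1.0, so several labels can be likely at once. Note that under a plain sigmoid a label with zero matches still scores 0.5, so any threshold you apply should sit above that.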
Error Handling
- Validates classifier types
- Handles missing/invalid files
- Provides informative error messages
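The kind of check described above might look like the following sketch. It is hypothetical: the helper name, the exception type, and the `"multi_class"` value are assumptions (only `"multi_label"` appears in the usage section), so consult the tests for the library's actual behaviour.

```python
# "multi_class" is an assumed name for the default mode; "multi_label" comes from the usage section.
VALID_CLASSIFIER_TYPES = ("multi_class", "multi_label")


def validate_classifier_type(classifier_type):
    """Return the type unchanged, or raise with a message listing the valid options."""
    if classifier_type not in VALID_CLASSIFIER_TYPES:
        raise ValueError(
            f"Unknown classifier_type {classifier_type!r}; "
            f"expected one of {VALID_CLASSIFIER_TYPES}"
        )
    return classifier_type


try:
    validate_classifier_type("multilabel")  # missing underscore
except ValueError as error:
    message = str(error)
    print(message)
```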
File Operations
- Save model configuration
- Save label dictionaries
- Load models from disk
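One simple way to persist a model of this kind is a directory holding two JSON files, one for the configuration and one for the label dictionary. The sketch below shows that approach under stated assumptions: the file names `config.json` and `label_dictionary.json` and both function names are hypothetical, and the on-disk layout used by `to_disk` and `from_disk` may differ.

```python
import json
import tempfile
from pathlib import Path


def save_model(directory, label_dictionary, classifier_type):
    """Write the configuration and label dictionary as JSON files (assumed layout)."""
    path = Path(directory)
    path.mkdir(parents=True, exist_ok=True)
    (path / "config.json").write_text(json.dumps({"classifier_type": classifier_type}))
    (path / "label_dictionary.json").write_text(json.dumps(label_dictionary))


def load_model(directory):
    """Read both files back, failing clearly if the directory is missing."""
    path = Path(directory)
    if not path.is_dir():
        raise FileNotFoundError(f"No model directory at {path}")
    config = json.loads((path / "config.json").read_text())
    label_dictionary = json.loads((path / "label_dictionary.json").read_text())
    return label_dictionary, config["classifier_type"]


with tempfile.TemporaryDirectory() as tmp:
    save_model(Path(tmp) / "model", {"fund": ["unisuper", "qsuper"]}, "multi_label")
    loaded_dictionary, loaded_type = load_model(Path(tmp) / "model")
```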
Note: See tests for additional usage examples and edge cases.
License
MIT License - See LICENSE file for details.
File details
Details for the file clear_bow-1.0.0.tar.gz.
File metadata
- Download URL: clear_bow-1.0.0.tar.gz
- Upload date:
- Size: 14.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fb634865e5f12dd68759bf597113d6c85b8866f67acd0100072e6e4067e3d860 |
| MD5 | 5331c00cda4e96ac968b2954dcf0fbf8 |
| BLAKE2b-256 | 4141466803e1712955b575d400fa3a076b8902e45048552236a043cd006e3d29 |
File details
Details for the file clear_bow-1.0.0-py3-none-any.whl.
File metadata
- Download URL: clear_bow-1.0.0-py3-none-any.whl
- Upload date:
- Size: 7.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.10.16
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a1a70831aa358149ddf8c44e5c79b10f09adbfef5b83c914fbc6e9ec292935e6 |
| MD5 | 22a01019a83be6823f6cda554404585c |
| BLAKE2b-256 | 0c96f37f7c3221cbde995ebd986dd2f9a496b5726d0d9777ee5a9f8e6a742cfa |