A text classification toolkit to easily build, train and evaluate deep learning text classifiers using PyTorch.
torchTextClassifiers
A unified, extensible framework for text classification with categorical variables built on PyTorch and PyTorch Lightning.
🚀 Features
- Complex input support: Handle text data alongside categorical variables seamlessly.
- Unified yet highly customizable:
  - Use any tokenizer from HuggingFace or the original fastText n-gram tokenizer.
  - Manipulate the components (`TextEmbedder`, `CategoricalVariableNet`, `ClassificationHead`) to easily create custom architectures, including self-attention. All of them are `torch.nn.Module` subclasses.
  - The `TextClassificationModel` class combines these components and can be extended for custom behavior.
- Multiclass / multilabel classification support: Handles both multiclass (exactly one label is true) and multilabel (several labels can be true) classification tasks.
- PyTorch Lightning: Automated training with callbacks, early stopping, and logging
- Easy experimentation: Simple API for training, evaluating, and predicting with minimal code:
  - The `torchTextClassifiers` wrapper class orchestrates the tokenizer and the model for you.
- Additional features: explainability using Captum
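To make the component idea above concrete, here is a minimal sketch in plain PyTorch of how a text embedder, a categorical-variable network, and a classification head can be composed. The class names `ToyTextEmbedder`, `ToyCategoricalNet`, and `ToyClassifier` are hypothetical stand-ins written for this illustration; they are not the library's `TextEmbedder`, `CategoricalVariableNet`, or `ClassificationHead`, and their constructor arguments are assumptions.

```python
import torch
from torch import nn


class ToyTextEmbedder(nn.Module):
    """Hypothetical stand-in: mean-pools token embeddings, ignoring padding."""

    def __init__(self, vocab_size: int, embed_dim: int, padding_idx: int = 0):
        super().__init__()
        self.padding_idx = padding_idx
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=padding_idx)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        # token_ids: (batch, seq_len) integer token ids
        mask = (token_ids != self.padding_idx).unsqueeze(-1).float()
        summed = (self.embedding(token_ids) * mask).sum(dim=1)
        # Divide by the number of real tokens (at least 1 to avoid 0/0).
        return summed / mask.sum(dim=1).clamp(min=1.0)


class ToyCategoricalNet(nn.Module):
    """Hypothetical stand-in: one embedding table per categorical variable."""

    def __init__(self, cardinalities: list, embed_dim: int):
        super().__init__()
        self.tables = nn.ModuleList([nn.Embedding(n, embed_dim) for n in cardinalities])

    def forward(self, categories: torch.Tensor) -> torch.Tensor:
        # categories: (batch, n_variables) integer category codes
        parts = [table(categories[:, i]) for i, table in enumerate(self.tables)]
        return torch.cat(parts, dim=-1)


class ToyClassifier(nn.Module):
    """Concatenates text and categorical features, then applies a linear head."""

    def __init__(self, vocab_size, text_dim, cardinalities, cat_dim, num_classes):
        super().__init__()
        self.text = ToyTextEmbedder(vocab_size, text_dim)
        self.cats = ToyCategoricalNet(cardinalities, cat_dim)
        self.head = nn.Linear(text_dim + cat_dim * len(cardinalities), num_classes)

    def forward(self, token_ids, categories):
        features = torch.cat([self.text(token_ids), self.cats(categories)], dim=-1)
        return self.head(features)


model = ToyClassifier(vocab_size=100, text_dim=16, cardinalities=[5, 3],
                      cat_dim=4, num_classes=7)
token_ids = torch.tensor([[4, 8, 15, 0], [16, 23, 0, 0]])  # 0 is padding
categories = torch.tensor([[1, 2], [4, 0]])                # two categorical variables
logits = model(token_ids, categories)                      # shape: (2, 7)
```

Because every piece is an ordinary `torch.nn.Module`, any one of them can be swapped (for example, replacing mean pooling with self-attention) without touching the others.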
📦 Installation
# Clone the repository
git clone https://github.com/InseeFrLab/torchTextClassifiers.git
cd torchTextClassifiers
# Install with uv (recommended)
uv sync
# Or install with pip
pip install -e .
📖 Documentation
Full documentation is available at: https://inseefrlab.github.io/torchTextClassifiers/
The documentation includes:
- Getting Started: Installation and quick start guide
- Architecture: Understanding the 3-layer design
- Tutorials: Step-by-step guides for different use cases
- API Reference: Complete API documentation
📝 Usage
Check out the notebook for a quick start.
📚 Examples
See the examples/ directory for:
- Basic text classification
- Multi-class classification
- Mixed features (text + categorical)
- Advanced training configurations
- Prediction and explainability
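As a rough illustration of the difference between the multiclass and multilabel settings mentioned in the features, the sketch below uses only standard PyTorch losses on random logits. It does not use the toolkit's API; the tensor shapes, targets, and the 0.5 decision threshold are assumptions chosen for the example.

```python
import torch
from torch import nn

torch.manual_seed(0)
logits = torch.randn(4, 3)  # 4 examples, 3 possible labels

# Multiclass: exactly one label per example, so targets are class indices.
multiclass_targets = torch.tensor([0, 2, 1, 2])
multiclass_loss = nn.CrossEntropyLoss()(logits, multiclass_targets)
multiclass_probs = logits.softmax(dim=-1)          # each row sums to 1
multiclass_pred = multiclass_probs.argmax(dim=-1)  # one label per example

# Multilabel: several labels may be true, so targets are 0/1 indicator vectors.
multilabel_targets = torch.tensor([
    [1.0, 0.0, 1.0],
    [0.0, 0.0, 1.0],
    [1.0, 1.0, 0.0],
    [0.0, 0.0, 0.0],
])
multilabel_loss = nn.BCEWithLogitsLoss()(logits, multilabel_targets)
multilabel_probs = torch.sigmoid(logits)           # independent per-label probabilities
multilabel_pred = (multilabel_probs > 0.5).int()   # assumed threshold of 0.5
```

The key design difference is the output activation: softmax makes labels compete with each other, whereas a per-label sigmoid scores each label independently.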
📄 License
This project is licensed under the MIT License - see the LICENSE file for details.
File details
Details for the file torchtextclassifiers-1.0.4.tar.gz.
File metadata
- Download URL: torchtextclassifiers-1.0.4.tar.gz
- Upload date:
- Size: 29.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.9.27
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 026bb7f873554bac4b1f05371de5532485876223a639a1a21f823b2e07977679 |
| MD5 | 1a80f87fee4932c186878beaf32193ff |
| BLAKE2b-256 | e5a93a00066b8fea10e7093e058e232d90d300a2cf526bec96cc651a2bf68b0f |
File details
Details for the file torchtextclassifiers-1.0.4-py3-none-any.whl.
File metadata
- Download URL: torchtextclassifiers-1.0.4-py3-none-any.whl
- Upload date:
- Size: 37.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.9.27
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a1afa18ecd6a4db0ca4936523ab03b1500a9b44eb9541e276668c603a1509bab |
| MD5 | 3866ae44aea9fc6852d8c149a5f2a195 |
| BLAKE2b-256 | cca674e2e607fd784090c0b340eed4a89addb75d65d43aa90e71926c67ffa03d |
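The published digests can be checked against a downloaded file with Python's standard `hashlib`. The sketch below is one possible way to do that; the helper names `file_digests` and `matches_published` are hypothetical, and it assumes that BLAKE2b-256 means BLAKE2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path


def file_digests(path, chunk_size=1 << 20):
    """Return hex digests of a file, reading it in chunks to bound memory use."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),  # 32 bytes = 256 bits
    }
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches_published(path, algorithm, expected_hex):
    """Compare a file's digest with the value copied from the table."""
    return file_digests(path)[algorithm] == expected_hex.strip().lower()


# Example call (requires the file to be present locally):
# matches_published(
#     "torchtextclassifiers-1.0.4.tar.gz",
#     "SHA256",
#     "026bb7f873554bac4b1f05371de5532485876223a639a1a21f823b2e07977679",
# )
```

A mismatch means the local file differs from the one that was uploaded, so it should be downloaded again before installing.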