Skip to main content

A text classification toolkit to easily build, train and evaluate deep learning text classifiers using PyTorch.

Project description

torchTextClassifiers

Documentation

A unified, extensible framework for text classification with categorical variables built on PyTorch and PyTorch Lightning.

🚀 Features

  • Complex input support: Handle text data alongside categorical variables seamlessly.
  • Unified yet highly customizable:
    • Use any tokenizer from HuggingFace or the original fastText's ngram tokenizer.
    • Manipulate the components (TextEmbedder, CategoricalVariableNet, ClassificationHead) to easily create custom architectures - including self-attention. All of them are torch.nn.Module !
    • The TextClassificationModel class combines these components and can be extended for custom behavior.
  • Multiclass / multilabel classification support: Support for both multiclass (only one label is true) and multi-label (several labels can be true) classification tasks.
  • PyTorch Lightning: Automated training with callbacks, early stopping, and logging
  • Easy experimentation: Simple API for training, evaluating, and predicting with minimal code:
    • The torchTextClassifiers wrapper class orchestrates the tokenizer and the model for you
  • Additional features: explainability using Captum

📦 Installation

# Clone the repository
git clone https://github.com/InseeFrLab/torchTextClassifiers.git
cd torchtextClassifiers

# Install with uv (recommended)
uv sync

# Or install with pip
pip install -e .

📖 Documentation

Full documentation is available at: https://inseefrlab.github.io/torchTextClassifiers/ The documentation includes:

  • Getting Started: Installation and quick start guide
  • Architecture: Understanding the 3-layer design
  • Tutorials: Step-by-step guides for different use cases
  • API Reference: Complete API documentation

📝 Usage

Checkout the notebook for a quick start.

📚 Examples

See the examples/ directory for:

  • Basic text classification
  • Multi-class classification
  • Mixed features (text + categorical)
  • Advanced training configurations
  • Prediction and explainability

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

torchtextclassifiers-1.0.0.tar.gz (25.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

torchtextclassifiers-1.0.0-py3-none-any.whl (33.3 kB view details)

Uploaded Python 3

File details

Details for the file torchtextclassifiers-1.0.0.tar.gz.

File metadata

  • Download URL: torchtextclassifiers-1.0.0.tar.gz
  • Upload date:
  • Size: 25.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.12 {"installer":{"name":"uv","version":"0.9.12"},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for torchtextclassifiers-1.0.0.tar.gz
Algorithm Hash digest
SHA256 0245daf62d3ae5a0746c6e71068f369f9ba644f714a7e8b67a75bc79b07812df
MD5 6eb1616755373e736c9a53b979d56ef0
BLAKE2b-256 fae90936ab47a10ad460ae54f3d0fdd1a4e548d520ce7537f484c1a297b798b4

See more details on using hashes here.

File details

Details for the file torchtextclassifiers-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: torchtextclassifiers-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 33.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.12 {"installer":{"name":"uv","version":"0.9.12"},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for torchtextclassifiers-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4cf99d551cacda3b970f9cb86b461fb8c817cd6709c8306f34a8dacff0320fb7
MD5 8aa2c7e6eb95b6a4440b3e0848d6d74b
BLAKE2b-256 59f19705dbab265255b95cfaec04c90cf1b86e75d6da10c7f844a9471f9ed736

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page