A text classification toolkit to easily build, train and evaluate deep learning text classifiers using PyTorch.

torchTextClassifiers

A unified, extensible framework for text classification with categorical variables built on PyTorch and PyTorch Lightning.

🚀 Features

  • Complex input support: Handle text data alongside categorical variables seamlessly.
  • Unified yet highly customizable:
    • Use any tokenizer from HuggingFace or fastText's original n-gram tokenizer.
    • Combine the components (TextEmbedder, CategoricalVariableNet, ClassificationHead) to create custom architectures, including ones with self-attention. Each component is a torch.nn.Module.
    • The TextClassificationModel class combines these components and can be extended for custom behavior.
  • Multiclass and multilabel classification: Supports both multiclass tasks (exactly one label is true) and multilabel tasks (several labels can be true).
  • PyTorch Lightning: Automated training with callbacks, early stopping, and logging.
  • Easy experimentation: Simple API for training, evaluating, and predicting with minimal code:
    • The torchTextClassifiers wrapper class orchestrates the tokenizer and the model for you
  • Additional features: Explainability using Captum.
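The difference between the multiclass and multilabel settings shows up in how raw model scores (logits) become predictions. The following is a minimal sketch in plain Python, not the toolkit's own API: the function names and the 0.5 threshold are assumptions chosen for illustration. In the multiclass case, a softmax makes the classes compete and the single highest-scoring class wins; in the multilabel case, each label gets an independent sigmoid and every label above the threshold is kept.

```python
import math


def softmax(logits):
    """Turn logits into probabilities that sum to 1 (classes compete)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Independent probability for a single label, computed stably."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def predict_multiclass(logits):
    """Multiclass: exactly one label is true, so return the argmax."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


def predict_multilabel(logits, threshold=0.5):
    """Multilabel: several labels can be true, so keep all above threshold."""
    return [i for i, x in enumerate(logits) if sigmoid(x) >= threshold]


logits = [2.0, -1.0, 0.5]
print(predict_multiclass(logits))  # 0
print(predict_multilabel(logits))  # [0, 2]
```

The same logits therefore yield one class index in the multiclass setting and a list of label indices (possibly empty) in the multilabel setting.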

📦 Installation

# Clone the repository
git clone https://github.com/InseeFrLab/torchTextClassifiers.git
cd torchTextClassifiers

# Install with uv (recommended)
uv sync

# Or install with pip
pip install -e .

📖 Documentation

Full documentation is available at: https://inseefrlab.github.io/torchTextClassifiers/

The documentation includes:

  • Getting Started: Installation and quick start guide
  • Architecture: Understanding the 3-layer design
  • Tutorials: Step-by-step guides for different use cases
  • API Reference: Complete API documentation

📝 Usage

Check out the notebook for a quick start.

📚 Examples

See the examples/ directory for:

  • Basic text classification
  • Multiclass classification
  • Mixed features (text + categorical)
  • Advanced training configurations
  • Prediction and explainability
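For the mixed-features case, embedding layers such as torch.nn.Embedding expect integer indices rather than raw category strings. The sketch below shows that preprocessing step in plain Python. It is an illustration under assumptions, not the toolkit's API: the column names (`sector`, `region`), the helper functions, and the choice to reserve index 0 for unseen values are all hypothetical. The examples/ directory remains the reference for the input format the library actually expects.

```python
def fit_category_index(values):
    """Map each distinct category to an integer, reserving 0 for unseen values."""
    return {v: i for i, v in enumerate(sorted(set(values)), start=1)}


def encode_rows(rows, columns, indexes):
    """Encode the categorical columns of each row as a list of integers."""
    return [[indexes[c].get(row[c], 0) for c in columns] for row in rows]


# Hypothetical training rows: free text plus two categorical variables.
rows = [
    {"text": "laptop repair shop", "sector": "tech", "region": "north"},
    {"text": "organic grocery store", "sector": "retail", "region": "south"},
]
columns = ["sector", "region"]

# One index per categorical column, fitted on the training data only.
indexes = {c: fit_category_index(r[c] for r in rows) for c in columns}
encoded = encode_rows(rows, columns, indexes)
print(encoded)  # [[2, 1], [1, 2]]

# A category never seen during fitting falls back to index 0.
new_row = {"text": "retail bank branch", "sector": "finance", "region": "north"}
print(encode_rows([new_row], columns, indexes))  # [[0, 1]]
```

The text column is left untouched here, since tokenization is handled separately from the categorical variables.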

📄 License

This project is licensed under the MIT License; see the LICENSE file for details.
