A text classification toolkit to easily build, train and evaluate deep learning text classifiers using PyTorch.

torchTextClassifiers

A unified, extensible framework for text classification with categorical variables built on PyTorch and PyTorch Lightning.

🚀 Features

  • Complex input support: Handle text data alongside categorical variables seamlessly.
  • Unified yet highly customizable:
    • Use any tokenizer from HuggingFace or the original fastText n-gram tokenizer.
    • Combine or swap the components (TextEmbedder, CategoricalVariableNet, ClassificationHead) to build custom architectures, including self-attention. Each component is a torch.nn.Module.
    • The TextClassificationModel class combines these components and can be extended for custom behavior.
  • Multiclass / multilabel classification support: Handles both multiclass (exactly one label is true) and multilabel (several labels can be true) tasks.
  • PyTorch Lightning: Automated training with callbacks, early stopping, and logging
  • Easy experimentation: Simple API for training, evaluating, and predicting with minimal code:
    • The torchTextClassifiers wrapper class orchestrates the tokenizer and the model for you
  • Additional features: explainability using Captum
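
The multiclass / multilabel distinction in the feature list comes down to how raw model scores (logits) are turned into predicted labels. The sketch below illustrates that difference in plain Python so it runs without dependencies. It is not the library's API: the function names, the example logits, and the 0.5 threshold are hypothetical and chosen for illustration only. In a PyTorch model the same steps would typically use torch.softmax and torch.sigmoid.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    # Subtracting the maximum keeps exp() numerically stable.
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Squash one raw score into an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_multiclass(logits):
    """Multiclass: exactly one label wins, so return the most probable index."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


def predict_multilabel(logits, threshold=0.5):
    """Multilabel: each label is accepted or rejected on its own."""
    return [i for i, x in enumerate(logits) if sigmoid(x) >= threshold]


# Hypothetical raw scores for three labels.
logits = [2.0, -1.0, 0.5]
print(predict_multiclass(logits))  # → 0
print(predict_multilabel(logits))  # → [0, 2]
```

With the same scores, the multiclass rule keeps only the top label, while the multilabel rule keeps every label whose individual probability clears the threshold.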

📦 Installation

# Clone the repository
git clone https://github.com/InseeFrLab/torchTextClassifiers.git
cd torchTextClassifiers

# Install with uv (recommended)
uv sync

# Or install with pip
pip install -e .
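
After either install command, a quick way to confirm that the package is visible to the active Python environment is to query its metadata. This is a minimal sketch using only the standard library. It assumes the distribution is registered under the name torchtextclassifiers. The helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="torchtextclassifiers"):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("torchtextclassifiers is not installed in this environment")
else:
    print(f"torchtextclassifiers {found} is installed")
```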

📖 Documentation

Full documentation is available at: https://inseefrlab.github.io/torchTextClassifiers/

The documentation includes:

  • Getting Started: Installation and quick start guide
  • Architecture: Understanding the 3-layer design
  • Tutorials: Step-by-step guides for different use cases
  • API Reference: Complete API documentation

📝 Usage

Check out the notebook for a quick start.

📚 Examples

See the examples/ directory for:

  • Basic text classification
  • Multi-class classification
  • Mixed features (text + categorical)
  • Advanced training configurations
  • Prediction and explainability

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
