A text classification toolkit to easily build, train and evaluate deep learning text classifiers using PyTorch.

torchTextClassifiers

A unified, extensible framework for text classification with categorical variables built on PyTorch and PyTorch Lightning.

🚀 Features

  • Complex input support: Handle text data alongside categorical variables seamlessly.
  • Unified yet highly customizable:
    • Use any tokenizer from HuggingFace or the original fastText's ngram tokenizer.
    • Manipulate the components (TextEmbedder, CategoricalVariableNet, ClassificationHead) to easily create custom architectures, including self-attention. All of them are torch.nn.Module subclasses.
    • The TextClassificationModel class combines these components and can be extended for custom behavior.
  • Multiclass / multilabel classification support: Handles both multiclass (exactly one label is true) and multilabel (several labels can be true) classification tasks.
  • PyTorch Lightning: Automated training with callbacks, early stopping, and logging.
  • Easy experimentation: Simple API for training, evaluating, and predicting with minimal code:
    • The torchTextClassifiers wrapper class orchestrates the tokenizer and the model for you.
  • Additional features: Explainability using Captum.
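
The difference between the multiclass and multilabel tasks mentioned above comes down to how raw model scores (logits) are turned into labels. The sketch below is plain Python and does not use the torchTextClassifiers API. The function names `predict_multiclass` and `predict_multilabel` and the 0.5 threshold are assumptions chosen for illustration only.

```python
import math


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    """Squash one raw score into an independent probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def predict_multiclass(logits):
    """Multiclass: exactly one label wins, the one with the highest probability."""
    probs = softmax(logits)
    return max(range(len(probs)), key=probs.__getitem__)


def predict_multilabel(logits, threshold=0.5):
    """Multilabel: keep every label whose own probability reaches the threshold."""
    return [i for i, x in enumerate(logits) if sigmoid(x) >= threshold]


# Illustrative scores for three labels (made-up values).
logits = [2.0, -1.0, 0.5]
print(predict_multiclass(logits))  # index of the single best label
print(predict_multilabel(logits))  # indices of all labels above the threshold
```

In the multiclass case the labels compete through the softmax, so only one index is returned. In the multilabel case each label is scored independently, so the result can contain zero, one, or several indices.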

📦 Installation

# Clone the repository
git clone https://github.com/InseeFrLab/torchTextClassifiers.git
cd torchTextClassifiers

# Install with uv (recommended)
uv sync

# Or install with pip
pip install -e .
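
After either install command, one way to confirm that the package is visible to the active interpreter is to query its metadata. This is a minimal sketch using only the standard library. The helper `installed_version` is hypothetical, and the distribution name `torchtextclassifiers` is an assumption based on the project name, so adjust it if your environment reports a different one.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumed distribution name; change it if your environment uses another one.
found = installed_version("torchtextclassifiers")
print(found if found is not None else "torchtextclassifiers is not installed")
```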

📖 Documentation

Full documentation is available at: https://inseefrlab.github.io/torchTextClassifiers/

The documentation includes:

  • Getting Started: Installation and quick start guide
  • Architecture: Understanding the 3-layer design
  • Tutorials: Step-by-step guides for different use cases
  • API Reference: Complete API documentation

📝 Usage

Check out the notebook for a quick start.

📚 Examples

See the examples/ directory for:

  • Basic text classification
  • Multi-class classification
  • Mixed features (text + categorical)
  • Advanced training configurations
  • Prediction and explainability

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
