A text classification toolkit to easily build, train and evaluate deep learning text classifiers using PyTorch.

torchTextClassifiers

A unified, extensible framework for text classification with categorical variables built on PyTorch and PyTorch Lightning.

🚀 Features

  • Mixed input support: Handle text data alongside categorical variables seamlessly.
  • Unified yet highly customizable:
    • Use any tokenizer from HuggingFace or the original fastText's ngram tokenizer.
    • Combine the components (TextEmbedder, CategoricalVariableNet, ClassificationHead) to build custom architectures, including self-attention. Each of them is a torch.nn.Module.
    • The TextClassificationModel class combines these components and can be extended for custom behavior.
  • PyTorch Lightning: Automated training with callbacks, early stopping, and logging.
  • Easy experimentation: Simple API for training, evaluating, and predicting with minimal code:
    • The torchTextClassifiers wrapper class orchestrates the tokenizer and the model for you.
  • Additional features: Explainability using Captum.
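
To make the "fastText's ngram tokenizer" option above more concrete, here is a minimal, standard-library sketch of fastText-style character n-gram extraction with hashing into a fixed number of buckets. It is an illustration only, not this package's implementation: the function names (`char_ngrams`, `fnv1a_32`, `ngram_ids`), the default n-gram range, and the bucket count are all assumptions made for the example.

```python
# Illustrative sketch only: these helpers are hypothetical and are NOT part of
# the torchTextClassifiers API.

def char_ngrams(word, min_n=3, max_n=6):
    """Return the character n-grams of `word`, fastText style.

    The word is wrapped in '<' and '>' boundary markers so that prefixes
    and suffixes produce n-grams distinct from word-internal ones.
    """
    marked = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(marked) - n + 1):
            grams.append(marked[start:start + n])
    return grams


def fnv1a_32(text):
    """Standard 32-bit FNV-1a hash of the UTF-8 bytes of `text`."""
    h = 2166136261  # FNV offset basis
    for byte in text.encode("utf-8"):
        h ^= byte
        h = (h * 16777619) & 0xFFFFFFFF  # FNV prime, truncated to 32 bits
    return h


def ngram_ids(word, num_buckets=2_000_000, min_n=3, max_n=6):
    """Map each n-gram of `word` to a bucket index in [0, num_buckets)."""
    return [fnv1a_32(g) % num_buckets for g in char_ngrams(word, min_n, max_n)]


if __name__ == "__main__":
    print(char_ngrams("where", 3, 3))
    print(ngram_ids("where", num_buckets=1000, min_n=3, max_n=3))
```

The bucket indices would typically be used to look up rows in an embedding table, so unseen words still get a representation from their sub-word pieces. Hashing trades occasional collisions for a fixed vocabulary size.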

📦 Installation

# Clone the repository
git clone https://github.com/InseeFrLab/torchTextClassifiers.git
cd torchTextClassifiers

# Install with uv (recommended)
uv sync

# Or install with pip
pip install -e .

📝 Usage

Check out the notebook for a quick start.

📚 Examples

See the examples/ directory for:

  • Basic text classification
  • Multi-class classification
  • Mixed features (text + categorical)
  • Advanced training configurations
  • Prediction and explainability

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
