Skip to main content

Generalist and Lightweight Model for Text Classification

Project description

⭐ GLiClass: Generalist and Lightweight Model for Sequence Classification

This is an efficient zero-shot classifier inspired by GLiNER work. It demonstrates the same performance as a cross-encoder while being more compute-efficient because classification is done at a single forward path.

It can be used for topic classification, sentiment analysis and as a reranker in RAG pipelines.

Instalation:

pip install gliclass

How to use:

from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("knowledgator/gliclass-small-v1")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-small-v1")

pipeline = ZeroShotClassificationPipeline(model, tokenizer, classification_type='multi-label', device='cuda:0')

text = "One day I will see the world!"
labels = ["travel", "dreams", "sport", "science", "politics"]
results = pipeline(text, labels, threshold=0.5)[0] #because we have one text

for result in results:
 print(result["label"], "=>", result["score"])

How to train:

Prepare training data in the following format: [ {"text": "Some text here!", "all_labels": ["sport", "science", "business", ...], "true_labels": ["other"]}, ... ]

Specify your training parameters in the train.py script.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gliclass-0.1.3.tar.gz (16.9 kB view hashes)

Uploaded Source

Built Distribution

gliclass-0.1.3-py3-none-any.whl (19.7 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page