Generalist and Lightweight Model for Text Classification
⭐ GLiClass: Generalist and Lightweight Model for Sequence Classification
GLiClass is an efficient, zero-shot sequence classification model inspired by the GLiNER framework. It achieves comparable performance to traditional cross-encoder models while being significantly more computationally efficient, offering classification results approximately 10 times faster by performing classification in a single forward pass.
📄 Blog • 📢 Discord • 📺 Demo • 🤗 Available models
🚀 Quick Start
Install GLiClass easily using pip:
pip install gliclass
Install from Source
Clone and install directly from GitHub:
git clone https://github.com/Knowledgator/GLiClass
cd GLiClass
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
pip install .
Verify your installation:
import gliclass
print(gliclass.__version__)
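If importing the package is slow or fails because of a missing dependency, the installed version can also be read from the package metadata without importing `gliclass` itself. The following is a minimal sketch using only the standard library; the helper name `installed_version` is illustrative and not part of GLiClass.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "gliclass") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "gliclass is not installed in this environment")
```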
🧑‍💻 Usage Example
from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

# Load the pretrained model and its tokenizer.
model = GLiClassModel.from_pretrained("knowledgator/gliclass-small-v1.0")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-small-v1.0")

# 'cuda:0' requires a GPU; use device='cpu' otherwise.
pipeline = ZeroShotClassificationPipeline(
    model, tokenizer, classification_type='multi-label', device='cuda:0'
)

text = "One day I will see the world!"
labels = ["travel", "dreams", "sport", "science", "politics"]

# The pipeline returns one list of predictions per input text; [0] selects the first.
results = pipeline(text, labels, threshold=0.5)[0]
for result in results:
    print(f"{result['label']} => {result['score']:.3f}")
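As the loop above shows, each prediction is a dictionary with `label` and `score` keys. The sketch below illustrates one way to post-process such a list by keeping only the highest-scoring labels. `top_labels` is a hypothetical helper, not a GLiClass function, and the sample scores are made up for illustration.

```python
from typing import Dict, List, Union

Prediction = Dict[str, Union[str, float]]


def top_labels(results: List[Prediction], k: int = 3, min_score: float = 0.0) -> List[Prediction]:
    """Keep predictions scoring at least min_score, highest first, at most k of them."""
    kept = [r for r in results if r["score"] >= min_score]
    kept.sort(key=lambda r: r["score"], reverse=True)
    return kept[:k]


# Made-up scores in the same shape as the pipeline output.
sample = [
    {"label": "travel", "score": 0.91},
    {"label": "sport", "score": 0.12},
    {"label": "dreams", "score": 0.84},
]

best = top_labels(sample, k=2, min_score=0.5)
for item in best:
    print(f"{item['label']} => {item['score']:.3f}")
```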
🌟 Retrieval-Augmented Classification (RAC)
With newer models trained for retrieval-augmented classification, you can supply labeled examples to improve classification accuracy:
# A labeled example: the text, its candidate labels, and the labels that apply.
example = {
    "text": "A new machine learning platform automates complex data workflows but faces integration issues.",
    "all_labels": ["AI", "automation", "data_analysis", "usability", "integration"],
    "true_labels": ["AI", "integration", "automation"]
}

text = "The new AI-powered tool streamlines data analysis but has limited integration capabilities."
labels = ["AI", "automation", "data_analysis", "usability", "integration"]

# `pipeline` is a ZeroShotClassificationPipeline built as in the usage example,
# loaded with a RAC-trained model.
results = pipeline(text, labels, threshold=0.1, rac_examples=[example])[0]
for predict in results:
    print(f"{predict['label']} => {predict['score']:.3f}")
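When many RAC examples are assembled by hand, a small builder can catch mistakes before they reach the pipeline. The sketch below assumes that every entry in `true_labels` should also appear in `all_labels`, which the example above suggests but the source does not state. `make_rac_example` is a hypothetical helper, not part of GLiClass.

```python
from typing import Dict, List


def make_rac_example(text: str, all_labels: List[str], true_labels: List[str]) -> Dict[str, object]:
    """Build an example dict, rejecting empty text and unknown true labels."""
    if not text.strip():
        raise ValueError("text must not be empty")
    # Assumption: true labels are drawn from the candidate label set.
    unknown = [label for label in true_labels if label not in all_labels]
    if unknown:
        raise ValueError(f"true_labels not present in all_labels: {unknown}")
    return {
        "text": text,
        "all_labels": list(all_labels),
        "true_labels": list(true_labels),
    }


rac_example = make_rac_example(
    "A new machine learning platform automates complex data workflows but faces integration issues.",
    ["AI", "automation", "data_analysis", "usability", "integration"],
    ["AI", "integration", "automation"],
)
```

The resulting dictionary has the same keys as the hand-written `example` above, so it can be passed in a list as `rac_examples`.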
🎯 Key Use Cases
- Sentiment Analysis: Rapidly classify texts as positive, negative, or neutral.
- Document Classification: Efficiently organize and categorize large document collections.
- Search Results Re-ranking: Improve relevance and precision by reranking search outputs.
- News Categorization: Automatically tag and organize news articles into predefined categories.
- Fact Checking: Quickly validate and categorize statements based on factual accuracy.
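For the re-ranking use case, one possible approach (not described in the source) is to treat the query as a single candidate label and order documents by the score each receives for it. The sketch below takes the scorer as a callable so that it runs without a model. `rerank` and `score_fn` are hypothetical names, and `keyword_overlap` is a toy stand-in for a real classifier call.

```python
from typing import Callable, List, Tuple


def rerank(
    query: str,
    documents: List[str],
    score_fn: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Score each document against the query and return them best first."""
    scored = [(doc, score_fn(doc, query)) for doc in documents]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def keyword_overlap(document: str, query: str) -> float:
    """Toy scorer: fraction of query words that appear in the document."""
    query_words = set(query.lower().split())
    doc_words = set(document.lower().split())
    return len(query_words & doc_words) / len(query_words) if query_words else 0.0


docs = [
    "cooking pasta at home",
    "training a text classification model",
    "text classification with zero-shot labels",
]
ranked = rerank("zero-shot text classification", docs, keyword_overlap)
for doc, score in ranked:
    print(f"{score:.2f}  {doc}")
```

With a real model, `score_fn` would wrap a pipeline call that passes the query as the only label and reads back its score.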
🛠️ How to Train
Prepare your training data as follows:
[
    {"text": "Sample text.", "all_labels": ["sports", "science", "business"], "true_labels": ["sports"]},
    ...
]
Optionally, specify confidence scores explicitly:
[
    {"text": "Sample text.", "all_labels": ["sports", "science"], "true_labels": {"sports": 0.9}},
    ...
]
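Before training, it can help to check that every record matches one of the two formats above. The sketch below is one way to do that and then serialize the records as JSON. It assumes that true labels must appear in `all_labels` and that confidence scores lie in [0, 1]; neither rule is stated in the source, and `validate_record` is a hypothetical helper.

```python
import json
from typing import Any, Dict


def validate_record(record: Dict[str, Any]) -> None:
    """Raise ValueError if a record does not match either format shown above."""
    for key in ("text", "all_labels", "true_labels"):
        if key not in record:
            raise ValueError(f"missing key: {key}")
    true_labels = record["true_labels"]
    if isinstance(true_labels, dict):
        # Assumption: explicit confidence scores are probabilities in [0, 1].
        for label, score in true_labels.items():
            if not 0.0 <= float(score) <= 1.0:
                raise ValueError(f"confidence for {label!r} is outside [0, 1]")
    # Iterating a dict yields its keys, so this check covers both formats.
    unknown = [label for label in true_labels if label not in record["all_labels"]]
    if unknown:
        raise ValueError(f"labels not in all_labels: {unknown}")


records = [
    {"text": "Sample text.", "all_labels": ["sports", "science", "business"], "true_labels": ["sports"]},
    {"text": "Sample text.", "all_labels": ["sports", "science"], "true_labels": {"sports": 0.9}},
]
for record in records:
    validate_record(record)

# Serialize to a JSON string; where to save it depends on your training setup.
serialized = json.dumps(records, indent=2)
```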
Refer to the train.py script to train a model from scratch or fine-tune an existing one.