Skip to main content

Generalist and Lightweight Model for Text Classification

Project description

⭐ GLiClass: Generalist and Lightweight Model for Sequence Classification

GLiClass is an efficient, zero-shot sequence classification model inspired by the GLiNER framework. It achieves comparable performance to traditional cross-encoder models while being significantly more computationally efficient, offering classification results approximately 10 times faster by performing classification in a single forward pass.

📄 Blog   •   📢 Discord   •   📺 Demo   •   🤗 Available models   •  

🚀 Quick Start

Install GLiClass easily using pip:

pip install gliclass

Install from Source

Clone and install directly from GitHub:

git clone https://github.com/Knowledgator/GLiClass
cd GLiClass

python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

pip install -r requirements.txt
pip install .

Verify your installation:

import gliclass
print(gliclass.__version__)

🧑‍💻 Usage Example

from gliclass import GLiClassModel, ZeroShotClassificationPipeline
from transformers import AutoTokenizer

model = GLiClassModel.from_pretrained("knowledgator/gliclass-small-v1.0")
tokenizer = AutoTokenizer.from_pretrained("knowledgator/gliclass-small-v1.0")

pipeline = ZeroShotClassificationPipeline(
    model, tokenizer, classification_type='multi-label', device='cuda:0'
)

text = "One day I will see the world!"
labels = ["travel", "dreams", "sport", "science", "politics"]
results = pipeline(text, labels, threshold=0.5)[0]

for result in results:
    print(f"{result['label']} => {result['score']:.3f}")

🌟 Retrieval-Augmented Classification (RAC)

With new models trained with retrieval-agumented classification, such as this model you can specify examples to improve classification accuracy:

example = {
    "text": "A new machine learning platform automates complex data workflows but faces integration issues.",
    "all_labels": ["AI", "automation", "data_analysis", "usability", "integration"],
    "true_labels": ["AI", "integration", "automation"]
}

text = "The new AI-powered tool streamlines data analysis but has limited integration capabilities."
labels = ["AI", "automation", "data_analysis", "usability", "integration"]

results = pipeline(text, labels, threshold=0.1, rac_examples=[example])[0]

for predict in results:
    print(f"{predict['label']} => {predict['score']:.3f}")

🎯 Key Use Cases

  • Sentiment Analysis: Rapidly classify texts as positive, negative, or neutral.
  • Document Classification: Efficiently organize and categorize large document collections.
  • Search Results Re-ranking: Improve relevance and precision by reranking search outputs.
  • News Categorization: Automatically tag and organize news articles into predefined categories.
  • Fact Checking: Quickly validate and categorize statements based on factual accuracy.

🛠️ How to Train

Prepare your training data as follows:

[
  {"text": "Sample text.", "all_labels": ["sports", "science", "business"], "true_labels": ["sports"]},
  ...
]

Optionally, specify confidence scores explicitly:

[
  {"text": "Sample text.", "all_labels": ["sports", "science"], "true_labels": {"sports": 0.9}},
  ...
]

Please, refer to the train.py script to set up your training from scratch or fine-tune existing models.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

gliclass-0.1.12.tar.gz (29.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

gliclass-0.1.12-py3-none-any.whl (32.3 kB view details)

Uploaded Python 3

File details

Details for the file gliclass-0.1.12.tar.gz.

File metadata

  • Download URL: gliclass-0.1.12.tar.gz
  • Upload date:
  • Size: 29.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.2

File hashes

Hashes for gliclass-0.1.12.tar.gz
Algorithm Hash digest
SHA256 0d579eec15cf9204bef4b5bf3d94d81fd679f54a34ae71d951eada681da626fa
MD5 715711120131525fb40a589b1b597ff8
BLAKE2b-256 f189ef49696e563c0aedc75c6dffe28e986dfec2e925070ff99366cfe90c85b0

See more details on using hashes here.

File details

Details for the file gliclass-0.1.12-py3-none-any.whl.

File metadata

  • Download URL: gliclass-0.1.12-py3-none-any.whl
  • Upload date:
  • Size: 32.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.2

File hashes

Hashes for gliclass-0.1.12-py3-none-any.whl
Algorithm Hash digest
SHA256 0e32dcb5b363ab932d971216d6c8ec55357b833e50d79e05f0292c5f3195162f
MD5 7c94d05408e403889ad65f2d5162f316
BLAKE2b-256 a8cefcc556ad7d4e113e3b02a8f63527b8dd02487ad23eb4128fdb0317747d07

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page