A zero-shot classification engine based on various LLM models
zeroshot-engine
An open-source scientific zero-shot text classification engine based on various LLMs.
📖 About this package
Description
This project provides a flexible framework for performing zero-shot classification using large language models (LLMs) and pandas. It allows you to classify text into categories without requiring labeled training data for those categories. All instructions to the LLM are given as natural language prompts. The framework is designed to support a wide range of text classification tasks, including multi-label, multi-class, and single-class classification scenarios.
Purpose
This package was developed as part of an academic research project to systematically classify political communication. The primary goal was to create an easy-to-use and accessible framework for building adaptable zero-shot classifications with large language models (LLMs) across a wide variety of text analysis tasks. By providing a flexible and intuitive tool, this project aims to empower students and researchers — especially those in social sciences — to explore, evaluate, and harness the potential of zero-shot classification while addressing its challenges in a user-friendly environment. I have no financial interest in this project.
Open-Source and Non-Commercial
This project is fully open-source and was developed with no financial interests. It is intended to support academic research and the broader scientific community. Contributions are welcome to help improve the framework and expand its capabilities.
✨ Features
Overview
- Handles multi-label, multi-class, and single-class classification tasks.
- Optional few-shot learning through flexible prompt engineering.
- Supports models from multiple LLM providers (e.g., OpenAI, Ollama).
- Easy-to-use command-line interface for demo purposes.
- Customizable prompts.
- Integration with pandas for data handling.
Key Concepts
- Zero-Shot Learning: The ability of a model to make predictions on unseen classes or tasks without prior training on those specific classes or tasks. The system learns entirely through natural language instructions, eliminating the need for labeled examples or fine-tuning.
- Sequential Classification: A process where tasks are performed in a series of steps without strict dependencies (IDZSC approach).
- Hierarchical Classification: A structured approach that breaks down complex classification tasks into a series of simpler decisions following a predefined hierarchy with explicit dependencies (HDZSC approach).
- Multi-Prompting: The use of multiple different prompts for different tasks to elicit more comprehensive and reliable predictions from the model.
- Modular Prompt Design: While not automated in the current implementation, the modular prompt design with text blocks facilitates manual testing and refinement of prompts to improve classification accuracy.
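The modular prompt design described above can be illustrated with a minimal sketch. It assumes prompts are assembled from named text blocks; the block names, their wording, and the `build_prompt` helper are hypothetical and are not part of the zeroshot-engine API.

```python
# Hypothetical text blocks; names and wording are illustrative only.
PROMPT_BLOCKS = {
    "role": "You are a careful annotator of political communication.",
    "task": "Decide whether the text below contains an attack on a political actor.",
    "labels": "Answer with 1 if an attack is present and 0 if it is absent.",
    "few_shot": "Example: 'Their policy is a disgrace.' -> 1",
}


def build_prompt(blocks, order, text):
    """Join the selected blocks in the given order and append the text to classify."""
    parts = [blocks[name] for name in order]
    parts.append(f"Text: {text}")
    return "\n\n".join(parts)


sample = "Taxes will rise next year."

# Zero-shot variant: instructions only, no labeled examples.
zero_shot = build_prompt(PROMPT_BLOCKS, ["role", "task", "labels"], sample)

# Few-shot variant: the same blocks plus one example block.
few_shot = build_prompt(PROMPT_BLOCKS, ["role", "task", "few_shot", "labels"], sample)

print(zero_shot)
```

Keeping each block separate makes it possible to swap or reorder a single block and compare the resulting classifications by hand, which is the kind of manual refinement the modular design is meant to support.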
Core Modules
- Iterative Double Validated Zero-Shot Classification (IDZSC): IDZSC is the core module to classify texts in an iterative process. It can use a double validation technique to ensure the robustness and accuracy of the classifications.
- Hierarchical Double Validated Zero-Shot Classification (HDZSC): HDZSC extends the zero-shot classification capabilities to hierarchical category structures. It leverages a double validation approach to maintain accuracy while navigating the complexities of hierarchical classification.
🚀 Get Started
How to install
Install the zeroshot-engine package with pip from Windows PowerShell or a Linux/macOS terminal:
pip install zeroshot-engine
Interactive Demo in the Command Line
Test the zeroshot-engine in the HDZSC-scenario by selecting from a wide variety of LLMs and bringing your own text for classification:
zeroshot-engine demo
This command will guide you through an interactive demo where you can:
- Choose an LLM model (e.g., one from OpenAI or Ollama).
- Provide your own text for classification or use a provided example text.
- Observe how the hierarchical classification process works in real time.
Run your first Zeroshot Classification Project in Python
This tutorial provides example code for your first test project, which you can use as a template to build and adapt your own research projects. For more detailed information and advanced usage, please refer to the documentation.
📚 Documentation
For more detailed information about the framework and its implementation, please refer to the following documentation:
- Overview of IDZSC and HDZSC - A comprehensive explanation of the Iterative and Hierarchical Double Validated Zero-Shot Classification approaches, including detailed examples and usage patterns.
- Performance Evaluation - Benchmark results and performance metrics across different models and classification tasks.
- In-Depth-Demo-Explanation - Learn how HDZSC works in detail.
- Tutorial: Get started with your first classification - Create your first project with prompts, code examples, and texts to learn how to set up the classifier.
Example Flow Chart
==============================================================
ZEROSHOTENGINE DEMO LABEL DEPENDENCY FLOWCHART
==============================================================
[POLITICAL]
├─ if political = 1:
│ [PRESENTATION]
│ [ATTACK]
│ ├─ if attack = 1:
│ │ [TARGET]
│ │ │
│ │ ▼
│ │ STOP
│ └─ if attack = 0:
│ → Skip: target
│ STOP
└─ if political = 0:
→ Skip: presentation, attack, target
STOP
--------------------------------------------------------------
STOP CONDITIONS EXPLANATION
--------------------------------------------------------------
If political = 0 (absent), the following steps are skipped:
- presentation
- attack
- target
If attack = 0 (absent), the following steps are skipped:
- target
--------------------------------------------------------------
LEGEND
--------------------------------------------------------------
- 1 (present): Proceeds to the next classification step
- 0 (absent): Skips one or more subsequent classifications
LABEL CODES
present: 1
absent: 0
non-coded: 8
empty-list: []
--------------------------------------------------------------
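The dependency logic in the flowchart above can be sketched in a few lines of Python. This is a minimal illustration, not the package's implementation: `classify_hierarchically` and `keyword_stub` are hypothetical names, the keyword stub stands in for a real LLM call, and recording skipped steps with the non-coded value 8 is an assumption based on the label codes listed in the chart.

```python
PRESENT, ABSENT, NON_CODED = 1, 0, 8

# Each label maps to the parent label that must be present (1)
# before the label itself is classified. Parents come before children.
DEPENDENCIES = {
    "political": None,
    "presentation": "political",
    "attack": "political",
    "target": "attack",
}


def classify_hierarchically(text, classify_label):
    """Walk the hierarchy and skip every label whose parent is not present."""
    results = {}
    for label, parent in DEPENDENCIES.items():
        if parent is not None and results[parent] != PRESENT:
            # Assumption: a skipped step is stored as non-coded (8).
            results[label] = NON_CODED
            continue
        results[label] = classify_label(text, label)
    return results


def keyword_stub(text, label):
    """Placeholder for an LLM call: marks a label present if a keyword occurs."""
    keywords = {
        "political": "party",
        "presentation": "our",
        "attack": "disgrace",
        "target": "opposition",
    }
    return PRESENT if keywords[label] in text.lower() else ABSENT


for example in (
    "The weather is nice today.",
    "Our party will lower taxes.",
    "Our party says the opposition is a disgrace.",
):
    print(example, classify_hierarchically(example, keyword_stub))
```

Because a skipped parent is never equal to 1, the skip propagates down the hierarchy: when political is absent, attack is skipped, and target is skipped in turn without any further classifier call.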
📅 Planned Features
- List of supported LLMs
- Additional tutorial for double validation vs. zero-temp approach.
- Documentation of all relevant functions.
- Create prompting guidelines.
- Better integration and testing of validation metrics.
- Automated Logging System
- Add contribution guidelines.
- Support for more LLMs and APIs.
🚧 Notice: Under Development
Note:
While the core functionality of zeroshot-engine is already up and running, this project is still under active development. There may be bugs, incomplete features, or areas for improvement. If you encounter any issues, have feature requests, or would like to contribute code to the project, please feel free to:
- Open an issue on the GitHub repository.
- Submit a pull request with your contributions.
- Contact the author directly at luc.schwarz@posteo.de.
Contributions are highly appreciated and will help improve the framework for the scientific community!
📜 License
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
🫱🏼🫲🏼 Contributing
Contributions are welcome! Feel free to open issues for bug reports or feature requests. If you'd like to contribute code directly, please see the contributing guidelines.
🤵 Author
Lucas Schwarz
📧 Contact
🏛️ Citation
If you use zeroshot-engine in your research, please cite it as follows:
Schwarz, L. (2025). zeroshot-engine: A scientific zero-shot text classification engine based on various LLM models (Version 0.1.3) [Computer software]. https://doi.org/10.5281/zenodo.15079109
@software{Schwarz_zeroshot-engine_A_scientific_2025,
author = {Schwarz, Lucas},
doi = {10.5281/zenodo.15079109},
month = mar,
title = {{zeroshot-engine: A scientific zero-shot text classification engine based on various LLM models}},
url = {https://github.com/TheLucasSchwarz/zeroshotENGINE},
version = {0.1.3},
year = {2025}
}