Semantic Evaluation Layer for LLMs — classify and evaluate sentence-level understanding.
skeval
Semantic Evaluation Layer for LLMs
skeval is a lightweight library designed to evaluate how well Large Language Models (LLMs) understand and generate different types of sentences—such as facts, emotions, opinions, and instructions.
🚀 Motivation
Most LLM evaluation focuses on:
- Accuracy
- BLEU / ROUGE scores
- Reasoning benchmarks
But real-world language understanding also requires:
- Distinguishing facts from opinions
- Detecting emotions
- Identifying intent and instructions
skeval fills this gap by providing a semantic classification and evaluation layer.
🧠 What It Does
- Classifies sentences into categories:
  - Fact
  - Emotion
  - Opinion
  - Instruction
  - (extendable)
- Evaluates LLM outputs based on:
  - Classification accuracy
  - Confusion between categories
  - Per-class metrics
- Works with:
  - LLM outputs
  - Custom datasets
  - Benchmark pipelines
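The evaluation criteria listed above (accuracy, confusion between categories, per-class metrics) can be sketched in plain Python. This is a minimal illustration, not necessarily how skeval computes them: the function name `evaluate_labels` is hypothetical, the confusion matrix is assumed to use rows for true labels and columns for predicted labels (the scikit-learn convention), and labels are sorted alphabetically as in the example output further down.

```python
def evaluate_labels(y_pred, y_true):
    """Accuracy, confusion matrix and per-class precision/recall/F1 (illustrative sketch)."""
    if not y_true or len(y_pred) != len(y_true):
        raise ValueError("y_pred and y_true must be non-empty and the same length")

    # Alphabetical label order, one row/column per label seen in either list.
    labels = sorted(set(y_true) | set(y_pred))
    index = {label: i for i, label in enumerate(labels)}

    # Confusion matrix: rows are true labels, columns are predicted labels.
    matrix = [[0] * len(labels) for _ in labels]
    for pred, true in zip(y_pred, y_true):
        matrix[index[true]][index[pred]] += 1

    per_class = {}
    for label, i in index.items():
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum: times `label` was predicted
        actual = sum(matrix[i])                    # row sum: times `label` was the truth
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = {"precision": precision, "recall": recall,
                            "f1-score": f1, "support": actual}

    correct = sum(matrix[i][i] for i in range(len(labels)))
    return {"accuracy": correct / len(y_true), "per_class": per_class,
            "confusion_matrix": matrix, "labels": labels}


result = evaluate_labels(
    ["fact", "emotion", "fact", "instruction"],     # predictions: the opinion was read as a fact
    ["fact", "emotion", "opinion", "instruction"],  # expected labels
)
print(result["accuracy"])          # 0.75
print(result["confusion_matrix"])  # [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0]]
```

The misread opinion appears as an off-diagonal count in the `opinion` row and `fact` column, which also drops the precision of `fact` to 0.5.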
📦 Features
- Modular architecture (classifier, evaluator, metrics)
- Custom evaluation metrics for semantic types
- Compatible with LLM pipelines
- Extensible label taxonomy
- Clean CLI support (planned)
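The modular split between classifier, evaluator and metrics, together with an extensible label taxonomy, can be illustrated with a small stdlib-only sketch. Everything here is hypothetical: `Classifier`, `KeywordClassifier`, `accuracy` and `run_pipeline` are illustrative names rather than skeval's API, and the keyword-cue baseline is a deliberately naive stand-in for a trained model such as `SentenceClassifier`.

```python
from typing import Callable, Protocol


class Classifier(Protocol):
    """Any object with this predict() signature can be plugged into the pipeline."""

    def predict(self, sentences: list[str]) -> list[str]: ...


class KeywordClassifier:
    """Naive baseline: the first label whose cue words appear in the sentence wins."""

    def __init__(self, cues: dict[str, list[str]], default: str = "fact") -> None:
        self.cues = cues        # label taxonomy: add a key to add a category
        self.default = default  # label used when no cue word matches

    def predict(self, sentences: list[str]) -> list[str]:
        predictions = []
        for sentence in sentences:
            words = set(sentence.lower().split())
            label = next(
                (name for name, cue_words in self.cues.items() if words & set(cue_words)),
                self.default,
            )
            predictions.append(label)
        return predictions


Metric = Callable[[list[str], list[str]], float]


def accuracy(predictions: list[str], gold: list[str]) -> float:
    return sum(p == g for p, g in zip(predictions, gold)) / len(gold)


def run_pipeline(classifier: Classifier, sentences: list[str],
                 gold: list[str], metrics: dict[str, Metric]) -> dict[str, float]:
    """The evaluation step depends only on the Classifier protocol and metric functions."""
    predictions = classifier.predict(sentences)
    return {name: metric(predictions, gold) for name, metric in metrics.items()}


clf = KeywordClassifier({
    "emotion": ["feel", "happy", "sad"],
    "opinion": ["think", "believe"],
    "instruction": ["please", "turn", "close"],
})
sentences = ["The sky is blue", "I am so happy",
             "I believe dogs are better than cats", "Turn off the lights"]
gold = ["fact", "emotion", "opinion", "instruction"]
print(run_pipeline(clf, sentences, gold, {"accuracy": accuracy}))  # {'accuracy': 1.0}

# Extending the taxonomy is just adding another entry.
clf.cues["question"] = ["why", "how", "what"]
print(clf.predict(["why is the sky blue"]))  # ['question']
```

Because `run_pipeline` only calls `predict`, a trained model could replace the keyword baseline without touching the evaluation code; that separation, not the quality of the baseline, is what the sketch is meant to show.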
🏗️ Project Structure
skeval/
│
├── src/skeval/
│ ├── classifier/
│ ├── evaluator/
│ ├── metrics/
│ └── dataset/
│
├── data/
│ ├── raw/
│ └── processed/
│
├── tests/
├── scripts/
├── docs/
└── notebooks/
⚙️ Installation
git clone https://github.com/direkkakkar319-ops/Sentinel.AI.git
cd Sentinel.AI
pip install -e .
🧪 Example Usage
from skeval.classifier import SentenceClassifier
from skeval.evaluator import Evaluator

# Tiny labelled training set: one sentence per category
train_sentences = [
    "Water boils at 100 degrees Celsius",
    "I feel sad today",
    "I think this movie is amazing",
    "Please close the door",
]
train_labels = ["fact", "emotion", "opinion", "instruction"]

classifier = SentenceClassifier(embed_dim=64)
classifier.train(train_sentences, train_labels, epochs=20)

# Classify unseen sentences
predictions = classifier.predict([
    "The sky is blue",
    "I am so happy",
    "I believe dogs are better than cats",
    "Turn off the lights",
])

# Score the predictions against the expected labels
expected_labels = ["fact", "emotion", "opinion", "instruction"]
evaluator = Evaluator()
results = evaluator.evaluate(predictions, expected_labels)
print(results)
📊 Example Output
{
"accuracy": 0.75,
"per_class": {"fact": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, ...}, ...},
"macro_avg": {"precision": ..., "recall": ..., "f1-score": ...},
"weighted_avg": {"precision": ..., "recall": ..., "f1-score": ...},
"confusion_matrix": [[...], ...],
"labels": ["emotion", "fact", "instruction", "opinion"]
}
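The `macro_avg` and `weighted_avg` entries summarise the per-class scores in two ways. Assuming they follow the usual scikit-learn `classification_report` convention, the macro average is the unweighted mean over the K classes, (1/K) Σ m_k, while the weighted average weights each class by its support n_k (number of true instances), Σ n_k m_k / Σ n_k. The sketch below also assumes each per-class entry carries a `support` count; the function name `averages` is hypothetical and the two-class input is made-up illustrative data, not skeval output.

```python
def averages(per_class):
    """Macro (unweighted) and support-weighted averages of per-class metrics."""
    keys = ("precision", "recall", "f1-score")
    n_classes = len(per_class)
    total_support = sum(m["support"] for m in per_class.values())
    macro = {k: sum(m[k] for m in per_class.values()) / n_classes for k in keys}
    weighted = {
        k: sum(m[k] * m["support"] for m in per_class.values()) / total_support
        for k in keys
    }
    return {"macro_avg": macro, "weighted_avg": weighted}


# Made-up per-class scores: one class is perfect, the other is always missed.
per_class = {
    "fact":    {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, "support": 3},
    "opinion": {"precision": 0.0, "recall": 0.0, "f1-score": 0.0, "support": 1},
}
summary = averages(per_class)
print(summary["macro_avg"]["f1-score"])     # 0.5
print(summary["weighted_avg"]["f1-score"])  # 0.75
```

The macro F1 is 0.5 while the weighted F1 is 0.75 because the perfect class has three times the support; on imbalanced data the macro average is the one that exposes a neglected category.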
📚 Documentation
Full documentation (Sphinx-based) is available in the docs/ directory.
To build locally:
cd docs
make html
🧠 Future Roadmap
- Multi-label classification (mixed sentences)
- Sarcasm detection
- Benchmark dataset release
- Integration with LLM evaluation tools
- CLI interface
🤝 Contributing
Contributions are welcome!
Please read CONTRIBUTING.md before submitting a PR.
📄 License
This project is licensed under the MIT License.
⚠️ Disclaimer
This project is for research and educational purposes. It does not guarantee perfect semantic understanding and should not be used for critical decision-making systems without validation.
⭐ Acknowledgments
Inspired by the need for better semantic evaluation in modern LLM systems.
🔥 Tagline
“Not just what the model says—but what it means.”
Download files
File details
Details for the file skeval-0.1.1.tar.gz.
File metadata
- Download URL: skeval-0.1.1.tar.gz
- Upload date:
- Size: 13.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fec70c1ce86736ec347b4ab695e0b429659f24e845827d710b1e7dfe64941441 |
| MD5 | c052cc2df0b1d472359b36d63aa0c2c5 |
| BLAKE2b-256 | 843c171ef9ae3d60221f6db065f080ba0cbb2585ccbc2df71c31b2d4f453cb31 |
File details
Details for the file skeval-0.1.1-py3-none-any.whl.
File metadata
- Download URL: skeval-0.1.1-py3-none-any.whl
- Upload date:
- Size: 12.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d0d0bf7ab55c6bdca453feb3f491a4bd6d386fb74c7333e6f19d823e6625a09b |
| MD5 | 64cd8d196ea2bb702e0e3f353f36c22d |
| BLAKE2b-256 | fd233c73cf7d47fc65fc0fb9287be4d5d82572b9f44bec0c5d4804eba9567c1a |
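To check a downloaded file against the SHA256 digests listed above, a small sketch using Python's standard `hashlib` module is enough; `sha256_of` and `verify` are illustrative helper names, and the commented-out call assumes the source archive was saved to the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_sha256):
    """True if the file's SHA256 digest matches the published one (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# verify("skeval-0.1.1.tar.gz",
#        "fec70c1ce86736ec347b4ab695e0b429659f24e845827d710b1e7dfe64941441")
```

Reading in fixed-size chunks keeps memory use constant regardless of file size, which matters little for a 13.9 kB archive but makes the helper reusable for larger downloads.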