
Semantic Evaluation Layer for LLMs — classify and evaluate sentence-level understanding.


skeval

Semantic Evaluation Layer for LLMs

skeval is a lightweight library designed to evaluate how well Large Language Models (LLMs) understand and generate different types of sentences—such as facts, emotions, opinions, and instructions.


🚀 Motivation

Most LLM evaluation focuses on:

  • Accuracy
  • BLEU / ROUGE scores
  • Reasoning benchmarks

But real-world language understanding also requires:

  • Distinguishing facts from opinions
  • Detecting emotions
  • Identifying intent and instructions

skeval fills this gap by providing a semantic classification and evaluation layer.


🧠 What It Does

  • Classifies sentences into categories:

    • Fact
    • Emotion
    • Opinion
    • Instruction
    • (extendable)
  • Evaluates LLM outputs based on:

    • Classification accuracy
    • Confusion between categories (confusion matrix)
    • Per-class precision, recall, and F1 score
  • Works with:

    • LLM outputs
    • Custom datasets
    • Benchmark pipelines
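To make the four categories concrete, here is a deliberately naive keyword baseline. It is a hypothetical illustration, not skeval's SentenceClassifier (which is trained on labeled examples), and the cue word lists are assumptions chosen to cover the sample sentences used in this README.

```python
# Hypothetical keyword baseline for illustration only; NOT skeval's SentenceClassifier.
# The cue lists below are assumptions chosen to cover the sample sentences in this README.
IMPERATIVE_VERBS = {"close", "open", "turn", "stop", "start", "send", "write", "run", "take", "give", "put"}
OPINION_CUES = ("i think", "i believe", "in my opinion")
EMOTION_WORDS = {"happy", "sad", "angry", "afraid", "excited", "upset", "feel", "love", "hate"}


def keyword_label(sentence: str) -> str:
    """Label a sentence as 'instruction', 'opinion', 'emotion', or 'fact' from surface cues."""
    text = sentence.lower().strip().rstrip(".!?")
    words = text.split()
    if not words:
        return "fact"  # empty input: fall through to the default label
    # Imperatives usually open with "please" or a bare verb.
    if words[0] == "please" or words[0] in IMPERATIVE_VERBS:
        return "instruction"
    # Hedged first-person judgements signal an opinion (padding avoids partial-word matches).
    padded = f" {text} "
    if any(f" {cue} " in padded for cue in OPINION_CUES):
        return "opinion"
    # Any affect word signals an emotion.
    if EMOTION_WORDS.intersection(words):
        return "emotion"
    # Everything else is treated as a factual statement.
    return "fact"
```

On the eight sentences in the Example Usage section this returns the expected labels, but anything outside its cue lists (or with punctuation attached to a cue word) falls through to "fact". A baseline like this can still serve as a floor when reporting a trained classifier's accuracy.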

📦 Features

  • Modular architecture (classifier, evaluator, metrics)
  • Custom evaluation metrics for semantic types
  • Compatible with LLM pipelines
  • Extensible label taxonomy
  • Clean CLI support (planned)
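As an example of a custom metric for semantic types, the fact-versus-opinion distinction from the Motivation section can be tracked as its own error rate. The function name and definition below are hypothetical, not part of skeval's API: among examples whose true label is fact or opinion, it measures how often the prediction landed on the other of the two.

```python
from typing import Sequence

# Hypothetical custom metric; the name and definition are assumptions, not skeval's API.
_SWAP = {"fact": "opinion", "opinion": "fact"}


def fact_opinion_confusion_rate(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of true fact/opinion examples that were predicted as the other class."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    # Only examples whose true label is fact or opinion count toward the denominator.
    relevant = [(t, p) for t, p in zip(y_true, y_pred) if t in _SWAP]
    if not relevant:
        return 0.0  # nothing to confuse
    swapped = sum(1 for t, p in relevant if p == _SWAP[t])
    return swapped / len(relevant)
```

For y_true = ["fact", "opinion", "fact", "emotion"] and y_pred = ["opinion", "opinion", "fact", "fact"], three examples are relevant and one is swapped, so the rate is 1/3. The emotion example predicted as fact is ignored because it is not a fact/opinion swap; general accuracy already penalizes it.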

🏗️ Project Structure

skeval/
│
├── src/skeval/
│   ├── classifier/
│   ├── evaluator/
│   ├── metrics/
│   └── dataset/
│
├── data/
│   ├── raw/
│   └── processed/
│
├── tests/
├── scripts/
├── docs/
└── notebooks/
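The dataset/ module and the data/raw/ and data/processed/ directories suggest that labeled sentences are loaded from files. skeval's actual file format is not documented here, so the sketch below assumes a CSV with sentence and label columns. The function name load_labeled_sentences and the DEFAULT_LABELS set are hypothetical.

```python
import csv
import io
from typing import IO, List, Optional, Set, Tuple

# Assumed default taxonomy, matching the four categories in "What It Does".
DEFAULT_LABELS: Set[str] = {"fact", "emotion", "opinion", "instruction"}


def load_labeled_sentences(
    handle: IO[str], allowed: Optional[Set[str]] = None
) -> Tuple[List[str], List[str]]:
    """Read a CSV with 'sentence' and 'label' columns into two parallel lists.

    The column names and file layout are assumptions, not a documented skeval format.
    """
    allowed = DEFAULT_LABELS if allowed is None else allowed
    sentences: List[str] = []
    labels: List[str] = []
    for row_no, row in enumerate(csv.DictReader(handle), start=1):
        sentence = (row.get("sentence") or "").strip()
        label = (row.get("label") or "").strip().lower()  # normalise case
        if not sentence:
            continue  # skip rows without a sentence
        if label not in allowed:
            raise ValueError(f"data row {row_no}: unknown label {label!r}")
        sentences.append(sentence)
        labels.append(label)
    return sentences, labels


# Usage with an in-memory file; a real file would be opened with open(path, newline="").
demo = io.StringIO(
    "sentence,label\n"
    "Water boils at 100 degrees Celsius,fact\n"
    "Please close the door,Instruction\n"
)
sentences, labels = load_labeled_sentences(demo)
print(sentences, labels)
```

The returned lists have the same shape as sentences and labels in the Example Usage section, so they could be passed to classifier.train. To extend the taxonomy, pass a larger label set, for example allowed=DEFAULT_LABELS | {"question"} (the "question" label is only an illustration).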

⚙️ Installation

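Install the released package from PyPI:

pip install skeval

Or install from source in editable mode (the source lives in the Sentinel.AI repository):

git clone https://github.com/direkkakkar319-ops/Sentinel.AI.git
cd Sentinel.AI
pip install -e .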

🧪 Example Usage

from skeval.classifier import SentenceClassifier
from skeval.evaluator import Evaluator

# Tiny illustrative training set: one sentence per category
sentences = [
    "Water boils at 100 degrees Celsius",
    "I feel sad today",
    "I think this movie is amazing",
    "Please close the door",
]
labels = ["fact", "emotion", "opinion", "instruction"]

# Train the classifier (embed_dim: size of the sentence embeddings)
classifier = SentenceClassifier(embed_dim=64)
classifier.train(sentences, labels, epochs=20)

# Classify sentences the model has not seen during training
predictions = classifier.predict([
    "The sky is blue",
    "I am so happy",
    "I believe dogs are better than cats",
    "Turn off the lights",
])

# Compare the predictions against the expected labels
evaluator = Evaluator()
results = evaluator.evaluate(predictions, ["fact", "emotion", "opinion", "instruction"])
print(results)

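📊 Example Output

Exact values depend on the trained model. An accuracy of 0.75 means three of the four test sentences were labeled correctly; fields shown as ... are abbreviated.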

{
  "accuracy": 0.75,
  "per_class": {"fact": {"precision": 1.0, "recall": 1.0, "f1-score": 1.0, ...}, ...},
  "macro_avg": {"precision": ..., "recall": ..., "f1-score": ...},
  "weighted_avg": {"precision": ..., "recall": ..., "f1-score": ...},
  "confusion_matrix": [[...], ...],
  "labels": ["emotion", "fact", "instruction", "opinion"]
}
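The keys above follow the naming of a standard multi-class classification report (scikit-learn's classification_report uses the same precision, recall, and f1-score names, plus macro and weighted averages). To show how these numbers relate, here is a dependency-free sketch that computes them under common conventions: labels sorted alphabetically, confusion-matrix rows as true labels and columns as predictions, 0.0 wherever a ratio is undefined, and a per-class support count (assumed to be among the fields hidden by ...). This is an illustration, not skeval's Evaluator, and the function name semantic_report is hypothetical.

```python
from typing import Dict, Sequence


def semantic_report(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, object]:
    """Hypothetical re-implementation of the report fields shown above."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    labels = sorted(set(y_true) | set(y_pred))  # alphabetical label order
    index = {label: i for i, label in enumerate(labels)}

    # Confusion matrix: rows are true labels, columns are predicted labels.
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1

    per_class: Dict[str, Dict[str, float]] = {}
    for label, i in index.items():
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(len(labels))) - tp  # column total minus hits
        fn = sum(matrix[i]) - tp  # row total minus hits
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        per_class[label] = {"precision": precision, "recall": recall, "f1-score": f1, "support": tp + fn}

    total = len(y_true)
    metrics = ("precision", "recall", "f1-score")
    # Macro: unweighted mean over labels. Weighted: mean weighted by support.
    macro = {m: sum(c[m] for c in per_class.values()) / len(labels) for m in metrics}
    weighted = {m: sum(c[m] * c["support"] for c in per_class.values()) / total for m in metrics}
    accuracy = sum(matrix[i][i] for i in range(len(labels))) / total
    return {
        "accuracy": accuracy,
        "per_class": per_class,
        "macro_avg": macro,
        "weighted_avg": weighted,
        "confusion_matrix": matrix,
        "labels": labels,
    }


# Hypothetical predictions with one opinion mislabeled as a fact.
report = semantic_report(
    ["fact", "emotion", "opinion", "instruction"],
    ["fact", "emotion", "fact", "instruction"],
)
print(report["accuracy"], report["confusion_matrix"])
# → 0.75 [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0]]
```

Note that this sketch takes (y_true, y_pred), the scikit-learn argument order, whereas Evaluator.evaluate in the Example Usage takes the predictions first.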

📚 Documentation

Full documentation (Sphinx-based) is available in the docs/ directory.

To build locally (requires Sphinx to be installed):

cd docs
make html

🧠 Future Roadmap

  • Multi-label classification (mixed sentences)
  • Sarcasm detection
  • Benchmark dataset release
  • Integration with LLM evaluation tools
  • CLI interface

🤝 Contributing

Contributions are welcome!

Please read CONTRIBUTING.md before submitting a PR.


📄 License

This project is licensed under the MIT License.


⚠️ Disclaimer

This project is for research and educational purposes. It does not guarantee perfect semantic understanding and should not be used for critical decision-making systems without validation.


⭐ Acknowledgments

Inspired by the need for better semantic evaluation in modern LLM systems.


🔥 Tagline

“Not just what the model says—but what it means.”



Download files


Source Distribution

skeval-0.1.1.tar.gz (13.9 kB)

Uploaded Source

Built Distribution


skeval-0.1.1-py3-none-any.whl (12.4 kB)

Uploaded Python 3

File details

Details for the file skeval-0.1.1.tar.gz.

File metadata

  • Download URL: skeval-0.1.1.tar.gz
  • Upload date:
  • Size: 13.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for skeval-0.1.1.tar.gz
Algorithm     Hash digest
SHA256        fec70c1ce86736ec347b4ab695e0b429659f24e845827d710b1e7dfe64941441
MD5           c052cc2df0b1d472359b36d63aa0c2c5
BLAKE2b-256   843c171ef9ae3d60221f6db065f080ba0cbb2585ccbc2df71c31b2d4f453cb31


File details

Details for the file skeval-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: skeval-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 12.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for skeval-0.1.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        d0d0bf7ab55c6bdca453feb3f491a4bd6d386fb74c7333e6f19d823e6625a09b
MD5           64cd8d196ea2bb702e0e3f353f36c22d
BLAKE2b-256   fd233c73cf7d47fc65fc0fb9287be4d5d82572b9f44bec0c5d4804eba9567c1a

