YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering.
📋 What is YESciEval?
Large Language Models (LLMs) drive scientific question answering on modern search engines, yet the robustness of their evaluation remains underexplored. We introduce YESciEval, an open-source framework that combines fine-grained rubric-based assessment with reinforcement learning to mitigate optimism bias in LLM evaluators. The framework is presented as follows:
We release multidisciplinary science Q&A datasets, including adversarial variants, with evaluation scores from multiple LLMs. Because it is independent of proprietary models and human feedback, our approach enables scalable, cost-free evaluation. By advancing reliable LLM-as-a-judge models, this work supports AI alignment and fosters the robust, transparent evaluation essential for scientific inquiry and artificial general intelligence.
📃 License
Download files
Source Distribution
Built Distribution
File details
Details for the file yescieval-0.1.0.tar.gz.
File metadata
- Download URL: yescieval-0.1.0.tar.gz
- Upload date:
- Size: 76.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.10.17 Linux/6.11.0-1015-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 32a9ae161bf8d2b193da1c87140d87bbc66bd3e54285d2df06374d42ada7ef96 |
| MD5 | e19d491cdd91db842939dd8b56ec062f |
| BLAKE2b-256 | feb416a0b47e062bad2bd90f19c328c9092e384d8181761130faef6f1360e2b2 |
File details
Details for the file yescieval-0.1.0-py3-none-any.whl.
File metadata
- Download URL: yescieval-0.1.0-py3-none-any.whl
- Upload date:
- Size: 16.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.10.17 Linux/6.11.0-1015-azure
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 67d189ca1520d78331e1888361b49465784e537e15f41c2a26c9d2a57c7785b1 |
| MD5 | 39ed3f89fddcf95dbc21872157dea6d6 |
| BLAKE2b-256 | d53c30f621c42c1c9f72d247f0e30dc2ca8810d2c606a1a966862ac3ebce4b59 |
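The SHA256 digests listed for the two release files can be checked locally after download. The following is a minimal sketch using only Python's standard `hashlib` and `hmac` modules; the helper names `file_sha256` and `matches_published_digest` are hypothetical and are not part of the yescieval package, and the example assumes the file has been saved to the current directory under its published name.

```python
import hashlib
import hmac
import os


def file_sha256(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    # Normalise case and whitespace before a constant-time comparison.
    return hmac.compare_digest(file_sha256(path), expected_hex.strip().lower())


if __name__ == "__main__":
    # Assumed local file name; the digest is the SHA256 value listed for the wheel.
    wheel = "yescieval-0.1.0-py3-none-any.whl"
    published = "67d189ca1520d78331e1888361b49465784e537e15f41c2a26c9d2a57c7785b1"
    if os.path.exists(wheel):
        print("digest matches:", matches_published_digest(wheel, published))
    else:
        print("file not found:", wheel)
```

A mismatch means the local file differs from the one that was uploaded, in which case it should be downloaded again rather than installed.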