YESciEval: Robust LLM-as-a-Judge for Scientific Question Answering.

Project description

📋 What is YESciEval?

Large Language Models (LLMs) drive scientific question-answering on modern search engines, yet their evaluation robustness remains underexplored. We introduce YESciEval, an open-source framework that combines fine-grained rubric-based assessment with reinforcement learning to mitigate optimism bias in LLM evaluators. The framework is presented as follows:

We release multidisciplinary science Q&A datasets, including adversarial variants, with evaluation scores from multiple LLMs. Independent of proprietary models and human feedback, our approach enables scalable, cost-free evaluation. By advancing reliable LLM-as-a-judge models, this work supports AI alignment and fosters robust, transparent evaluation essential for scientific inquiry and artificial general intelligence.

📃 License

This work is licensed under the MIT License.

Download files

Download the file for your platform.

Source Distribution

yescieval-0.1.0.tar.gz (76.1 kB)

Uploaded Source

Built Distribution

yescieval-0.1.0-py3-none-any.whl (16.2 kB)

Uploaded Python 3

File details

Details for the file yescieval-0.1.0.tar.gz.

File metadata

  • Download URL: yescieval-0.1.0.tar.gz
  • Upload date:
  • Size: 76.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.10.17 Linux/6.11.0-1015-azure

File hashes

Hashes for yescieval-0.1.0.tar.gz
Algorithm Hash digest
SHA256 32a9ae161bf8d2b193da1c87140d87bbc66bd3e54285d2df06374d42ada7ef96
MD5 e19d491cdd91db842939dd8b56ec062f
BLAKE2b-256 feb416a0b47e062bad2bd90f19c328c9092e384d8181761130faef6f1360e2b2
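
The SHA256 digest listed above can be used to check that a downloaded copy of the source distribution is intact. The following is a minimal sketch using only Python's standard `hashlib` module. The helper names `sha256_of` and `matches_published_hash` and the local file path are assumptions made for illustration; they are not part of the yescieval package.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for yescieval-0.1.0.tar.gz
EXPECTED_SHA256 = "32a9ae161bf8d2b193da1c87140d87bbc66bd3e54285d2df06374d42ada7ef96"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 equals `expected` (case-insensitive)."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Hypothetical location of the downloaded archive; adjust as needed.
    archive = Path("yescieval-0.1.0.tar.gz")
    if archive.exists():
        print("hash OK" if matches_published_hash(archive) else "hash MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

The same check applies to the wheel file by passing its path and its own SHA256 digest as `expected`.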

File details

Details for the file yescieval-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: yescieval-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 16.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.10.17 Linux/6.11.0-1015-azure

File hashes

Hashes for yescieval-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 67d189ca1520d78331e1888361b49465784e537e15f41c2a26c9d2a57c7785b1
MD5 39ed3f89fddcf95dbc21872157dea6d6
BLAKE2b-256 d53c30f621c42c1c9f72d247f0e30dc2ca8810d2c606a1a966862ac3ebce4b59
