Tools for evaluating large language models.

Project description

Note: This project is under development. The API may change substantially between versions, so we recommend checking the CHANGELOG for breaking changes before upgrading.

EvalSense: LLM Evaluation

About

EvalSense is a framework for systematic evaluation of large language models (LLMs) on open-ended generation tasks, with a particular focus on bespoke, domain-specific evaluations. Some of its key features include:

Broad model support. Out-of-the-box compatibility with a wide range of local and API-based model providers, including Ollama, Hugging Face, vLLM, OpenAI, Anthropic and others.
Evaluation guidance. An interactive evaluation guide and automated meta-evaluation tools assist in selecting the most appropriate evaluation methods for a specific use-case, including the use of perturbed data to assess method effectiveness.
Interactive UI. A web-based interface enables rapid experimentation with different evaluation workflows without requiring any code.
Advanced evaluation methods. EvalSense incorporates recent LLM-as-a-Judge and hybrid evaluation approaches, such as G-Eval and QAGS, while also supporting more traditional metrics like BERTScore and ROUGE.
Efficient execution. Intelligent experiment scheduling and resource management minimise computational overhead for local models. For remote APIs, EvalSense uses asynchronous parallel calls to maximise throughput.
Modularity and extensibility. Key components and evaluation methods can be used independently or replaced with user-defined implementations.
Comprehensive logging. All key aspects of evaluation are recorded in machine-readable logs, including model parameters, prompts, model outputs, evaluation results, and other metadata.
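
The asynchronous parallel-call pattern mentioned under efficient execution can be illustrated generically. The sketch below is not EvalSense's implementation. The names fake_model_call, run_all and max_concurrency are hypothetical, and the "API call" is simulated with a short sleep so the example runs without network access.

```python
import asyncio


async def fake_model_call(prompt: str) -> str:
    """Stand-in for a remote API request (hypothetical; no network access)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"response to: {prompt}"


async def run_all(prompts: list[str], max_concurrency: int = 4) -> list[str]:
    """Issue all calls concurrently, with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded(prompt: str) -> str:
        # The semaphore caps how many requests run at the same time.
        async with semaphore:
            return await fake_model_call(prompt)

    # gather returns results in the same order as the inputs.
    return await asyncio.gather(*(bounded(p) for p in prompts))


print(asyncio.run(run_all(["first prompt", "second prompt"])))
```

Capping concurrency with a semaphore is one common way to stay within a provider's rate limits while still overlapping the waiting time of many requests.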

More information about EvalSense can be found on its homepage and in its documentation.

Note: Only public or fake data are shared in this repository.

Project Structure

The main code for the EvalSense Python package can be found under evalsense/.
The accompanying documentation is available in the docs/ folder.
Code for the interactive LLM evaluation guide is located under guide/.
Jupyter notebooks with the evaluation experiments and examples are located under notebooks/.

Getting Started

Installation

You can install the project using pip by running the following command:

pip install evalsense

This will install the latest released version of the package from PyPI without any optional dependencies.

Depending on your use-case, you may want to install additional dependencies from the following groups:

webui: For using the interactive web UI.
jupyter: For running experiments in Jupyter notebooks (only needed if you don't already have the necessary libraries installed).
transformers: For using models and metrics requiring the Hugging Face Transformers library.
vllm: For using models and metrics requiring vLLM.
interactive: For using EvalSense with interactive UI features (currently includes webui and jupyter).
local: For installing all local model dependencies (currently includes transformers and vllm).
all: For installing all optional dependencies.

For example, if you want to install EvalSense with all optional dependencies, you can run:

pip install "evalsense[all]"

If you want to use EvalSense with the interactive features (interactive) and Hugging Face Transformers (transformers), you can run:

pip install "evalsense[interactive,transformers]"

and similarly for other combinations.
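
The relationship between the extras groups can be sketched as a small lookup table. This is an illustration only: the mapping is transcribed from the list above (which says "currently includes", so it may change between versions), it assumes that all covers exactly the four base extras listed, and the helper names expand_extras and pip_requirement are hypothetical rather than part of EvalSense.

```python
# Base extras and composite groups, transcribed from the list above.
BASE_EXTRAS = {"webui", "jupyter", "transformers", "vllm"}
COMPOSITE_EXTRAS = {
    "interactive": {"webui", "jupyter"},
    "local": {"transformers", "vllm"},
    "all": set(BASE_EXTRAS),  # assumption: no further extras exist
}


def expand_extras(extras):
    """Return the set of base extras implied by the requested extras."""
    expanded = set()
    for extra in extras:
        if extra in BASE_EXTRAS:
            expanded.add(extra)
        elif extra in COMPOSITE_EXTRAS:
            expanded |= COMPOSITE_EXTRAS[extra]
        else:
            raise ValueError(f"unknown extra: {extra!r}")
    return expanded


def pip_requirement(extras, package="evalsense"):
    """Build the requirement string to pass to pip install."""
    names = sorted(set(extras))
    return f"{package}[{','.join(names)}]" if names else package


print(pip_requirement(["interactive", "transformers"]))
print(sorted(expand_extras(["interactive", "transformers"])))
```

For instance, expanding interactive shows that it already covers webui, so no separate webui extra is needed in that case.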

Installation for Development

To install the project for local development, you can follow the steps below:

To clone the repo:

git clone git@github.com:nhsengland/evalsense.git

To set up the Python environment for the project:

Install uv if it's not installed already
uv sync --all-extras
source .venv/bin/activate
pre-commit install

Note that the code is formatted with ruff and type-checked by pyright in standard type checking mode. For the best development experience, we recommend enabling the corresponding extensions in your preferred code editor.

To set up the Node environment for the LLM evaluation guide (located under guide/):

Install node if it's not installed already
Change to the guide/ directory (cd guide)
npm install
npm run start to run the development server

See also the separate README.md for the guide.

Programmatic Usage

For examples illustrating the usage of EvalSense, please check the notebooks under the notebooks/ folder:

The Demo notebook illustrates a basic application of EvalSense to the ACI-Bench dataset.
The Experiments notebook illustrates more thorough experiments on the same dataset, involving a larger number of evaluators and models.
The Meta-Evaluation notebook focuses on meta-evaluation on synthetically perturbed data, where the goal is to identify the most reliable evaluation methods rather than the best-performing models.
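
The idea behind meta-evaluation on perturbed data can be sketched in a few lines. This is a toy illustration and does not use the EvalSense API: the metric (unigram_recall), the perturbation (drop_every_other_word), the helper preference_rate and the example texts are all hypothetical. A reliable evaluation method should score an original output above a deliberately degraded copy of it most of the time.

```python
def unigram_recall(reference: str, candidate: str) -> float:
    """Toy metric: fraction of reference words that appear in the candidate."""
    ref_words = reference.lower().split()
    cand_words = set(candidate.lower().split())
    if not ref_words:
        return 0.0
    return sum(word in cand_words for word in ref_words) / len(ref_words)


def drop_every_other_word(text: str) -> str:
    """Toy perturbation: degrade a text by deleting every second word."""
    return " ".join(text.split()[::2])


def preference_rate(metric, references, outputs, perturb) -> float:
    """Fraction of examples where the metric scores the original output
    strictly higher than its perturbed version."""
    if not references:
        raise ValueError("at least one example is required")
    wins = 0
    for reference, output in zip(references, outputs):
        if metric(reference, output) > metric(reference, perturb(output)):
            wins += 1
    return wins / len(references)


references = ["the patient reports mild chest pain", "no known drug allergies"]
outputs = ["patient reports mild chest pain", "no known drug allergies"]

# A metric that notices the degradation versus one that ignores its input.
print(preference_rate(unigram_recall, references, outputs, drop_every_other_word))
print(preference_rate(lambda ref, cand: 1.0, references, outputs, drop_every_other_word))
```

Comparing preference rates across candidate metrics gives a simple ranking of how well each one detects the injected degradation, which is the question meta-evaluation asks.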

Web-Based UI

To use the interactive web-based UI implemented in EvalSense, simply run

evalsense webui

after installing the package and its dependencies. Note that you need to install EvalSense with the webui extra (pip install "evalsense[webui]") or an extra that includes it before running this command.
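
If you want to launch the web UI from a script, a minimal sketch along the following lines may help. It only relies on the evalsense webui command given above. The helper names cli_available and launch_webui are hypothetical, and finding the executable on PATH does not by itself confirm that the webui extra is installed.

```python
import shutil
import subprocess

WEBUI_COMMAND = ["evalsense", "webui"]  # the command given above


def cli_available(executable: str) -> bool:
    """Return True if the executable can be found on PATH."""
    return shutil.which(executable) is not None


def launch_webui() -> int:
    """Run the web UI in the foreground and return its exit code."""
    if not cli_available(WEBUI_COMMAND[0]):
        raise RuntimeError(
            'evalsense was not found on PATH; try: pip install "evalsense[webui]"'
        )
    return subprocess.run(WEBUI_COMMAND, check=False).returncode
```

Calling launch_webui() blocks until the server process exits, so it is best placed at the end of a script.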

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

Fork the Project
Create your Feature Branch (git checkout -b amazing-feature)
Commit your Changes (git commit -m 'Add some amazing feature')
Push to the Branch (git push origin amazing-feature)
Open a Pull Request

See CONTRIBUTING.md for detailed guidance.

License

Unless stated otherwise, the codebase is released under the MIT Licence. This covers both the codebase and any sample code in the documentation.

See LICENSE for more information.

The documentation is © Crown copyright and available under the terms of the Open Government Licence v3.0.

Contact

This project is currently maintained by @adamdejl. If you have questions or suggestions for new features, or want to report a bug, please open an issue. For security concerns, please file a private vulnerability report.

To find out more about NHS England Data Science, visit our project website or get in touch at datascience@nhs.net.

Acknowledgements

We thank the Inspect AI development team for their work on the Inspect AI library, which serves as a basis for EvalSense.

Project details

Maintainers

adamdejl

Release history

0.1.6: Apr 12, 2026 (this version)
0.1.5: Sep 17, 2025
0.1.4: Sep 17, 2025
0.1.3: May 19, 2025
0.1.2: May 10, 2025
0.1.1: May 9, 2025
0.1.0: May 9, 2025

Download files

Source Distribution

evalsense-0.1.5.tar.gz (57.2 kB)

Uploaded Sep 17, 2025 Source

Built Distribution

evalsense-0.1.5-py3-none-any.whl (82.4 kB)

Uploaded Sep 17, 2025 Python 3

File details

Details for the file evalsense-0.1.5.tar.gz.

File metadata

Download URL: evalsense-0.1.5.tar.gz
Upload date: Sep 17, 2025
Size: 57.2 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.7.3

File hashes

Hashes for evalsense-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`056967452b37f372ba7f1591b16a6f060908fa3f0b5513738bffd633430344be`
MD5	`c2fafb8d63bb21ec82c73b4cd0146418`
BLAKE2b-256	`c8b57942983df035dd84e4bc6cb889b476090f7b3690293720c3449fbb5a5622`

File details

Details for the file evalsense-0.1.5-py3-none-any.whl.

File metadata

Download URL: evalsense-0.1.5-py3-none-any.whl
Upload date: Sep 17, 2025
Size: 82.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: uv/0.7.3

File hashes

Hashes for evalsense-0.1.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`194fc4c61381c6dc80fa579f1d97c300013a6e6e5f0a34628136f6fb8c669938`
MD5	`5cae3d5d2bd74a2db0aa4d7c8dfdf102`
BLAKE2b-256	`0de3e5b64e23e8314d4e5152777a10424c8e5802eb2c8f0fcedba83bc8fd0ef1`
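
A downloaded file can be checked against the SHA256 digests listed above with Python's standard hashlib module. The following is a minimal sketch: the helper names sha256_of and matches_published_digest are hypothetical, and the digests in PUBLISHED_SHA256 are copied from the tables above.

```python
import hashlib

# SHA256 digests copied from the "File hashes" tables above.
PUBLISHED_SHA256 = {
    "evalsense-0.1.5.tar.gz":
        "056967452b37f372ba7f1591b16a6f060908fa3f0b5513738bffd633430344be",
    "evalsense-0.1.5-py3-none-any.whl":
        "194fc4c61381c6dc80fa579f1d97c300013a6e6e5f0a34628136f6fb8c669938",
}


def sha256_of(path, chunk_size=1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex: str) -> bool:
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_published_digest("evalsense-0.1.5.tar.gz", PUBLISHED_SHA256["evalsense-0.1.5.tar.gz"]) should return True for an intact download of the source distribution saved in the current directory.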
