Tools for evaluating large language models.
Warning: This project is a work in progress. Critical components may be missing, inoperative or incomplete, and the API can undergo major changes without any notice. Please check back later for a more stable version.
EvalSense: LLM Evaluation
About
This repository holds a Python package enabling systematic evaluation of large language models (LLMs) on open-ended generation tasks, with a particular focus on healthcare and summarisation. It also includes supplementary documentation and assets related to the NHS England project on LLM evaluation, such as the code for an interactive LLM evaluation guide (located in the guide/ directory). You can find more information about the project in the original project proposal.
Note: Only public or fake data are shared in this repository.
Project Structure
- The main code for the EvalSense Python package can be found under evalsense/.
- The accompanying documentation is available in the docs/ folder.
- Code for the interactive LLM evaluation guide is located under guide/.
- Jupyter notebooks with the evaluation experiments and examples are located under notebooks/.
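As a quick sanity check after cloning, the layout described above can be verified with a few lines of Python. This is only a minimal sketch: the helper name `missing_dirs` is hypothetical and not part of EvalSense, and the script assumes it is run from the root of a local clone.

```python
from __future__ import annotations

from pathlib import Path

# Top-level directories named in the project structure list above.
EXPECTED_DIRS = ("evalsense", "docs", "guide", "notebooks")


def missing_dirs(repo_root: str | Path) -> list[str]:
    """Return the expected top-level directories that are absent under repo_root.

    Hypothetical helper for illustration; not part of the EvalSense package.
    """
    root = Path(repo_root)
    return [name for name in EXPECTED_DIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Assumption: the current working directory is the root of the clone.
    missing = missing_dirs(Path.cwd())
    if missing:
        print("Missing directories:", ", ".join(missing))
    else:
        print("All expected directories are present.")
```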
Getting Started
Installation for Development
To install the project for local development, you can follow the steps below:
To clone the repo:
git clone git@github.com:nhsengland/evalsense.git
To set up the Python environment for the project:
- Install uv if it's not installed already
- uv sync --all-extras
- source .venv/bin/activate
- pre-commit install
To set up the Node environment for the LLM evaluation guide (located under guide/):
- Install node if it's not installed already
- npm install in the guide/ directory
- npm run start to run the development server
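Once the Python environment is set up and activated, one way to confirm that the package is visible to the interpreter is to query its installed version. The sketch below uses only the standard library. It assumes the distribution is named `evalsense` (as in the release file names) and that the sync step installed it into the active environment; `installed_version` is a hypothetical helper, not part of EvalSense.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str) -> str | None:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not part of the EvalSense package.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: "evalsense" is the distribution name used at install time.
    found = installed_version("evalsense")
    if found:
        print(f"evalsense {found} is installed")
    else:
        print("evalsense is not installed in this environment")
```

If the package is reported as missing, the most likely cause is that the virtual environment was not activated before running the script.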
Usage
For an example illustrating the usage of EvalSense, please check the Demo notebook under the notebooks/ folder.
Contributing
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
- Fork the Project
- Create your Feature Branch (git checkout -b feature/amazing-feature)
- Commit your Changes (git commit -m 'Add some amazing feature')
- Push to the Branch (git push origin feature/amazing-feature)
- Open a Pull Request
See CONTRIBUTING.md for detailed guidance.
License
Unless stated otherwise, the codebase is released under the MIT Licence. This covers both the codebase and any sample code in the documentation.
See LICENSE for more information.
The documentation is © Crown copyright and available under the terms of the Open Government Licence v3.0.
Contact
To find out more about the NHS England Data Science team, visit our project website or get in touch at datascience@nhs.net.
Download files
File details
Details for the file evalsense-0.1.2.tar.gz.
File metadata
- Download URL: evalsense-0.1.2.tar.gz
- Upload date:
- Size: 40.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.7.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 12a889374cbd55806661666679b89831d2a0bcd7f97c52eb2c10c75657e1ac9f |
| MD5 | 30f7ebc552b2d3b5eb37641419a5143e |
| BLAKE2b-256 | 4fb090f42160fe889842883eaf09c2098c0c3edff86b6f77e0fb83769185f4be |
File details
Details for the file evalsense-0.1.2-py3-none-any.whl.
File metadata
- Download URL: evalsense-0.1.2-py3-none-any.whl
- Upload date:
- Size: 54.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.7.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6621776b2d5260b59a5fa83d8cd9f95be4808f6deb8c7a0559cc3dc017fc8c70 |
| MD5 | 2e924245f2309c01631e0696e7771161 |
| BLAKE2b-256 | 319a5ccb5841cc5f86933d6d2cbea28f8d5821c57810122db785916f35315cd5 |
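The SHA256 digests listed for the two release files can be checked against local downloads with the standard library's `hashlib`. The following is a minimal sketch rather than an official verification tool: the helper names `sha256_of` and `matches` are hypothetical, and the script assumes the files were downloaded into the current directory under their original names.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# SHA256 digests copied from the file hash tables above.
EXPECTED_SHA256 = {
    "evalsense-0.1.2.tar.gz": "12a889374cbd55806661666679b89831d2a0bcd7f97c52eb2c10c75657e1ac9f",
    "evalsense-0.1.2-py3-none-any.whl": "6621776b2d5260b59a5fa83d8cd9f95be4808f6deb8c7a0559cc3dc017fc8c70",
}


def sha256_of(path: str | Path, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: str | Path, expected_hex: str) -> bool:
    """Return True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumption: the release files sit in the current working directory.
    for name, expected in EXPECTED_SHA256.items():
        if not Path(name).is_file():
            print(f"{name}: not found, skipped")
            continue
        print(f"{name}: {'OK' if matches(name, expected) else 'MISMATCH'}")
```

Reading in chunks keeps memory use flat regardless of file size, which matters little for files of this size but makes the helper reusable for larger artifacts.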