hep-data-llm

Introduction

This repo contains the code used to translate English queries for plots into the actual plots using LLMs and Python packages and tools like ServiceX, Awkward, Vector, and hist. This is a proof of concept and is not meant to be production-level code.

Benchmark studies with the 8 adl-index benchmarks, as presented at conferences, can be found in the results directory for various workflows.

Use --help at the root level and on commands (e.g. hep-data-llm plot --help) to get a complete list of options.

Installation

To run out of the box, you'll need to do the following once:

Prerequisites:

- You'll need Docker installed on your machine and, if you are on ARM, the multi-arch extensions (the source images are amd64 only):
  - docker buildx create --name multiarch --driver docker-container --use
  - docker run --privileged --rm tonistiigi/binfmt --install amd64
- Build the Docker image that is used to run the workflow (ServiceX, Awkward, and friends). Which image you need depends on the workflow you are using:
  - ServiceX/Awkward: docker build -t hepdatallm-awkward:latest Docker
  - ServiceX/RDF: docker build -t hepdatallm-rdf:latest -f Docker/Dockerfile.RDF .
- If you are running a ServiceX workflow, get an access token. Make sure the servicex.yaml file is in either your home directory or your current working directory.
- You'll need token(s) to access the LLM. Create a .env file, in either your current directory or your home directory, that looks like the example below. Make sure only you can read it: this is access to a paid service!

api_openai_com_API_KEY=<openai-key>
api_together_xyz_API_KEY=<together.ai key>
openrouter_ai_API_KEY=<openrouter-key>
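One way to make sure the .env file is readable only by you is to create it with owner-only permissions from the start. The sketch below is an illustration, not part of hep-data-llm: the helper names write_private_env and is_private are hypothetical, and the permission check assumes a POSIX system (on Windows, os.chmod only controls the read-only flag).

```python
import os
import stat
from pathlib import Path


def write_private_env(path: Path, keys: dict[str, str]) -> None:
    """Write KEY=value lines to `path` with owner-only read/write access."""
    # Create the file with mode 0o600 so it is never briefly world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        for name, value in keys.items():
            handle.write(f"{name}={value}\n")
    # The mode passed to os.open is ignored if the file already existed,
    # so set the permissions explicitly as well.
    os.chmod(path, 0o600)


def is_private(path: Path) -> bool:
    """Return True if group and others have no permissions on `path`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0
```

For an existing file, the shell equivalent is chmod 600 .env.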

Running in a local python environment

pip install hep-data-llm
hep-data-llm plot "Plot the ETmiss of all events in the rucio dataset mc23_13p6TeV:mc23_13p6TeV.801167.Py8EG_A14NNPDF23LO_jj_JZ2.deriv.DAOD_PHYSLITE.e8514_e8528_a911_s4114_r15224_r15225_p6697." output.md

The output will be in output.md. View it in a markdown renderer (I use VS Code). An img directory will be created, and it will contain the plot (hopefully).

Use hep-data-llm plot --help to see all the options you can give it. It defaults to using gpt-5, the most successful model in tests.

Default questions

A questions.yaml file is bundled with the package containing a list of common plotting questions. To run one of these questions by number, pass the index (starting from 1) instead of the full text:

hep-data-llm plot 1 output.md

This will execute the first question from questions.yaml.
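The index-or-text behaviour can be pictured with a small sketch. This is not the package's actual implementation: resolve_question is a hypothetical helper, and the list of questions stands in for whatever is loaded from questions.yaml.

```python
def resolve_question(arg: str, questions: list[str]) -> str:
    """Return the question text for a command-line argument.

    A purely numeric argument is treated as a 1-based index into
    `questions`; anything else is taken to be the question itself.
    """
    if arg.isdecimal():
        index = int(arg)
        if not 1 <= index <= len(questions):
            raise ValueError(
                f"question {index} is out of range (1 to {len(questions)})"
            )
        # The index starts from 1, Python lists start from 0.
        return questions[index - 1]
    return arg
```

With this sketch, resolve_question("1", questions) returns the first entry, while a full sentence is passed through unchanged.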

Question references and metrics

Some questions include reference metrics for each expected plot to help validate the generated output. References live alongside the question text in questions.yaml and consist of per-plot average entries per event and mean values derived from the raw data list used to fill the histogram. For example:

questions:
  - text: "Plot the ETmiss of all events in the rucio dataset user.zmarshal:user.zmarshal.364702_OpenData_v1_p6026_2024-04-23."
    references:
      plots:
        - avg_entries_per_event: 1.0
          mean: 38.5

When a question includes references, the generated plotting code is expected to print lines like METRIC: avg_entries_per_event=<N> mean=<M> for each plot, computed directly from the numbers passed into the histogram. The CLI will compare these against the reference pairs to determine success.
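To make the metric check concrete, here is a minimal sketch of both sides: producing a METRIC line from the raw values that fill a histogram, and parsing such lines back out of the captured output for comparison. The function names and the 1% relative tolerance are assumptions made for illustration; the source does not say how the CLI actually performs the comparison.

```python
import math
import re

# Matches lines such as: METRIC: avg_entries_per_event=1.0 mean=38.5
METRIC_RE = re.compile(
    r"^METRIC:\s*avg_entries_per_event=(?P<avg>\S+)\s+mean=(?P<mean>\S+)$"
)


def metric_line(values: list[float], n_events: int) -> str:
    """Format a METRIC line from the raw values filling one histogram."""
    if not values or n_events <= 0:
        raise ValueError("need at least one value and one event")
    avg_entries = len(values) / n_events
    mean = sum(values) / len(values)
    return f"METRIC: avg_entries_per_event={avg_entries} mean={mean}"


def parse_metrics(stdout: str) -> list[tuple[float, float]]:
    """Extract (avg_entries_per_event, mean) pairs in the order printed.

    Raises ValueError if a METRIC line carries a non-numeric value.
    """
    pairs = []
    for line in stdout.splitlines():
        match = METRIC_RE.match(line.strip())
        if match:
            pairs.append((float(match["avg"]), float(match["mean"])))
    return pairs


def matches_references(
    found: list[tuple[float, float]],
    references: list[tuple[float, float]],
    rel_tol: float = 0.01,  # assumed tolerance, not taken from the package
) -> bool:
    """Compare found metrics to the reference pairs, plot by plot."""
    if len(found) != len(references):
        return False
    return all(
        math.isclose(f_avg, r_avg, rel_tol=rel_tol)
        and math.isclose(f_mean, r_mean, rel_tol=rel_tol)
        for (f_avg, f_mean), (r_avg, r_mean) in zip(found, references)
    )
```

For example, three events with one ETmiss value each (30.0, 40.0, and 45.5) give avg_entries_per_event=1.0 and mean=38.5, which would match the reference shown above.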

Running with `uvx`

This is great if you want to just run once or twice.

uvx hep-data-llm plot "Plot the ETmiss of all events in the rucio dataset mc23_13p6TeV:mc23_13p6TeV.801167.Py8EG_A14NNPDF23LO_jj_JZ2.deriv.DAOD_PHYSLITE.e8514_e8528_a911_s4114_r15224_r15225_p6697." output.md

This uses the uvx tool to install the package into a temporary environment. If you want to keep it around, use uv tool install hep-data-llm instead. Do remember to update it every now and then!

Usage

new profile

Use the new profile <filename> command to create a new profile. It copies the default profile, which you can then modify with a new prompt or other items.

Creating a new workflow

Otherwise known as creating a new prompt: this section covers creating a new prompt file and hint files, and what that takes in the context of this package.

- General preparation: you'll need a Docker container with the appropriate software installed. You'll also need a good set of test instructions.
- Use the hep-data-llm new profile my-prompt.yaml command to create a "dummy" prompt file.
- Edit the new profile yaml file:
  - If you are writing new hint files, replace the list of hint files with local (relative) references to the hint files you want to use.
  - Choose a fairly cheap model to run (since you'll probably be running it a lot). Change the model: entry (or use the --model option).
- When you are ready to test, use hep-data-llm plot --profile my-prompt --ignore-cache hints <question> output.md. Replace <question> with your question or a question number from the default list of questions.
  - Note the --ignore-cache hints option: the code always caches the hint files, even if they are located on the local disk.
  - Use --repeat N to record multiple, independent runs for each model. Every trial bypasses the LLM cache automatically, so you get fresh outputs for each repetition without reusing earlier responses.

Notes from adding a servicex-RDF workflow:

- The guardrail that looked for the png file to be written out had to be altered.
- The hint files that described ServiceX assumed Awkward output. They had to be split in two: a short hint file describing how to generate a ServiceX request, and a second one describing how to turn the results into Awkward arrays. The same then had to be done for RDF.
- A new Docker container had to be built, in this case based on the ROOT container image.

License

hep-data-llm is distributed under the terms of the MIT license.
