hep-data-llm


Introduction

This repo contains the code used to translate English queries for plots into the actual plots, using LLMs and Python packages and tools like ServiceX, Awkward, Vector, and hist. This is a proof of concept and is not meant to be production-level code.

Benchmark studies with the 8 adl-index benchmarks, as presented at conferences, can be found in the results directory for the various workflows.

Use --help at the root level and on commands (e.g. hep-data-llm plot --help) to get a complete list of options.

Installation

To run out of the box you'll need to do the following once:

Prerequisites:

  1. Install Docker on your machine.
  2. Build the Docker image used to run the workflow (ServiceX, Awkward, and friends). Which image you need depends on the workflow you are using:
    • ServiceX/Awkward: docker build -t hepdatallm-awkward:latest Docker
    • ServiceX/RDF: docker build -t hepdatallm-rdf:latest -f Docker/Dockerfile.RDF .
  3. If you are running a ServiceX workflow, get an access token. Make sure the servicex.yaml file is in either your home directory or your current working directory.
  4. You'll need token(s) to access the LLM. Create a .env file in either your current working directory or your home directory, and make sure only you can read it: these keys give access to a paid service! The file looks like this:
api_openai_com_API_KEY=<openai-key>
api_together_xyz_API_KEY=<together.ai key>
openrouter_ai_API_KEY=<openrouter-key>
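
The prerequisites above ask for two files (.env and servicex.yaml) in either the current directory or the home directory, with .env readable only by you. A minimal pre-flight sketch of that check follows. It is not part of hep-data-llm: the helper names (find_config, is_private, make_private) are hypothetical, the search order (current directory first, then home) is an assumption, and the permission test is POSIX-only.

```python
import os
import stat
from pathlib import Path
from typing import Iterable, Optional


def find_config(name: str, search_dirs: Iterable[Path]) -> Optional[Path]:
    """Return the first existing file called `name` in `search_dirs`, else None."""
    for directory in search_dirs:
        candidate = Path(directory) / name
        if candidate.is_file():
            return candidate
    return None


def is_private(path: Path) -> bool:
    """True if neither group nor others have any permission bits set (POSIX)."""
    mode = stat.S_IMODE(path.stat().st_mode)
    return mode & (stat.S_IRWXG | stat.S_IRWXO) == 0


def make_private(path: Path) -> None:
    """Restrict the file to owner read/write, the equivalent of chmod 600."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


if __name__ == "__main__":
    # Assumed search order: current working directory, then home directory.
    dirs = [Path.cwd(), Path.home()]
    for name in (".env", "servicex.yaml"):
        found = find_config(name, dirs)
        if found is None:
            print(f"{name}: not found")
        elif name == ".env" and not is_private(found):
            print(f"{name}: {found} is accessible to other users")
        else:
            print(f"{name}: {found}")
```

The script only reports; call make_private on the .env path yourself if it is flagged.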

Running in a local python environment

pip install hep-data-llm
hep-data-llm plot "Plot the ETmiss of all events in the rucio dataset mc23_13p6TeV:mc23_13p6TeV.801167.Py8EG_A14NNPDF23LO_jj_JZ2.deriv.DAOD_PHYSLITE.e8514_e8528_a911_s4114_r15224_r15225_p6697." output.md

The output will be in output.md; view it in a Markdown rendering program (I use VS Code). An img directory will be created, and it will contain the plot (hopefully).
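
To check quickly whether a run actually produced a plot, something like the following sketch may help. It assumes the img directory is created next to output.md and that plots are written as PNG files; neither detail is spelled out here, and find_plots is a hypothetical helper, not part of the package.

```python
from pathlib import Path
from typing import List


def find_plots(output_md: Path, img_dir_name: str = "img") -> List[Path]:
    """List PNG files in the img directory assumed to sit beside the report."""
    img_dir = output_md.parent / img_dir_name
    if not img_dir.is_dir():
        return []
    return sorted(img_dir.glob("*.png"))


if __name__ == "__main__":
    plots = find_plots(Path("output.md"))
    if plots:
        for plot in plots:
            print(plot)
    else:
        print("no plots found; check output.md for errors")
```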

Use hep-data-llm plot --help to see all the options you can give it. It defaults to using gpt-5, the most successful model in tests.

Default questions

A questions.yaml file is bundled with the package containing a list of common plotting questions. To run one of these questions by number, pass the index (starting from 1) instead of the full text:

hep-data-llm plot 1 output.md

This will execute the first question from questions.yaml.
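
The index-or-text behaviour can be pictured with the sketch below. It only illustrates the 1-based lookup described above and is not the package's actual implementation; resolve_question is a hypothetical name, and the question list is a placeholder rather than the real contents of questions.yaml.

```python
from typing import Sequence


def resolve_question(arg: str, questions: Sequence[str]) -> str:
    """Treat an all-digit argument as a 1-based index, anything else as question text."""
    text = arg.strip()
    if text.isascii() and text.isdigit():
        index = int(text)
        if not 1 <= index <= len(questions):
            raise IndexError(f"question {index} is out of range 1..{len(questions)}")
        return questions[index - 1]  # convert the 1-based index to 0-based
    return text


# Placeholder questions for illustration only; not the contents of questions.yaml.
example_questions = ["first placeholder question", "second placeholder question"]
print(resolve_question("1", example_questions))  # prints: first placeholder question
```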

Running with uvx

This is great if you want to just run once or twice.

uvx hep-data-llm plot "Plot the ETmiss of all events in the rucio dataset mc23_13p6TeV:mc23_13p6TeV.801167.Py8EG_A14NNPDF23LO_jj_JZ2.deriv.DAOD_PHYSLITE.e8514_e8528_a911_s4114_r15224_r15225_p6697." output.md

This uses the uvx tool to install a temporary environment. If you want to keep this around to use, you can use uv tool install hep-data-llm. Do remember to update it every now and then!

Usage

new profile

Use the new profile <filename> command to create a new profile. It copies the default profile, which you can then modify with a new prompt or other settings.

Creating a new workflow

Also known as creating a new prompt: this section covers what it takes to create a new prompt file and hint files in the context of this package.

  1. General preparation - you'll need a docker container with the appropriate software installed. You'll also need a good set of test instructions.
  2. Use the hep-data-llm new profile my-prompt.yaml command to create a "dummy" prompt file.
  3. Edit the new profile yaml file:
    • If you are editing new hint files, then replace the list of hint files with a local (relative) reference to the hint files you want to use
    • Choose a fairly cheap model to run (since you'll probably be running it a lot). Change the model: entry (or you can use the --model option).
  4. When you are ready to test, use hep-data-llm plot --profile my-prompt --ignore-cache hints <question> output.md. Replace <question> with your question or a question number from the default list of questions.
    • Note the --ignore-cache hints option: the code always caches the hint files, even if they are located on the local disk.
    • Use --repeat N to record multiple, independent runs for each model. Every trial bypasses the LLM cache automatically so you get fresh outputs for each repetition without reusing earlier responses.
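
When iterating on a profile, it can be convenient to assemble the test command from step 4 in a small script. The sketch below only builds the argument list from the options mentioned above (--profile, --ignore-cache hints, --repeat). build_plot_command is a hypothetical helper, and placing --repeat alongside the other options before the positional arguments is an assumption; check hep-data-llm plot --help for the exact form.

```python
import shlex
from typing import List, Union


def build_plot_command(
    question: Union[str, int],
    output: str = "output.md",
    profile: str = "my-prompt",
    repeat: int = 1,
    ignore_hint_cache: bool = True,
) -> List[str]:
    """Assemble the argv list for a test run of a new profile."""
    argv = ["hep-data-llm", "plot", "--profile", profile]
    if ignore_hint_cache:
        # Hint files are always cached, so force a re-read while editing them.
        argv += ["--ignore-cache", "hints"]
    if repeat > 1:
        argv += ["--repeat", str(repeat)]
    # The question is either the full text or a number from the default list.
    argv += [str(question), output]
    return argv


print(shlex.join(build_plot_command(1, repeat=3)))
# prints: hep-data-llm plot --profile my-prompt --ignore-cache hints --repeat 3 1 output.md
```

Pass the returned list to subprocess.run(argv, check=True) to execute it; using a list rather than a shell string avoids quoting problems with long question text.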

Notes from adding a servicex-RDF workflow:

  • The guardrail that looked for the png file to be written out had to be altered
  • The hint file that described ServiceX assumed Awkward output. It had to be split in two: a short hint file describing how to generate a ServiceX request, and a second one describing how to turn the results into Awkward data. The same then had to be done for RDF.
  • A new docker container had to be built, in this case based on the ROOT container image.

License

hep-data-llm is distributed under the terms of the MIT license.
