hep-data-llm


Introduction

This repo contains the code used to translate English queries for plots into the actual plots using LLMs and Python packages and tools like ServiceX, Awkward, Vector, and hist. This is a proof of concept and is not meant to be production-level code.

Benchmark studies with the 8 adl-index questions, as presented at conferences, can be found in the results directory for various workflows.

Use --help at the root level and on commands (e.g. hep-data-llm plot --help) to get a complete list of options.

Installation

To run out of the box you'll need to do the following once:

Prerequisites:

  1. You'll need to have Docker installed on your machine.
  2. Build the Docker image that is used to run ServiceX, Awkward, and friends. Which image you need depends on the workflow you are using:
    • ServiceX/Awkward: docker build -t hepdatallm-awkward:latest Docker
    • ServiceX/RDF: docker build -t hepdatallm-rdf:latest -f Docker/Dockerfile.RDF .
  3. If you are running a ServiceX workflow, get an access token. Make sure the servicex.yaml file is in either your home directory or your current working directory.
  4. You'll need token(s) to access the LLM. Here is what the .env file looks like. Create it in either your local directory or your home directory. Make sure only you can read it: this is access to a paid service!
api_openai_com_API_KEY=<openai-key>
api_together_xyz_API_KEY=<together.ai key>
openrouter_ai_API_KEY=<openrouter-key>
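Before the first run it can help to confirm that both files are where they are expected and that the .env file is private. The following is a minimal sketch, not part of hep-data-llm: the helper names find_config and restrict_to_owner are hypothetical, and the search order (current directory first, then home) is an assumption, since the package's own lookup order is not documented here.

```python
import os
import stat
from pathlib import Path
from typing import Optional


def find_config(name: str, search_dirs: list[Path]) -> Optional[Path]:
    """Return the first existing file called `name` in `search_dirs`, or None."""
    for directory in search_dirs:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


def restrict_to_owner(path: Path) -> None:
    """Allow only the owner to read and write the file (mode 0o600 on POSIX)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)


if __name__ == "__main__":
    # Assumed search order: current working directory, then home directory.
    dirs = [Path.cwd(), Path.home()]
    for name in ("servicex.yaml", ".env"):
        found = find_config(name, dirs)
        print(f"{name}: {found if found else 'not found'}")
```

Calling restrict_to_owner on the .env path returned by find_config has the same effect as chmod 600 on Linux and macOS. On Windows, os.chmod can only toggle the read-only flag, so file permissions need to be set another way there.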

Running in a local python environment

pip install hep-data-llm
hep-data-llm plot "Plot the ETmiss of all events in the rucio dataset mc23_13p6TeV:mc23_13p6TeV.801167.Py8EG_A14NNPDF23LO_jj_JZ2.deriv.DAOD_PHYSLITE.e8514_e8528_a911_s4114_r15224_r15225_p6697." output.md

The output will be in output.md. View it in a markdown rendering program (I use vscode). An img directory will be created, and it will contain the plot (hopefully).

Use hep-data-llm plot --help to see all the options you can give it. It defaults to using gpt-5, the most successful model in tests.

Default questions

A questions.yaml file is bundled with the package containing a list of common plotting questions. To run one of these questions by number, pass the index (starting from 1) instead of the full text:

hep-data-llm plot 1 output.md

This will execute the first question from questions.yaml.
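The index-or-text behaviour can be pictured with a small sketch. This is an illustration only, not the package's code: resolve_question is a hypothetical helper, and the question strings are placeholders rather than the actual contents of questions.yaml.

```python
def resolve_question(arg: str, questions: list[str]) -> str:
    """Treat an all-digit argument as a 1-based index, anything else as question text."""
    text = arg.strip()
    if text.isdecimal():
        index = int(text)
        if not 1 <= index <= len(questions):
            raise IndexError(
                f"question {index} is out of range 1..{len(questions)}"
            )
        # Convert the 1-based number into Python's 0-based list index.
        return questions[index - 1]
    return arg


# Placeholder questions, assumed for illustration only.
example_questions = [
    "Plot the ETmiss of all events in <dataset>.",
    "Plot the pT of all jets in <dataset>.",
]

print(resolve_question("1", example_questions))
print(resolve_question("Plot something else entirely.", example_questions))
```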

Running with uvx

This is great if you want to just run once or twice.

uvx hep-data-llm plot "Plot the ETmiss of all events in the rucio dataset mc23_13p6TeV:mc23_13p6TeV.801167.Py8EG_A14NNPDF23LO_jj_JZ2.deriv.DAOD_PHYSLITE.e8514_e8528_a911_s4114_r15224_r15225_p6697." output.md

This uses the uvx tool to install a temporary environment. If you want to keep this around to use, you can use uv tool install hep-data-llm. Do remember to update it every now and then!

Usage

new profile

Use the new profile <filename> command to create a new profile. It copies the default profile, which you can then modify with new prompts or other items.

Creating a new workflow

Creating a new workflow, otherwise known as creating a new prompt, means writing a new prompt file and hint files. The steps below describe what that takes in the context of this package.

  1. General preparation - you'll need a docker container with the appropriate software installed. You'll also need a good set of test instructions.
  2. Use the hep-data-llm new profile my-prompt.yaml command to create a "dummy" prompt file.
  3. Edit the new profile YAML file:
    • If you are writing new hint files, replace the list of hint files with local (relative) references to the hint files you want to use.
    • Choose a fairly cheap model to run (since you'll probably be running it a lot). Change the model: entry (or you can use the --model option).
  4. When you are ready to test, use hep-data-llm plot --profile my-prompt --ignore-cache hints <question> output.md. Replace <question> with your question or a question number from the default list of questions.
    • Note the --ignore-cache hints option: the code always caches the hint files, even if they are located on the local disk.
    • Use --repeat N to record multiple, independent runs for each model. Every trial bypasses the LLM cache automatically so you get fresh outputs for each repetition without reusing earlier responses.
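When iterating on a prompt it is easy to script the test command instead of retyping it. The sketch below only assembles the argument list from the options mentioned above (--profile, --ignore-cache, --repeat, --model). The function build_plot_command is hypothetical, and the placement of options before the question follows the example command in step 4; check hep-data-llm plot --help for the authoritative option list.

```python
import shlex
from typing import Optional


def build_plot_command(
    question: str,
    output: str,
    profile: Optional[str] = None,
    ignore_cache: Optional[str] = None,
    repeat: Optional[int] = None,
    model: Optional[str] = None,
) -> list[str]:
    """Assemble an argv list for `hep-data-llm plot` from the chosen options."""
    cmd = ["hep-data-llm", "plot"]
    if profile is not None:
        cmd += ["--profile", profile]
    if ignore_cache is not None:
        cmd += ["--ignore-cache", ignore_cache]
    if repeat is not None:
        cmd += ["--repeat", str(repeat)]
    if model is not None:
        cmd += ["--model", model]
    # The question (text or number) and the output file come last.
    cmd += [question, output]
    return cmd


command = build_plot_command(
    "1", "output.md", profile="my-prompt", ignore_cache="hints", repeat=3
)
# Print a shell-safe version of the command for inspection.
print(shlex.join(command))
```

The returned list can be handed to subprocess.run(command, check=True) to actually launch the run. Passing a list rather than a single string avoids quoting problems when the question contains spaces or punctuation.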

Notes from adding a servicex-RDF workflow:

  • The guardrail that looked for the png file to be written out had to be altered.
  • The hint file that described ServiceX assumed Awkward output. It had to be split in two: a short hint file that describes how to generate a ServiceX request, and a second one that describes how to turn the results into Awkward data. A matching second hint file then had to be written for RDF.
  • A new docker container had to be built, in this case based on the ROOT container image.

License

hep-data-llm is distributed under the terms of the MIT license.

