
Preprocessing for jet tagging

Project description


UPP: Umami PreProcessing

This is a modular preprocessing pipeline for jet tagging. It addresses several issues with the current umami preprocessing workflow, and uses the atlas-ftag-tools package extensively.

Documentation is under construction.

Comparisons with umami

Main changes

  • modular, class-based design
  • h5 virtual datasets to wrap the source files
  • 2 main stages: resample -> merge -> done!
  • parallelised processing of flavours within a sample
  • support for different resampling "regions", which is useful for Xbb preprocessing
  • ndim sampling support, which is also useful for Xbb
  • "new" improved training file format (which is actually just the tdd output format)
    • structured arrays are smaller on disk and therefore faster to read
    • only one dataloader is needed and can be reused for training and testing
    • other plotting scripts can support a single file format
    • normalisation/concatenation is applied on the fly during training
    • training files can contain supersets of variables used for training
  • new "countup" sampling, which is more efficient than pdf sampling (it uses more of the available statistics and reduces duplication of jets)
  • the code estimates the number of unique jets for you and saves this number as an attribute in the output file
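The difference between pdf-style and countup-style selection can be illustrated with a toy example. The sketch below is not UPP's implementation: the function names are hypothetical, it handles a single resampling bin, and it assumes that "countup" means stepping through the available jets in order and wrapping around only once every jet has been used. Under that assumption, no jet is duplicated before all jets in the bin have been picked at least once, whereas independent draws with replacement typically repeat some jets and miss others.

```python
import random
from collections import Counter


def pdf_sample(pool, n_target, rng):
    """Draw n_target jets independently, with replacement (pdf-style)."""
    return [rng.choice(pool) for _ in range(n_target)]


def countup_sample(pool, n_target):
    """Step through the pool in order, wrapping around only when it is exhausted."""
    return [pool[i % len(pool)] for i in range(n_target)]


rng = random.Random(0)
pool = list(range(100))  # hypothetical jet indices falling in one resampling bin
n_target = 150           # number of jets requested from this bin

pdf_jets = pdf_sample(pool, n_target, rng)
countup_jets = countup_sample(pool, n_target)

print("unique jets (pdf):    ", len(set(pdf_jets)))
print("unique jets (countup):", len(set(countup_jets)))
print("max copies (countup): ", max(Counter(countup_jets).values()))
```

Here the countup selection uses all 100 available jets and repeats none of them more than twice, which is the sense in which it makes fuller use of the available statistics. In a real pipeline the per-bin targets would come from the target distribution rather than a fixed number.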

Performance and LOC

Compared with an equivalent preprocessing config from umami:

  1. train file size decreased by 30%
  2. train read speed improved by 30% (separate from file size reduction, by using read_direct)
  3. only one command is needed to generate all preprocessing outputs (running with --split=all will produce train/val/test files)
  4. lines of code are reduced vs umami by 4x
  5. more than 10x faster than default umami preprocessing (0.06 vs 0.825 hours per million jets, a factor of about 14)
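As a rough illustration of the structured-array format and the read_direct access pattern mentioned above, the following sketch writes a small structured dataset with h5py and reads one batch into a preallocated buffer. The field names, dataset name, and batch size are assumptions made for the example, not UPP's actual schema, and the sketch makes no attempt to reproduce the quoted timings.

```python
import h5py
import numpy as np

# Hypothetical jet record: the field names are illustrative only.
jet_dtype = np.dtype([("pt", "f4"), ("eta", "f4"), ("flavour_label", "i4")])

rng = np.random.default_rng(0)
jets = np.zeros(1000, dtype=jet_dtype)
jets["pt"] = rng.uniform(20e3, 250e3, size=len(jets))
jets["eta"] = rng.uniform(-2.5, 2.5, size=len(jets))
jets["flavour_label"] = rng.integers(0, 3, size=len(jets))

# An in-memory file (core driver, no backing store) leaves nothing on disk.
with h5py.File("example.h5", "w", driver="core", backing_store=False) as f:
    ds = f.create_dataset("jets", data=jets)

    # Preallocate one batch buffer; read_direct fills it in place,
    # avoiding the intermediate array created by ds[0:batch_size].
    batch_size = 256
    batch = np.empty(batch_size, dtype=ds.dtype)
    ds.read_direct(batch, source_sel=np.s_[0:batch_size], dest_sel=np.s_[0:batch_size])

# The file may hold more variables than the model uses: select and
# concatenate the training inputs on the fly from the structured batch.
inputs = np.stack([batch["pt"], batch["eta"]], axis=-1)
print(inputs.shape)
```

In a training loop the same buffer could be reused for every batch, so no new array has to be allocated on each read.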

