Preprocessing for jet tagging
UPP: Umami PreProcessing
This is a modular preprocessing pipeline for jet tagging.
It addresses several issues with the current umami preprocessing workflow, and uses the atlas-ftag-tools
package extensively.
Documentation is under construction here
Comparisons with umami
Main changes
- modular, class-based design
- h5 virtual datasets to wrap the source files
- 2 main stages: resample -> merge -> done!
- parallelised processing of flavours within a sample
- support for different resampling "regions", which is useful for Xbb preprocessing
- ndim sampling support, which is also useful for Xbb
- "new" improved training file format (which is actually just the tdd output format)
    - structured arrays are smaller on disk and therefore faster to read
    - only one dataloader is needed and can be reused for training and testing
    - other plotting scripts can support a single file format
    - normalisation/concatenation is applied on the fly during training
    - training files can contain supersets of variables used for training
- new "countup" sampling, which is more efficient than pdf sampling (it uses more of the available statistics and reduces duplication of jets)
- the code estimates the number of unique jets for you and saves this number as an attribute in the output file
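To illustrate how h5 virtual datasets can wrap several source files that hold structured arrays, here is a minimal sketch using `h5py`. It is not the UPP implementation: the field names in `JET_DTYPE`, the helper names `write_source_file` and `build_virtual_file`, and the `num_jets` attribute are all hypothetical and chosen only for this example.

```python
import h5py
import numpy as np

# Hypothetical jet record: the field names are illustrative only.
JET_DTYPE = np.dtype([("pt", "f4"), ("eta", "f4"), ("flavour_label", "i4")])


def write_source_file(path, n_jets, seed):
    """Write a small source file holding a structured 'jets' dataset."""
    rng = np.random.default_rng(seed)
    jets = np.zeros(n_jets, dtype=JET_DTYPE)
    jets["pt"] = rng.uniform(20e3, 250e3, n_jets)
    jets["eta"] = rng.uniform(-2.5, 2.5, n_jets)
    jets["flavour_label"] = rng.integers(0, 3, n_jets)
    with h5py.File(path, "w") as f:
        f.create_dataset("jets", data=jets)
    return jets


def build_virtual_file(source_paths, out_path, name="jets"):
    """Expose the same-named dataset from several files as one virtual dataset."""
    lengths = []
    dtype = None
    for path in source_paths:
        with h5py.File(path, "r") as f:
            lengths.append(f[name].shape[0])
            dtype = f[name].dtype
    total = sum(lengths)

    # The layout describes the combined dataset; no jet data is copied.
    layout = h5py.VirtualLayout(shape=(total,), dtype=dtype)
    start = 0
    for path, n in zip(source_paths, lengths):
        layout[start:start + n] = h5py.VirtualSource(path, name, shape=(n,))
        start += n

    with h5py.File(out_path, "w") as f:
        f.create_virtual_dataset(name, layout)
        # Hypothetical attribute name, showing metadata stored next to the data.
        f.attrs["num_jets"] = total
    return total
```

Reading `f["jets"]` from the output file then behaves like reading one contiguous structured array, while the data itself stays in the source files.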
Performance and LOC
Compared with a comparable preprocessing config from umami:
- train file size decreased by 30%
- train read speed improved by 30% (separate from file size reduction, by using `read_direct`)
- only one command is needed to generate all preprocessing outputs (running with `--split=all` will produce train/val/test files)
- lines of code are reduced vs umami by 4x
- 10x faster than default umami preprocessing (0.06 vs 0.825 hours/million jets)
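The `read_direct` point above refers to the `h5py` method `Dataset.read_direct`, which reads into an existing array instead of allocating a new one on every read. The following is a minimal sketch of batched reading with a reused buffer, not the dataloader used by this project; the function name `read_batches` and the dataset name `jets` are assumptions made for the example.

```python
import h5py
import numpy as np


def read_batches(path, name="jets", batch_size=1000):
    """Yield batches from an h5 dataset using one preallocated buffer.

    Each yielded array is a view of the shared buffer and is overwritten
    by the next read, so copy it if it must outlive the iteration step.
    """
    with h5py.File(path, "r") as f:
        ds = f[name]
        # Allocate the buffer once, with the dataset's own (structured) dtype.
        buffer = np.empty(batch_size, dtype=ds.dtype)
        for low in range(0, ds.shape[0], batch_size):
            high = min(low + batch_size, ds.shape[0])
            n = high - low
            # Read straight into the buffer, avoiding a fresh allocation.
            ds.read_direct(buffer, source_sel=np.s_[low:high], dest_sel=np.s_[0:n])
            yield buffer[:n]
```

A training loop would iterate over `read_batches("train.h5")` and convert each batch before the next one is read; the file name here is only a placeholder.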
File details
Details for the file umami-preprocessing-0.0.1.tar.gz.
File metadata
- Download URL: umami-preprocessing-0.0.1.tar.gz
- Upload date:
- Size: 4.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | aa65408cb8900bfa4ae3f66ba12d45654bce9340e24c339e9b35c9538ab17525
MD5 | 100b7ee544d1c9d4750d21c0364975fd
BLAKE2b-256 | 0d9da0221bf3e0f512175fd772ff745c87c36b050e44f2d2469b45cf3d38f3db
File details
Details for the file umami_preprocessing-0.0.1-py3-none-any.whl.
File metadata
- Download URL: umami_preprocessing-0.0.1-py3-none-any.whl
- Upload date:
- Size: 5.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/4.0.2 CPython/3.11.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 80ad8e37df51292a5e270e8fe8d96b7914e7fcac5e305e8c456af81e633d6495
MD5 | 933b5d3d9a9a027cdffe25bca6c493d0
BLAKE2b-256 | 65aa0f4f5ab81392993f8b0854bb78c8f1a7816bff430e4ee65913a21c1a05bd