Object partitioning package.
Object Partitioning
A Python package to help understand partitioning by objects. Works only on ATLAS xAOD format files (PHYS, PHYSLITE, etc.).
Writes a parquet file with per-event data, a bin_boundaries.yaml file, and a Python pickle file with an n-dimensional histogram.
- Each axis is a count of PHYSLITE objects (muons, electrons, jets, etc.).
- Looks at each axis and tries to divide the counts into bins containing roughly equal numbers of events.
- Then sub-divides each bin of axis 1 by axis 2, axis 3, etc. (making an n-dimensional histogram).
- Saves the binning and histogram to files.
- Prints out a table with the 10 largest and smallest bins.
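To make the per-axis step concrete, here is a minimal stdlib-only sketch of choosing integer bin edges so that each bin holds roughly the same number of events. It is not the package's implementation: the function name equal_population_edges is hypothetical, and it assumes integer object counts and bins that are inclusive at the lower edge and exclusive at the upper edge (the convention bin_boundaries.yaml uses). Because a single count value cannot be split, a heavily populated value (for example, zero muons) can leave fewer bins than requested.

```python
from collections import Counter


def equal_population_edges(counts, n_bins):
    """Hypothetical sketch: integer bin edges giving ~equal events per bin.

    counts is one axis of per-event object counts (e.g. n_jets per event).
    Bins are [edges[i], edges[i + 1]). A heavily populated value cannot be
    split, so fewer than n_bins bins may come back.
    """
    histogram = Counter(counts)
    target = len(counts) / n_bins  # ideal number of events per bin
    edges = [min(counts)]
    running = 0
    for value in sorted(histogram):
        running += histogram[value]
        # Close the current bin once the cumulative count reaches the next
        # multiple of the target, keeping at most n_bins bins in total.
        if running >= target * len(edges) and len(edges) < n_bins:
            edges.append(value + 1)
    upper = max(counts) + 1  # exclusive upper edge covering the largest count
    if edges[-1] != upper:
        edges.append(upper)
    return edges


# Made-up axis: half the events have 0 jets, a quarter 1 jet, a quarter 2 jets.
n_jets = [0] * 50 + [1] * 25 + [2] * 25
print(equal_population_edges(n_jets, 4))  # [0, 1, 2, 3]
```

Asking for 4 bins returns only 3 here, because the 0-jet value alone already holds two bins' worth of events.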
The following are the axes:
- Jets (AnalysisJets)
- Large-R Jets (AnalysisLargeRJets)
- Electrons (AnalysisElectrons)
- Muons (AnalysisMuons)
- Taus (AnalysisTauJets)
- Photons (AnalysisPhotons)
- MissingET (MET_Core_AnalysisMET) - In ATLAS, met is analysis dependent. This is just the first object in the MissingET container, with met() called on that object.
Use atlas-object-partitioning partition --help to see available options. Set
--bins-per-axis to control how many bins are used per axis (defaults to 4).
Tail-capping optionally clips per-axis counts at a quantile before binning. This reduces long tails by replacing values above the chosen quantile with the cap value, which can help stabilize boundary selection when a few extreme events dominate an axis.
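A minimal sketch of tail-capping a single axis follows. It assumes the quantile is computed with linear interpolation between sorted values (numpy's default convention); the package may compute it differently. linear_quantile and tail_cap are illustrative names, not the package's API.

```python
import math


def linear_quantile(values, q):
    """q-quantile with linear interpolation between sorted values (assumed convention)."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = math.floor(position)
    upper = math.ceil(position)
    weight = position - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * weight


def tail_cap(values, q):
    """Replace every value above the q-quantile with the quantile itself."""
    cap = linear_quantile(values, q)
    return [min(value, cap) for value in values]


n_photons = [1, 2, 3, 4, 100]  # made-up axis: one extreme event stretches it
print(tail_cap(n_photons, 0.75))  # [1, 2, 3, 4, 4.0]
```

After capping, the boundary search no longer has to reserve a range up to 100 for a single event.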
Tail-capping examples:
```bash
# Cap each axis at the 98th percentile before building boundaries
atlas-object-partitioning partition data18_13TeV:data18_13TeV.periodAllYear.physics_Main.PhysCont.DAOD_PHYSLITE.grp18_v01_p6697 \
  -n 50 --ignore-axes met --bins-per-axis 3 --tail-cap-quantile 0.98

# Combine tail-capping with target scan to see summary stats
atlas-object-partitioning partition data18_13TeV:data18_13TeV.periodAllYear.physics_Main.PhysCont.DAOD_PHYSLITE.grp18_v01_p6697 \
  -n 50 --ignore-axes met --tail-cap-quantile 0.95 \
  --target-min-fraction 0.01 --target-max-fraction 0.05 \
  --target-bins-min 3 --target-bins-max 3
```
Sparse-bin merging optionally merges adjacent bins per axis after building the histogram. It uses marginal counts for each axis and repeatedly merges the smallest bins into their nearest neighbor until each marginal bin fraction meets the threshold or the axis hits the minimum bin count.
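The merge loop described above can be sketched for a single axis as follows. This is an illustration under stated assumptions, not the package's code: merge_sparse_bins is a hypothetical name, ties pick the first smallest bin, and "nearest neighbor" is interpreted as the less populated of the two adjacent bins. The parameters min_fraction and min_bins play the roles of --merge-min-fraction and --merge-min-bins.

```python
def merge_sparse_bins(edges, counts, min_fraction, min_bins):
    """Hypothetical sketch of per-axis sparse-bin merging.

    edges: bin edges for one axis; bin i is [edges[i], edges[i + 1]).
    counts: marginal event count of each bin on that axis.
    """
    edges, counts = list(edges), list(counts)
    total = sum(counts)
    while total > 0 and len(counts) > max(min_bins, 1):
        smallest = min(range(len(counts)), key=counts.__getitem__)
        if counts[smallest] / total >= min_fraction:
            break  # every marginal bin now meets the threshold
        # Assumption: merge into the less populated adjacent bin.
        if smallest == 0:
            neighbour = 1
        elif smallest == len(counts) - 1:
            neighbour = smallest - 1
        elif counts[smallest - 1] <= counts[smallest + 1]:
            neighbour = smallest - 1
        else:
            neighbour = smallest + 1
        lo, hi = sorted((smallest, neighbour))
        counts[lo:hi + 1] = [counts[lo] + counts[hi]]
        del edges[hi]  # drop the edge shared by bins lo and hi
    return edges, counts


# Made-up marginal counts for one axis (100 events in total).
print(merge_sparse_bins([0, 1, 2, 3, 6], [50, 45, 4, 1], 0.10, 2))
# ([0, 1, 6], [50, 50])
```

Only the edges between merged bins are removed, so the remaining edges are always a subset of the original ones.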
Sparse-bin merging examples:
```bash
# Merge marginal bins below 1% along each axis, keep at least 2 bins per axis
atlas-object-partitioning partition data18_13TeV:data18_13TeV.periodAllYear.physics_Main.PhysCont.DAOD_PHYSLITE.grp18_v01_p6697 \
  -n 50 --ignore-axes met --bins-per-axis 3 --merge-min-fraction 0.01 \
  --merge-min-bins 2

# Combine target scan with a stricter merge threshold
atlas-object-partitioning partition data18_13TeV:data18_13TeV.periodAllYear.physics_Main.PhysCont.DAOD_PHYSLITE.grp18_v01_p6697 \
  -n 50 --ignore-axes met --target-min-fraction 0.0 --target-max-fraction 1.0 \
  --target-bins-min 3 --target-bins-max 3 --merge-min-fraction 0.05 \
  --merge-min-bins 2
```
Adjacent grid-cell merging groups sparse n-D cells (sharing a face) without
changing the bin boundaries. The merged groups are written to
bin_boundaries.yaml, and the CLI prints a merged-cell summary with the total
grid cells, how many were combined, and the final group count.
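One way to picture face-adjacent grouping is a greedy loop: repeatedly take the least populated group below the threshold and absorb its least populated face-sharing neighbour. This is a hypothetical sketch (merge_adjacent_cells and face_neighbours are illustrative names), and the package's actual grouping order and tie-breaking may differ. Cells are index tuples, and two cells share a face when their indices differ by one along exactly one axis.

```python
from itertools import product


def face_neighbours(cell, shape):
    """Yield cells sharing a face with `cell`: one index differs by exactly 1."""
    for axis, size in enumerate(shape):
        for step in (-1, 1):
            index = cell[axis] + step
            if 0 <= index < size:
                yield cell[:axis] + (index,) + cell[axis + 1:]


def merge_adjacent_cells(counts, shape, min_fraction):
    """Hypothetical sketch: greedily grow sparse cells into face-adjacent groups.

    counts maps a cell index tuple to its event count (absent cells are 0).
    Bin boundaries are untouched; only cells are grouped.
    Returns a list of (set_of_cells, count) pairs.
    """
    total = sum(counts.values())
    group_of = {}  # cell -> group id
    groups = {}  # group id -> (cells, count)
    for gid, cell in enumerate(product(*(range(size) for size in shape))):
        group_of[cell] = gid
        groups[gid] = ({cell}, counts.get(cell, 0))
    if total == 0:
        return list(groups.values())
    while True:
        sparse = [g for g, (_, c) in groups.items() if c / total < min_fraction]
        if not sparse:
            break
        gid = min(sparse, key=lambda g: groups[g][1])
        cells, count = groups[gid]
        neighbours = {group_of[n] for c in cells for n in face_neighbours(c, shape)}
        neighbours.discard(gid)
        if not neighbours:
            break  # the group already spans the whole grid
        # Assumption: absorb the least populated adjacent group.
        other = min(neighbours, key=lambda g: groups[g][1])
        other_cells, other_count = groups.pop(other)
        for c in other_cells:
            group_of[c] = gid
        groups[gid] = (cells | other_cells, count + other_count)
    return list(groups.values())


# Made-up 2x2 grid with one dense cell and three sparse ones.
grid_counts = {(0, 0): 90, (0, 1): 5, (1, 0): 3, (1, 1): 2}
for cells, count in merge_adjacent_cells(grid_counts, (2, 2), 0.10):
    print(sorted(cells), count)
# [(0, 0)] 90
# [(0, 1), (1, 0), (1, 1)] 10
```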
bin_boundaries.yaml schema:
- axes: map of axis name to list of bin edges (inclusive lower, exclusive upper).
- merged_cells: optional summary of merged n-D cell groups when --merge-cell-min-fraction is used.
  - min_fraction: the fraction threshold used for grouping.
  - groups: list of merged groups.
    - cells: list of grid cell indices, keyed by axis name (0-based).
    - count: total event count in the merged group.
    - fraction: total event fraction for the merged group.
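As an illustration of how the schema fits together, the sketch below uses a hand-written dictionary with made-up values in the same shape; in practice you would load the file, for example with PyYAML's yaml.safe_load. It assumes each entry of cells is a mapping from axis name to bin index, which is one reading of "keyed by axis name". bin_index, cell_for_event, and group_for_cell are hypothetical helpers.

```python
from bisect import bisect_right

# Made-up contents laid out like the schema above. The per-cell layout
# ({axis name: index}) is an assumption about "keyed by axis name".
boundaries = {
    "axes": {"n_jets": [0, 2, 4, 20], "n_muons": [0, 1, 2, 10]},
    "merged_cells": {
        "min_fraction": 0.01,
        "groups": [
            {
                "cells": [{"n_jets": 2, "n_muons": 1}, {"n_jets": 2, "n_muons": 2}],
                "count": 120,
                "fraction": 0.012,
            },
        ],
    },
}


def bin_index(edges, value):
    """0-based bin for value with [lower, upper) bins; None if out of range."""
    if value < edges[0] or value >= edges[-1]:
        return None
    return bisect_right(edges, value) - 1


def cell_for_event(axes, event):
    """Map an event's object counts to a grid cell keyed by axis name."""
    return {name: bin_index(edges, event[name]) for name, edges in axes.items()}


def group_for_cell(groups, cell):
    """Index of the merged group containing cell, or None if it is not merged."""
    for i, group in enumerate(groups):
        if cell in group["cells"]:
            return i
    return None


cell = cell_for_event(boundaries["axes"], {"n_jets": 5, "n_muons": 1})
print(cell)  # {'n_jets': 2, 'n_muons': 1}
print(group_for_cell(boundaries["merged_cells"]["groups"], cell))  # 0
```

Using bisect_right on the edge list gives the inclusive-lower, exclusive-upper behaviour directly: a value equal to an edge lands in the bin that starts at that edge.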
Pretty-print merged cells from the CLI:
```bash
# Summarize merged cell groups with index ranges per axis
atlas-object-partitioning describe-cells bin_boundaries.yaml

# Include bin edge ranges with the index ranges
atlas-object-partitioning describe-cells bin_boundaries.yaml --show-values

# Sort groups by size (count) descending
atlas-object-partitioning describe-cells bin_boundaries.yaml --sort-by-size
```
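The index-range summary can be approximated as the per-axis minimum and maximum cell index in each group, with --show-values mapping those indices back to bin edges. This is a hypothetical sketch, not the describe-cells implementation, and its output format is invented. A merged group need not be a rectangle, so an index range can also span cells that belong to other groups.

```python
def describe_group(group, axes=None):
    """Hypothetical sketch: per-axis index ranges spanned by a merged group.

    group: one entry of merged_cells.groups (cells keyed by axis name).
    axes: optional axis name -> bin edges; when given, also report the
        [lower, upper) value range covered, in the spirit of --show-values.
    """
    cells = group["cells"]
    summary = {}
    for name in cells[0]:
        lo = min(cell[name] for cell in cells)
        hi = max(cell[name] for cell in cells)
        entry = {"indices": (lo, hi)}
        if axes is not None:
            # Bin i spans [edges[i], edges[i + 1]).
            entry["values"] = (axes[name][lo], axes[name][hi + 1])
        summary[name] = entry
    return summary


def sort_by_size(groups):
    """Largest groups first, in the spirit of --sort-by-size."""
    return sorted(groups, key=lambda group: group["count"], reverse=True)


# Made-up example data in the bin_boundaries.yaml layout.
example_axes = {"n_jets": [0, 2, 4, 20], "n_muons": [0, 1, 2, 10]}
example_group = {
    "cells": [{"n_jets": 2, "n_muons": 1}, {"n_jets": 2, "n_muons": 2}],
    "count": 120,
    "fraction": 0.012,
}
print(describe_group(example_group, example_axes))
# {'n_jets': {'indices': (2, 2), 'values': (4, 20)}, 'n_muons': {'indices': (1, 2), 'values': (1, 10)}}
```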
Update merged cell counts using an existing binning and merged-cell grouping
(the input YAML must already contain merged_cells.groups), e.g. when the
binning was defined on a smaller scan but you want counts from a larger scan:
```bash
atlas-object-partitioning repartition data18_13TeV:data18_13TeV.periodAllYear.physics_Main.PhysCont.DAOD_PHYSLITE.grp18_v01_p6697 \
  bin_boundaries.yaml -n 500 -o bin_boundaries.repartition.yaml
```
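Conceptually, repartitioning keeps each group's cells fixed and only re-sums their counts. A minimal sketch follows, assuming per-cell counts from the larger scan are already available as a dictionary keyed by sorted (axis, index) pairs; recount_groups is a hypothetical name.

```python
def recount_groups(groups, cell_counts):
    """Hypothetical sketch: refresh count/fraction of existing merged groups.

    groups: entries shaped like merged_cells.groups; their cells are not changed.
    cell_counts: event count per grid cell from the new (larger) scan, keyed by
        a tuple of sorted (axis name, bin index) pairs, covering every cell.
    """
    total = sum(cell_counts.values())
    updated = []
    for group in groups:
        count = sum(
            cell_counts.get(tuple(sorted(cell.items())), 0) for cell in group["cells"]
        )
        fraction = count / total if total else 0.0
        updated.append({**group, "count": count, "fraction": fraction})
    return updated


# Made-up numbers: one group defined on a small scan, counts from a larger one.
old_groups = [
    {"cells": [{"n_jets": 2, "n_muons": 1}, {"n_jets": 2, "n_muons": 2}],
     "count": 120, "fraction": 0.012},
]
new_cell_counts = {
    (("n_jets", 2), ("n_muons", 1)): 30,
    (("n_jets", 2), ("n_muons", 2)): 20,
    (("n_jets", 0), ("n_muons", 0)): 950,
}
refreshed = recount_groups(old_groups, new_cell_counts)
print(refreshed[0]["count"], refreshed[0]["fraction"])  # 50 0.05
```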
Adjacent grid-cell merging example:
```bash
# Merge sparse n-D cells below 1% into adjacent groups
atlas-object-partitioning partition data18_13TeV:data18_13TeV.periodAllYear.physics_Main.PhysCont.DAOD_PHYSLITE.grp18_v01_p6697 \
  -n 50 --ignore-axes met --bins-per-axis 3 --merge-cell-min-fraction 0.01
```
If you are trying to balance max bin fraction (~5%) with minimum group size (~1%), the adjacent grid-cell merge example above is the recommended starting point.
Adaptive binning examples (greedily reduces bins per axis to approach target min/max fractions):
```bash
# Baseline adaptive search targeting 1% min nonzero, 5% max fraction
atlas-object-partitioning partition data18_13TeV:data18_13TeV.periodAllYear.physics_Main.PhysCont.DAOD_PHYSLITE.grp18_v01_p6697 \
  -n 50 --ignore-axes met --bins-per-axis 3 \
  --adaptive-bins --adaptive-min-fraction 0.01 --adaptive-max-fraction 0.05

# Constrain the minimum bins per axis and keep explicit overrides fixed
atlas-object-partitioning partition data18_13TeV:data18_13TeV.periodAllYear.physics_Main.PhysCont.DAOD_PHYSLITE.grp18_v01_p6697 \
  -n 50 --ignore-axes met --bins-per-axis 4 \
  --bins-per-axis-override n_jets=4 --bins-per-axis-override n_large_jets=4 \
  --adaptive-bins --adaptive-min-fraction 0.005 --adaptive-max-fraction 0.05 \
  --adaptive-min-bins 2
```
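The greedy search can be sketched as follows, under assumptions: adaptive_bins is a hypothetical name; evaluate stands in for rebinning the data and returning (smallest nonzero fraction, largest fraction); axes in fixed mimic explicit --bins-per-axis-override values; and min_bins mimics --adaptive-min-bins. Each step removes one bin from whichever axis raises the smallest nonzero fraction most, provided the largest fraction stays within the maximum target. A toy evaluate that spreads events evenly over all cells shows the behaviour.

```python
from math import prod


def adaptive_bins(bins, evaluate, min_fraction, max_fraction, min_bins=2, fixed=()):
    """Hypothetical greedy reduction of bins per axis.

    bins: axis name -> starting number of bins.
    evaluate: callable(bins) -> (smallest nonzero fraction, largest fraction);
        in the real tool this would mean rebinning the scanned events.
    fixed: axes whose bin count must not change (like explicit overrides).
    """
    current = dict(bins)
    low, _ = evaluate(current)
    while low < min_fraction:
        candidates = []
        for axis, n in current.items():
            if axis in fixed or n <= min_bins:
                continue
            trial = {**current, axis: n - 1}
            trial_low, trial_high = evaluate(trial)
            if trial_high <= max_fraction:  # never push past the max target
                candidates.append((trial_low, trial))
        if not candidates:
            break  # no allowed reduction left
        # Keep the one-bin reduction that raises the smallest bin the most.
        low, current = max(candidates, key=lambda item: item[0])
    return current


def uniform_evaluate(bins):
    """Toy model: events spread evenly, so every cell holds 1 / n_cells."""
    fraction = 1 / prod(bins.values())
    return fraction, fraction


print(adaptive_bins({"n_jets": 4, "n_muons": 4}, uniform_evaluate, 0.1, 0.5, min_bins=1))
# {'n_jets': 2, 'n_muons': 4}
```

Because each step only removes bins, the largest fraction can only grow, which is why candidates that would exceed the maximum target are rejected rather than repaired later.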
An example output:
```text
$ atlas-object-partitioning partition data18_13TeV:data18_13TeV.periodAllYear.physics_Main.PhysCont.DAOD_PHYSLITE.grp18_v01_p6697 -n 10
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┓
┃ n_jets     ┃ n_large_jets ┃ n_electrons ┃ n_muons    ┃ n_taus     ┃ n_photons  ┃ met          ┃ count ┃ fraction ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━┩
│ [0.0, 6.0) │ [0.0, 1.0)   │ [0.0, 1.0)  │ [0.0, 1.0) │ [0.0, 1.0) │ [0.0, 3.0) │ [0.0, 11.0)  │ 4,611 │ 0.011    │
│ [0.0, 6.0) │ [0.0, 1.0)   │ [1.0, 2.0)  │ [0.0, 1.0) │ [1.0, 2.0) │ [0.0, 3.0) │ [0.0, 11.0)  │ 3,605 │ 0.009    │
│ [0.0, 6.0) │ [0.0, 1.0)   │ [0.0, 1.0)  │ [0.0, 1.0) │ [0.0, 1.0) │ [0.0, 3.0) │ [11.0, 18.0) │ 3,401 │ 0.008    │
│ [0.0, 6.0) │ [0.0, 1.0)   │ [1.0, 2.0)  │ [0.0, 1.0) │ [1.0, 2.0) │ [0.0, 3.0) │ [11.0, 18.0) │ 3,107 │ 0.008    │
│ [0.0, 6.0) │ [0.0, 1.0)   │ [0.0, 1.0)  │ [1.0, 2.0) │ [0.0, 1.0) │ [0.0, 3.0) │ [0.0, 11.0)  │ 3,047 │ 0.007    │
│ [0.0, 6.0) │ [0.0, 1.0)   │ [0.0, 1.0)  │ [1.0, 2.0) │ [0.0, 1.0) │ [0.0, 3.0) │ [11.0, 18.0) │ 2,708 │ 0.007    │
│ [0.0, 6.0) │ [0.0, 1.0)   │ [0.0, 1.0)  │ [1.0, 2.0) │ [1.0, 2.0) │ [0.0, 3.0) │ [0.0, 11.0)  │ 2,353 │ 0.006    │
│ [0.0, 6.0) │ [0.0, 1.0)   │ [0.0, 1.0)  │ [0.0, 1.0) │ [0.0, 1.0) │ [0.0, 3.0) │ [18.0, 26.0) │ 2,141 │ 0.005    │
│ [0.0, 6.0) │ [0.0, 1.0)   │ [0.0, 1.0)  │ [1.0, 2.0) │ [1.0, 2.0) │ [0.0, 3.0) │ [11.0, 18.0) │ 2,139 │ 0.005    │
│ [0.0, 6.0) │ [0.0, 1.0)   │ [1.0, 2.0)  │ [0.0, 1.0) │ [1.0, 2.0) │ [0.0, 3.0) │ [18.0, 26.0) │ 1,964 │ 0.005    │
└────────────┴──────────────┴─────────────┴────────────┴────────────┴────────────┴──────────────┴───────┴──────────┘
Least 10 bins
┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━┓
┃ n_jets     ┃ n_large_jets ┃ n_electrons ┃ n_muons    ┃ n_taus     ┃ n_photons   ┃ met           ┃ count ┃ fraction ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━┩
│ [0.0, 6.0) │ [1.0, 3.0)   │ [2.0, 8.0)  │ [0.0, 1.0) │ [1.0, 2.0) │ [4.0, 5.0)  │ [26.0, 160.0) │ 0     │ 0.000    │
│ [0.0, 6.0) │ [1.0, 3.0)   │ [2.0, 8.0)  │ [0.0, 1.0) │ [2.0, 7.0) │ [5.0, 17.0) │ [11.0, 18.0)  │ 0     │ 0.000    │
│ [0.0, 6.0) │ [1.0, 3.0)   │ [2.0, 8.0)  │ [0.0, 1.0) │ [2.0, 7.0) │ [4.0, 5.0)  │ [26.0, 160.0) │ 0     │ 0.000    │
│ [0.0, 6.0) │ [1.0, 3.0)   │ [2.0, 8.0)  │ [0.0, 1.0) │ [2.0, 7.0) │ [3.0, 4.0)  │ [26.0, 160.0) │ 0     │ 0.000    │
│ [0.0, 6.0) │ [1.0, 3.0)   │ [2.0, 8.0)  │ [1.0, 2.0) │ [2.0, 7.0) │ [5.0, 17.0) │ [11.0, 18.0)  │ 0     │ 0.000    │
│ [0.0, 6.0) │ [1.0, 3.0)   │ [2.0, 8.0)  │ [1.0, 2.0) │ [2.0, 7.0) │ [4.0, 5.0)  │ [26.0, 160.0) │ 0     │ 0.000    │
│ [0.0, 6.0) │ [1.0, 3.0)   │ [2.0, 8.0)  │ [2.0, 7.0) │ [2.0, 7.0) │ [4.0, 5.0)  │ [0.0, 11.0)   │ 0     │ 0.000    │
│ [0.0, 6.0) │ [1.0, 3.0)   │ [2.0, 8.0)  │ [2.0, 7.0) │ [2.0, 7.0) │ [3.0, 4.0)  │ [26.0, 160.0) │ 0     │ 0.000    │
│ [0.0, 6.0) │ [1.0, 3.0)   │ [2.0, 8.0)  │ [2.0, 7.0) │ [2.0, 7.0) │ [3.0, 4.0)  │ [11.0, 18.0)  │ 0     │ 0.000    │
│ [0.0, 6.0) │ [1.0, 3.0)   │ [2.0, 8.0)  │ [2.0, 7.0) │ [2.0, 7.0) │ [0.0, 3.0)  │ [11.0, 18.0)  │ 0     │ 0.000    │
└────────────┴──────────────┴─────────────┴────────────┴────────────┴─────────────┴───────────────┴───────┴──────────┘
Histogram summary: max fraction 0.011, zero bins 16,384
```
Installation
Install via pip:
```bash
pip install atlas-object-partitioning
```
Run via uv:
- If you don't have the uv tool installed, it is highly recommended: it lets you quickly install local versions of the code without having to build custom environments.

Install it locally so it is always available:
```bash
uv tool install atlas-object-partitioning
atlas-object-partitioning --help
```
Update it to the most recent version with uv tool upgrade atlas-object-partitioning.
Or run it in an ephemeral environment (recommended for intermittent or one-off use):
```bash
uvx atlas-object-partitioning --help
```
Or install from source:
```bash
git clone https://github.com/gordonwatts/object-partitioning.git
cd object-partitioning
pip install .
```
Usage
You'll need a servicex.yaml file with a valid token to use the ServiceX backend. See here to help you get started.
From the command line:
- Use atlas-object-partitioning partition --help to see all partition options.
- Specify a rucio dataset, for example, atlas-object-partitioning partition mc23_13p6TeV:mc23_13p6TeV.601237.PhPy8EG_A14_ttbar_hdamp258p75_allhad.deriv.DAOD_PHYSLITE.e8514_s4369_r16083_p6697
- Use the -n option to specify how many files in the dataset to run over. The default is 1; specify 0 to run on everything. Some datasets are quite large: feel free to start the transform, then re-run the same command to have it pick up where it left off. See the dashboard to monitor status.
- Use --adaptive-bins to greedily reduce bins per axis toward target min/max fractions. Note that adaptive mode cannot be combined with --target-min-fraction or --target-max-fraction.
If you wish, you can also use it as a library:
```python
from atlas_object_partitioning.partition import partition_objects
from atlas_object_partitioning.scan_ds import scan_dataset

# Example: Partition a list of objects
data = [...]  # your data here
partitions = partition_objects(data, num_partitions=4)

# Scan a dataset
results = scan_dataset('object_counts.parquet')
```
Goal
We want to come up with a set of simple, box-shaped (axis-aligned) partitions in which the largest partition holds at most 5% of the events and as few partitions as possible are empty.
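The two quantities in this goal are the ones reported on the "Histogram summary" line of the example output (max fraction and zero bins). A minimal sketch of computing them from a flat list of per-cell counts follows; histogram_summary and meets_goal are illustrative names, and the 5% default comes from the goal stated above.

```python
def histogram_summary(cell_counts):
    """Largest cell fraction and number of empty cells for an n-D histogram."""
    total = sum(cell_counts)
    max_fraction = max(cell_counts) / total if total else 0.0
    zero_bins = sum(1 for count in cell_counts if count == 0)
    return max_fraction, zero_bins


def meets_goal(cell_counts, max_allowed=0.05):
    """True if no partition holds more than max_allowed of the events."""
    max_fraction, _ = histogram_summary(cell_counts)
    return max_fraction <= max_allowed


counts = [40, 35, 25, 0, 0, 900]  # made-up, heavily unbalanced cells
print(histogram_summary(counts))  # (0.9, 2)
print(meets_goal(counts))  # False
```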
Contributing
Contributions are welcome! Please open issues or pull requests on GitHub.
- Fork the repository
- Create your feature branch (git checkout -b feature/my-feature)
- Commit your changes (git commit -am 'Add new feature')
- Push to the branch (git push origin feature/my-feature)
- Open a pull request
License
This project is licensed under the terms of the MIT license. See LICENSE.txt for details.