
Helper Python package for ATLAS Common NTuple Analysis work.


atlas-schema v0.5.0


This Python package provides schemas and helper functions that let analyzers work with ATLAS datasets (Monte Carlo and data) using coffea.

Hello World

The simplest example opens a file with the schema and lists the fields available on the events:

from atlas_schema.schema import NtupleSchema
from coffea import dataset_tools
import awkward as ak

fileset = {"ttbar": {"files": {"path/to/ttbar.root": "tree_name"}}}
samples, report = dataset_tools.preprocess(fileset)


def noop(events):
    return ak.fields(events)


fields = dataset_tools.apply_to_fileset(noop, samples, schemaclass=NtupleSchema)
print(fields)

which produces something similar to

{
    "ttbar": [
        "dataTakingYear",
        "mcChannelNumber",
        "runNumber",
        "eventNumber",
        "lumiBlock",
        "actualInteractionsPerCrossing",
        "averageInteractionsPerCrossing",
        "truthjet",
        "PileupWeight",
        "RandomRunNumber",
        "met",
        "recojet",
        "truth",
        "generatorWeight",
        "beamSpotWeight",
        "trigPassed",
        "jvt",
    ]
}
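
The fileset passed to dataset_tools.preprocess above is a plain nested dictionary: dataset name, then "files", then a mapping of file path to tree name. As a minimal sketch, assuming every file in a dataset shares one tree name, a small helper could assemble that structure from lists of paths. The name build_fileset is hypothetical and is not part of atlas-schema or coffea.

```python
def build_fileset(paths_by_dataset, tree_name):
    """Build a fileset dict shaped like the one in the Hello World example.

    Hypothetical helper for illustration only; not an atlas-schema API.
    """
    return {
        dataset: {"files": {path: tree_name for path in paths}}
        for dataset, paths in paths_by_dataset.items()
    }


fileset = build_fileset(
    {"ttbar": ["path/to/ttbar.root"]},
    tree_name="tree_name",
)
print(fileset)
```

The result can be handed to dataset_tools.preprocess exactly as in the example above.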

A more involved example, which applies a selection and fills a histogram, looks like this:

import awkward as ak
from hist import Hist
import matplotlib.pyplot as plt
from coffea import processor
from distributed import Client

from atlas_schema.schema import NtupleSchema


class MyFirstProcessor(processor.ProcessorABC):
    def __init__(self):
        pass

    def process(self, events):
        dataset = events.metadata["dataset"]
        h_ph_pt = (
            Hist.new.StrCat(["all", "pass", "fail"], name="isEM")
            .Regular(200, 0.0, 2000.0, name="pt", label=r"$pt_{\gamma}$ [GeV]")
            .Int64()
        )

        cut = ak.all(events.ph.isEM, axis=1)
        h_ph_pt.fill(isEM="all", pt=ak.firsts(events.ph.pt / 1.0e3))
        h_ph_pt.fill(isEM="pass", pt=ak.firsts(events[cut].ph.pt / 1.0e3))
        h_ph_pt.fill(isEM="fail", pt=ak.firsts(events[~cut].ph.pt / 1.0e3))

        return {
            dataset: {
                "entries": ak.num(events, axis=0),
                "ph_pt": h_ph_pt,
            }
        }

    def postprocess(self, accumulator):
        pass


if __name__ == "__main__":
    client = Client()

    fileset = {"700352.Zqqgamma.mc20d.v1": {"files": {"ntuple.root": "analysis"}}}

    run = processor.Runner(
        executor=processor.IterativeExecutor(compression=None),
        schema=NtupleSchema,
        savemetrics=True,
    )

    out, metrics = run(fileset, processor_instance=MyFirstProcessor())

    print(out)
    print(metrics)

    fig, ax = plt.subplots()
    out["700352.Zqqgamma.mc20d.v1"]["ph_pt"].plot1d(ax=ax)
    ax.set_xscale("log")
    ax.legend(title="Photon pT for Zqqgamma")

    fig.savefig("ph_pt.pdf")

which produces

[Figure: three stacked histograms of photon pT, one each for no selection, requiring the isEM flag, and inverting the isEM requirement.]
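
The selection in process keeps an event only if every photon in it has the isEM flag set, then histograms the leading photon pT in GeV. The plain-Python sketch below mirrors that logic on toy data, without awkward or coffea. The toy events and the helper name leading_pt_gev are made up for illustration, and photon pT is assumed to be stored in MeV, as the division by 1.0e3 in the example suggests.

```python
# Toy events: each event is a list of (pt_in_MeV, isEM_flag) photons.
events = [
    [(52_000.0, True), (31_000.0, True)],  # every photon passes
    [(75_000.0, True), (20_000.0, False)],  # one photon fails
    [],  # no photons at all
]


def leading_pt_gev(event):
    """Return the first photon's pT in GeV, or None for an empty event."""
    return event[0][0] / 1.0e3 if event else None


# Like ak.all(..., axis=1): an event passes when every photon has the flag set.
# An event with no photons passes trivially, because all() of nothing is True.
cut = [all(flag for _, flag in event) for event in events]

selected = {
    "all": [leading_pt_gev(e) for e in events],
    "pass": [leading_pt_gev(e) for e, keep in zip(events, cut) if keep],
    "fail": [leading_pt_gev(e) for e, keep in zip(events, cut) if not keep],
}
print(cut)
print(selected)
```

Note that the photon-less event lands in the "pass" category with no leading pT to fill. If such events should be excluded, an explicit requirement on the photon count would be needed in addition to the flag.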

Processing with Systematic Variations

For analyses that evaluate systematic uncertainties, you can iterate over all systematic variations using the new events["NOSYS"] alias and the systematic_names property:

import awkward as ak
from hist import Hist
from coffea import processor
from atlas_schema.schema import NtupleSchema


class SystematicsProcessor(processor.ProcessorABC):
    def __init__(self):
        self.h = (
            Hist.new.StrCat([], name="variation", growth=True)
            .Regular(50, 0.0, 500.0, name="jet_pt", label="Leading Jet $p_T$ [GeV]")
            .Int64()
        )

    def process(self, events):
        dsid = events.metadata["dataset"]

        # Process all systematic variations including nominal ("NOSYS")
        for variation in events.systematic_names:
            event_view = events[variation]

            # Fill histogram with leading jet pT for this systematic variation
            leading_jet_pt = event_view.jet.pt[:, 0] / 1_000  # Convert MeV to GeV
            weights = (
                event_view.weight.mc
                if hasattr(event_view, "weight")
                else ak.ones_like(leading_jet_pt)
            )

            self.h.fill(variation=variation, jet_pt=leading_jet_pt, weight=weights)

        return {
            "hist": self.h,
            "meta": {"sumw": {dsid: {(events.metadata["fileuuid"], ak.sum(weights))}}},
        }

    def postprocess(self, accumulator):
        return accumulator

This approach allows you to seamlessly process both nominal and systematic variations in a single loop, eliminating the need for special-case handling of the nominal variation.
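
The benefit of treating the nominal as just another variation shows up even in a toy version of the loop. The sketch below uses a plain dictionary in place of the events object. The variation name JET_JES__1up, the jet pT values, and the 100 GeV threshold are invented for illustration and do not come from atlas-schema.

```python
# Stand-in for events: variation name -> leading jet pT per event, in MeV.
toy_events = {
    "NOSYS": [120_000.0, 85_000.0, 240_000.0],
    "JET_JES__1up": [123_000.0, 101_000.0, 246_000.0],
}

# Count events whose leading jet exceeds 100 GeV, per variation.
# The nominal ("NOSYS") goes through the same code path as every other entry.
yields = {}
for variation, leading_jet_pt_mev in toy_events.items():
    leading_jet_pt = [pt / 1_000 for pt in leading_jet_pt_mev]  # MeV to GeV
    yields[variation] = sum(pt > 100.0 for pt in leading_jet_pt)

print(yields)
```

Because no branch checks for the nominal by name, adding or removing a variation changes only the input, not the loop.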

Developer Notes

Converting Enums from C++ to Python

This useful vim substitution helps:

%s/    \([A-Za-z]\+\)\s\+=  \(\d\+\),\?/    \1: Annotated[int, "\1"] = \2
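
For anyone not working in vim, roughly the same conversion can be done with Python's re module. This is a sketch, not part of atlas-schema: the function name convert_enumerators and the sample enumerators are hypothetical, and the pattern is somewhat more permissive than the vim one (any indentation, flexible spacing around the equals sign, underscores and digits allowed in names).

```python
import re

# Matches one indented C++ enumerator per line, e.g. "    Loose  =  0,".
ENUMERATOR = re.compile(
    r"^(?P<indent>[ \t]+)(?P<name>[A-Za-z_]\w*)[ \t]*=[ \t]*(?P<value>\d+),?[ \t]*$",
    flags=re.MULTILINE,
)


def convert_enumerators(cpp_body: str) -> str:
    """Rewrite C++ enumerators as Annotated[int, ...] class attributes."""
    return ENUMERATOR.sub(
        r'\g<indent>\g<name>: Annotated[int, "\g<name>"] = \g<value>',
        cpp_body,
    )


# Hypothetical enum body, as it might be pasted from a C++ header.
cpp_body = "    Loose  =  0,\n    Medium  =  1,\n    Tight  =  2"
converted = convert_enumerators(cpp_body)
print(converted)
```

Lines that do not look like a simple `Name = integer` enumerator (comments, hexadecimal values, expressions) are left untouched and need converting by hand.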


