If you want to read more about uncertainty-aware neural networks, I highly recommend our #paper on machine-learned #uncertainties for the calibration of calorimeter signals in the #ATLAS experiment:
inspirehep.net/literature/2...
@lorenzvogel.bsky.social
Spending the holidays teaching my little nephews some machine-learning basics – it's not that difficult; the Bayesian neural network (BNN) loss function follows a clear statistics logic
28.12.2024 12:04

Precision calibration of calorimeter signals in the ATLAS experiment using an uncertainty-aware neural network
arxiv.org/abs/2412.04370v1
ATLAS Collaboration
This has become one of my favorite (pre-)Christmas traditions: the annual #Glühwein workshop – this year at the Karlsruher Institut für Technologie (KIT), organized by Markus Klute (thank you!)
16.12.2024 22:30

Summary: the #BNN not only yields a continuous and smooth topo-cluster #calibration function that improves the performance relative to the standard LCW calibration – but also provides meaningful single-cluster #uncertainties on the predicted responses and the calibrated energies
09.12.2024 10:55

Both pulls follow an approximate Gaussian shape in the center (as expected for stochastic or noisy data) – and both networks slightly overestimate the uncertainty, meaning that the per-cluster error is conservative
09.12.2024 10:54

After checking that the BNN and RE uncertainties are highly comparable, they can be further evaluated with respect to the spread of the predicted response around the target – the "pull" allows us to test if the learned single-cluster uncertainty covers the experimental spread
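In formulas, the pull test reads roughly as follows (generic notation for illustration, not necessarily the paper's symbols):

```latex
% Pull of the predicted response for cluster i:
% t_i = target response, \mu_i = predicted response,
% \sigma_i = learned per-cluster uncertainty
p_i = \frac{\mu_i - t_i}{\sigma_i}
% If the learned \sigma_i covers the experimental spread,
% the distribution of the p_i is approximately standard normal, N(0,1);
% a width below 1 indicates a conservative (overestimated) uncertainty.
```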
09.12.2024 10:54

We can also directly compare the individual uncertainty predictions from the Bayesian neural network (BNN) and the repulsive ensemble (RE) cluster-by-cluster – the two uncertainty predictions track each other well
09.12.2024 10:54

The #systematic uncertainty (part of the likelihood) approaches the same plateau as for the BNN when increasing the training-dataset size (green and brown curves) – and the #statistical uncertainty (induced by the repulsive force) again vanishes (red and blue curves)
09.12.2024 10:54

The idea is to determine uncertainties with an ensemble of simultaneously trained networks that are not allowed to collapse onto the same best-fit parameters: a repulsive force makes the ensemble spread out and explore the loss landscape around the actual minimum
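The repulsion idea can be illustrated with a tiny SVGD-style toy (a common way to implement kernel-based repulsive ensembles; this is my own minimal sketch on a one-parameter "posterior", not the paper's setup): each member feels a kernel-smoothed pull towards high posterior density plus a kernel-gradient push away from its neighbours.

```python
import numpy as np

def rbf(a, b, bw=1.0):
    """RBF kernel between two parameter vectors."""
    return np.exp(-np.sum((a - b) ** 2) / (2.0 * bw ** 2))

def svgd_step(thetas, grad_logp, lr=0.1, bw=1.0):
    """One SVGD-style update: kernel-weighted attraction towards high
    posterior density plus a repulsive term that pushes the ensemble
    members apart, so they explore the landscape around the minimum
    instead of all collapsing onto the same best-fit point."""
    n = len(thetas)
    new = []
    for i in range(n):
        phi = np.zeros_like(thetas[i])
        for j in range(n):
            k = rbf(thetas[j], thetas[i], bw)
            # attraction: kernel-smoothed gradient of the log-posterior
            phi += k * grad_logp(thetas[j])
            # repulsion: grad_{theta_j} k(theta_j, theta_i) for the RBF kernel
            phi += -k * (thetas[j] - thetas[i]) / bw ** 2
        new.append(thetas[i] + lr * phi / n)
    return new

# toy posterior p(theta) ~ exp(-(theta - 2)^2), i.e. a Gaussian mode at 2
grad_logp = lambda t: -2.0 * (t - 2.0)
rng = np.random.default_rng(0)
thetas = [rng.normal(size=1) for _ in range(10)]
for _ in range(500):
    thetas = svgd_step(thetas, grad_logp, lr=0.1)

members = np.array(thetas).ravel()
print(members.mean())        # near the mode at 2
print(members.std() > 0.05)  # True: the ensemble stays spread out
```

The spread of the converged members plays the role of the statistical uncertainty; without the repulsive term, all members would drift to the same minimum and the spread would vanish.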
09.12.2024 10:53

Learned #uncertainties on neural-network outputs are not a standard method used in HEP... To increase confidence in the uncertainty predictions from our BNN setup, we compare our BNN results with an alternative way of learning uncertainties – so-called #repulsive #ensembles (REs)
09.12.2024 10:53

We see that these topo-clusters are all located in the tile-gap scintillator region: the tile-gap scintillator is not a regular calorimeter – the feature quality in this region is insufficient, so it is expected that the calibration in this region yields a large uncertainty
09.12.2024 10:53

An interesting question is what role the learned uncertainties can play in understanding the data... When looking at the uncertainty spectrum, we see a distinctive secondary maximum – what feature leads the BNN uncertainties to flag these topo-clusters?
09.12.2024 10:52

...(ii) the "systematic uncertainty" (green curve) captures the intrinsic data stochasticity (pile-up), and accounts for limited network expressivity and suboptimal hyper-parameters – for learning the stochastic nature, more data helps, but the uncertainty does not go to zero; it approaches a finite plateau
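The split into a statistical and a systematic contribution can be written schematically as (generic BNN notation for illustration, not necessarily the paper's symbols):

```latex
% Schematic decomposition of the total predictive uncertainty:
\sigma_{\text{tot}}^2
  = \underbrace{\sigma_{\text{stat}}^2}_{\substack{\text{spread of the weight distributions;}\\ \to\, 0 \text{ for infinite training data}}}
  + \underbrace{\sigma_{\text{syst}}^2}_{\substack{\text{learned noise term in the likelihood;}\\ \text{approaches a finite plateau}}}
```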
09.12.2024 10:52

The total BNN uncertainty actually consists of two terms: (i) the "statistical uncertainty" (red curve) accounts for a lack of knowledge due to a limited amount of training data and vanishes in the limit of infinite training data, and...
09.12.2024 10:51

The improvement of the relative local energy resolution, evaluated as a function of the in-time (left) and out-of-time (right) pile-up activity, shows a significant level of cluster-by-cluster pile-up mitigation when applying the ML-derived calibration
09.12.2024 10:50

Another performance measure is the relative energy resolution – again, the BNN is better over the whole energy range, and especially spectacular at low energies (it best learns the signal-source transition from inelastic hadronic interactions to ionisation-dominated signals)
09.12.2024 10:49

To evaluate the performance, we compare the BNN predictions to the target values in terms of the signal linearity (should peak at zero) – the BNN calibration performs significantly better than any other of the considered scales (with a significant precision gain at low energies)
09.12.2024 10:49

The corresponding loss function then follows a clear statistics logic: the first term can be seen as a weight regularization avoiding over-training, and the second term tries to maximize the likelihood
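Schematically, such a two-term variational loss has the familiar ELBO shape (generic notation for illustration, not necessarily the paper's exact conventions):

```latex
% Generic variational BNN loss (negative ELBO up to a constant):
% q_\phi(\omega) = tractable variational posterior over the weights \omega,
% p(\omega) = prior, D = training data
\mathcal{L}
  = \underbrace{D_{\mathrm{KL}}\!\big[q_\phi(\omega)\,\big\|\,p(\omega)\big]}_{\text{weight regularization}}
  \;-\; \underbrace{\mathbb{E}_{q_\phi(\omega)}\!\big[\log p(D \mid \omega)\big]}_{\text{likelihood maximization}}
```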
09.12.2024 10:49

The network training can be described as constructing a #variational approximation, where we approximate the intractable posterior with a simplified and tractable distribution – to learn the variational posterior we minimize the Kullback-Leibler (KL) divergence
09.12.2024 10:48

BNN weights are not trained as fixed values; instead, the parameters are described by weight distributions – during inference, the learned weight distributions are sampled multiple times to generate an ensemble of networks, from which we construct the central value and the uncertainty
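The sampling step at inference time can be sketched in a few lines; this toy stands in for a trained BNN with a single linear layer whose weights are Gaussians (the means and widths below are made-up illustration values, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy stand-in for a trained BNN layer: each weight is a Gaussian,
# parameterised by a learned mean and standard deviation.
w_mu, w_sigma = np.array([1.5, -0.3]), np.array([0.1, 0.05])
b_mu, b_sigma = 0.2, 0.05

def sample_network():
    """Draw one concrete network from the learned weight distributions."""
    w = rng.normal(w_mu, w_sigma)
    b = rng.normal(b_mu, b_sigma)
    return lambda x: x @ w + b

x = np.array([2.0, 4.0])  # one toy "cluster" feature vector

# Inference: sample the weight distributions many times to build an
# implicit ensemble, then take mean and spread over its predictions.
preds = np.array([sample_network()(x) for _ in range(1000)])
central_value = preds.mean()  # calibrated prediction
uncertainty = preds.std()     # per-input uncertainty from the weight spread
print(central_value, uncertainty)
```

For this toy the ensemble mean lands near the deterministic prediction (here about 2.0), while the spread of the sampled predictions provides the per-input error bar.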
09.12.2024 10:48

Since control and #uncertainties are key in #HEP, our BNN is trained to also learn an uncertainty associated with the predicted #calibration function – this uncertainty allows for a better understanding of possible signal-quality issues in the data or training-related limitations
09.12.2024 10:48

#ML can be used to learn multi-dimensional continuous #calibration functions – we do this by training a #regression network (an uncertainty-aware BNN) to learn the "response" of single topo-clusters as a function over feature space using a properly defined minimization task
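A minimal sketch of such an uncertainty-aware regression objective, assuming a heteroscedastic Gaussian likelihood (a standard choice for this kind of task; toy numbers of my own, not the paper's exact loss): the network predicts both a response mu and an uncertainty sigma per cluster, and the loss only scores well when sigma matches the actual error.

```python
import numpy as np

def gaussian_nll(pred_mu, pred_sigma, target):
    """Heteroscedastic Gaussian negative log-likelihood (up to a constant):
    a sigma that is too small blows up the squared-pull term, a sigma that
    is too large is penalised by the log term -- so calibrated per-cluster
    uncertainties are rewarded."""
    return 0.5 * ((pred_mu - target) / pred_sigma) ** 2 + np.log(pred_sigma)

# a sigma matched to the actual error scores better than a mismatched one
err_good = gaussian_nll(1.0, 0.1, 1.1)   # |error| = 0.1, sigma = 0.1
err_small = gaussian_nll(1.0, 0.01, 1.1)  # sigma far too small
err_large = gaussian_nll(1.0, 1.0, 1.1)   # sigma far too large
print(err_good < err_small and err_good < err_large)  # True
```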
09.12.2024 10:47

The principal signals of the #ATLAS calorimeters are so-called "topo-clusters". The signals are calibrated to correctly measure the energy deposited by EM showers – but this means they provide no compensation for energy losses in the complex development of hadronic showers...
09.12.2024 10:46

So when used correctly, #ML is a perfect tool to quantify and control different kinds of #uncertainties in #LHC physics – and the future of the LHC really is triggered, inspired and shaped by data science as a new common language of particle experiment and theory
09.12.2024 10:46

TL;DR: Neural networks are extremely powerful numerical tools that offer a lot of control: (i) an appropriate and well-defined loss function tells us exactly what the network training is trying to achieve; (ii) uncertainty-aware networks (like Bayesian NNs) come with an error bar on their output
09.12.2024 10:46

Welcome to the machine!
After more than one year, the time has finally come: the @uniheidelberg.bsky.social non-ATLAS HEP-ML group has published its first (preprint) paper in collaboration with the @atlasexperiment.bsky.social
arxiv.org/abs/2412.04370
inspirehep.net/literature/2...