Appendix 1: Adding systematic uncertainties#
Johann Brehmer, Felix Kling, Irina Espejo, and Kyle Cranmer 2018-2019
In this tutorial we’ll explain how to add systematic uncertainties to the MadMiner workflow.
Preparations#
Before you execute this notebook, make sure you have MadGraph, Pythia and Delphes installed.
import os
import logging
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
%matplotlib inline
from madminer.core import MadMiner
from madminer.lhe import LHEReader
from madminer.sampling import SampleAugmenter
from madminer import sampling
from madminer.ml import ScoreEstimator
from madminer.fisherinformation import FisherInformation, profile_information, project_information
from madminer.plotting import plot_systematics, plot_fisher_information_contours_2d
Please enter here the environment variable pointing to your MG5 installation folder.
mg_dir = os.getenv('MG_FOLDER_PATH')
MadMiner uses the Python logging
module to provide additional information and debugging output. You can choose how much of this output you want to see by switching the level in the following lines to logging.DEBUG
or logging.WARNING
.
# MadMiner output
logging.basicConfig(
format='%(asctime)-5.5s %(name)-20.20s %(levelname)-7.7s %(message)s',
datefmt='%H:%M',
level=logging.DEBUG
)
# Output of all other modules (e.g. matplotlib)
for key in logging.Logger.manager.loggerDict:
if "madminer" not in key:
logging.getLogger(key).setLevel(logging.WARNING)
1. Parameters and benchmarks#
We’ll just load the MadMiner setup from the first part of this tutorial:
miner = MadMiner()
miner.load('data/setup.h5')
16:19 madminer.utils.inter DEBUG HDF5 file does not contain is_reference field.
16:19 madminer.core INFO Found 2 parameters:
16:19 madminer.core INFO CWL2 (LHA: dim6 2, maximal power in squared ME: (2,), range: (-20.0, 20.0))
16:19 madminer.core INFO CPWL2 (LHA: dim6 5, maximal power in squared ME: (2,), range: (-20.0, 20.0))
16:19 madminer.core INFO Found 6 benchmarks:
16:19 madminer.core INFO sm: CWL2 = 0.00e+00, CPWL2 = 0.00e+00
16:19 madminer.core INFO w: CWL2 = 15.20, CPWL2 = 0.10
16:19 madminer.core INFO neg_w: CWL2 = -1.54e+01, CPWL2 = 0.20
16:19 madminer.core INFO ww: CWL2 = 0.30, CPWL2 = 15.10
16:19 madminer.core INFO neg_ww: CWL2 = 0.40, CPWL2 = -1.53e+01
16:19 madminer.core INFO morphing_basis_vector_5: CWL2 = -1.58e+01, CPWL2 = -1.94e+01
16:19 madminer.core INFO Found morphing setup with 6 components
16:19 madminer.core INFO Did not find systematics setup.
2. Set up systematics, save settings#
This is where things become interesting: We want to model systematic uncertainties. The main function is add_systematics()
, the keyword effect
determines how the effect of the nuisance parameters on the event weights is calculated. For effect="norm"
, the nuisance parameter rescales thee overall cross section of one or multiple samples. For effect="pdf"
, its effect is calculated with PDF variations. Finally, with effect="scale"
scale variations are used.
Here we consider three nuisance parameters: one for the signal normalization, one for the background normalization, and one for scale uncertainties (which we here assume to be correlated between signal and background).
miner.add_systematics(effect="norm", systematic_name="signal_norm", norm_variation=1.1)
miner.add_systematics(effect="norm", systematic_name="bkg_norm", norm_variation=1.2)
miner.add_systematics(effect="scale", systematic_name="scales", scale="mu")
Again, we save our setup:
miner.save('data/setup_systematics.h5')
16:19 madminer.core INFO Saving setup (including morphing) to data/setup_systematics.h5
3. Run MadGraph#
Now it’s time to run MadGraph. MadMiner will instruct MadGraph to use its built-in systematics
tool to calculate how the event weights change under the scale variation.
miner.run(
sample_benchmark='sm',
mg_directory=mg_dir,
mg_process_directory='./mg_processes/signal_systematics',
proc_card_file='cards/proc_card_signal.dat',
param_card_template_file='cards/param_card_template.dat',
run_card_file='cards/run_card_signal_small.dat',
log_directory='logs/signal',
python2_override=True,
systematics=["signal_norm", "scales"],
)
16:20 madminer.utils.inter INFO Generating MadGraph process folder from cards/proc_card_signal.dat at ./mg_processes/signal_systematics
16:20 madminer.core INFO Run 0
16:20 madminer.core INFO Sampling from benchmark: sm
16:20 madminer.core INFO Original run card: cards/run_card_signal_small.dat
16:20 madminer.core INFO Original Pythia8 card: None
16:20 madminer.core INFO Original config card: None
16:20 madminer.core INFO Copied run card: /madminer/cards/run_card_0.dat
16:20 madminer.core INFO Copied Pythia8 card: None
16:20 madminer.core INFO Copied config card: None
16:20 madminer.core INFO Param card: /madminer/cards/param_card_0.dat
16:20 madminer.core INFO Reweight card: /madminer/cards/reweight_card_0.dat
16:20 madminer.core INFO Log file: run_0.log
16:20 madminer.core INFO Creating param and reweight cards in ./mg_processes/signal_systematics//madminer/cards/param_card_0.dat, ./mg_processes/signal_systematics//madminer/cards/reweight_card_0.dat
16:20 madminer.utils.inter INFO Starting MadGraph and Pythia in ./mg_processes/signal_systematics
16:23 madminer.core INFO Finished running MadGraph! Please check that events were succesfully generated in the following folders:
./mg_processes/signal_systematics/Events/run_01
miner.run(
sample_benchmark='sm',
mg_directory=mg_dir,
mg_process_directory='./mg_processes/bkg_systematics',
proc_card_file='cards/proc_card_background.dat',
param_card_template_file='cards/param_card_template.dat',
run_card_file='cards/run_card_background.dat',
log_directory='logs/background',
python2_override=True,
systematics=["bkg_norm", "scales"],
)
16:23 madminer.utils.inter INFO Generating MadGraph process folder from cards/proc_card_background.dat at ./mg_processes/bkg_systematics
16:23 madminer.core INFO Run 0
16:23 madminer.core INFO Sampling from benchmark: sm
16:23 madminer.core INFO Original run card: cards/run_card_background.dat
16:23 madminer.core INFO Original Pythia8 card: None
16:23 madminer.core INFO Original config card: None
16:23 madminer.core INFO Copied run card: /madminer/cards/run_card_0.dat
16:23 madminer.core INFO Copied Pythia8 card: None
16:23 madminer.core INFO Copied config card: None
16:23 madminer.core INFO Param card: /madminer/cards/param_card_0.dat
16:23 madminer.core INFO Reweight card: /madminer/cards/reweight_card_0.dat
16:23 madminer.core INFO Log file: run_0.log
16:23 madminer.core INFO Creating param and reweight cards in ./mg_processes/bkg_systematics//madminer/cards/param_card_0.dat, ./mg_processes/bkg_systematics//madminer/cards/reweight_card_0.dat
16:23 madminer.utils.inter INFO Starting MadGraph and Pythia in ./mg_processes/bkg_systematics
4. Load events from LHE file#
When adding LHE or Delphes files, use the systematics
keyword to list which systematic uncertainties apply to which sample:
lhe = LHEReader('data/setup_systematics.h5')
lhe.add_sample(
lhe_filename='mg_processes/signal_systematics/Events/run_01/unweighted_events.lhe.gz',
sampled_from_benchmark='sm',
is_background=False,
k_factor=1.1,
systematics=["signal_norm", "scales"]
)
lhe.add_sample(
lhe_filename='mg_processes/bkg_systematics/Events/run_01/unweighted_events.lhe.gz',
sampled_from_benchmark='sm',
is_background=True,
k_factor=1.1,
systematics=["bkg_norm", "scales"]
)
The next steps are unaffected by systematics.
lhe.set_smearing(
pdgids=[1,2,3,4,5,6,9,21,-1,-2,-3,-4,-5,-6], # Partons giving rise to jets
energy_resolution_abs=0.,
energy_resolution_rel=0.1,
pt_resolution_abs=None,
pt_resolution_rel=None,
eta_resolution_abs=0.1,
eta_resolution_rel=0.,
phi_resolution_abs=0.1,
phi_resolution_rel=0.,
)
lhe.add_observable(
'pt_j1',
'j[0].pt',
required=False,
default=0.,
)
lhe.add_observable(
'delta_phi_jj',
'j[0].deltaphi(j[1]) * (-1. + 2.*float(j[0].eta > j[1].eta))',
required=True,
)
lhe.add_observable(
'met',
'met.pt',
required=True,
)
lhe.add_cut('(a[0] + a[1]).m > 122.')
lhe.add_cut('(a[0] + a[1]).m < 128.')
lhe.add_cut('pt_j1 > 30.')
lhe.analyse_samples()
lhe.save('data/lhe_data_systematics.h5')
A look at distributions#
The function plot_systematics()
makes it easy to check the effect of the various nuisance parameters on a distribution:
plot_systematics?
_ = plot_systematics(
filename='data/lhe_data_systematics.h5',
theta=np.array([0.,0.]),
observable="pt_j1",
obs_label="$p_{T,j}$",
obs_range=(30.,400.),
ratio_range=(0.7,1.4)
)
_ = plot_systematics(
filename='data/lhe_data_systematics.h5',
theta=np.array([0.,0.]),
observable="delta_phi_jj",
obs_label=r"$\Delta\phi$",
obs_range=(-3.1,3.1),
ratio_range=(0.7,1.4)
)
5. Sampling#
Let’s generate training data for the SALLY method.
sampler = SampleAugmenter('data/lhe_data_systematics.h5', include_nuisance_parameters=True)
x, theta, t_xz, _ = sampler.sample_train_local(
theta=sampling.benchmark('sm'),
n_samples=100000,
folder='./data/samples',
filename='train_score_systematics'
)
6. Training#
The SALLY estimator will learn the score in terms of both the physics parameters and the nuisance parameters. In our case, its output will therefore be a vector with five components.
estimator = ScoreEstimator(n_hidden=(50,))
estimator.train(
method='sally',
x='data/samples/x_train_score_systematics.npy',
t_xz='data/samples/t_xz_train_score_systematics.npy',
)
estimator.save('models/sally_systematics')
7. Fisher information#
The Fisher information is now also a 5x5 matrix. Constraint terms on the nuisance parameters, representing our prior knowledge on their values, can be calculated with FisherInformation.nuisance_constraint_information()
.
fisher = FisherInformation('data/lhe_data_systematics.h5')
fisher.full_information?
info_sally, _ = fisher.full_information(
theta=np.zeros(5),
model_file='models/sally_systematics',
luminosity=1000000.,
include_xsec_info=False,
)
info_nuisance = fisher.nuisance_constraint_information()
info = info_sally + info_nuisance
for row in info:
print(" ".join(["{:6.2f}".format(entry) for entry in row]))
Fisher contours and profiled information#
When looking at a subset of these five parameter, we can either ignore the other parameters (“project”), or conservatively pick the most pessimistic value of them (“profile” them). MadMiner provides the functions profile_information()
and project_information()
for this purpose. Let’s do that for the parameter space of the two physics parameters:
info_proj = project_information(info, [0,1])
info_prof = profile_information(info, [0,1])
_ = plot_fisher_information_contours_2d(
[info_proj, info_prof],
inline_labels=["Projected", "Profiled"],
xrange=(-0.1,0.1),
yrange=(-0.1,0.1)
)
For illustration, we can look at just one physics parameter and one nuisance parameter:
info_demo = project_information(info, [0, 4])
info_demo_profiled = np.zeros((2,2))
info_demo_profiled[0,0] = profile_information(info_demo, [0,])
_ = plot_fisher_information_contours_2d(
[info_demo, info_demo_profiled],
inline_labels=[None, "Profiled"],
xrange=(-0.2,0.2),
yrange=(-0.6,0.6),
xlabel="Parameter of interest",
ylabel="Nuisance parameter",
)
plot_fisher_information_contours_2d?