MadMiner physics tutorial (part 2A)

Johann Brehmer, Felix Kling, Irina Espejo, and Kyle Cranmer 2018-2019

In this second part of the tutorial, we’ll generate events and extract the observables and weights from them. You have two options: in this notebook we do this at parton level, while the alternative part 2B uses Delphes.

0. Preparations

Before you execute this notebook, make sure you have a running installation of MadGraph.

import os
import logging
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
%matplotlib inline

from madminer.core import MadMiner
from madminer.lhe import LHEReader
from madminer.sampling import combine_and_shuffle
from madminer.plotting import plot_distributions

# MadMiner output
logging.basicConfig(
    format='%(asctime)-5.5s %(name)-20.20s %(levelname)-7.7s %(message)s',
    datefmt='%H:%M',
    level=logging.DEBUG
)

# Output of all other modules (e.g. matplotlib)
for key in logging.Logger.manager.loggerDict:
    if "madminer" not in key:
        logging.getLogger(key).setLevel(logging.WARNING)

Please set the environment variable MG_FOLDER_PATH so that it points to your MG5 installation folder (or enter the path directly here).

mg_dir = os.getenv('MG_FOLDER_PATH')
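os.getenv returns None when MG_FOLDER_PATH is not set, and the problem then only shows up later, deep inside the MadGraph call. A minimal sketch of an early sanity check follows; the helper name find_mg_dir and the assumption that a MadGraph installation contains the launcher bin/mg5_aMC are ours, not part of MadMiner.

```python
import os


def find_mg_dir(env_var="MG_FOLDER_PATH"):
    """Return the MadGraph folder named by env_var, failing early with a clear message.

    Hypothetical helper (not part of MadMiner). Assumes the standard MG5_aMC
    layout with the launcher at <folder>/bin/mg5_aMC.
    """
    path = os.getenv(env_var)  # None if the variable is not set
    if path is None:
        raise RuntimeError("Environment variable {} is not set".format(env_var))
    if not os.path.isfile(os.path.join(path, "bin", "mg5_aMC")):
        raise RuntimeError("{}={!r} does not look like a MadGraph folder".format(env_var, path))
    return path
```

With this helper, mg_dir = find_mg_dir() could replace the plain os.getenv call above.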

1. Generate events

Let’s load our setup:

miner = MadMiner()
miner.load("data/setup.h5")
16:39 madminer.utils.inter DEBUG   HDF5 file does not contain is_reference field.
16:39 madminer.core        INFO    Found 2 parameters:
16:39 madminer.core        INFO       CWL2 (LHA: dim6 2, maximal power in squared ME: (2,), range: (-20.0, 20.0))
16:39 madminer.core        INFO       CPWL2 (LHA: dim6 5, maximal power in squared ME: (2,), range: (-20.0, 20.0))
16:39 madminer.core        INFO    Found 6 benchmarks:
16:39 madminer.core        INFO       sm: CWL2 = 0.00e+00, CPWL2 = 0.00e+00
16:39 madminer.core        INFO       w: CWL2 = 15.20, CPWL2 = 0.10
16:39 madminer.core        INFO       neg_w: CWL2 = -1.54e+01, CPWL2 = 0.20
16:39 madminer.core        INFO       ww: CWL2 = 0.30, CPWL2 = 15.10
16:39 madminer.core        INFO       neg_ww: CWL2 = 0.40, CPWL2 = -1.53e+01
16:39 madminer.core        INFO       morphing_basis_vector_5: CWL2 = 16.88, CPWL2 = 14.95
16:39 madminer.core        INFO    Found morphing setup with 6 components
16:39 madminer.core        INFO    Did not find systematics setup.

In the next step, MadMiner starts MadGraph to generate events and calculate the weights. You can use run() or run_multiple(); the latter lets you generate several runs with different run cards and optimize the phase space for different benchmark points.

In either case, you have to provide paths to the process card, the run card, and a param card template (the entries corresponding to the parameters of interest will be adapted automatically). As the log output below shows, MadMiner then creates the param and reweight cards for each run itself. Log files in the log_directory folder collect the MadGraph output and are important for debugging.

The sample_benchmark option (or, in the case of run_multiple(), sample_benchmarks) specifies which benchmark is used for sampling, i.e. for which benchmark point the phase space is optimized. If you use just one benchmark, reweighting to far-away points in parameter space can lead to large event weights and thus large statistical fluctuations. It is therefore often a good idea to combine a large sample of events at the “reference hypothesis” (for us the SM) with smaller samples from other benchmarks that span the parameter space.
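The statement about large weights can be made quantitative with the Kish effective sample size \(N_\text{eff} = (\sum_i w_i)^2 / \sum_i w_i^2\), which drops as the weights become more uneven. The toy below is not MadMiner code: it draws "events" from a unit Gaussian that stands in for the reference hypothesis and reweights them to Gaussians shifted by an increasing amount, mimicking reweighting to benchmarks farther and farther away in parameter space.

```python
import numpy as np


def effective_sample_size(weights):
    """Kish effective sample size (sum w)^2 / sum(w^2) of a weighted sample."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)


def reweight(x, shift):
    """Toy likelihood ratio N(x | shift, 1) / N(x | 0, 1), i.e. the per-event weight."""
    return np.exp(shift * x - 0.5 * shift ** 2)


rng = np.random.default_rng(seed=0)
n_events = 100000
x = rng.normal(loc=0.0, scale=1.0, size=n_events)  # toy "reference" sample

for shift in [0.0, 1.0, 2.0, 3.0]:
    n_eff = effective_sample_size(reweight(x, shift))
    print("shift = {:.1f}: N_eff = {:.0f} out of {} events".format(shift, n_eff, n_events))
```

For this toy model the expected ratio \(N_\text{eff}/N\) falls like \(e^{-\text{shift}^2}\), which is why a few extra samples generated near the distant benchmarks help so much.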

One slight annoyance is that MadGraph only supports Python 2. The run() and run_multiple() commands have an initial_command keyword that lets you load a virtual environment in which python maps to Python 2 (which is what we do below). Alternatively or additionally, you can set python2_override=True, which calls python2.7 instead of python to start MadGraph.

miner.run(
    sample_benchmark='sm',
    mg_directory=mg_dir,
    mg_process_directory='./mg_processes/signal1',
    proc_card_file='cards/proc_card_signal.dat',
    param_card_template_file='cards/param_card_template.dat',
    run_card_file='cards/run_card_signal_large.dat',
    log_directory='logs/signal',
    initial_command="source activate python2"
)
16:39 madminer.utils.inter INFO    Generating MadGraph process folder from cards/proc_card_signal.dat at ./mg_processes/signal1
16:39 madminer.core        INFO    Run 0
16:39 madminer.core        INFO      Sampling from benchmark: sm
16:39 madminer.core        INFO      Original run card:       cards/run_card_signal_large.dat
16:39 madminer.core        INFO      Original Pythia8 card:   None
16:39 madminer.core        INFO      Original config card:    None
16:39 madminer.core        INFO      Copied run card:         /madminer/cards/run_card_0.dat
16:39 madminer.core        INFO      Copied Pythia8 card:     None
16:39 madminer.core        INFO      Copied config card:      None
16:39 madminer.core        INFO      Param card:              /madminer/cards/param_card_0.dat
16:39 madminer.core        INFO      Reweight card:           /madminer/cards/reweight_card_0.dat
16:39 madminer.core        INFO      Log file:                run_0.log
16:39 madminer.core        INFO    Creating param and reweight cards in ./mg_processes/signal1//madminer/cards/param_card_0.dat, ./mg_processes/signal1//madminer/cards/reweight_card_0.dat
16:39 madminer.utils.inter INFO    Starting MadGraph and Pythia in ./mg_processes/signal1
16:52 madminer.core        INFO    Finished running MadGraph! Please check that events were succesfully generated in the following folders:

./mg_processes/signal1/Events/run_01

additional_benchmarks = ['w', 'ww', 'neg_w', 'neg_ww']
miner.run_multiple(
    sample_benchmarks=additional_benchmarks,
    mg_directory=mg_dir,
    mg_process_directory='./mg_processes/signal2',
    proc_card_file='cards/proc_card_signal.dat',
    param_card_template_file='cards/param_card_template.dat',
    run_card_files=['cards/run_card_signal_small.dat'],
    log_directory='logs/signal',
    initial_command="source activate python2"
)
16:52 madminer.utils.inter INFO    Generating MadGraph process folder from cards/proc_card_signal.dat at ./mg_processes/signal2
16:52 madminer.core        INFO    Run 0
16:52 madminer.core        INFO      Sampling from benchmark: w
16:52 madminer.core        INFO      Original run card:       cards/run_card_signal_small.dat
16:52 madminer.core        INFO      Original Pythia8 card:   None
16:52 madminer.core        INFO      Original config card:    None
16:52 madminer.core        INFO      Copied run card:         /madminer/cards/run_card_0.dat
16:52 madminer.core        INFO      Copied Pythia8 card:     None
16:52 madminer.core        INFO      Copied config card:      None
16:52 madminer.core        INFO      Param card:              /madminer/cards/param_card_0.dat
16:52 madminer.core        INFO      Reweight card:           /madminer/cards/reweight_card_0.dat
16:52 madminer.core        INFO      Log file:                run_0.log
16:52 madminer.core        INFO    Creating param and reweight cards in ./mg_processes/signal2//madminer/cards/param_card_0.dat, ./mg_processes/signal2//madminer/cards/reweight_card_0.dat
16:52 madminer.utils.inter INFO    Starting MadGraph and Pythia in ./mg_processes/signal2
16:55 madminer.core        INFO    Run 1
16:55 madminer.core        INFO      Sampling from benchmark: ww
16:55 madminer.core        INFO      Original run card:       cards/run_card_signal_small.dat
16:55 madminer.core        INFO      Original Pythia8 card:   None
16:55 madminer.core        INFO      Original config card:    None
16:55 madminer.core        INFO      Copied run card:         /madminer/cards/run_card_1.dat
16:55 madminer.core        INFO      Copied Pythia8 card:     None
16:55 madminer.core        INFO      Copied config card:      None
16:55 madminer.core        INFO      Param card:              /madminer/cards/param_card_1.dat
16:55 madminer.core        INFO      Reweight card:           /madminer/cards/reweight_card_1.dat
16:55 madminer.core        INFO      Log file:                run_1.log
16:55 madminer.core        INFO    Creating param and reweight cards in ./mg_processes/signal2//madminer/cards/param_card_1.dat, ./mg_processes/signal2//madminer/cards/reweight_card_1.dat
16:55 madminer.utils.inter INFO    Starting MadGraph and Pythia in ./mg_processes/signal2
16:57 madminer.core        INFO    Run 2
16:57 madminer.core        INFO      Sampling from benchmark: neg_w
16:57 madminer.core        INFO      Original run card:       cards/run_card_signal_small.dat
16:57 madminer.core        INFO      Original Pythia8 card:   None
16:57 madminer.core        INFO      Original config card:    None
16:57 madminer.core        INFO      Copied run card:         /madminer/cards/run_card_2.dat
16:57 madminer.core        INFO      Copied Pythia8 card:     None
16:57 madminer.core        INFO      Copied config card:      None
16:57 madminer.core        INFO      Param card:              /madminer/cards/param_card_2.dat
16:57 madminer.core        INFO      Reweight card:           /madminer/cards/reweight_card_2.dat
16:57 madminer.core        INFO      Log file:                run_2.log
16:57 madminer.core        INFO    Creating param and reweight cards in ./mg_processes/signal2//madminer/cards/param_card_2.dat, ./mg_processes/signal2//madminer/cards/reweight_card_2.dat
16:57 madminer.utils.inter INFO    Starting MadGraph and Pythia in ./mg_processes/signal2
17:00 madminer.core        INFO    Run 3
17:00 madminer.core        INFO      Sampling from benchmark: neg_ww
17:00 madminer.core        INFO      Original run card:       cards/run_card_signal_small.dat
17:00 madminer.core        INFO      Original Pythia8 card:   None
17:00 madminer.core        INFO      Original config card:    None
17:00 madminer.core        INFO      Copied run card:         /madminer/cards/run_card_3.dat
17:00 madminer.core        INFO      Copied Pythia8 card:     None
17:00 madminer.core        INFO      Copied config card:      None
17:00 madminer.core        INFO      Param card:              /madminer/cards/param_card_3.dat
17:00 madminer.core        INFO      Reweight card:           /madminer/cards/reweight_card_3.dat
17:00 madminer.core        INFO      Log file:                run_3.log
17:00 madminer.core        INFO    Creating param and reweight cards in ./mg_processes/signal2//madminer/cards/param_card_3.dat, ./mg_processes/signal2//madminer/cards/reweight_card_3.dat
17:00 madminer.utils.inter INFO    Starting MadGraph and Pythia in ./mg_processes/signal2
17:03 madminer.core        INFO    Finished running MadGraph! Please check that events were succesfully generated in the following folders:

./mg_processes/signal2/Events/run_01
./mg_processes/signal2/Events/run_02
./mg_processes/signal2/Events/run_03
./mg_processes/signal2/Events/run_04

This will take a moment – time for a coffee break!

After running any event generation through MadMiner, you should check whether the run succeeded: are the usual output files there, and do the log files show any error messages? MadMiner does not (yet) perform any explicit checks, so if something went wrong in the event generation, it will only notice later, when it tries to load the event files.
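Since MadMiner does not check the generation for you, a short script can at least catch the obvious failures. The sketch below looks for the unweighted_events.lhe.gz files that are loaded in step 2 and searches the log files for the word "error". The helper names are hypothetical, and the keyword search is only a crude heuristic that may also flag harmless lines.

```python
import os


def find_missing_event_files(process_dir, runs, filename="unweighted_events.lhe.gz"):
    """Return the expected <process_dir>/Events/<run>/<filename> paths that do not exist."""
    paths = [os.path.join(process_dir, "Events", run, filename) for run in runs]
    return [path for path in paths if not os.path.isfile(path)]


def find_error_lines(log_dir, keyword="error"):
    """Crude heuristic: (file, line number, text) for every log line containing keyword."""
    hits = []
    if not os.path.isdir(log_dir):
        return hits
    for name in sorted(os.listdir(log_dir)):
        path = os.path.join(log_dir, name)
        if not os.path.isfile(path):
            continue
        with open(path, errors="replace") as f:
            for number, line in enumerate(f, start=1):
                if keyword.lower() in line.lower():
                    hits.append((path, number, line.rstrip()))
    return hits


# Folders and runs as reported at the end of the MadGraph runs above
expected_runs = {
    "./mg_processes/signal1": ["run_01"],
    "./mg_processes/signal2": ["run_01", "run_02", "run_03", "run_04"],
}
for process_dir, runs in expected_runs.items():
    for path in find_missing_event_files(process_dir, runs):
        print("Missing event file:", path)
for path, number, line in find_error_lines("logs/signal"):
    print("{}:{}: {}".format(path, number, line))
```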

Backgrounds

We can also easily add other processes, such as backgrounds. An important option is the is_background keyword, which should be used for processes that do not depend on the parameters theta. Setting is_background=True disables the reweighting and reuses the same weights for all benchmarks.
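Conceptually, a sample flagged with is_background=True carries one weight per event that does not depend on theta, so every benchmark sees the same number. The toy below illustrates this with an (events × benchmarks) weight array; the array layout and the helper name background_weights are illustrative assumptions, not MadMiner's internal storage.

```python
import numpy as np

# Benchmark names as listed when loading data/setup.h5 above
benchmarks = ["sm", "w", "neg_w", "ww", "neg_ww", "morphing_basis_vector_5"]


def background_weights(nominal_weights, n_benchmarks):
    """Toy picture of is_background=True: copy each event's single weight to every benchmark.

    Returns an array of shape (n_events, n_benchmarks); the layout is illustrative.
    """
    w = np.asarray(nominal_weights, dtype=float)
    return np.tile(w[:, np.newaxis], (1, n_benchmarks))


weights = background_weights([0.2, 0.5, 0.3], len(benchmarks))
cross_sections = weights.sum(axis=0)  # identical for every benchmark
print(weights.shape)  # (3, 6)
```

A consequence is that the background cross section, and any distribution built from these weights, is the same at every parameter point.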

To reduce the runtime of the notebook, the background part is commented out here. Feel free to activate it and let it run during a lunch break.

"""
miner.run(
    is_background=True,
    sample_benchmark='sm',
    mg_directory=mg_dir,
    mg_process_directory='./mg_processes/background',
    proc_card_file='cards/proc_card_background.dat',
    param_card_template_file='cards/param_card_template.dat',
    run_card_file='cards/run_card_background.dat',
    log_directory='logs/background',
)
"""

Finally, note that both MadMiner.run() and MadMiner.run_multiple() have an only_prepare_script keyword. If it is set to True, MadMiner does not start the event generation directly, but prepares folders with all the right settings and ready-to-run bash scripts. This can make it much easier to generate events on a high-performance computing system.

2. Prepare analysis of the LHE samples

The madminer.lhe submodule allows us to extract observables directly from the parton-level LHE samples, including an approximate description of the detector response with smearing functions. The central object is an instance of the LHEReader class, which has to be initialized with a MadMiner file:

lhe = LHEReader('data/setup.h5')
17:03 madminer.utils.inter DEBUG   HDF5 file does not contain is_reference field.

After creating the LHEReader object, one can add a number of event samples (the output of running MadGraph in step 1) with the add_sample() function.

In addition, you have to specify with the sampled_from_benchmark keyword which benchmark each sample was generated from, and set is_background=True for all background samples.

lhe.add_sample(
    lhe_filename='mg_processes/signal1/Events/run_01/unweighted_events.lhe.gz',
    sampled_from_benchmark='sm',
    is_background=False,
    k_factor=1.,
)
for i, benchmark in enumerate(additional_benchmarks):
    lhe.add_sample(
        lhe_filename='mg_processes/signal2/Events/run_0{}/unweighted_events.lhe.gz'.format(i+1),
        sampled_from_benchmark=benchmark,
        is_background=False,
        k_factor=1.,
    )

"""
lhe.add_sample(
    lhe_filename='mg_processes/background/Events/run_01/unweighted_events.lhe.gz',
    sampled_from_benchmark='sm',
    is_background=True,
    k_factor=1.0,
)
"""
17:03 madminer.lhe         DEBUG   Adding event sample mg_processes/signal1/Events/run_01/unweighted_events.lhe.gz
17:03 madminer.lhe         DEBUG   Adding event sample mg_processes/signal2/Events/run_01/unweighted_events.lhe.gz
17:03 madminer.lhe         DEBUG   Adding event sample mg_processes/signal2/Events/run_02/unweighted_events.lhe.gz
17:03 madminer.lhe         DEBUG   Adding event sample mg_processes/signal2/Events/run_03/unweighted_events.lhe.gz
17:03 madminer.lhe         DEBUG   Adding event sample mg_processes/signal2/Events/run_04/unweighted_events.lhe.gz

3. Smearing functions to model the detector response

Now we have to define the smearing functions that are used in lieu of a proper shower and detector simulation. Here we assume a simple 10% relative resolution on the jet energy and a \(\pm 0.1\) smearing for jet \(\eta\) and \(\phi\). The transverse momenta of the jets are then derived from the smeared energy and the on-shell condition for the partons (this is what pt_resolution_abs=None does). The photons from the Higgs are assumed to be measured perfectly (otherwise we would have to call set_smearing another time with pdgids=[22]).
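The following sketch shows what these settings could do to a single parton, under the assumption that each quantity is shifted by a Gaussian with standard deviation resolution_abs + resolution_rel × value. It is an illustration, not MadMiner's implementation; smear_parton is a hypothetical helper. The transverse momentum is recomputed from the smeared energy via the on-shell condition \(|\vec p| = \sqrt{E^2 - m^2}\) and \(p_T = |\vec p| / \cosh\eta\).

```python
import numpy as np

rng = np.random.default_rng(seed=1)


def smear_parton(energy, eta, phi, mass=0.0,
                 energy_resolution_abs=0.0, energy_resolution_rel=0.1,
                 eta_resolution_abs=0.1, phi_resolution_abs=0.1):
    """Toy Gaussian smearing of one parton with widths resolution_abs + resolution_rel * value.

    Hypothetical helper; the defaults mirror the set_smearing() call in this notebook.
    """
    e = energy + rng.normal(0.0, energy_resolution_abs + energy_resolution_rel * energy)
    e = max(e, mass)  # keep E >= m so the on-shell momentum below is real
    eta_s = eta + rng.normal(0.0, eta_resolution_abs)
    phi_s = phi + rng.normal(0.0, phi_resolution_abs)
    phi_s = (phi_s + np.pi) % (2.0 * np.pi) - np.pi  # wrap back into [-pi, pi)
    # On-shell: |p| = sqrt(E^2 - m^2); with p_z = pT * sinh(eta) one gets |p| = pT * cosh(eta)
    p = np.sqrt(e ** 2 - mass ** 2)
    pt = p / np.cosh(eta_s)
    return e, pt, eta_s, phi_s


e, pt, eta, phi = smear_parton(energy=100.0, eta=0.5, phi=3.1)
print("E = {:.1f} GeV, pT = {:.1f} GeV, eta = {:.2f}, phi = {:.2f}".format(e, pt, eta, phi))
```

Deriving \(p_T\) from the smeared energy, rather than smearing it independently, keeps each smeared parton kinematically consistent with its (on-shell) mass.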

lhe.set_smearing(
    pdgids=[1,2,3,4,5,6,9,21,-1,-2,-3,-4,-5,-6],   # Partons giving rise to jets
    energy_resolution_abs=0.,
    energy_resolution_rel=0.1,
    pt_resolution_abs=None,
    pt_resolution_rel=None,
    eta_resolution_abs=0.1,
    eta_resolution_rel=0.,
    phi_resolution_abs=0.1,
    phi_resolution_rel=0.,
)

In addition, we can define noise that only affects MET. This adds Gaussian noise with mean 0 and std abs_ + rel * HT to MET_x and MET_y separately.

lhe.set_met_noise(abs_=10., rel=0.05)
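A minimal sketch of this noise model, with one assumption of ours: HT is taken as the scalar sum of the visible transverse momenta. add_met_noise is a hypothetical helper, not MadMiner's implementation.

```python
import numpy as np

rng = np.random.default_rng(seed=2)


def add_met_noise(met_x, met_y, visible_pts, abs_=10.0, rel=0.05):
    """Toy MET noise: independent Gaussian shifts of MET_x and MET_y.

    The standard deviation is abs_ + rel * HT. HT is taken here as the scalar sum of
    the visible transverse momenta, which is an assumption about the exact definition.
    """
    ht = float(np.sum(visible_pts))
    sigma = abs_ + rel * ht
    met_x_smeared = met_x + rng.normal(0.0, sigma)
    met_y_smeared = met_y + rng.normal(0.0, sigma)
    return met_x_smeared, met_y_smeared, float(np.hypot(met_x_smeared, met_y_smeared))


# Toy event without invisible particles: the true MET is exactly zero, so any
# reconstructed MET comes from the noise (here sigma = 10 + 0.05 * 325 GeV).
mx, my, met = add_met_noise(0.0, 0.0, visible_pts=[120.0, 80.0, 62.5, 62.5])
print("MET after noise: {:.1f} GeV".format(met))
```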

4. Observables and cuts

The next step is the definition of observables, either through a Python function or an expression that can be evaluated. Here we demonstrate the latter, which is implemented in add_observable(). In the expression string, you can use the terms j[i], e[i], mu[i], a[i], and met, where the index i refers to an ordering by transverse momentum. In addition, you can use p[i], which denotes the i-th particle in the order given in the LHE sample (which is the order in which the final-state particles were defined in MadGraph).

All of these represent objects inheriting from scikit-hep LorentzVectors; see the scikit-hep documentation for their properties. In addition, they have charge and pdg_id properties.
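To make the expression syntax concrete, here is a sketch of how such strings can be evaluated: the object lists are sorted by transverse momentum, and the string is passed to Python's eval together with those lists. FourVector and evaluate are hypothetical stand-ins written for this illustration, not MadMiner's implementation or the scikit-hep API; they cover only the properties used in this notebook (pt, eta, phi, m, deltaphi, and addition).

```python
import math


class FourVector(object):
    """Hypothetical minimal stand-in for a scikit-hep LorentzVector, built from (px, py, pz, E)."""

    def __init__(self, px, py, pz, e):
        self.px, self.py, self.pz, self.e = px, py, pz, e

    @property
    def pt(self):
        return math.hypot(self.px, self.py)

    @property
    def phi(self):
        return math.atan2(self.py, self.px)

    @property
    def eta(self):
        p = math.sqrt(self.px ** 2 + self.py ** 2 + self.pz ** 2)
        return 0.5 * math.log((p + self.pz) / (p - self.pz))

    @property
    def m(self):
        return math.sqrt(max(self.e ** 2 - self.px ** 2 - self.py ** 2 - self.pz ** 2, 0.0))

    def deltaphi(self, other):
        # phi difference wrapped into [-pi, pi)
        return (self.phi - other.phi + math.pi) % (2.0 * math.pi) - math.pi

    def __add__(self, other):
        return FourVector(self.px + other.px, self.py + other.py,
                          self.pz + other.pz, self.e + other.e)


def evaluate(expression, jets, photons):
    """Evaluate an observable string such as 'j[0].pt' with pT-ordered lists j and a."""
    namespace = {
        "j": sorted(jets, key=lambda v: v.pt, reverse=True),
        "a": sorted(photons, key=lambda v: v.pt, reverse=True),
        "float": float,
    }
    # Restricting builtins limits what the string can do; this is still not a sandbox.
    return eval(expression, {"__builtins__": {}}, namespace)


jets = [FourVector(30.0, 0.0, 10.0, 31.7), FourVector(0.0, 50.0, -20.0, 53.9)]
photons = [FourVector(62.5, 0.0, 0.0, 62.5), FourVector(-62.5, 0.0, 0.0, 62.5)]
print(evaluate("j[0].pt", jets, photons))  # the 50 GeV jet is leading, although listed second
print(evaluate("(a[0] + a[1]).m", jets, photons))
```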

add_observable() has an optional keyword required. If required=True, only events where the observable can be parsed, i.e. where all involved particles have been detected, are kept. If required=False, unparseable observables are filled with the value of another keyword, default.
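The bookkeeping behind required and default can be sketched as follows. Here an observable is a plain Python function of a toy event, and it counts as not parseable when it raises IndexError, KeyError, or AttributeError; the event format, the helper names, and that choice of exceptions are assumptions for illustration. An event is dropped as soon as one required observable fails, while optional observables fall back to their default.

```python
def evaluate_observable(func, event, required, default=None):
    """Return (keep_event, value) for one observable in one event."""
    try:
        return True, func(event)
    except (IndexError, KeyError, AttributeError):
        if required:
            return False, None  # required observable missing: drop the event
        return True, default  # optional observable missing: fill in the default


def process_events(events, observables):
    """observables: list of (name, func, required, default); returns values of kept events."""
    kept = []
    for event in events:
        values = {}
        for name, func, required, default in observables:
            keep, value = evaluate_observable(func, event, required, default)
            if not keep:
                break
            values[name] = value
        else:  # no break: every required observable was available
            kept.append(values)
    return kept


# Toy events described only by their pT-ordered jet momenta (GeV)
events = [{"jet_pts": [45.0, 30.0]}, {"jet_pts": [25.0]}, {"jet_pts": []}]
observables = [
    ("pt_j1", lambda ev: ev["jet_pts"][0], False, 0.0),  # like required=False, default=0.
    ("pt_j2", lambda ev: ev["jet_pts"][1], True, None),  # like required=True
]
print(process_events(events, observables))  # [{'pt_j1': 45.0, 'pt_j2': 30.0}]
```

Only the first event survives because pt_j2 is required; dropping pt_j2 from the list would keep all three events, with pt_j1 = 0.0 for the one without jets.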

In a realistic project, you would want to add a large number of observables that capture all information in your events. Here we will just define three observables: the transverse momentum of the leading (highest-pT) jet, the signed azimuthal angle between the two leading jets, and the missing transverse energy.

lhe.add_observable(
    'pt_j1',
    'j[0].pt',
    required=False,
    default=0.,
)
lhe.add_observable(
    'delta_phi_jj',
    'j[0].deltaphi(j[1]) * (-1. + 2.*float(j[0].eta > j[1].eta))',
    required=True,
)
lhe.add_observable(
    'met',
    'met.pt',
    required=True,
)
17:03 madminer.lhe         DEBUG   Adding optional observable pt_j1 = j[0].pt with default 0.0
17:03 madminer.lhe         DEBUG   Adding required observable delta_phi_jj = j[0].deltaphi(j[1]) * (-1. + 2.*float(j[0].eta > j[1].eta))
17:03 madminer.lhe         DEBUG   Adding required observable met = met.pt
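The delta_phi_jj expression multiplies the azimuthal difference by -1. + 2.*float(j[0].eta > j[1].eta), which is +1 if the leading jet is the more forward one and -1 otherwise. The plain-Python sketch below reproduces this, assuming deltaphi returns \(\phi_1 - \phi_2\) wrapped into \([-\pi, \pi)\) (check the scikit-hep documentation for its exact sign and range convention); signed_delta_phi is a hypothetical helper.

```python
import math


def signed_delta_phi(phi1, eta1, phi2, eta2):
    """Signed azimuthal angle between two jets, mirroring the delta_phi_jj expression.

    Assumes deltaphi(j1, j2) = phi1 - phi2 wrapped into [-pi, pi); the sign is kept if
    jet 1 is more forward (eta1 > eta2) and flipped otherwise.
    """
    dphi = (phi1 - phi2 + math.pi) % (2.0 * math.pi) - math.pi
    sign = -1.0 + 2.0 * float(eta1 > eta2)  # +1 or -1, as in the expression string
    return dphi * sign


a = signed_delta_phi(0.3, 1.2, 2.0, -0.4)
b = signed_delta_phi(2.0, -0.4, 0.3, 1.2)  # same two jets, labels swapped
print(a, b)
```

Because the sign factor follows the \(\eta\) ordering, swapping the two jets flips both the azimuthal difference and the sign, so the result does not depend on which jet carries the label j[0] (barring exact ties in \(\eta\)).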

We can also add cuts, again as parseable strings. In addition to the objects discussed above, they can refer to the observables defined in the previous step (here pt_j1):

lhe.add_cut('(a[0] + a[1]).m > 122.')
lhe.add_cut('(a[0] + a[1]).m < 128.')
lhe.add_cut('pt_j1 > 20.')
17:03 madminer.lhe         DEBUG   Adding cut (a[0] + a[1]).m > 122.
17:03 madminer.lhe         DEBUG   Adding cut (a[0] + a[1]).m < 128.
17:03 madminer.lhe         DEBUG   Adding cut pt_j1 > 20.
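Conceptually, each cut string becomes a boolean mask over the events, and the per-cut counts that analyse_samples() logs form a simple cut flow. The toy NumPy version below uses three hand-made photon pairs; invariant_mass, cut_flow, and the (E, px, py, pz) column layout are assumptions for this illustration, not MadMiner internals.

```python
import numpy as np


def invariant_mass(p1, p2):
    """Invariant mass of the sum of two arrays of four-momenta with columns (E, px, py, pz)."""
    p = p1 + p2
    m2 = p[:, 0] ** 2 - np.sum(p[:, 1:] ** 2, axis=1)
    return np.sqrt(np.clip(m2, 0.0, None))


def cut_flow(cuts):
    """Print how many events pass each cut and all cuts together; return the combined mask."""
    n = len(next(iter(cuts.values())))
    passed_all = np.ones(n, dtype=bool)
    for name, mask in cuts.items():
        print("{} / {} events pass cut {}".format(int(mask.sum()), n, name))
        passed_all &= mask
    print("{} events pass all cuts".format(int(passed_all.sum())))
    return passed_all


# Toy photon pairs (E, px, py, pz): two back-to-back pairs with m_aa = 125 GeV
# and one pair with m_aa = 80 GeV, outside the mass window
a0 = np.array([[62.5, 62.5, 0.0, 0.0], [62.5, 0.0, 62.5, 0.0], [40.0, 40.0, 0.0, 0.0]])
a1 = np.array([[62.5, -62.5, 0.0, 0.0], [62.5, 0.0, -62.5, 0.0], [40.0, -40.0, 0.0, 0.0]])
pt_j1 = np.array([45.0, 15.0, 60.0])

m_aa = invariant_mass(a0, a1)
passed = cut_flow({
    "(a[0] + a[1]).m > 122.": m_aa > 122.0,
    "(a[0] + a[1]).m < 128.": m_aa < 128.0,
    "pt_j1 > 20.": pt_j1 > 20.0,
})
```

Only the first toy event passes: the second fails the jet \(p_T\) cut and the third falls outside the diphoton mass window.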

5. Run analysis and store processed events

The function analyse_samples() then applies the smearing, calculates all observables from the LHE file(s) generated before, and checks which events pass the cuts:

lhe.analyse_samples()
17:03 madminer.lhe         INFO    Analysing LHE sample mg_processes/signal1/Events/run_01/unweighted_events.lhe.gz: Calculating 3 observables, requiring 3 selection cuts, using 0 efficiency factors, associated with no systematics
17:03 madminer.lhe         DEBUG   Extracting nuisance parameter definitions from LHE file
17:03 madminer.utils.inter DEBUG   Parsing nuisance parameter setup from LHE file at mg_processes/signal1/Events/run_01/unweighted_events.lhe.gz
17:03 madminer.utils.inter DEBUG   Systematics setup: OrderedDict()
17:03 madminer.utils.inter DEBUG   1 weight groups
17:03 madminer.lhe         DEBUG   systematics_dict: OrderedDict()
17:03 madminer.utils.inter DEBUG   Parsing LHE file mg_processes/signal1/Events/run_01/unweighted_events.lhe.gz
17:03 madminer.utils.inter DEBUG   Parsing header and events as XML with cElementTree
17:03 madminer.utils.inter DEBUG   Found entry event_norm = sum in LHE header. Interpreting this as weight_norm_is_average = False.
17:03 madminer.utils.inter DEBUG   Event 1 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 2 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 3 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 4 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 5 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 6 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 7 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 8 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 9 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 10 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 11 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 12 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 13 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 14 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 15 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 16 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 17 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 18 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 19 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 20 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter INFO      50000 / 50000 events pass cut (a[0] + a[1]).m > 122.
17:03 madminer.utils.inter INFO      50000 / 50000 events pass cut (a[0] + a[1]).m < 128.
17:03 madminer.utils.inter INFO      49991 / 50000 events pass cut pt_j1 > 20.
17:03 madminer.utils.inter INFO      49991 events pass all cuts/efficiencies
17:03 madminer.lhe         DEBUG   Found weights [u'sm', u'w', u'neg_w', u'ww', u'neg_ww', u'morphing_basis_vector_5'] in LHE file
17:03 madminer.lhe         DEBUG   Found 49991 events
17:03 madminer.lhe         INFO    Analysing LHE sample mg_processes/signal2/Events/run_01/unweighted_events.lhe.gz: Calculating 3 observables, requiring 3 selection cuts, using 0 efficiency factors, associated with no systematics
17:03 madminer.lhe         DEBUG   Extracting nuisance parameter definitions from LHE file
17:03 madminer.utils.inter DEBUG   Parsing nuisance parameter setup from LHE file at mg_processes/signal2/Events/run_01/unweighted_events.lhe.gz
17:03 madminer.utils.inter DEBUG   Systematics setup: OrderedDict()
17:03 madminer.utils.inter DEBUG   1 weight groups
17:03 madminer.lhe         DEBUG   systematics_dict: OrderedDict()
17:03 madminer.utils.inter DEBUG   Parsing LHE file mg_processes/signal2/Events/run_01/unweighted_events.lhe.gz
17:03 madminer.utils.inter DEBUG   Parsing header and events as XML with cElementTree
17:03 madminer.utils.inter DEBUG   Found entry event_norm = sum in LHE header. Interpreting this as weight_norm_is_average = False.
17:03 madminer.utils.inter DEBUG   Event 1 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 2 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 3 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 4 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 5 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 6 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 7 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 8 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 9 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 10 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 11 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 12 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 13 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 14 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 15 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 16 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 17 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 18 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 19 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 20 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter INFO      10000 / 10000 events pass cut (a[0] + a[1]).m > 122.
17:03 madminer.utils.inter INFO      10000 / 10000 events pass cut (a[0] + a[1]).m < 128.
17:03 madminer.utils.inter INFO      10000 / 10000 events pass cut pt_j1 > 20.
17:03 madminer.utils.inter INFO      10000 events pass all cuts/efficiencies
17:03 madminer.lhe         DEBUG   Found weights [u'sm', u'w', u'neg_w', u'ww', u'neg_ww', u'morphing_basis_vector_5'] in LHE file
17:03 madminer.lhe         DEBUG   Found 10000 events
17:03 root                 DEBUG   Merging data extracted from this file with data from previous files
17:03 root                 DEBUG     Weights for benchmark sm exist in both
17:03 root                 DEBUG     Weights for benchmark w exist in both
17:03 root                 DEBUG     Weights for benchmark neg_w exist in both
17:03 root                 DEBUG     Weights for benchmark ww exist in both
17:03 root                 DEBUG     Weights for benchmark neg_ww exist in both
17:03 root                 DEBUG     Weights for benchmark morphing_basis_vector_5 exist in both
17:03 madminer.lhe         INFO    Analysing LHE sample mg_processes/signal2/Events/run_02/unweighted_events.lhe.gz: Calculating 3 observables, requiring 3 selection cuts, using 0 efficiency factors, associated with no systematics
17:03 madminer.lhe         DEBUG   Extracting nuisance parameter definitions from LHE file
17:03 madminer.utils.inter DEBUG   Parsing nuisance parameter setup from LHE file at mg_processes/signal2/Events/run_02/unweighted_events.lhe.gz
17:03 madminer.utils.inter DEBUG   Systematics setup: OrderedDict()
17:03 madminer.utils.inter DEBUG   1 weight groups
17:03 madminer.lhe         DEBUG   systematics_dict: OrderedDict()
17:03 madminer.utils.inter DEBUG   Parsing LHE file mg_processes/signal2/Events/run_02/unweighted_events.lhe.gz
17:03 madminer.utils.inter DEBUG   Parsing header and events as XML with cElementTree
17:03 madminer.utils.inter DEBUG   Found entry event_norm = sum in LHE header. Interpreting this as weight_norm_is_average = False.
17:03 madminer.utils.inter DEBUG   Event 1 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 2 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 3 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 4 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 5 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 6 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 7 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 8 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 9 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 10 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 11 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 12 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 13 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 14 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 15 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 16 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 17 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 18 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 19 passes observations, passes cuts, passes efficiencies -> passes
17:03 madminer.utils.inter DEBUG   Event 20 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter INFO      10000 / 10000 events pass cut (a[0] + a[1]).m > 122.
17:04 madminer.utils.inter INFO      10000 / 10000 events pass cut (a[0] + a[1]).m < 128.
17:04 madminer.utils.inter INFO      10000 / 10000 events pass cut pt_j1 > 20.
17:04 madminer.utils.inter INFO      10000 events pass all cuts/efficiencies
17:04 madminer.lhe         DEBUG   Found weights [u'sm', u'w', u'neg_w', u'ww', u'neg_ww', u'morphing_basis_vector_5'] in LHE file
17:04 madminer.lhe         DEBUG   Found 10000 events
17:04 root                 DEBUG   Merging data extracted from this file with data from previous files
17:04 root                 DEBUG     Weights for benchmark sm exist in both
17:04 root                 DEBUG     Weights for benchmark w exist in both
17:04 root                 DEBUG     Weights for benchmark neg_w exist in both
17:04 root                 DEBUG     Weights for benchmark ww exist in both
17:04 root                 DEBUG     Weights for benchmark neg_ww exist in both
17:04 root                 DEBUG     Weights for benchmark morphing_basis_vector_5 exist in both
17:04 madminer.lhe         INFO    Analysing LHE sample mg_processes/signal2/Events/run_03/unweighted_events.lhe.gz: Calculating 3 observables, requiring 3 selection cuts, using 0 efficiency factors, associated with no systematics
17:04 madminer.lhe         DEBUG   Extracting nuisance parameter definitions from LHE file
17:04 madminer.utils.inter DEBUG   Parsing nuisance parameter setup from LHE file at mg_processes/signal2/Events/run_03/unweighted_events.lhe.gz
17:04 madminer.utils.inter DEBUG   Systematics setup: OrderedDict()
17:04 madminer.utils.inter DEBUG   1 weight groups
17:04 madminer.lhe         DEBUG   systematics_dict: OrderedDict()
17:04 madminer.utils.inter DEBUG   Parsing LHE file mg_processes/signal2/Events/run_03/unweighted_events.lhe.gz
17:04 madminer.utils.inter DEBUG   Parsing header and events as XML with cElementTree
17:04 madminer.utils.inter DEBUG   Found entry event_norm = sum in LHE header. Interpreting this as weight_norm_is_average = False.
17:04 madminer.utils.inter DEBUG   Event 1 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 2 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 3 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 4 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 5 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 6 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 7 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 8 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 9 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 10 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 11 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 12 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 13 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 14 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 15 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 16 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 17 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 18 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 19 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 20 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter INFO      10000 / 10000 events pass cut (a[0] + a[1]).m > 122.
17:04 madminer.utils.inter INFO      10000 / 10000 events pass cut (a[0] + a[1]).m < 128.
17:04 madminer.utils.inter INFO      10000 / 10000 events pass cut pt_j1 > 20.
17:04 madminer.utils.inter INFO      10000 events pass all cuts/efficiencies
17:04 madminer.lhe         DEBUG   Found weights [u'sm', u'w', u'neg_w', u'ww', u'neg_ww', u'morphing_basis_vector_5'] in LHE file
17:04 madminer.lhe         DEBUG   Found 10000 events
17:04 root                 DEBUG   Merging data extracted from this file with data from previous files
17:04 root                 DEBUG     Weights for benchmark sm exist in both
17:04 root                 DEBUG     Weights for benchmark w exist in both
17:04 root                 DEBUG     Weights for benchmark neg_w exist in both
17:04 root                 DEBUG     Weights for benchmark ww exist in both
17:04 root                 DEBUG     Weights for benchmark neg_ww exist in both
17:04 root                 DEBUG     Weights for benchmark morphing_basis_vector_5 exist in both
17:04 madminer.lhe         INFO    Analysing LHE sample mg_processes/signal2/Events/run_04/unweighted_events.lhe.gz: Calculating 3 observables, requiring 3 selection cuts, using 0 efficiency factors, associated with no systematics
17:04 madminer.lhe         DEBUG   Extracting nuisance parameter definitions from LHE file
17:04 madminer.utils.inter DEBUG   Parsing nuisance parameter setup from LHE file at mg_processes/signal2/Events/run_04/unweighted_events.lhe.gz
17:04 madminer.utils.inter DEBUG   Systematics setup: OrderedDict()
17:04 madminer.utils.inter DEBUG   1 weight groups
17:04 madminer.lhe         DEBUG   systematics_dict: OrderedDict()
17:04 madminer.utils.inter DEBUG   Parsing LHE file mg_processes/signal2/Events/run_04/unweighted_events.lhe.gz
17:04 madminer.utils.inter DEBUG   Parsing header and events as XML with cElementTree
17:04 madminer.utils.inter DEBUG   Found entry event_norm = sum in LHE header. Interpreting this as weight_norm_is_average = False.
17:04 madminer.utils.inter DEBUG   Event 1 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 2 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 3 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 4 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 5 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 6 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 7 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 8 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 9 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 10 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 11 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 12 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 13 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 14 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 15 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 16 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 17 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 18 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 19 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter DEBUG   Event 20 passes observations, passes cuts, passes efficiencies -> passes
17:04 madminer.utils.inter INFO      10000 / 10000 events pass cut (a[0] + a[1]).m > 122.
17:04 madminer.utils.inter INFO      10000 / 10000 events pass cut (a[0] + a[1]).m < 128.
17:04 madminer.utils.inter INFO      10000 / 10000 events pass cut pt_j1 > 20.
17:04 madminer.utils.inter INFO      10000 events pass all cuts/efficiencies
17:04 madminer.lhe         DEBUG   Found weights [u'sm', u'w', u'neg_w', u'ww', u'neg_ww', u'morphing_basis_vector_5'] in LHE file
17:04 madminer.lhe         DEBUG   Found 10000 events
17:04 root                 DEBUG   Merging data extracted from this file with data from previous files
17:04 root                 DEBUG     Weights for benchmark sm exist in both
17:04 root                 DEBUG     Weights for benchmark w exist in both
17:04 root                 DEBUG     Weights for benchmark neg_w exist in both
17:04 root                 DEBUG     Weights for benchmark ww exist in both
17:04 root                 DEBUG     Weights for benchmark neg_ww exist in both
17:04 root                 DEBUG     Weights for benchmark morphing_basis_vector_5 exist in both
17:04 madminer.lhe         INFO    Analysed number of events per sampling benchmark:
17:04 madminer.lhe         INFO      49991 from sm
17:04 madminer.lhe         INFO      10000 from w
17:04 madminer.lhe         INFO      10000 from neg_w
17:04 madminer.lhe         INFO      10000 from ww
17:04 madminer.lhe         INFO      10000 from neg_ww
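The log above lists three selection cuts: a diphoton invariant-mass window on `(a[0] + a[1]).m` between 122 and 128, and `pt_j1 > 20`. MadMiner evaluates these cut expressions itself. The plain-Python sketch below only illustrates the kinematics behind such cuts. The four-momentum layout `(E, px, py, pz)` in GeV, the helper names, and the assumption that jets are sorted by decreasing pT are all hypothetical and are not MadMiner's API.

```python
import math

# Sketch only: hypothetical helpers, not MadMiner's implementation.


def invariant_mass(p1, p2):
    """Invariant mass of p1 + p2, with four-momenta given as (E, px, py, pz) in GeV."""
    e, px, py, pz = (a + b for a, b in zip(p1, p2))
    m2 = e * e - px * px - py * py - pz * pz
    # Clip tiny negative values caused by floating-point rounding
    return math.sqrt(max(m2, 0.0))


def transverse_momentum(p):
    """Transverse momentum of a four-momentum (E, px, py, pz)."""
    return math.hypot(p[1], p[2])


def passes_cuts(photons, jets):
    """Apply 122 < m(a[0] + a[1]) < 128 and pt_j1 > 20 (GeV).

    Assumption: jets are sorted by decreasing pT, so jets[0] is the leading jet.
    """
    m_aa = invariant_mass(photons[0], photons[1])
    return 122.0 < m_aa < 128.0 and transverse_momentum(jets[0]) > 20.0


# Toy event: two back-to-back 62.5 GeV photons (m = 125 GeV) and a 50 GeV jet
photons = [(62.5, 62.5, 0.0, 0.0), (62.5, -62.5, 0.0, 0.0)]
jets = [(50.0, 30.0, 40.0, 0.0)]
print(invariant_mass(*photons), passes_cuts(photons, jets))  # 125.0 True
```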

The values of the observables and the weights are then saved to an HDF5 file. You can either overwrite the setup file or, as done here, leave it intact and save all the data to a new file:

lhe.save('data/lhe_data.h5')
17:04 madminer.lhe         DEBUG   Loading HDF5 data from data/setup.h5 and saving file to data/lhe_data.h5
17:04 madminer.lhe         DEBUG   Weight names: [u'sm', u'w', u'neg_w', u'ww', u'neg_ww', u'morphing_basis_vector_5']
17:04 madminer.utils.inter DEBUG   HDF5 file does not contain is_reference field.
17:04 madminer.utils.inter DEBUG   Benchmark morphing_basis_vector_5 already in benchmark_names_phys
17:04 madminer.utils.inter DEBUG   Benchmark neg_w already in benchmark_names_phys
17:04 madminer.utils.inter DEBUG   Benchmark neg_ww already in benchmark_names_phys
17:04 madminer.utils.inter DEBUG   Benchmark sm already in benchmark_names_phys
17:04 madminer.utils.inter DEBUG   Benchmark w already in benchmark_names_phys
17:04 madminer.utils.inter DEBUG   Benchmark ww already in benchmark_names_phys
17:04 madminer.utils.inter DEBUG   Combined benchmark names: [u'sm', u'w', u'neg_w', u'ww', u'neg_ww', u'morphing_basis_vector_5']
17:04 madminer.utils.inter DEBUG   Combined is_nuisance: [0 0 0 0 0 0]
17:04 madminer.utils.inter DEBUG   Combined is_reference: [1 0 0 0 0 0]
17:04 madminer.utils.inter DEBUG   Weight names found in event file: [u'sm', u'w', u'neg_w', u'ww', u'neg_ww', u'morphing_basis_vector_5']
17:04 madminer.utils.inter DEBUG   Benchmarks found in MadMiner file: [u'sm', u'w', u'neg_w', u'ww', u'neg_ww', u'morphing_basis_vector_5']
17:04 madminer.sampling    DEBUG   Combining and shuffling samples
17:04 madminer.sampling    DEBUG   Copying setup from data/lhe_data.h5 to data/lhe_data.h5
17:04 madminer.sampling    DEBUG   Loading samples from file 1 / 1 at data/lhe_data.h5, multiplying weights with k factor 1.0
17:04 madminer.sampling    DEBUG   Sampling benchmarks: [0 0 0 ... 4 4 4]
17:04 madminer.sampling    DEBUG   Combined sampling benchmarks: [0 0 0 ... 4 4 4]
17:04 madminer.sampling    DEBUG   Recalculated event numbers per benchmark: [49991 10000 10000 10000 10000     0], background: 0
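Each event is saved with one weight per benchmark. Here there are six, including `morphing_basis_vector_5`. With two parameters whose maximal power in the squared matrix element is 2, the squared matrix element is a quadratic polynomial in (CWL2, CPWL2). Assuming its components are the six monomials 1, c1, c2, c1², c1·c2, c2², this matches the "morphing setup with 6 components" that the analysis log further down reports. Morphing then writes the weight at any parameter point θ as a linear combination Σᵢ aᵢ(θ) wᵢ of the benchmark weights.

The plain-Python sketch below illustrates that linear algebra. It is not MadMiner's implementation. The benchmark coordinates are the rounded values printed in the log, and the function names are hypothetical.

```python
# Sketch only: morphing coefficients for two parameters, assuming the six
# components are the monomials up to total degree 2.


def components(theta):
    """Component values 1, c1, c2, c1^2, c1*c2, c2^2 at theta = (c1, c2)."""
    c1, c2 = theta
    return [1.0, c1, c2, c1 * c1, c1 * c2, c2 * c2]


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]  # augmented copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            raise ValueError("morphing matrix is singular for these benchmarks")
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(a[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (a[r][n] - s) / a[r][r]
    return x


def morphing_weights(theta, benchmarks):
    """Coefficients a_i(theta) such that w(theta) = sum_i a_i(theta) * w(theta_i)."""
    if len(benchmarks) != 6:
        raise ValueError("need exactly one benchmark per component")
    m = [components(b) for b in benchmarks]  # row i: components at benchmark i
    # a^T M = f(theta)^T is equivalent to M^T a = f(theta)
    m_t = [[m[i][c] for i in range(6)] for c in range(6)]
    return solve(m_t, components(theta))


# (CWL2, CPWL2) of the benchmarks, as printed (rounded) in the log
benchmarks = {
    "sm": (0.0, 0.0),
    "w": (15.2, 0.1),
    "neg_w": (-15.4, 0.2),
    "ww": (0.3, 15.1),
    "neg_ww": (0.4, -15.3),
    "morphing_basis_vector_5": (16.88, 14.95),
}
coefficients = morphing_weights((10.0, 0.0), list(benchmarks.values()))
for name, coefficient in zip(benchmarks, coefficients):
    print(f"{name:>25s}: {coefficient:+.4f}")
```

The coefficients sum to one, because the constant component is always 1, and at a benchmark they reduce to a unit vector.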

6. Plot distributions

Let’s see what our MC run produced:

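_ = plot_distributions(
    filename='data/lhe_data.h5',
    parameter_points=['sm', np.array([10., 0.])],  # SM benchmark and (CWL2, CPWL2) = (10, 0)
    line_labels=['SM', 'BSM'],
    uncertainties='none',
    n_bins=20,
    n_cols=3,
    normalize=True,  # compare shapes rather than overall rates
    sample_only_from_closest_benchmark=True  # use only events sampled at the benchmark closest to each point
)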
17:04 madminer.analysis    INFO    Loading data from data/lhe_data.h5
17:04 madminer.analysis    INFO    Found 2 parameters
17:04 madminer.analysis    DEBUG      CWL2 (LHA: dim6 2, maximal power in squared ME: (2,), range: (-20.0, 20.0))
17:04 madminer.analysis    DEBUG      CPWL2 (LHA: dim6 5, maximal power in squared ME: (2,), range: (-20.0, 20.0))
17:04 madminer.analysis    INFO    Did not find nuisance parameters
17:04 madminer.analysis    INFO    Found 6 benchmarks, of which 6 physical
17:04 madminer.analysis    DEBUG      sm: CWL2 = 0.00e+00, CPWL2 = 0.00e+00
17:04 madminer.analysis    DEBUG      w: CWL2 = 15.20, CPWL2 = 0.10
17:04 madminer.analysis    DEBUG      neg_w: CWL2 = -1.54e+01, CPWL2 = 0.20
17:04 madminer.analysis    DEBUG      ww: CWL2 = 0.30, CPWL2 = 15.10
17:04 madminer.analysis    DEBUG      neg_ww: CWL2 = 0.40, CPWL2 = -1.53e+01
17:04 madminer.analysis    DEBUG      morphing_basis_vector_5: CWL2 = 16.88, CPWL2 = 14.95
17:04 madminer.analysis    INFO    Found 3 observables
17:04 madminer.analysis    DEBUG      0 pt_j1
17:04 madminer.analysis    DEBUG      1 delta_phi_jj
17:04 madminer.analysis    DEBUG      2 met
17:04 madminer.analysis    INFO    Found 89991 events
17:04 madminer.analysis    INFO      49991 signal events sampled from benchmark sm
17:04 madminer.analysis    INFO      10000 signal events sampled from benchmark w
17:04 madminer.analysis    INFO      10000 signal events sampled from benchmark neg_w
17:04 madminer.analysis    INFO      10000 signal events sampled from benchmark ww
17:04 madminer.analysis    INFO      10000 signal events sampled from benchmark neg_ww
17:04 madminer.analysis    INFO    Found morphing setup with 6 components
17:04 madminer.analysis    INFO    Did not find nuisance morphing setup
17:04 madminer.plotting    DEBUG   Observable indices: [0, 1, 2]
17:04 madminer.plotting    DEBUG   Calculated 2 theta matrices
17:04 madminer.analysis    DEBUG   Sampling benchmark closest to None: None
17:04 madminer.analysis    DEBUG   Events per benchmark: [49991. 10000. 10000. 10000. 10000.     0.]
17:04 madminer.plotting    DEBUG   Loaded raw data with shapes (89991, 3), (89991, 6)
17:04 madminer.analysis    DEBUG   Sampling benchmark closest to [0. 0.]: 0
17:04 madminer.analysis    DEBUG   Sampling benchmark closest to [10.  0.]: 1
17:04 madminer.plotting    DEBUG   Plotting panel 0: observable 0, label pt_j1
17:04 madminer.plotting    DEBUG   Ranges for observable pt_j1: min = [20.172650642695523, 20.172650642695523], max = [308.68458504027717, 1578.1646035292065]
17:04 madminer.plotting    DEBUG   Plotting panel 1: observable 1, label delta_phi_jj
17:04 madminer.plotting    DEBUG   Ranges for observable delta_phi_jj: min = [-3.141586511001732, -3.141586511001732], max = [3.141542330470944, 3.141542330470944]
17:04 madminer.plotting    DEBUG   Plotting panel 2: observable 2, label met
17:04 madminer.plotting    DEBUG   Ranges for observable met: min = [0.1060288929419313, 0.1060288929419313], max = [105.62721157551425, 351.0223478095247]
[Figure: normalized distributions of pt_j1, delta_phi_jj and met for the SM and BSM parameter points]
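With `sample_only_from_closest_benchmark=True`, each histogram is filled only with events that were originally sampled at the benchmark nearest to the requested parameter point. The log shows that (0, 0) maps to benchmark 0 (`sm`) and (10, 0) maps to benchmark 1 (`w`).

The sketch below reproduces that choice. It assumes a Euclidean distance in (CWL2, CPWL2) and skips benchmarks without events, such as `morphing_basis_vector_5`. It is an illustration, not MadMiner's code, and `closest_benchmark` is a hypothetical name.

```python
import math

# Physical benchmarks as printed (rounded) in the log: name, (CWL2, CPWL2)
benchmarks = [
    ("sm", (0.0, 0.0)),
    ("w", (15.2, 0.1)),
    ("neg_w", (-15.4, 0.2)),
    ("ww", (0.3, 15.1)),
    ("neg_ww", (0.4, -15.3)),
    ("morphing_basis_vector_5", (16.88, 14.95)),
]
# Events sampled per benchmark, from "Events per benchmark" in the log
events_per_benchmark = [49991, 10000, 10000, 10000, 10000, 0]


def closest_benchmark(theta, benchmarks, events_per_benchmark):
    """Index of the nearest benchmark (assumed Euclidean distance) that has events."""
    best_index, best_distance = None, math.inf
    for index, ((_, point), n_events) in enumerate(zip(benchmarks, events_per_benchmark)):
        if n_events == 0:
            continue  # nothing to sample from at this benchmark
        distance = math.dist(theta, point)
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


for theta in [(0.0, 0.0), (10.0, 0.0)]:
    i = closest_benchmark(theta, benchmarks, events_per_benchmark)
    print(theta, "->", i, benchmarks[i][0])
# (0.0, 0.0) -> 0 sm
# (10.0, 0.0) -> 1 w
```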

7. Combine and shuffle different samples

To reduce disk usage, you can generate several small event samples with the steps above and combine them here. Note that, for now, all of them must be generated with the same setup, including the same benchmark points and morphing basis!

Shuffling is good practice even if you use just one sample, since the events may have an inherent ordering. For example, they may be grouped by the benchmark they were sampled from, as in the `lhe.save` log above (`Sampling benchmarks: [0 0 0 ... 4 4 4]`). When we later split the events into training and test sets, such an ordering could bias the split.

combine_and_shuffle(
    ['data/lhe_data.h5'],
    'data/lhe_data_shuffled.h5'
)
17:04 madminer.sampling    DEBUG   Combining and shuffling samples
17:04 madminer.sampling    DEBUG   Copying setup from data/lhe_data.h5 to data/lhe_data_shuffled.h5
17:04 madminer.sampling    DEBUG   Loading samples from file 1 / 1 at data/lhe_data.h5, multiplying weights with k factor 1.0
17:04 madminer.sampling    DEBUG   Sampling benchmarks: [0 0 1 ... 0 2 1]
17:04 madminer.sampling    DEBUG   Combined sampling benchmarks: [0 0 1 ... 0 2 1]
17:04 madminer.sampling    DEBUG   Recalculated event numbers per benchmark: [49991 10000 10000 10000 10000     0], background: 0
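The log shows that `combine_and_shuffle` concatenates the events of the input files and shuffles them: the sampling benchmarks now appear in random order. The plain-Python sketch below illustrates that idea under several assumptions:

- Each sample is a hypothetical dict of per-event lists (`sampled_from`, `observations`, `weights`) plus its benchmark names.
- The sketch refuses to combine samples whose benchmarks differ. This explicit check is an assumption of the sketch, mirroring the requirement stated above.
- It applies one permutation to every per-event list, so each event's observables, weights and sampling benchmark stay together.

It is not MadMiner's implementation, which works on HDF5 files.

```python
import random

# Per-event lists that must be permuted together (hypothetical layout for this sketch)
EVENT_KEYS = ("sampled_from", "observations", "weights")


def combine_and_shuffle_sketch(samples, seed=None):
    """Concatenate samples that share one setup, then shuffle their events."""
    reference = samples[0]["benchmarks"]
    for sample in samples[1:]:
        if sample["benchmarks"] != reference:
            raise ValueError("all samples must use the same benchmarks / morphing basis")
    # Concatenate each per-event list across samples
    combined = {key: [row for s in samples for row in s[key]] for key in EVENT_KEYS}
    # One random permutation, applied to every list, keeps each event intact
    permutation = list(range(len(combined["weights"])))
    random.Random(seed).shuffle(permutation)
    shuffled = {key: [rows[i] for i in permutation] for key, rows in combined.items()}
    shuffled["benchmarks"] = list(reference)
    return shuffled


# Two toy samples (made-up numbers), each ordered by sampling benchmark
sample_1 = {
    "benchmarks": ["sm", "w"],
    "sampled_from": [0, 0, 0, 1, 1, 1],
    "observations": [[20.5], [31.0], [44.2], [95.3], [120.8], [210.4]],
    "weights": [[1.0, 0.9], [1.0, 1.1], [1.0, 1.2], [0.4, 1.0], [0.3, 1.0], [0.2, 1.0]],
}
sample_2 = {
    "benchmarks": ["sm", "w"],
    "sampled_from": [0, 0, 1, 1],
    "observations": [[25.1], [38.7], [150.2], [301.6]],
    "weights": [[1.0, 1.0], [1.0, 1.3], [0.5, 1.0], [0.1, 1.0]],
}

combined = combine_and_shuffle_sketch([sample_1, sample_2], seed=1)
n_train = int(0.8 * len(combined["weights"]))
# After shuffling, both splits typically contain events from several benchmarks
print("train:", combined["sampled_from"][:n_train])
print("test: ", combined["sampled_from"][n_train:])
```

Without the shuffle, the last 20% of these toy events (the test split) would consist only of events sampled from `w`.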