MadMiner physics tutorial (part 2B)#

Johann Brehmer, Felix Kling, Irina Espejo, and Kyle Cranmer 2018-2019

In this second part of the tutorial, we’ll generate events and extract the observables and weights from them. You have two options: In this notebook we’ll do this with Delphes, in the alternative part 2a we stick to parton level.

0. Preparations#

Before you execute this notebook, make sure you have working installations of MadGraph, Pythia, and Delphes.

import os
import logging
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
%matplotlib inline

from madminer.core import MadMiner
from madminer.delphes import DelphesReader
from madminer.sampling import combine_and_shuffle
from madminer.plotting import plot_distributions
# MadMiner output
logging.basicConfig(
    format='%(asctime)-5.5s %(name)-20.20s %(levelname)-7.7s %(message)s',
    datefmt='%H:%M',
    level=logging.DEBUG
)

# Output of all other modules (e.g. matplotlib)
for key in logging.Logger.manager.loggerDict:
    if "madminer" not in key:
        logging.getLogger(key).setLevel(logging.WARNING)

Please enter here the environment variable pointing to your MG5 installation folder.

mg_dir = os.getenv('MG_FOLDER_PATH')

1. Generate events#

Let’s load our setup:

miner = MadMiner()
miner.load("data/setup.h5")
16:30 madminer.utils.inter DEBUG   HDF5 file does not contain is_reference field.
16:30 madminer.core        INFO    Found 2 parameters:
16:30 madminer.core        INFO       CWL2 (LHA: dim6 2, maximal power in squared ME: (2,), range: (-20.0, 20.0))
16:30 madminer.core        INFO       CPWL2 (LHA: dim6 5, maximal power in squared ME: (2,), range: (-20.0, 20.0))
16:30 madminer.core        INFO    Found 6 benchmarks:
16:30 madminer.core        INFO       sm: CWL2 = 0.00e+00, CPWL2 = 0.00e+00
16:30 madminer.core        INFO       w: CWL2 = 15.20, CPWL2 = 0.10
16:30 madminer.core        INFO       neg_w: CWL2 = -1.54e+01, CPWL2 = 0.20
16:30 madminer.core        INFO       ww: CWL2 = 0.30, CPWL2 = 15.10
16:30 madminer.core        INFO       neg_ww: CWL2 = 0.40, CPWL2 = -1.53e+01
16:30 madminer.core        INFO       morphing_basis_vector_5: CWL2 = -1.17e+01, CPWL2 = -1.34e+01
16:30 madminer.core        INFO    Found morphing setup with 6 components
16:30 madminer.core        INFO    Did not find systematics setup.

In a next step, MadMiner starts MadGraph and Pythia to generate events and calculate the weights. You can use run() or run_multiple(); the latter allows to generate different runs with different run cards and optimizing the phase space for different benchmark points.

In either case, you have to provide paths to the process card, run card, param card (the entries corresponding to the parameters of interest will be automatically adapted), and an empty reweight card. Log files in the log_directory folder collect the MadGraph output and are important for debugging.

The sample_benchmark (or in the case of run_all, sample_benchmarks) option can be used to specify which benchmark should be used for sampling, i.e. for which benchmark point the phase space is optimized. If you just use one benchmark, reweighting to far-away points in parameter space can lead to large event weights and thus large statistical fluctuations. It is therefore often a good idea to combine at least a few different benchmarks for this option. Here we use the SM and the benchmark “w” that we defined during the setup step.

One slight annoyance is that MadGraph only supports Python 2. The run() and run_multiple() commands have a keyword initial_command that let you load a virtual environment in which python maps to Python 2 (which is what we do below). Alternatively / additionally you can set python2_override=True, which calls python2.7 instead of python to start MadGraph.

miner.run(
    sample_benchmark='sm',
    mg_directory=mg_dir,
    mg_process_directory='./mg_processes/signal_pythia',
    proc_card_file='cards/proc_card_signal.dat',
    param_card_template_file='cards/param_card_template.dat',
    pythia8_card_file='cards/pythia8_card.dat',
    run_card_file='cards/run_card_signal_small.dat',
    log_directory='logs/signal',
    initial_command="source activate python2",
)
16:30 madminer.utils.inter INFO    Generating MadGraph process folder from cards/proc_card_signal.dat at ./mg_processes/signal_pythia
16:30 madminer.core        INFO    Run 0
16:30 madminer.core        INFO      Sampling from benchmark: sm
16:30 madminer.core        INFO      Original run card:       cards/run_card_signal_small.dat
16:30 madminer.core        INFO      Original Pythia8 card:   cards/pythia8_card.dat
16:30 madminer.core        INFO      Copied run card:         /madminer/cards/run_card_0.dat
16:30 madminer.core        INFO      Copied Pythia8 card:     /madminer/cards/pythia8_card_0.dat
16:30 madminer.core        INFO      Param card:              /madminer/cards/param_card_0.dat
16:30 madminer.core        INFO      Reweight card:           /madminer/cards/reweight_card_0.dat
16:30 madminer.core        INFO      Log file:                run_0.log
16:30 madminer.core        INFO    Creating param and reweight cards in ./mg_processes/signal_pythia//madminer/cards/param_card_0.dat, ./mg_processes/signal_pythia//madminer/cards/reweight_card_0.dat
16:30 madminer.utils.inter INFO    Starting MadGraph and Pythia in ./mg_processes/signal_pythia
16:35 madminer.core        INFO    Finished running MadGraph! Please check that events were succesfully generated in the following folders:

./mg_processes/signal_pythia/Events/run_01

This will take a moment – time for a coffee break!

After running any event generation through MadMiner, you should check whether the run succeeded: are the usual output files there (LHE and HepMC), do the log files show any error messages? MadMiner does not (yet) perform any explicit checks, and if something went wrong in the event generation, it will only notice later when trying to load the event files.

Backgrounds#

We can also easily add other processes like backgrounds. An important option is the is_background keyword, which should be used for processes that do not depend on the parameters theta. is_background=True will disable the reweighting and re-use the same weights for all cross sections.

To reduce the runtime of the notebook, the background part is commented out here. Feel free to activate it and let it run during a lunch break.

"""
miner.run(
    is_background=True,
    sample_benchmark='sm',
    mg_directory=mg_dir,
    mg_process_directory='./mg_processes/background_pythia',
    proc_card_file='cards/proc_card_background.dat',
    pythia8_card_file='cards/pythia8_card.dat',
    param_card_template_file='cards/param_card_template.dat',
    run_card_file='cards/run_card_background.dat',
    log_directory='logs/background',
)
"""
u"\nminer.run(\n    is_background=True,\n    sample_benchmark='sm',\n    mg_directory=mg_dir,\n    mg_process_directory='./mg_processes/background_pythia',\n    proc_card_file='cards/proc_card_background.dat',\n    pythia8_card_file='cards/pythia8_card.dat',\n    param_card_template_file='cards/param_card_template.dat',\n    run_card_file='cards/run_card_background.dat',\n    log_directory='logs/background',\n)\n"

Finally, note that both MadMiner.run() and MadMiner.run_multiple() have a only_create_script keyword. If that is set to True, MadMiner will not start the event generation directly, but prepare folders with all the right settings and ready-to-run bash scripts. This might make it much easier to generate Events on a high-performance computing system.

2. Run Delphes#

The madminer.delphes.DelphesReader class wraps around Delphes, a popular fast detector simulation, to simulate the effects of the detector.

delphes = DelphesReader('data/setup.h5')
16:35 madminer.utils.inter DEBUG   HDF5 file does not contain is_reference field.

After creating the DelphesReader object, one can add a number of event samples (the output of running MadGraph and Pythia in step 1 above) with the add_sample() function.

In addition, you have to provide the information which sample was generated from which benchmark with the sampled_from_benchmark keyword, and set is_background=True for all background samples.

delphes.add_sample(
    lhe_filename='mg_processes/signal_pythia/Events/run_01/unweighted_events.lhe.gz',
    hepmc_filename='mg_processes/signal_pythia/Events/run_01/tag_1_pythia8_events.hepmc.gz',
    sampled_from_benchmark='sm',
    is_background=False,
    k_factor=1.1,
)

"""
delphes.add_sample(
    lhe_filename='mg_processes/background_pythia/Events/run_01/unweighted_events.lhe.gz',
    hepmc_filename='mg_processes/background_pythia/Events/run_01/tag_1_pythia8_events.hepmc.gz',
    sampled_from_benchmark='sm',
    is_background=True,
    k_factor=1.0,
"""
16:35 madminer.delphes     DEBUG   Adding event sample mg_processes/signal_pythia/Events/run_01/tag_1_pythia8_events.hepmc.gz
u"\ndelphes.add_sample(\n    lhe_filename='mg_processes/background_pythia/Events/run_01/unweighted_events.lhe.gz',\n    hepmc_filename='mg_processes/background_pythia/Events/run_01/tag_1_pythia8_events.hepmc.gz',\n    sampled_from_benchmark='sm',\n    is_background=True,\n    k_factor=1.0,\n"

Now we run Delphes on these samples (you can also do this externally and then add the keyword delphes_filename when calling DelphesReader.add_sample()):

delphes.run_delphes(
    delphes_directory=mg_dir + '/Delphes',
    delphes_card='cards/delphes_card.dat',
    log_file='logs/delphes.log',
)
16:35 madminer.delphes     INFO    Running Delphes on HepMC sample at mg_processes/signal_pythia/Events/run_01/tag_1_pythia8_events.hepmc.gz
16:35 madminer.utils.inter DEBUG   Unzipping mg_processes/signal_pythia/Events/run_01/tag_1_pythia8_events.hepmc.gz
16:36 madminer.utils.inter DEBUG   Deleting mg_processes/signal_pythia/Events/run_01/tag_1_pythia8_events.hepmc

3. Observables and cuts#

The next step is the definition of observables, either through a Python function or an expression that can be evaluated. Here we demonstrate the latter, which is implemented in add_observable(). In the expression string, you can use the terms j[i], e[i], mu[i], a[i], met, where the indices i refer to a ordering by the transverse momentum. In addition, you can use p[i], which denotes the i-th particle in the order given in the LHE sample (which is the order in which the final-state particles where defined in MadGraph).

All of these represent objects inheriting from scikit-hep LorentzVectors, see the link for a documentation of their properties. In addition, they have charge and pdg_id properties.

add_observable() has an optional keyword required. If required=True, we will only keep events where the observable can be parsed, i.e. all involved particles have been detected. If required=False, un-parseable observables will be filled with the value of another keyword default.

In a realistic project, you would want to add a large number of observables that capture all information in your events. Here we will just define two observables, the transverse momentum of the leading (= higher-pT) jet, and the azimuthal angle between the two leading jets.

delphes.add_observable(
    'pt_j1',
    'j[0].pt',
    required=False,
    default=0.,
)
delphes.add_observable(
    'delta_phi_jj',
    'j[0].deltaphi(j[1]) * (-1. + 2.*float(j[0].eta > j[1].eta))',
    required=True,
)
delphes.add_observable(
    'met',
    'met.pt',
    required=True,
)
16:36 madminer.delphes     DEBUG   Adding optional observable pt_j1 = j[0].pt with default 0.0
16:36 madminer.delphes     DEBUG   Adding required observable delta_phi_jj = j[0].deltaphi(j[1]) * (-1. + 2.*float(j[0].eta > j[1].eta))
16:36 madminer.delphes     DEBUG   Adding required observable met = met.pt

We can also add cuts, again in parse-able strings. In addition to the objects discussed above, they can contain the observables:

delphes.add_cut('(a[0] + a[1]).m > 122.')
delphes.add_cut('(a[0] + a[1]).m < 128.')
delphes.add_cut('pt_j1 > 20.')
16:36 madminer.delphes     DEBUG   Adding cut (a[0] + a[1]).m > 122.
16:36 madminer.delphes     DEBUG   Adding cut (a[0] + a[1]).m < 128.
16:36 madminer.delphes     DEBUG   Adding cut pt_j1 > 20.

4. Analyse events and store data#

The function analyse_samples then calculates all observables from the Delphes file(s) generated before and checks which events pass the cuts:

delphes.analyse_delphes_samples()
16:36 madminer.delphes     INFO    Analysing Delphes sample mg_processes/signal_pythia/Events/run_01/tag_1_pythia8_events_delphes.root
16:36 madminer.delphes     DEBUG   Extracting nuisance parameter definitions from LHE file
16:36 madminer.utils.inter DEBUG   Parsing nuisance parameter setup from LHE file at mg_processes/signal_pythia/Events/run_01/unweighted_events.lhe.gz
16:36 madminer.delphes     DEBUG   Found 0 nuisance parameters with matching benchmarks:
16:36 madminer.utils.inter DEBUG   Parsing Delphes file mg_processes/signal_pythia/Events/run_01/tag_1_pythia8_events_delphes.root
16:36 madminer.utils.inter DEBUG   Not extracting weights
16:36 madminer.utils.inter DEBUG   Found 10000 events
16:36 madminer.utils.inter DEBUG     First 10 values for observable pt_j1:
[ 84.62783051 181.8666687   97.3727417   99.76499939  92.42367554
  48.54855347  47.51708603 100.8453064  135.95497131 126.97653198]
16:36 madminer.utils.inter DEBUG     First 10 values for observable delta_phi_jj:
[-1.20398968 -2.7168507  -2.76430232 -2.48814756 -0.34777592 -0.31459451
 -3.09778506 -2.67325608  0.61564934  2.68325505]
16:36 madminer.utils.inter DEBUG     First 10 values for observable met:
[11.09266376  9.19657421 15.0783062  11.84642029 12.23457527  6.50921965
  9.26392937 10.58230686  6.73142433 29.73778152]
16:36 madminer.utils.inter DEBUG     9269 / 10000 events pass required observable delta_phi_jj
16:36 madminer.utils.inter DEBUG     10000 / 10000 events pass required observable met
16:36 madminer.utils.inter DEBUG     5354 / 10000 events pass cut (a[0] + a[1]).m > 122.
16:36 madminer.utils.inter DEBUG     5415 / 10000 events pass cut (a[0] + a[1]).m < 128.
16:36 madminer.utils.inter DEBUG     9967 / 10000 events pass cut pt_j1 > 20.
16:36 madminer.utils.inter INFO      4288 / 10000 events pass everything
16:36 madminer.delphes     DEBUG   Did not extract weights from Delphes file
16:36 madminer.delphes     DEBUG   Found 4288 events
16:36 madminer.delphes     DEBUG   Extracting weights from LHE file
16:36 madminer.utils.inter DEBUG   Parsing LHE file mg_processes/signal_pythia/Events/run_01/unweighted_events.lhe.gz
16:36 madminer.utils.inter DEBUG   Parsing header and events as XML with cElementTree
16:36 madminer.utils.inter DEBUG   Found entry event_norm = average in LHE header. Interpreting this as weight_norm_is_average = True.
16:36 madminer.delphes     DEBUG   Found weights [u'sm', 'w', 'neg_w', 'ww', 'neg_ww', 'morphing_basis_vector_5'] in LHE file
16:36 madminer.delphes     DEBUG   Applying Delphes-based cuts to LHE weights
16:36 madminer.delphes     INFO    Analysed number of events per sampling benchmark:
16:36 madminer.delphes     INFO      4288 from sm

The values of the observables and the weights are then saved in the HDF5 file. It is possible to overwrite the same file, or to leave the original file intact and save all the data into a new file as follows:

delphes.save('data/delphes_data.h5')
16:36 madminer.delphes     DEBUG   Loading HDF5 data from data/setup.h5 and saving file to data/delphes_data.h5
16:36 madminer.delphes     DEBUG   Weight names: [u'sm', 'w', 'neg_w', 'ww', 'neg_ww', 'morphing_basis_vector_5']
16:36 madminer.utils.inter DEBUG   HDF5 file does not contain is_reference field.
16:36 madminer.utils.inter DEBUG   Benchmark morphing_basis_vector_5 already in benchmark_names_phys
16:36 madminer.utils.inter DEBUG   Benchmark neg_w already in benchmark_names_phys
16:36 madminer.utils.inter DEBUG   Benchmark neg_ww already in benchmark_names_phys
16:36 madminer.utils.inter DEBUG   Benchmark sm already in benchmark_names_phys
16:36 madminer.utils.inter DEBUG   Benchmark w already in benchmark_names_phys
16:36 madminer.utils.inter DEBUG   Benchmark ww already in benchmark_names_phys
16:36 madminer.utils.inter DEBUG   Combined benchmark names: [u'sm', u'w', u'neg_w', u'ww', u'neg_ww', u'morphing_basis_vector_5']
16:36 madminer.utils.inter DEBUG   Combined is_nuisance: [0 0 0 0 0 0]
16:36 madminer.utils.inter DEBUG   Combined is_reference: [1 0 0 0 0 0]
16:36 madminer.utils.inter DEBUG   Weight names found in event file: [u'sm', 'w', 'neg_w', 'ww', 'neg_ww', 'morphing_basis_vector_5']
16:36 madminer.utils.inter DEBUG   Benchmarks found in MadMiner file: [u'sm', u'w', u'neg_w', u'ww', u'neg_ww', u'morphing_basis_vector_5']
16:36 madminer.utils.inter DEBUG   Sorted benchmarks: [u'sm', u'w', u'neg_w', u'ww', u'neg_ww', u'morphing_basis_vector_5']

5. Plot distributions#

Let’s see what our MC run produced:

_ = plot_distributions(
    filename='data/delphes_data.h5',
    parameter_points=['sm', np.array([10.,0.])],
    line_labels=['SM', 'BSM'],
    uncertainties='none',
    n_bins=20,
    n_cols=3,
    normalize=True,
)
16:36 madminer.analysis    INFO    Loading data from data/delphes_data.h5
16:36 madminer.analysis    INFO    Found 2 parameters
16:36 madminer.analysis    DEBUG      CWL2 (LHA: dim6 2, maximal power in squared ME: (2,), range: (-20.0, 20.0))
16:36 madminer.analysis    DEBUG      CPWL2 (LHA: dim6 5, maximal power in squared ME: (2,), range: (-20.0, 20.0))
16:36 madminer.analysis    INFO    Did not find nuisance parameters
16:36 madminer.analysis    INFO    Found 6 benchmarks, of which 6 physical
16:36 madminer.analysis    DEBUG      sm: CWL2 = 0.00e+00, CPWL2 = 0.00e+00
16:36 madminer.analysis    DEBUG      w: CWL2 = 15.20, CPWL2 = 0.10
16:36 madminer.analysis    DEBUG      neg_w: CWL2 = -1.54e+01, CPWL2 = 0.20
16:36 madminer.analysis    DEBUG      ww: CWL2 = 0.30, CPWL2 = 15.10
16:36 madminer.analysis    DEBUG      neg_ww: CWL2 = 0.40, CPWL2 = -1.53e+01
16:36 madminer.analysis    DEBUG      morphing_basis_vector_5: CWL2 = -1.17e+01, CPWL2 = -1.34e+01
16:36 madminer.analysis    INFO    Found 3 observables
16:36 madminer.analysis    DEBUG      0 pt_j1
16:36 madminer.analysis    DEBUG      1 delta_phi_jj
16:36 madminer.analysis    DEBUG      2 met
16:36 madminer.analysis    INFO    Found 4288 events
16:36 madminer.analysis    INFO      4288 signal events sampled from benchmark sm
16:36 madminer.analysis    INFO    Found morphing setup with 6 components
16:36 madminer.analysis    INFO    Did not find nuisance morphing setup
16:36 madminer.plotting    DEBUG   Observable indices: [0, 1, 2]
16:36 madminer.plotting    DEBUG   Calculated 2 theta matrices
16:36 madminer.analysis    DEBUG   Sampling benchmark closest to None: None
16:36 madminer.analysis    DEBUG   Events per benchmark: [4288.    0.    0.    0.    0.    0.]
16:36 madminer.analysis    DEBUG   Sampling factors: [1. 0. 0. 0. 0. 0. 1.]
16:36 madminer.utils.inter DEBUG   Sampling IDs: [0 0 0 ... 0 0 0]
16:36 madminer.utils.inter DEBUG   k-factors: [1. 1. 1. ... 1. 1. 1.]
16:36 madminer.plotting    DEBUG   Loaded raw data with shapes (4288, 3), (4288, 6)
16:36 madminer.analysis    DEBUG   Distances from [0. 0.]: [0.0, 15.200328943809078, 15.401298646542765, 15.102979838429237, 15.305227865013968, 17.77189833875769]
16:36 madminer.analysis    DEBUG   n_events_generated_per_benchmark: [4288    0    0    0    0    0]
16:36 madminer.analysis    DEBUG   Sampling benchmark closest to [0. 0.]: 0
16:36 madminer.analysis    DEBUG   Sampling factors: [1. 1. 1. 1. 1. 1. 1.]
16:36 madminer.analysis    DEBUG   Distances from [10.  0.]: [10.0, 5.200961449578337, 25.400787389370432, 17.947144619688114, 18.062391868188442, 25.487445178149255]
16:36 madminer.analysis    DEBUG   n_events_generated_per_benchmark: [4288    0    0    0    0    0]
16:36 madminer.analysis    DEBUG   Sampling benchmark closest to [10.  0.]: 0
16:36 madminer.analysis    DEBUG   Sampling factors: [1. 1. 1. 1. 1. 1. 1.]
16:36 madminer.plotting    DEBUG   Plotting panel 0: observable 0, label pt_j1
16:36 madminer.plotting    DEBUG   Ranges for observable pt_j1: min = [23.0212345123291, 23.0212345123291], max = [300.41729499795395, 714.2537841796875]
16:36 madminer.plotting    DEBUG   Plotting panel 1: observable 1, label delta_phi_jj
16:36 madminer.plotting    DEBUG   Ranges for observable delta_phi_jj: min = [-3.1396645307540894, -3.1396645307540894], max = [3.1406261588083666, 3.1406261588083666]
16:36 madminer.plotting    DEBUG   Plotting panel 2: observable 2, label met
16:36 madminer.plotting    DEBUG   Ranges for observable met: min = [0.24806453287601474, 0.24806453287601474], max = [58.92380250200134, 212.003926152471]
../../../_images/bd98479470ef62d2e85965125c21f921c87a8b794ecffb1a7b591ad9223c512f.png

6. Combine and shuffle different samples#

To reduce disk usage, you can generate several small event samples with the steps given above, and combine them now. Note that (for now) it is essential that all of them are generated with the same setup, including the same benchmark points / morphing basis!

This is generally good practice even if you use just one sample, since the events might have some inherent ordering (e.g. from sampling from different hypotheses). Later when we split the events into a training and test fraction, such an ordering could cause problems.

combine_and_shuffle(
    ['data/delphes_data.h5'],
    'data/delphes_data_shuffled.h5'
)
16:36 madminer.sampling    DEBUG   Combining and shuffling samples
16:36 madminer.sampling    INFO    Copying setup from data/delphes_data.h5 to data/delphes_data_shuffled.h5
16:36 madminer.sampling    INFO    Loading samples from file 1 / 1 at data/delphes_data.h5, multiplying weights with k factor 1.0
16:36 madminer.sampling    DEBUG   Sampling benchmarks: [0 0 0 ... 0 0 0]
16:36 madminer.sampling    DEBUG   Combined sampling benchmarks: [0 0 0 ... 0 0 0]
16:36 madminer.sampling    DEBUG   Recalculated event numbers per benchmark: [4288    0    0    0    0    0], background: 0