MadMiner physics tutorial (part 3C)

Johann Brehmer, Felix Kling, Irina Espejo, and Kyle Cranmer 2018-2019

In part 3C of this tutorial we will train a third neural estimator: this time of the likelihood function itself (rather than its ratio). We assume that you have run parts 1 and 2A of this tutorial. If you ran part 2B instead of 2A, you just have to load a different file name later.

Preparations

import logging
import numpy as np
import matplotlib
from matplotlib import pyplot as plt
%matplotlib inline

from madminer.sampling import SampleAugmenter
from madminer import sampling
from madminer.ml import LikelihoodEstimator
# MadMiner output
logging.basicConfig(
    format='%(asctime)-5.5s %(name)-20.20s %(levelname)-7.7s %(message)s',
    datefmt='%H:%M',
    level=logging.INFO
)

# Output of all other modules (e.g. matplotlib)
for key in logging.Logger.manager.loggerDict:
    if "madminer" not in key:
        logging.getLogger(key).setLevel(logging.WARNING)

1. Make (unweighted) training and test samples with augmented data

At this point, we have all the information we need from the simulations. But the data is not quite ready to be used for machine learning. The madminer.sampling class SampleAugmenter will take care of the remaining book-keeping steps before we can train our estimators:

First, it unweights the samples, i.e. for a given parameter vector theta (or a distribution p(theta)) it picks events x such that their distribution follows p(x|theta). The selected samples will all come from the event file we have so far, but their frequency is changed – some events will appear multiple times, some will disappear.

Second, SampleAugmenter calculates all the augmented data (“gold”) that is the key to our new inference methods. Depending on the specific technique, these are the joint likelihood ratio and / or the joint score. It saves all these pieces of information for the selected events in a set of numpy files that can easily be used in any machine learning framework.

sampler = SampleAugmenter('data/lhe_data_shuffled.h5')
# sampler = SampleAugmenter('data/delphes_data_shuffled.h5')
03:42 madminer.analysis    INFO    Loading data from data/lhe_data_shuffled.h5
03:42 madminer.analysis    INFO    Found 2 parameters
03:42 madminer.analysis    INFO    Did not find nuisance parameters
03:42 madminer.analysis    INFO    Found 6 benchmarks, of which 6 physical
03:42 madminer.analysis    INFO    Found 3 observables
03:42 madminer.analysis    INFO    Found 89991 events
03:42 madminer.analysis    INFO      49991 signal events sampled from benchmark sm
03:42 madminer.analysis    INFO      10000 signal events sampled from benchmark w
03:42 madminer.analysis    INFO      10000 signal events sampled from benchmark neg_w
03:42 madminer.analysis    INFO      10000 signal events sampled from benchmark ww
03:42 madminer.analysis    INFO      10000 signal events sampled from benchmark neg_ww
03:42 madminer.analysis    INFO    Found morphing setup with 6 components
03:42 madminer.analysis    INFO    Did not find nuisance morphing setup

The SampleAugmenter class defines six different high-level functions to generate training or test samples:

  • sample_train_plain(), which only saves observations x, for instance for histograms or ABC;

  • sample_train_local() for methods like SALLY and SALLINO, which will be demonstrated in the second part of the tutorial;

  • sample_train_density() for neural density estimation techniques like MAF or SCANDAL;

  • sample_train_ratio() for techniques like CARL, ROLR, CASCAL, and RASCAL, when only theta0 is parameterized;

  • sample_train_more_ratios() for the same techniques, but with both theta0 and theta1 parameterized;

  • sample_test() for the evaluation of any method.

For the arguments theta, theta0, or theta1, you can (and should!) use the helper functions benchmark(), benchmarks(), morphing_point(), morphing_points(), and random_morphing_points(), all defined in the madminer.sampling module.
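For example, the following minimal sketch shows how these helpers might be used (the specific parameter values are illustrative only):

# A benchmark defined in the MadMiner setup file:
theta_sm = sampling.benchmark('sm')

# A single morphing point at explicitly chosen parameter values (illustrative values):
theta_point = sampling.morphing_point([0.5, 0.5])

# Random morphing points: 100 draws with independent Gaussian priors
# (mean 0, standard deviation 0.5) on each of the two parameters:
theta_prior = sampling.random_morphing_points(100, [('gaussian', 0., 0.5), ('gaussian', 0., 0.5)])

Each helper simply returns a small specification tuple that the sample_* functions then interpret, as can be seen in the sampling log below.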

Here we’ll train a likelihood estimator with the SCANDAL method, so we focus on the sample_train_density() function. We’ll draw the parameter points theta from a Gaussian prior, using 10000 random morphing points (each parameter drawn with mean 0 and standard deviation 0.5).

Note the keyword sample_only_from_closest_benchmark=True, which makes sure that for each parameter point we only use the events that were originally (in MG) generated from the closest benchmark. This reduces the statistical fluctuations in the outcome quite a bit.

x, theta, t_xz, _ = sampler.sample_train_density(
    theta=sampling.random_morphing_points(10000, [('gaussian', 0., 0.5), ('gaussian', 0., 0.5)]),
    n_samples=500000,
    folder='./data/samples',
    filename='train_density',
    sample_only_from_closest_benchmark=True,
)
03:42 madminer.sampling    INFO    Extracting training sample for non-local score-based methods. Sampling and score evaluation according to (u'random_morphing_points', (10000, [(u'gaussian', 0.0, 0.5), (u'gaussian', 0.0, 0.5)]))
03:42 madminer.sampling    INFO    Starting sampling serially
03:42 madminer.sampling    WARNING Large statistical uncertainty on the total cross section when sampling from theta = [0.58710176 1.2685223 ]: (0.001242 +/- 0.000441) pb (35.5351101412644 %). Skipping these warnings in the future...
03:43 madminer.sampling    INFO    Sampling from parameter point 500 / 10000
03:43 madminer.sampling    INFO    Sampling from parameter point 1000 / 10000
03:44 madminer.sampling    INFO    Sampling from parameter point 1500 / 10000
03:45 madminer.sampling    INFO    Sampling from parameter point 2000 / 10000
03:45 madminer.sampling    INFO    Sampling from parameter point 2500 / 10000
03:47 madminer.sampling    INFO    Sampling from parameter point 3000 / 10000
03:48 madminer.sampling    INFO    Sampling from parameter point 3500 / 10000
03:48 madminer.sampling    INFO    Sampling from parameter point 4000 / 10000
03:49 madminer.sampling    WARNING For this value of theta, 1 / 29993 events have negative weight and will be ignored
03:49 madminer.sampling    INFO    Sampling from parameter point 4500 / 10000
03:50 madminer.sampling    INFO    Sampling from parameter point 5000 / 10000
03:50 madminer.sampling    INFO    Sampling from parameter point 5500 / 10000
03:51 madminer.sampling    INFO    Sampling from parameter point 6000 / 10000
03:52 madminer.sampling    INFO    Sampling from parameter point 6500 / 10000
03:53 madminer.sampling    INFO    Sampling from parameter point 7000 / 10000
03:53 madminer.sampling    INFO    Sampling from parameter point 7500 / 10000
03:54 madminer.sampling    INFO    Sampling from parameter point 8000 / 10000
04:55 madminer.sampling    INFO    Sampling from parameter point 8500 / 10000
05:16 madminer.sampling    INFO    Sampling from parameter point 9000 / 10000
05:17 madminer.sampling    INFO    Sampling from parameter point 9500 / 10000
05:18 madminer.sampling    INFO    Sampling from parameter point 10000 / 10000
05:18 madminer.sampling    INFO    Effective number of samples: mean 637.1639228954328, with individual thetas ranging from 11.197537298152856 to 28628.63290722551
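As a quick, optional sanity check (the exact numbers depend on the settings above), the returned arrays should line up one-to-one, with one parameter point and one joint score per unweighted event:

print(x.shape)      # expected (500000, 3): observables for each event
print(theta.shape)  # expected (500000, 2): parameter point used for each event
print(t_xz.shape)   # expected (500000, 2): joint score t(x,z|theta) for each event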

For the evaluation we’ll need a test sample:

_ = sampler.sample_test(
    theta=sampling.benchmark('sm'),
    n_samples=1000,
    folder='./data/samples',
    filename='test'
)
05:18 madminer.sampling    INFO    Extracting evaluation sample. Sampling according to sm
05:18 madminer.sampling    INFO    Starting sampling serially
05:18 madminer.sampling    INFO    Sampling from parameter point 1 / 1
05:18 madminer.sampling    INFO    Effective number of samples: mean 10015.0, with individual thetas ranging from 10015.0 to 10015.0

2. Train likelihood estimator

It’s now time to build the neural network that estimates the likelihood function. The central object for this is the madminer.ml.LikelihoodEstimator class. It defines functions that train, save, load, and evaluate the estimator.

In the initialization, the keyword n_mades sets the number of MADE blocks in the underlying masked autoregressive flow, while n_hidden and activation define the architecture of the (fully connected) neural networks inside each block:

estimator = LikelihoodEstimator(
    n_mades=3,
    n_hidden=(60,60),
    activation="relu"
)

To train this model we will minimize the SCANDAL loss function described in “Mining gold from implicit models to improve likelihood-free inference”.
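Schematically, the SCANDAL loss adds to the negative log likelihood of the flow a mean squared error between the joint score t(x,z|theta) provided by the simulator and the score of the estimated density: loss = E[ -log p_hat(x|theta) + alpha * | t(x,z|theta) - grad_theta log p_hat(x|theta) |^2 ], where the keyword alpha sets the relative weight of the score term. This is the nll / mse_score split reported in the training log below (e.g. in epoch 1, 3.866 + 10 * 0.105 ≈ 4.92).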

estimator.train(
    method='scandal',
    theta='data/samples/theta_train_density.npy',
    x='data/samples/x_train_density.npy',
    t_xz='data/samples/t_xz_train_density.npy',
    alpha=10.,
    n_epochs=10,
    initial_lr=0.01,
)

estimator.save('models/scandal')
05:18 madminer.ml          INFO    Starting training
05:18 madminer.ml          INFO      Method:                 scandal
05:18 madminer.ml          INFO      alpha:                  10.0
05:18 madminer.ml          INFO      Batch size:             128
05:18 madminer.ml          INFO      Optimizer:              amsgrad
05:18 madminer.ml          INFO      Epochs:                 10
05:18 madminer.ml          INFO      Learning rate:          0.01 initially, decaying to 0.0001
05:18 madminer.ml          INFO      Validation split:       0.25
05:18 madminer.ml          INFO      Early stopping:         True
05:18 madminer.ml          INFO      Scale inputs:           True
05:18 madminer.ml          INFO      Shuffle labels          False
05:18 madminer.ml          INFO      Samples:                all
05:18 madminer.ml          INFO    Loading training data
05:18 madminer.utils.vario INFO      Loading data/samples/theta_train_density.npy into RAM
05:18 madminer.utils.vario INFO      Loading data/samples/x_train_density.npy into RAM
05:18 madminer.utils.vario INFO      Loading data/samples/t_xz_train_density.npy into RAM
05:18 madminer.ml          INFO    Found 500000 samples with 2 parameters and 3 observables
05:18 madminer.ml          INFO    Setting up input rescaling
05:18 madminer.ml          INFO    Rescaling parameters
05:18 madminer.ml          INFO    Setting up parameter rescaling
05:18 madminer.ml          INFO    Training model
05:18 madminer.utils.ml.tr INFO    Training on CPU with single precision
05:19 madminer.utils.ml.tr INFO      Epoch   1: train loss  4.92008 (nll:  3.866, mse_score:  0.105)
05:19 madminer.utils.ml.tr INFO                 val. loss   4.88054 (nll:  3.877, mse_score:  0.100)
05:20 madminer.utils.ml.tr INFO      Epoch   2: train loss  4.81460 (nll:  3.866, mse_score:  0.095)
05:20 madminer.utils.ml.tr INFO                 val. loss   4.74922 (nll:  3.872, mse_score:  0.088)
05:21 madminer.utils.ml.tr INFO      Epoch   3: train loss  4.70997 (nll:  3.839, mse_score:  0.087)
05:21 madminer.utils.ml.tr INFO                 val. loss   4.68496 (nll:  3.835, mse_score:  0.085)
05:22 madminer.utils.ml.tr INFO      Epoch   4: train loss  4.66875 (nll:  3.823, mse_score:  0.085)
05:22 madminer.utils.ml.tr INFO                 val. loss   4.61774 (nll:  3.826, mse_score:  0.079)
05:23 madminer.utils.ml.tr INFO      Epoch   5: train loss  4.62898 (nll:  3.815, mse_score:  0.081)
05:23 madminer.utils.ml.tr INFO                 val. loss   4.59818 (nll:  3.813, mse_score:  0.078)
05:24 madminer.utils.ml.tr INFO      Epoch   6: train loss  4.60716 (nll:  3.809, mse_score:  0.080)
05:24 madminer.utils.ml.tr INFO                 val. loss   4.59749 (nll:  3.814, mse_score:  0.078)
05:25 madminer.utils.ml.tr INFO      Epoch   7: train loss  4.59577 (nll:  3.806, mse_score:  0.079)
05:25 madminer.utils.ml.tr INFO                 val. loss   4.57310 (nll:  3.807, mse_score:  0.077)
05:26 madminer.utils.ml.tr INFO      Epoch   8: train loss  4.58374 (nll:  3.806, mse_score:  0.078)
05:26 madminer.utils.ml.tr INFO                 val. loss   4.56814 (nll:  3.808, mse_score:  0.076)
05:27 madminer.utils.ml.tr INFO      Epoch   9: train loss  4.58109 (nll:  3.804, mse_score:  0.078)
05:27 madminer.utils.ml.tr INFO                 val. loss   4.56638 (nll:  3.807, mse_score:  0.076)
05:28 madminer.utils.ml.tr INFO      Epoch  10: train loss  4.57864 (nll:  3.804, mse_score:  0.077)
05:28 madminer.utils.ml.tr INFO                 val. loss   4.56464 (nll:  3.806, mse_score:  0.076)
05:28 madminer.utils.ml.tr INFO    Early stopping did not improve performance
05:28 madminer.utils.ml.tr INFO    Training time spend on:
05:28 madminer.utils.ml.tr INFO                      initialize model:   0.00h
05:28 madminer.utils.ml.tr INFO                                   ALL:   0.16h
05:28 madminer.utils.ml.tr INFO                            check data:   0.00h
05:28 madminer.utils.ml.tr INFO                          make dataset:   0.00h
05:28 madminer.utils.ml.tr INFO                       make dataloader:   0.00h
05:28 madminer.utils.ml.tr INFO                       setup optimizer:   0.00h
05:28 madminer.utils.ml.tr INFO                   initialize training:   0.00h
05:28 madminer.utils.ml.tr INFO                                set lr:   0.00h
05:28 madminer.utils.ml.tr INFO                   load training batch:   0.02h
05:28 madminer.utils.ml.tr INFO                        fwd: move data:   0.00h
05:28 madminer.utils.ml.tr INFO                   fwd: check for nans:   0.01h
05:28 madminer.utils.ml.tr INFO                    fwd: model.forward:   0.06h
05:28 madminer.utils.ml.tr INFO                 fwd: calculate losses:   0.00h
05:28 madminer.utils.ml.tr INFO                 training forward pass:   0.06h
05:28 madminer.utils.ml.tr INFO                   training sum losses:   0.00h
05:28 madminer.utils.ml.tr INFO                        opt: zero grad:   0.00h
05:28 madminer.utils.ml.tr INFO                         opt: backward:   0.04h
05:28 madminer.utils.ml.tr INFO                   opt: clip grad norm:   0.00h
05:28 madminer.utils.ml.tr INFO                             opt: step:   0.01h
05:28 madminer.utils.ml.tr INFO                        optimizer step:   0.06h
05:28 madminer.utils.ml.tr INFO                 load validation batch:   0.01h
05:28 madminer.utils.ml.tr INFO               validation forward pass:   0.02h
05:28 madminer.utils.ml.tr INFO                 validation sum losses:   0.00h
05:28 madminer.utils.ml.tr INFO                        early stopping:   0.00h
05:28 madminer.utils.ml.tr INFO                          report epoch:   0.00h
05:28 madminer.ml          INFO    Saving model to models/scandal


3. Evaluate likelihood estimator

estimator.evaluate_log_likelihood(theta,x) estimates the log likelihood for all combinations of the given phase-space points x and parameters theta. That is, if given 100 events x and a grid of 25 theta points, it will return 25*100 estimates for the log likelihood, indexed by [i_theta,i_x].

theta_each = np.linspace(-1.,1.,25)
theta0, theta1 = np.meshgrid(theta_each, theta_each)
theta_grid = np.vstack((theta0.flatten(), theta1.flatten())).T
np.save('data/samples/theta_grid.npy', theta_grid)
estimator.load('models/scandal')

log_p_hat, _ = estimator.evaluate_log_likelihood(
    theta='data/samples/theta_grid.npy',
    x='data/samples/x_test.npy',
    evaluate_score=False
)
05:28 madminer.ml          INFO    Loading model from models/scandal
05:28 madminer.utils.vario INFO      Loading data/samples/theta_grid.npy into RAM
05:28 madminer.utils.vario INFO      Loading data/samples/x_test.npy into RAM
05:28 madminer.ml          INFO    Starting ratio evaluation for 625000 x-theta combinations
05:28 madminer.ml          INFO    Evaluation done
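As a quick, optional check, the shape of the result illustrates the indexing convention described above: one row per theta on the grid, one column per test event.

print(log_p_hat.shape)   # expected (625, 1000): 25 x 25 grid points times 1000 test events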

Let’s look at the result:

bin_size = theta_each[1] - theta_each[0]
edges = np.linspace(theta_each[0] - bin_size/2, theta_each[-1] + bin_size/2, len(theta_each)+1)

fig = plt.figure(figsize=(6,5))
ax = plt.gca()

expected_llr = np.mean(log_p_hat,axis=1)
best_fit = theta_grid[np.argmin(-2.*expected_llr)]

cmin, cmax = np.min(-2*expected_llr), np.max(-2*expected_llr)
    
pcm = ax.pcolormesh(edges, edges, -2. * expected_llr.reshape((25,25)),
                    norm=matplotlib.colors.Normalize(vmin=cmin, vmax=cmax),
                    cmap='viridis_r')
cbar = fig.colorbar(pcm, ax=ax, extend='both')

plt.scatter(best_fit[0], best_fit[1], s=80., color='black', marker='*')

plt.xlabel(r'$\theta_0$')
plt.ylabel(r'$\theta_1$')
cbar.set_label(r'$\mathbb{E}_x [ -2\, \log \,\hat{p}(x | \theta) ]$')

plt.tight_layout()
plt.show()
[Figure: heat map of the expected -2 log likelihood over the (theta_0, theta_1) grid; the black star marks the best-fit point.]

Note that in this tutorial our sample sizes were very small, so the network may not have had a real chance to converge to the correct likelihood function. Don’t worry if you find a minimum that is not at the right point (the SM, i.e. the origin in this plot). Feel free to dial up the number of generated events in the run card as well as the size of the training sample and see what happens then! A sketch of such a scaled-up rerun follows below.
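For instance, a scaled-up rerun could look roughly like the following sketch (the sample size, epoch count, learning rate, and the file name 'train_density_large' are purely illustrative choices, not tuned recommendations):

# Draw a larger training sample (illustrative settings)
x, theta, t_xz, _ = sampler.sample_train_density(
    theta=sampling.random_morphing_points(10000, [('gaussian', 0., 0.5), ('gaussian', 0., 0.5)]),
    n_samples=2000000,
    folder='./data/samples',
    filename='train_density_large',
    sample_only_from_closest_benchmark=True,
)

# Retrain on the larger sample, for more epochs and with a smaller initial learning rate
estimator.train(
    method='scandal',
    theta='data/samples/theta_train_density_large.npy',
    x='data/samples/x_train_density_large.npy',
    t_xz='data/samples/t_xz_train_density_large.npy',
    alpha=10.,
    n_epochs=50,
    initial_lr=0.001,
)

estimator.save('models/scandal_large')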