This C++ version of BAT is still being maintained, but addition of new features is unlikely. Check out our new incarnation, BAT.jl, the Bayesian analysis toolkit in Julia. In addition to Metropolis-Hastings sampling, BAT.jl supports Hamiltonian Monte Carlo (HMC) with automatic differentiation, automatic prior-based parameter space transformations, and much more. See the BAT.jl documentation.
Estimating trigger efficiencies - tutorial for BAT
Physics motivation
Triggers are essential at collider experiments if the number of 'interesting events' is accompanied by background processes. The efficiencies for triggers are then needed, e.g., to calculate the production cross-section of the process under study. Usually the trigger efficiency is a function of several variables. This tutorial shows how to evaluate the trigger efficiency for objects at a collider as a function of the transverse momentum. Further information on the details of the physics and the original analysis can be found here.
Tutorial
This tutorial shows how to use a pre-defined BAT model for fitting data with binomial uncertainties from within ROOT. The data come in form of two histograms which are divided.
The tutorial is split into several steps:
- Step 1 - Getting started
- Step 2 - Defining a fit function
- Step 3 - Interpreting the results
- Step 4 - Using different constraints
- Step 5 - Using different priors *
- Step 6 - Adding systematic uncertainties *
Steps marked with * are advanced examples and independent of the limit part of the tutorial. A proposal for additional studies is also given.
Step 1 - Getting started
Copy the histogram file data.root and the ROOT macro skeleton efficiency.cxx to your local directory. The histogram hist_all represents a reference spectrum of the pT distribution from a particular object, e.g., muons measured during collisions. The spectrum can be obtained, e.g., from the Monte Carlo or a set of tagged objects (using the tag-and-probe method). The histogram hist_sub represents a subset of the objects, e.g., all muons which were triggered. These are referred to as probe objects when using tag-and-probe. The trigger efficiency can be estimed by dividing the two histograms bin-by-bin.
Have a look at the two distributions and estimate the trigger efficiency for small and large values by eye. Where is the point of inflection of the efficiency?
#include <TROOT.h>
#include <TDirectory.h>
#include <TFile.h>
#include <TH1D.h>
#include <TF1.h>
#include <TCanvas.h>
#include <TMath.h>
#include <BAT/BCLog.h>
#include <BAT/BCAux.h>
#include <BAT/BCEfficiencyFitter.h>
#include <BAT/BCMath.h>
double fitfunction(double * x, double * par);
// ---------------------------------------------------------
int efficiency()
{
// open log file
BCLog::OpenLog("log.txt", BCLog::detail, BCLog::detail);
// set style
BCAux::SetStyle();
// remember this ROOT directory
TDirectory* dir = gDirectory;
// open file
TFile* file = new TFile("data.root", "READ");
// change back into old directory
dir->cd();
// check if file is there
if (!file) {
std::cout << "Could not open file. Exit" << std::endl;
return -1;
}
// get histograms from file
TH1D* hist_all = (TH1D*) file->Get("hist_all")->Clone();
TH1D* hist_sub = (TH1D*) file->Get("hist_sub")->Clone();
// check if histograms are there
if (!hist_all || !hist_sub) {
std::cout << "Could not find histograms. Exit." << std::endl;
return -1;
}
// define a fit function
TF1 * f1 = new TF1("f1", fitfunction, 0.0, 100.0, 4);
// create a new efficiency fitter
BCEfficiencyFitter * fitter = new BCEfficiencyFitter(hist_all, hist_sub, f1);
// set options for evaluating the fit function
fitter -> SetFlagIntegration(false);
// set options for MCMC
fitter->MCMCSetNLag(10);
fitter->MCMCSetNIterationsRun(1000000);
// perform fit
fitter->Fit();
// draw histograms and fit results
TCanvas* c1 = new TCanvas("c1", "", 800, 400);
c1->Divide(2, 1);
c1->cd(1);
hist_all->Draw();
hist_sub->Draw("same");
c1->cd(2);
fitter->DrawFit("", true); // draw with a legend
// print to file
c1->Print("fit.ps");
// print marginalized distributions
fitter->PrintAllMarginalized("distributions.ps");
fitter->PrintResults("results.txt");
// close file and free memory
file->Close();
delete file;
// free memory
delete fitter;
delete f1;
delete c1;
// no error
return 0;
}
// ---------------------------------------------------------
double fitfunction(double * x, double * par)
{
return 0;
}
The code needs to be compiled, i.e., run it with
root -l efficiency.cxx++
If you want to run it without compilation you have to remove the includes.
The point of inflection is approximately at a pT of 40 GeV/c. The efficiencies for small and large values are roughly 0 and 1, respectively.
Step 2 - Defining a fit function
The efficiency can be estimated by fitting the ratio of the two data sets bin-by-bin. How can the uncertainties in each bin be calculated? Are the uncertainties symmetric? Define a proper fit function for the efficiency. How many fit parameters are there?
#include <TROOT.h>
#include <TDirectory.h>
#include <TFile.h>
#include <TH1D.h>
#include <TF1.h>
#include <TCanvas.h>
#include <TMath.h>
#include <BAT/BCLog.h>
#include <BAT/BCAux.h>
#include <BAT/BCEfficiencyFitter.h>
#include <BAT/BCMath.h>
double fitfunction(double * x, double * par);
// ---------------------------------------------------------
int efficiency()
{
// open log file
BCLog::OpenLog("log.txt", BCLog::detail, BCLog::detail);
// set style
BCAux::SetStyle();
// remember this ROOT directory
TDirectory* dir = gDirectory;
// open file
TFile* file = new TFile("data.root", "READ");
// change back into old directory
dir->cd();
// check if file is there
if (!file) {
std::cout << "Could not open file. Exit" << std::endl;
return -1;
}
// get histograms from file
TH1D* hist_all = (TH1D*) file->Get("hist_all")->Clone();
TH1D* hist_sub = (TH1D*) file->Get("hist_sub")->Clone();
// check if histograms are there
if (!hist_all || !hist_sub) {
std::cout << "Could not find histograms. Exit." << std::endl;
return -1;
}
// define a fit function
TF1 * f1 = new TF1("f1", fitfunction, 0.0, 100.0, 4);
f1 -> SetParLimits(0, 0.0, 100.0); // inflection point (threshold)
f1 -> SetParLimits(1, 0.0, 100.0); // width of the underlying Gaussian
f1 -> SetParLimits(2, 0.0, 1.0); // efficiency for small pT
f1 -> SetParLimits(3, 0.0, 1.0); // efficiency for large pT
// create a new efficiency fitter
BCEfficiencyFitter * fitter = new BCEfficiencyFitter(hist_all, hist_sub, f1);
// set options for evaluating the fit function
fitter -> SetFlagIntegration(false);
// set options for MCMC
fitter->MCMCSetNLag(10);
fitter->MCMCSetNIterationsRun(1000000);
// perform fit
fitter->Fit();
// draw histograms and fit results
TCanvas* c1 = new TCanvas("c1", "", 800, 400);
c1->Divide(2, 1);
c1->cd(1);
hist_all->Draw();
hist_sub->Draw("same");
c1->cd(2);
fitter->DrawFit("", true); // draw with a legend
// print to file
c1->Print("fit.ps");
// print marginalized distributions
fitter->PrintAllMarginalized("distributions.ps");
fitter->PrintResults("results.txt");
// close file and free memory
file->Close();
delete file;
// free memory
delete fitter;
delete f1;
delete c1;
// no error
return 0;
}
// ---------------------------------------------------------
double fitfunction(double * x, double * par)
{
double ff;
double pt = x[0];
if (pt < par[0])
ff = par[2] + (par[3]-par[2]) * TMath::Erfc((par[0]-pt)/par[1])/2.;
else
ff = par[2] + (par[3]-par[2]) * (TMath::Erf((pt-par[0])/par[1])/2. + 0.5);
return ff;
}
- The efficiency in each bin is defined by a binomial distribution (k out of n events pass the trigger requirement).
- The uncertainty is asymmetric for small (around 0) and large (around 1) values of the efficiency. The central region (around 0.5) is symmetric, depending on the statistics available.
- A suitable fit function is the error function. Four parameters need to be taken into account: the point of inflection (parameter 0), the width of the underlying Gaussian (parameter 1), the efficiency for small values of pT (parameter 2) and the efficiency for large values of pT (parameter 3).
Step 3 - Interpreting the results
Run the macro after you have defined the fit function. The MCMC might take a while. Have a look at the plots generated. Does the fit function with the best-fit parameters look reasonable? Compare the results from Minuit and the MCMC. Are both physical? The priors for the fit parameters are chosen to be flat in this example. Check that the minimum and maximum allowed values are chosen appropriatly.
#include <TROOT.h>
#include <TDirectory.h>
#include <TFile.h>
#include <TH1D.h>
#include <TF1.h>
#include <TCanvas.h>
#include <TMath.h>
#include <BAT/BCLog.h>
#include <BAT/BCAux.h>
#include <BAT/BCEfficiencyFitter.h>
#include <BAT/BCMath.h>
double fitfunction(double * x, double * par);
// ---------------------------------------------------------
int efficiency()
{
// open log file
BCLog::OpenLog("log.txt", BCLog::detail, BCLog::detail);
// set style
BCAux::SetStyle();
// remember this ROOT directory
TDirectory* dir = gDirectory;
// open file
TFile* file = new TFile("data.root", "READ");
// change back into old directory
dir->cd();
// check if file is there
if (!file) {
std::cout << "Could not open file. Exit" << std::endl;
return -1;
}
// get histograms from file
TH1D* hist_all = (TH1D*) file->Get("hist_all")->Clone();
TH1D* hist_sub = (TH1D*) file->Get("hist_sub")->Clone();
// check if histograms are there
if (!hist_all || !hist_sub) {
std::cout << "Could not find histograms. Exit." << std::endl;
return -1;
}
// define a fit function
TF1 * f1 = new TF1("f1", fitfunction, 0.0, 100.0, 4);
f1 -> SetParLimits(0, 30.0, 50.0); // inflection point (threshold)
f1 -> SetParLimits(1, 10.0, 30.0); // width of the underlying Gaussian
f1 -> SetParLimits(2, 0.0, 0.1); // efficiency for small pT
f1 -> SetParLimits(3, 0.8, 1.0); // efficiency for large pT
// create a new efficiency fitter
BCEfficiencyFitter * fitter = new BCEfficiencyFitter(hist_all, hist_sub, f1);
// set options for evaluating the fit function
fitter -> SetFlagIntegration(false);
// set options for MCMC
fitter->MCMCSetNLag(10);
fitter->MCMCSetNIterationsRun(1000000);
// perform fit
fitter->Fit();
// draw histograms and fit results
TCanvas* c1 = new TCanvas("c1", "", 800, 400);
c1->Divide(2, 1);
c1->cd(1);
hist_all->Draw();
hist_sub->Draw("same");
c1->cd(2);
fitter->DrawFit("", true); // draw with a legend
// print to file
c1->Print("fit.ps");
// print marginalized distributions
fitter->PrintAllMarginalized("distributions.ps");
fitter->PrintResults("results.txt");
// close file and free memory
file->Close();
delete file;
// free memory
delete fitter;
delete f1;
delete c1;
// no error
return 0;
}
// ---------------------------------------------------------
double fitfunction(double * x, double * par)
{
double ff;
double pt = x[0];
if (pt < par[0])
ff = par[2] + (par[3]-par[2]) * TMath::Erfc((par[0]-pt)/par[1])/2.;
else
ff = par[2] + (par[3]-par[2]) * (TMath::Erf((pt-par[0])/par[1])/2. + 0.5);
return ff;
}
The marginalized distributions are reasonable. In particular, they are bound to physical values (efficiencies between 0 and 1, postivite momenta). The results from Minuit are not bound to physical values and give, e.g., efficiencies smaller than 0.
Step 4 - Using different constraints
Set the maximum efficiency for low pT values to smaller values.
#include <TROOT.h>
#include <TDirectory.h>
#include <TFile.h>
#include <TH1D.h>
#include <TF1.h>
#include <TCanvas.h>
#include <TMath.h>
#include <BAT/BCLog.h>
#include <BAT/BCAux.h>
#include <BAT/BCEfficiencyFitter.h>
#include <BAT/BCMath.h>
double fitfunction(double * x, double * par);
// ---------------------------------------------------------
int efficiency()
{
// open log file
BCLog::OpenLog("log.txt", BCLog::detail, BCLog::detail);
// set style
BCAux::SetStyle();
// remember this ROOT directory
TDirectory* dir = gDirectory;
// open file
TFile* file = new TFile("data.root", "READ");
// change back into old directory
dir->cd();
// check if file is there
if (!file) {
std::cout << "Could not open file. Exit" << std::endl;
return -1;
}
// get histograms from file
TH1D* hist_all = (TH1D*) file->Get("hist_all")->Clone();
TH1D* hist_sub = (TH1D*) file->Get("hist_sub")->Clone();
// check if histograms are there
if (!hist_all || !hist_sub) {
std::cout << "Could not find histograms. Exit." << std::endl;
return -1;
}
// define a fit function
TF1 * f1 = new TF1("f1", fitfunction, 0.0, 100.0, 4);
f1 -> SetParLimits(0, 30.0, 50.0); // inflection point (threshold)
f1 -> SetParLimits(1, 10.0, 30.0); // width of the underlying Gaussian
f1 -> SetParLimits(2, 0.0, 0.000001); // efficiency for small pT
f1 -> SetParLimits(3, 0.8, 1.0); // efficiency for large pT
// create a new efficiency fitter
BCEfficiencyFitter * fitter = new BCEfficiencyFitter(hist_all, hist_sub, f1);
// set options for evaluating the fit function
fitter -> SetFlagIntegration(false);
// set options for MCMC
fitter->MCMCSetNLag(10);
fitter->MCMCSetNIterationsRun(1000000);
// perform fit
fitter->Fit();
// draw histograms and fit results
TCanvas* c1 = new TCanvas("c1", "", 800, 400);
c1->Divide(2, 1);
c1->cd(1);
hist_all->Draw();
hist_sub->Draw("same");
c1->cd(2);
fitter->DrawFit("", true); // draw with a legend
// print to file
c1->Print("fit.ps");
// print marginalized distributions
fitter->PrintAllMarginalized("distributions.ps");
fitter->PrintResults("results.txt");
// close file and free memory
file->Close();
delete file;
// free memory
delete fitter;
delete f1;
delete c1;
// no error
return 0;
}
// ---------------------------------------------------------
double fitfunction(double * x, double * par)
{
double ff;
double pt = x[0];
if (pt < par[0])
ff = par[2] + (par[3]-par[2]) * TMath::Erfc((par[0]-pt)/par[1])/2.;
else
ff = par[2] + (par[3]-par[2]) * (TMath::Erf((pt-par[0])/par[1])/2. + 0.5);
return ff;
}
Step 5 - Using different priors
The BCEfficiencyFitter in its current implementation does not forsee different priors. One way out it to create a new class which inherits from it and overload the method LogAprioriProbability. Add the following priors to this class:
- Point of inflection: Gauss around 35 GeV/c with a width of 2 GeV/c.
- Efficiency for small pT: exponential with a decay constant of -0.05.
- Efficiency for large pT: Gauss around 1.0 with a widht of 0.05.
#include <TROOT.h>
#include <TDirectory.h>
#include <TFile.h>
#include <TH1D.h>
#include <TF1.h>
#include <TCanvas.h>
#include <TMath.h>
#include <BAT/BCLog.h>
#include <BAT/BCAux.h>
#include <BAT/BCEfficiencyFitter.h>
#include <BAT/BCMath.h>
class MyFitterClass : public BCEfficiencyFitter
{
public:
// constructor
MyFitterClass(TH1D * hist1, TH1D * hist2, TF1 * func) : BCEfficiencyFitter(hist1, hist2, func) {};
// destructor
~MyFitterClass() {};
// prior probability
double LogAPrioriProbability(std::vector <double> parameters) {
double logprob = 0;
// prior on parameters
logprob += BCMath::LogGaus(parameters.at(0), 35.0, 2.0);
logprob += -parameters.at(2)/0.05;
logprob += BCMath::LogGaus(parameters.at(3), 1.0, 0.05);
return logprob;
};
};
double fitfunction(double * x, double * par);
// ---------------------------------------------------------
int efficiency()
{
// open log file
BCLog::OpenLog("log.txt", BCLog::detail, BCLog::detail);
// set style
BCAux::SetStyle();
// remember this ROOT directory
TDirectory* dir = gDirectory;
// open file
TFile* file = new TFile("data.root", "READ");
// change back into old directory
dir->cd();
// check if file is there
if (!file) {
std::cout << "Could not open file. Exit" << std::endl;
return -1;
}
// get histograms from file
TH1D* hist_all = (TH1D*) file->Get("hist_all")->Clone();
TH1D* hist_sub = (TH1D*) file->Get("hist_sub")->Clone();
// check if histograms are there
if (!hist_all || !hist_sub) {
std::cout << "Could not find histograms. Exit." << std::endl;
return -1;
}
// define a fit function
TF1 * f1 = new TF1("f1", fitfunction, 0.0, 100.0, 4);
f1 -> SetParLimits(0, 30.0, 50.0); // inflection point (threshold)
f1 -> SetParLimits(1, 10.0, 30.0); // width of the underlying Gaussian
f1 -> SetParLimits(2, 0.0, 0.1); // efficiency for small pT
f1 -> SetParLimits(3, 0.8, 1.0); // efficiency for large pT
// create a new efficiency fitter
MyFitterClass * fitter = new MyFitterClass(hist_all, hist_sub, f1);
// set options for evaluating the fit function
fitter -> SetFlagIntegration(false);
// set options for MCMC
fitter->MCMCSetNLag(10);
fitter->MCMCSetNIterationsRun(1000000);
// perform fit
fitter->Fit();
// draw histograms and fit results
TCanvas* c1 = new TCanvas("c1", "", 800, 400);
c1->Divide(2, 1);
c1->cd(1);
hist_all->Draw();
hist_sub->Draw("same");
c1->cd(2);
fitter->DrawFit("", true); // draw with a legend
// print to file
c1->Print("fit.ps");
// print marginalized distributions
fitter->PrintAllMarginalized("distributions.ps");
fitter->PrintResults("results.txt");
// close file and free memory
file->Close();
delete file;
// free memory
delete fitter;
delete f1;
delete c1;
// no error
return 0;
}
// ---------------------------------------------------------
double fitfunction(double * x, double * par)
{
double ff;
double pt = x[0];
if (pt < par[0])
ff = par[2] + (par[3]-par[2]) * TMath::Erfc((par[0]-pt)/par[1])/2.;
else
ff = par[2] + (par[3]-par[2]) * (TMath::Erf((pt-par[0])/par[1])/2. + 0.5);
return ff;
}
Step 6 - Adding systematic uncertainties
Say, the scale of the transverse momentum is only known with some precision of 2.5 GeV/c. Include the uncertainty as a nuisance parameter. Hint: add a new parameter with a Gaussian prior around 0 and add the parameter value to x in the definition of the function.
#include <TROOT.h>
#include <TDirectory.h>
#include <TFile.h>
#include <TH1D.h>
#include <TF1.h>
#include <TCanvas.h>
#include <TMath.h>
#include <BAT/BCLog.h>
#include <BAT/BCAux.h>
#include <BAT/BCEfficiencyFitter.h>
#include <BAT/BCMath.h>
class MyFitterClass : public BCEfficiencyFitter
{
public:
// constructor
MyFitterClass(TH1D * hist1, TH1D * hist2, TF1 * func) : BCEfficiencyFitter(hist1, hist2, func) {};
// destructor
~MyFitterClass() {};
// prior probability
double LogAPrioriProbability(std::vector parameters) {
double logprob = 0;
// prior on parameters
logprob += BCMath::LogGaus(parameters.at(0), 35.0, 2.0);
logprob += -parameters.at(2)/0.05;
logprob += BCMath::LogGaus(parameters.at(3), 1.0, 0.05);
// systematic uncertainty on pT-scale
logprob += BCMath::LogGaus(parameters.at(4), 0.0, 1.0);
return logprob;
};
};
double fitfunction(double * x, double * par);
// ---------------------------------------------------------
int efficiency()
{
// open log file
BCLog::OpenLog("log.txt", BCLog::detail, BCLog::detail);
// set style
BCAux::SetStyle();
// remember this ROOT directory
TDirectory* dir = gDirectory;
// open file
TFile* file = new TFile("data.root", "READ");
// change back into old directory
dir->cd();
// check if file is there
if (!file) {
std::cout << "Could not open file. Exit" << std::endl;
return -1;
}
// get histograms from file
TH1D* hist_all = (TH1D*) file->Get("hist_all")->Clone();
TH1D* hist_sub = (TH1D*) file->Get("hist_sub")->Clone();
// check if histograms are there
if (!hist_all || !hist_sub) {
std::cout << "Could not find histograms. Exit." << std::endl;
return -1;
}
// define a fit function
TF1 * f1 = new TF1("f1", fitfunction, 0.0, 100.0, 5);
f1 -> SetParLimits(0, 30.0, 50.0); // inflection point (threshold)
f1 -> SetParLimits(1, 10.0, 30.0); // width of the underlying Gaussian
f1 -> SetParLimits(2, 0.0, 0.1); // efficiency for small pT
f1 -> SetParLimits(3, 0.8, 1.0); // efficiency for large pT
f1 -> SetParLimits(4, -5.0, 5.0); // nuisance parameter: pT-scale
// create a new efficiency fitter
MyFitterClass * fitter = new MyFitterClass(hist_all, hist_sub, f1);
// set options for evaluating the fit function
fitter -> SetFlagIntegration(false);
// set options for MCMC
fitter->MCMCSetNLag(10);
fitter->MCMCSetNIterationsRun(1000000);
// perform fit
fitter->Fit();
// draw histograms and fit results
TCanvas* c1 = new TCanvas("c1", "", 800, 400);
c1->Divide(2, 1);
c1->cd(1);
hist_all->Draw();
hist_sub->Draw("same");
c1->cd(2);
fitter->DrawFit("", true); // draw with a legend
// print to file
c1->Print("fit.ps");
// print marginalized distributions
fitter->PrintAllMarginalized("distributions.ps");
fitter->PrintResults("results.txt");
// close file and free memory
file->Close();
delete file;
// free memory
delete fitter;
delete f1;
delete c1;
// no error
return 0;
}
// ---------------------------------------------------------
double fitfunction(double * x, double * par)
{
double ff;
double pt = x[0] + par[4]*2.5;
if (pt < par[0])
ff = par[2] + (par[3]-par[2]) * TMath::Erfc((par[0]-pt)/par[1])/2.;
else
ff = par[2] + (par[3]-par[2]) * (TMath::Erf((pt-par[0])/par[1])/2. + 0.5);
return ff;
}
Additional studies
- Calculate the fraction of events which pass the trigger requirement if the pT distribution where a Gaussian around 50 GeV/c with a width of 20 Gev/c. Overload the function MCMCUserInterface to perform the propagation of uncertainties.
Further material and documentation
An introduction to BAT can be found here.
An introduction to this tutorial can be found here.
Kevin Kröninger