This C++ version of BAT is still being maintained, but addition of new features is unlikely. Check out our new incarnation, BAT.jl, the Bayesian analysis toolkit in Julia. In addition to Metropolis-Hastings sampling, BAT.jl supports Hamiltonian Monte Carlo (HMC) with automatic differentiation, automatic prior-based parameter space transformations, and much more. See the BAT.jl documentation.
Estimating trigger efficiencies - tutorial for BAT
Physics motivation
Triggers are essential at collider experiments if the number of 'interesting events' is accompanied by background processes. The efficiencies for triggers are then needed, e.g., to calculate the production cross-section of the process under study. Usually the trigger efficiency is a function of several variables. This tutorial shows how to evaluate the trigger efficiency for objects at a collider as a function of the transverse momentum. Further information on the details of the physics and the original analysis can be found here.
Tutorial
This tutorial shows how to use a pre-defined BAT model for fitting data with binomial uncertainties from within ROOT. The data come in form of two histograms which are divided.
The tutorial is split into several steps:
- Step 1 - Getting started
- Step 2 - Defining a fit function
- Step 3 - Interpreting the results
- Step 4 - Using different constraints
- Step 5 - Using different priors *
- Step 6 - Adding systematic uncertainties *
Steps marked with * are advanced examples and independent of the limit part of the tutorial. A proposal for additional studies is also given.
Step 1 - Getting started
Copy the histogram file data.root and the ROOT macro skeleton efficiency.cxx to your local directory. The histogram hist_all represents a reference spectrum of the pT distribution from a particular object, e.g., muons measured during collisions. The spectrum can be obtained, e.g., from the Monte Carlo or a set of tagged objects (using the tag-and-probe method). The histogram hist_sub represents a subset of the objects, e.g., all muons which were triggered. These are referred to as probe objects when using tag-and-probe. The trigger efficiency can be estimed by dividing the two histograms bin-by-bin.
Have a look at the two distributions and estimate the trigger efficiency for small and large values by eye. Where is the point of inflection of the efficiency?
#include <TROOT.h> #include <TDirectory.h> #include <TFile.h> #include <TH1D.h> #include <TF1.h> #include <TCanvas.h> #include <TMath.h> #include <BAT/BCLog.h> #include <BAT/BCAux.h> #include <BAT/BCEfficiencyFitter.h> #include <BAT/BCMath.h> double fitfunction(double * x, double * par); // --------------------------------------------------------- int efficiency() { // open log file BCLog::OpenLog("log.txt", BCLog::detail, BCLog::detail); // set style BCAux::SetStyle(); // remember this ROOT directory TDirectory* dir = gDirectory; // open file TFile* file = new TFile("data.root", "READ"); // change back into old directory dir->cd(); // check if file is there if (!file) { std::cout << "Could not open file. Exit" << std::endl; return -1; } // get histograms from file TH1D* hist_all = (TH1D*) file->Get("hist_all")->Clone(); TH1D* hist_sub = (TH1D*) file->Get("hist_sub")->Clone(); // check if histograms are there if (!hist_all || !hist_sub) { std::cout << "Could not find histograms. Exit." << std::endl; return -1; } // define a fit function TF1 * f1 = new TF1("f1", fitfunction, 0.0, 100.0, 4); // create a new efficiency fitter BCEfficiencyFitter * fitter = new BCEfficiencyFitter(hist_all, hist_sub, f1); // set options for evaluating the fit function fitter -> SetFlagIntegration(false); // set options for MCMC fitter->MCMCSetNLag(10); fitter->MCMCSetNIterationsRun(1000000); // perform fit fitter->Fit(); // draw histograms and fit results TCanvas* c1 = new TCanvas("c1", "", 800, 400); c1->Divide(2, 1); c1->cd(1); hist_all->Draw(); hist_sub->Draw("same"); c1->cd(2); fitter->DrawFit("", true); // draw with a legend // print to file c1->Print("fit.ps"); // print marginalized distributions fitter->PrintAllMarginalized("distributions.ps"); fitter->PrintResults("results.txt"); // close file and free memory file->Close(); delete file; // free memory delete fitter; delete f1; delete c1; // no error return 0; } // --------------------------------------------------------- double fitfunction(double * x, double * par) { return 0; }
The code needs to be compiled, i.e., run it with
root -l efficiency.cxx++
If you want to run it without compilation you have to remove the includes.
The point of inflection is approximately at a pT of 40 GeV/c. The efficiencies for small and large values are roughly 0 and 1, respectively.
Step 2 - Defining a fit function
The efficiency can be estimated by fitting the ratio of the two data sets bin-by-bin. How can the uncertainties in each bin be calculated? Are the uncertainties symmetric? Define a proper fit function for the efficiency. How many fit parameters are there?
#include <TROOT.h> #include <TDirectory.h> #include <TFile.h> #include <TH1D.h> #include <TF1.h> #include <TCanvas.h> #include <TMath.h> #include <BAT/BCLog.h> #include <BAT/BCAux.h> #include <BAT/BCEfficiencyFitter.h> #include <BAT/BCMath.h> double fitfunction(double * x, double * par); // --------------------------------------------------------- int efficiency() { // open log file BCLog::OpenLog("log.txt", BCLog::detail, BCLog::detail); // set style BCAux::SetStyle(); // remember this ROOT directory TDirectory* dir = gDirectory; // open file TFile* file = new TFile("data.root", "READ"); // change back into old directory dir->cd(); // check if file is there if (!file) { std::cout << "Could not open file. Exit" << std::endl; return -1; } // get histograms from file TH1D* hist_all = (TH1D*) file->Get("hist_all")->Clone(); TH1D* hist_sub = (TH1D*) file->Get("hist_sub")->Clone(); // check if histograms are there if (!hist_all || !hist_sub) { std::cout << "Could not find histograms. Exit." << std::endl; return -1; } // define a fit function TF1 * f1 = new TF1("f1", fitfunction, 0.0, 100.0, 4); f1 -> SetParLimits(0, 0.0, 100.0); // inflection point (threshold) f1 -> SetParLimits(1, 0.0, 100.0); // width of the underlying Gaussian f1 -> SetParLimits(2, 0.0, 1.0); // efficiency for small pT f1 -> SetParLimits(3, 0.0, 1.0); // efficiency for large pT // create a new efficiency fitter BCEfficiencyFitter * fitter = new BCEfficiencyFitter(hist_all, hist_sub, f1); // set options for evaluating the fit function fitter -> SetFlagIntegration(false); // set options for MCMC fitter->MCMCSetNLag(10); fitter->MCMCSetNIterationsRun(1000000); // perform fit fitter->Fit(); // draw histograms and fit results TCanvas* c1 = new TCanvas("c1", "", 800, 400); c1->Divide(2, 1); c1->cd(1); hist_all->Draw(); hist_sub->Draw("same"); c1->cd(2); fitter->DrawFit("", true); // draw with a legend // print to file c1->Print("fit.ps"); // print marginalized distributions fitter->PrintAllMarginalized("distributions.ps"); fitter->PrintResults("results.txt"); // close file and free memory file->Close(); delete file; // free memory delete fitter; delete f1; delete c1; // no error return 0; } // --------------------------------------------------------- double fitfunction(double * x, double * par) { double ff; double pt = x[0]; if (pt < par[0]) ff = par[2] + (par[3]-par[2]) * TMath::Erfc((par[0]-pt)/par[1])/2.; else ff = par[2] + (par[3]-par[2]) * (TMath::Erf((pt-par[0])/par[1])/2. + 0.5); return ff; }
- The efficiency in each bin is defined by a binomial distribution (k out of n events pass the trigger requirement).
- The uncertainty is asymmetric for small (around 0) and large (around 1) values of the efficiency. The central region (around 0.5) is symmetric, depending on the statistics available.
- A suitable fit function is the error function. Four parameters need to be taken into account: the point of inflection (parameter 0), the width of the underlying Gaussian (parameter 1), the efficiency for small values of pT (parameter 2) and the efficiency for large values of pT (parameter 3).
Step 3 - Interpreting the results
Run the macro after you have defined the fit function. The MCMC might take a while. Have a look at the plots generated. Does the fit function with the best-fit parameters look reasonable? Compare the results from Minuit and the MCMC. Are both physical? The priors for the fit parameters are chosen to be flat in this example. Check that the minimum and maximum allowed values are chosen appropriatly.
#include <TROOT.h> #include <TDirectory.h> #include <TFile.h> #include <TH1D.h> #include <TF1.h> #include <TCanvas.h> #include <TMath.h> #include <BAT/BCLog.h> #include <BAT/BCAux.h> #include <BAT/BCEfficiencyFitter.h> #include <BAT/BCMath.h> double fitfunction(double * x, double * par); // --------------------------------------------------------- int efficiency() { // open log file BCLog::OpenLog("log.txt", BCLog::detail, BCLog::detail); // set style BCAux::SetStyle(); // remember this ROOT directory TDirectory* dir = gDirectory; // open file TFile* file = new TFile("data.root", "READ"); // change back into old directory dir->cd(); // check if file is there if (!file) { std::cout << "Could not open file. Exit" << std::endl; return -1; } // get histograms from file TH1D* hist_all = (TH1D*) file->Get("hist_all")->Clone(); TH1D* hist_sub = (TH1D*) file->Get("hist_sub")->Clone(); // check if histograms are there if (!hist_all || !hist_sub) { std::cout << "Could not find histograms. Exit." << std::endl; return -1; } // define a fit function TF1 * f1 = new TF1("f1", fitfunction, 0.0, 100.0, 4); f1 -> SetParLimits(0, 30.0, 50.0); // inflection point (threshold) f1 -> SetParLimits(1, 10.0, 30.0); // width of the underlying Gaussian f1 -> SetParLimits(2, 0.0, 0.1); // efficiency for small pT f1 -> SetParLimits(3, 0.8, 1.0); // efficiency for large pT // create a new efficiency fitter BCEfficiencyFitter * fitter = new BCEfficiencyFitter(hist_all, hist_sub, f1); // set options for evaluating the fit function fitter -> SetFlagIntegration(false); // set options for MCMC fitter->MCMCSetNLag(10); fitter->MCMCSetNIterationsRun(1000000); // perform fit fitter->Fit(); // draw histograms and fit results TCanvas* c1 = new TCanvas("c1", "", 800, 400); c1->Divide(2, 1); c1->cd(1); hist_all->Draw(); hist_sub->Draw("same"); c1->cd(2); fitter->DrawFit("", true); // draw with a legend // print to file c1->Print("fit.ps"); // print marginalized distributions fitter->PrintAllMarginalized("distributions.ps"); fitter->PrintResults("results.txt"); // close file and free memory file->Close(); delete file; // free memory delete fitter; delete f1; delete c1; // no error return 0; } // --------------------------------------------------------- double fitfunction(double * x, double * par) { double ff; double pt = x[0]; if (pt < par[0]) ff = par[2] + (par[3]-par[2]) * TMath::Erfc((par[0]-pt)/par[1])/2.; else ff = par[2] + (par[3]-par[2]) * (TMath::Erf((pt-par[0])/par[1])/2. + 0.5); return ff; }
The marginalized distributions are reasonable. In particular, they are bound to physical values (efficiencies between 0 and 1, postivite momenta). The results from Minuit are not bound to physical values and give, e.g., efficiencies smaller than 0.
Step 4 - Using different constraints
Set the maximum efficiency for low pT values to smaller values.
#include <TROOT.h> #include <TDirectory.h> #include <TFile.h> #include <TH1D.h> #include <TF1.h> #include <TCanvas.h> #include <TMath.h> #include <BAT/BCLog.h> #include <BAT/BCAux.h> #include <BAT/BCEfficiencyFitter.h> #include <BAT/BCMath.h> double fitfunction(double * x, double * par); // --------------------------------------------------------- int efficiency() { // open log file BCLog::OpenLog("log.txt", BCLog::detail, BCLog::detail); // set style BCAux::SetStyle(); // remember this ROOT directory TDirectory* dir = gDirectory; // open file TFile* file = new TFile("data.root", "READ"); // change back into old directory dir->cd(); // check if file is there if (!file) { std::cout << "Could not open file. Exit" << std::endl; return -1; } // get histograms from file TH1D* hist_all = (TH1D*) file->Get("hist_all")->Clone(); TH1D* hist_sub = (TH1D*) file->Get("hist_sub")->Clone(); // check if histograms are there if (!hist_all || !hist_sub) { std::cout << "Could not find histograms. Exit." << std::endl; return -1; } // define a fit function TF1 * f1 = new TF1("f1", fitfunction, 0.0, 100.0, 4); f1 -> SetParLimits(0, 30.0, 50.0); // inflection point (threshold) f1 -> SetParLimits(1, 10.0, 30.0); // width of the underlying Gaussian f1 -> SetParLimits(2, 0.0, 0.000001); // efficiency for small pT f1 -> SetParLimits(3, 0.8, 1.0); // efficiency for large pT // create a new efficiency fitter BCEfficiencyFitter * fitter = new BCEfficiencyFitter(hist_all, hist_sub, f1); // set options for evaluating the fit function fitter -> SetFlagIntegration(false); // set options for MCMC fitter->MCMCSetNLag(10); fitter->MCMCSetNIterationsRun(1000000); // perform fit fitter->Fit(); // draw histograms and fit results TCanvas* c1 = new TCanvas("c1", "", 800, 400); c1->Divide(2, 1); c1->cd(1); hist_all->Draw(); hist_sub->Draw("same"); c1->cd(2); fitter->DrawFit("", true); // draw with a legend // print to file c1->Print("fit.ps"); // print marginalized distributions fitter->PrintAllMarginalized("distributions.ps"); fitter->PrintResults("results.txt"); // close file and free memory file->Close(); delete file; // free memory delete fitter; delete f1; delete c1; // no error return 0; } // --------------------------------------------------------- double fitfunction(double * x, double * par) { double ff; double pt = x[0]; if (pt < par[0]) ff = par[2] + (par[3]-par[2]) * TMath::Erfc((par[0]-pt)/par[1])/2.; else ff = par[2] + (par[3]-par[2]) * (TMath::Erf((pt-par[0])/par[1])/2. + 0.5); return ff; }
Step 5 - Using different priors
The BCEfficiencyFitter in its current implementation does not forsee different priors. One way out it to create a new class which inherits from it and overload the method LogAprioriProbability. Add the following priors to this class:
- Point of inflection: Gauss around 35 GeV/c with a width of 2 GeV/c.
- Efficiency for small pT: exponential with a decay constant of -0.05.
- Efficiency for large pT: Gauss around 1.0 with a widht of 0.05.
#include <TROOT.h> #include <TDirectory.h> #include <TFile.h> #include <TH1D.h> #include <TF1.h> #include <TCanvas.h> #include <TMath.h> #include <BAT/BCLog.h> #include <BAT/BCAux.h> #include <BAT/BCEfficiencyFitter.h> #include <BAT/BCMath.h> class MyFitterClass : public BCEfficiencyFitter { public: // constructor MyFitterClass(TH1D * hist1, TH1D * hist2, TF1 * func) : BCEfficiencyFitter(hist1, hist2, func) {}; // destructor ~MyFitterClass() {}; // prior probability double LogAPrioriProbability(std::vector <double> parameters) { double logprob = 0; // prior on parameters logprob += BCMath::LogGaus(parameters.at(0), 35.0, 2.0); logprob += -parameters.at(2)/0.05; logprob += BCMath::LogGaus(parameters.at(3), 1.0, 0.05); return logprob; }; }; double fitfunction(double * x, double * par); // --------------------------------------------------------- int efficiency() { // open log file BCLog::OpenLog("log.txt", BCLog::detail, BCLog::detail); // set style BCAux::SetStyle(); // remember this ROOT directory TDirectory* dir = gDirectory; // open file TFile* file = new TFile("data.root", "READ"); // change back into old directory dir->cd(); // check if file is there if (!file) { std::cout << "Could not open file. Exit" << std::endl; return -1; } // get histograms from file TH1D* hist_all = (TH1D*) file->Get("hist_all")->Clone(); TH1D* hist_sub = (TH1D*) file->Get("hist_sub")->Clone(); // check if histograms are there if (!hist_all || !hist_sub) { std::cout << "Could not find histograms. Exit." << std::endl; return -1; } // define a fit function TF1 * f1 = new TF1("f1", fitfunction, 0.0, 100.0, 4); f1 -> SetParLimits(0, 30.0, 50.0); // inflection point (threshold) f1 -> SetParLimits(1, 10.0, 30.0); // width of the underlying Gaussian f1 -> SetParLimits(2, 0.0, 0.1); // efficiency for small pT f1 -> SetParLimits(3, 0.8, 1.0); // efficiency for large pT // create a new efficiency fitter MyFitterClass * fitter = new MyFitterClass(hist_all, hist_sub, f1); // set options for evaluating the fit function fitter -> SetFlagIntegration(false); // set options for MCMC fitter->MCMCSetNLag(10); fitter->MCMCSetNIterationsRun(1000000); // perform fit fitter->Fit(); // draw histograms and fit results TCanvas* c1 = new TCanvas("c1", "", 800, 400); c1->Divide(2, 1); c1->cd(1); hist_all->Draw(); hist_sub->Draw("same"); c1->cd(2); fitter->DrawFit("", true); // draw with a legend // print to file c1->Print("fit.ps"); // print marginalized distributions fitter->PrintAllMarginalized("distributions.ps"); fitter->PrintResults("results.txt"); // close file and free memory file->Close(); delete file; // free memory delete fitter; delete f1; delete c1; // no error return 0; } // --------------------------------------------------------- double fitfunction(double * x, double * par) { double ff; double pt = x[0]; if (pt < par[0]) ff = par[2] + (par[3]-par[2]) * TMath::Erfc((par[0]-pt)/par[1])/2.; else ff = par[2] + (par[3]-par[2]) * (TMath::Erf((pt-par[0])/par[1])/2. + 0.5); return ff; }
Step 6 - Adding systematic uncertainties
Say, the scale of the transverse momentum is only known with some precision of 2.5 GeV/c. Include the uncertainty as a nuisance parameter. Hint: add a new parameter with a Gaussian prior around 0 and add the parameter value to x in the definition of the function.
#include <TROOT.h> #include <TDirectory.h> #include <TFile.h> #include <TH1D.h> #include <TF1.h> #include <TCanvas.h> #include <TMath.h> #include <BAT/BCLog.h> #include <BAT/BCAux.h> #include <BAT/BCEfficiencyFitter.h> #include <BAT/BCMath.h> class MyFitterClass : public BCEfficiencyFitter { public: // constructor MyFitterClass(TH1D * hist1, TH1D * hist2, TF1 * func) : BCEfficiencyFitter(hist1, hist2, func) {}; // destructor ~MyFitterClass() {}; // prior probability double LogAPrioriProbability(std::vectorparameters) { double logprob = 0; // prior on parameters logprob += BCMath::LogGaus(parameters.at(0), 35.0, 2.0); logprob += -parameters.at(2)/0.05; logprob += BCMath::LogGaus(parameters.at(3), 1.0, 0.05); // systematic uncertainty on pT-scale logprob += BCMath::LogGaus(parameters.at(4), 0.0, 1.0); return logprob; }; }; double fitfunction(double * x, double * par); // --------------------------------------------------------- int efficiency() { // open log file BCLog::OpenLog("log.txt", BCLog::detail, BCLog::detail); // set style BCAux::SetStyle(); // remember this ROOT directory TDirectory* dir = gDirectory; // open file TFile* file = new TFile("data.root", "READ"); // change back into old directory dir->cd(); // check if file is there if (!file) { std::cout << "Could not open file. Exit" << std::endl; return -1; } // get histograms from file TH1D* hist_all = (TH1D*) file->Get("hist_all")->Clone(); TH1D* hist_sub = (TH1D*) file->Get("hist_sub")->Clone(); // check if histograms are there if (!hist_all || !hist_sub) { std::cout << "Could not find histograms. Exit." << std::endl; return -1; } // define a fit function TF1 * f1 = new TF1("f1", fitfunction, 0.0, 100.0, 5); f1 -> SetParLimits(0, 30.0, 50.0); // inflection point (threshold) f1 -> SetParLimits(1, 10.0, 30.0); // width of the underlying Gaussian f1 -> SetParLimits(2, 0.0, 0.1); // efficiency for small pT f1 -> SetParLimits(3, 0.8, 1.0); // efficiency for large pT f1 -> SetParLimits(4, -5.0, 5.0); // nuisance parameter: pT-scale // create a new efficiency fitter MyFitterClass * fitter = new MyFitterClass(hist_all, hist_sub, f1); // set options for evaluating the fit function fitter -> SetFlagIntegration(false); // set options for MCMC fitter->MCMCSetNLag(10); fitter->MCMCSetNIterationsRun(1000000); // perform fit fitter->Fit(); // draw histograms and fit results TCanvas* c1 = new TCanvas("c1", "", 800, 400); c1->Divide(2, 1); c1->cd(1); hist_all->Draw(); hist_sub->Draw("same"); c1->cd(2); fitter->DrawFit("", true); // draw with a legend // print to file c1->Print("fit.ps"); // print marginalized distributions fitter->PrintAllMarginalized("distributions.ps"); fitter->PrintResults("results.txt"); // close file and free memory file->Close(); delete file; // free memory delete fitter; delete f1; delete c1; // no error return 0; } // --------------------------------------------------------- double fitfunction(double * x, double * par) { double ff; double pt = x[0] + par[4]*2.5; if (pt < par[0]) ff = par[2] + (par[3]-par[2]) * TMath::Erfc((par[0]-pt)/par[1])/2.; else ff = par[2] + (par[3]-par[2]) * (TMath::Erf((pt-par[0])/par[1])/2. + 0.5); return ff; }
Additional studies
- Calculate the fraction of events which pass the trigger requirement if the pT distribution where a Gaussian around 50 GeV/c with a width of 20 Gev/c. Overload the function MCMCUserInterface to perform the propagation of uncertainties.
Further material and documentation
An introduction to BAT can be found here.
An introduction to this tutorial can be found here.
Kevin Kröninger