You are here

RooFit in 20 Minutes

Purpose

The RooFit library provides a toolkit for modeling the expected distribution of events in a physics analysis. Models can be used to perform unbinned maximum likelihood fits, produce plots, and generate "toy Monte Carlo" samples for various studies. RooFit was originally developed for the BaBar collaboration, a particle physics experiment at the Stanford Linear Accelerator Center. The software is primarily designed as a particle physics data analysis tool, but its general nature and open architecture make it useful for other types of data analysis also.

Mathematical model

The core functionality of RooFit is to enable the modeling of ‘event data’ distributions, where each event is a discrete occurrence in time, and has one or more measured observables associated with it. Experiments of this nature result in datasets obeying Poisson (or binomial) statistics.

The natural modeling language for such distributions are probability density functions F(x;p) that describe the probability density the distribution of observables x in terms of function in parameter p. The defining properties of probability density functions, unit normalization with respect to all observables and positive definiteness, also provide important benefits for the design of a structured modeling language: p.d.f.s are easily added with intuitive interpretation of fraction coefficients, they allow construction of higher dimensional p.d.f.s out of lower dimensional building block with an intuitive language to introduce and describe correlations between observables, they allow the universal implementation of toy Monte Carlo sampling techniques, and are of course an prerequisite for the use of (unbinned) maximum likelihood parameter estimation technique.

Design

RooFit introduces a granular structure in its mapping of mathematical data models components to C++ objects: rather than aiming at a monolithic entity that describes a data model, each math symbol is presented by a separate object. A feature of this design philosophy is that all RooFit models always consist of multiple objects.

For example a Gaussian probability density function consists typically of four objects, three objects representing the observable, the mean and the sigma parameters, and one object representing a Gaussian probability density function.

// --- Observable with lower and upper bound
RooRealVar mes("mes","m_{ES} (GeV)",5.20,5.30);

// --- Parameters with starting value, lower bound, upper bound
RooRealVar sigmean("sigmean","B^{#pm} mass",5.28,5.20,5.30);
RooRealVar sigwidth("sigwidth","B^{#pm} width",0.0027,0.001,1.);

// --- Build Gaussian PDF ---
RooGaussian signal("signal","signal PDF",mes,sigmean,sigwidth) ;

Model building operations such as addition, multiplication, integration are represented by separate operator objects and make the modeling language scale well to models of arbitrary complexity.

Examples

Signal + Background Model

Making use of the Gaussian signal p.d.f. defined above, the following example constructs a one-dimensional probability density function with a Gaussian signal component and a 'ARGUS' phase space background component.

#include "RooRealVar.h"
#include "RooConstVar.h"
#include "RooGaussian.h"
#include "RooArgusBG.h"
#include "RooAddPdf.h"
#include "RooDataSet.h"
#include "RooPlot.h"

using namespace RooFit;

void runArgusModel() {
   // --- Observable ---
   RooRealVar mes("mes","m_{ES} (GeV)",5.20,5.30);

   // --- Parameters ---
   RooRealVar sigmean("sigmean","B^{#pm} mass",5.28,5.20,5.30);
   RooRealVar sigwidth("sigwidth","B^{#pm} width",0.0027,0.001,1.);

   // --- Build Gaussian PDF ---
   RooGaussian signalModel("signal","signal PDF",mes,sigmean,sigwidth);

   // --- Build Argus background PDF ---
   RooRealVar argpar("argpar","argus shape parameter",-20.0,-100.,-1.);
   RooArgusBG background("background","Argus PDF",mes,RooConst(5.291),argpar);

   // --- Construct signal+background PDF ---
   RooRealVar nsig("nsig","#signal events",200,0.,10000);
   RooRealVar nbkg("nbkg","#background events",800,0.,10000);
   RooAddPdf model("model","g+a",RooArgList(signalModel,background),RooArgList(nsig,nbkg));

   //We can now use this p.d.f. to generate an unbinned toy dataset, fit the p.d.f to that dataset using an unbinned maximum likelihood fit
   //and visualise the data with the p.d.f overlaid.

   // --- Generate a toyMC sample from composite PDF ---
   RooDataSet *data = model.generate(mes, 2000);

   // --- Perform extended ML fit of composite PDF to toy data ---
   model.fitTo(*data);

   // --- Plot toy data and composite PDF overlaid ---
   RooPlot* mesframe = mes.frame();
   data->plotOn(mesframe);
   model.plotOn(mesframe);
   model.plotOn(mesframe, Components(background), LineStyle(ELineStyle::kDashed));

   mesframe->Draw();
}

In the example above, all indivual components of the RooFit p.d.f (the variables, component p.d.f.s and combined p.d.f.) are all created individually by invoking the constructors directly. It is also possible to organize them in a container class 'the workspace' that has an associated factory tool to create trees of RooFit objects of arbitrary complexity using a construction language. An equivalent macro that could be run using cling would be:

{
   using namespace RooFit;

   RooWorkspace w("myWorkspace", true);
   w.factory("Gaussian::gauss(mes[5.20,5.30],mean[5.28,5.2,5.3],width[0.0027,0.001,1])");
   w.factory("ArgusBG::argus(mes,5.291,argpar[-20,-100,-1])");
   w.factory("SUM::sum(nsig[200,0,10000]*gauss,nbkg[800,0,10000]*argus)");

   // Retrieve pointers to variables and pdfs for later use
   auto sum = w.pdf("sum"); // Returns as RooAbsPdf*
   auto mes = w.var("mes"); // Returns as RooRealVar*

   // --- Generate a toyMC sample from composite PDF ---
   RooDataSet *data = sum->generate(*mes, 2000);

   // --- Perform extended ML fit of composite PDF to toy data ---
   sum->fitTo(*data);

   // --- Plot toy data and composite PDF overlaid ---
   auto mesframe = mes->frame();
   data->plotOn(mesframe);
   sum->plotOn(mesframe);
   sum->plotOn(mesframe, Components(*w.pdf("argus")), LineStyle(kDashed));

   mesframe->Draw();
}

After executing the macro, the objects defined in the workspace are also available in a namespace with the same name as the workspace if the second argument of the workspace constructor is set to true.
That is, typing myWorkspace::sum at the root prompt yields:

root [2] myWorkspace::sum  
(RooAddPdf &) RooAddPdf::sum[ nsig * gauss + nbkg * argus ] = 0.369501

Convolution of two p.d.f.s

The first example illustrated the use of the addition operator p.d.f. RooAddPdf. It is also possible to construct convolutions of p.d.f.s using the FFT convolution operator p.d.f.

{
   RooWorkspace w("w");

   w.factory("Landau::landau(t[-10,30],ml[5,-20,20],sl[1,0.1,10])");
   w.factory("Gaussian::gauss(t,mg[0],sg[2,0.1,10])");
   auto t = w.var("t");
   auto landau = w.pdf("landau");
   auto gauss = w.pdf("gauss");

   // Construct convoluted pdf lxg(x) = landau(x) (*) gauss(x)
   RooFFTConvPdf lxg("lxg", "landau (X) gauss", *t, *landau, *gauss);

   // Alternative construction method using workspace
   // w.factory("FCONV::lxg(x,landau,gauss)");
}

The above p.d.f. lxg can be used for fitting, plotting and event generation in exactly the same way as the p.d.f. model of the first example. The picture below shows the fitted lxg overlaid on data, as well as the unconvoluted Landau p.d.f. (dashed).

Example of a multi-dimensional p.d.f.

One can also construct multi-dimensional p.d.f.s with and without correlations using the product operator RooProdPdf. The example below shows how to construct a 2-dimensional p.d.f. with correlations of the form F(x|y)*G(y) where the conditional p.d.f. F(x|y) describes the distribution in observable x given a value of y, and p.d.f G(y) describes the distribution in observable y

{
   RooWorkspace w("w");

   // Construct F(x|y) -- a Gaussian in x with a mean that depends on y
   w.factory("PolyVar::meanx(y[-5,5],{a0[-0.5,-5,5],a1[-0.5,-1,1]})");
   w.factory("Gaussian::gaussx(x[-5,5],meanx,sigmax[0.5])");

   // Construct G(y) -- an Exponential in y
   w.factory("Exponential::gaussy(y,-0.2)");

   // Construct M(x,y) = F(x|y)*G(y)
   w.factory("PROD::model(gaussx|y,gaussy)");

   auto x = w.var("x");
   auto model = w.pdf("model");
   model->Print("");
}

Result:
RooProdPdf::model[ gaussy * gaussx|y ] = 0.606531

Fitting, plotting and event generation with multi-dimensional p.d.f.s is very similar to that of one-dimensional p.d.f.s. Continuing the above example, one can use

[...]
   RooDataSet *data = model->generate(RooArgSet(*x, *w.var("y")), 10000);

   model->fitTo(*data);

   auto frame = x->frame();
   data->plotOn(frame);
   model->plotOn(frame);

   frame->Draw();
}

The picture below shows a 2-D visualization of the M(x,y) and the projection on the x observable (i.e. Int M(x,y) dy)) overlaid on the x distribution of the data

Working with likelihood functions and profile likelihood

The previous examples illustrated various techniques to construct probability density functions in RooFit. This examples illustrates various operations that can be applied at the likelihood level.

Given a p.d.f. and a dataset, the likelihood function can be constructed as

RooAbsReal* nll = pdf.createNLL(data);

The likelihood function behaves like a regular RooFit function and can be plotted the same way probability density functions:

RooPlot* frame = myparam.frame();
nll->plotOn(frame);

Since likelihood evaluations are potentially time-consuming, RooFit facilitates calculation of likelihood in parallel on multiple processes. This parallelization process is transparent to the user. To request parallel calculation on 8 processors (on the same host), construct the likelihood function as follows

RooAbsReal* nll = pdf.createNLL(data, NumCPU(8)) ;

One can also construct along similar lines the profile likelihood, which is the likelihood minimized w.r.t. the nuisance parameters, i.e for a likelihood L(p,q) where p is a parameter of interest and q is a nuisance parameter, the value of the profile likelihood PL(p) is the value of L(p,q) at the value of q where L(p,q) is lowest. A profile likelihood is construct as follows

RooAbsReal* pll = nll->createProfile(<paramOfInterest>);

This example constructs a toy p.d.f. and dataset and compares a likelihood scan and a profile likelihood scan in one of the parameters:

{
   using namespace RooFit;

   // Construct model
   RooWorkspace w("w");
   // Here, cast into appropriate type directly. Since return type overloading is impossible in C++, this has to be done manually
   auto g1 = (RooAbsPdf*) w.factory("Gaussian::g1(x[-20,20],mean[-10,10],sigma_g1[3])");
   auto g2 = (RooAbsPdf*) w.factory("Gaussian::g2(x,mean,sigma_g2[4,3,6])");
   auto model = (RooAbsPdf*) w.factory("SUM::model(frac[0.5,0,1]*g1,g2)");

   auto x = w.var("x");
   auto frac = w.var("frac");

   // Generate 1000 events 
   RooDataSet* data = model->generate(*x, 1000);

   // Create likelihood function
   RooAbsReal* nll = model->createNLL(*data, NumCPU(2));

   // Minimize likelihood
   RooMinuit(*nll).migrad();

   // Plot likelihood scan in parameter frac
   RooPlot* frame1 = frac->frame(Bins(10), Range(0.01,0.95));
   nll->plotOn(frame1,ShiftToZero());

   // Plot the profile likelihood in frac
   RooAbsReal* pll_frac = nll->createProfile(*frac);
   pll_frac->plotOn(frame1, LineColor(kRed));

   frame1->Draw();
}

The resulting plot containing the likelihood and the profile likelihood in the frac parameter, as well as a similar plot for the sigma_g2 parameter is shown below.

Python

From Python, the same would look like this:

import ROOT

# Construct model
w = ROOT.RooWorkspace("w")
g1 = w.factory("Gaussian::g1(x[-20,20],mean[-10,10],sigma_g1[3])")
g2 = w.factory("Gaussian::g2(x,mean,sigma_g2[4,3,6])")
model = w.factory("SUM::model(frac[0.5,0,1]*g1,g2)")

x = w.var("x")
frac = w.var("frac")

# Generate 1000 events
varsForGeneration = ROOT.RooArgSet(x)
data = model.generate(varsForGeneration, 1000)

# Create likelihood function
nll = model.createNLL(data, ROOT.RooFit.NumCPU(2))

# Minimize likelihood
minimiser = ROOT.RooMinuit(nll)
minimiser.migrad()

# Plot likelihood scan in parameter frac
frame1 = frac.frame(ROOT.RooFit.Bins(10), ROOT.RooFit.Range(0.01,0.95))
nll.plotOn(frame1, ROOT.RooFit.ShiftToZero())

# Plot the profile likelihood in frac
profileVariables = ROOT.RooArgSet(frac)
pll_frac = nll.createProfile(profileVariables)
pll_frac.plotOn(frame1, ROOT.RooFit.LineColor(ROOT.kRed))

frame1.Draw()

Documentation

Please refer to this page for more documentation

Tutorial macros

A set of tutorial macros, often in both python and C++, is available in the ROOT distribution in $ROOTSYS/tutorials/roofit as well as here.

Doxygen

The detailed documentation of all class methods and data members is available for the core classes and the pdf classes.

Other documentation

Here is a link to a 200 slide presentation on RooFit presented in the French School of Statistics 2008 (slides are in English)

Forum for help and questions

Please post your questions on RooFit in the ROOT Math & Statistics tools forum