Logo ROOT  
Reference Guide
 
Loading...
Searching...
No Matches
rf611_weightedfits.C File Reference

Detailed Description

View in nbviewer Open in SWAN
Likelihood and minimization: Parameter uncertainties for weighted unbinned ML fits

Parameter uncertainties for weighted unbinned ML fits

Based on example from https://arxiv.org/abs/1911.01303

This example compares different approaches to determining parameter uncertainties in weighted unbinned maximum likelihood fits. Performing a weighted unbinned maximum likelihood fits can be useful to account for acceptance effects and to statistically subtract background events using the sPlot formalism. It is however well known that the inverse Hessian matrix does not yield parameter uncertainties with correct coverage in the presence of event weights. Three approaches to the determination of parameter uncertainties are compared in this example:

  1. Using the inverse weighted Hessian matrix [SumW2Error(false)]
  2. Using the expression [SumW2Error(true)]

    \[ V_{ij} = H_{ik}^{-1} C_{kl} H_{lj}^{-1} \]

    where H is the weighted Hessian matrix and C is the Hessian matrix with squared weights
  3. The asymptotically correct approach (for details please see https://arxiv.org/abs/1911.01303) [Asymptotic(true)]

    \[ V_{ij} = H_{ik}^{-1} D_{kl} H_{lj}^{-1} \]

    where H is the weighted Hessian matrix and D is given by

    \[ D_{kl} = \sum_{e=1}^{N} w_e^2 \frac{\partial \log(P)}{\partial \lambda_k}\frac{\partial \log(P)}{\partial \lambda_l} \]

    with the event weight \(w_e\).

The example performs the fit of a second order polynomial in the angle cos(theta) [-1,1] to a weighted data set. The polynomial is given by

\[ P = \frac{ 1 + c_0 \cdot \cos(\theta) + c_1 \cdot \cos(\theta) \cdot \cos(\theta) }{\mathrm{Norm}} \]

The two coefficients \( c_0 \) and \( c_1 \) and their uncertainties are to be determined in the fit.

The per-event weight is used to correct for an acceptance effect, two different acceptance models can be studied:

  • acceptancemodel==1: eff = \( 0.3 + 0.7 \cdot \cos(\theta) \cdot \cos(\theta) \)
  • acceptancemodel==2: eff = \( 1.0 - 0.7 \cdot \cos(\theta) \cdot \cos(\theta) \) The data is generated to be flat before the acceptance effect.

The performance of the different approaches to determine parameter uncertainties is compared using the pull distributions from a large number of pseudoexperiments. The pull is defined as \( (\lambda_i - \lambda_{gen})/\sigma(\lambda_i) \), where \( \lambda_i \) is the fitted parameter and \( \sigma(\lambda_i) \) its uncertainty for pseudoexperiment number i. If the fit is unbiased and the parameter uncertainties are estimated correctly, the pull distribution should be a Gaussian centered around zero with a width of one.

#include "TH1D.h"
#include "TCanvas.h"
#include "TROOT.h"
#include "TStyle.h"
#include "TRandom3.h"
#include "TLegend.h"
#include "RooRealVar.h"
#include "RooFitResult.h"
#include "RooDataSet.h"
#include "RooPolynomial.h"
using namespace RooFit;
int rf611_weightedfits(int acceptancemodel=2) {
// I n i t i a l i s a t i o n a n d S e t u p
//------------------------------------------------
//plotting options
gStyle->SetTitleSize(0.05, "XY");
gStyle->SetLabelSize(0.05, "XY");
gStyle->SetTitleOffset(0.9, "XY");
//initialise TRandom3
TRandom3* rnd = new TRandom3();
rnd->SetSeed(191101303);
//accepted events and events weighted to account for the acceptance
TH1D* haccepted = new TH1D("haccepted", "Generated events;cos(#theta);#events", 40, -1.0, 1.0);
TH1D* hweighted = new TH1D("hweighted", "Generated events;cos(#theta);#events", 40, -1.0, 1.0);
//histograms holding pull distributions
//using the inverse Hessian matrix
TH1D* hc0pull1 = new TH1D("hc0pull1", "Inverse weighted Hessian matrix [SumW2Error(false)];Pull (c_{0}^{fit}-c_{0}^{gen})/#sigma(c_{0});", 20, -5.0, 5.0);
TH1D* hc1pull1 = new TH1D("hc1pull1", "Inverse weighted Hessian matrix [SumW2Error(false)];Pull (c_{1}^{fit}-c_{1}^{gen})/#sigma(c_{1});", 20, -5.0, 5.0);
//using the correction with the Hessian matrix with squared weights
TH1D* hc0pull2 = new TH1D("hc0pull2", "Hessian matrix with squared weights [SumW2Error(true)];Pull (c_{0}^{fit}-c_{0}^{gen})/#sigma(c_{0});", 20, -5.0, 5.0);
TH1D* hc1pull2 = new TH1D("hc1pull2", "Hessian matrix with squared weights [SumW2Error(true)];Pull (c_{1}^{fit}-c_{1}^{gen})/#sigma(c_{1});", 20, -5.0, 5.0);
//asymptotically correct approach
TH1D* hc0pull3 = new TH1D("hc0pull3", "Asymptotically correct approach [Asymptotic(true)];Pull (c_{0}^{fit}-c_{0}^{gen})/#sigma(c_{0});", 20, -5.0, 5.0);
TH1D* hc1pull3 = new TH1D("hc1pull3", "Asymptotically correct approach [Asymptotic(true)];Pull (c_{1}^{fit}-c_{1}^{gen})/#sigma(c_{1});", 20, -5.0, 5.0);
//number of pseudoexperiments (toys) and number of events per pseudoexperiment
constexpr unsigned int ntoys = 500;
constexpr unsigned int nstats = 5000;
//parameters used in the generation
constexpr double c0gen = 0.0;
constexpr double c1gen = 0.0;
// Silence fitting and minimisation messages
auto& msgSv = RooMsgService::instance();
msgSv.getStream(1).removeTopic(RooFit::Minimization);
msgSv.getStream(1).removeTopic(RooFit::Fitting);
std::cout << "Running " << ntoys*3 << " toy fits ..." << std::endl;
// M a i n l o o p : r u n p s e u d o e x p e r i m e n t s
//----------------------------------------------------------------
for (unsigned int i=0; i<ntoys; i++) {
//S e t u p p a r a m e t e r s a n d P D F
//-----------------------------------------------
//angle theta and the weight to account for the acceptance effect
RooRealVar costheta("costheta","costheta", -1.0, 1.0);
RooRealVar weight("weight","weight", 0.0, 1000.0);
//initialise parameters to fit
RooRealVar c0("c0","0th-order coefficient", c0gen, -1.0, 1.0);
RooRealVar c1("c1","1st-order coefficient", c1gen, -1.0, 1.0);
c0.setError(0.01);
c1.setError(0.01);
//create simple second-order polynomial as probability density function
RooPolynomial pol("pol", "pol", costheta, RooArgList(c0, c1), 1);
//G e n e r a t e d a t a s e t f o r p s e u d o e x p e r i m e n t i
//-------------------------------------------------------------------------------
RooDataSet data("data","data",RooArgSet(costheta, weight), WeightVar("weight"));
//generate nstats events
for (unsigned int j=0; j<nstats; j++) {
bool finished = false;
//use simple accept/reject for generation
while (!finished) {
costheta = 2.0*rnd->Rndm()-1.0;
//efficiency for the specific value of cos(theta)
double eff = 1.0;
if (acceptancemodel == 1)
eff = 1.0 - 0.7 * costheta.getVal()*costheta.getVal();
else
eff = 0.3 + 0.7 * costheta.getVal()*costheta.getVal();
//use 1/eff as weight to account for acceptance
weight = 1.0/eff;
//accept/reject
if (10.0*rnd->Rndm() < eff*pol.getVal())
finished = true;
}
haccepted->Fill(costheta.getVal());
hweighted->Fill(costheta.getVal(), weight.getVal());
data.add(RooArgSet(costheta, weight), weight.getVal());
}
//F i t t o y u s i n g t h e t h r e e d i f f e r e n t a p p r o a c h e s t o u n c e r t a i n t y d e t e r m i n a t i o n
//-------------------------------------------------------------------------------------------------------------------------------------------------
//this uses the inverse weighted Hessian matrix
std::unique_ptr<RooFitResult> result{pol.fitTo(data, Save(true), SumW2Error(false), PrintLevel(-1), EvalBackend("cpu"))};
hc0pull1->Fill((c0.getVal()-c0gen)/c0.getError());
hc1pull1->Fill((c1.getVal()-c1gen)/c1.getError());
//this uses the correction with the Hesse matrix with squared weights
result = std::unique_ptr<RooFitResult>{pol.fitTo(data, Save(true), SumW2Error(true), PrintLevel(-1), EvalBackend("cpu"))};
hc0pull2->Fill((c0.getVal()-c0gen)/c0.getError());
hc1pull2->Fill((c1.getVal()-c1gen)/c1.getError());
//this uses the asymptotically correct approach
result = std::unique_ptr<RooFitResult>{pol.fitTo(data, Save(true), AsymptoticError(true), PrintLevel(-1), EvalBackend("cpu"))};
hc0pull3->Fill((c0.getVal()-c0gen)/c0.getError());
hc1pull3->Fill((c1.getVal()-c1gen)/c1.getError());
}
std::cout << "... done." << std::endl;
// P l o t o u t p u t d i s t r i b u t i o n s
//--------------------------------------------------
//plot accepted (weighted) events
TCanvas* cevents = new TCanvas("cevents", "cevents", 800, 600);
cevents->cd(1);
hweighted->SetMinimum(0.0);
hweighted->SetLineColor(2);
hweighted->Draw("hist");
haccepted->Draw("same hist");
TLegend* leg = new TLegend(0.6, 0.8, 0.9, 0.9);
leg->AddEntry(haccepted, "Accepted");
leg->AddEntry(hweighted, "Weighted");
leg->Draw();
cevents->Update();
//plot pull distributions
TCanvas* cpull = new TCanvas("cpull", "cpull", 1200, 800);
cpull->Divide(3,2);
cpull->cd(1);
hc0pull1->Fit("gaus");
hc0pull1->Draw("ep");
cpull->cd(2);
hc0pull2->Fit("gaus");
hc0pull2->Draw("ep");
cpull->cd(3);
hc0pull3->Fit("gaus");
hc0pull3->Draw("ep");
cpull->cd(4);
hc1pull1->Fit("gaus");
hc1pull1->Draw("ep");
cpull->cd(5);
hc1pull2->Fit("gaus");
hc1pull2->Draw("ep");
cpull->cd(6);
hc1pull3->Fit("gaus");
hc1pull3->Draw("ep");
cpull->Update();
return 0;
}
Option_t Option_t TPoint TPoint const char GetTextMagnitude GetFillStyle GetLineColor GetLineWidth GetMarkerStyle GetTextAlign GetTextColor GetTextSize void data
Option_t Option_t TPoint TPoint const char GetTextMagnitude GetFillStyle GetLineColor GetLineWidth GetMarkerStyle GetTextAlign GetTextColor GetTextSize void char Point_t Rectangle_t WindowAttributes_t Float_t Float_t Float_t Int_t Int_t UInt_t UInt_t Rectangle_t result
R__EXTERN TStyle * gStyle
Definition TStyle.h:433
RooArgList is a container object that can hold multiple RooAbsArg objects.
Definition RooArgList.h:22
RooArgSet is a container object that can hold multiple RooAbsArg objects.
Definition RooArgSet.h:55
RooDataSet is a container class to hold unbinned data.
Definition RooDataSet.h:57
static RooMsgService & instance()
Return reference to singleton instance.
RooPolynomial implements a polynomial p.d.f of the form.
RooRealVar represents a variable that can be changed from the outside.
Definition RooRealVar.h:37
virtual void SetLineColor(Color_t lcolor)
Set the line color.
Definition TAttLine.h:40
virtual void SetMarkerStyle(Style_t mstyle=1)
Set the marker style.
Definition TAttMarker.h:40
virtual void SetMarkerSize(Size_t msize=1)
Set the marker size.
Definition TAttMarker.h:45
virtual void SetTextSize(Float_t tsize=1)
Set the text size.
Definition TAttText.h:47
The Canvas class.
Definition TCanvas.h:23
TVirtualPad * cd(Int_t subpadnumber=0) override
Set current canvas & pad.
Definition TCanvas.cxx:716
void Update() override
Update canvas pad buffers.
Definition TCanvas.cxx:2475
1-D histogram with a double per channel (see TH1 documentation)}
Definition TH1.h:620
virtual TFitResultPtr Fit(const char *formula, Option_t *option="", Option_t *goption="", Double_t xmin=0, Double_t xmax=0)
Fit histogram with function fname.
Definition TH1.cxx:3901
virtual Int_t Fill(Double_t x)
Increment bin with abscissa X by 1.
Definition TH1.cxx:3345
void Draw(Option_t *option="") override
Draw this histogram with options.
Definition TH1.cxx:3067
virtual void SetMinimum(Double_t minimum=-1111)
Definition TH1.h:401
This class displays a legend box (TPaveText) containing several legend entries.
Definition TLegend.h:23
void Divide(Int_t nx=1, Int_t ny=1, Float_t xmargin=0.01, Float_t ymargin=0.01, Int_t color=0) override
Automatic pad generation by division.
Definition TPad.cxx:1153
Random number generator class based on M.
Definition TRandom3.h:27
Double_t Rndm() override
Machine independent random number generator.
Definition TRandom3.cxx:99
void SetSeed(ULong_t seed=0) override
Set the random generator sequence if seed is 0 (default value) a TUUID is generated and used to fill ...
Definition TRandom3.cxx:206
void SetPadTopMargin(Float_t margin=0.1)
Definition TStyle.h:356
void SetOptStat(Int_t stat=1)
The type of information printed in the histogram statistics box can be selected via the parameter mod...
Definition TStyle.cxx:1636
void SetPadBottomMargin(Float_t margin=0.1)
Definition TStyle.h:355
void SetPaintTextFormat(const char *format="g")
Definition TStyle.h:383
void SetEndErrorSize(Float_t np=2)
Set the size (in pixels) of the small lines drawn at the end of the error bars (TH1 or TGraphErrors).
Definition TStyle.cxx:1336
void SetPadRightMargin(Float_t margin=0.1)
Definition TStyle.h:358
void SetTitleOffset(Float_t offset=1, Option_t *axis="X")
Specify a parameter offset to control the distance between the axis and the axis title.
Definition TStyle.cxx:1794
void SetPadLeftMargin(Float_t margin=0.1)
Definition TStyle.h:357
void SetHistLineColor(Color_t color=1)
Definition TStyle.h:377
void SetTitleSize(Float_t size=0.02, Option_t *axis="X")
Definition TStyle.cxx:1813
void SetHistLineWidth(Width_t width=1)
Definition TStyle.h:380
void SetLabelSize(Float_t size=0.04, Option_t *axis="X")
Set size of axis labels.
Definition TStyle.cxx:1440
void SetOptFit(Int_t fit=1)
The type of information about fit parameters printed in the histogram statistics box can be selected ...
Definition TStyle.cxx:1589
RooCmdArg AsymptoticError(bool flag)
RooCmdArg Save(bool flag=true)
RooCmdArg SumW2Error(bool flag)
RooCmdArg PrintLevel(Int_t code)
return c1
Definition legend1.C:41
leg
Definition legend1.C:34
The namespace RooFit contains mostly switches that change the behaviour of functions of PDFs (or othe...
Definition JSONIO.h:26
Running 1500 toy fits ...
... done.
****************************************
Minimizer is Minuit2 / Migrad
Chi2 = 13.0237
NDf = 12
Edm = 1.95569e-06
NCalls = 53
Constant = 80.8095 +/- 4.62329
Mean = 0.000448726 +/- 0.055767
Sigma = 1.20469 +/- 0.0430167 (limited)
****************************************
Minimizer is Minuit2 / Migrad
Chi2 = 6.16188
NDf = 9
Edm = 8.41795e-06
NCalls = 53
Constant = 97.9367 +/- 5.62558
Mean = 0.0045812 +/- 0.0461646
Sigma = 1.00838 +/- 0.0369873 (limited)
****************************************
Minimizer is Minuit2 / Migrad
Chi2 = 5.92322
NDf = 9
Edm = 8.00253e-06
NCalls = 53
Constant = 96.7162 +/- 5.56394
Mean = 0.013144 +/- 0.0471138
Sigma = 1.02199 +/- 0.0377697 (limited)
****************************************
Minimizer is Minuit2 / Migrad
Chi2 = 9.99353
NDf = 12
Edm = 7.71043e-09
NCalls = 52
Constant = 73.133 +/- 4.15612
Mean = -0.00353982 +/- 0.0624075
Sigma = 1.34426 +/- 0.0486401 (limited)
****************************************
Minimizer is Minuit2 / Migrad
Chi2 = 13.8263
NDf = 17
Edm = 7.99952e-08
NCalls = 60
Constant = 37.3642 +/- 2.43124
Mean = -0.356131 +/- 0.124662
Sigma = 2.2529 +/- 0.112506 (limited)
****************************************
Minimizer is Minuit2 / Migrad
Chi2 = 6.06878
NDf = 8
Edm = 7.02517e-06
NCalls = 53
Constant = 99.6685 +/- 5.51119
Mean = 0.0292424 +/- 0.0461063
Sigma = 0.995578 +/- 0.0338752 (limited)
(int) 0
Date
November 2019
Author
Christoph Langenbruch

Definition in file rf611_weightedfits.C.