Logo ROOT  
Reference Guide
 
Loading...
Searching...
No Matches
TSPlot.cxx
Go to the documentation of this file.
1// @(#)root/splot:$Id$
2// Author: Muriel Pivk, Anna Kreshuk 10/2005
3
4/**********************************************************************
5 * *
6 * Copyright (c) 2005 ROOT Foundation, CERN/PH-SFT *
7 * *
8 **********************************************************************/
9
10#include "TSPlot.h"
11#include "TVirtualFitter.h"
12#include "TH1.h"
13#include "TTreePlayer.h"
14#include "TTreeFormula.h"
15#include "TTreeFormulaManager.h"
16#include "TSelectorDraw.h"
17#include "TBrowser.h"
18#include "TClass.h"
19#include "TMath.h"
20#include "snprintf.h"
21
22#include <sstream>
23#include <string>
24
25extern void Yields(Int_t &, Double_t *, Double_t &f, Double_t *x, Int_t iflag);
26
27
28/** \class TSPlot
29
30\legacy{TSPlot}
31
32A common method used in High Energy Physics to perform measurements is
33the maximum Likelihood method, exploiting discriminating variables to
34disentangle signal from background. The crucial point for such an
35analysis to be reliable is to use an exhaustive list of sources of
36events combined with an accurate description of all the Probability
37Density Functions (PDF).
38
39To assess the validity of the fit, a convincing quality check
40is to explore further the data sample by examining the distributions of
41control variables. A control variable can be obtained for instance by
42removing one of the discriminating variables before performing again
43the maximum Likelihood fit: this removed variable is a control
44variable. The expected distribution of this control variable, for
45signal, is to be compared to the one extracted, for signal, from the
46data sample. In order to be able to do so, one must be able to unfold
47from the distribution of the whole data sample.
48
49The TSPlot method allows to reconstruct the distributions for
50the control variable, independently for each of the various sources of
51events, without making use of any <em>a priori</em> knowledge on <u>this</u>
52variable. The aim is thus to use the knowledge available for the
53discriminating variables to infer the behaviour of the individual
54sources of events with respect to the control variable.
55
56TSPlot is optimal if the control variable is uncorrelated with the discriminating variables.
57
58
59A detail description of the formalism itself, called \f$\hbox{$_s$}{\cal P}lot\f$, is given
60in [[1](https://arxiv.org/abs/physics/0402083)].
61
62### The method
63
64
65The \f$\hbox{$_s$}{\cal P}lot\f$ technique is developed in the above context of a
66maximum Likelihood method making use of discriminating variables.
67
68One considers a data sample in which are merged several species
69of events. These species represent various signal components and
70background components which all together account for the data sample.
71The different terms of the log-Likelihood are:
72
73 - \f$N\f$ : the total number of events in the data sample,
74 - \f${\rm N}_{\rm s}\f$ : the number of species of events populating the data sample,
75 - \f$N_i\f$ : the number of events expected on the average for the \f$i^{\rm th}\f$ species,
76 - \f${\rm f}_i(y_e)\f$ : the value of the PDFs of the discriminating variables
77 \f$y\f$ for the\f$i^{th}\f$ species and for event\f$e\f$,
78 - \f$x\f$ : the set of control variables which, by definition, do not appear in
79 the expression of the Likelihood function \f${\cal L}\f$.
80
81The extended log-Likelihood reads:
82
83 \f[
84{\cal L}=\sum_{e=1}^{N}\ln \Big\{ \sum_{i=1}^{{\rm N}_{\rm s}}N_i{\rm f}_i(y_e) \Big\} -\sum_{i=1}^{{\rm N}_{\rm s}}N_i \tag{1}
85\f]
86
87From this expression, after maximization of \f${\cal L}\f$ with respect to the \f$N_i\f$ parameters,
88a weight can be computed for every event and each species, in order to obtain later the true distribution
89\f$\hbox{M}_i(x)\f$ of variable \f$x\f$. If \f${\rm n}\f$ is one of the
90 \f${\rm N}_{\rm s}\f$ species present in the data sample, the weight for this species is defined by:
91
92
93\f[
94\fbox{$
95{_s{\cal P}}_{\rm n}(y_e)={\sum_{j=1}^{{\rm N}_{\rm s}} \hbox{V}_{{\rm n}j}{\rm f}_j(y_e)\over\sum_{k=1}^{{\rm N}_{\rm s}}N_k{\rm f}_k(y_e) } $} , \tag{2}
96\f]
97
98
99where \f$\hbox{V}_{{\rm n}j}\f$
100
101is the covariance matrix resulting from the Likelihood maximization.
102This matrix can be used directly from the fit, but this is numerically
103less accurate than the direct computation:
104
105
106\f[
107\hbox{ V}^{-1}_{{\rm n}j}~=~
108{\partial^2(-{\cal L})\over\partial N_{\rm n}\partial N_j}~=~
109\sum_{e=1}^N {{\rm f}_{\rm n}(y_e){\rm f}_j(y_e)\over(\sum_{k=1}^{{\rm N}_{\rm s}}N_k{\rm f}_k(y_e))^2} . \tag{3}
110\f]
111
112
113The distribution of the control variable \f$x\f$ obtained by histogramming the weighted
114events reproduces, on average, the true distribution
115\f${\hbox{ {M}}}_{\rm n}(x)\f$
116
117The class TSPlot allows to reconstruct the true distribution
118\f${\hbox{ {M}}}_{\rm n}(x)\f$
119
120of a control variable \f$x\f$ for each of the \f${\rm N}_{\rm s}\f$ species from
121the sole knowledge of the PDFs of the discriminating variables \f${\rm f}_i(y)\f$.
122The plots obtained thanks to the TSPlot class are called \f$\hbox {$_s$}{\cal P}lots\f$.
123
124
125### Some properties and checks
126
127
128Beside reproducing the true distribution,\f$\hbox {$_s$}{\cal P}lots\f$ bear remarkable properties:
129
130
131 - Each \f$x\f$ - distribution is properly normalized:
132
133\f[
134\sum_{e=1}^{N} {_s{\cal P}}_{\rm n}(y_e)~=~N_{\rm n} ~. \tag{4}
135\f]
136
137
138 - For any event:
139
140\f[
141\sum_{l=1}^{{\rm N}_{\rm s}} {_s{\cal P}}_l(y_e) ~=~1 ~. \tag{5}
142\f]
143
144That is to say that, summing up the \f${\rm N}_{\rm s}\f$ \f$\hbox {$_s$}{\cal P}lots\f$,
145one recovers the data sample distribution in \f$x\f$, and summing up the number of events
146entering in a \f$\hbox{$_s$}{\cal P}lot\f$ for a given species, one recovers the yield of the
147species, as provided by the fit.
148The property 4 is implemented in the TSPlot class as a check.
149
150 - the sum of the statistical uncertainties per bin
151
152\f[
153\sigma[N_{\rm n}\ _s\tilde{\rm M}_{\rm n}(x) {\delta x}]~=~\sqrt{\sum_{e \subset {\delta x}} ({_s{\cal P}}_{\rm n})^2} ~. \tag{6}
154\f]
155
156reproduces the statistical uncertainty on the yield \f$N_{\rm n}\f$, as provided by the fit:
157\f$\sigma[N_{\rm n}]\equiv\sqrt{\hbox{ V}_{{\rm n}{\rm n}}}\f$ .
158Because of that and since the determination of the yields is optimal
159when obtained using a Likelihood fit, one can conclude that the \f$\hbox{$_s$}{\cal P}lot\f$
160technique is itself an optimal method to reconstruct distributions of control variables.
161
162
163### Different steps followed by TSPlot
164
165
166 1. A maximum Likelihood fit is performed to obtain the yields \f$N_i\f$
167 of the various species.The fit relies on discriminating variables \f$y\f$
168 uncorrelated with a control variable \f$x\f$:
169 the later is therefore totally absent from the fit.
170
171 2. The weights \f${_s{\cal P}}\f$ are calculated using Eq.
172 (2)
173 where the covariance matrix is taken from Minuit.
174
175 3. Histograms of \f$x\f$ are filled by weighting the events with \f${_s{\cal P}}\f$ .
176
177 4. Error bars per bin are given by Eq. (6).
178
179
180The \f$\hbox {$_s$}{\cal P}lots\f$ reproduce the true distributions of the species
181in the control variable \f$x\f$, within the above defined statistical uncertainties.
182
183### Illustrations
184
185
186To illustrate the technique, one considers an example derived from the analysis where
187\f$\hbox {$_s$}{\cal P}lots\f$
188have been first used (charmless B decays). One is dealing with a data
189sample in which two species are present: the first is termed signal and
190the second background. A maximum Likelihood fit is performed to obtain
191the two yields \f$N_1\f$ and \f$N_2\f$ . The fit relies on two discriminating
192variables collectively denoted \f$y\f$ which are chosen within three possible
193variables denoted \f${m_{\rm ES}}\f$ , \f$\Delta E\f$ and \f${\cal F}\f$.
194The variable which is not incorporated in \f$y\f$ is used as the control variable
195\f$x\f$ . The six distributions of the three variables are assumed to be the ones
196depicted in Fig. 1.
197
198
199\image html splot_pdfmesNIM.png Figure 1 width=800
200
201Distributions of the three discriminating variables available to perform the Likelihood fit:
202\f${m_{\rm ES}}\f$ , \f$\Delta E\f$ , \f${\cal F}\f$ .
203Among the three variables, two are used to perform the fit while one is
204kept out of the fit to serve the purpose of a control variable. The
205three distributions on the top (resp. bottom) of the figure correspond
206to the signal (resp. background). The unit of the vertical axis is
207chosen such that it indicates the number of entries per bin, if one
208slices the histograms in 25 bins.
209
210A data sample being built through a Monte Carlo simulation based on the
211distributions shown in Fig.
2121,
213one obtains the three distributions of Fig. 2.
214Whereas the distribution of \f$\Delta E\f$ clearly indicates the presence of the signal,
215the distribution of \f${m_{\rm ES}}\f$ and \f${\cal F}\f$ are less obviously populated by signal.
216
217
218\image html splot_genfiTOTNIM.png Figure 2 width=800
219
220Distributions of the three discriminating variables for signal plus
221background. The three distributions are the ones obtained from a data
222sample obtained through a Monte Carlo simulation based on the
223distributions shown in Fig.
2241.
225The data sample consists of 500 signal events and 5000 background events.
226
227
228Choosing \f$\Delta E\f$ and \f${\cal F}\f$ as discriminating variables to determine
229\f$N_1\f$ and \f$N_2\f$ through a maximum Likelihood fit, one builds, for the control
230variable \f${m_{\rm ES}}\f$ which is unknown to the fit, the two \f$\hbox {$_s$}{\cal P}lots\f$
231for signal and background shown in
232Fig. 3.
233One observes that the \f$\hbox{$_s$}{\cal P}lot\f$
234for signal reproduces correctly the PDF even where the latter vanishes,
235although the error bars remain sizeable. This results from the almost
236complete cancellation between positive and negative weights: the sum of
237weights is close to zero while the sum of weights squared is not. The
238occurrence of negative weights occurs through the appearance of the
239covariance matrix, and its negative components, in the definition of
240Eq. (2).
241
242
243A word of caution is in order with respect to the error bars. Whereas
244their sum in quadrature is identical to the statistical uncertainties
245of the yields determined by the fit, and if, in addition, they are
246asymptotically correct, the error bars should be handled with care for
247low statistics and/or for too fine binning. This is because the error
248bars do not incorporate two known properties of the PDFs: PDFs are
249positive definite and can be non-zero in a given x-bin, even if in the
250particular data sample at hand, no event is observed in this bin. The
251latter limitation is not specific to \f$\hbox {$_s$}{\cal P}lots\f$ ,
252rather it is always present when one is willing to infer the PDF at the
253origin of an histogram, when, for some bins, the number of entries does
254not guaranty the applicability of the Gaussian regime. In such
255situations, a satisfactory practice is to attach allowed ranges to the
256histogram to indicate the upper and lower limits of the PDF value which
257are consistent with the actual observation, at a given confidence
258level.
259
260
261\image html splot_mass-bkg-sPlot.png Figure 3 width=600
262
263The \f$\hbox {$_s$}{\cal P}lots\f$ (signal on top, background on bottom)
264obtained for \f${m_{\rm ES}}\f$ are represented as dots with error bars.
265They are obtained from a fit using only information from \f$\Delta E\f$ and
266\f${\cal F}\f$
267
268<p>
269Choosing \f${m_{\rm ES}}\f$ and \f$\Delta E\f$ as discriminating variables to
270determine \f$N_1\f$ and \f$N_2\f$ through a maximum Likelihood fit, one builds,
271for the control variable \f${\cal F}\f$ which is unknown to the fit, the two
272\f$\hbox {$_s$}{\cal P}lots\f$ for signal and background shown in
273Fig. 4.
274In the \f$\hbox{$_s$}{\cal P}lot\f$ for signal one observes that error bars are
275the largest in the \f$x\f$ regions where the background is the largest.
276
277
278\image html splot_fisher-bkg-sPlot.png Figure 4 width=600
279
280The \f$\hbox {$_s$}{\cal P}lots\f$ (signal on top, background on bottom) obtained
281for \f${\cal F}\f$ are represented as dots with error bars. They are obtained
282from a fit using only information from \f${m_{\rm ES}}\f$ and \f$\Delta E\f$
283
284The results above can be obtained by running the tutorial TestSPlot.C
285*/
286
287
288////////////////////////////////////////////////////////////////////////////////
289/// default constructor (used by I/O only)
290
292{
293 fNx = 0;
294 fNy=0;
295 fNevents = 0;
296 fNSpecies=0;
297 fNumbersOfEvents=nullptr;
298}
299
300////////////////////////////////////////////////////////////////////////////////
301/// Normal TSPlot constructor
302/// - nx : number of control variables
303/// - ny : number of discriminating variables
304/// - ne : total number of events
305/// - ns : number of species
306/// - tree: input data
307
309{
310 fNx = nx;
311 fNy=ny;
312 fNevents = ne;
313 fNSpecies=ns;
314
319 fTree = tree;
320 fNumbersOfEvents = nullptr;
321}
322
323////////////////////////////////////////////////////////////////////////////////
324/// Destructor
325
327{
329 delete [] fNumbersOfEvents;
330 if (!fXvarHists.IsEmpty())
332 if (!fYvarHists.IsEmpty())
334 if (!fYpdfHists.IsEmpty())
336}
337
338////////////////////////////////////////////////////////////////////////////////
339/// To browse the histograms
340
342{
343 if (!fSWeightsHists.IsEmpty()) {
344 TIter next(&fSWeightsHists);
345 TH1D* h = nullptr;
346 while ((h = (TH1D*)next()))
347 b->Add(h,h->GetName());
348 }
349
350 if (!fYpdfHists.IsEmpty()) {
351 TIter next(&fYpdfHists);
352 TH1D* h = nullptr;
353 while ((h = (TH1D*)next()))
354 b->Add(h,h->GetName());
355 }
356 if (!fYvarHists.IsEmpty()) {
357 TIter next(&fYvarHists);
358 TH1D* h = nullptr;
359 while ((h = (TH1D*)next()))
360 b->Add(h,h->GetName());
361 }
362 if (!fXvarHists.IsEmpty()) {
363 TIter next(&fXvarHists);
364 TH1D* h = nullptr;
365 while ((h = (TH1D*)next()))
366 b->Add(h,h->GetName());
367 }
368 b->Add(&fSWeights, "sWeights");
369}
370
371////////////////////////////////////////////////////////////////////////////////
372/// Set the initial number of events of each species - used
373/// as initial estimates in minuit
374
382
383////////////////////////////////////////////////////////////////////////////////
384/// Calculates the sWeights
385///
386/// The option controls the print level
387/// - "Q" - no print out
388/// - "V" - prints the estimated `#of events` in species - default
389/// - "VV" - as "V" + the minuit printing + sums of weights for control
390
392{
393
394 if (!fNumbersOfEvents){
395 Error("MakeSPlot","Initial numbers of events in species have not been set");
396 return;
397 }
398 Int_t i, j, ispecies;
399
400 TString opt = option;
401 opt.ToUpper();
402 opt.ReplaceAll("VV", "W");
403
404 //make sure that global fitter is minuit
405 char s[]="TFitter";
408 if (strdiff!=0)
410 }
411
412
415
416 //now let's do it, excluding different yvars
417 //for iplot = -1 none is excluded
418 for (Int_t iplot=-1; iplot<fNy; iplot++){
419 for (i=0; i<fNevents; i++){
421 fPdfTot(i, ispecies)=1;
422 for (j=0; j<fNy; j++){
423 if (j!=iplot)
425 }
426 }
427 }
428 minuit->Clear();
429 minuit->SetFCN(Yields);
430 Double_t arglist[10];
431 //set the print level
432 if (opt.Contains("Q")||opt.Contains("V")){
433 arglist[0]=-1;
434 }
435 if (opt.Contains("W"))
436 arglist[0]=0;
437 minuit->ExecuteCommand("SET PRINT", arglist, 1);
438
439 minuit->SetObjectFit(&fPdfTot); //a tricky way to get fPdfTot matrix to fcn
441 minuit->SetParameter(ispecies, "", fNumbersOfEvents[ispecies], 1, 0, 0);
442
443 minuit->ExecuteCommand("MIGRAD", arglist, 0);
446 if (!opt.Contains("Q"))
447 printf("estimated #of events in species %d = %f\n", ispecies, fNumbersOfEvents[ispecies]);
448 }
449 if (!opt.Contains("Q"))
450 printf("\n");
451 Double_t *covmat = minuit->GetCovarianceMatrix();
453
454 if (opt.Contains("W")){
456 for (i=0; i<fNSpecies; i++){
457 sumweight[i]=0;
458 for (j=0; j<fNevents; j++)
459 sumweight[i]+=fSWeights(j, (iplot+1)*fNSpecies + i);
460 printf("checking sum of weights[%d]=%f\n", i, sumweight[i]);
461 }
462 printf("\n");
463 delete [] sumweight;
464 }
465 }
466}
467
468////////////////////////////////////////////////////////////////////////////////
469/// Computes the sWeights from the covariance matrix
470
472{
473 Int_t i, ispecies, k;
474 Double_t numerator, denominator;
475 for (i=0; i<fNevents; i++){
476 denominator=0;
478 denominator+=fNumbersOfEvents[ispecies]*fPdfTot(i, ispecies);
480 numerator=0;
481 for (k=0; k<fNSpecies; k++)
482 numerator+=covmat[ispecies*fNSpecies+k]*fPdfTot(i, k);
483 fSWeights(i, (i_excl+1)*fNSpecies + ispecies)=numerator/denominator;
484 }
485 }
486
487}
488
489////////////////////////////////////////////////////////////////////////////////
490/// Returns the matrix of sweights
491
493{
494 if (weights.GetNcols()!=fNSpecies*(fNy+1) || weights.GetNrows()!=fNevents)
495 weights.ResizeTo(fNevents, fNSpecies*(fNy+1));
496 weights = fSWeights;
497}
498
499////////////////////////////////////////////////////////////////////////////////
500/// Returns the matrix of sweights. It is assumed that the array passed in the
501/// argurment is big enough
502
504{
505 for (Int_t i=0; i<fNevents; i++){
506 for (Int_t j=0; j<fNSpecies; j++){
507 weights[i*fNSpecies+j]=fSWeights(i, j);
508 }
509 }
510}
511
512////////////////////////////////////////////////////////////////////////////////
513/// Fills the histograms of x variables (not weighted) with nbins
514
516{
517 Int_t i, j;
518
519 if (!fXvarHists.IsEmpty()){
520 if (((TH1D*)fXvarHists.First())->GetNbinsX()!=nbins)
522 else
523 return;
524 }
525
526 //make the histograms
527 for (i=0; i<fNx; i++){
528 std::string name = "x" + std::to_string(i);
529 TH1D *h = new TH1D(name.c_str(), name.c_str(), nbins, fMinmax(0, i), fMinmax(1, i));
530 for (j=0; j<fNevents; j++)
531 h->Fill(fXvar(j, i));
533 }
534
535}
536
537////////////////////////////////////////////////////////////////////////////////
538/// Returns the array of histograms of x variables (not weighted).
539/// If histograms have not already
540/// been filled, they are filled with default binning 100.
541
543{
544 Int_t nbins = 100;
545 if (fXvarHists.IsEmpty())
546 FillXvarHists(nbins);
547 else if (((TH1D*)fXvarHists.First())->GetNbinsX()!=nbins)
548 FillXvarHists(nbins);
549 return &fXvarHists;
550}
551
552////////////////////////////////////////////////////////////////////////////////
553///Returns the histogram of variable ixvar.
554/// If histograms have not already
555/// been filled, they are filled with default binning 100.
556
558{
559 Int_t nbins = 100;
560 if (fXvarHists.IsEmpty())
561 FillXvarHists(nbins);
562 else if (((TH1D*)fXvarHists.First())->GetNbinsX()!=nbins)
563 FillXvarHists(nbins);
564
566}
567
568////////////////////////////////////////////////////////////////////////////////
569/// Fill the histograms of y variables
570
572{
573 Int_t i, j;
574
575 if (!fYvarHists.IsEmpty()){
576 if (((TH1D*)fYvarHists.First())->GetNbinsX()!=nbins)
578 else
579 return;
580 }
581
582 //make the histograms
583 for (i=0; i<fNy; i++){
584 std::string name = "y" + std::to_string(i);
585 TH1D *h=new TH1D(name.c_str(), name.c_str(), nbins, fMinmax(0, fNx+i), fMinmax(1, fNx+i));
586 for (j=0; j<fNevents; j++)
587 h->Fill(fYvar(j, i));
589 }
590}
591
592////////////////////////////////////////////////////////////////////////////////
593/// Returns the array of histograms of y variables. If histograms have not already
594/// been filled, they are filled with default binning 100.
595
597{
598 Int_t nbins = 100;
599 if (fYvarHists.IsEmpty())
600 FillYvarHists(nbins);
601 else if (((TH1D*)fYvarHists.First())->GetNbinsX()!=nbins)
602 FillYvarHists(nbins);
603 return &fYvarHists;
604}
605
606////////////////////////////////////////////////////////////////////////////////
607/// Returns the histogram of variable iyvar.If histograms have not already
608/// been filled, they are filled with default binning 100.
609
611{
612 Int_t nbins = 100;
613 if (fYvarHists.IsEmpty())
614 FillYvarHists(nbins);
615 else if (((TH1D*)fYvarHists.First())->GetNbinsX()!=nbins)
616 FillYvarHists(nbins);
618}
619
620////////////////////////////////////////////////////////////////////////////////
621/// Fills the histograms of pdf-s of y variables with binning nbins
622
624{
625 Int_t i, j, ispecies;
626
627 if (!fYpdfHists.IsEmpty()){
628 if (((TH1D*)fYpdfHists.First())->GetNbinsX()!=nbins)
630 else
631 return;
632 }
633
635 for (i=0; i<fNy; i++){
636 std::stringstream nameStream;
637 nameStream << "pdf_species" << ispecies << "_y" << i;
638 std::string name = nameStream.str();
639 //TH1D *h = new TH1D(name, name, nbins, ypdfmin[ispecies*fNy+i], ypdfmax[ispecies*fNy+i]);
640 TH1D *h = new TH1D(name.c_str(), name.c_str(), nbins, fMinmax(0, fNx+fNy+ispecies*fNy+i), fMinmax(1, fNx+fNy+ispecies*fNy+i));
641 for (j=0; j<fNevents; j++)
642 h->Fill(fYpdf(j, ispecies*fNy+i));
644 }
645 }
646}
647
648////////////////////////////////////////////////////////////////////////////////
649/// Returns the array of histograms of pdf's of y variables with binning nbins.
650/// If histograms have not already
651/// been filled, they are filled with default binning 100.
652
654{
655 Int_t nbins = 100;
656 if (fYpdfHists.IsEmpty())
657 FillYpdfHists(nbins);
658
659 return &fYpdfHists;
660}
661
662////////////////////////////////////////////////////////////////////////////////
663/// Returns the histogram of the pdf of variable iyvar for species `#ispecies`, binning nbins.
664/// If histograms have not already
665/// been filled, they are filled with default binning 100.
666
668{
669 Int_t nbins = 100;
670 if (fYpdfHists.IsEmpty())
671 FillYpdfHists(nbins);
672
674}
675
676////////////////////////////////////////////////////////////////////////////////
677/// The order of histograms in the array:
678///
679/// x0_species0, x0_species1,..., x1_species0, x1_species1,..., y0_species0, y0_species1,...
680///
681/// If the histograms have already been filled with a different binning, they are refilled
682/// and all histograms are deleted
683
685{
686 if (fSWeights.GetNoElements()==0){
687 Error("GetSWeightsHists", "SWeights were not computed");
688 return;
689 }
690
691 if (!fSWeightsHists.IsEmpty()){
692 if (((TH1D*)fSWeightsHists.First())->GetNbinsX()!=nbins)
694 else
695 return;
696 }
697
698 //Fill histograms of x-variables weighted with sWeights
699 for (Int_t ivar=0; ivar<fNx; ivar++){
701 std::stringstream nameStream;
702 nameStream << "x" << ivar << "_species" << ispecies;
703 std::string name = nameStream.str();
704 TH1D *h = new TH1D(name.c_str(), name.c_str(), nbins, fMinmax(0, ivar), fMinmax(1, ivar));
705 h->Sumw2();
706 for (Int_t ievent=0; ievent<fNevents; ievent++)
709 }
710 }
711
712 //Fill histograms of y-variables (excluded from the fit), weighted with sWeights
713 for (Int_t iexcl=0; iexcl<fNy; iexcl++){
715 std::stringstream nameStream;
716 nameStream << "y" << iexcl << "_species" << ispecies;
717 std::string name = nameStream.str();
718 TH1D *h = new TH1D(name.c_str(), name.c_str(), nbins, fMinmax(0, fNx+iexcl), fMinmax(1, fNx+iexcl));
719 h->Sumw2();
720 for (Int_t ievent=0; ievent<fNevents; ievent++)
723 }
724 }
725}
726
727////////////////////////////////////////////////////////////////////////////////
728/// Returns an array of all histograms of variables, weighted with sWeights.
729/// If histograms have not been already filled, they are filled with default binning 50
730/// The order of histograms in the array:
731///
732/// x0_species0, x0_species1,..., x1_species0, x1_species1,..., y0_species0, y0_species1,...
733
735{
736 Int_t nbins = 50; //default binning
738 FillSWeightsHists(nbins);
739
740 return &fSWeightsHists;
741}
742
743////////////////////////////////////////////////////////////////////////////////
744/// The Fill...Hist() methods fill the histograms with the real limits on the variables
745/// This method allows to refill the specified histogram with user-set boundaries min and max
746///
747///Parameters:
748///
749/// - type = 1 - histogram of x variable `#%nvar`
750/// - type = 2 - histogram of y variable `#%nvar`
751/// - type = 3 - histogram of y_pdf for y `#%nvar` and species #%nspecies`
752/// - type = 4 - histogram of x variable `#%nvar`, species `#%nspecies`, WITH sWeights
753/// - type = 5 - histogram of y variable `#%nvar`, species `#%nspecies`, WITH sWeights
754
756{
757 if (type<1 || type>5){
758 Error("RefillHist", "type must lie between 1 and 5");
759 return;
760 }
761 char name[20];
762 Int_t j;
763 TH1D *hremove;
764 if (type==1){
766 delete hremove;
767 snprintf(name,20,"x%d",nvar);
768 TH1D *h = new TH1D(name, name, nbins, min, max);
769 for (j=0; j<fNevents;j++)
770 h->Fill(fXvar(j, nvar));
771 fXvarHists.AddAt(h, nvar);
772 }
773 if (type==2){
775 delete hremove;
776 snprintf(name,20, "y%d", nvar);
777 TH1D *h = new TH1D(name, name, nbins, min, max);
778 for (j=0; j<fNevents;j++)
779 h->Fill(fYvar(j, nvar));
780 fXvarHists.AddAt(h, nvar);
781 }
782 if (type==3){
784 delete hremove;
785 snprintf(name,20, "pdf_species%d_y%d", nspecies, nvar);
786 TH1D *h=new TH1D(name, name, nbins, min, max);
787 for (j=0; j<fNevents; j++)
788 h->Fill(fYpdf(j, nspecies*fNy+nvar));
790 }
791 if (type==4){
793 delete hremove;
794 snprintf(name,20, "x%d_species%d", nvar, nspecies);
795 TH1D *h = new TH1D(name, name, nbins, min, max);
796 h->Sumw2();
797 for (Int_t ievent=0; ievent<fNevents; ievent++)
798 h->Fill(fXvar(ievent, nvar), fSWeights(ievent, nspecies));
800 }
801 if (type==5){
803 delete hremove;
804 snprintf(name,20, "y%d_species%d", nvar, nspecies);
805 TH1D *h = new TH1D(name, name, nbins, min, max);
806 h->Sumw2();
807 for (Int_t ievent=0; ievent<fNevents; ievent++)
808 h->Fill(fYvar(ievent, nvar), fSWeights(ievent, nspecies));
810 }
811}
812////////////////////////////////////////////////////////////////////////////////
813/// Returns the histogram of a variable, weighted with sWeights.
814/// - If histograms have not been already filled, they are filled with default binning 50
815/// - If parameter ixvar!=-1, the histogram of x-variable ixvar is returned for species ispecies
816/// - If parameter ixvar==-1, the histogram of y-variable iyexcl is returned for species ispecies
817/// - If the histogram has already been filled and the binning is different from the parameter nbins
818/// all histograms with old binning will be deleted and refilled.
819
821{
822
823 Int_t nbins = 50; //default binning
825 FillSWeightsHists(nbins);
826
827 if (ixvar==-1)
829 else
831
832}
833
834
835////////////////////////////////////////////////////////////////////////////////
836/// Set the input Tree
837
839{
840 fTree = tree;
841}
842
843////////////////////////////////////////////////////////////////////////////////
844///Specifies the variables from the tree to be used for splot
845///
846///Variables fNx, fNy, fNSpecies and fNEvents should already be set!
847///
848///In the 1st parameter it is assumed that first fNx variables are x(control variables),
849///then fNy y variables (discriminating variables),
850///then fNy*fNSpecies ypdf variables (probability distribution functions of discriminating
851///variables for different species). The order of pdfs should be: species0_y0, species0_y1,...
852///species1_y0, species1_y1,...species[fNSpecies-1]_y0...
853///The 2nd parameter allows to make a cut
854///TTree::Draw method description contains more details on specifying expression and selection
855
857{
858 TTreeFormula **var;
859 std::vector<TString> cnames;
861 TSelectorDraw *selector = (TSelectorDraw*)(((TTreePlayer*)fTree->GetPlayer())->GetSelector());
862
864 Int_t i,nch;
865 Int_t ncols;
867
869 if (varexp)
870 fVarexp = new TString(varexp);
871 if (selection)
873
874 nch = varexp ? strlen(varexp) : 0;
875
876
877//*-*- Compile selection expression if there is one
878 TTreeFormula *select = nullptr;
879 if (selection && strlen(selection)) {
880 select = new TTreeFormula("Selection",selection,fTree);
881 if (!select) return;
882 if (!select->GetNdim()) { delete select; return; }
883 formulaList.Add(select);
884 }
885//*-*- if varexp is empty, take first nx + ny + ny*nspecies columns by default
886
887 if (nch == 0) {
888 ncols = fNx + fNy + fNy*fNSpecies;
889 for (i=0;i<ncols;i++) {
890 cnames.push_back( leaves->At(i)->GetName() );
891 }
892//*-*- otherwise select only the specified columns
893 } else {
894 ncols = selector->SplitNames(varexp,cnames);
895 }
896 var = new TTreeFormula* [ncols];
898
900 for (i=0; i<ncols; i++){
901 fMinmax(0, i)=1e30;
902 fMinmax(1, i)=-1e30;
903 }
904
905//*-*- Create the TreeFormula objects corresponding to each column
906 for (i=0;i<ncols;i++) {
907 var[i] = new TTreeFormula("Var1",cnames[i].Data(),fTree);
908 formulaList.Add(var[i]);
909 }
910
911//*-*- Create a TreeFormulaManager to coordinate the formulas
912 TTreeFormulaManager *manager = nullptr;
913 if (formulaList.LastIndex()>=0) {
915 for(i=0;i<=formulaList.LastIndex();i++) {
916 manager->Add((TTreeFormula*)formulaList.At(i));
917 }
918 manager->Sync();
919 formulaList.Clear();
920 }
921//*-*- loop on all selected entries
922 // fSelectedRows = 0;
923 Int_t tnumber = -1;
927 if (entryNumber < 0) break;
929 if (localEntry < 0) break;
930 if (tnumber != fTree->GetTreeNumber()) {
932 if (manager) manager->UpdateFormulaLeaves();
933 }
934 Int_t ndata = 1;
935 if (manager && manager->GetMultiplicity()) {
936 ndata = manager->GetNdata();
937 }
938
939 for(Int_t inst=0;inst<ndata;inst++) {
941 if (select) {
942 if (select->EvalInstance(inst) == 0) {
943 continue;
944 }
945 }
946
947 if (inst==0) loaded = kTRUE;
948 else if (!loaded) {
949 // EvalInstance(0) always needs to be called so that
950 // the proper branches are loaded.
951 for (i=0;i<ncols;i++) {
952 var[i]->EvalInstance(0);
953 }
954 loaded = kTRUE;
955 }
956
957 for (i=0;i<ncols;i++) {
958 xvars[i] = var[i]->EvalInstance(inst);
959 }
960
961 // curentry = entry-firstentry;
962 //printf("event#%d\n", curentry);
963 //for (i=0; i<ncols; i++)
964 // printf("xvars[%d]=%f\n", i, xvars[i]);
965 //selectedrows++;
966 for (i=0; i<fNx; i++){
967 fXvar(selectedrows, i) = xvars[i];
968 if (fXvar(selectedrows, i) < fMinmax(0, i))
969 fMinmax(0, i)=fXvar(selectedrows, i);
970 if (fXvar(selectedrows, i) > fMinmax(1, i))
971 fMinmax(1, i)=fXvar(selectedrows, i);
972 }
973 for (i=0; i<fNy; i++){
974 fYvar(selectedrows, i) = xvars[i+fNx];
975 //printf("y_in_loop(%d, %d)=%f, xvars[%d]=%f\n", selectedrows, i, fYvar(selectedrows, i), i+fNx, xvars[i+fNx]);
976 if (fYvar(selectedrows, i) < fMinmax(0, i+fNx))
977 fMinmax(0, i+fNx) = fYvar(selectedrows, i);
978 if (fYvar(selectedrows, i) > fMinmax(1, i+fNx))
979 fMinmax(1, i+fNx) = fYvar(selectedrows, i);
980 for (Int_t j=0; j<fNSpecies; j++){
981 fYpdf(selectedrows, j*fNy + i)=xvars[j*fNy + i+fNx+fNy];
982 if (fYpdf(selectedrows, j*fNy+i) < fMinmax(0, j*fNy+i+fNx+fNy))
983 fMinmax(0, j*fNy+i+fNx+fNy) = fYpdf(selectedrows, j*fNy+i);
984 if (fYpdf(selectedrows, j*fNy+i) > fMinmax(1, j*fNy+i+fNx+fNy))
985 fMinmax(1, j*fNy+i+fNx+fNy) = fYpdf(selectedrows, j*fNy+i);
986 }
987 }
988 selectedrows++;
989 }
990 }
992 // for (i=0; i<fNevents; i++){
993 // printf("event#%d\n", i);
994 //for (Int_t iy=0; iy<fNy; iy++)
995 // printf("y[%d]=%f\n", iy, fYvar(i, iy));
996 //for (Int_t ispecies=0; ispecies<fNSpecies; ispecies++){
997 // for (Int_t iy=0; iy<fNy; iy++)
998 // printf("ypdf[sp. %d, y %d]=%f\n", ispecies, iy, fYpdf(i, ispecies*fNy+iy));
999 // }
1000 //}
1001
1002 delete [] xvars;
1003 delete [] var;
1004}
1005
1006////////////////////////////////////////////////////////////////////////////////
1007/// FCN-function for Minuit
1008
1009void Yields(Int_t &, Double_t *, Double_t &f, Double_t *x, Int_t /*iflag*/)
1010{
1011 Double_t likelihood;
1012 Int_t i, ispecies;
1013
1015 TMatrixD *pdftot = (TMatrixD*)fitter->GetObjectFit();
1016 Int_t nev = pdftot->GetNrows();
1017 Int_t nes = pdftot->GetNcols();
1018 f=0;
1019 for (i=0; i<nev; i++){
1020 likelihood=0;
1021 for (ispecies=0; ispecies<nes; ispecies++)
1022 likelihood+=x[ispecies]*(*pdftot)(i, ispecies);
1023 if (likelihood<0) likelihood=1;
1024 f+=TMath::Log(likelihood);
1025 }
1026 //extended likelihood, equivalent to chi2
1027 Double_t ntot=0;
1028 for (i=0; i<nes; i++)
1029 ntot += x[i];
1030 f = -2*(f-ntot);
1031}
1032
#define b(i)
Definition RSha256.hxx:100
#define f(i)
Definition RSha256.hxx:104
#define h(i)
Definition RSha256.hxx:106
int Int_t
Signed integer 4 bytes (int)
Definition RtypesCore.h:59
constexpr Bool_t kFALSE
Definition RtypesCore.h:108
double Double_t
Double 8 bytes.
Definition RtypesCore.h:73
long long Long64_t
Portable signed long integer 8 bytes.
Definition RtypesCore.h:83
constexpr Bool_t kTRUE
Definition RtypesCore.h:107
const char Option_t
Option string (const char)
Definition RtypesCore.h:80
ROOT::Detail::TRangeCast< T, true > TRangeDynCast
TRangeDynCast is an adapter class that allows the typed iteration through a TCollection.
Option_t Option_t option
Option_t Option_t TPoint TPoint const char GetTextMagnitude GetFillStyle GetLineColor GetLineWidth GetMarkerStyle GetTextAlign GetTextColor GetTextSize void char Point_t Rectangle_t WindowAttributes_t Float_t Float_t Float_t Int_t Int_t UInt_t UInt_t Rectangle_t Int_t Int_t Window_t TString Int_t GCValues_t GetPrimarySelectionOwner GetDisplay GetScreen GetColormap GetNativeEvent const char const char dpyName wid window const char font_name cursor keysym reg const char only_if_exist regb h Point_t winding char text const char depth char const char Int_t count const char ColorStruct_t color const char Pixmap_t Pixmap_t PictureAttributes_t attr const char char ret_data h unsigned char height h Atom_t Int_t ULong_t ULong_t unsigned char prop_list Atom_t Atom_t Atom_t Time_t type
char name[80]
Definition TGX11.cxx:110
void Yields(Int_t &, Double_t *, Double_t &f, Double_t *x, Int_t iflag)
FCN-function for Minuit.
Definition TSPlot.cxx:1009
#define snprintf
Definition civetweb.c:1579
Using a TBrowser one can browse all ROOT objects.
Definition TBrowser.h:37
1-D histogram with a double per channel (see TH1 documentation)
Definition TH1.h:927
virtual Bool_t Add(TF1 *h1, Double_t c1=1, Option_t *option="")
Performs the operation: this = this + c1*f1 if errors are defined (see TH1::Sumw2),...
Definition TH1.cxx:813
A doubly linked list.
Definition TList.h:38
Int_t GetNrows() const
Int_t GetNoElements() const
Int_t GetNcols() const
TMatrixTBase< Element > & ResizeTo(Int_t nrows, Int_t ncols, Int_t=-1) override
Set size of the matrix to nrows x ncols New dynamic elements are created, the overlapping part of the...
const char * GetName() const override
Returns name of object.
Definition TNamed.h:49
An array of TObjects.
Definition TObjArray.h:31
void AddAt(TObject *obj, Int_t idx) override
Add object at position ids.
void Delete(Option_t *option="") override
Remove all objects from the array AND delete all heap based objects.
void AddLast(TObject *obj) override
Add object in the next empty slot in the array.
TObject * UncheckedAt(Int_t i) const
Definition TObjArray.h:84
TObject * First() const override
Return the object in the first slot.
Bool_t IsEmpty() const override
Definition TObjArray.h:65
TObject * RemoveAt(Int_t idx) override
Remove object at index idx.
void Add(TObject *obj) override
Definition TObjArray.h:68
virtual const char * GetName() const
Returns name of object.
Definition TObject.cxx:457
virtual void Error(const char *method, const char *msgfmt,...) const
Issue error message.
Definition TObject.cxx:1071
void FillYvarHists(Int_t nbins=100)
Fill the histograms of y variables.
Definition TSPlot.cxx:571
TObjArray fXvarHists
Definition TSPlot.h:30
TObjArray fSWeightsHists
Definition TSPlot.h:33
void RefillHist(Int_t type, Int_t var, Int_t nbins, Double_t min, Double_t max, Int_t nspecies=-1)
The Fill...Hist() methods fill the histograms with the real limits on the variables This method allow...
Definition TSPlot.cxx:755
~TSPlot() override
Destructor.
Definition TSPlot.cxx:326
void Browse(TBrowser *b) override
To browse the histograms.
Definition TSPlot.cxx:341
TMatrixD fSWeights
Definition TSPlot.h:28
Int_t fNevents
Definition TSPlot.h:44
Int_t fNy
Definition TSPlot.h:42
Int_t fNSpecies
Definition TSPlot.h:43
TString * fTreename
Definition TSPlot.h:36
void SPlots(Double_t *covmat, Int_t i_excl)
Computes the sWeights from the covariance matrix.
Definition TSPlot.cxx:471
void GetSWeights(TMatrixD &weights)
Returns the matrix of sweights.
Definition TSPlot.cxx:492
TMatrixD fMinmax
Definition TSPlot.h:27
void FillSWeightsHists(Int_t nbins=50)
The order of histograms in the array:
Definition TSPlot.cxx:684
Int_t fNx
Definition TSPlot.h:41
TString * fSelection
Definition TSPlot.h:38
TH1D * GetSWeightsHist(Int_t ixvar, Int_t ispecies, Int_t iyexcl=-1)
Returns the histogram of a variable, weighted with sWeights.
Definition TSPlot.cxx:820
TMatrixD fPdfTot
Definition TSPlot.h:26
void MakeSPlot(Option_t *option="v")
Calculates the sWeights.
Definition TSPlot.cxx:391
TH1D * GetYpdfHist(Int_t iyvar, Int_t ispecies)
Returns the histogram of the pdf of variable iyvar for species #ispecies, binning nbins.
Definition TSPlot.cxx:667
TObjArray * GetSWeightsHists()
Returns an array of all histograms of variables, weighted with sWeights.
Definition TSPlot.cxx:734
TTree * fTree
Definition TSPlot.h:35
TObjArray * GetYpdfHists()
Returns the array of histograms of pdf's of y variables with binning nbins.
Definition TSPlot.cxx:653
void FillXvarHists(Int_t nbins=100)
Fills the histograms of x variables (not weighted) with nbins.
Definition TSPlot.cxx:515
TObjArray * GetXvarHists()
Returns the array of histograms of x variables (not weighted).
Definition TSPlot.cxx:542
TObjArray fYvarHists
Definition TSPlot.h:31
TString * fVarexp
Definition TSPlot.h:37
void SetTree(TTree *tree)
Set the input Tree.
Definition TSPlot.cxx:838
TObjArray * GetYvarHists()
Returns the array of histograms of y variables.
Definition TSPlot.cxx:596
void FillYpdfHists(Int_t nbins=100)
Fills the histograms of pdf-s of y variables with binning nbins.
Definition TSPlot.cxx:623
void SetTreeSelection(const char *varexp="", const char *selection="", Long64_t firstentry=0)
Specifies the variables from the tree to be used for splot.
Definition TSPlot.cxx:856
TMatrixD fXvar
Definition TSPlot.h:23
TSPlot()
default constructor (used by I/O only)
Definition TSPlot.cxx:291
TClass * IsA() const override
Definition TSPlot.h:90
TH1D * GetYvarHist(Int_t iyvar)
Returns the histogram of variable iyvar.If histograms have not already been filled,...
Definition TSPlot.cxx:610
void SetInitialNumbersOfSpecies(Int_t *numbers)
Set the initial number of events of each species - used as initial estimates in minuit.
Definition TSPlot.cxx:375
TObjArray fYpdfHists
Definition TSPlot.h:32
Double_t * fNumbersOfEvents
Definition TSPlot.h:46
TMatrixD fYpdf
Definition TSPlot.h:25
TH1D * GetXvarHist(Int_t ixvar)
Returns the histogram of variable ixvar.
Definition TSPlot.cxx:557
TMatrixD fYvar
Definition TSPlot.h:24
A specialized TSelector for TTree::Draw.
virtual UInt_t SplitNames(const TString &varexp, std::vector< TString > &names)
Build Index array for names in varexp.
Basic string class.
Definition TString.h:138
TString & ReplaceAll(const TString &s1, const TString &s2)
Definition TString.h:712
void ToUpper()
Change string to upper case.
Definition TString.cxx:1202
Bool_t Contains(const char *pat, ECaseCompare cmp=kExact) const
Definition TString.h:640
Used to coordinate one or more TTreeFormula objects.
Used to pass a selection expression to the Tree drawing routine.
T EvalInstance(Int_t i=0, const char *stringStack[]=nullptr)
Evaluate this treeformula.
Implement some of the functionality of the class TTree requiring access to extra libraries (Histogram...
Definition TTreePlayer.h:37
A TTree represents a columnar dataset.
Definition TTree.h:89
virtual TObjArray * GetListOfLeaves()
Definition TTree.h:568
TVirtualTreePlayer * GetPlayer()
Load the TTreePlayer (if not already done).
Definition TTree.cxx:6386
virtual Long64_t GetEntryNumber(Long64_t entry) const
Return entry number corresponding to entry.
Definition TTree.cxx:5945
virtual Long64_t LoadTree(Long64_t entry)
Set current entry.
Definition TTree.cxx:6554
virtual Int_t GetTreeNumber() const
Definition TTree.h:598
Abstract Base Class for Fitting.
virtual TObject * GetObjectFit() const
static TVirtualFitter * GetFitter()
static: return the current Fitter
static TVirtualFitter * Fitter(TObject *obj, Int_t maxpar=25)
Static function returning a pointer to the current fitter.
Double_t x[n]
Definition legend1.C:17
Double_t Log(Double_t x)
Returns the natural logarithm of x.
Definition TMath.h:767