Logo ROOT  
Reference Guide
 
All Classes Namespaces Files Functions Variables Typedefs Enumerations Enumerator Properties Friends Macros Modules Pages
Loading...
Searching...
No Matches
TSPlot.cxx
Go to the documentation of this file.
1// @(#)root/splot:$Id$
2// Author: Muriel Pivk, Anna Kreshuk 10/2005
3
4/**********************************************************************
5 * *
6 * Copyright (c) 2005 ROOT Foundation, CERN/PH-SFT *
7 * *
8 **********************************************************************/
9
10#include "TSPlot.h"
11#include "TVirtualFitter.h"
12#include "TH1.h"
13#include "TTreePlayer.h"
14#include "TTreeFormula.h"
15#include "TTreeFormulaManager.h"
16#include "TSelectorDraw.h"
17#include "TBrowser.h"
18#include "TClass.h"
19#include "TMath.h"
20#include "snprintf.h"
21
22#include <sstream>
23#include <string>
24
25extern void Yields(Int_t &, Double_t *, Double_t &f, Double_t *x, Int_t iflag);
26
28
29/** \class TSPlot
30
31\legacy{TSPlot}
32
33A common method used in High Energy Physics to perform measurements is
34the maximum Likelihood method, exploiting discriminating variables to
35disentangle signal from background. The crucial point for such an
36analysis to be reliable is to use an exhaustive list of sources of
37events combined with an accurate description of all the Probability
38Density Functions (PDF).
39
40To assess the validity of the fit, a convincing quality check
41is to explore further the data sample by examining the distributions of
42control variables. A control variable can be obtained for instance by
43removing one of the discriminating variables before performing again
44the maximum Likelihood fit: this removed variable is a control
45variable. The expected distribution of this control variable, for
46signal, is to be compared to the one extracted, for signal, from the
47data sample. In order to be able to do so, one must be able to unfold
48from the distribution of the whole data sample.
49
50The TSPlot method allows to reconstruct the distributions for
51the control variable, independently for each of the various sources of
52events, without making use of any <em>a priori</em> knowledge on <u>this</u>
53variable. The aim is thus to use the knowledge available for the
54discriminating variables to infer the behaviour of the individual
55sources of events with respect to the control variable.
56
57TSPlot is optimal if the control variable is uncorrelated with the discriminating variables.
58
59
60A detail description of the formalism itself, called \f$\hbox{$_s$}{\cal P}lot\f$, is given
61in [[1](https://arxiv.org/abs/physics/0402083)].
62
63### The method
64
65
66The \f$\hbox{$_s$}{\cal P}lot\f$ technique is developed in the above context of a
67maximum Likelihood method making use of discriminating variables.
68
69One considers a data sample in which are merged several species
70of events. These species represent various signal components and
71background components which all together account for the data sample.
72The different terms of the log-Likelihood are:
73
74 - \f$N\f$ : the total number of events in the data sample,
75 - \f${\rm N}_{\rm s}\f$ : the number of species of events populating the data sample,
76 - \f$N_i\f$ : the number of events expected on the average for the \f$i^{\rm th}\f$ species,
77 - \f${\rm f}_i(y_e)\f$ : the value of the PDFs of the discriminating variables
78 \f$y\f$ for the\f$i^{th}\f$ species and for event\f$e\f$,
79 - \f$x\f$ : the set of control variables which, by definition, do not appear in
80 the expression of the Likelihood function \f${\cal L}\f$.
81
82The extended log-Likelihood reads:
83
84 \f[
85{\cal L}=\sum_{e=1}^{N}\ln \Big\{ \sum_{i=1}^{{\rm N}_{\rm s}}N_i{\rm f}_i(y_e) \Big\} -\sum_{i=1}^{{\rm N}_{\rm s}}N_i \tag{1}
86\f]
87
88From this expression, after maximization of \f${\cal L}\f$ with respect to the \f$N_i\f$ parameters,
89a weight can be computed for every event and each species, in order to obtain later the true distribution
90\f$\hbox{M}_i(x)\f$ of variable \f$x\f$. If \f${\rm n}\f$ is one of the
91 \f${\rm N}_{\rm s}\f$ species present in the data sample, the weight for this species is defined by:
92
93
94\f[
95\fbox{$
96{_s{\cal P}}_{\rm n}(y_e)={\sum_{j=1}^{{\rm N}_{\rm s}} \hbox{V}_{{\rm n}j}{\rm f}_j(y_e)\over\sum_{k=1}^{{\rm N}_{\rm s}}N_k{\rm f}_k(y_e) } $} , \tag{2}
97\f]
98
99
100where \f$\hbox{V}_{{\rm n}j}\f$
101
102is the covariance matrix resulting from the Likelihood maximization.
103This matrix can be used directly from the fit, but this is numerically
104less accurate than the direct computation:
105
106
107\f[
108\hbox{ V}^{-1}_{{\rm n}j}~=~
109{\partial^2(-{\cal L})\over\partial N_{\rm n}\partial N_j}~=~
110\sum_{e=1}^N {{\rm f}_{\rm n}(y_e){\rm f}_j(y_e)\over(\sum_{k=1}^{{\rm N}_{\rm s}}N_k{\rm f}_k(y_e))^2} . \tag{3}
111\f]
112
113
114The distribution of the control variable \f$x\f$ obtained by histogramming the weighted
115events reproduces, on average, the true distribution
116\f${\hbox{ {M}}}_{\rm n}(x)\f$
117
118The class TSPlot allows to reconstruct the true distribution
119\f${\hbox{ {M}}}_{\rm n}(x)\f$
120
121of a control variable \f$x\f$ for each of the \f${\rm N}_{\rm s}\f$ species from
122the sole knowledge of the PDFs of the discriminating variables \f${\rm f}_i(y)\f$.
123The plots obtained thanks to the TSPlot class are called \f$\hbox {$_s$}{\cal P}lots\f$.
124
125
126### Some properties and checks
127
128
129Beside reproducing the true distribution,\f$\hbox {$_s$}{\cal P}lots\f$ bear remarkable properties:
130
131
132 - Each \f$x\f$ - distribution is properly normalized:
133
134\f[
135\sum_{e=1}^{N} {_s{\cal P}}_{\rm n}(y_e)~=~N_{\rm n} ~. \tag{4}
136\f]
137
138
139 - For any event:
140
141\f[
142\sum_{l=1}^{{\rm N}_{\rm s}} {_s{\cal P}}_l(y_e) ~=~1 ~. \tag{5}
143\f]
144
145That is to say that, summing up the \f${\rm N}_{\rm s}\f$ \f$\hbox {$_s$}{\cal P}lots\f$,
146one recovers the data sample distribution in \f$x\f$, and summing up the number of events
147entering in a \f$\hbox{$_s$}{\cal P}lot\f$ for a given species, one recovers the yield of the
148species, as provided by the fit.
149The property 4 is implemented in the TSPlot class as a check.
150
151 - the sum of the statistical uncertainties per bin
152
153\f[
154\sigma[N_{\rm n}\ _s\tilde{\rm M}_{\rm n}(x) {\delta x}]~=~\sqrt{\sum_{e \subset {\delta x}} ({_s{\cal P}}_{\rm n})^2} ~. \tag{6}
155\f]
156
157reproduces the statistical uncertainty on the yield \f$N_{\rm n}\f$, as provided by the fit:
158\f$\sigma[N_{\rm n}]\equiv\sqrt{\hbox{ V}_{{\rm n}{\rm n}}}\f$ .
159Because of that and since the determination of the yields is optimal
160when obtained using a Likelihood fit, one can conclude that the \f$\hbox{$_s$}{\cal P}lot\f$
161technique is itself an optimal method to reconstruct distributions of control variables.
162
163
164### Different steps followed by TSPlot
165
166
167 1. A maximum Likelihood fit is performed to obtain the yields \f$N_i\f$
168 of the various species.The fit relies on discriminating variables \f$y\f$
169 uncorrelated with a control variable \f$x\f$:
170 the later is therefore totally absent from the fit.
171
172 2. The weights \f${_s{\cal P}}\f$ are calculated using Eq.
173 (2)
174 where the covariance matrix is taken from Minuit.
175
176 3. Histograms of \f$x\f$ are filled by weighting the events with \f${_s{\cal P}}\f$ .
177
178 4. Error bars per bin are given by Eq. (6).
179
180
181The \f$\hbox {$_s$}{\cal P}lots\f$ reproduce the true distributions of the species
182in the control variable \f$x\f$, within the above defined statistical uncertainties.
183
184### Illustrations
185
186
187To illustrate the technique, one considers an example derived from the analysis where
188\f$\hbox {$_s$}{\cal P}lots\f$
189have been first used (charmless B decays). One is dealing with a data
190sample in which two species are present: the first is termed signal and
191the second background. A maximum Likelihood fit is performed to obtain
192the two yields \f$N_1\f$ and \f$N_2\f$ . The fit relies on two discriminating
193variables collectively denoted \f$y\f$ which are chosen within three possible
194variables denoted \f${m_{\rm ES}}\f$ , \f$\Delta E\f$ and \f${\cal F}\f$.
195The variable which is not incorporated in \f$y\f$ is used as the control variable
196\f$x\f$ . The six distributions of the three variables are assumed to be the ones
197depicted in Fig. 1.
198
199
200\image html splot_pdfmesNIM.png Figure 1 width=800
201
202Distributions of the three discriminating variables available to perform the Likelihood fit:
203\f${m_{\rm ES}}\f$ , \f$\Delta E\f$ , \f${\cal F}\f$ .
204Among the three variables, two are used to perform the fit while one is
205kept out of the fit to serve the purpose of a control variable. The
206three distributions on the top (resp. bottom) of the figure correspond
207to the signal (resp. background). The unit of the vertical axis is
208chosen such that it indicates the number of entries per bin, if one
209slices the histograms in 25 bins.
210
211A data sample being built through a Monte Carlo simulation based on the
212distributions shown in Fig.
2131,
214one obtains the three distributions of Fig. 2.
215Whereas the distribution of \f$\Delta E\f$ clearly indicates the presence of the signal,
216the distribution of \f${m_{\rm ES}}\f$ and \f${\cal F}\f$ are less obviously populated by signal.
217
218
219\image html splot_genfiTOTNIM.png Figure 2 width=800
220
221Distributions of the three discriminating variables for signal plus
222background. The three distributions are the ones obtained from a data
223sample obtained through a Monte Carlo simulation based on the
224distributions shown in Fig.
2251.
226The data sample consists of 500 signal events and 5000 background events.
227
228
229Choosing \f$\Delta E\f$ and \f${\cal F}\f$ as discriminating variables to determine
230\f$N_1\f$ and \f$N_2\f$ through a maximum Likelihood fit, one builds, for the control
231variable \f${m_{\rm ES}}\f$ which is unknown to the fit, the two \f$\hbox {$_s$}{\cal P}lots\f$
232for signal and background shown in
233Fig. 3.
234One observes that the \f$\hbox{$_s$}{\cal P}lot\f$
235for signal reproduces correctly the PDF even where the latter vanishes,
236although the error bars remain sizeable. This results from the almost
237complete cancellation between positive and negative weights: the sum of
238weights is close to zero while the sum of weights squared is not. The
239occurrence of negative weights occurs through the appearance of the
240covariance matrix, and its negative components, in the definition of
241Eq. (2).
242
243
244A word of caution is in order with respect to the error bars. Whereas
245their sum in quadrature is identical to the statistical uncertainties
246of the yields determined by the fit, and if, in addition, they are
247asymptotically correct, the error bars should be handled with care for
248low statistics and/or for too fine binning. This is because the error
249bars do not incorporate two known properties of the PDFs: PDFs are
250positive definite and can be non-zero in a given x-bin, even if in the
251particular data sample at hand, no event is observed in this bin. The
252latter limitation is not specific to \f$\hbox {$_s$}{\cal P}lots\f$ ,
253rather it is always present when one is willing to infer the PDF at the
254origin of an histogram, when, for some bins, the number of entries does
255not guaranty the applicability of the Gaussian regime. In such
256situations, a satisfactory practice is to attach allowed ranges to the
257histogram to indicate the upper and lower limits of the PDF value which
258are consistent with the actual observation, at a given confidence
259level.
260
261
262\image html splot_mass-bkg-sPlot.png Figure 3 width=600
263
264The \f$\hbox {$_s$}{\cal P}lots\f$ (signal on top, background on bottom)
265obtained for \f${m_{\rm ES}}\f$ are represented as dots with error bars.
266They are obtained from a fit using only information from \f$\Delta E\f$ and
267\f${\cal F}\f$
268
269<p>
270Choosing \f${m_{\rm ES}}\f$ and \f$\Delta E\f$ as discriminating variables to
271determine \f$N_1\f$ and \f$N_2\f$ through a maximum Likelihood fit, one builds,
272for the control variable \f${\cal F}\f$ which is unknown to the fit, the two
273\f$\hbox {$_s$}{\cal P}lots\f$ for signal and background shown in
274Fig. 4.
275In the \f$\hbox{$_s$}{\cal P}lot\f$ for signal one observes that error bars are
276the largest in the \f$x\f$ regions where the background is the largest.
277
278
279\image html splot_fisher-bkg-sPlot.png Figure 4 width=600
280
281The \f$\hbox {$_s$}{\cal P}lots\f$ (signal on top, background on bottom) obtained
282for \f${\cal F}\f$ are represented as dots with error bars. They are obtained
283from a fit using only information from \f${m_{\rm ES}}\f$ and \f$\Delta E\f$
284
285The results above can be obtained by running the tutorial TestSPlot.C
286*/
287
288
289////////////////////////////////////////////////////////////////////////////////
290/// default constructor (used by I/O only)
291
293{
294 fNx = 0;
295 fNy=0;
296 fNevents = 0;
297 fNSpecies=0;
298 fNumbersOfEvents=nullptr;
299}
300
301////////////////////////////////////////////////////////////////////////////////
302/// Normal TSPlot constructor
303/// - nx : number of control variables
304/// - ny : number of discriminating variables
305/// - ne : total number of events
306/// - ns : number of species
307/// - tree: input data
308
310{
311 fNx = nx;
312 fNy=ny;
313 fNevents = ne;
314 fNSpecies=ns;
315
320 fTree = tree;
321 fNumbersOfEvents = nullptr;
322}
323
324////////////////////////////////////////////////////////////////////////////////
325/// Destructor
326
328{
330 delete [] fNumbersOfEvents;
331 if (!fXvarHists.IsEmpty())
333 if (!fYvarHists.IsEmpty())
335 if (!fYpdfHists.IsEmpty())
337}
338
339////////////////////////////////////////////////////////////////////////////////
340/// To browse the histograms
341
343{
344 if (!fSWeightsHists.IsEmpty()) {
345 TIter next(&fSWeightsHists);
346 TH1D* h = nullptr;
347 while ((h = (TH1D*)next()))
348 b->Add(h,h->GetName());
349 }
350
351 if (!fYpdfHists.IsEmpty()) {
352 TIter next(&fYpdfHists);
353 TH1D* h = nullptr;
354 while ((h = (TH1D*)next()))
355 b->Add(h,h->GetName());
356 }
357 if (!fYvarHists.IsEmpty()) {
358 TIter next(&fYvarHists);
359 TH1D* h = nullptr;
360 while ((h = (TH1D*)next()))
361 b->Add(h,h->GetName());
362 }
363 if (!fXvarHists.IsEmpty()) {
364 TIter next(&fXvarHists);
365 TH1D* h = nullptr;
366 while ((h = (TH1D*)next()))
367 b->Add(h,h->GetName());
368 }
369 b->Add(&fSWeights, "sWeights");
370}
371
372////////////////////////////////////////////////////////////////////////////////
373/// Set the initial number of events of each species - used
374/// as initial estimates in minuit
375
383
384////////////////////////////////////////////////////////////////////////////////
385/// Calculates the sWeights
386///
387/// The option controls the print level
388/// - "Q" - no print out
389/// - "V" - prints the estimated `#of events` in species - default
390/// - "VV" - as "V" + the minuit printing + sums of weights for control
391
393{
394
395 if (!fNumbersOfEvents){
396 Error("MakeSPlot","Initial numbers of events in species have not been set");
397 return;
398 }
399 Int_t i, j, ispecies;
400
401 TString opt = option;
402 opt.ToUpper();
403 opt.ReplaceAll("VV", "W");
404
405 //make sure that global fitter is minuit
406 char s[]="TFitter";
409 if (strdiff!=0)
411 }
412
413
416
417 //now let's do it, excluding different yvars
418 //for iplot = -1 none is excluded
419 for (Int_t iplot=-1; iplot<fNy; iplot++){
420 for (i=0; i<fNevents; i++){
422 fPdfTot(i, ispecies)=1;
423 for (j=0; j<fNy; j++){
424 if (j!=iplot)
426 }
427 }
428 }
429 minuit->Clear();
430 minuit->SetFCN(Yields);
431 Double_t arglist[10];
432 //set the print level
433 if (opt.Contains("Q")||opt.Contains("V")){
434 arglist[0]=-1;
435 }
436 if (opt.Contains("W"))
437 arglist[0]=0;
438 minuit->ExecuteCommand("SET PRINT", arglist, 1);
439
440 minuit->SetObjectFit(&fPdfTot); //a tricky way to get fPdfTot matrix to fcn
442 minuit->SetParameter(ispecies, "", fNumbersOfEvents[ispecies], 1, 0, 0);
443
444 minuit->ExecuteCommand("MIGRAD", arglist, 0);
447 if (!opt.Contains("Q"))
448 printf("estimated #of events in species %d = %f\n", ispecies, fNumbersOfEvents[ispecies]);
449 }
450 if (!opt.Contains("Q"))
451 printf("\n");
452 Double_t *covmat = minuit->GetCovarianceMatrix();
454
455 if (opt.Contains("W")){
457 for (i=0; i<fNSpecies; i++){
458 sumweight[i]=0;
459 for (j=0; j<fNevents; j++)
460 sumweight[i]+=fSWeights(j, (iplot+1)*fNSpecies + i);
461 printf("checking sum of weights[%d]=%f\n", i, sumweight[i]);
462 }
463 printf("\n");
464 delete [] sumweight;
465 }
466 }
467}
468
469////////////////////////////////////////////////////////////////////////////////
470/// Computes the sWeights from the covariance matrix
471
473{
474 Int_t i, ispecies, k;
475 Double_t numerator, denominator;
476 for (i=0; i<fNevents; i++){
477 denominator=0;
479 denominator+=fNumbersOfEvents[ispecies]*fPdfTot(i, ispecies);
481 numerator=0;
482 for (k=0; k<fNSpecies; k++)
483 numerator+=covmat[ispecies*fNSpecies+k]*fPdfTot(i, k);
484 fSWeights(i, (i_excl+1)*fNSpecies + ispecies)=numerator/denominator;
485 }
486 }
487
488}
489
490////////////////////////////////////////////////////////////////////////////////
491/// Returns the matrix of sweights
492
494{
495 if (weights.GetNcols()!=fNSpecies*(fNy+1) || weights.GetNrows()!=fNevents)
496 weights.ResizeTo(fNevents, fNSpecies*(fNy+1));
497 weights = fSWeights;
498}
499
500////////////////////////////////////////////////////////////////////////////////
501/// Returns the matrix of sweights. It is assumed that the array passed in the
502/// argurment is big enough
503
505{
506 for (Int_t i=0; i<fNevents; i++){
507 for (Int_t j=0; j<fNSpecies; j++){
508 weights[i*fNSpecies+j]=fSWeights(i, j);
509 }
510 }
511}
512
513////////////////////////////////////////////////////////////////////////////////
514/// Fills the histograms of x variables (not weighted) with nbins
515
517{
518 Int_t i, j;
519
520 if (!fXvarHists.IsEmpty()){
521 if (((TH1D*)fXvarHists.First())->GetNbinsX()!=nbins)
523 else
524 return;
525 }
526
527 //make the histograms
528 for (i=0; i<fNx; i++){
529 std::string name = "x" + std::to_string(i);
530 TH1D *h = new TH1D(name.c_str(), name.c_str(), nbins, fMinmax(0, i), fMinmax(1, i));
531 for (j=0; j<fNevents; j++)
532 h->Fill(fXvar(j, i));
534 }
535
536}
537
538////////////////////////////////////////////////////////////////////////////////
539/// Returns the array of histograms of x variables (not weighted).
540/// If histograms have not already
541/// been filled, they are filled with default binning 100.
542
544{
545 Int_t nbins = 100;
546 if (fXvarHists.IsEmpty())
547 FillXvarHists(nbins);
548 else if (((TH1D*)fXvarHists.First())->GetNbinsX()!=nbins)
549 FillXvarHists(nbins);
550 return &fXvarHists;
551}
552
553////////////////////////////////////////////////////////////////////////////////
554///Returns the histogram of variable ixvar.
555/// If histograms have not already
556/// been filled, they are filled with default binning 100.
557
559{
560 Int_t nbins = 100;
561 if (fXvarHists.IsEmpty())
562 FillXvarHists(nbins);
563 else if (((TH1D*)fXvarHists.First())->GetNbinsX()!=nbins)
564 FillXvarHists(nbins);
565
567}
568
569////////////////////////////////////////////////////////////////////////////////
570/// Fill the histograms of y variables
571
573{
574 Int_t i, j;
575
576 if (!fYvarHists.IsEmpty()){
577 if (((TH1D*)fYvarHists.First())->GetNbinsX()!=nbins)
579 else
580 return;
581 }
582
583 //make the histograms
584 for (i=0; i<fNy; i++){
585 std::string name = "y" + std::to_string(i);
586 TH1D *h=new TH1D(name.c_str(), name.c_str(), nbins, fMinmax(0, fNx+i), fMinmax(1, fNx+i));
587 for (j=0; j<fNevents; j++)
588 h->Fill(fYvar(j, i));
590 }
591}
592
593////////////////////////////////////////////////////////////////////////////////
594/// Returns the array of histograms of y variables. If histograms have not already
595/// been filled, they are filled with default binning 100.
596
598{
599 Int_t nbins = 100;
600 if (fYvarHists.IsEmpty())
601 FillYvarHists(nbins);
602 else if (((TH1D*)fYvarHists.First())->GetNbinsX()!=nbins)
603 FillYvarHists(nbins);
604 return &fYvarHists;
605}
606
607////////////////////////////////////////////////////////////////////////////////
608/// Returns the histogram of variable iyvar.If histograms have not already
609/// been filled, they are filled with default binning 100.
610
612{
613 Int_t nbins = 100;
614 if (fYvarHists.IsEmpty())
615 FillYvarHists(nbins);
616 else if (((TH1D*)fYvarHists.First())->GetNbinsX()!=nbins)
617 FillYvarHists(nbins);
619}
620
621////////////////////////////////////////////////////////////////////////////////
622/// Fills the histograms of pdf-s of y variables with binning nbins
623
625{
626 Int_t i, j, ispecies;
627
628 if (!fYpdfHists.IsEmpty()){
629 if (((TH1D*)fYpdfHists.First())->GetNbinsX()!=nbins)
631 else
632 return;
633 }
634
636 for (i=0; i<fNy; i++){
637 std::stringstream nameStream;
638 nameStream << "pdf_species" << ispecies << "_y" << i;
639 std::string name = nameStream.str();
640 //TH1D *h = new TH1D(name, name, nbins, ypdfmin[ispecies*fNy+i], ypdfmax[ispecies*fNy+i]);
641 TH1D *h = new TH1D(name.c_str(), name.c_str(), nbins, fMinmax(0, fNx+fNy+ispecies*fNy+i), fMinmax(1, fNx+fNy+ispecies*fNy+i));
642 for (j=0; j<fNevents; j++)
643 h->Fill(fYpdf(j, ispecies*fNy+i));
645 }
646 }
647}
648
649////////////////////////////////////////////////////////////////////////////////
650/// Returns the array of histograms of pdf's of y variables with binning nbins.
651/// If histograms have not already
652/// been filled, they are filled with default binning 100.
653
655{
656 Int_t nbins = 100;
657 if (fYpdfHists.IsEmpty())
658 FillYpdfHists(nbins);
659
660 return &fYpdfHists;
661}
662
663////////////////////////////////////////////////////////////////////////////////
664/// Returns the histogram of the pdf of variable iyvar for species `#ispecies`, binning nbins.
665/// If histograms have not already
666/// been filled, they are filled with default binning 100.
667
669{
670 Int_t nbins = 100;
671 if (fYpdfHists.IsEmpty())
672 FillYpdfHists(nbins);
673
675}
676
677////////////////////////////////////////////////////////////////////////////////
678/// The order of histograms in the array:
679///
680/// x0_species0, x0_species1,..., x1_species0, x1_species1,..., y0_species0, y0_species1,...
681///
682/// If the histograms have already been filled with a different binning, they are refilled
683/// and all histograms are deleted
684
686{
687 if (fSWeights.GetNoElements()==0){
688 Error("GetSWeightsHists", "SWeights were not computed");
689 return;
690 }
691
692 if (!fSWeightsHists.IsEmpty()){
693 if (((TH1D*)fSWeightsHists.First())->GetNbinsX()!=nbins)
695 else
696 return;
697 }
698
699 //Fill histograms of x-variables weighted with sWeights
700 for (Int_t ivar=0; ivar<fNx; ivar++){
702 std::stringstream nameStream;
703 nameStream << "x" << ivar << "_species" << ispecies;
704 std::string name = nameStream.str();
705 TH1D *h = new TH1D(name.c_str(), name.c_str(), nbins, fMinmax(0, ivar), fMinmax(1, ivar));
706 h->Sumw2();
707 for (Int_t ievent=0; ievent<fNevents; ievent++)
710 }
711 }
712
713 //Fill histograms of y-variables (excluded from the fit), weighted with sWeights
714 for (Int_t iexcl=0; iexcl<fNy; iexcl++){
716 std::stringstream nameStream;
717 nameStream << "y" << iexcl << "_species" << ispecies;
718 std::string name = nameStream.str();
719 TH1D *h = new TH1D(name.c_str(), name.c_str(), nbins, fMinmax(0, fNx+iexcl), fMinmax(1, fNx+iexcl));
720 h->Sumw2();
721 for (Int_t ievent=0; ievent<fNevents; ievent++)
724 }
725 }
726}
727
728////////////////////////////////////////////////////////////////////////////////
729/// Returns an array of all histograms of variables, weighted with sWeights.
730/// If histograms have not been already filled, they are filled with default binning 50
731/// The order of histograms in the array:
732///
733/// x0_species0, x0_species1,..., x1_species0, x1_species1,..., y0_species0, y0_species1,...
734
736{
737 Int_t nbins = 50; //default binning
739 FillSWeightsHists(nbins);
740
741 return &fSWeightsHists;
742}
743
744////////////////////////////////////////////////////////////////////////////////
745/// The Fill...Hist() methods fill the histograms with the real limits on the variables
746/// This method allows to refill the specified histogram with user-set boundaries min and max
747///
748///Parameters:
749///
750/// - type = 1 - histogram of x variable `#%nvar`
751/// - type = 2 - histogram of y variable `#%nvar`
752/// - type = 3 - histogram of y_pdf for y `#%nvar` and species #%nspecies`
753/// - type = 4 - histogram of x variable `#%nvar`, species `#%nspecies`, WITH sWeights
754/// - type = 5 - histogram of y variable `#%nvar`, species `#%nspecies`, WITH sWeights
755
757{
758 if (type<1 || type>5){
759 Error("RefillHist", "type must lie between 1 and 5");
760 return;
761 }
762 char name[20];
763 Int_t j;
764 TH1D *hremove;
765 if (type==1){
767 delete hremove;
768 snprintf(name,20,"x%d",nvar);
769 TH1D *h = new TH1D(name, name, nbins, min, max);
770 for (j=0; j<fNevents;j++)
771 h->Fill(fXvar(j, nvar));
772 fXvarHists.AddAt(h, nvar);
773 }
774 if (type==2){
776 delete hremove;
777 snprintf(name,20, "y%d", nvar);
778 TH1D *h = new TH1D(name, name, nbins, min, max);
779 for (j=0; j<fNevents;j++)
780 h->Fill(fYvar(j, nvar));
781 fXvarHists.AddAt(h, nvar);
782 }
783 if (type==3){
785 delete hremove;
786 snprintf(name,20, "pdf_species%d_y%d", nspecies, nvar);
787 TH1D *h=new TH1D(name, name, nbins, min, max);
788 for (j=0; j<fNevents; j++)
789 h->Fill(fYpdf(j, nspecies*fNy+nvar));
791 }
792 if (type==4){
794 delete hremove;
795 snprintf(name,20, "x%d_species%d", nvar, nspecies);
796 TH1D *h = new TH1D(name, name, nbins, min, max);
797 h->Sumw2();
798 for (Int_t ievent=0; ievent<fNevents; ievent++)
799 h->Fill(fXvar(ievent, nvar), fSWeights(ievent, nspecies));
801 }
802 if (type==5){
804 delete hremove;
805 snprintf(name,20, "y%d_species%d", nvar, nspecies);
806 TH1D *h = new TH1D(name, name, nbins, min, max);
807 h->Sumw2();
808 for (Int_t ievent=0; ievent<fNevents; ievent++)
809 h->Fill(fYvar(ievent, nvar), fSWeights(ievent, nspecies));
811 }
812}
813////////////////////////////////////////////////////////////////////////////////
814/// Returns the histogram of a variable, weighted with sWeights.
815/// - If histograms have not been already filled, they are filled with default binning 50
816/// - If parameter ixvar!=-1, the histogram of x-variable ixvar is returned for species ispecies
817/// - If parameter ixvar==-1, the histogram of y-variable iyexcl is returned for species ispecies
818/// - If the histogram has already been filled and the binning is different from the parameter nbins
819/// all histograms with old binning will be deleted and refilled.
820
822{
823
824 Int_t nbins = 50; //default binning
826 FillSWeightsHists(nbins);
827
828 if (ixvar==-1)
830 else
832
833}
834
835
836////////////////////////////////////////////////////////////////////////////////
837/// Set the input Tree
838
840{
841 fTree = tree;
842}
843
844////////////////////////////////////////////////////////////////////////////////
845///Specifies the variables from the tree to be used for splot
846///
847///Variables fNx, fNy, fNSpecies and fNEvents should already be set!
848///
849///In the 1st parameter it is assumed that first fNx variables are x(control variables),
850///then fNy y variables (discriminating variables),
851///then fNy*fNSpecies ypdf variables (probability distribution functions of discriminating
852///variables for different species). The order of pdfs should be: species0_y0, species0_y1,...
853///species1_y0, species1_y1,...species[fNSpecies-1]_y0...
854///The 2nd parameter allows to make a cut
855///TTree::Draw method description contains more details on specifying expression and selection
856
858{
859 TTreeFormula **var;
860 std::vector<TString> cnames;
862 TSelectorDraw *selector = (TSelectorDraw*)(((TTreePlayer*)fTree->GetPlayer())->GetSelector());
863
865 Int_t i,nch;
866 Int_t ncols;
868
870 if (varexp)
871 fVarexp = new TString(varexp);
872 if (selection)
874
875 nch = varexp ? strlen(varexp) : 0;
876
877
878//*-*- Compile selection expression if there is one
879 TTreeFormula *select = nullptr;
880 if (selection && strlen(selection)) {
881 select = new TTreeFormula("Selection",selection,fTree);
882 if (!select) return;
883 if (!select->GetNdim()) { delete select; return; }
884 formulaList.Add(select);
885 }
886//*-*- if varexp is empty, take first nx + ny + ny*nspecies columns by default
887
888 if (nch == 0) {
889 ncols = fNx + fNy + fNy*fNSpecies;
890 for (i=0;i<ncols;i++) {
891 cnames.push_back( leaves->At(i)->GetName() );
892 }
893//*-*- otherwise select only the specified columns
894 } else {
895 ncols = selector->SplitNames(varexp,cnames);
896 }
897 var = new TTreeFormula* [ncols];
899
901 for (i=0; i<ncols; i++){
902 fMinmax(0, i)=1e30;
903 fMinmax(1, i)=-1e30;
904 }
905
906//*-*- Create the TreeFormula objects corresponding to each column
907 for (i=0;i<ncols;i++) {
908 var[i] = new TTreeFormula("Var1",cnames[i].Data(),fTree);
909 formulaList.Add(var[i]);
910 }
911
912//*-*- Create a TreeFormulaManager to coordinate the formulas
913 TTreeFormulaManager *manager = nullptr;
914 if (formulaList.LastIndex()>=0) {
916 for(i=0;i<=formulaList.LastIndex();i++) {
917 manager->Add((TTreeFormula*)formulaList.At(i));
918 }
919 manager->Sync();
920 formulaList.Clear();
921 }
922//*-*- loop on all selected entries
923 // fSelectedRows = 0;
924 Int_t tnumber = -1;
928 if (entryNumber < 0) break;
930 if (localEntry < 0) break;
931 if (tnumber != fTree->GetTreeNumber()) {
933 if (manager) manager->UpdateFormulaLeaves();
934 }
935 Int_t ndata = 1;
936 if (manager && manager->GetMultiplicity()) {
937 ndata = manager->GetNdata();
938 }
939
940 for(Int_t inst=0;inst<ndata;inst++) {
942 if (select) {
943 if (select->EvalInstance(inst) == 0) {
944 continue;
945 }
946 }
947
948 if (inst==0) loaded = kTRUE;
949 else if (!loaded) {
950 // EvalInstance(0) always needs to be called so that
951 // the proper branches are loaded.
952 for (i=0;i<ncols;i++) {
953 var[i]->EvalInstance(0);
954 }
955 loaded = kTRUE;
956 }
957
958 for (i=0;i<ncols;i++) {
959 xvars[i] = var[i]->EvalInstance(inst);
960 }
961
962 // curentry = entry-firstentry;
963 //printf("event#%d\n", curentry);
964 //for (i=0; i<ncols; i++)
965 // printf("xvars[%d]=%f\n", i, xvars[i]);
966 //selectedrows++;
967 for (i=0; i<fNx; i++){
968 fXvar(selectedrows, i) = xvars[i];
969 if (fXvar(selectedrows, i) < fMinmax(0, i))
970 fMinmax(0, i)=fXvar(selectedrows, i);
971 if (fXvar(selectedrows, i) > fMinmax(1, i))
972 fMinmax(1, i)=fXvar(selectedrows, i);
973 }
974 for (i=0; i<fNy; i++){
975 fYvar(selectedrows, i) = xvars[i+fNx];
976 //printf("y_in_loop(%d, %d)=%f, xvars[%d]=%f\n", selectedrows, i, fYvar(selectedrows, i), i+fNx, xvars[i+fNx]);
977 if (fYvar(selectedrows, i) < fMinmax(0, i+fNx))
978 fMinmax(0, i+fNx) = fYvar(selectedrows, i);
979 if (fYvar(selectedrows, i) > fMinmax(1, i+fNx))
980 fMinmax(1, i+fNx) = fYvar(selectedrows, i);
981 for (Int_t j=0; j<fNSpecies; j++){
982 fYpdf(selectedrows, j*fNy + i)=xvars[j*fNy + i+fNx+fNy];
983 if (fYpdf(selectedrows, j*fNy+i) < fMinmax(0, j*fNy+i+fNx+fNy))
984 fMinmax(0, j*fNy+i+fNx+fNy) = fYpdf(selectedrows, j*fNy+i);
985 if (fYpdf(selectedrows, j*fNy+i) > fMinmax(1, j*fNy+i+fNx+fNy))
986 fMinmax(1, j*fNy+i+fNx+fNy) = fYpdf(selectedrows, j*fNy+i);
987 }
988 }
989 selectedrows++;
990 }
991 }
993 // for (i=0; i<fNevents; i++){
994 // printf("event#%d\n", i);
995 //for (Int_t iy=0; iy<fNy; iy++)
996 // printf("y[%d]=%f\n", iy, fYvar(i, iy));
997 //for (Int_t ispecies=0; ispecies<fNSpecies; ispecies++){
998 // for (Int_t iy=0; iy<fNy; iy++)
999 // printf("ypdf[sp. %d, y %d]=%f\n", ispecies, iy, fYpdf(i, ispecies*fNy+iy));
1000 // }
1001 //}
1002
1003 delete [] xvars;
1004 delete [] var;
1005}
1006
1007////////////////////////////////////////////////////////////////////////////////
1008/// FCN-function for Minuit
1009
1010void Yields(Int_t &, Double_t *, Double_t &f, Double_t *x, Int_t /*iflag*/)
1011{
1012 Double_t likelihood;
1013 Int_t i, ispecies;
1014
1016 TMatrixD *pdftot = (TMatrixD*)fitter->GetObjectFit();
1017 Int_t nev = pdftot->GetNrows();
1018 Int_t nes = pdftot->GetNcols();
1019 f=0;
1020 for (i=0; i<nev; i++){
1021 likelihood=0;
1022 for (ispecies=0; ispecies<nes; ispecies++)
1023 likelihood+=x[ispecies]*(*pdftot)(i, ispecies);
1024 if (likelihood<0) likelihood=1;
1025 f+=TMath::Log(likelihood);
1026 }
1027 //extended likelihood, equivalent to chi2
1028 Double_t ntot=0;
1029 for (i=0; i<nes; i++)
1030 ntot += x[i];
1031 f = -2*(f-ntot);
1032}
1033
#define b(i)
Definition RSha256.hxx:100
#define f(i)
Definition RSha256.hxx:104
#define h(i)
Definition RSha256.hxx:106
int Int_t
Definition RtypesCore.h:45
constexpr Bool_t kFALSE
Definition RtypesCore.h:94
double Double_t
Definition RtypesCore.h:59
long long Long64_t
Definition RtypesCore.h:69
constexpr Bool_t kTRUE
Definition RtypesCore.h:93
const char Option_t
Definition RtypesCore.h:66
#define ClassImp(name)
Definition Rtypes.h:374
ROOT::Detail::TRangeCast< T, true > TRangeDynCast
TRangeDynCast is an adapter class that allows the typed iteration through a TCollection.
Option_t Option_t option
Option_t Option_t TPoint TPoint const char GetTextMagnitude GetFillStyle GetLineColor GetLineWidth GetMarkerStyle GetTextAlign GetTextColor GetTextSize void char Point_t Rectangle_t WindowAttributes_t Float_t Float_t Float_t Int_t Int_t UInt_t UInt_t Rectangle_t Int_t Int_t Window_t TString Int_t GCValues_t GetPrimarySelectionOwner GetDisplay GetScreen GetColormap GetNativeEvent const char const char dpyName wid window const char font_name cursor keysym reg const char only_if_exist regb h Point_t winding char text const char depth char const char Int_t count const char ColorStruct_t color const char Pixmap_t Pixmap_t PictureAttributes_t attr const char char ret_data h unsigned char height h Atom_t Int_t ULong_t ULong_t unsigned char prop_list Atom_t Atom_t Atom_t Time_t type
char name[80]
Definition TGX11.cxx:110
void Yields(Int_t &, Double_t *, Double_t &f, Double_t *x, Int_t iflag)
FCN-function for Minuit.
Definition TSPlot.cxx:1010
#define snprintf
Definition civetweb.c:1540
Using a TBrowser one can browse all ROOT objects.
Definition TBrowser.h:37
1-D histogram with a double per channel (see TH1 documentation)
Definition TH1.h:693
virtual Bool_t Add(TF1 *h1, Double_t c1=1, Option_t *option="")
Performs the operation: this = this + c1*f1 if errors are defined (see TH1::Sumw2),...
Definition TH1.cxx:813
A doubly linked list.
Definition TList.h:38
Int_t GetNrows() const
Int_t GetNoElements() const
Int_t GetNcols() const
TMatrixTBase< Element > & ResizeTo(Int_t nrows, Int_t ncols, Int_t=-1) override
Set size of the matrix to nrows x ncols New dynamic elements are created, the overlapping part of the...
const char * GetName() const override
Returns name of object.
Definition TNamed.h:49
An array of TObjects.
Definition TObjArray.h:31
void AddAt(TObject *obj, Int_t idx) override
Add object at position ids.
void Delete(Option_t *option="") override
Remove all objects from the array AND delete all heap based objects.
void AddLast(TObject *obj) override
Add object in the next empty slot in the array.
TObject * UncheckedAt(Int_t i) const
Definition TObjArray.h:84
TObject * First() const override
Return the object in the first slot.
Bool_t IsEmpty() const override
Definition TObjArray.h:65
TObject * RemoveAt(Int_t idx) override
Remove object at index idx.
void Add(TObject *obj) override
Definition TObjArray.h:68
virtual const char * GetName() const
Returns name of object.
Definition TObject.cxx:456
virtual void Error(const char *method, const char *msgfmt,...) const
Issue error message.
Definition TObject.cxx:1020
<div class="legacybox"><h2>Legacy Code</h2> TSPlot is a legacy interface: there will be no bug fixes ...
Definition TSPlot.h:21
void FillYvarHists(Int_t nbins=100)
Fill the histograms of y variables.
Definition TSPlot.cxx:572
TObjArray fXvarHists
Definition TSPlot.h:30
TObjArray fSWeightsHists
Definition TSPlot.h:33
void RefillHist(Int_t type, Int_t var, Int_t nbins, Double_t min, Double_t max, Int_t nspecies=-1)
The Fill...Hist() methods fill the histograms with the real limits on the variables This method allow...
Definition TSPlot.cxx:756
~TSPlot() override
Destructor.
Definition TSPlot.cxx:327
void Browse(TBrowser *b) override
To browse the histograms.
Definition TSPlot.cxx:342
TMatrixD fSWeights
Definition TSPlot.h:28
Int_t fNevents
Definition TSPlot.h:44
Int_t fNy
Definition TSPlot.h:42
Int_t fNSpecies
Definition TSPlot.h:43
TString * fTreename
Definition TSPlot.h:36
void SPlots(Double_t *covmat, Int_t i_excl)
Computes the sWeights from the covariance matrix.
Definition TSPlot.cxx:472
void GetSWeights(TMatrixD &weights)
Returns the matrix of sweights.
Definition TSPlot.cxx:493
TMatrixD fMinmax
Definition TSPlot.h:27
void FillSWeightsHists(Int_t nbins=50)
The order of histograms in the array:
Definition TSPlot.cxx:685
Int_t fNx
Definition TSPlot.h:41
TString * fSelection
Definition TSPlot.h:38
TH1D * GetSWeightsHist(Int_t ixvar, Int_t ispecies, Int_t iyexcl=-1)
Returns the histogram of a variable, weighted with sWeights.
Definition TSPlot.cxx:821
TMatrixD fPdfTot
Definition TSPlot.h:26
void MakeSPlot(Option_t *option="v")
Calculates the sWeights.
Definition TSPlot.cxx:392
TH1D * GetYpdfHist(Int_t iyvar, Int_t ispecies)
Returns the histogram of the pdf of variable iyvar for species #ispecies, binning nbins.
Definition TSPlot.cxx:668
TObjArray * GetSWeightsHists()
Returns an array of all histograms of variables, weighted with sWeights.
Definition TSPlot.cxx:735
TTree * fTree
Definition TSPlot.h:35
TObjArray * GetYpdfHists()
Returns the array of histograms of pdf's of y variables with binning nbins.
Definition TSPlot.cxx:654
void FillXvarHists(Int_t nbins=100)
Fills the histograms of x variables (not weighted) with nbins.
Definition TSPlot.cxx:516
TObjArray * GetXvarHists()
Returns the array of histograms of x variables (not weighted).
Definition TSPlot.cxx:543
TObjArray fYvarHists
Definition TSPlot.h:31
TString * fVarexp
Definition TSPlot.h:37
void SetTree(TTree *tree)
Set the input Tree.
Definition TSPlot.cxx:839
TObjArray * GetYvarHists()
Returns the array of histograms of y variables.
Definition TSPlot.cxx:597
void FillYpdfHists(Int_t nbins=100)
Fills the histograms of pdf-s of y variables with binning nbins.
Definition TSPlot.cxx:624
void SetTreeSelection(const char *varexp="", const char *selection="", Long64_t firstentry=0)
Specifies the variables from the tree to be used for splot.
Definition TSPlot.cxx:857
TMatrixD fXvar
Definition TSPlot.h:23
TSPlot()
default constructor (used by I/O only)
Definition TSPlot.cxx:292
TClass * IsA() const override
Definition TSPlot.h:90
TH1D * GetYvarHist(Int_t iyvar)
Returns the histogram of variable iyvar.If histograms have not already been filled,...
Definition TSPlot.cxx:611
void SetInitialNumbersOfSpecies(Int_t *numbers)
Set the initial number of events of each species - used as initial estimates in minuit.
Definition TSPlot.cxx:376
TObjArray fYpdfHists
Definition TSPlot.h:32
Double_t * fNumbersOfEvents
Definition TSPlot.h:46
TMatrixD fYpdf
Definition TSPlot.h:25
TH1D * GetXvarHist(Int_t ixvar)
Returns the histogram of variable ixvar.
Definition TSPlot.cxx:558
TMatrixD fYvar
Definition TSPlot.h:24
A specialized TSelector for TTree::Draw.
virtual UInt_t SplitNames(const TString &varexp, std::vector< TString > &names)
Build Index array for names in varexp.
Basic string class.
Definition TString.h:139
TString & ReplaceAll(const TString &s1, const TString &s2)
Definition TString.h:704
void ToUpper()
Change string to upper case.
Definition TString.cxx:1195
Bool_t Contains(const char *pat, ECaseCompare cmp=kExact) const
Definition TString.h:632
Used to coordinate one or more TTreeFormula objects.
Used to pass a selection expression to the Tree drawing routine.
T EvalInstance(Int_t i=0, const char *stringStack[]=nullptr)
Evaluate this treeformula.
Implement some of the functionality of the class TTree requiring access to extra libraries (Histogram...
Definition TTreePlayer.h:37
A TTree represents a columnar dataset.
Definition TTree.h:79
virtual TObjArray * GetListOfLeaves()
Definition TTree.h:529
TVirtualTreePlayer * GetPlayer()
Load the TTreePlayer (if not already done).
Definition TTree.cxx:6298
virtual Long64_t GetEntryNumber(Long64_t entry) const
Return entry number corresponding to entry.
Definition TTree.cxx:5858
virtual Long64_t LoadTree(Long64_t entry)
Set current entry.
Definition TTree.cxx:6466
virtual Int_t GetTreeNumber() const
Definition TTree.h:559
Abstract Base Class for Fitting.
virtual TObject * GetObjectFit() const
static TVirtualFitter * GetFitter()
static: return the current Fitter
static TVirtualFitter * Fitter(TObject *obj, Int_t maxpar=25)
Static function returning a pointer to the current fitter.
Double_t x[n]
Definition legend1.C:17
Double_t Log(Double_t x)
Returns the natural logarithm of x.
Definition TMath.h:760