A class doing the actual fitting of a linear model using rules as basis functions.
Definition at line 49 of file RuleFitParams.h.
Public Member Functions | |
| RuleFitParams () | |
| constructor   | |
| virtual | ~RuleFitParams () | 
| destructor   | |
| Int_t | FindGDTau () | 
| This finds the cutoff parameter tau by scanning several different paths.   | |
| UInt_t | GetPathIdx1 () const | 
| UInt_t | GetPathIdx2 () const | 
| UInt_t | GetPerfIdx1 () const | 
| UInt_t | GetPerfIdx2 () const | 
| void | Init () | 
| Initializes all parameters using the RuleEnsemble and the training tree.   | |
| void | InitGD () | 
| Initialize GD path search.   | |
| Double_t | LossFunction (const Event &e) const | 
| Implementation of the squared-error ramp loss function (eqs. 39 and 40 in ref. 1). This is used for binary classification, where y = {+1,-1} for (sig,bkg).   | |
| Double_t | LossFunction (UInt_t evtidx) const | 
| Implementation of the squared-error ramp loss function (eqs. 39 and 40 in ref. 1). This is used for binary classification, where y = {+1,-1} for (sig,bkg).   | |
| Double_t | LossFunction (UInt_t evtidx, UInt_t itau) const | 
| Implementation of the squared-error ramp loss function (eqs. 39 and 40 in ref. 1). This is used for binary classification, where y = {+1,-1} for (sig,bkg).   | |
| void | MakeGDPath () | 
| The following finds the gradient directed path in parameter space.   | |
| Double_t | Penalty () const | 
| This is the "lasso" penalty, to be used for regression.   | |
| Double_t | Risk (UInt_t ind1, UInt_t ind2, Double_t neff) const | 
| risk assessment   | |
| Double_t | Risk (UInt_t ind1, UInt_t ind2, Double_t neff, UInt_t itau) const | 
| risk assessment for tau model <itau>   | |
| Double_t | RiskPath () const | 
| Double_t | RiskPerf () const | 
| Double_t | RiskPerf (UInt_t itau) const | 
| UInt_t | RiskPerfTst () | 
| Estimates the error rate with the current set of parameters.   | |
| void | SetGDErrScale (Double_t s) | 
| void | SetGDNPathSteps (Int_t np) | 
| void | SetGDPathStep (Double_t s) | 
| void | SetGDTau (Double_t t) | 
| void | SetGDTauPrec (Double_t p) | 
| void | SetGDTauRange (Double_t t0, Double_t t1) | 
| void | SetGDTauScan (UInt_t n) | 
| void | SetMsgType (EMsgType t) | 
| void | SetRuleFit (RuleFit *rf) | 
| Int_t | Type (const Event *e) const | 
Protected Types | |
| typedef std::vector< const TMVA::Event * >::const_iterator | EventItr | 
Protected Member Functions | |
| Double_t | CalcAverageResponse () | 
| Calculate the average response. TODO: rewrite bad dependency on EvaluateAverage()!   | |
| Double_t | CalcAverageResponseOLD () | 
| Double_t | CalcAverageTruth () | 
| calculate the average truth   | |
| void | CalcFStar () | 
| Estimates F* (optimum scoring function) for all events for the given sets.   | |
| void | CalcGDNTau () | 
| void | CalcTstAverageResponse () | 
| Calculate the average response for all test paths. TODO: see comment under CalcAverageResponse(). Note that 0 offset is used.   | |
| Double_t | ErrorRateBin () | 
| Estimates the error rate with the current set of parameters. It uses a binary estimate of (y-F*(x)): (y-F*(x)) = (number of events where sign(F) != sign(y))/Neve, where y = {+1 if event is signal, -1 otherwise}. NOT USED.   | |
| Double_t | ErrorRateReg () | 
| Estimates the error rate with the current set of parameters. This code is pretty messy at the moment.   | |
| Double_t | ErrorRateRoc () | 
| Estimates the error rate with the current set of parameters.   | |
| Double_t | ErrorRateRocRaw (std::vector< Double_t > &sFsig, std::vector< Double_t > &sFbkg) | 
| Estimates the error rate with the current set of parameters.   | |
| void | ErrorRateRocTst () | 
| Estimates the error rate with the current set of parameters.   | |
| void | EvaluateAverage (UInt_t ind1, UInt_t ind2, std::vector< Double_t > &avsel, std::vector< Double_t > &avrul) | 
| evaluate the average of each variable and f(x) in the given range   | |
| void | EvaluateAveragePath () | 
| void | EvaluateAveragePerf () | 
| void | FillCoefficients () | 
| helper function to store the rule coefficients in local arrays   | |
| void | InitNtuple () | 
| initializes the ntuple   | |
| void | MakeGradientVector () | 
| make gradient vector   | |
| void | MakeTstGradientVector () | 
| Make the test gradient vector for all tau; same algorithm as MakeGradientVector().   | |
| Double_t | Optimism () | 
| Implementation of eq. 7.17 in the Hastie, Tibshirani & Friedman book.   | |
| void | UpdateCoefficients () | 
| Establish maximum gradient for rules, linear terms and the offset.   | |
| void | UpdateTstCoefficients () | 
| Establish maximum gradient for rules, linear terms and the offset for all taus. TODO: do not need index range!   | |
Protected Attributes | |
| std::vector< Double_t > | fAverageRulePath | 
| average of each rule, same range   | |
| std::vector< Double_t > | fAverageRulePerf | 
| average of each rule, same range   | |
| std::vector< Double_t > | fAverageSelectorPath | 
| average of each variable over the range fPathIdx1,2   | |
| std::vector< Double_t > | fAverageSelectorPerf | 
| average of each variable over the range fPerfIdx1,2   | |
| Double_t | fAverageTruth | 
| average truth, ie sum(y)/N, y=+-1   | |
| Double_t | fbkgave | 
| Average of F(bkg)   | |
| Double_t | fbkgrms | 
| Rms of F(bkg)   | |
| std::vector< Double_t > | fFstar | 
| vector of F*() - filled in CalcFStar()   | |
| Double_t | fFstarMedian | 
| median value of F*() using   | |
| std::vector< std::vector< Double_t > > | fGDCoefLinTst | 
| linear coeffs - one per tau   | |
| std::vector< std::vector< Double_t > > | fGDCoefTst | 
| rule coeffs - one per tau   | |
| Double_t | fGDErrScale | 
| stop scan at error = scale*errmin   | |
| std::vector< Double_t > | fGDErrTst | 
| error rates per tau   | |
| std::vector< Char_t > | fGDErrTstOK | 
| error rate is sufficiently low (stored as a boolean)   | |
| Int_t | fGDNPathSteps | 
| number of path steps   | |
| UInt_t | fGDNTau | 
| number of tau-paths - calculated in SetGDTauPrec   | |
| UInt_t | fGDNTauTstOK | 
| number of tau in the test-phase that are ok   | |
| TTree * | fGDNtuple | 
| Gradient path ntuple, contains params for each step along the path.   | |
| std::vector< Double_t > | fGDOfsTst | 
| offset per tau   | |
| Double_t | fGDPathStep | 
| step size along path (delta nu in eq 22, ref 1)   | |
| Double_t | fGDTau | 
| selected threshold parameter (tau in eq 26, ref 1)   | |
| Double_t | fGDTauMax | 
| max threshold parameter (tau in eq 26, ref 1)   | |
| Double_t | fGDTauMin | 
| min threshold parameter (tau in eq 26, ref 1)   | |
| Double_t | fGDTauPrec | 
| precision in tau   | |
| UInt_t | fGDTauScan | 
| number scan for tau-paths   | |
| std::vector< Double_t > | fGDTauVec | 
| the tau's   | |
| std::vector< Double_t > | fGradVec | 
| gradient vector - dimension = number of rules in ensemble   | |
| std::vector< Double_t > | fGradVecLin | 
| gradient vector - dimension = number of variables   | |
| std::vector< std::vector< Double_t > > | fGradVecLinTst | 
| gradient vector, linear terms - one per tau   | |
| std::vector< std::vector< Double_t > > | fGradVecTst | 
| gradient vector - one per tau   | |
| Double_t | fNEveEffPath | 
| sum of weights for Path events   | |
| Double_t | fNEveEffPerf | 
| idem for Perf events   | |
| UInt_t | fNLinear | 
| number of linear terms   | |
| UInt_t | fNRules | 
| number of rules   | |
| Double_t * | fNTCoeff | 
| GD path: rule coefficients.   | |
| Double_t | fNTCoefRad | 
| GD path: 'radius' of all rulecoeffs.   | |
| Double_t | fNTErrorRate | 
| GD path: error rate (or performance)   | |
| Double_t * | fNTLinCoeff | 
| GD path: linear coefficients.   | |
| Double_t | fNTNuval | 
| GD path: value of nu.   | |
| Double_t | fNTOffset | 
| GD path: model offset.   | |
| Double_t | fNTRisk | 
| GD path: risk.   | |
| UInt_t | fPathIdx1 | 
| first event index for path search   | |
| UInt_t | fPathIdx2 | 
| last event index for path search   | |
| UInt_t | fPerfIdx1 | 
| first event index for performance evaluation   | |
| UInt_t | fPerfIdx2 | 
| last event index for performance evaluation   | |
| RuleEnsemble * | fRuleEnsemble | 
| rule ensemble   | |
| RuleFit * | fRuleFit | 
| rule fit   | |
| Double_t | fsigave | 
| Average of current signal score function F(sig)   | |
| Double_t | fsigrms | 
| Rms of F(sig)   | |
Private Member Functions | |
| MsgLogger & | Log () const | 
Private Attributes | |
| MsgLogger * | fLogger | 
| ! message logger   | |
#include <TMVA/RuleFitParams.h>
      
| typedef std::vector< const TMVA::Event * >::const_iterator TMVA::RuleFitParams::EventItr | protected |
Definition at line 130 of file RuleFitParams.h.
| TMVA::RuleFitParams::RuleFitParams | ( | ) | 
constructor
Definition at line 64 of file RuleFitParams.cxx.
      
| TMVA::RuleFitParams::~RuleFitParams | ( | ) | virtual |
destructor
Definition at line 104 of file RuleFitParams.cxx.
      
| Double_t TMVA::RuleFitParams::CalcAverageResponse | ( | ) | protected |
Calculate the average response. TODO: rewrite bad dependency on EvaluateAverage()!
Note that 0 offset is used.
Definition at line 1512 of file RuleFitParams.cxx.
      
| Double_t TMVA::RuleFitParams::CalcAverageResponseOLD | ( | ) | protected |
      
| Double_t TMVA::RuleFitParams::CalcAverageTruth | ( | ) | protected |
calculate the average truth
Definition at line 1527 of file RuleFitParams.cxx.
      
| void TMVA::RuleFitParams::CalcFStar | ( | ) | protected |
Estimates F* (optimum scoring function) for all events for the given sets.
The result is used in ErrorRateReg(). — NOT USED —
Definition at line 885 of file RuleFitParams.cxx.
      
| void TMVA::RuleFitParams::CalcGDNTau | ( | ) | inline, protected |
Definition at line 136 of file RuleFitParams.h.
      
| void TMVA::RuleFitParams::CalcTstAverageResponse | ( | ) | protected |
Calculate the average response for all test paths. TODO: see comment under CalcAverageResponse(). Note that 0 offset is used.
Definition at line 1491 of file RuleFitParams.cxx.
      
| Double_t TMVA::RuleFitParams::ErrorRateBin | ( | ) | protected |
Estimates the error rate with the current set of parameters. It uses a binary estimate of (y-F*(x)): (y-F*(x)) = (number of events where sign(F) != sign(y))/Neve, where y = {+1 if event is signal, -1 otherwise}. NOT USED.
Definition at line 1008 of file RuleFitParams.cxx.
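The binary estimate described above can be sketched as follows. This is only an illustration, not the TMVA implementation: the function name `BinaryErrorRate`, the plain vectors of scores and labels, and the choice to count F = 0 as signal are all assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of TMVA): fraction of events whose score F
// has the wrong sign. y[i] is +1 for signal and -1 for background.
double BinaryErrorRate(const std::vector<double>& F, const std::vector<int>& y) {
    assert(F.size() == y.size() && !F.empty());
    std::size_t nWrong = 0;
    for (std::size_t i = 0; i < F.size(); ++i) {
        // Assumption: a score of exactly 0 is treated as signal-like.
        const int signF = (F[i] >= 0.0) ? +1 : -1;
        if (signF != y[i]) ++nWrong;
    }
    return static_cast<double>(nWrong) / static_cast<double>(F.size());
}
```

With scores {0.5, -0.2, 0.1, -0.7} and labels {+1, +1, -1, -1}, the second and third events are misclassified, so the estimate is 2/4.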
      
| Double_t TMVA::RuleFitParams::ErrorRateReg | ( | ) | protected |
Estimates the error rate with the current set of parameters. This code is pretty messy at the moment.
Cleanup is needed. NOT USED.
Definition at line 964 of file RuleFitParams.cxx.
      
| Double_t TMVA::RuleFitParams::ErrorRateRoc | ( | ) | protected |
Estimates the error rate with the current set of parameters.
It calculates the area under the bkg rejection vs signal efficiency curve. The value returned is 1-area. This works but is less efficient than calculating the Risk using RiskPerf().
Definition at line 1107 of file RuleFitParams.cxx.
      
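| Double_t TMVA::RuleFitParams::ErrorRateRocRaw | ( | std::vector< Double_t > & sFsig, std::vector< Double_t > & sFbkg | ) | protected |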
Estimates the error rate with the current set of parameters.
It calculates the area under the bkg rejection vs signal efficiency curve. The value returned is 1-area.
Definition at line 1042 of file RuleFitParams.cxx.
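As a rough illustration of "1-area", the sketch below computes the area under the background rejection vs signal efficiency curve from two score vectors. It uses the pairwise (rank) form of the area, i.e. the fraction of signal/background pairs in which the signal score is higher, which is a swapped-in technique chosen for brevity and not necessarily how ErrorRateRocRaw() scans the curve. The function name `OneMinusRocArea` is hypothetical.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper (not part of TMVA): returns 1 - area, where area is the
// probability that a signal score exceeds a background score (ties count 1/2).
// This equals the area under the bkg rejection vs signal efficiency curve.
double OneMinusRocArea(const std::vector<double>& sFsig,
                       const std::vector<double>& sFbkg) {
    assert(!sFsig.empty() && !sFbkg.empty());
    double wins = 0.0;
    for (double s : sFsig) {
        for (double b : sFbkg) {
            if (s > b)       wins += 1.0;  // correctly ordered pair
            else if (s == b) wins += 0.5;  // tie counts as half
        }
    }
    const double nPairs = static_cast<double>(sFsig.size()) *
                          static_cast<double>(sFbkg.size());
    return 1.0 - wins / nPairs;
}
```

Perfectly separated samples give 0, fully inverted samples give 1, and the pairwise loop costs O(Nsig × Nbkg), so a sorted scan would be preferable for large samples.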
      
| void TMVA::RuleFitParams::ErrorRateRocTst | ( | ) | protected |
Estimates the error rate with the current set of parameters.
It calculates the area under the bkg rejection vs signal efficiency curve. The value returned is 1-area.
See comment under ErrorRateRoc().
Definition at line 1155 of file RuleFitParams.cxx.
      
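| void TMVA::RuleFitParams::EvaluateAverage | ( | UInt_t ind1, UInt_t ind2, std::vector< Double_t > & avsel, std::vector< Double_t > & avrul | ) | protected |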
evaluate the average of each variable and f(x) in the given range
Definition at line 208 of file RuleFitParams.cxx.
      
| void TMVA::RuleFitParams::EvaluateAveragePath | ( | ) | inline, protected |
Definition at line 177 of file RuleFitParams.h.
      
| void TMVA::RuleFitParams::EvaluateAveragePerf | ( | ) | inline, protected |
Definition at line 180 of file RuleFitParams.h.
      
| void TMVA::RuleFitParams::FillCoefficients | ( | ) | protected |
helper function to store the rule coefficients in local arrays
Definition at line 868 of file RuleFitParams.cxx.
| Int_t TMVA::RuleFitParams::FindGDTau | ( | ) | 
This finds the cutoff parameter tau by scanning several different paths.
Definition at line 449 of file RuleFitParams.cxx.
      
| UInt_t TMVA::RuleFitParams::GetPathIdx1 | ( | ) | const | inline |
Definition at line 91 of file RuleFitParams.h.
      
| UInt_t TMVA::RuleFitParams::GetPathIdx2 | ( | ) | const | inline |
Definition at line 92 of file RuleFitParams.h.
      
| UInt_t TMVA::RuleFitParams::GetPerfIdx1 | ( | ) | const | inline |
Definition at line 93 of file RuleFitParams.h.
      
| UInt_t TMVA::RuleFitParams::GetPerfIdx2 | ( | ) | const | inline |
Definition at line 94 of file RuleFitParams.h.
| void TMVA::RuleFitParams::Init | ( | ) | 
Initializes all parameters using the RuleEnsemble and the training tree.
Definition at line 114 of file RuleFitParams.cxx.
| void TMVA::RuleFitParams::InitGD | ( | ) | 
Initialize GD path search.
Definition at line 373 of file RuleFitParams.cxx.
      
| void TMVA::RuleFitParams::InitNtuple | ( | ) | protected |
initializes the ntuple
Definition at line 185 of file RuleFitParams.cxx.
      
| MsgLogger & TMVA::RuleFitParams::Log | ( | ) | const | inline, private |
Definition at line 254 of file RuleFitParams.h.
| Double_t TMVA::RuleFitParams::LossFunction | ( | const Event & | e | ) | const |
Implementation of the squared-error ramp loss function (eqs. 39 and 40 in ref. 1). This is used for binary classification, where y = {+1,-1} for (sig,bkg).
Definition at line 278 of file RuleFitParams.cxx.
| Double_t TMVA::RuleFitParams::LossFunction | ( | UInt_t | evtidx | ) | const |
Implementation of the squared-error ramp loss function (eqs. 39 and 40 in ref. 1). This is used for binary classification, where y = {+1,-1} for (sig,bkg).
Definition at line 290 of file RuleFitParams.cxx.
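| Double_t TMVA::RuleFitParams::LossFunction | ( | UInt_t evtidx, UInt_t itau | ) | const |
Implementation of the squared-error ramp loss function (eqs. 39 and 40 in ref. 1). This is used for binary classification, where y = {+1,-1} for (sig,bkg).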
Definition at line 302 of file RuleFitParams.cxx.
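For orientation, a squared-error ramp loss clips the model score F to [-1, +1] and then takes the squared difference to the truth y. The sketch below illustrates that idea only; the function name `RampLoss` and the optional event weight are assumptions for the example, and the real member functions obtain F from the rule ensemble rather than taking it as an argument.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper (not part of TMVA): squared-error ramp loss
//   L(y, F) = weight * (y - max(-1, min(1, F)))^2,  y in {+1, -1}.
// The weight argument is an assumption standing in for an event weight.
double RampLoss(double F, int y, double weight = 1.0) {
    assert(y == 1 || y == -1);
    const double h = std::max(-1.0, std::min(1.0, F));  // clip score to [-1, 1]
    const double diff = static_cast<double>(y) - h;
    return weight * diff * diff;
}
```

A confidently correct score (F = 2, y = +1) costs nothing, a confidently wrong one (F = 2, y = -1) costs 4, and scores inside the ramp are penalised quadratically (F = 0.5, y = +1 gives 0.25).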
| void TMVA::RuleFitParams::MakeGDPath | ( | ) | 
The following finds the gradient directed path in parameter space.
More work is needed... FT, 24/9/2006
The algorithm is currently as follows (if not otherwise stated, the sample used below is [fPathIdx1,fPathIdx2]):
The algorithm will warn if:
Definition at line 538 of file RuleFitParams.cxx.
      
| void TMVA::RuleFitParams::MakeGradientVector | ( | ) | protected |
make gradient vector
Definition at line 1375 of file RuleFitParams.cxx.
      
| void TMVA::RuleFitParams::MakeTstGradientVector | ( | ) | protected |
Make the test gradient vector for all tau; same algorithm as MakeGradientVector().
Definition at line 1259 of file RuleFitParams.cxx.
      
| Double_t TMVA::RuleFitParams::Optimism | ( | ) | protected |
Implementation of eq. 7.17 in the Hastie, Tibshirani & Friedman book.
This is the covariance between the estimated response yhat and the true value y. NOT REALLY SURE IF THIS IS CORRECT! THIS IS NOT USED.
Definition at line 925 of file RuleFitParams.cxx.
| Double_t TMVA::RuleFitParams::Penalty | ( | ) | const | 
This is the "lasso" penalty, to be used for regression.
— NOT USED —
Definition at line 356 of file RuleFitParams.cxx.
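A lasso penalty is the sum of the absolute values of the model coefficients. The sketch below shows this for a model with rule and linear coefficients; the function name `LassoPenalty` and the two plain coefficient vectors are assumptions for illustration, not the signature used by the class, which takes no arguments and reads the coefficients from the rule ensemble.

```cpp
#include <cmath>
#include <vector>

// Hypothetical helper (not part of TMVA): L1 ("lasso") penalty, i.e. the sum
// of |a_k| over the rule coefficients plus the sum of |b_j| over the linear ones.
double LassoPenalty(const std::vector<double>& ruleCoefs,
                    const std::vector<double>& linCoefs) {
    double penalty = 0.0;
    for (double a : ruleCoefs) penalty += std::fabs(a);
    for (double b : linCoefs)  penalty += std::fabs(b);
    return penalty;
}
```

For rule coefficients {1.0, -2.0} and a single linear coefficient 0.5 the penalty is 3.5.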
| Double_t TMVA::RuleFitParams::Risk | ( | UInt_t ind1, UInt_t ind2, Double_t neff | ) | const |
risk assessment
Definition at line 314 of file RuleFitParams.cxx.
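| Double_t TMVA::RuleFitParams::Risk | ( | UInt_t ind1, UInt_t ind2, Double_t neff, UInt_t itau | ) | const |
risk assessment for tau model <itau>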
Definition at line 334 of file RuleFitParams.cxx.
      
| Double_t TMVA::RuleFitParams::RiskPath | ( | ) | const | inline |
Definition at line 108 of file RuleFitParams.h.
      
| Double_t TMVA::RuleFitParams::RiskPerf | ( | ) | const | inline |
Definition at line 109 of file RuleFitParams.h.
| Double_t TMVA::RuleFitParams::RiskPerf | ( | UInt_t | itau | ) | const |
Definition at line 110 of file RuleFitParams.h.
| UInt_t TMVA::RuleFitParams::RiskPerfTst | ( | ) | 
Estimates the error rate with the current set of parameters.
using the <Perf> subsample. Return the tau index giving the lowest error 
Definition at line 1201 of file RuleFitParams.cxx.
      
| void TMVA::RuleFitParams::SetGDErrScale | ( | Double_t | s | ) | inline |
Definition at line 85 of file RuleFitParams.h.
      
| void TMVA::RuleFitParams::SetGDNPathSteps | ( | Int_t | np | ) | inline |
Definition at line 65 of file RuleFitParams.h.
      
| void TMVA::RuleFitParams::SetGDPathStep | ( | Double_t | s | ) | inline |
Definition at line 68 of file RuleFitParams.h.
      
| void TMVA::RuleFitParams::SetGDTau | ( | Double_t | t | ) | inline |
Definition at line 82 of file RuleFitParams.h.
      
| void TMVA::RuleFitParams::SetGDTauPrec | ( | Double_t | p | ) | inline |
Definition at line 86 of file RuleFitParams.h.
| void TMVA::RuleFitParams::SetGDTauRange | ( | Double_t t0, Double_t t1 | ) |
Definition at line 71 of file RuleFitParams.h.
      
| void TMVA::RuleFitParams::SetGDTauScan | ( | UInt_t | n | ) | inline |
Definition at line 79 of file RuleFitParams.h.
| void TMVA::RuleFitParams::SetMsgType | ( | EMsgType | t | ) | 
Definition at line 1556 of file RuleFitParams.cxx.
      
| void TMVA::RuleFitParams::SetRuleFit | ( | RuleFit * | rf | ) | inline |
Definition at line 62 of file RuleFitParams.h.
| Int_t TMVA::RuleFitParams::Type | ( | const Event * | e | ) | const |
Definition at line 1550 of file RuleFitParams.cxx.
      
| void TMVA::RuleFitParams::UpdateCoefficients | ( | ) | protected |
Establish maximum gradient for rules, linear terms and the offset.
Definition at line 1441 of file RuleFitParams.cxx.
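The member descriptions on this page (a threshold parameter tau, a path step "delta nu", and a "maximum gradient") suggest a threshold gradient descent step: find the largest absolute gradient component, then update only those coefficients whose gradient reaches tau times that maximum. The sketch below illustrates that reading under stated assumptions. The function name `ThresholdGradientStep` is hypothetical, rules and linear terms are merged into one vector, and the offset update is omitted, so it should not be taken as the actual body of UpdateCoefficients().

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of TMVA): one threshold gradient descent step.
// tau  : threshold in [0, 1]; only components with |g_k| >= tau * max|g| move.
// step : step size along the path (the "delta nu" of the member description).
void ThresholdGradientStep(std::vector<double>& coefs,
                           const std::vector<double>& grad,
                           double tau, double step) {
    assert(coefs.size() == grad.size());
    double maxAbs = 0.0;
    for (double g : grad) maxAbs = std::max(maxAbs, std::fabs(g));
    if (maxAbs <= 0.0) return;  // zero gradient: nothing to update
    const double threshold = tau * maxAbs;
    for (std::size_t k = 0; k < coefs.size(); ++k) {
        if (std::fabs(grad[k]) >= threshold) coefs[k] += step * grad[k];
    }
}
```

With gradient {1.0, -0.25, 0.5}, tau = 0.5 and step = 0.5, the threshold is 0.5, so the first and third coefficients move (by 0.5 and 0.25) while the second stays put. Larger tau updates fewer coefficients per step and so tends to give sparser models.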
      
| void TMVA::RuleFitParams::UpdateTstCoefficients | ( | ) | protected |
Establish maximum gradient for rules, linear terms and the offset for all taus. TODO: do not need index range!
Definition at line 1327 of file RuleFitParams.cxx.
      
| std::vector< Double_t > TMVA::RuleFitParams::fAverageRulePath | protected |
average of each rule, same range
Definition at line 205 of file RuleFitParams.h.
      
| std::vector< Double_t > TMVA::RuleFitParams::fAverageRulePerf | protected |
average of each rule, same range
Definition at line 207 of file RuleFitParams.h.
      
| std::vector< Double_t > TMVA::RuleFitParams::fAverageSelectorPath | protected |
average of each variable over the range fPathIdx1,2
Definition at line 204 of file RuleFitParams.h.
      
| std::vector< Double_t > TMVA::RuleFitParams::fAverageSelectorPerf | protected |
average of each variable over the range fPerfIdx1,2
Definition at line 206 of file RuleFitParams.h.
      
| Double_t TMVA::RuleFitParams::fAverageTruth | protected |
average truth, ie sum(y)/N, y=+-1
Definition at line 232 of file RuleFitParams.h.
      
| Double_t TMVA::RuleFitParams::fbkgave | protected |
Average of F(bkg)
Definition at line 248 of file RuleFitParams.h.
      
| Double_t TMVA::RuleFitParams::fbkgrms | protected |
Rms of F(bkg)
Definition at line 249 of file RuleFitParams.h.
      
| std::vector< Double_t > TMVA::RuleFitParams::fFstar | protected |
vector of F*() - filled in CalcFStar()
Definition at line 234 of file RuleFitParams.h.
      
| Double_t TMVA::RuleFitParams::fFstarMedian | protected |
median value of F*() using
Definition at line 235 of file RuleFitParams.h.
      
| std::vector< std::vector< Double_t > > TMVA::RuleFitParams::fGDCoefLinTst | protected |
linear coeffs - one per tau
Definition at line 218 of file RuleFitParams.h.
      
| std::vector< std::vector< Double_t > > TMVA::RuleFitParams::fGDCoefTst | protected |
rule coeffs - one per tau
Definition at line 217 of file RuleFitParams.h.
      
| Double_t TMVA::RuleFitParams::fGDErrScale | protected |
stop scan at error = scale*errmin
Definition at line 230 of file RuleFitParams.h.
      
| std::vector< Double_t > TMVA::RuleFitParams::fGDErrTst | protected |
error rates per tau
Definition at line 215 of file RuleFitParams.h.
      
| std::vector< Char_t > TMVA::RuleFitParams::fGDErrTstOK | protected |
error rate is sufficiently low (stored as a boolean)
Definition at line 216 of file RuleFitParams.h.
      
| Int_t TMVA::RuleFitParams::fGDNPathSteps | protected |
number of path steps
Definition at line 229 of file RuleFitParams.h.
      
| UInt_t TMVA::RuleFitParams::fGDNTau | protected |
number of tau-paths - calculated in SetGDTauPrec
Definition at line 222 of file RuleFitParams.h.
      
| UInt_t TMVA::RuleFitParams::fGDNTauTstOK | protected |
number of tau in the test-phase that are ok
Definition at line 221 of file RuleFitParams.h.
      
| TTree * TMVA::RuleFitParams::fGDNtuple | protected |
Gradient path ntuple, contains params for each step along the path.
Definition at line 237 of file RuleFitParams.h.
      
| std::vector< Double_t > TMVA::RuleFitParams::fGDOfsTst | protected |
offset per tau
Definition at line 219 of file RuleFitParams.h.
      
| Double_t TMVA::RuleFitParams::fGDPathStep | protected |
step size along path (delta nu in eq 22, ref 1)
Definition at line 228 of file RuleFitParams.h.
      
| Double_t TMVA::RuleFitParams::fGDTau | protected |
selected threshold parameter (tau in eq 26, ref 1)
Definition at line 227 of file RuleFitParams.h.
      
| Double_t TMVA::RuleFitParams::fGDTauMax | protected |
max threshold parameter (tau in eq 26, ref 1)
Definition at line 226 of file RuleFitParams.h.
      
| Double_t TMVA::RuleFitParams::fGDTauMin | protected |
min threshold parameter (tau in eq 26, ref 1)
Definition at line 225 of file RuleFitParams.h.
      
| Double_t TMVA::RuleFitParams::fGDTauPrec | protected |
precision in tau
Definition at line 223 of file RuleFitParams.h.
      
| UInt_t TMVA::RuleFitParams::fGDTauScan | protected |
number scan for tau-paths
Definition at line 224 of file RuleFitParams.h.
      
| std::vector< Double_t > TMVA::RuleFitParams::fGDTauVec | protected |
the tau's
Definition at line 220 of file RuleFitParams.h.
      
| std::vector< Double_t > TMVA::RuleFitParams::fGradVec | protected |
gradient vector - dimension = number of rules in ensemble
Definition at line 209 of file RuleFitParams.h.
      
| std::vector< Double_t > TMVA::RuleFitParams::fGradVecLin | protected |
gradient vector - dimension = number of variables
Definition at line 210 of file RuleFitParams.h.
      
| std::vector< std::vector< Double_t > > TMVA::RuleFitParams::fGradVecLinTst | protected |
gradient vector, linear terms - one per tau
Definition at line 213 of file RuleFitParams.h.
      
| std::vector< std::vector< Double_t > > TMVA::RuleFitParams::fGradVecTst | protected |
gradient vector - one per tau
Definition at line 212 of file RuleFitParams.h.
      
| MsgLogger * TMVA::RuleFitParams::fLogger | mutable, private |
! message logger
Definition at line 253 of file RuleFitParams.h.
      
| Double_t TMVA::RuleFitParams::fNEveEffPath | protected |
sum of weights for Path events
Definition at line 201 of file RuleFitParams.h.
      
| Double_t TMVA::RuleFitParams::fNEveEffPerf | protected |
idem for Perf events
Definition at line 202 of file RuleFitParams.h.
      
| UInt_t TMVA::RuleFitParams::fNLinear | protected |
number of linear terms
Definition at line 192 of file RuleFitParams.h.
      
| UInt_t TMVA::RuleFitParams::fNRules | protected |
number of rules
Definition at line 191 of file RuleFitParams.h.
      
| Double_t * TMVA::RuleFitParams::fNTCoeff | protected |
GD path: rule coefficients.
Definition at line 243 of file RuleFitParams.h.
      
| Double_t TMVA::RuleFitParams::fNTCoefRad | protected |
GD path: 'radius' of all rulecoeffs.
Definition at line 241 of file RuleFitParams.h.
      
| Double_t TMVA::RuleFitParams::fNTErrorRate | protected |
GD path: error rate (or performance)
Definition at line 239 of file RuleFitParams.h.
      
| Double_t * TMVA::RuleFitParams::fNTLinCoeff | protected |
GD path: linear coefficients.
Definition at line 244 of file RuleFitParams.h.
      
| Double_t TMVA::RuleFitParams::fNTNuval | protected |
GD path: value of nu.
Definition at line 240 of file RuleFitParams.h.
      
| Double_t TMVA::RuleFitParams::fNTOffset | protected |
GD path: model offset.
Definition at line 242 of file RuleFitParams.h.
      
| Double_t TMVA::RuleFitParams::fNTRisk | protected |
GD path: risk.
Definition at line 238 of file RuleFitParams.h.
      
| UInt_t TMVA::RuleFitParams::fPathIdx1 | protected |
first event index for path search
Definition at line 197 of file RuleFitParams.h.
      
| UInt_t TMVA::RuleFitParams::fPathIdx2 | protected |
last event index for path search
Definition at line 198 of file RuleFitParams.h.
      
| UInt_t TMVA::RuleFitParams::fPerfIdx1 | protected |
first event index for performance evaluation
Definition at line 199 of file RuleFitParams.h.
      
| UInt_t TMVA::RuleFitParams::fPerfIdx2 | protected |
last event index for performance evaluation
Definition at line 200 of file RuleFitParams.h.
      
| RuleEnsemble * TMVA::RuleFitParams::fRuleEnsemble | protected |
rule ensemble
Definition at line 189 of file RuleFitParams.h.
      
| RuleFit * TMVA::RuleFitParams::fRuleFit | protected |
rule fit
Definition at line 188 of file RuleFitParams.h.
      
| Double_t TMVA::RuleFitParams::fsigave | protected |
Average of current signal score function F(sig)
Definition at line 246 of file RuleFitParams.h.
      
| Double_t TMVA::RuleFitParams::fsigrms | protected |
Rms of F(sig)
Definition at line 247 of file RuleFitParams.h.