DLMinimizers.h
// @(#)root/tmva/tmva/cnn:$Id$
// Author: Vladimir Ilievski

/**********************************************************************************
 * Project: TMVA - a ROOT-integrated toolkit for multivariate data analysis *
 * Package: TMVA *
 * Class : TDLGradientDescent *
 * Web : http://tmva.sourceforge.net *
 * *
 * Description: *
 * Deep Learning Minimizers *
 * *
 * Authors (alphabetical): *
 * Vladimir Ilievski <ilievski.vladimir@live.com> - CERN, Switzerland *
 * *
 * Copyright (c) 2005-2015: *
 * CERN, Switzerland *
 * U. of Victoria, Canada *
 * MPI-K Heidelberg, Germany *
 * U. of Bonn, Germany *
 * *
 * Redistribution and use in source and binary forms, with or without *
 * modification, are permitted according to the terms listed in LICENSE *
 * (http://tmva.sourceforge.net/LICENSE) *
 **********************************************************************************/

#ifndef TMVA_DNN_DLMINIMIZERS
#define TMVA_DNN_DLMINIMIZERS

#include "TMVA/DNN/TensorDataLoader.h"
#include "TMVA/DNN/Functions.h"
#include "TMVA/DNN/DeepNet.h"

#include <limits>
#include <vector>
namespace TMVA {
namespace DNN {

/** \class TDLGradientDescent
 *
 * Generic implementation of gradient descent minimization for deep
 * learning neural networks.
 *
 * The TDLGradientDescent class implements the gradient descent minimization
 * algorithm independently of the architecture, the input data, and the type
 * of the deep learning neural network.
 *
 * The functionality is provided by the Step(...), StepMomentum(...) and
 * StepNesterov(...) functions, each of which performs a single minimization
 * step.
 *
 * The main training characteristics are defined by the provided learning
 * rate, the test interval, and the number of convergence steps required for
 * convergence. The test interval defines how often the error on the
 * validation set is computed, and also the value by which the step counter
 * is increased each time the HasConverged() member function is called. A
 * convergence step is defined as a step in which the test error is NOT less
 * than 0.999 times the current minimal test error that has been reached. If
 * between two subsequent calls to HasConverged(Double_t) the test error has
 * not been sufficiently reduced, it is assumed that a number of convergence
 * steps equal to the test interval has been performed.
 */
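/* A minimal usage sketch, under the following assumptions: `net` is a deep
 * net of type DeepNet_t, `trainingBatches` is a hypothetical batch container
 * whose elements provide GetInput(), GetOutput() and GetWeights(), and
 * ComputeTestError() is a hypothetical helper that evaluates the net on the
 * validation set. It illustrates how Step(...) and HasConverged(Scalar_t)
 * are intended to interact:
 *
 * \code
 * TDLGradientDescent<Architecture_t> minimizer(0.001, 10, 5);
 * bool converged = false;
 * for (size_t epoch = 1; !converged; epoch++) {
 *    for (auto &batch : trainingBatches) {
 *       minimizer.Step(net, batch.GetInput(), batch.GetOutput(), batch.GetWeights());
 *    }
 *    if (epoch % minimizer.GetTestInterval() == 0) {
 *       converged = minimizer.HasConverged(ComputeTestError(net)); // hypothetical helper
 *    }
 * }
 * \endcode
 */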
template <typename Architecture_t>
class TDLGradientDescent {
public:
   using DeepNet_t = TDeepNet<Architecture_t>;
   using Scalar_t = typename Architecture_t::Scalar_t;
   using Matrix_t = typename Architecture_t::Matrix_t;

private:
   size_t fBatchSize;        ///< Batch size to use for the training.
   size_t fStepCount;        ///< Number of steps performed in the current training session.
   size_t fConvergenceSteps; ///< Number of training epochs without considerable
                             ///< decrease in the test error for convergence.
   size_t fConvergenceCount; ///< Current number of training epochs without
                             ///< considerable decrease in the test error.
   size_t fTestInterval;     ///< Interval for the computation of the test error.
   Scalar_t fTrainingError;  ///< Holds the most recently computed training loss.
   Scalar_t fTestError;      ///< Holds the most recently computed test loss.
   Scalar_t fLearningRate;   ///< Learning rate \f$\alpha\f$.
   Scalar_t fMinimumError;   ///< The minimum test error achieved during the current training session.

public:
   TDLGradientDescent();
   TDLGradientDescent(Scalar_t learningRate, size_t convergenceSteps, size_t testInterval);

   /** Reset minimizer object to default state. */
   void Reset()
   {
      fMinimumError = std::numeric_limits<Scalar_t>::infinity();
      fConvergenceCount = 0;
      fStepCount = 0;
   }
   /** Perform a single optimization step on a given batch. Propagates the input
    * matrix forward through the net, evaluates the loss, and propagates the
    * gradients backward through the net. The computed gradients are scaled by
    * the learning rate \f$\alpha\f$ and subtracted from the weights and bias
    * values of each layer. */
   void Step(DeepNet_t &deepNet, std::vector<Matrix_t> &input, const Matrix_t &output, const Matrix_t &weights);

   /** Does not evaluate the loss and therefore does not trigger a possible
    * synchronization with the device. Trains the weights of each layer, but
    * only the bias terms of the first layer, for compatibility with the
    * previous implementation. */
   void StepReducedWeights(DeepNet_t &deepNet, std::vector<Matrix_t> &input, const Matrix_t &output,
                           const Matrix_t &weights);

   /** Same as Step(...) but also evaluates the loss on the given training data.
    * Note that this requires synchronization between host and device. */
   Scalar_t StepLoss(DeepNet_t &deepNet, std::vector<Matrix_t> &input, const Matrix_t &output, const Matrix_t &weights);

   /** Similar to StepReducedWeights(...) but also evaluates the loss. May trigger
    * synchronization with the device. */
   Scalar_t StepReducedWeightsLoss(DeepNet_t &deepNet, std::vector<Matrix_t> &input, const Matrix_t &output,
                                   const Matrix_t &weights);

   /** Perform multiple optimization steps simultaneously. Performs the
    * backprop algorithm on the input batches given in \p batches on
    * the neural networks given in \p nets. The forward and backward propagation
    * steps are executed in an interleaved manner in order to exploit potential
    * batch-level parallelism for asynchronous device calls.
    */
   void Step(DeepNet_t &master, std::vector<DeepNet_t> &nets, std::vector<TTensorBatch<Architecture_t>> &batches);

   /** Same as the Step(...) method for multiple batches but uses momentum. */
   void StepMomentum(DeepNet_t &master, std::vector<DeepNet_t> &nets,
                     std::vector<TTensorBatch<Architecture_t>> &batches, Scalar_t momentum);

   /** Same as the Step(...) method for multiple batches but uses Nesterov
    * momentum. */
   void StepNesterov(DeepNet_t &master, std::vector<DeepNet_t> &nets,
                     std::vector<TTensorBatch<Architecture_t>> &batches, Scalar_t momentum);
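
   /* For reference: classical momentum accumulates a velocity \f$v\f$ and
    * updates the weights \f$W\f$ as \f$v \leftarrow \mu v - \alpha \nabla_W L\f$
    * followed by \f$W \leftarrow W + v\f$, while Nesterov momentum evaluates
    * the gradient at the look-ahead point \f$W + \mu v\f$. The exact update
    * rules applied by these methods are the ones implemented by
    * ParallelBackwardMomentum(...) and ParallelBackwardNestorov(...) in
    * DeepNet.h. */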

   /** Uses the current internal value of the test error to determine whether
    * the minimization has converged, incrementing the convergence counter by
    * one if the test error has not decreased sufficiently. */
   bool HasConverged();

   /** Increases the convergence counter by the test error evaluation
    * period and uses the provided test error value to determine whether the
    * minimization has converged. */
   bool HasConverged(Scalar_t testError);

   /** Getters */
   size_t GetConvergenceCount() const { return fConvergenceCount; }
   size_t GetConvergenceSteps() const { return fConvergenceSteps; }
   Scalar_t GetTrainingError() const { return fTrainingError; }
   Scalar_t GetTestError() const { return fTestError; }
   size_t GetTestInterval() const { return fTestInterval; }

   /** Setters */
   void SetConvergenceSteps(size_t steps) { fConvergenceSteps = steps; }
   void SetTestInterval(size_t interval) { fTestInterval = interval; }
   void SetLearningRate(Scalar_t rate) { fLearningRate = rate; }
   void SetBatchSize(size_t batchSize) { fBatchSize = batchSize; }
};

//
// Implementation
//______________________________________________________________________________
template <typename Architecture_t>
TDLGradientDescent<Architecture_t>::TDLGradientDescent()
   : fBatchSize(0), fStepCount(0), fConvergenceSteps(0), fConvergenceCount(0), fTestInterval(0), fTrainingError(0),
     fTestError(0), fLearningRate(0), fMinimumError(std::numeric_limits<Scalar_t>::infinity())
{
   // Nothing to do here.
}

//______________________________________________________________________________
template <typename Architecture_t>
TDLGradientDescent<Architecture_t>::TDLGradientDescent(Scalar_t learningRate, size_t convergenceSteps,
                                                       size_t testInterval)
   : fBatchSize(0), fStepCount(0), fConvergenceSteps(convergenceSteps), fConvergenceCount(0),
     fTestInterval(testInterval), fTrainingError(0), fTestError(0), fLearningRate(learningRate),
     fMinimumError(std::numeric_limits<Scalar_t>::infinity())
{
   // Nothing to do here.
}

//______________________________________________________________________________
template <typename Architecture_t>
void TDLGradientDescent<Architecture_t>::Step(DeepNet_t &deepNet, std::vector<Matrix_t> &input, const Matrix_t &output,
                                              const Matrix_t &weights)
{
   // Make forward and backward pass and update the net afterwards
   deepNet.Forward(input, true);
   deepNet.Backward(input, output, weights);
   deepNet.Update(fLearningRate);
}

//______________________________________________________________________________
template <typename Architecture_t>
void TDLGradientDescent<Architecture_t>::StepReducedWeights(DeepNet_t &deepNet, std::vector<Matrix_t> &input,
                                                            const Matrix_t &output, const Matrix_t &weights)
{
   // Make forward and backward pass, then update the weights of each layer
   // but the biases of only the first layer.
   deepNet.Forward(input, true);
   deepNet.Backward(input, output, weights);

   for (size_t i = 0; i < deepNet.GetDepth(); i++) {
      auto *layer = deepNet.GetLayerAt(i);

      layer->UpdateWeights(layer->GetWeightGradients(), fLearningRate);
      if (i == 0) {
         layer->UpdateBiases(layer->GetBiasGradients(), fLearningRate);
      }
   }
}

//______________________________________________________________________________
template <typename Architecture_t>
auto TDLGradientDescent<Architecture_t>::StepLoss(DeepNet_t &deepNet, std::vector<Matrix_t> &input,
                                                  const Matrix_t &output, const Matrix_t &weights) -> Scalar_t
{
   // Evaluate the loss (requires host-device synchronization), then do the
   // backward pass and update the net.
   Scalar_t loss = deepNet.Loss(input, output);
   deepNet.Backward(input, output, weights);
   deepNet.Update(fLearningRate);

   return loss;
}

//______________________________________________________________________________
template <typename Architecture_t>
auto TDLGradientDescent<Architecture_t>::StepReducedWeightsLoss(DeepNet_t &deepNet, std::vector<Matrix_t> &input,
                                                                const Matrix_t &output, const Matrix_t &weights)
   -> Scalar_t
{
   // Evaluate and record the training loss, then do the backward pass and
   // update the weights of each layer but the biases of only the first layer.
   Scalar_t loss = deepNet.Loss(input, output);
   fTrainingError = loss;
   deepNet.Backward(input, output, weights);

   for (size_t i = 0; i < deepNet.GetDepth(); i++) {
      auto *layer = deepNet.GetLayerAt(i);

      layer->UpdateWeights(layer->GetWeightGradients(), fLearningRate);
      if (i == 0) {
         layer->UpdateBiases(layer->GetBiasGradients(), fLearningRate);
      }
   }

   return loss;
}

//______________________________________________________________________________
template <typename Architecture_t>
void TDLGradientDescent<Architecture_t>::Step(DeepNet_t &master, std::vector<DeepNet_t> &nets,
                                              std::vector<TTensorBatch<Architecture_t>> &batches)
{
   master.ParallelForward(nets, batches);
   master.ParallelBackward(nets, batches, fLearningRate);
}

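// A minimal sketch of the multi-batch interface, assuming hypothetical
// helpers MakeWorkerNets(...) and NextBatches(...) that produce one worker
// net and one TTensorBatch per concurrently processed batch; the worker
// nets evaluate their batches in parallel while `master` receives the
// resulting weight updates:
//
//    std::vector<DeepNet_t> nets = MakeWorkerNets(master, nThreads);
//    std::vector<TTensorBatch<Architecture_t>> batches = NextBatches(loader, nThreads);
//    minimizer.Step(master, nets, batches);
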
//______________________________________________________________________________
template <typename Architecture_t>
void TDLGradientDescent<Architecture_t>::StepMomentum(DeepNet_t &master, std::vector<DeepNet_t> &nets,
                                                      std::vector<TTensorBatch<Architecture_t>> &batches,
                                                      Scalar_t momentum)
{
   master.ParallelForward(nets, batches);
   master.ParallelBackwardMomentum(nets, batches, fLearningRate, momentum);
}

//______________________________________________________________________________
template <typename Architecture_t>
void TDLGradientDescent<Architecture_t>::StepNesterov(DeepNet_t &master, std::vector<DeepNet_t> &nets,
                                                      std::vector<TTensorBatch<Architecture_t>> &batches,
                                                      Scalar_t momentum)
{
   master.ParallelForward(nets, batches);
   // Note: the spelling "Nestorov" follows the method name in DeepNet.h.
   master.ParallelBackwardNestorov(nets, batches, fLearningRate, momentum);
}

//______________________________________________________________________________
template <typename Architecture_t>
bool TDLGradientDescent<Architecture_t>::HasConverged()
{
   if (fTestError < fMinimumError * 0.999) {
      fConvergenceCount = 0;
      fMinimumError = fTestError;
   } else {
      fConvergenceCount++;
   }

   return (fConvergenceCount >= fConvergenceSteps);
}

//______________________________________________________________________________
template <typename Architecture_t>
bool TDLGradientDescent<Architecture_t>::HasConverged(Scalar_t testError)
{
   fTestError = testError;
   if (fTestError < fMinimumError * 0.999) {
      fConvergenceCount = 0;
      fMinimumError = fTestError;
   } else {
      fConvergenceCount += fTestInterval;
   }
   return (fConvergenceCount >= fConvergenceSteps);
}

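// A worked example of the convergence criterion, assuming fTestInterval = 5
// and fConvergenceSteps = 10: with a current minimum test error of 0.500, a
// call HasConverged(0.499) improves on 0.999 * 0.500 = 0.4995, so the
// counter resets and the minimum becomes 0.499. A call HasConverged(0.4990)
// does not beat 0.999 * 0.499 = 0.498501, so the counter grows by the test
// interval, 5; after two such calls it reaches 10 and the training is
// considered converged.
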
} // namespace DNN
} // namespace TMVA

#endif