In this section we will have a first look at PROOF processing starting from simple generations of random numbers. Historically this is not the way all this was developed, but we think it is easier to understand the various components starting from this use case.
The purpose of PROOF is to increase processing performance by executing a number of independent tasks in parallel. To steer parallel execution, PROOF uses the TSelector framework, which provides an initialization phase, a processing phase and a termination phase. The processing phase is the one which can be parallelized.
The following table shows the names of the related TSelector methods and where/when they are called:
Phase | Client | Workers | Description |
---|---|---|---|
Client Init | Begin() SlaveBegin() | | Client initialization |
Worker Init | | SlaveBegin() | Worker initialization |
Process | | Process() | Called N times |
Worker Terminate | | SlaveTerminate() | Worker termination |
Client Terminate | Terminate() | | Client termination |
where
- Begin()
Method called on the client only; it may be used to configure PROOF with specific settings, but in general it is left empty.
- SlaveBegin()
Method called on the client AND on the workers. This is the place to create the objects to be filled while processing, like histograms; typically it is also the place where objects are added to the output list, though this is not mandatory at this stage.
- Process()
Method called for each task (or event) to be executed. This is the place where the actual processing is done and the output objects are filled.
- SlaveTerminate()
Method called on each worker after processing is finished. This is the place to close or clean up temporary things, like open files. It is also the place to add objects to the output list, if not already done. Often this method is empty.
- Terminate()
Method called on the client. It has access to the full output list. This is the place where some final analysis or drawing is done. It may be empty if the output objects are analyzed elsewhere.
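The calling sequence described above can be sketched without ROOT. The following is only an illustration under stated assumptions: MiniSelector and RunQuery are hypothetical names, not part of TSelector or PROOF, the workers are run one after the other instead of in parallel, and the tasks are simply distributed round-robin. Each method only records where it was called, so that the order of the phases becomes visible.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a selector (not ROOT's TSelector):
// each method only records that it ran, and on which node.
struct MiniSelector {
    std::vector<std::string>& log;  // shared call log
    std::string where;              // "client" or "worker<N>"

    MiniSelector(std::vector<std::string>& l, const std::string& w)
        : log(l), where(w) {}

    void Begin()          { log.push_back(where + ":Begin"); }
    void SlaveBegin()     { log.push_back(where + ":SlaveBegin"); }
    void Process(long)    { log.push_back(where + ":Process"); }
    void SlaveTerminate() { log.push_back(where + ":SlaveTerminate"); }
    void Terminate()      { log.push_back(where + ":Terminate"); }
};

// Hypothetical driver: runs nTasks tasks on nWorkers workers, sequentially,
// following the phases of the table above, and returns the call log.
std::vector<std::string> RunQuery(int nWorkers, long nTasks) {
    std::vector<std::string> log;

    // Client initialization
    MiniSelector client(log, "client");
    client.Begin();
    client.SlaveBegin();

    for (int w = 0; w < nWorkers; ++w) {
        MiniSelector worker(log, "worker" + std::to_string(w));
        worker.SlaveBegin();                      // worker initialization
        // Assumption: worker w handles tasks w, w + nWorkers, ...
        for (long t = w; t < nTasks; t += nWorkers)
            worker.Process(t);                    // the parallelizable phase
        worker.SlaveTerminate();                  // worker termination
    }

    client.Terminate();                           // client termination
    return log;
}
```

Calling RunQuery(2, 4) and printing the returned strings one per line shows the client initialization first, then the three worker phases for each worker, and the client termination last.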
To process something in PROOF we need to provide our code in the form of a class derived from TSelector. From the considerations above, the minimal set of methods to override are SlaveBegin(), for the output definition, and Process(), for the actual task execution.
2. Filling a 1D histogram with random numbers
We now illustrate the various steps by creating a TSelector implementation which creates a 1D histogram filled with gaussian random numbers and displays it. In this case the task is the generation of a gaussian random number and the filling of the histogram with it; this task will be repeated N times.
We start from the template file SelTemplate.h. We rename this to ProofFirst.h, replacing all internal occurrences of 'SelTemplate'. The first thing to do is to define the histogram. We need the object in several methods (in Process and in SlaveBegin), so the most convenient way is to have it as a data member of the selector. We will therefore have something like this:
//////////////////////////////////////////////////////////
//
// TSelector template
//
//////////////////////////////////////////////////////////
#ifndef ProofFirst_h
#define ProofFirst_h
#include <TSelector.h>
class TH1F;
class TRandom;
class ProofFirst : public TSelector {
public :
// Define members here
TH1F *fH1F; //! Output histogram
TRandom *fRandom; //! Random number generator
ProofFirst();
virtual ~ProofFirst();
virtual Int_t Version() const { return 2; }
virtual void Begin(TTree *tree);
virtual void SlaveBegin(TTree *tree);
virtual Bool_t Process(Long64_t entry);
virtual void SetOption(const char *option) { fOption = option; }
virtual void SetObject(TObject *obj) { fObject = obj; }
virtual void SetInputList(TList *input) { fInput = input; }
virtual TList *GetOutputList() const { return fOutput; }
virtual void SlaveTerminate();
virtual void Terminate();
ClassDef(ProofFirst,2);
};
#endif
The '//!' comment after each data member marks it as transient: the object is not streamed when streaming the selector, because we want it to live on the worker only (selector streaming is an advanced feature which we may encounter later).
Next comes the implementation file. After copying SelTemplate.C into ProofFirst.C (and replacing all internal occurrences of SelTemplate) we need to include TH1F.h and TRandom3.h (we will use the TRandom3 implementation of the random generator), initialize the pointers to 0 in the constructor and destroy the random number generator in the destructor:
#include "ProofFirst.h"
#include "TCanvas.h" // needed by Terminate()
#include "TH1F.h"
#include "TRandom3.h"
//_____________________________________________________________________________
ProofFirst::ProofFirst()
{
// Constructor
fH1F = 0;
fRandom = 0;
}
//_____________________________________________________________________________
ProofFirst::~ProofFirst()
{
// Destructor
if (fRandom) delete fRandom;
}
Note that we do not explicitly destroy the histogram, as it will be owned by the output list.
We then create the histogram and the random generator instance in SlaveBegin():
//_____________________________________________________________________________
void ProofFirst::SlaveBegin(TTree * /*tree*/)
{
// The SlaveBegin() function is called after the Begin() function.
// When running with PROOF SlaveBegin() is called on each slave server.
// The tree argument is deprecated (on PROOF 0 is passed).
// TString option = GetOption();
// Histogram
fH1F = new TH1F("FirstH1F", "First TH1F in PROOF", 100, -10., 10.);
fOutput->Add(fH1F);
// Random number generator
fRandom = new TRandom3(0);
}
Initializing the random generator with seed 0 asks TRandom3 to choose a unique seed automatically, so that each worker generates a different sequence of numbers. The next step is to add the relevant instructions to Process():
//_____________________________________________________________________________
Bool_t ProofFirst::Process(Long64_t)
{
// The Process() function is called for each entry in the tree (or possibly
// keyed object in the case of PROOF) to be processed. The entry argument
// specifies which entry in the currently loaded tree is to be processed.
// It can be passed to either ProofFirst::GetEntry() or TBranch::GetEntry()
// to read either all or the required parts of the data. When processing
// keyed objects with PROOF, the object is already loaded and is available
// via the fObject pointer.
//
// This function should contain the "body" of the analysis. It can contain
// simple or elaborate selection criteria, run algorithms on the data
// of the event and typically fill histograms.
//
// The processing can be stopped by calling Abort().
//
// Use fStatus to set the return value of TTree::Process().
//
// The return value is currently not used.
if (fRandom && fH1F) {
Double_t x = fRandom->Gaus(0.,1.);
fH1F->Fill(x);
}
return kTRUE;
}
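To make explicit what happens when several workers run this Process() method, here is a minimal ROOT-free sketch. It rests on several assumptions: MiniHist and RunWorker are hypothetical names, std::mt19937 with std::normal_distribution stands in for TRandom3::Gaus, and the bin-by-bin Add() is only meant to illustrate the idea of combining per-worker histograms into one result, not how PROOF actually implements merging.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical fixed-binning 1D histogram (not ROOT's TH1F).
struct MiniHist {
    double lo, hi;           // axis range [lo, hi)
    std::vector<long> bins;  // bin contents
    long underflow = 0;
    long overflow = 0;

    MiniHist(std::size_t nbins, double lo_, double hi_)
        : lo(lo_), hi(hi_), bins(nbins, 0) {}

    void Fill(double x) {
        if (x < lo)  { ++underflow; return; }
        if (x >= hi) { ++overflow;  return; }
        std::size_t i =
            static_cast<std::size_t>((x - lo) / (hi - lo) * bins.size());
        if (i >= bins.size()) i = bins.size() - 1;  // guard against rounding
        ++bins[i];
    }

    // Bin-by-bin sum; assumes both histograms have identical binning.
    void Add(const MiniHist& other) {
        for (std::size_t i = 0; i < bins.size(); ++i) bins[i] += other.bins[i];
        underflow += other.underflow;
        overflow  += other.overflow;
    }

    // Total number of fills, including underflow and overflow.
    long Entries() const {
        long n = underflow + overflow;
        for (long c : bins) n += c;
        return n;
    }
};

// One worker's share of the job: nTasks gaussian numbers filled into its
// own histogram, using the same binning as the selector above.
MiniHist RunWorker(long nTasks, unsigned seed) {
    MiniHist h(100, -10., 10.);
    std::mt19937 gen(seed);  // each worker must use a different seed
    std::normal_distribution<double> gaus(0., 1.);
    for (long i = 0; i < nTasks; ++i) h.Fill(gaus(gen));
    return h;
}
```

For instance, two workers could be emulated with RunWorker(6000, 1u) and RunWorker(4000, 2u); calling Add() on the first result with the second as argument then yields a single histogram holding all 10000 entries, which is the kind of object the client gets to display.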
Finally we display the result in Terminate():
//_____________________________________________________________________________
void ProofFirst::Terminate()
{
// The Terminate() function is the last function to be called during
// a query. It always runs on the client, it can be used to present
// the results graphically or save the results to file.
// Create a canvas
TCanvas *c1 = new TCanvas("c1", "Proof ProofFirst canvas",200,10,400,400);
fH1F = dynamic_cast<TH1F *>(fOutput->FindObject("FirstH1F"));
if (fH1F) fH1F->Draw();
c1->Update();
}
We are now ready to process this selector. We do it by invoking the method TProof::Process(const char *selector, Long64_t nentries):
root [0] TProof *plite= TProof::Open("lite://")
+++ Starting PROOF-Lite with 2 workers +++
Opening connections to workers: OK (2 workers)
Setting up worker servers: OK (2 workers)
PROOF set to parallel mode (2 workers)
(class TProof*)0x101891600
root [1] plite->Process("ProofFirst.C+", 10000000)
Info in <:setqueryrunning>: starting query: 1
Info in <:setrunning>: nwrks: 2
Info in <:aclic>: creating shared library /Users/ganis/dropbox/Private/Tutorial/root-tutorial/tutorial/./ProofFirst_C.so
Lite-0: all output objects have been merged
(Long64_t)0
We did the processing in PROOF-Lite. The resulting canvas should look like this: