Processing in PROOF

In this section we take a first look at PROOF processing, starting from the simple generation of random numbers. Historically this is not how PROOF was developed, but we think the various components are easier to understand starting from this use case.

TSelector framework
Filling a 1D histogram with random numbers

1. TSelector framework

The purpose of PROOF is to increase processing performance by executing a number of independent tasks in parallel. To steer parallel execution, PROOF uses the TSelector framework, which provides an initialization phase, a processing phase and a termination phase. The processing phase is the one which can be parallelized.

The following table shows the names of the related TSelector methods and where and when they are called:

TSelector calling sequence
Phase	Client	Workers	Description
Client Init	Begin() SlaveBegin()		Client initialization
Worker Init		SlaveBegin()	Worker initialization
Process		Process()	Called N times
Worker Terminate		SlaveTerminate()	Worker termination
Client Terminate	Terminate()		Client termination

where

Begin()
Method called on the client only. It may be used to configure PROOF with specific settings; in general it is left empty.
SlaveBegin()
Method called on client AND workers. This is the place to create the objects to be filled while processing, such as histograms. Typically it is also the place where objects are added to the output list, though this is not mandatory at this stage.
Process()
Method called for each task (or event) to be executed. This is the place where the actual processing is done and the output objects are filled.
SlaveTerminate()
Method called after processing is finished on each worker. This is the place to close or clean up temporary things, like open files. It is also the place to add objects to the output list, if not already done. Often this method is empty.
Terminate()
Method called on the client. It has access to the full output list. This is the place where some final analysis or drawing is done. It may be empty if the output objects are analyzed elsewhere.

To process something in PROOF we need to provide our code in the form of a class derived from TSelector. From the considerations above, the minimal set of methods to override is SlaveBegin() - for the output definition - and Process() - for the actual task execution.
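The calling sequence in the table can be mimicked with a small stand-alone C++ model. This is only a sketch and does not use ROOT: ToySelector and RunToyQuery are hypothetical names introduced for illustration, the workers are run one after the other instead of in parallel, and the round-robin assignment of entries to workers is an assumption made to keep the example short, not the way PROOF distributes work.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a selector: it only records which method was
// called and where, so that the calling order can be inspected.
struct ToySelector {
   std::vector<std::string> log;
   void Begin(const std::string &where)          { log.push_back(where + ":Begin"); }
   void SlaveBegin(const std::string &where)     { log.push_back(where + ":SlaveBegin"); }
   void Process(const std::string &where)        { log.push_back(where + ":Process"); }
   void SlaveTerminate(const std::string &where) { log.push_back(where + ":SlaveTerminate"); }
   void Terminate(const std::string &where)      { log.push_back(where + ":Terminate"); }
};

// Replay the phases of the table for nWorkers workers and nEntries tasks.
std::vector<std::string> RunToyQuery(int nWorkers, long nEntries)
{
   assert(nWorkers > 0 && nEntries >= 0);
   ToySelector sel;

   // Client initialization
   sel.Begin("client");
   sel.SlaveBegin("client");

   for (int w = 0; w < nWorkers; ++w) {
      const std::string name = "worker" + std::to_string(w);
      sel.SlaveBegin(name);                        // Worker initialization
      // Assumed split: entry i is handled by worker i % nWorkers
      for (long i = w; i < nEntries; i += nWorkers)
         sel.Process(name);                        // Called once per entry
      sel.SlaveTerminate(name);                    // Worker termination
   }

   // Client termination
   sel.Terminate("client");
   return sel.log;
}
```

For example, RunToyQuery(2, 3) records client:Begin and client:SlaveBegin first, then SlaveBegin, the Process calls and SlaveTerminate for each worker, and client:Terminate last.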

2. Filling a 1D histogram with random numbers

We now illustrate the various steps by creating a TSelector implementation which creates a 1D histogram, fills it with Gaussian random numbers and displays it. In this case the task is the generation of one Gaussian random number and the filling of the histogram with it; this task is repeated N times.

We start from the template file SelTemplate.h. We rename it to ProofFirst.h, replacing all internal occurrences of 'SelTemplate'. The first thing to do is to define the histogram. We need the object in more than one method (in SlaveBegin and in Process), so the most convenient way is to keep it as a data member of the selector. We will therefore have something like this:

//////////////////////////////////////////////////////////
//
// TSelector template
//
//////////////////////////////////////////////////////////

#ifndef ProofFirst_h
#define ProofFirst_h

#include <TSelector.h>

class TH1F;
class TRandom;
class ProofFirst : public TSelector {
public :

   // Define members here
   TH1F   *fH1F;             //! Output histogram
   TRandom *fRandom;  //! Random number generator

   ProofFirst();
   virtual ~ProofFirst();
   virtual Int_t   Version() const { return 2; }
   virtual void    Begin(TTree *tree);
   virtual void    SlaveBegin(TTree *tree);
   virtual Bool_t  Process(Long64_t entry);
   virtual void    SetOption(const char *option) { fOption = option; }
   virtual void    SetObject(TObject *obj) { fObject = obj; }
   virtual void    SetInputList(TList *input) { fInput = input; }
   virtual TList  *GetOutputList() const { return fOutput; }
   virtual void    SlaveTerminate();
   virtual void    Terminate();

   ClassDef(ProofFirst,2);
};
#endif

The "//!" at the start of the member comments marks the two pointers as transient: they are not streamed when the selector is streamed, because we want these objects to live on the workers only (selector streaming is an advanced feature which we may encounter later).

Next comes the implementation file. After copying SelTemplate.C into ProofFirst.C (and replacing all internal occurrences of SelTemplate) we need to include TH1F.h and TRandom3.h (we will use the TRandom3 implementation of the random generator), as well as TCanvas.h, which is needed to draw the result in Terminate(). We then initialize the pointers to 0 in the constructor and destroy the random number generator in the destructor:

#include "ProofFirst.h"
#include "TCanvas.h"
#include "TH1F.h"
#include "TRandom3.h"

//_____________________________________________________________________________
ProofFirst::ProofFirst()
{
   // Constructor
   fH1F = 0;
   fRandom = 0;
}

//_____________________________________________________________________________
ProofFirst::~ProofFirst()
{
   // Destructor
   if (fRandom) delete fRandom;
}

Note that we do not explicitly destroy the histogram, as it will be owned by the output list. We then create the histogram and the random generator instance in SlaveBegin():

//_____________________________________________________________________________
void ProofFirst::SlaveBegin(TTree * /*tree*/)
{
   // The SlaveBegin() function is called after the Begin() function.
   // When running with PROOF SlaveBegin() is called on each slave server.
   // The tree argument is deprecated (on PROOF 0 is passed).

   // TString option = GetOption();

   // Histogram
   fH1F = new TH1F("FirstH1F", "First TH1F in PROOF", 100, -10., 10.);
   fOutput->Add(fH1F);

   // Random number generator
   fRandom = new TRandom3(0);
}

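Initializing the TRandom3 generator with seed 0 asks ROOT to generate the seed automatically, so that each worker gets a different seed and therefore a different random sequence. The next step is to add the relevant instructions to Process():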

//_____________________________________________________________________________
Bool_t ProofFirst::Process(Long64_t)
{
   // The Process() function is called for each entry in the tree (or possibly
   // keyed object in the case of PROOF) to be processed. The entry argument
   // specifies which entry in the currently loaded tree is to be processed.
   // It can be passed to either ProofFirst::GetEntry() or TBranch::GetEntry()
   // to read either all or the required parts of the data. When processing
   // keyed objects with PROOF, the object is already loaded and is available
   // via the fObject pointer.
   //
   // This function should contain the "body" of the analysis. It can contain
   // simple or elaborate selection criteria, run algorithms on the data
   // of the event and typically fill histograms.
   //
   // The processing can be stopped by calling Abort().
   //
   // Use fStatus to set the return value of TTree::Process().
   //
   // The return value is currently not used.

   if (fRandom && fH1F) {
      Double_t x = fRandom->Gaus(0.,1.);
      fH1F->Fill(x);
   }

   return kTRUE;
}

Finally we display the result in Terminate():

//_____________________________________________________________________________
void ProofFirst::Terminate()
{
   // The Terminate() function is the last function to be called during
   // a query. It always runs on the client, it can be used to present
   // the results graphically or save the results to file.

   // Create a canvas and retrieve the merged histogram from the output list
   TCanvas *c1 = new TCanvas("c1", "Proof ProofFirst canvas",200,10,400,400);
   fH1F = dynamic_cast<TH1F *>(fOutput->FindObject("FirstH1F"));
   if (fH1F) fH1F->Draw();
   c1->Update();
}

We are now ready to process this selector. We do so by invoking the method TProof::Process(const char *selector, Long64_t nentries):

root [0] TProof *plite = TProof::Open("lite://")
 +++ Starting PROOF-Lite with 2 workers +++
Opening connections to workers: OK (2 workers)                 
Setting up worker servers: OK (2 workers)                 
PROOF set to parallel mode (2 workers)
(class TProof*)0x101891600
root [1] plite->Process("ProofFirst.C+", 10000000)
Info in <TProofLite::SetQueryRunning>: starting query: 1
Info in <TProofQueryResult::SetRunning>: nwrks: 2
Info in <TMacOSXSystem::ACLiC>: creating shared library /Users/ganis/dropbox/Private/Tutorial/root-tutorial/tutorial/./ProofFirst_C.so
Lite-0: all output objects have been merged                                                         
(Long64_t)0
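The message about merged output objects refers to the step in which the histograms filled by the individual workers are combined into a single one. For histograms this amounts to a bin-by-bin sum. The following stand-alone sketch illustrates the idea without ROOT: ToyHist, FillOnWorker and Merge are hypothetical names, std::mt19937 stands in for TRandom3, and the seeds are passed explicitly here, whereas the selector above lets TRandom3 choose them.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical stand-in for TH1F: fixed binning, no under/overflow bins.
struct ToyHist {
   double xmin, xmax;
   std::vector<long> bins;

   ToyHist(std::size_t nbins, double lo, double hi)
      : xmin(lo), xmax(hi), bins(nbins, 0) {}

   void Fill(double x) {
      if (x < xmin || x >= xmax) return;          // out-of-range values are dropped
      std::size_t i = static_cast<std::size_t>((x - xmin) / (xmax - xmin) * bins.size());
      if (i >= bins.size()) i = bins.size() - 1;  // guard against rounding at the edge
      ++bins[i];
   }

   long Entries() const {
      long n = 0;
      for (long b : bins) n += b;
      return n;
   }
};

// What one worker does: create its own histogram and generator, then fill
// the histogram nEntries times (compare SlaveBegin() and Process()).
ToyHist FillOnWorker(long nEntries, unsigned seed)
{
   ToyHist h(100, -10., 10.);
   std::mt19937 rng(seed);                        // stand-in for TRandom3
   std::normal_distribution<double> gaus(0., 1.);
   for (long i = 0; i < nEntries; ++i) h.Fill(gaus(rng));
   return h;
}

// What the merging step does for histograms: a bin-by-bin sum.
ToyHist Merge(const std::vector<ToyHist> &parts)
{
   assert(!parts.empty());
   ToyHist sum(parts[0].bins.size(), parts[0].xmin, parts[0].xmax);
   for (const ToyHist &p : parts) {
      assert(p.bins.size() == sum.bins.size());   // same binning required
      for (std::size_t i = 0; i < sum.bins.size(); ++i) sum.bins[i] += p.bins[i];
   }
   return sum;
}
```

Each worker should use a different seed, so that the workers do not generate the same sequence; calling Merge on the per-worker histograms then gives a histogram whose number of entries is the sum of the entries filled on the workers.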

We did the processing in PROOF-Lite. The resulting canvas should show the merged histogram: a Gaussian distribution centered at 0 with unit width.
