Configure a Dask connection to an HTCondor cluster hosted by the CERN batch service.
To reproduce this tutorial, run the following steps:
- Log in to lxplus.
- Source an LCG release (LCG104 or later). See https://lcgdocs.web.cern.ch/lcgdocs/lcgreleases/introduction/ for details.
- Install the dask_lxplus package, which provides the CernCluster class needed to connect to the CERN condor pools. See https://batchdocs.web.cern.ch/specialpayload/dask.html for instructions.
- Run this tutorial.
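Before running the script, it can help to confirm that the sourced environment provides the modules it imports. The following is a minimal sketch that uses only the standard library. The helper name `missing_modules` is hypothetical and is not part of ROOT or dask_lxplus.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on sys.path."""
    missing = []
    for name in names:
        # find_spec returns None when a top-level module is not importable
        if importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Top-level modules imported by this tutorial
    required = ["ROOT", "dask", "dask_lxplus"]
    absent = missing_modules(required)
    if absent:
        print("Missing modules:", ", ".join(absent))
    else:
        print("All required modules are available.")
```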
The tutorial defines the resources that each job requests from the HTCondor scheduler, then creates a Dask client that RDataFrame can use to distribute computations.
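Because each worker runs in its own job, the total request scales with the number of workers. With the per-job specification used in the script (1 core, 2000MB of memory, 1GB of disk) and two workers, the pool receives two such jobs. The following sketch shows that arithmetic. It assumes decimal units (1 MB = 10^6 bytes), and the helpers `to_bytes` and `total_request` are hypothetical.

```python
import re

# Decimal unit multipliers (assumption: 1 MB = 10**6 bytes, 1 GB = 10**9 bytes)
_UNITS = {"B": 1, "KB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def to_bytes(size):
    """Convert a string such as '2000MB' or '1GB' to a number of bytes."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([KMGT]?B)\s*", size.upper())
    if match is None:
        raise ValueError(f"Cannot parse size: {size!r}")
    value, unit = match.groups()
    return int(float(value) * _UNITS[unit])


def total_request(n_workers, cores, memory, disk):
    """Aggregate the resources requested by n_workers identical jobs."""
    return {
        "jobs": n_workers,
        "cores": n_workers * cores,
        "memory_bytes": n_workers * to_bytes(memory),
        "disk_bytes": n_workers * to_bytes(disk),
    }


if __name__ == "__main__":
    # {'jobs': 2, 'cores': 2, 'memory_bytes': 4000000000, 'disk_bytes': 2000000000}
    print(total_request(2, cores=1, memory="2000MB", disk="1GB"))
```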
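```python
from datetime import datetime
import socket
import time

from dask.distributed import Client
from dask_lxplus import CernCluster

import ROOT


def create_connection() -> Client:
    """
    Creates a connection to an HTCondor cluster offered by the CERN batch service.
    Returns a Dask client that RDataFrame will use to distribute computations.
    """
    # The resources below are requested for each job in the condor cluster
    cluster = CernCluster(
        cores=1,
        memory='2000MB',
        disk='1GB',
        death_timeout='60',
        lcg=True,
        nanny=True,
        container_runtime='none',
        scheduler_options={
            'port': 8786,
            'host': socket.gethostname(),
        },
        job_extra={
            'MY.JobFlavour': '"espresso"',
        },
    )

    # Launch n_workers jobs, i.e. n_workers workers in the Dask cluster
    n_workers = 2
    cluster.scale(n_workers)

    client = Client(cluster)

    # Block until all workers have joined, timing the wait
    print(f"Waiting for {n_workers} workers to start.")
    start = time.time()
    client.wait_for_workers(n_workers)
    end = time.time()
    print(f"All workers are ready, took {round(end - start, 2)} seconds.")

    return client
```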
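The `MY.JobFlavour` entry in `job_extra` selects the maximum runtime of each job. The value keeps its inner double quotes because HTCondor expects a quoted string there. The following sketch picks the shortest flavour that covers an expected runtime. The flavour limits are assumed to match the CERN batch service documentation and should be checked there. The helper `job_flavour_for` is hypothetical.

```python
# Maximum runtime in minutes per job flavour (assumed values; check the
# CERN batch service documentation for the current limits)
JOB_FLAVOURS = [
    ("espresso", 20),
    ("microcentury", 60),
    ("longlunch", 120),
    ("workday", 8 * 60),
    ("tomorrow", 24 * 60),
    ("testmatch", 3 * 24 * 60),
    ("nextweek", 7 * 24 * 60),
]


def job_flavour_for(expected_minutes):
    """Return the shortest flavour whose limit covers expected_minutes,
    wrapped in double quotes as HTCondor expects for a string value."""
    for name, limit in JOB_FLAVOURS:
        if expected_minutes <= limit:
            return f'"{name}"'
    raise ValueError(f"No job flavour allows {expected_minutes} minutes")


if __name__ == "__main__":
    # Could be passed as job_extra={'MY.JobFlavour': job_flavour_for(90)}
    print(job_flavour_for(90))  # prints "longlunch" (with the quotes)
```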
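```python
def run_rdataframe(client: Client) -> None:
    """
    Run a simple example with RDataFrame, using the previously created
    connection to the HTCondor cluster.
    """
    # Illustrative dataset of 10_000 entries; x is uniform between 0 and 100
    df = ROOT.RDF.Experimental.Distributed.Dask.RDataFrame(
        10_000, daskclient=client).Define("x", "gRandom->Rndm() * 100")
    # Book the results lazily; the first GetValue() triggers the computation
    nentries = df.Count()
    meanv = df.Mean("x")
    maxv = df.Max("x")
    minv = df.Min("x")
    print(f"Dataset has {nentries.GetValue()} entries")
    print("Column x stats:")
    print(f"\tmean: {meanv.GetValue()}")
    print(f"\tmax: {maxv.GetValue()}")
    print(f"\tmin: {minv.GetValue()}")
```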
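The four results are booked before any data is processed, and RDataFrame fills them together in a single pass when the first `GetValue()` is called. The following pure-Python sketch mimics that booking pattern locally, without ROOT, on values drawn uniformly between 0 and 100 like column x. The `LazyStats` class is hypothetical. It illustrates the idea only and does not reflect how RDataFrame is implemented. Because the values are uniform, the mean should come out close to 50.

```python
import random


class LazyStats:
    """Book count/mean/max/min and compute all of them in one pass on first access."""

    def __init__(self, values):
        self._values = values
        self._results = None  # filled by the single loop over the data

    def _run(self):
        # One loop over the data fills every booked result
        count, total = 0, 0.0
        vmax, vmin = float("-inf"), float("inf")
        for v in self._values:
            count += 1
            total += v
            vmax = max(vmax, v)
            vmin = min(vmin, v)
        mean = total / count if count else float("nan")
        self._results = {"count": count, "mean": mean, "max": vmax, "min": vmin}

    def get(self, name):
        # Trigger the computation only when a result is first requested
        if self._results is None:
            self._run()
        return self._results[name]


if __name__ == "__main__":
    rng = random.Random(42)
    stats = LazyStats(rng.random() * 100 for _ in range(10_000))
    print(f"Dataset has {stats.get('count')} entries")
    print(f"\tmean: {stats.get('mean')}")
    print(f"\tmax: {stats.get('max')}")
    print(f"\tmin: {stats.get('min')}")
```

Passing a generator, which can be consumed only once, shows that all four results really come from the same single loop.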
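```python
if __name__ == "__main__":
    client = create_connection()
    print(f"Starting the computations at {datetime.now()}")
    start = time.time()
    run_rdataframe(client)
    end = time.time()
    print(f"Computations ended at {datetime.now()}, "
          f"took {round(end - start, 2)} seconds.")
```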
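The script times both the worker start-up and the computation with paired `time.time()` calls. The following minimal sketch wraps the same measurement in a reusable context manager. It uses `time.perf_counter()` instead, because that clock is monotonic and intended for measuring intervals. The `timed` name is hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    """Measure the duration of the enclosed block and report it."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed  # keep the value for later inspection
        print(f"{label} took {round(elapsed, 2)} seconds.")


if __name__ == "__main__":
    durations = {}
    with timed("Sleeping", durations):
        time.sleep(0.1)
```

In the tutorial, `with timed("Computations"): run_rdataframe(client)` could replace the explicit start/end bookkeeping.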
- Date: September 2023
- Author: Vincenzo Eduardo Padulano, CERN
Definition in file distrdf004_dask_lxbatch.py.