Set up a connection to a Dask cluster. Two ingredients are needed:
1. Creating a cluster object that represents computing resources. This can be
done in various ways depending on the type of resources at your disposal. To use
only the local machine (e.g. your laptop), a `LocalCluster` object can be
used. This step can be skipped if you have access to an existing Dask
cluster; in that case, the cluster administrator should provide you with a
URL to connect to the cluster in step 2. More options for cluster creation
can be found in the Dask docs at
http://distributed.dask.org/en/stable/api.html#cluster .
2. Creating a Dask client object that connects to the cluster. This directly
accepts the object created in the previous step. In case the cluster was set up
externally, you need to provide an endpoint URL to the client, e.g.
'https://myscheduler.domain:8786', as sketched below.
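For the externally managed case, a minimal sketch could look like the following; the scheduler address is just a placeholder that your cluster administrator would provide:

from dask.distributed import Client

# Connect to an existing cluster via its scheduler address (placeholder value).
client = Client("tcp://myscheduler.domain:8786")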
Through Dask, you can connect to various types of cluster resources. For
example, you can connect together a set of machines through SSH and use them
to run your computations. This is done through the `SSHCluster` class. For
example:
from dask.distributed import SSHCluster
cluster = SSHCluster(
    # A list with machine host names; the first name will be used as the
    # scheduler, the following names will become workers.
    hosts=["machine1", "machine2", "machine3"],
    # A dictionary of options for each worker node; here we set the number
    # of cores to be used on each node.
    worker_options={"nprocs": 4},
)
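Note that `SSHCluster` must be able to reach every listed host over SSH without interactive prompts (e.g. with key-based authentication), and each machine needs a Python environment with Dask installed.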
Another common use case is interfacing Dask with a batch system like HTCondor or
Slurm. A separate package called dask-jobqueue (https://jobqueue.dask.org)
extends the available Dask cluster classes to enable running Dask computations
as batch jobs. In this case, the cluster object usually receives the parameters
that would be written in the job description file. For example:
from dask_jobqueue import HTCondorCluster
# The per-job resource request below is illustrative; adjust it to your site.
cluster = HTCondorCluster(cores=1, memory="2000MB", disk="1000MB")
# Use the scale method to send as many jobs as needed
cluster.scale(4)
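Each submitted job starts a Dask worker that joins the cluster as soon as the batch system schedules it; `cluster.scale(0)` releases the jobs again, and `cluster.adapt(minimum=0, maximum=10)` lets Dask grow and shrink the pool automatically (the bounds here are just an example).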
In this tutorial, a cluster object is created for the local machine, using
multiprocessing (processes=True) on 2 workers (n_workers=2), each using only
1 core (threads_per_worker=1) and 2GiB of RAM (memory_limit="2GiB").
from dask.distributed import LocalCluster, Client

cluster = LocalCluster(n_workers=2, threads_per_worker=1, processes=True, memory_limit="2GiB")
client = Client(cluster)
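This client is what the analysis later hands to ROOT's distributed RDataFrame so that computations run on the Dask cluster. The exact constructor spelling has changed across ROOT versions, so the following is a sketch under that assumption rather than the tutorial's literal code:

import ROOT

# Assumed DistRDF entry point (version-dependent); newer ROOT releases also
# accept ROOT.RDataFrame(100, executor=client).
RDataFrame = ROOT.RDF.Experimental.Distributed.Dask.RDataFrame
df = RDataFrame(100, daskclient=client)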
if __name__ == "__main__":
    df_1 = df.Define("gaus", "gRandom->Gaus(10, 1)").Define("exponential", "gRandom->Exp(10)")
    h_gaus = df_1.Histo1D(("gaus", "Normal distribution", 50, 0, 30), "gaus")
    h_exp = df_1.Histo1D(("exponential", "Exponential distribution", 50, 0, 30), "exponential")
    c.SaveAs("distrdf002_dask_connection.png")
    print("Saved figure to distrdf002_dask_connection.png")