ROOT logo
ROOT » MATH » MATHCORE » TKDTree<int,float>

class TKDTree<int,float>: public TObject


                      kd-tree and its implementation in TKDTree

 Contents:
 1. What is kd-tree
 2. How to cosntruct kdtree - Pseudo code
 3. Using TKDTree
    a. Creating the kd-tree and setting the data
    b. Navigating the kd-tree
 4. TKDTree implementation - technical details
    a. The order of nodes in internal arrays
    b. Division algorithm
    c. The order of nodes in boundary related arrays



 1. What is kdtree ? ( http://en.wikipedia.org/wiki/Kd-tree )

 In computer science, a kd-tree (short for k-dimensional tree) is a space-partitioning data structure
 for organizing points in a k-dimensional space. kd-trees are a useful data structure for several
 applications, such as searches involving a multidimensional search key (e.g. range searches and
 nearest neighbour searches). kd-trees are a special case of BSP trees.

 A kd-tree uses only splitting planes that are perpendicular to one of the coordinate system axes.
 This differs from BSP trees, in which arbitrary splitting planes can be used.
 In addition, in the typical definition every node of a kd-tree, from the root to the leaves, stores a point.
 This differs from BSP trees, in which leaves are typically the only nodes that contain points
 (or other geometric primitives). As a consequence, each splitting plane must go through one of
 the points in the kd-tree. kd-trees are a variant that store data only in leaf nodes.

 2. Constructing a classical kd-tree ( Pseudo code)

 Since there are many possible ways to choose axis-aligned splitting planes, there are many different ways
 to construct kd-trees. The canonical method of kd-tree construction has the following constraints:

     * As one moves down the tree, one cycles through the axes used to select the splitting planes.
      (For example, the root would have an x-aligned plane, the root's children would both have y-aligned
       planes, the root's grandchildren would all have z-aligned planes, and so on.)
     * At each step, the point selected to create the splitting plane is the median of the points being
       put into the kd-tree, with respect to their coordinates in the axis being used. (Note the assumption
       that we feed the entire set of points into the algorithm up-front.)

 This method leads to a balanced kd-tree, in which each leaf node is about the same distance from the root.
 However, balanced trees are not necessarily optimal for all applications.
 The following pseudo-code illustrates this canonical construction procedure (NOTE, that the procedure used
 by the TKDTree class is a bit different, the following pseudo-code is given as a simple illustration of the
 concept):

 function kdtree (list of points pointList, int depth)
 {
     if pointList is empty
         return nil;
     else
     {
         // Select axis based on depth so that axis cycles through all valid values
         var int axis := depth mod k;

         // Sort point list and choose median as pivot element
         select median from pointList;

         // Create node and construct subtrees
         var tree_node node;
         node.location := median;
         node.leftChild := kdtree(points in pointList before median, depth+1);
         node.rightChild := kdtree(points in pointList after median, depth+1);
         return node;
     }
 }

 Our construction method is optimized to save memory, and differs a bit from the constraints above.
 In particular, the division axis is chosen as the one with the biggest spread, and the point to create the
 splitting plane is chosen so, that one of the two subtrees contains exactly 2^k terminal nodes and is a
 perfectly balanced binary tree, and, while at the same time, trying to keep the number of terminal nodes
 in the 2 subtrees as close as possible. The following section gives more details about our implementation.

 3. Using TKDTree

 3a. Creating the tree and setting the data
     The interface of the TKDTree, that allows to set input data, has been developped to simplify using it
     together with TTree::Draw() functions. That's why the data has to be provided column-wise. For example:
     {
     TTree *datatree = ...

     datatree->Draw("x:y:z", "selection", "goff");
     //now make a kd-tree on the drawn variables
     TKDTreeID *kdtree = new TKDTreeID(npoints, 3, 1);
     kdtree->SetData(0, datatree->GetV1());
     kdtree->SetData(1, datatree->GetV2());
     kdtree->SetData(2, datatree->GetV3());
     kdtree->Build();
     }
     NOTE, that this implementation of kd-tree doesn't support adding new points after the tree has been built
     Of course, it's not necessary to use TTree::Draw(). What is important, is to have data columnwise.
     An example with regular arrays:
     {
     Int_t npoints = 100000;
     Int_t ndim = 3;
     Int_t bsize = 1;
     Double_t xmin = -0.5;
     Double_t xmax = 0.5;
     Double_t *data0 = new Double_t[npoints];
     Double_t *data1 = new Double_t[npoints];
     Double_t *data2 = new Double_t[npoints];
     Double_t *y     = new Double_t[npoints];
     for (Int_t i=0; i<npoints; i++){
        data0[i]=gRandom->Uniform(xmin, xmax);
        data1[i]=gRandom->Uniform(xmin, xmax);
        data2[i]=gRandom->Uniform(xmin, xmax);
     }
     TKDTreeID *kdtree = new TKDTreeID(npoints, ndim, bsize);
     kdtree->SetData(0, data0);
     kdtree->SetData(1, data1);
     kdtree->SetData(2, data2);
     kdtree->Build();
     }

     By default, the kd-tree doesn't own the data and doesn't delete it with itself. If you want the
     data to be deleted together with the kd-tree, call TKDTree::SetOwner(kTRUE).

     Most functions of the kd-tree don't require the original data to be present after the tree
     has been built. Check the functions documentation for more details.

 3b. Navigating the kd-tree

     Nodes of the tree are indexed top to bottom, left to right. The root node has index 0. Functions
     TKDTree::GetLeft(Index inode), TKDTree::GetRight(Index inode) and TKDTree::GetParent(Index inode)
     allow to find the children and the parent of a given node.

     For a given node, one can find the indexes of the original points, contained in this node,
     by calling the GetNodePointsIndexes(Index inode) function. Additionally, for terminal nodes,
     there is a function GetPointsIndexes(Index inode) that returns a pointer to the relevant
     part of the index array. To find the number of point in the node
     (not only terminal), call TKDTree::GetNpointsNode(Index inode).

 4.  TKDtree implementation details - internal information, not needed to use the kd-tree.
     4a. Order of nodes in the node information arrays:

 TKDtree is optimized to minimize memory consumption.
 Nodes of the TKDTree do not store pointers to the left and right children or to the parent node,
 but instead there are several 1-d arrays of size fNNodes with information about the nodes.
 The order of the nodes information in the arrays is described below. It's important to understand
 it, if one's class needs to store some kind of additional information on the per node basis, for
 example, the fit function parameters.

 Drawback:   Insertion to the TKDtree is not supported.
 Advantage:  Random access is supported

 As noted above, the construction of the kd-tree involves choosing the axis and the point on
 that axis to divide the remaining points approximately in half. The exact algorithm for choosing
 the division point is described in the next section. The sequence of divisions is
 recorded in the following arrays:
 fAxix[fNNodes]  - Division axis (0,1,2,3 ...)
 fValue[fNNodes] - Division value

 Given the index of a node in those arrays, it's easy to find the indices, corresponding to
 children nodes or the parent node:
 Suppose, the parent node is stored under the index inode. Then:
 Left child index  = inode*2+1
 Right child index =  (inode+1)*2
 Suppose, that the child node is stored under the index inode. Then:
 Parent index = inode/2

 Number of division nodes and number of terminals :
 fNNodes = (fNPoints/fBucketSize)

 The nodes are filled always from left side to the right side:
 Let inode be the index of a node, and irow - the index of a row
 The TKDTree looks the following way:
 Ideal case:
 Number of _terminal_ nodes = 2^N,  N=3

            INode
 irow 0     0                                                                   -  1 inode
 irow 1     1                              2                                    -  2 inodes
 irow 2     3              4               5               6                    -  4 inodes
 irow 3     7       8      9      10       11     12       13      14           -  8 inodes


 Non ideal case:
 Number of _terminal_ nodes = 2^N+k,  N=3  k=1

           INode
 irow 0     0                                                                   - 1 inode
 irow 1     1                              2                                    - 2 inodes
 irow 2     3              4               5               6                    - 3 inodes
 irow 3     7       8      9      10       11     12       13      14           - 8 inodes
 irow 4     15  16                                                              - 2 inodes


 3b. The division algorithm:

 As described above, the kd-tree is built by repeatingly dividing the given set of points into
 2 smaller sets. The cut is made on the axis with the biggest spread, and the value on the axis,
 on which the cut is performed, is chosen based on the following formula:
 Suppose, we want to divide n nodes into 2 groups, left and right. Then the left and right
 will have the following number of nodes:

 n=2^k+rest

 Left  = 2^k-1 +  ((rest>2^k-2) ?  2^k-2      : rest)
 Right = 2^k-1 +  ((rest>2^k-2) ?  rest-2^k-2 : 0)

 For example, let n_nodes=67. Then, the closest 2^k=64, 2^k-1=32, 2^k-2=16.
 Left node gets 32+3=35 sub-nodes, and the right node gets 32 sub-nodes

 The division process continues until all the nodes contain not more than a predefined number
 of points.

 3c. The order of nodes in boundary-related arrays

 Some kd-tree based algorithms need to know the boundaries of each node. This information can
 be computed by calling the TKDTree::MakeBoundaries() function. It fills the following arrays:

 fRange : array containing the boundaries of the domain:
 | 1st dimension (min + max) | 2nd dimension (min + max) | ...
 fBoundaries : nodes boundaries
 | 1st node {1st dim * 2 elements | 2nd dim * 2 elements | ...} | 2nd node {...} | ...
 The nodes are arranged in the order described in section 3a.


 Note: the storage of the TKDTree in a file which include also the contained data is not
       supported. One must store the data separatly in a file (e.g. using a TTree) and then
       re-creating the TKDTree from the data, after having read them from the file

This class is also known as (typedefs to this class)

TKDTreeIF, TKDTree<Int_t,Float_t>

Function Members (Methods)

public:
TKDTree<int,float>()
TKDTree<int,float>(int npoints, int ndim, UInt_t bsize)
TKDTree<int,float>(int npoints, int ndim, UInt_t bsize, float** data)
virtual~TKDTree<int,float>()
voidTObject::AbstractMethod(const char* method) const
virtual voidTObject::AppendPad(Option_t* option = "")
virtual voidTObject::Browse(TBrowser* b)
voidBuild()
static TClass*Class()
virtual const char*TObject::ClassName() const
virtual voidTObject::Clear(Option_t* = "")
virtual TObject*TObject::Clone(const char* newname = "") const
virtual Int_tTObject::Compare(const TObject* obj) const
virtual voidTObject::Copy(TObject& object) const
virtual voidTObject::Delete(Option_t* option = "")MENU
Double_tDistance(const float* point, int ind, Int_t type = 2) const
voidDistanceToNode(const float* point, int inode, float& min, float& max, Int_t type = 2)
virtual Int_tTObject::DistancetoPrimitive(Int_t px, Int_t py)
virtual voidTObject::Draw(Option_t* option = "")
virtual voidTObject::DrawClass() constMENU
virtual TObject*TObject::DrawClone(Option_t* option = "") constMENU
virtual voidTObject::Dump() constMENU
virtual voidTObject::Error(const char* method, const char* msgfmt) const
virtual voidTObject::Execute(const char* method, const char* params, Int_t* error = 0)
virtual voidTObject::Execute(TMethod* method, TObjArray* params, Int_t* error = 0)
virtual voidTObject::ExecuteEvent(Int_t event, Int_t px, Int_t py)
virtual voidTObject::Fatal(const char* method, const char* msgfmt) const
voidFindBNodeA(float* point, float* delta, Int_t& inode)
voidFindInRange(float* point, float range, vector<int>& res)
voidFindNearestNeighbors(const float* point, Int_t k, int* ind, float* dist)
intFindNode(const float* point) const
virtual TObject*TObject::FindObject(const char* name) const
virtual TObject*TObject::FindObject(const TObject* obj) const
voidFindPoint(float* point, int& index, Int_t& iter)
float*GetBoundaries()
float*GetBoundariesExact()
float*GetBoundary(const Int_t node)
float*GetBoundaryExact(const Int_t node)
intGetBucketSize()
Int_tGetCrossNode()
virtual Option_t*TObject::GetDrawOption() const
static Long_tTObject::GetDtorOnly()
virtual const char*TObject::GetIconName() const
int*GetIndPoints()
Int_tGetLeft(Int_t inode) const
virtual const char*TObject::GetName() const
intGetNDim()
Int_tGetNNodes() const
UChar_tGetNodeAxis(Int_t id) const
voidGetNodePointsIndexes(Int_t node, Int_t& first1, Int_t& last1, Int_t& first2, Int_t& last2) const
floatGetNodeValue(Int_t id) const
intGetNPoints()
intGetNPointsNode(Int_t node) const
virtual char*TObject::GetObjectInfo(Int_t px, Int_t py) const
static Bool_tTObject::GetObjectStat()
Int_tGetOffset()
virtual Option_t*TObject::GetOption() const
Int_tGetParent(Int_t inode) const
int*GetPointsIndexes(Int_t node) const
Int_tGetRight(Int_t inode) const
Int_tGetRowT0()
virtual const char*TObject::GetTitle() const
Int_tGetTotalNodes() const
virtual UInt_tTObject::GetUniqueID() const
virtual Bool_tTObject::HandleTimer(TTimer* timer)
virtual ULong_tTObject::Hash() const
virtual voidTObject::Info(const char* method, const char* msgfmt) const
virtual Bool_tTObject::InheritsFrom(const char* classname) const
virtual Bool_tTObject::InheritsFrom(const TClass* cl) const
virtual voidTObject::Inspect() constMENU
voidTObject::InvertBit(UInt_t f)
virtual TClass*IsA() const
virtual Bool_tTObject::IsEqual(const TObject* obj) const
virtual Bool_tTObject::IsFolder() const
Bool_tTObject::IsOnHeap() const
Int_tIsOwner()
virtual Bool_tTObject::IsSortable() const
Bool_tIsTerminal(int inode) const
Bool_tTObject::IsZombie() const
floatKOrdStat(int ntotal, float* a, int k, int* index) const
virtual voidTObject::ls(Option_t* option = "") const
voidMakeBoundaries(float* range = 0x0)
voidMakeBoundariesExact()
voidTObject::MayNotUse(const char* method) const
virtual Bool_tTObject::Notify()
voidTObject::Obsolete(const char* method, const char* asOfVers, const char* removedFromVers) const
static voidTObject::operator delete(void* ptr)
static voidTObject::operator delete(void* ptr, void* vp)
static voidTObject::operator delete[](void* ptr)
static voidTObject::operator delete[](void* ptr, void* vp)
void*TObject::operator new(size_t sz)
void*TObject::operator new(size_t sz, void* vp)
void*TObject::operator new[](size_t sz)
void*TObject::operator new[](size_t sz, void* vp)
virtual voidTObject::Paint(Option_t* option = "")
virtual voidTObject::Pop()
virtual voidTObject::Print(Option_t* option = "") const
virtual Int_tTObject::Read(const char* name)
virtual voidTObject::RecursiveRemove(TObject* obj)
voidTObject::ResetBit(UInt_t f)
virtual voidTObject::SaveAs(const char* filename = "", Option_t* option = "") constMENU
virtual voidTObject::SavePrimitive(ostream& out, Option_t* option = "")
voidTObject::SetBit(UInt_t f)
voidTObject::SetBit(UInt_t f, Bool_t set)
Int_tSetData(int idim, float* data)
voidSetData(int npoints, int ndim, UInt_t bsize, float** data)
virtual voidTObject::SetDrawOption(Option_t* option = "")MENU
static voidTObject::SetDtorOnly(void* obj)
static voidTObject::SetObjectStat(Bool_t stat)
voidSetOwner(Int_t owner)
virtual voidTObject::SetUniqueID(UInt_t uid)
virtual voidShowMembers(TMemberInspector&)
voidSpread(int ntotal, float* a, int* index, float& min, float& max) const
virtual voidStreamer(TBuffer&)
voidStreamerNVirtual(TBuffer& ClassDef_StreamerNVirtual_b)
virtual voidTObject::SysError(const char* method, const char* msgfmt) const
Bool_tTObject::TestBit(UInt_t f) const
Int_tTObject::TestBits(UInt_t f) const
virtual voidTObject::UseCurrentStyle()
virtual voidTObject::Warning(const char* method, const char* msgfmt) const
virtual Int_tTObject::Write(const char* name = 0, Int_t option = 0, Int_t bufsize = 0)
virtual Int_tTObject::Write(const char* name = 0, Int_t option = 0, Int_t bufsize = 0) const
protected:
virtual voidTObject::DoError(int level, const char* location, const char* fmt, va_list va) const
voidTObject::MakeZombie()
private:
TKDTree<int,float>(const TKDTree<int,float>&)
voidCookBoundaries(const Int_t node, Bool_t left)
TKDTree<int,float>&operator=(const TKDTree<int,float>&)
voidUpdateNearestNeighbors(int inode, const float* point, Int_t kNN, int* ind, float* dist)
voidUpdateRange(int inode, float* point, float range, vector<int>& res)

Data Members

protected:
UChar_t*fAxis[fNNodes] nodes cutting axis
float*fBoundaries! nodes boundaries
intfBucketSizesize of the terminal nodes
Int_tfCrossNode! cross node - node that begins the last row (with terminal nodes only)
float**fData! data points
Int_tfDataOwner! 0 - not owner, 2 - owner of the pointer array, 1 - owner of the whole 2-d array
int*fIndPoints! array of points indexes
intfNDimnumber of dimensions
intfNDimmdummy 2*fNDim
Int_tfNNodessize of node array
intfNPointsnumber of multidimensional points
Int_tfOffset! offset in fIndPoints - if there are 2 rows, that contain terminal nodes
float*fRange[fNDimm] range of data for each dimension
Int_tfRowT0! smallest terminal row - first row that contains terminal nodes
Int_tfTotalNodestotal number of nodes (fNNodes + terminal nodes)
float*fValue[fNNodes] nodes cutting value

Class Charts

Inheritance Inherited Members Includes Libraries
Class Charts

Function documentation

void TKDTree<Index, Value> Build()
 Build the kd-tree

 1. calculate number of nodes
 2. calculate first terminal row
 3. initialize index array
 4. non recursive building of the binary tree


 The tree is divided recursively. See class description, section 4b for the details
 of the division alogrithm
void TKDTree<Index, Value> FindNearestNeighbors(const float* point, Int_t k, int* ind, float* dist)
Find kNN nearest neighbors to the point in the first argument
Returns 1 on success, 0 on failure
Arrays ind and dist are provided by the user and are assumed to be at least kNN elements long
void TKDTree<Index, Value> UpdateNearestNeighbors(int inode, const float* point, Int_t kNN, int* ind, float* dist)
Update the nearest neighbors values by examining the node inode
Double_t TKDTree<Index, Value> Distance(const float* point, int ind, Int_t type = 2) const
Find the distance between point of the first argument and the point at index value ind
Type argument specifies the metric: type=2 - L2 metric, type=1 - L1 metric
void TKDTree<Index, Value> DistanceToNode(const float* point, int inode, float& min, float& max, Int_t type = 2)
Find the minimal and maximal distance from a given point to a given node.
Type argument specifies the metric: type=2 - L2 metric, type=1 - L1 metric
If the point is inside the node, both min and max are set to 0.
Index TKDTree<Index, Value> FindNode(const float* point) const
 returns the index of the terminal node to which point belongs
 (index in the fAxis, fValue, etc arrays)
 returns -1 in case of failure
void TKDTree<Index, Value> FindPoint(float* point, int& index, Int_t& iter)
 find the index of point
 works only if we keep fData pointers
void TKDTree<Index, Value> FindInRange(float* point, float range, vector<int>& res)
Find all points in the sphere of a given radius "range" around the given point
1st argument - the point
2nd argument - radius of the shere
3rd argument - a vector, in which the results will be returned
void TKDTree<Index, Value> UpdateRange(int inode, float* point, float range, vector<int>& res)
Internal recursive function with the implementation of range searches
Index* TKDTree<Index, Value> GetPointsIndexes(Int_t node) const
return the indices of the points in that terminal node
for all the nodes except last, the size is fBucketSize
for the last node it's fOffset%fBucketSize
void TKDTree<Index, Value> GetNodePointsIndexes(Int_t node, Int_t& first1, Int_t& last1, Int_t& first2, Int_t& last2) const
Return the indices of points in that node
Indices are returned as the first and last value of the part of indices array, that belong to this node
Sometimes points are in 2 intervals, then the first and last value for the second one are returned in
third and fourth parameter, otherwise first2 is set to 0 and last2 is set to -1
To iterate over all the points of the node #inode, one can do, for example:
Index *indices = kdtree->GetPointsIndexes();
Int_t first1, last1, first2, last2;
kdtree->GetPointsIndexes(inode, first1, last1, first2, last2);
for (Int_t ipoint=first1; ipoint<=last1; ipoint++){
   point = indices[ipoint];
   //do something with point;
}
for (Int_t ipoint=first2; ipoint<=last2; ipoint++){
   point = indices[ipoint];
   //do something with point;
}
Index TKDTree<Index, Value> GetNPointsNode(Int_t node) const
Get number of points in this node
for all the terminal nodes except last, the size is fBucketSize
for the last node it's fOffset%fBucketSize, or if fOffset%fBucketSize==0, it's also fBucketSize
void TKDTree<Index, Value> SetData(int npoints, int ndim, UInt_t bsize, float** data)
 Set the data array. See the constructor function comments for details
Int_t TKDTree<Index, Value> SetData(int idim, float* data)
Set the coordinate #ndim of all points (the column #ndim of the data matrix)
After setting all the data columns, proceed by calling Build() function
Note, that calling this function after Build() is not possible
Note also, that no checks on the array sizes is performed anywhere
void TKDTree<Index, Value> Spread(int ntotal, float* a, int* index, float& min, float& max) const
Calculate spread of the array a
Value TKDTree<Index, Value> KOrdStat(int ntotal, float* a, int k, int* index) const
copy of the TMath::KOrdStat because I need an Index work array
void TKDTree<Index, Value> MakeBoundaries(float* range = 0x0)
 Build boundaries for each node. Note, that the boundaries here are built
 based on the splitting planes of the kd-tree, and don't necessarily pass
 through the points of the original dataset. For the latter functionality
 see function MakeBoundariesExact()
 Boundaries can be retrieved by calling GetBoundary(inode) function that would
 return an array of boundaries for the specified node, or GetBoundaries() function
 that would return the complete array.
void TKDTree<Index, Value> CookBoundaries(const Int_t node, Bool_t left)
 define index of this terminal node
void TKDTree<Index, Value> MakeBoundariesExact()
 Build boundaries for each node. Unlike MakeBoundaries() function
 the boundaries built here always pass through a point of the original dataset
 So, for example, for a terminal node with just one point minimum and maximum for each
 dimension are the same.
 Boundaries can be retrieved by calling GetBoundaryExact(inode) function that would
 return an array of boundaries for the specified node, or GetBoundaries() function
 that would return the complete array.
void TKDTree<Index, Value> FindBNodeA(float* point, float* delta, Int_t& inode)
 find the smallest node covering the full range - start

Value* TKDTree<Index, Value> GetBoundaries()
 Get the boundaries
Value* TKDTree<Index, Value> GetBoundariesExact()
 Get the boundaries
Value* TKDTree<Index, Value> GetBoundary(const Int_t node)
 Get a boundary
Value* TKDTree<Index, Value> GetBoundaryExact(const Int_t node)
 Get a boundary
Int_t GetLeft(Int_t inode) const
 Get indexes of left and right daughter nodes
{return inode*2+1;}
Int_t GetRight(Int_t inode) const
{return (inode+1)*2;}
Int_t GetParent(Int_t inode) const
 Other getters
{return (inode-1)/2;}
UChar_t GetNodeAxis(Int_t id) const
{return (id < 0 || id >= fNNodes) ? 0 : fAxis[id];}
Value GetNodeValue(Int_t id) const
{return (id < 0 || id >= fNNodes) ? 0 : fValue[id];}
Int_t GetNNodes() const
{return fNNodes;}
Int_t GetTotalNodes() const
{return fTotalNodes;}
Index GetNPoints()
{ return fNPoints; }
Index GetNDim()
{ return fNDim; }
Int_t GetRowT0()
Getters for internal variables.
{return fRowT0;}
Int_t GetCrossNode()
{return fCrossNode;}
Int_t GetOffset()
{return fOffset;}
Index* GetIndPoints()
{return fIndPoints;}
Index GetBucketSize()
{return fBucketSize;}
Bool_t IsTerminal(int inode) const
{return (inode>=fNNodes);}
Int_t IsOwner()
{ return fDataOwner; }
void SetOwner(Int_t owner)
{ fDataOwner = owner; }
TKDTree<Index, Value>& operator=(const TKDTree<int,float>& )