Experimental Setup. All runtime experiments were conducted on a standard desktop computer with an Intel(R) Core(TM) i7-4790K CPU running at 4.00 GHz (4 cores; 8 hardware threads), 32 GB RAM, and two Nvidia GeForce Titan Z GPUs (each consisting of two devices with 2880 shader units and 6 GB main memory) using single precision. The operating system was Ubuntu 14.04.3 LTS (64-bit) with kernel 3.13.0-52, CUDA 7.0.65 (graphics driver 340.76), and OpenCL 1.2. All algorithms were implemented in C and OpenCL, where Swig was used to obtain appropriate Python interfaces.5 The code was compiled using gcc-4.8.4 at optimization level -O3. For the experimental evaluation, we report runtimes for both the construction and the query phase (referred to as the “train” and “test” phases), where the focus is on the latter (which makes use of the GPUs). We consider the following three implementations (a usage sketch of all three via the public Python interface is given at the end of this section):
(1) bufferkdtree(i): The adapted buffer k-d tree implementation with both FindLeafBatch and ProcessAllBuffers being conducted on i GPUs.
(2) kdtree(i): A multi-core implementation of a k-d tree-based search, which runs i threads in parallel on the CPU (each handling a single query).
(3) brute(i): A brute-force implementation that makes use of i GPUs to process the queries in a massively-parallel manner.
The parameters for the buffer k-d tree implementation were fixed to appropriate values.6 Note that both competitors of bufferkdtree have been evaluated extensively in the literature; the reported runtimes and speed-ups can thus be put in a broad context. For simplicity, we fix the number k of nearest neighbors to k = 10 for all experiments.
We focus on several data-intensive tasks from the field of astronomy. Note that similar runtime behavior can be observed on data sets from other domains as well, as long as the dimensionality of the search space is moderate (e.g., from d = 5 to d = 30). We follow our previous work and consider the psf mag, psf model mag, and all mag data sets of dimensionality d = 5, d = 10, and d = 15, respectively; for a description, we refer to ▇▇▇▇▇▇▇ et al. [8]. In addition, we consider a new data set derived from the Catalina Realtime Transient Survey.
5 The code is publicly available under ▇▇▇▇▇://▇▇▇▇▇▇.▇▇▇/▇▇▇▇▇▇▇/bufferkdtree.
6 For a tree of height h, we fixed B = 2^(24−h) and the number M of indices fetched from input and reinsert in each iteration of Algorithm 1 to M = 10 · B.
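To make the parameter choice in footnote 6 concrete, the small Python sketch below computes B and M for an example tree height; the value h = 9 is a hypothetical choice for illustration only, not a setting reported above.

```python
# Worked example for footnote 6: buffer size B and fetch count M.
# The tree height h = 9 is a hypothetical illustration value.
h = 9                # tree height (assumed for this example)
B = 2 ** (24 - h)    # buffer size: B = 2^(24 - h) = 2^15 = 32768
M = 10 * B           # indices fetched from input/reinsert per iteration: 327680

print(f"h = {h}: B = {B}, M = {M}")
```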
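For orientation, the following minimal sketch shows how the three implementations compared above could be invoked through the library's Python interface (footnote 5). The class and parameter names (NearestNeighbors, algorithm, tree_depth, plat_dev_ids, n_jobs) follow the publicly released bufferkdtree package but may differ between versions; the data is synthetic and the device ids are assumptions.

```python
import numpy as np
from bufferkdtree.neighbors import NearestNeighbors

# Synthetic stand-in for a d = 10 data set such as psf model mag.
rng = np.random.RandomState(0)
X_train = rng.rand(10**5, 10).astype(np.float32)
X_test = rng.rand(10**4, 10).astype(np.float32)

# bufferkdtree(1): buffer k-d tree with both FindLeafBatch and
# ProcessAllBuffers executed on the OpenCL device(s) in plat_dev_ids.
nbrs = NearestNeighbors(n_neighbors=10, algorithm="buffer_kd_tree",
                        tree_depth=9, plat_dev_ids={0: [0]})
nbrs.fit(X_train)                      # "train" phase: tree construction
dists, inds = nbrs.kneighbors(X_test)  # "test" phase: batch queries on the GPU

# kdtree(8): multi-core CPU k-d tree, one query handled per thread.
# nbrs = NearestNeighbors(n_neighbors=10, algorithm="kd_tree", n_jobs=8)

# brute(1): brute-force search on one GPU.
# nbrs = NearestNeighbors(n_neighbors=10, algorithm="brute",
#                         plat_dev_ids={0: [0]})
```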