Exploiting Massive Parallelism for Scientific Datasets on the GPUs




  • Multi-dimensional indexing on the GPU

  • Inherently multi-dimensional n-ary indexing structures such as R-trees are not well suited for the GPU because of their irregular memory access patterns and recursive back-tracking function calls. It has been known that traversing hierarchical tree structures in an irregular manner makes it difficult to exploit parallelism and to maximize the utilization of GPU processing units. Moreover, the recursive tree search algorithms often fail with large indexes because of the GPU's tiny runtime stack size.

    In our work, we propose a novel parallel tree traversal algorithm - Massively Parallel Restart Scanning (MPRS) for multi-dimensional range queries that avoids recursion and irregular memory access. The proposed MPRS algorithm traverses hierarchical tree structures with mostly contiguous memory access patterns without recursion, which offers more chances to optimize the parallel SIMD algorithm.

    We implemented the proposed MPRS range query processing algorithm on n-ary bounding volume hierarchies including R-trees and evaluate its performance using real scientific datasets on NVIDIA Tesla M2090 GPU. Our experiments show braided parallel MPRS range query algorithm achieves at least 80% SIMD efficiency while task parallel tree traversal algorithm shows only 9%-15% SIMD efficiency. Moreover, braided parallel MPRS algorithm accesses 7~20 times less amount of global memory than task parallel parent link algorithm by virtue of minimal warp divergence.




  • High Dimensional Nearest Neighbor Query on the GPU(To be updated)