GPU-based lock-free mesh reduction using deterministic vertex clustering
Date
2026
Authors
Hanif, Khizra
Abstract
The growing demand for real-time rendering and processing of complex 3D models in film production, gaming, and scientific visualization has exposed the limitations of CPU-based mesh simplification techniques. While S-Weld ensures deterministic clustering, its sequential execution makes it unsuitable for large-scale meshes. The multi-core P-Weld improves performance through lock-free multi-threading but remains constrained by CPU core count and memory bandwidth.
To address these issues, this thesis presents a GPU-accelerated vertex clustering framework that extends the deterministic behavior of P-Weld to a fully parallel, memory-adaptive GPU architecture. The work begins with a direct CUDA port of P-Weld, then introduces an on-the-fly neighbor evaluation method that performs clustering without storing explicit adjacency lists, and finally develops a fully GPU-resident sparse voxel-grid framework for efficient processing of large meshes. Shared-memory caching, warp-synchronous centroid updates, and sparse neighbor filtering minimize redundant computation and improve parallel efficiency.
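The idea of evaluating neighbors on the fly, without building adjacency lists, can be illustrated with a minimal CPU-side sketch in Python. It is not the thesis's CUDA implementation or the actual P-Weld rule: the function name `cluster_on_the_fly` and the "lowest-index vertex within ε wins" rule are assumptions chosen only to show why such a scheme can be deterministic. Each vertex reads only the immutable input positions, so the result does not depend on the order in which vertices are processed.

```python
import math

def cluster_on_the_fly(points, eps):
    """Assign each vertex a representative without storing adjacency lists.

    Hypothetical rule (an assumption, not the thesis's rule): a vertex is
    represented by the lowest-index vertex lying within eps of it, or by
    itself if no lower-index vertex is that close.
    """
    n = len(points)
    rep = list(range(n))
    # Each outer iteration is independent of the others, so it could be
    # mapped to one GPU thread; here it simply runs sequentially.
    for i in range(n):
        for j in range(i):  # only lower indices can become the representative
            # The neighbor test is computed on demand and never stored.
            if math.dist(points[i], points[j]) <= eps:
                rep[i] = j
                break
    return rep

if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0), (1.0, 0.0, 0.0), (1.02, 0.0, 0.0)]
    print(cluster_on_the_fly(pts, 0.1))
```

The brute-force inner loop costs O(n²) distance tests and is used here only for clarity; a spatial grid would normally restrict the candidates to nearby vertices.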
Existing GPU-based libraries, such as FRNN and cuNSearch, were evaluated but found unsuitable for large irregular meshes on consumer laptops. A custom sparse grid-based neighbor search was therefore developed to perform efficient ε-neighborhood queries entirely on the GPU within limited VRAM. The proposed pipeline was evaluated on five benchmark datasets (the Bunny, Lucy, Thai Statue, and Manuscript meshes and a LiDAR point cloud) using various clustering thresholds. The GPU versions produced results identical to those of their CPU counterparts while providing a 10-26× speedup, improving the scalability of vertex clustering for large 3D mesh simplification.
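A sparse grid ε-neighborhood query of this general kind can be sketched in Python as follows. This is an illustrative CPU version under stated assumptions, not the thesis's GPU data structure: the names `build_sparse_grid` and `eps_neighbors`, the dictionary-backed grid, and the choice of cell size equal to ε are all hypothetical. Only occupied cells are stored, and with a cell edge of ε every point within distance ε of a query must lie in one of the 27 cells surrounding the query's cell.

```python
import math
from collections import defaultdict

def cell_of(p, eps):
    """Integer cell coordinates of point p for a grid with cell edge eps."""
    return tuple(math.floor(c / eps) for c in p)

def build_sparse_grid(points, eps):
    """Map each occupied cell to the indices of the points inside it."""
    grid = defaultdict(list)
    for idx, p in enumerate(points):
        grid[cell_of(p, eps)].append(idx)
    return grid

def eps_neighbors(points, grid, eps, i):
    """Return sorted indices of all points within eps of points[i]."""
    cx, cy, cz = cell_of(points[i], eps)
    found = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dz in (-1, 0, 1):
                # .get avoids creating entries for empty cells.
                for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                    if j != i and math.dist(points[i], points[j]) <= eps:
                        found.append(j)
    return sorted(found)

if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (0.05, 0.0, 0.0), (1.0, 0.0, 0.0),
           (1.02, 0.0, 0.0), (0.09, 0.0, 0.0)]
    grid = build_sparse_grid(pts, 0.1)
    print(eps_neighbors(pts, grid, 0.1, 0))
```

Memory use grows with the number of occupied cells rather than with the volume of the bounding box, which is the property that matters when VRAM is limited.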
In summary, this thesis presents the first deterministic GPU extension of vertex clustering, introducing a memory-efficient sparse-grid neighbor search and a fully parallel pipeline that reproduces accurate results, advancing scalable and reproducible mesh simplification.