Improving Image Matching using an Ensemble of Local Descriptors and Hardware Design
Date
2023-04-12
Authors
Ghaffari, Sina
Abstract
Image matching is one of the fundamental problems in computer vision, and has many applications such as object recognition, structure from motion, and 3D reconstruction. In this work, we aim to accelerate image matching algorithms, and improve their accuracy by proposing approaches that have a minimal impact on speed. With this in mind, we focus on handcrafted descriptor and matching algorithms. Our contributions in this dissertation are twofold.
The first set of contributions concerns accelerating descriptor algorithms and reducing resource utilization for image matching through novel circuits based on Field Programmable Gate Arrays (FPGAs). We use FPGAs as the platform because of their parallel processing, low-power computing, and design flexibility.
This work presents a comprehensive analysis of FPGA-based implementations of the Histogram of Oriented Gradients (HOG) algorithm. A novel hardware-software co-design of the HOG algorithm is introduced to accelerate the execution of this descriptor algorithm. To increase speed, we propose logarithm-based bin assignment, approximate normalization, and a time-sharing protocol for sequential histogram generation. We also present a novel task allocation that optimizes resource utilization on the hardware platform in addition to accelerating the HOG algorithm.
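As an illustration of how a logarithm-based bin assignment can avoid computing an arctangent, the sketch below compares log2|gy| - log2|gx| against precomputed log2(tan θ) thresholds. It is a software sketch under stated assumptions, not the circuit described in the dissertation: the 9 unsigned bins of 20 degrees, the threshold table, and the names `LOG_TAN_EDGES` and `hog_bin` are all hypothetical choices made for this example.

```python
import math

# Assumption: 9 unsigned orientation bins of 20 degrees over [0, 180).
# Bin edges inside the first quadrant, stored as log2(tan(angle)).
# In hardware these would be constants and the logarithms would come
# from a small lookup table (also an assumption of this sketch).
LOG_TAN_EDGES = [math.log2(math.tan(math.radians(a))) for a in (20, 40, 60, 80)]


def hog_bin(gx: int, gy: int) -> int:
    """Return the orientation bin (0..8) of gradient (gx, gy) without atan."""
    if gy == 0:
        return 0  # horizontal gradient: 0 degrees (180 wraps to 0)
    if gx == 0:
        return 4  # vertical gradient: 90 degrees lies in the 80..100 bin
    # A division becomes a subtraction in the log domain.
    log_ratio = math.log2(abs(gy)) - math.log2(abs(gx))
    # k indexes the 20-degree segment of the folded angle in (0, 90).
    k = sum(log_ratio >= edge for edge in LOG_TAN_EDGES)
    same_sign = (gx > 0) == (gy > 0)
    # Same signs: the angle is in (0, 90). Opposite signs: it is mirrored
    # into (90, 180), so the segment index is mirrored as well.
    return k if same_sign else 8 - k


if __name__ == "__main__":
    for g in [(1, 0), (1, 1), (0, 1), (-1, 1), (5, -1)]:
        print(g, hog_bin(*g))
```

The comparison chain replaces both the division and the arctangent with one subtraction and four comparisons, which is the kind of saving that makes a log-domain formulation attractive on an FPGA.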
Next, we focus on binary descriptors and present a novel hardware implementation of the Binary Robust Invariant Scalable Keypoints (BRISK) algorithm. BRISK is faster than non-binary descriptors but more computationally expensive than other binary descriptor algorithms. A new sampling pattern for the BRISK algorithm is proposed to facilitate its hardware implementation at multiple scales. Our proposed design reduces FPGA resource utilization while maintaining image matching accuracy. Furthermore, the proposed fully pipelined design achieves a frame rate of 78 fps on full HD images.
The second set of contributions concerns improving image matching accuracy while keeping the computational cost low. For this purpose, the focus is on handcrafted descriptor algorithms, which are known to be more computationally efficient than deep-learning-based algorithms. We analyze and propose the fusion of descriptor algorithms that extract complementary information to attain higher accuracy. To this end, three ensemble methods are proposed. The first method (weighted fusion) combines a non-binary and a binary descriptor using their weighted distance metrics. The second method (binary fusion) combines a non-binary and a binary descriptor by converting the non-binary descriptor to a binary descriptor using a learned threshold. The third method (non-binary fusion) combines a non-binary and a binary descriptor by transforming the binary descriptor to a non-binary descriptor using a learned scaling factor. Comprehensive experiments on benchmarks from the HPatches, Brown (Photo Tourism), and Oxford Affine Covariant Regions datasets are provided. The experimental results and analysis demonstrate a higher mean Average Precision (mAP) for the fusion methods than for the baseline algorithms.
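The idea behind the first method, combining the distances of a non-binary and a binary descriptor with a weight, can be sketched as follows. This is a minimal illustration, not the dissertation's formulation: the choice of Euclidean and Hamming distances, the normalization constants, the single `weight` parameter, and the function names are assumptions made for this example.

```python
from typing import Sequence


def l2_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two non-binary descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two binary descriptors stored as ints."""
    return bin(a ^ b).count("1")


def fused_distance(
    float_a: Sequence[float],
    bits_a: int,
    float_b: Sequence[float],
    bits_b: int,
    n_bits: int,
    weight: float = 0.5,   # hypothetical weight; the abstract does not say how it is chosen
    max_l2: float = 2.0,   # assumed upper bound for unit-norm descriptors
) -> float:
    """Weighted sum of the two distances, each scaled to [0, 1]."""
    d_float = l2_distance(float_a, float_b) / max_l2
    d_bits = hamming_distance(bits_a, bits_b) / n_bits
    return weight * d_float + (1.0 - weight) * d_bits


if __name__ == "__main__":
    # Identical non-binary parts, binary parts differing in 2 of 8 bits.
    print(fused_distance([1.0, 0.0], 0b10101010, [1.0, 0.0], 0b10100000, n_bits=8))
```

Scaling both distances to a common range before weighting keeps one metric from dominating the other simply because of its units; matching then proceeds as usual, for example by nearest-neighbor search on the fused distance.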
The next contribution to accuracy is convolutional neural network (CNN) prefiltering: a shallow CNN is added as the first step of a handcrafted algorithm and filters the images prior to keypoint detection. The CNN is trained to filter the raw input images so that the pipeline achieves a higher mAP. Experimental results on the HPatches dataset indicate improved accuracy with this method.
Finally, we demonstrate our proposed approaches on a practical application. This application is relevant to environmental research: it provides the basis for a tool for the automated identification of wildlife (in this case, badgers) when tracking habitat. The proposed methods outperform the commonly used handcrafted algorithms in identifying individual badgers across multiple images using their facial characteristics. This is done without fine-tuning the algorithms on the target badger identification dataset, which shows the generality of our proposed methods.
Keywords
image matching, FPGA, BRISK, SIFT, HOG, Histogram of Oriented Gradients, Hill Climbing, badger identification, hardware implementation, hardware software co-design, Field Programmable Gate Arrays, Computer Vision, Image processing, Local Descriptor, Keypoint, fusion