New efficient block-based motion estimation algorithms for video compression and their hardware implementations




Rehan, Mohamed Mohamed




Video compression technology aims at compressing large amounts of video data for efficient transmission and storage without significant loss of quality. Most video compression techniques rely on removing temporal data redundancy between frames using motion estimation and motion compensation, which are generally very computationally expensive. The objective of the research in this thesis is to develop new, efficient techniques that reduce the computational complexity of motion estimation. The thesis presents a new prediction technique, referred to as weighted sum block matching (WSBM), which dynamically reduces the computational complexity by limiting the search to a small subset of the search area. Simulation results have shown that adding WSBM to some well-known search algorithms reduces their computational complexity by 6-1.5 without affecting the visual quality of the reconstructed video frames. The thesis also presents two new algorithms based on the simplex optimization method: the simplex-based block matching algorithm (SMPLX) and the flexible triangle search (FTS). Both techniques use a triangle that moves inside the search area and checks only the positions that lie at its vertices. As a result, the computational complexity of the search is reduced, since it depends directly on the number of positions checked. The techniques can change the size and orientation of the search triangle during the search. These changes make the search highly flexible and efficient, and they reduce the number of search positions to be checked compared with other search algorithms. SMPLX uses equations based on the simplex optimization method to compute the new triangle size and orientation. FTS, on the other hand, was designed to be better suited to a digital search grid by using look-up tables and integer computations. The two algorithms were implemented as part of the H.263 and H.264 encoders.
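The abstract states that SMPLX and FTS evaluate only the triangle's vertices and adapt its size and orientation, but it does not give the update equations or the FTS look-up tables. The following is therefore only a minimal sketch of a generic Nelder-Mead-style triangle search on an integer motion-vector grid, not the thesis's method. The sum of absolute differences (SAD) cost, the square ±p search window, the starting triangle, the reflect/expand/contract rules with rounding, and the names `sad` and `triangle_search` are all assumptions made for illustration. The function also returns the number of distinct positions it evaluated, since that count is what drives the search cost.

```python
# Hypothetical sketch: a generic Nelder-Mead-style triangle search on an
# integer motion-vector grid. It is NOT the thesis's SMPLX equations or FTS
# look-up tables; the cost function, window, and update rules are assumptions.


def sad(cur, ref, bx, by, dx, dy, n):
    """SAD between the n x n block at (bx, by) in `cur` and the block
    displaced by (dx, dy) in `ref` (frames are lists of pixel rows)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def triangle_search(cur, ref, bx, by, n, p, max_iter=50):
    """Search the [-p, p] x [-p, p] window by moving a 3-vertex triangle.

    Returns (best_mv, best_cost, positions_checked)."""
    cache = {}  # each in-window position is evaluated (and counted) once

    def cost(mv):
        dx, dy = mv
        if abs(dx) > p or abs(dy) > p:
            return float("inf")  # outside the search window: never accepted
        if mv not in cache:
            cache[mv] = sad(cur, ref, bx, by, dx, dy, n)
        return cache[mv]

    def snap(x, y):
        # Round back onto the integer grid (Python rounds halves to even).
        return (round(x), round(y))

    tri = [(0, 0), (1, 0), (0, 1)]  # assumed starting triangle
    for _ in range(max_iter):
        tri.sort(key=cost)
        best, mid, worst = tri
        # Centroid of the two better vertices.
        cx, cy = (best[0] + mid[0]) / 2, (best[1] + mid[1]) / 2
        # Reflect the worst vertex through that centroid.
        refl = snap(2 * cx - worst[0], 2 * cy - worst[1])
        if cost(refl) < cost(best):
            # Promising direction: try a larger (expanded) step.
            exp = snap(3 * cx - 2 * worst[0], 3 * cy - 2 * worst[1])
            tri[2] = exp if cost(exp) < cost(refl) else refl
        elif cost(refl) < cost(mid):
            tri[2] = refl  # accept the reflection, keep the size
        else:
            # Contract: pull the other two vertices halfway toward the best.
            shrunk = [best] + [snap((v[0] + best[0]) / 2, (v[1] + best[1]) / 2)
                               for v in (mid, worst)]
            if shrunk == tri:
                break  # the triangle can no longer shrink on the integer grid
            tri = shrunk
    best = min(tri, key=cost)
    return best, cost(best), len(cache)
```

Because the best vertex is never replaced by a worse one, the returned cost is never higher than the SAD at the zero vector, and `positions_checked` can be compared directly with the (2p+1)^2 positions of an exhaustive search (81 for p = 4).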
Both algorithms were compared with state-of-the-art motion search algorithms. Experimental results showed that both algorithms can reach sub-optimal solutions while checking fewer search positions than other algorithms, which results in lower computational complexity. Additional research was done to analyze and further improve FTS performance. As a result, several extensions of the FTS were developed, including the enhanced FTS (EFTS), the half-pixel FTS (HP-FTS), and the predictive FTS (PFTS). These extensions were also implemented as part of the H.263 and H.264 encoders. In the EFTS, repeated computations are reduced by caching intermediate results. In addition, the termination condition is modified to avoid premature exit. These modifications reduce the computational complexity of the FTS by up to 4%%. The HP-FTS extends the FTS so that the search can be done at half-pixel resolution instead of full-pixel resolution. The commonly used approach for half-pixel search is based on two separate stages, i.e., a full-pixel search followed by a half-pixel search. By combining the two stages in HP-FTS, the overall computational complexity can be reduced by an average of 13% without affecting the produced quality or compression ratio. The PFTS uses prediction to select the direction of the starting search triangle. Analysis results show that the proper selection of the starting search triangle has a great effect on the performance of the FTS. Simulation results show that the PFTS can reduce the computational complexity of the FTS by 7-13%. Finally, hardware designs for the FTS and the full search (FS) algorithms are proposed. The FS was chosen because of its regularity, low control overhead, and suitability for hardware implementation; its design uses a high degree of parallelism and pipelining to improve computational efficiency. The FTS requires less computation and thus provides high processing rates.
Both designs were implemented, simulated, and verified using VHDL and then synthesized for Xilinx FPGAs. Simulation results have shown that both hardware implementations are more efficient than other existing implementations in terms of performance and hardware usage.
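The FS hardware design exploits the fact that every candidate position is evaluated independently and in a fixed order. A minimal software reference model of that computation is sketched below, assuming SAD as the matching criterion and a square ±p search window (the abstract specifies neither). `full_search` is a hypothetical name, and the sketch is not a description of the thesis's VHDL architecture; the nested loops simply show the work that a parallel, pipelined design would unroll.

```python
# Hypothetical software reference model of exhaustive (full search) block
# matching. SAD and the square [-p, p] window are assumptions.


def full_search(cur, ref, bx, by, n, p):
    """Evaluate all (2p+1)^2 candidate vectors for the n x n block at (bx, by).

    Returns (best_mv, best_sad, positions_checked)."""
    best_mv, best_cost, checked = None, None, 0
    for dy in range(-p, p + 1):
        for dx in range(-p, p + 1):
            # Each candidate's SAD depends only on pixel data, not on other
            # candidates, so hardware can compute many of them in parallel.
            total = 0
            for y in range(n):
                for x in range(n):
                    total += abs(cur[by + y][bx + x]
                                 - ref[by + dy + y][bx + dx + x])
            checked += 1
            # Strict '<' keeps the first minimum found in raster order.
            if best_cost is None or total < best_cost:
                best_mv, best_cost = (dx, dy), total
    return best_mv, best_cost, checked
```

With p = 4 the function always checks (2 * 4 + 1)^2 = 81 positions regardless of frame content. That fixed workload is the regularity that makes FS attractive for hardware, and it is also the cost that triangle-based searches such as FTS aim to avoid.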



video compression