Optimized hardware accelerators for data mining applications

dc.contributor.authorKanan, Awos
dc.contributor.supervisorGebali, Fayez
dc.contributor.supervisorIbrahim, Atef
dc.date.accessioned2018-02-19T20:27:12Z
dc.date.available2018-02-19T20:27:12Z
dc.date.copyright2018en_US
dc.date.issued2018-02-19
dc.degree.departmentDepartment of Electrical and Computer Engineeringen_US
dc.degree.levelDoctor of Philosophy Ph.D.en_US
dc.description.abstractData mining plays an important role in a variety of fields including bioinformatics, multimedia, business intelligence, marketing, and medical diagnosis. Analysis of today’s huge and complex data involves several data mining algorithms including clustering and classification. The computational complexity of machine learning and data mining algorithms, that are frequently used in today’s applications such as embedded systems, makes the design of efficient hardware architectures for these algorithms a challenging issue for the development of such systems. The aim of this work is to optimize the performance of hardware acceleration for data mining applications in terms of speed and area. Most of the previous accelerator architectures proposed in the literature have been obtained using ad hoc techniques that do not allow for design space exploration, some did not consider the size (number of samples) and dimensionality (number of features in each sample) of the datasets. To obtain practical architectures that are amenable for hardware implementation, size and dimensionality of input datasets are taken into consideration in this work. For one-dimensional data, algorithm-level optimizations are investigated to design a fast and area-efficient hardware accelerator for clustering one-dimensional datasets using the well-known K-Means clustering algorithm. Experimental results show that the optimizations adopted in the proposed architecture result in faster convergence of the algorithm using less hardware resources while maintaining the quality of clustering results. The computation of similarity distance matrices is one of the computational kernels that are generally required by several machine learning and data mining algorithms to measure the degree of similarity between data samples. For these algorithms, distance calculation is considered a computationally intensive task that accounts for a significant portion of the processing time. A systematic methodology is presented to explore the design space of 2-D and 1-D processor array architectures for similarity distance computation involved in processing datasets of different sizes and dimensions. Six 2-D and six 1-D processor array architectures are developed systematically using linear scheduling and projection operations. The obtained architectures are classified based on the size and dimensionality of input datasets, analyzed in terms of speed and area, and compared with previous architectures in the literature. Motivated by the necessity to accommodate large-scale and high-dimensional data, nonlinear scheduling and projection operations are finally introduced to design a scalable processor array architecture for the computation of similarity distance matrices. Implementation results of the proposed architecture show improved compromise between area and speed. Moreover, it scales better for large and high-dimensional datasets since the architecture is fully parameterized and only has to deal with one data dimension in each time step.en_US
dc.description.embargo2019-12-31en_US
dc.description.scholarlevelGraduateen_US
dc.identifier.urihttp://hdl.handle.net/1828/9079
dc.languageEnglisheng
dc.language.isoenen_US
dc.rightsAvailable to the World Wide Weben_US
dc.subjectData Miningen_US
dc.subjectParallel Algorithmsen_US
dc.subjectHardware Accelerationen_US
dc.subjectSystolic Arraysen_US
dc.subjectDesign Methodologyen_US
dc.titleOptimized hardware accelerators for data mining applicationsen_US
dc.typeThesisen_US

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Kanan_Awos_PhD_2018.pdf
Size:
2.13 MB
Format:
Adobe Portable Document Format
Description:
License bundle
Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
1.71 KB
Format:
Item-specific license agreed upon to submission
Description: