Distributed high-dimensional similarity search with music information retrieval applications

dc.contributor.authorFaghfouri, Aidin
dc.contributor.supervisorPan, Jianping
dc.date.accessioned2011-08-29T20:30:49Z
dc.date.available2011-08-29T20:30:49Z
dc.date.copyright2011en_US
dc.date.issued2011-08-29
dc.degree.departmentDept. of Computer Scienceen_US
dc.degree.levelMaster of Science M.Sc.en_US
dc.description.abstractToday, the advent of networking technologies and computer hardware have enabled more and more inexpensive PCs, various mobile devices, smart phones, PDAs, sensors and cameras to be linked to the Internet with better connectivity. In recent years, we have witnessed the emergence of several instances of distributed applications, providing infrastructures for social interactions over large-scale wide-area networks and facilitating the ways users share and publish data. User generated data today range from simple text files to (semi-) structured documents and multimedia content. With the emergence of Semantic Web, the number of features (associated with a content) that are used in order to index those large amounts of heterogenous pieces of data is growing dramatically. The feature sets associated with each content type can grow continuously as we discover new ways of describing a content in formulated terms. As the number of dimensions in the feature data grow (as high as 100 to 1000), it becomes harder and harder to search for information in a dataset due to the curse of dimensionality and it is not appropriate to use naive search methods, as their performance degrade to linear search. As an alternative, we can distribute the content and the query processing load to a set of peers in a distributed Peer-to-Peer (P2P) network and incorporate high-dimensional distributed search techniques to attack the problem. Currently, a large percentage of Internet traffic consists of video and music files shared and exchanged over P2P networks. In most present services, searching for music is performed through keyword search and naive string-matching algorithms using collaborative filtering techniques which mostly use tag based approaches. In music information retrieval (MIR) systems, the main goal is to make recommendations similar to the music that the user listens to. In these systems, techniques based on acoustic feature extraction can be employed to achieve content-based music similarity search (i.e., searching through music based on what can be heard from the music track). Using these techniques we can devise an automated measure of similarity that can replace the need for human experts (or users) who assign descriptive genre tags and meta-data to each recording and solve the famous cold-start problem associated with the collaborative filtering techniques. In this work we explore the advantages of distributed structures by efficiently distributing the content features and query processing load on the peers in a P2P network. Using a family of Locality Sensitive Hash (LSH) functions based on p-stable distributions we propose an efficient, scalable and load-balanced system, capable of performing K-Nearest-Neighbor (KNN) and Range queries. We also propose a new load-balanced indexing algorithm and evaluate it using our Java based simulator. Our results show that this P2P design ensures load-balancing and guarantees logarithmic number of hops for query processing. Our system is extensible to be used with all types of multi-dimensional feature data and it can also be employed as the main indexing scheme of a multipurpose recommendation system.en_US
dc.description.scholarlevelGraduateen_US
dc.identifier.urihttp://hdl.handle.net/1828/3515
dc.languageEnglisheng
dc.language.isoenen_US
dc.rights.tempAvailable to the World Wide Weben_US
dc.subjectsimilarity searchen_US
dc.subjectmusicen_US
dc.subjectpeer-to-peeren_US
dc.subjectdistributed similarity searchen_US
dc.subjectmusic information retrievalen_US
dc.subjectlocality sensitive hashingen_US
dc.subjectmusic similarity searchen_US
dc.subjecthigh-dimensional similarity searchen_US
dc.subjectmarsyasen_US
dc.subjectP2Pen_US
dc.subjectfeature vectoren_US
dc.subjectmusic featuresen_US
dc.titleDistributed high-dimensional similarity search with music information retrieval applicationsen_US
dc.typeThesisen_US

Files

Original bundle
Now showing 1 - 1 of 1
Loading...
Thumbnail Image
Name:
Faghfouri_Aidin_MSc_2011.pdf
Size:
1.45 MB
Format:
Adobe Portable Document Format
License bundle
Now showing 1 - 1 of 1
No Thumbnail Available
Name:
license.txt
Size:
1.74 KB
Format:
Item-specific license agreed upon to submission
Description: