All-against-all approximate substring matching
| dc.contributor.author | Barsky, Marina | |
| dc.contributor.supervisor | Thomo, Alex | |
| dc.contributor.supervisor | Upton, Christopher | |
| dc.date.accessioned | 2010-01-21T17:05:47Z | |
| dc.date.available | 2010-01-21T17:05:47Z | |
| dc.date.copyright | 2006 | en |
| dc.date.issued | 2010-01-21T17:05:47Z | |
| dc.degree.department | Department of Computer Science | |
| dc.degree.level | Master of Science M.Sc. | en |
| dc.description.abstract | Finding local regions of high similarity in a set of strings is of great importance in biological sequence analysis. This problem is far from being efficiently solved. In this thesis we study the best known solutions to this problem. We present a new and efficient algorithm to solve the "threshold all vs. all" variant of the problem, which involves searching two strings (with lengths N and M respectively) for all maximal approximate substring matches of length at least S, with up to K differences. The algorithm is based on a novel graph model and solves the problem in time O(NMK^2). We also explore the possibility of extending our approach to the local alignment problem for multiple strings. Our developed program is a practical solution that detects similar regions in a set of strings in a feasible time, for cases of practical importance. | en |
| dc.identifier.uri | http://hdl.handle.net/1828/2090 | |
| dc.language | English | eng |
| dc.language.iso | en | en |
| dc.rights | Available to the World Wide Web | en |
| dc.subject | biochemistry | en |
| dc.subject | data processing | en |
| dc.subject | bioinformatics | en |
| dc.subject.lcsh | UVic Subject Index::Sciences and Engineering::Applied Sciences::Computer science | en |
| dc.title | All-against-all approximate substring matching | en |
| dc.type | Thesis | en |
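
The "threshold all vs. all" problem stated in the abstract can be illustrated with a small brute-force sketch. This is not the graph-based algorithm of the thesis. It assumes a simplified setting in which "differences" means mismatches only (no insertions or deletions), and it scans each diagonal of the two strings with a sliding window. The function name `threshold_matches` and its parameters are hypothetical, chosen for illustration.

```python
def threshold_matches(a, b, min_len, max_diff):
    """Return (i, j, length) triples such that a[i:i+length] and
    b[j:j+length] differ in at most max_diff positions, have
    length >= min_len, and cannot be extended left or right.

    Simplifying assumption: only substitutions count as differences,
    so every match lies on a single diagonal of the comparison matrix.
    """
    results = []
    # d = j - i identifies one diagonal of the len(a) x len(b) matrix.
    for d in range(-(len(a) - 1), len(b)):
        i0, j0 = (-d, 0) if d < 0 else (0, d)
        length = min(len(a) - i0, len(b) - j0)
        # mis[t] is True where the two strings disagree on this diagonal.
        mis = [a[i0 + t] != b[j0 + t] for t in range(length)]
        end = 0
        count = 0  # number of mismatches inside mis[start:end]
        for start in range(length):
            if end < start:
                end, count = start, 0
            # Push the right edge as far as the mismatch budget allows.
            while end < length and count + mis[end] <= max_diff:
                count += mis[end]
                end += 1
            # The window is left-maximal if one more position on the
            # left would exceed the budget (or there is none).
            left_blocked = start == 0 or count + mis[start - 1] > max_diff
            if left_blocked and end - start >= min_len:
                results.append((i0 + start, j0 + start, end - start))
            if end > start:
                count -= mis[start]
    return results


if __name__ == "__main__":
    print(threshold_matches("ACGTACGT", "ACGAACGT", min_len=8, max_diff=1))
```

Each diagonal is processed in time linear in its length, so this sketch runs in O(NM) overall for the mismatch-only case. Allowing insertions and deletions, as the abstract's general notion of differences suggests, requires a different approach.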