All-against-all approximate substring matching

Barsky, Marina

Files

Barsky_M_MSc.pdf (10.01 MB)

Date

2010-01-21T17:05:47Z

Authors

Barsky, Marina

Abstract

Finding local regions of high similarity in a set of strings is of great importance in biological sequence analysis. This problem is far from being efficiently solved. In this thesis we study the best-known solutions to this problem. We present a new and efficient algorithm to solve the "threshold all vs. all" variant of the problem, which involves searching two strings (of lengths N and M, respectively) for all maximal approximate substring matches of length at least S with up to K differences. The algorithm is based on a novel graph model and solves the problem in time O(NMK^2). We also explore the possibility of extending our approach to the local alignment problem for multiple strings. The program we developed is a practical solution that detects similar regions in a set of strings in feasible time for cases of practical importance.
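
To make the problem statement concrete, the sketch below enumerates threshold matches by brute force. It is not the graph-based algorithm of the thesis. It assumes, for simplicity, that a "difference" is a substitution only; if differences also include insertions and deletions, this sketch does not cover them. The function name threshold_matches and its parameters are hypothetical and chosen only for illustration.

```python
def threshold_matches(a, b, s, k):
    """Brute-force illustration of the "threshold all vs. all" problem.

    Assumption (not taken from the thesis): a "difference" is a
    substitution only, so every match lies on one diagonal i - j = d.

    Returns (i, j, length) triples such that a[i:i+length] and
    b[j:j+length] differ in at most k positions, length >= s, and the
    match cannot be extended on either side without exceeding k.
    """
    results = []
    n, m = len(a), len(b)
    for d in range(-(m - 1), n):  # diagonal offset d = i - j
        i0, j0 = (d, 0) if d >= 0 else (0, -d)
        length = min(n - i0, m - j0)
        mism = [a[i0 + t] != b[j0 + t] for t in range(length)]
        left = 0   # smallest start that keeps the window within budget
        count = 0  # mismatches inside the window [left, right]
        for right in range(length):
            count += mism[right]
            while count > k:
                count -= mism[left]
                left += 1
            # The window is already left-maximal. It is right-maximal if it
            # reaches the end of the diagonal or the next position would
            # push the mismatch count above k.
            at_end = right == length - 1
            blocked = (not at_end) and mism[right + 1] and count == k
            if (at_end or blocked) and right - left + 1 >= s:
                results.append((i0 + left, j0 + left, right - left + 1))
    return results


if __name__ == "__main__":
    for match in threshold_matches("ACGTACGT", "ACGAACGT", s=4, k=1):
        print(match)
```

Each triple gives the start position in the first string, the start position in the second string, and the match length. Under the substitution-only assumption the scan visits every cell of the N-by-M comparison once; handling insertions and deletions is what makes the general problem harder.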

Keywords

biochemistry, data processing, bioinformatics

URI

http://hdl.handle.net/1828/2090

Collections

Electronic Theses and Dissertations (ETD)
