All-against-all approximate substring matching
Date
2010-01-21T17:05:47Z
Authors
Barsky, Marina
Abstract
Finding local regions of high similarity in a set of strings is of great importance in biological sequence analysis. This problem is still far from being solved efficiently.
In this thesis we study the best-known solutions to this problem. We present a new and efficient algorithm for the "threshold all vs. all" variant of the problem, which involves searching two strings (of lengths N and M, respectively) for all maximal approximate substring matches of length at least S with up to K differences. The algorithm is based on a novel graph model and solves the problem in O(NMK²) time.
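To make the "threshold all vs. all" problem statement concrete, the sketch below is a naive brute-force reference. It is NOT the graph-based O(NMK²) algorithm described in the thesis. It rests on several assumptions that the abstract does not spell out: "differences" are taken to be edit operations (Levenshtein distance), both substrings must have length at least S, and a match counts as "maximal" when its pair of intervals is not contained in the intervals of another qualifying pair. The function names `edit_distance` and `naive_threshold_matches` are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of substitutions, insertions and deletions."""
    prev = list(range(len(b) + 1))
    for i in range(1, len(a) + 1):
        cur = [i] + [0] * len(b)
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,         # delete a[i-1]
                         cur[j - 1] + 1,      # insert b[j-1]
                         prev[j - 1] + cost)  # match or substitute
        prev = cur
    return prev[len(b)]


def naive_threshold_matches(x: str, y: str, s: int, k: int):
    """Hypothetical brute-force reference for the threshold all-vs-all problem.

    Returns pairs of half-open intervals ((i, j), (p, q)) such that x[i:j] and
    y[p:q] both have length >= s and edit distance <= k. Assumed notion of
    maximality: no other qualifying pair contains both intervals.
    """
    found = []
    n, m = len(x), len(y)
    for i in range(n):
        for j in range(i + s, n + 1):
            for p in range(m):
                for q in range(p + s, m + 1):
                    # Lengths differing by more than k cannot be within k edits.
                    if abs((j - i) - (q - p)) > k:
                        continue
                    if edit_distance(x[i:j], y[p:q]) <= k:
                        found.append(((i, j), (p, q)))

    def contains(outer, inner):
        (i1, j1), (p1, q1) = outer
        (i2, j2), (p2, q2) = inner
        return i1 <= i2 and j2 <= j1 and p1 <= p2 and q2 <= q1

    # Keep only pairs not strictly contained in another qualifying pair.
    return [c for c in found
            if not any(o != c and contains(o, c) for o in found)]


if __name__ == "__main__":
    # One substitution (T -> A at position 3) separates the two strings.
    print(naive_threshold_matches("ACGTACGT", "ACGAACGT", 8, 1))
    # prints [((0, 8), (0, 8))]
```

The sketch makes roughly N²M² edit-distance calls, so it is only suitable for tiny inputs, for example as a cross-check when testing a faster implementation.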
We also explore the possibility of extending our approach to the local alignment problem for multiple strings. The program we developed is a practical tool that detects similar regions in a set of strings in feasible time for cases of practical importance.
Keywords
biochemistry, data processing, bioinformatics