All-against-all approximate substring matching

Date

2010-01-21T17:05:47Z

Authors

Barsky, Marina

Abstract

Finding local regions of high similarity in a set of strings is of great importance in biological sequence analysis. This problem is far from being efficiently solved. In this thesis we study the best known solutions to this problem. We present a new and efficient algorithm to solve the "threshold all vs. all" variant of the problem, which involves searching two strings (of lengths N and M, respectively) for all maximal approximate substring matches of length at least S with up to K differences. The algorithm is based on a novel graph model and solves the problem in time O(NMK^2). We also explore the possibility of extending our approach to the local alignment problem for multiple strings. The program we developed is a practical solution that detects similar regions in a set of strings in feasible time for cases of practical importance.

Keywords

biochemistry, data processing, bioinformatics
