An integrated complexity analysis of problems from computational biology

Date

2018-07-17

Authors

Hailett, Micheiel Trevor

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

We perform an Integrated complexity analysis on a number of combinatorial problems arising from the field of computational biology. The classic framework of NP-completeness, algorithmic design techniques for bounded width graphs, and parameterized computational complexity together provide a clear and detailed map of the intrinsic hardness of the following problems: INTERVALIZING COLORED GRAPHS and SHORTEST COMMON SUPERSEQUENCE. The fundamental concern of parameterized complexity is the apparent qualitative difference in algorithmic behaviour displayed by many problems when one or more input parameters are bounded. For many problems, only a small range of values for these parameters capture most instances arising in practice. This is certainly the case in computational biology in several specific arenas such as DNA physical mapping or multiple sequence alignment. At its most general level, parameterized complexity partitions problems into two classes: fixed parameter tractable (FPT) and fixed parameter intractable (hard for classes of the W-hierarchy.) The former indicates that the particular parameterization may allow for efficient practical algorithms whilst the latter indicates the parameterization is not effective (asymptotically) in alleviating the intractability. The problem INTERVALIZING COLORED GRAPHS (ICG) models in a straightforward albeit limited way the determination of contig assemblies in the mapping of DNA. We show ICG to be NP-complete (no polynomial time algorithm unless P=NP), not finite-state (a very general algorithmic design technique for bounded width graphs fails), and hard for the parameterized complexity class W[1] (a specific parameterized version of ICG does not admit an efficient algorithm unless many other well-known - and apparently hard - problems admit efficient algorithms). Both SHORTEST COMMON SUPERSEQUENCE and its sister problem LONGEST COMMON SUBSEQUENCE have applications in multiple sequence alignment. We show that SHORTEST COMMON SUPERSEQUENCE PARAMETERIZED BY THE NUMBER OF INPUT STRINGS AND THE SIZE OF THE ALPHABET is hard for complexity class W[1]. As is the case with ICG, this implies that it does not admit efficient algorithms unless some unlikely computational complexity collapses occur.

Description

Keywords

Computational complexity, Biology, Computer simulation

Citation