New techniques for the location of hot spots in proteins and exons in DNA using digital filters

Date

2011-05-30

Authors

Ramachandran, Parameswaran

Abstract

New techniques for locating hot spots in proteins and exons in DNA using digital filters are developed, implemented, and evaluated. The application of bandpass notch (BPN) digital filters to locating hot spots in proteins is investigated first. A technique is proposed for designing the appropriate BPN filter for a specific protein sequence, in which the area under the amplitude response is minimized to achieve maximum selectivity for a chosen stability margin. The minimization is performed using the golden-section search. A tuning technique based on a least-squares polynomial model is also proposed for improving the accuracy of the BPN filter. Several example protein sequences are used to illustrate these techniques. BPN filters are then employed for locating exons in DNA. An additional lowpass filtering step is introduced to detect the strength of the bandpass-filtered signal as a function of nucleotide location. For the character-to-numerical mapping, both the electron-ion interaction potentials (EIIPs) of the nucleotides and their binary sequences are investigated. The performance of the techniques is then evaluated using sensitivity, specificity, accuracy, precision, and computational efficiency. These metrics are used in conjunction with the receiver operating characteristic (ROC) technique to establish a reliable framework for the comparisons. For exon location, a technique based on the short-time discrete Fourier transform (STDFT) reported in the literature is also included in the comparison, and the effect of different window functions on its prediction accuracy is explored. Using a set of examples, it is shown that BPN filters predict short exons with better accuracy than the STDFT.
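The exon-location pipeline described above (numerical mapping, narrowband filtering, then lowpass detection of signal strength) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the thesis's filter design: the two-pole resonator centred at 2π/3, the pole radius `r`, the moving-average `width`, and all function names are hypothetical choices made here. The EIIP values are those commonly quoted in the literature.

```python
import math

# EIIP values commonly quoted in the literature for the four nucleotides.
EIIP = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}

def eiip_map(dna):
    """Map a DNA string to its EIIP sequence (character-to-numerical mapping)."""
    return [EIIP[base] for base in dna.upper()]

def resonator(x, r=0.99, w0=2 * math.pi / 3):
    """Two-pole resonator standing in for a narrowband (BPN-like) filter.

    Assumption: a plain resonator with poles at r*exp(+/-j*w0); the thesis's
    optimized BPN design is not reproduced here.
    """
    a1 = 2 * r * math.cos(w0)
    a2 = -r * r
    y, y1, y2 = [], 0.0, 0.0
    for xn in x:
        yn = xn + a1 * y1 + a2 * y2
        y.append(yn)
        y2, y1 = y1, yn
    return y

def smoothed_power(y, width=9):
    """Lowpass step: causal moving average of the squared filter output."""
    sq = [v * v for v in y]
    out, acc = [], 0.0
    for n, s in enumerate(sq):
        acc += s
        if n >= width:
            acc -= sq[n - width]
        out.append(acc / min(n + 1, width))
    return out

def period3_strength(dna, r=0.99, width=9):
    """Strength of the period-3 component as a function of nucleotide position."""
    x = eiip_map(dna)
    mean = sum(x) / len(x)
    x = [v - mean for v in x]  # remove the DC offset before filtering
    return smoothed_power(resonator(x, r), width)

strong = period3_strength("ATG" * 60)  # synthetic sequence with period 3
weak = period3_strength("AT" * 90)     # synthetic sequence with period 2
print(strong[-1] > weak[-1])
```

The synthetic inputs are toy sequences chosen only to show that a period-3 pattern produces a much larger output than a pattern without it; real use would threshold the strength curve along a genomic sequence.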
The test dataset comprised 66 protein sequences and 160 DNA sequences obtained from the Protein Data Bank and the HMR195 database, respectively. Results show that, among the techniques considered, BPN filters perform best for the location of both protein hot spots and DNA exons in terms of accuracy and computational efficiency. User-friendly MATLAB implementations of the techniques, incorporating graphical interfaces, are also described. Optimized numerical mapping schemes are proposed for exon location using both EIIP and binary sequences. Characteristic numerical values are obtained for the four nucleotides using a training procedure in which the prediction accuracy is maximized using a quasi-Newton algorithm based on the Broyden-Fletcher-Goldfarb-Shanno updating formula. A training set of 80 DNA sequences is chosen from the HMR195 database, and the objective function is formulated using the ROC technique. The procedure is initialized using EIIP values. Unbiased testing of the optimized values is carried out using a test set that has no overlap with the training set. Simulation results show that the optimized values yield more accurate exon locations than those obtained using the actual EIIP values. In addition, they perform significantly better than a set of existing optimized complex values. By employing a similar strategy to optimize the weights of the binary sequences, it is shown that, in practice, only three of the four binary sequences are necessary to obtain accurate estimates of exon locations. Consequently, a computational saving of 25% can be achieved, which is substantial given that DNA sequences encountered in practice are very long.
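The binary-sequence mapping and its weighting can be illustrated with a small sketch. It assumes that the weights are applied linearly to the four indicator sequences before filtering, which may differ from the thesis's formulation; the function names are hypothetical. Under that assumption, the four indicator sequences sum to one at every position, so a four-weight signal equals a three-weight signal plus a constant offset, and a bandpass filter rejects a constant. This observation is consistent with the three-sequence result above, but it is not presented here as the thesis's own argument.

```python
def indicator_sequences(dna):
    """Return the four binary indicator sequences of a DNA string."""
    dna = dna.upper()
    return {b: [1 if c == b else 0 for c in dna] for b in "ACGT"}

def weighted_signal(dna, weights):
    """Linear combination of the indicator sequences named in `weights`."""
    seqs = indicator_sequences(dna)
    return [sum(w * seqs[b][i] for b, w in weights.items())
            for i in range(len(dna))]

dna = "ATGGCGTACCTA"

# Four weights (here the EIIP values, used purely as an example).
w4 = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}

# Three weights: subtract the T weight from the others and drop T entirely.
w3 = {b: w4[b] - w4["T"] for b in "ACG"}

s4 = weighted_signal(dna, w4)
s3 = weighted_signal(dna, w3)

# The two signals differ only by the constant w4["T"] at every position.
offsets = [a - b for a, b in zip(s4, s3)]
print(all(abs(o - w4["T"]) < 1e-12 for o in offsets))
```

Dropping one indicator sequence removes one of four sequence computations, which matches the 25% saving quoted above.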

Keywords

hot spots, proteins, exons, genetics
