Energy and performance improvement relying on trivial instructions and speculative snooping in high-performance processors

Date

2010-04-12T19:50:04Z

Authors

Atoofian, Ehsan

Journal Title

Journal ISSN

Volume Title

Publisher

Abstract

This thesis introduces energy and performance optimization techniques for high-performance processors. Our optimization techniques target both single processors and chip multiprocessors (CMPs). In single processors, we exploit trivial instructions to improve energy and performance. Trivial instructions are those instructions whose output can be determined without performing the actual computations. We show that bypassing such unnecessary computations reduces energy while improve performance. Performance improvement achieved by skipping executing trivial instructions depends on how early such instructions are identified. We use value prediction to detect trivial instructions with high accuracy and as soon as possible. Consequently, we improve performance over a processor that bypasses trivial instructions without using speculation. In CMPs, we introduce two techniques to improve energy of interconnect and caches Conventional snoopy based chip multiprocessors take an aggressive approach broadcasting snoop requests to all nodes. In addition, each node checks all received requests. This approach reduces the latency of cache to cache transfer misses at the expense of increasing energy. We exploit this design inefficiency and introduce two optimization techniques in CMPs. First, and at the requester end, we introduce speculative selective request (SSR) to reduce energy consumption in the binary tree interconnect. In SSR, we send the request only to the node more likely to have the missing data. We reduce energy as we limit accesses only to the interconnect components between the requestor and the supplier node. Second, and at the receiving end, we propose speculative tag lookup (STL) to reduce energy consumption in data caches. We filter those accesses more likely to miss in the L1, cache. Using shared memory applications, we show that SSR and STL improve energy of interconnect and caches significantly with negligible performance loss and hardware overhead.

Description

Keywords

High-performance processors, Multiprocessors

Citation