On designing coarse grain reconfigurable arrays to operate in weak inversion




Ross, Dian Marie

Journal Title

Journal ISSN

Volume Title



Field Programmable Gate Arrays (FPGAs) support the reconfigurable computing paradigm by providing an integrated circuit hardware platform that facilitates software like reconfigurability. The addition of an embedded microprocessor and peripherals to traditional FPGA Combinational Logic Blocks (CLBs) interleaved with interconnections has effectively resulted in a programmable system on-chip. FPGAs are used to support flexible implementations of Application Specific Integrated Circuit (ASIC) functions. Because FPGAs are reconfigurable, they often are used in place of ASICs during the cicuit design process. FPGAs are also used when only a small number of ICs are required: ASICs necessitate large manufacturing runs to be economically viable; for smaller runs the use of FPGAs is an economic alternative. Application domains of interest, such as intelligent guidance systems, medical devices, and sensors, often require low power, inexpensive calculation of trance- dental functions. COordinate Rotation DIgital Computer (CORDIC) is an iterative algorithm used to emmulate hardware expensive multipliers, such as Multiply/ACculmulate (MAC) units, with only shift and add operations. However, because CORDIC is a sequential algorithm, characterized as having the latency of a serial multiplier, techniques that speed up computational performance have many applications.To this end, three implementations of standard CORDIC, (i) unrolled hardwired, (ii) unrolled programmable, and (iii) rolled programmable, were implemented on four Xilinx FPGA families: Virtex-4, -5, and -6, and Spartan-6. Although hardwired unrolled was found to have the greatest speed at the expense of no runtime flexibility, and rolled programmable was found to have the greatest flexibility and lowest silicon area consumption at the expense of the longest propagation delay, improvements to CORDIC implementations were still sought. Three parallelized CORDIC techniques, P-CORDIC, Flat-CORDIC, and Para-CORDIC, were implemented on the same four FPGA families. P-CORDIC and Flat-CORDIC, were shown to have the lowest latency under various conditions; Para-CORDIC was found to perform well in deeply pipelined, high throughput circuits. Design rules for when to use standard versus precomputation CORDIC techniques are presented. To address the low power requirements of many applications of interest, the Unfolded Multiplexor-LRB (UMUX-LRB), patent held by Sima, et al, was analyzed in weak inversion across four transistor technology nodes (180nm, 130nm, 90nm, and 65nm). Previous was also expanded from strong inversion across 180nm, 130nm, and 90nm technology nodes to also include 65nm. The UMUX-LRB interconnection network is based upon the Xilinx commercial interconnection network. Therefore, this network (MUX-LRB), and another static circuit technique, CMOS-Transmission Gates (CMOS-TG), were profiled across all four technology nodes to provide a baseline of comparision. This analysis found the UMUX-LRB to have the smallest and most balanced rising and falling edge propagation delay, in addition to having the greatest reliability for temperature and process variation.



Reconfigurable Computing, CORDIC, sign precomputation, FPGA, Programmable Interconnection Network, Level Restoring Buffers, Sub-Threshold Logic