
ABSTRACT
Servers in data centers consume a lot of electricity. At Oracle Labs we are building RAPID, an energy-efficient parallel computer, to accelerate data analytics in the Oracle DBMS. Although RAPID is not quite finished, we have already learned that industrial-strength computing is possible using only tens of milliwatts per thread, and that we can write commercially significant software in which each thread has a cache footprint of around 100 KB. The question arises: is this a new, more efficient architecture, or did we just capture a very special case? To answer that, we are planning to tape out a more general, experimental chip later this year. The proposed design is a 32-core, bus-based, cache-coherent multiprocessor built from an existing out-of-order processor (BOOM) from the UC Berkeley RISC-V project and the associated Chisel design tools. We expect substantial participation in the design from UC Berkeley students, as well as from student interns from other universities. The system comprises nodes connected by off-the-shelf Ethernet switches; each node consists of a processor chip, two DDR4 memory channels, a PCIe NIC, and optionally a PCIe flash storage card. Preliminary analysis suggests a 6X improvement in power efficiency relative to an Intel Xeon cluster, and up to 10X for a 96-core, large-die version. The goal is to show that even with a modern, full-featured processor core we can achieve excellent energy efficiency and, by running Linux and keeping the feature set fairly standard, greatly expand the range of applications that can be ported with low effort. We hope this chip will enable further research to enhance the performance and energy efficiency of this kind of architecture.