If you are interested in what went into writing this blog post, you can view a replay of the livestream here.
In a recent post we explored when Vivado inferred Block RAM (BRAM) for memories in FPGA designs, and when it used distributed RAM instead. While it is somewhat obvious why BRAM can be used for memory in an FPGA (i.e. it is literally a discrete memory element), distributed RAM is a bit more complicated.
Look-Up Tables (LUTs) are a key element of FPGAs that allow it to behave as
“reconfigurable hardware”. Depending on the the size (number of inputs and
outputs) of a LUT, some set of logic functions can be programmed as part of a
design. Understanding how this is possible requires exploring the architecture
of LUT elements. We have been using Xilinx 7 Series
moss, so we’ll focus on them
today. The Xilinx design elements documentation decsribes LUTs as follows.
LUTs are the basic logic building blocks and are used to implement most logic functions of the design.
Internally, LUTs are typically implemented with static random access memory (SRAM) and multiplexers. The SRAM bits are written on FPGA configuration to dictate the outputs when various inputs are supplied. SRAM is the same technology used to implement CPU registers and caches, and is typically faster than dynamic random access memory (DRAM), which has to be periodically refreshed to retain its contents. All About Circuits has a great post on internal architecture of LUTs if you are interested in learning more.
Note: Dynamic Random Access Memory (DRAM) is not to be confused with distributed RAM. The latter is the term used for LUT-based memory in an FPGA, which, somewhat confusingly is being implemented with SRAM. This is why block RAM is sometimes referred to as BRAM, but distributed RAM is not referred to as DRAM.
LUTs can vary in their number of inputs. For example, the diagram below
illustrates a 2-input LUT (
Abstract depiction of a 2-input LUT.
INIT[X] values refer to those programmed via the FPGA bitstream, while the
I1 signals represent wires that are dynamically driven high or low.
Those input signals can be thought of as the “selectors” for the multiplexer,
with each permutation selecting a different initial value. Therefore, if we have
4 initial values, we will need 2 input signals to be able to select each of them
# initial values = 2^(# input signals)).
The ability to set each of the
INIT[X] values to
0 means that we can
implement any truth table. In other
words, we can implement arbitrary logic. The truth table for the
above looks as follows.
Schematic view of the
RAM32M element is described as follows in the Xilinx docs:
This design element is a 32-bit deep by 8-bit wide, multi-port, random access memory with synchronous write and asynchronous independent, 2-bit, wide-read capability. This RAM is implemented using the LUT resources of the device known as SelectRAM™+, and does not consume any of the Block RAM resources of the device.
Expanding the element reveals the underlying LUT primitives.
Expanded schematic view of a
However, rather than being labeled as LUT elements, they are named
Xilinx 7 Series FPGAs, LUTs reside in either a
SLICEM, with the
L suffix denoting “logic” and the
M denoting “memory”. Each slice has 4
6-input LUT elements (i.e.
SLICEM slices, the LUTs can be
configured as distributed memory.
RAM32M elements use all 4 LUTs on a
By our earlier calculation, a
2^6 = 64initial values.
As can be seen in the schematic, the LUTs support 5 read address ports (
and 5 write address (
SLICEM slices contain 6-input LUTs, a
LUT6 is just two
LUT5 elements wired
so a single
RAMD32 refers to a
LUT5 configured as distributed RAM. We see 8
LUT5 elements making up a
RAM32M distributed memory (4
LUT6 elements, each
comprised of 2
Schematic view of a
LUT5acting as a distributed memory component (
If we look at the same elements on the device, we can see the
LUT5 within the
Device view of the same
LUT5element shown in the schematic above.
Note: I am assuming the reason only one
LUT5is shown within the
LUT6is because the device footprint is static in Vivado (i.e. implemented designs are just shown by highlighting the elements in use), and “wrapping” the
LUT6somewhat illustrates that a
LUT5could be used individually or as part of a
LUT6. Feel free to reach out if you have an alternative explanation!
Zooming out further, we can see the 4
LUT6 elements in both
Device view of one
SLICEM(left) and one
SLICEL(right), each with 4
LUT6elements, but only the
SLICEMLUTs showing write address ports.
SLICEM LUTs, as previously mentioned, also support writing, which
is why we see the additional
WADR address lines, as well as a write enable
WE) and write clock (
WCLK) signal. The comparison between BRAM and
distributed RAM in our earlier post identified that, with BRAM, read and writes
are both synchronous, whereas with distributed RAM writes are synchronous, but
reads are asynchronous. In order to implement the synchronous writes, a clock
WCLK) is required.
Note: synchronous reads can be implemented with distributed RAM, as evidenced by our example where we forced distributed RAM usage. Doing so requires connecting the output to a flip-flop in the same slice.
Though it may be obvious at this point, the reason why these LUTs can be used
as distributed RAM is because we already need to be able to write values to
LUT SRAM to configure the FPGA. When they are used as distributed RAM, we use
that same functionality to write values after configuration. The
aforementioned “initial values” (i.e.
INIT[X]) are not only available for
reading, but also for writing.
Closing Thoughts Link to heading
Small realizations about the physical characteristics of a device can help us
develop a better intuition about why designs are being implemented in a certain
manner, as well as the tradeoffs of using different types of components. These
are the tedious details that I hope to continue to capture via
livestreams and posts in this
feed), while developing
expertise in chip
As always, if you have feedback, questions, or just want to chat, feel free to
reach out to
@hasheddan on any of the platforms listed on the home