Running non-volatile memory in Gem5
I am researching next-generation memory in computer architectures. Part of that is learning ways to simulate computer systems with non-volatile memory, and part of it, of course, is reading the existing literature on the topic. I recently read A Cycle-level Unified DRAM Cache Controller Model for 3DXPoint Memory Systems in gem5, a paper that builds basically that. This seemed like a great fit. Even better, the authors made their DRAM cache model open source.
However, using it in my own research, or even just replicating their results, is not immediately straightforward. This post explains what I did.
nvm_interface
After looking through the original repo and spending a day trying to merge their changes, I realized that much of their work had already been upstreamed and heavily refactored. Trying to backport their changes led to a lot of build errors and frustration. In the end there was no need for a custom NVM interface, since Gem5 has now split its mem_interface
into separate dram_interface
and nvm_interface
files.
Going back to my simulation script, simple.py
, I can update the dram
property of my memory controller to connect to an NVM instead. Thanks to the modularity of the Gem5 platform, I can create any number of custom memory devices. I do think the current name, dram
, may need to be updated but probably remains that way for legacy reasons.
If you recall, I’ve previously used standard DDR3 memory. Gem5 has an out-of-the-box version called DDR3_1600_8x8
which inherits from the Python DRAMInterface
class. There’s a DDR4 memory class too. For non-volatile memory I'll need to inherit from NVMInterface
, which was introduced about three years ago.
Gem5 includes just one out of the box: NVM_2400_1x64
. It is meant to mimic the properties of phase-change memory. You can see many of its properties in the Python file. In general it operates at 1200 MHz with a handful of different bus delays.
*The read/write/send values are defined differently between DRAM and NVM objects. The first attribute is the DRAM field and the second is its closest NVM equivalent based on my understanding of the code.
As you can see, PCM performs better than DDR3 across many metrics. Compared to DDR4, however, it is only comparable and sometimes worse. We’ll show this experimentally in a moment, but it is worth analyzing right now. To take advantage of the benefits of non-volatile memory in computer architecture, we cannot rely only on our current approach of chasing more speed.
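Some of the headline numbers can be read straight from the configuration names. The following is a minimal sketch, assuming the names follow a type_MT/s_devices x device-width pattern and that each bus is double data rate (two transfers per clock). The helper names ChannelSpec and parse_name are my own and are not part of Gem5.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class ChannelSpec:
    """Headline figures decoded from a name such as NVM_2400_1x64 (assumed convention)."""
    kind: str
    mega_transfers: int      # data rate in MT/s
    devices: int             # devices per rank
    device_width_bits: int   # data pins per device

    @property
    def clock_mhz(self) -> float:
        # Double data rate: two transfers per clock cycle.
        return self.mega_transfers / 2

    @property
    def bus_width_bits(self) -> int:
        return self.devices * self.device_width_bits

    @property
    def peak_mb_per_s(self) -> int:
        # MT/s times bytes per transfer gives the theoretical peak in MB/s.
        return self.mega_transfers * self.bus_width_bits // 8


def parse_name(name: str) -> ChannelSpec:
    """Split '<kind>_<MT/s>_<devices>x<width>' into its parts."""
    match = re.fullmatch(r"([A-Za-z0-9]+)_(\d+)_(\d+)x(\d+)", name)
    if match is None:
        raise ValueError(f"unrecognized memory name: {name}")
    kind, rate, devices, width = match.groups()
    return ChannelSpec(kind, int(rate), int(devices), int(width))


for name in ("DDR3_1600_8x8", "DDR4_2400_16x4", "NVM_2400_1x64"):
    spec = parse_name(name)
    print(name, spec.clock_mhz, "MHz,", spec.bus_width_bits, "bit bus,",
          spec.peak_mb_per_s, "MB/s peak")
```

Under these assumptions all three present a 64-bit bus. DDR3 peaks at 12,800 MB/s, while DDR4 and the NVM both peak at 19,200 MB/s on a 1200 MHz clock. This is only the bus-level ceiling. The read and write latencies of the media are separate parameters and are not captured here.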
This table doesn’t really explore the electrical properties of each either. Partially that’s because Gem5 isn’t designed to track power consumption. Having to refresh volatile memory continually could lead to greater power usage and might be a disadvantage outside the scope of this experiment.
What is nice is that these constants are easily defined and changeable. If I wanted unrealistically fast memory, I could do that.
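As a rough sketch of what that could look like, the fragment below subclasses the stock NVM class inside a Gem5 config script and overrides two of its latency parameters. The class name NVM_Fast_1x64 and the latency values are invented purely for illustration and do not model any real device. It only runs inside Gem5, since it imports from m5.objects.

```python
# Config fragment: runs only inside gem5 (e.g. from simple.py).
from m5.objects import NVM_2400_1x64


class NVM_Fast_1x64(NVM_2400_1x64):
    """Hypothetical, deliberately unrealistic variant of the stock NVM."""

    # Illustrative values only; every other parameter is inherited unchanged.
    tREAD = "15ns"    # media read latency
    tWRITE = "50ns"   # media write latency
```

It can then be attached the same way as the built-in classes, for example with system.mem_ctrl.dram = NVM_Fast_1x64().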
Evaluating Benchmarks
In a previous post I showed performance when running the benchmarks Whetstone and Dhrystone.
I can now attach this NVM object to my script to run the same benchmarks. I modified my simple.py
simulation script to use DDR3_1600_8x8()
, DDR4_2400_16x4()
, and NVM_2400_1x64()
. Then I ran build/X86/gem5.opt configs/tutorial/part1/simple.py
. For Whetstone, I kept the counter argument fixed at 3000
for each.
# ...
system.mem_ctrl = MemCtrl()
# system.mem_ctrl.dram = DDR3_1600_8x8()
# system.mem_ctrl.dram = DDR4_2400_16x4()
system.mem_ctrl.dram = NVM_2400_1x64()
system.mem_ctrl.dram.range = system.mem_ranges[0]
system.mem_ctrl.port = system.membus.mem_side_ports
binary = 'tests/test-progs/whetstone/src/whetstone'
# binary = 'tests/test-progs/dhrystone/dhrystone'
# for gem5 V21 and beyond
system.workload = SEWorkload.init_compatible(binary)
process = Process()
# process.cmd = [binary]
process.cmd = [binary, '-c', '3000'] # for Whetstone; arguments passed as strings
# ...
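Commenting lines in and out gets tedious across several runs. One alternative is a small driver that launches Gem5 once per memory type and keeps each run's output separate. This is only a sketch: the --mem script argument is hypothetical and would require simple.py to parse it and pick the matching class. The --outdir option is Gem5's own flag for the output directory, and the m5out/ subdirectory names are my choice.

```python
import subprocess

GEM5 = "build/X86/gem5.opt"
SCRIPT = "configs/tutorial/part1/simple.py"
MEMORIES = ["DDR3_1600_8x8", "DDR4_2400_16x4", "NVM_2400_1x64"]


def build_commands(memories=MEMORIES):
    """Return one gem5 command line (as an argv list) per memory type."""
    return [
        # --mem is a HYPOTHETICAL argument that simple.py would need to handle.
        [GEM5, f"--outdir=m5out/{mem}", SCRIPT, f"--mem={mem}"]
        for mem in memories
    ]


def run_all():
    """Run the simulations one after another, stopping at the first failure."""
    for cmd in build_commands():
        subprocess.run(cmd, check=True)
```

Calling run_all() from the Gem5 source directory would then leave one stats file per memory type under m5out/.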
You can see that the DRAM-based memories perform roughly the same in each case. PCM does measurably worse in the Dhrystone benchmark.
Why is this the case? As I mentioned previously, both of these benchmark programs are quite old and are likely optimized for DRAM-like memory. As such, we will need to explore better benchmarks that evaluate more metrics than raw speed.
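Comparing runs by hand means digging through each stats.txt file. Below is a minimal sketch of a parser for that file, assuming the usual layout of a name, a value, and an optional # comment on each line. The function name parse_stats is my own. If a file contains several statistics dumps, later values overwrite earlier ones.

```python
def parse_stats(text):
    """Map stat names to numeric values from the contents of a stats.txt file."""
    stats = {}
    for line in text.splitlines():
        # Drop the trailing '# description' part, then split into columns.
        fields = line.split("#", 1)[0].split()
        if len(fields) < 2:
            continue
        try:
            stats[fields[0]] = float(fields[1])
        except ValueError:
            # Banner lines and other non-numeric rows are skipped.
            continue
    return stats


sample = """
---------- Begin Simulation Statistics ----------
simSeconds      0.000057      # Number of seconds simulated (Second)
simTicks        57467000      # Number of ticks simulated (Tick)
---------- End Simulation Statistics   ----------
"""
print(parse_stats(sample))
```

Reading the real file is then a matter of passing its contents in, for example parse_stats(open("m5out/stats.txt").read()). Older Gem5 releases name the simulated time sim_seconds rather than simSeconds, so the key to look up depends on the version.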
Future Work
If these memory systems are not faster, what are they good for? Non-volatile memories, and analog memory in particular, can speed up computation in other ways: in-memory computing, deep learning, and other niche workloads. Building out examples of each, along with appropriate benchmarks, is a critical step toward proving the value of the hardware.
Another area of investigation is creating new kinds of NVMInterface
classes that resemble other kinds of non-volatile memory. STT-MRAM, FRAM, and other kinds of materials that have similar non-volatile properties are worth investigating in computer architectures.