Extending Gem5 with custom RISC-V commands

Nick Felker
9 min read · Jun 19, 2023

I’ve been doing research on non-volatile memory on top of Gem5. Up to this point I have yet to show benchmarks where non-volatile memory performs better than DDR4 RAM. That’s because DDR4 is very good, but also because existing computer systems are heavily optimized towards behaviors that take full advantage of DDR4.

Yet we know that existing RAM is not going to scale indefinitely. Our current CPUs suffer from the “Von Neumann bottleneck”, where shuttling data between memory and the processor limits performance. So my research turns towards extending computer systems with new kinds of commands that are better optimized for non-volatile memory.

In this article, I present an end-to-end flow for building a Gem5 simulation using custom RISC-V commands. I reviewed many tutorials. Many were years old and outdated; others were newer but didn’t show the complete end-to-end flow. I have finally gotten something to work and I want to show everything together.

Creating the perfect development environment

After days of struggle, my wonderful setup of Docker running in WSL running on Windows 10 failed. There were weird compiler errors where I couldn’t build the RISC-V toolchain and I couldn’t get a straight answer on how to fix it.

As such, I spun up an Ubuntu 20 VM and ran the build commands. Everything built fine, albeit slowly. Admittedly I was rather annoyed by spending so much time when the fix was so simple.

So I had to rebuild my environment from scratch. I cloned the Gem5 repository and built it. I also cloned and built the riscv-gnu-toolchain. The latter takes about half an hour to build in my VM, even using multiple threads. So any failed build takes some time to correct, but at least it builds.

Custom commands

In the past I used x86 as the basis for my simulations. This made sense as my PC has an Intel i9–9900K CPU. But the x86 architecture is not designed for flexibility. Instead, I turned to RISC-V, an open-source CPU architecture that makes it easy to add custom instructions.

Reading through several online tutorials, I decided to implement three novel instructions: modulus, greatest common divisor, and factorial. Note that the actual behaviors are not as important right now as getting the instructions to behave consistently.

The first step is to load riscv-gnu-toolchain and open up binutils/opcodes/riscv-opc.c . This contains definitions for each instruction. You can just add your own to the riscv_opcodes array:

{"mod",      0, INSN_CLASS_I,       "d,s,t",         MATCH_MOD, MASK_MOD, match_opcode, 0 },
{"gcd", 0, INSN_CLASS_I, "d,s,t", MATCH_GCD, MASK_GCD, match_opcode, 0 },
{"fact", 0, INSN_CLASS_I, "d,a", MATCH_FACT, MASK_FACT, match_opcode, 0 },

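You can see I define the name of the instruction, the operands, and the MATCH/MASK constants. In the operand string, d is the destination register, s is the first source register, and t is the second source register. The two constants will be defined next and are used to recognize the instruction’s bit pattern.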

Next, open up binutils/include/opcode/riscv-opc.h and add in definitions for the MATCH/MASK:


#define MATCH_GCD 0x6027
#define MASK_GCD 0xfe00707f
#define MATCH_FACT 0x27
#define MASK_FACT 0x7f
#define MATCH_MOD 0x200006b
#define MASK_MOD 0xfe00707f
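
To see where these constants come from, here is a small Python sketch (Python only because it is handy for bit arithmetic). It assumes the standard R-type layout, where MATCH holds the fixed opcode, funct3, and funct7 bits and MASK selects exactly those bits. The helper name r_type_match is made up for illustration and is not part of binutils.

```python
def r_type_match(opcode7, funct3, funct7):
    """Pack the fixed bits of an R-type instruction into a MATCH value.

    opcode7 is the full 7-bit opcode (bits 6..0), funct3 sits in bits 14..12,
    and funct7 sits in bits 31..25. Register fields are left as zero.
    """
    return (funct7 << 25) | (funct3 << 12) | opcode7


# The mask covers exactly the three fixed fields and ignores rd, rs1, rs2.
R_TYPE_MASK = (0x7F << 25) | (0x7 << 12) | 0x7F

MATCH_MOD = r_type_match(opcode7=0x6B, funct3=0x0, funct7=0x01)
MATCH_GCD = r_type_match(opcode7=0x27, funct3=0x6, funct7=0x00)

print(hex(R_TYPE_MASK))  # 0xfe00707f
print(hex(MATCH_MOD))    # 0x200006b
print(hex(MATCH_GCD))    # 0x6027
```

MASK_FACT is only 0x7f, so that entry pins down nothing but the 7-bit opcode; every other bit is free for operands.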

Further down, where we are declaring our instructions, you’ll need to add each of them:

DECLARE_INSN(mod, MATCH_MOD, MASK_MOD)
DECLARE_INSN(gcd, MATCH_GCD, MASK_GCD)
DECLARE_INSN(fact, MATCH_FACT, MASK_FACT)

It’s just that easy to add in custom instructions. Now we can rebuild the toolchain and if it succeeds we’ll actually be able to build new programs using them. Assuming you did everything right, you can step away for a coffee break.

./configure --prefix=/opt/riscv --host=riscv64-unknown-elf
sudo make clean
sudo make -j16

Compiling Programs

While we have managed to add our instructions, they exist only in the assembler; the C compiler itself does not know when to emit them. As such, example programs will need to embed assembly steps within the broader program. Here’s modulus.c:

#include <stdio.h>

int main() {
    int a, b, c;
    a = 5;
    b = 2;
    /* Emit the custom instruction: c = a mod b */
    asm volatile
    (
        "mod %[z], %[x], %[y]\n\t"
        : [z] "=r" (c)               /* output operand */
        : [x] "r" (a), [y] "r" (b)   /* input operands */
    );
    if (c != 1) {
        printf("\n[[FAILED]]\n");
        return -1;
    }
    printf("\n[[PASSED]]\n");
    return 0;
}

You can see most of it is a regular C program, but the modulus instruction needs to be embedded with the asm volatile ( ... ); syntax.

We can compile this program using the elf binaries we compiled earlier. First, we will need to ensure that our RISC-V binaries are included in the system PATH. Open ~/.bashrc and add this line at the end:

export PATH=$PATH:/opt/riscv/bin

Then run source ~/.bashrc to refresh the terminal.

Now I can use our modified version of gcc with the new RISC-V instructions:

/opt/riscv/bin/riscv64-unknown-elf-gcc modulus.c -o modulus.o

If this succeeded, you will see a new binary file called modulus.o. Despite the .o suffix, it is a fully linked executable, because gcc was not given the -c flag. We cannot read a binary directly, but we can disassemble it and see what’s inside:

/opt/riscv/bin/riscv64-unknown-elf-objdump -D modulus.o

00000000000101a2 <main>:
101a2: 1101 add sp,sp,-32
101a4: ec06 sd ra,24(sp)
101a6: e822 sd s0,16(sp)
101a8: 1000 add s0,sp,32
101aa: 4795 li a5,5
101ac: fef42623 sw a5,-20(s0)
101b0: 4789 li a5,2
101b2: fef42423 sw a5,-24(s0)
101b6: fec42783 lw a5,-20(s0)
101ba: fe842703 lw a4,-24(s0)
101be: 02e787eb mod a5,a5,a4
101c2: fef42223 sw a5,-28(s0)
101c6: fe442783 lw a5,-28(s0)
101ca: 0007871b sext.w a4,a5
101ce: 4785 li a5,1
101d0: 00f70963 beq a4,a5,101e2 <main+0x40>
101d4: 67c9 lui a5,0x12
101d6: 67878513 add a0,a5,1656 # 12678 <__errno+0x8>
101da: 1b2000ef jal 1038c <puts>
101de: 57fd li a5,-1
101e0: a039 j 101ee <main+0x4c>
101e2: 67c9 lui a5,0x12
101e4: 68878513 add a0,a5,1672 # 12688 <__errno+0x18>
101e8: 1a4000ef jal 1038c <puts>
101ec: 4781 li a5,0
101ee: 853e mv a0,a5
101f0: 60e2 ld ra,24(sp)
101f2: 6442 ld s0,16(sp)
101f4: 6105 add sp,sp,32
101f6: 8082 ret

You will see that it correctly parsed the instruction 02e787eb as our modulus command. Now let’s run it.
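
The disassembler recognizes an instruction when (word & MASK) == MATCH. As a rough sanity check, this Python sketch applies that test to two words from the listing above. The matches helper is hypothetical; the real lookup in binutils walks the whole riscv_opcodes table rather than testing one entry.

```python
MATCH_MOD = 0x0200006B
MASK_MOD = 0xFE00707F


def matches(word, match, mask):
    """True when the fixed bits of `word` equal the instruction's MATCH value."""
    return (word & mask) == match


mod_word = 0x02E787EB  # "mod a5,a5,a4" at address 101be
sw_word = 0xFEF42223   # "sw a5,-28(s0)" at address 101c2

print(matches(mod_word, MATCH_MOD, MASK_MOD))  # True
print(matches(sw_word, MATCH_MOD, MASK_MOD))   # False
```

The register numbers live in the bits that the mask clears, which is why the same MATCH value works for any choice of registers.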

Running custom instructions in Gem5

Let’s start out with a simple Gem5 configuration. We can copy the demo script configs/learning_gem5/part1/simple.py and rename it to configs/learning_gem5/part1/simple-riscv.py .

First, we’ll need to change the CPU line to system.cpu = RiscvTimingSimpleCPU() since the target is no longer x86. Second, we will update the binary path to point to modulus.o for execution.

We can rebuild Gem5 now and run our script:

scons build/RISCV/gem5.opt -j 16
build/RISCV/gem5.opt configs/learning_gem5/part1/simple-riscv.py

This will return an error:

Beginning simulation!
build/RISCV/sim/simulate.cc:192: info: Entering event queue @ 0. Starting simulation...
build/RISCV/arch/riscv/faults.cc:188: panic: Unknown instruction 0x02e787eb at pc (0x101e6=>0x101ea).(0=>1)

Once it hits our mod instruction, it has no idea what to do. We need to ensure that Gem5 also supports our custom instructions. Again, this process is fairly straightforward.

All RISC-V instructions are defined in src/arch/riscv/isa/decoder.isa . Open that up and peruse it. It’s a nested data structure that defines how to implement the command within Gem5’s simulation.

As we’re defining normal instructions, we’ll start out with 0x3: decode OPCODE and go down deeper. We’ll add the following:

0x1A: decode FUNCT3 {
    format ROp {
        0x0: decode FUNCT7 {
            0x01: mod({{
                Rd = Rs1_sd % Rs2_sd;
            }});
        }
    }
}

You can start to see how this syntax works. The decoder first matches the OPCODE field, which for the mod instruction is 0x1A. Inside that branch, it decodes the FUNCT3 bits. We use the ROp format because mod is an R-type instruction (there are other instruction layouts that are not covered here).

R-type instructions also contain a FUNCT7 field to more precisely identify the operation we want to perform. The modulus operation has a FUNCT3 of 0x0 and a FUNCT7 of 0x01. Within the mod block, we can write a snippet of C++ that Gem5 runs to simulate the instruction. You can see references to our destination register Rd and our two input registers Rs1_sd and Rs2_sd.
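
When checking simulator output, it can help to have a software model of what mod should return. The Python sketch below mimics the C-style % operator on signed 64-bit values, which truncates toward zero (Python’s own % floors instead). The function names are hypothetical. Returning the dividend when the divisor is zero is an assumption borrowed from the standard RISC-V REM instruction; the decoder snippet above does not guard against division by zero at all.

```python
def to_signed64(value):
    """Interpret an integer as a two's-complement signed 64-bit value."""
    value &= (1 << 64) - 1
    return value - (1 << 64) if value >> 63 else value


def mod_reference(rs1, rs2):
    """Model of `Rs1_sd % Rs2_sd` with C-style truncation toward zero."""
    a, b = to_signed64(rs1), to_signed64(rs2)
    if b == 0:
        # Assumption: follow the RISC-V REM convention and return the dividend.
        return a
    remainder = abs(a) % abs(b)
    # In C, the remainder takes the sign of the dividend.
    return -remainder if a < 0 else remainder


print(mod_reference(5, 2))   # 1, the value modulus.c checks for
print(mod_reference(-5, 2))  # -1, whereas Python's -5 % 2 gives 1
```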

Now we can rebuild and re-run our modulus program.

scons build/RISCV/gem5.opt -j 16
build/RISCV/gem5.opt configs/learning_gem5/part1/simple-riscv.py

Beginning simulation!
build/RISCV/sim/simulate.cc:192: info: Entering event queue @ 0. Starting simulation...

[[PASSED]]
Exiting @ tick 144701000 because exiting with last active thread context

It works completely! Now I can build similar programs that do not use these custom flows and compare performance between them in novel benchmark suites.

Implementing GCD and FACT

At the start of this, I added three custom instructions. The gcd and fact instructions come from another tutorial and that one is much less clear on how to reach this end-state.

For one, it only gives the MATCH_ and MASK_ definitions without the opcode fields. So I’ve figured those out myself.

For R-type instructions in RISC-V, each instruction is a 32-bit number:

funct7 | rs2    | rs1    | funct3 | rd    | opcode | quadrant
31..25 | 24..20 | 19..15 | 14..12 | 11..7 | 6..2   | 1..0

So when I look at my earlier mod operation, I can parse it using these bitfields.

0x02e787eb
0000 0010 1110 0111 1000 0111 1110 1011

funct7  | rs2   | rs1   | funct3 | rd    | opcode | quadrant
0000001 | 01110 | 01111 | 000    | 01111 | 11010  | 11

Quadrant 3 (3)
Opcode 26 (1A)
Rd 15 (F)
Funct3 0 (0)
Rs1 15 (F)
Rs2 14 (E)
Funct7 1 (1)

Parsing it into binary, decimal, and hex shows how everything connects together. Then you can see why I used the values in decoder.isa of Opcode: 0x1A , Funct3: 0x0 , and Funct7: 0x01 .
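
The same breakdown can be done with shifts and masks instead of reading bit strings by hand. This is a minimal Python sketch, assuming the R-type field positions from the table above; decode_r_type is a name chosen for illustration.

```python
def decode_r_type(word):
    """Split a 32-bit R-type instruction word into its bitfields."""
    return {
        "funct7": (word >> 25) & 0x7F,
        "rs2": (word >> 20) & 0x1F,
        "rs1": (word >> 15) & 0x1F,
        "funct3": (word >> 12) & 0x7,
        "rd": (word >> 7) & 0x1F,
        "opcode": (word >> 2) & 0x1F,
        "quadrant": word & 0x3,
    }


fields = decode_r_type(0x02E787EB)
for name, value in fields.items():
    print(f"{name:8} {value:3} (0x{value:X})")
```

For this word, rd and rs1 come out as register 15 (a5) and rs2 as register 14 (a4), which agrees with the mod a5,a5,a4 line in the disassembly.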

Now I can do the same thing with my other operations. Since I was given a value for MATCH_ , I can work backwards to obtain the instruction params. Essentially, I need the conditional (ins & MASK_GCD) == MATCH_GCD to be true.

To make this process a bit easier, I built a tool in JavaScript to perform these calculations.
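
A rough Python equivalent of that calculation might look like the sketch below. It is not the tool itself: the names R_TYPE_FIELDS and fixed_fields are illustrative, and it assumes the R-type field positions shown earlier. For each field it checks whether MASK covers the field completely, and if so reads the field’s value out of MATCH.

```python
# (shift, width) of each R-type field, following the bitfield table.
R_TYPE_FIELDS = {
    "funct7": (25, 7),
    "rs2": (20, 5),
    "rs1": (15, 5),
    "funct3": (12, 3),
    "rd": (7, 5),
    "opcode": (2, 5),
    "quadrant": (0, 2),
}


def fixed_fields(match, mask):
    """Return the fields that MASK pins down, with the values MATCH gives them."""
    fixed = {}
    for name, (shift, width) in R_TYPE_FIELDS.items():
        field_mask = ((1 << width) - 1) << shift
        if (mask & field_mask) == field_mask:
            fixed[name] = (match >> shift) & ((1 << width) - 1)
    return fixed


gcd_fields = fixed_fields(0x6027, 0xFE00707F)
print({name: hex(value) for name, value in gcd_fields.items()})
# funct7=0x0, funct3=0x6, opcode=0x9, quadrant=0x3
```

Fields that the mask leaves open, such as the register numbers, are simply omitted from the result.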

Now I can plug those into my decoder.isa in the right location and rebuild.

0x09: decode FUNCT3 {
    ...
    format ROp {
        0x6: decode FUNCT7 {
            0x0: gcd({{
                Rd = Rs1 + Rs2; // Placeholder, or some other C++ implementation
            }});
        }
    }
}
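
The addition above is only a stand-in. For reference, the value a real gcd instruction would be expected to produce can be modeled with Euclid’s algorithm. The Python sketch below is a software model for checking results, not Gem5 code; the function name and the unsigned 64-bit interpretation of the operands are assumptions.

```python
MASK64 = (1 << 64) - 1


def gcd_reference(rs1, rs2):
    """Euclid's algorithm on the operands, treated as unsigned 64-bit values."""
    a = rs1 & MASK64
    b = rs2 & MASK64
    while b:
        # Replace the pair (a, b) with (b, a mod b) until the remainder is zero.
        a, b = b, a % b
    return a


print(gcd_reference(48, 18))  # 6
```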

Implementing FACT

The same parsing can happen for the FACT instruction as well.

Note: The web app above only works for R-type instructions. The factorial instruction instead takes an immediate operand (U-type), so I couldn’t use the app to decode it. Manually parsing it should still work.

In a U-type instruction, the upper 20 bits are a constant, followed by the destination register and then the opcode.

However, I have not yet figured out how to properly parse this and obtain the opcode. Here is the test program, factorial.c, which exercises both gcd and fact:

#include <stdint.h>
#define FACT_DIGITS 10000

int main(void)
{
    uint32_t num1 = 2321, num2 = 1771731, gcd = 0;
    uint32_t fact_test_val = 10;
    uint32_t fact_result_ptr;
    uint8_t fact_result[FACT_DIGITS];
    /* Note: on RV64 this keeps only the low 32 bits of the address */
    fact_result_ptr = (uint64_t)fact_result;
    asm volatile("gcd %0, %1,%2\n":"=r"(gcd):"r"(num1),"r"(num2):);
    //suppose we want to compute the factorial of 125 so immediate=250
    asm volatile("fact %0, %1\n":"=r"(fact_result_ptr):"i"(250):);
    return 0;
}

Similar to before, I can use my modified RISC-V compiler to create a binary:

/opt/riscv/bin/riscv64-unknown-elf-gcc factorial.c -o factorial.o
/opt/riscv/bin/riscv64-unknown-elf-objdump -D factorial.o

000000000001019c <main>:
1019c: 8c010113 add sp,sp,-1856
101a0: 72813c23 sd s0,1848(sp)
101a4: 74010413 add s0,sp,1856
101a8: 72f9 lui t0,0xffffe
101aa: 9116 add sp,sp,t0
101ac: 6785 lui a5,0x1
101ae: 91178793 add a5,a5,-1775 # 911 <exit-0xf7d7>
101b2: fef42623 sw a5,-20(s0)
101b6: 001b17b7 lui a5,0x1b1
101ba: 8d378793 add a5,a5,-1837 # 1b08d3 <__global_pointer$+0x19ebfb>
101be: fef42423 sw a5,-24(s0)
101c2: fe042223 sw zero,-28(s0)
101c6: 47a9 li a5,10
101c8: fef42023 sw a5,-32(s0)
101cc: 77f9 lui a5,0xffffe
101ce: 8d878793 add a5,a5,-1832 # ffffffffffffd8d8 <__global_pointer$+0xfffffffffffebc00>
101d2: 17c1 add a5,a5,-16
101d4: 97a2 add a5,a5,s0
101d6: fcf42e23 sw a5,-36(s0)
101da: fec42783 lw a5,-20(s0)
101de: fe842703 lw a4,-24(s0)
101e2: 00e7e7a7 gcd a5,a5,a4
101e6: fef42223 sw a5,-28(s0)
101ea: f11ef7a7 fact a5,fa <exit-0xffee>
101ee: fcf42e23 sw a5,-36(s0)
101f2: 4781 li a5,0
101f4: 853e mv a0,a5
101f6: 6289 lui t0,0x2
101f8: 9116 add sp,sp,t0
101fa: 73813403 ld s0,1848(sp)
101fe: 74010113 add sp,sp,1856
10202: 8082 ret
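
One observation that may help with the parsing problem: the fact entry in riscv-opc.c uses the operand letter a, which, as far as I can tell, is the letter binutils uses for jump-style PC-relative targets. If that is right, the immediate would be packed in the scrambled J-type layout (as in jal) rather than the U-type layout. The Python sketch below tests that guess against the word f11ef7a7 from the listing. decode_j_type is a hypothetical helper, and the layout is an assumption about this encoding, not something the original tutorial confirms.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of `value` as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign


def decode_j_type(word):
    """Decode a word using the J-type layout: imm[20|10:1|11|19:12], rd, opcode."""
    imm = (
        (((word >> 31) & 0x1) << 20)
        | (((word >> 12) & 0xFF) << 12)
        | (((word >> 20) & 0x1) << 11)
        | (((word >> 21) & 0x3FF) << 1)
    )
    return {
        "opcode7": word & 0x7F,
        "rd": (word >> 7) & 0x1F,
        "offset": sign_extend(imm, 21),
    }


fact = decode_j_type(0xF11EF7A7)
pc = 0x101EA  # address of the fact instruction in the listing
print(hex(fact["opcode7"]), fact["rd"], hex(pc + fact["offset"]))
```

Under this reading the word decodes to the 7-bit opcode 0x27, rd = 15 (a5), and a PC-relative offset that lands on 0xfa, which is 250. That matches the fact a5,fa line in the disassembly and suggests the assembler stored the immediate as an offset from the current address. The 7-bit opcode 0x27 is also the one gcd uses, so the two instructions share the 0x09 branch of decoder.isa.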

Finally, I can update simple-riscv.py to run factorial.o and then simulate using Gem5. However, as FACT is not yet implemented, I still get an error. I aim to fix this and will update the post once I do.

build/RISCV/gem5.opt configs/learning_gem5/part1/simple-riscv.py

Conclusion

In the future, having a working definition of GCD and FACT might make more sense. But with this scaffolding in place I can have these commands do anything, or even add new commands.

I’ll have to go beyond that too, designing new instructions that take full advantage of non-volatile memory, perhaps through in-memory computing.

But I hope in the meantime this article helps someone else by consolidating all these tutorials into one place with a streamlined set of instructions.

Written by Nick Felker

Social Media Expert -- Rowan University 2017 -- IoT & Assistant @ Google
