RISC-V & Gem5 Part 2: Decoding FACT
In my previous blog post, which I published quite late on a Sunday, I had gotten custom RISC-V instructions to work with a program containing assembly and managed to simulate them in Gem5. Those two instructions were mod and gcd . Both are considered R-type instructions in Gem5. Basically that implies there is a destination register and two register inputs. Let’s take one instruction and parse it:
101e2: 00e7e7a7 gcd a5,a5,a4
00e7e7a7
0000 0000 1110 0111 1110 0111 1010 0111
The quadrant is not an official property of a RISC-V instruction, but something that is incorporated into Gem5’s RISC-V ISA decoder. A quadrant of 0x3
refers to 32-bit instructions:
decode QUADRANT default Unknown::unknown() {
0x0: decode COPCODE {
...
}
0x1: decode COPCODE {
...
}
0x2: decode COPCODE {
...
}
0x3: decode OPCODE {
...
So we can split off the final two bits since all our instructions will be 32-bits only.
Parsing FACT
In this program, I also include a line which is marked to compute a factorial operation:
asm volatile("fact %0, %1\n":"=r"(fact_result_ptr):"i"(250):);
This is not an R-type instruction. It is defined in the riscv-gnu-toolchain
with having a destination register and an immediate value similar to built-in rules like jal
.
{"fact", 0, INSN_CLASS_I, "d,a", MATCH_FACT, MASK_FACT, match_opcode, 0 },
So our decoder rules above do not work here.
101ea: f11ef7a7 fact a5,fa <exit-0xffee>
f11ef7a7
1111 0001 0001 1110 1111 0111 1010 0111
My original understanding was that this was a U-type or a J-type instruction:
However, 0xF11EF
does not correspond to my input constant of FA
even though the assembler and disassembler used the correct values. It was at this point I was stymied as of Sunday night. Now I intend to figure it out.
Reassembly
Since I have the original program and my modified RISC-V assembler, I can make any number of changes to the immediate:
/opt/riscv/bin/riscv64-unknown-elf-gcc factorial.c -o factorial.o
/opt/riscv/bin/riscv64-unknown-elf-objdump -D factorial.o | grep "fact"
From this I obtained a table of results:
fact %0, %1\n
It should be noted that the compiler throws an error when using 255
:
/tmp/ccS2nCnh.o: in function `main':
factorial.c:(.text+0x4e): relocation truncated to fit: R_RISCV_JAL against `*UND*'
collect2: error: ld returned 1 exit status
It should also be noted that while -8!
is not a valid factorial, the RISC-V toolchain neither knows nor cares. It is a problem for the execution environment.
Looking at this data specifically, it doesn’t really elicit any obvious connections. To improve matters, I simplified the program I wrote to be as simple as possible and tried out a few things afterwards.
Making Connections
This program:
#include <stdint.h>
int main(void)
{
uint32_t fact_result_ptr;
asm volatile("fact %0, %1\n":"=r"(fact_result_ptr):"i"(0):);
return 0;
}
Becomes:
000000000001019c <main>:
1019c: 1101 add sp,sp,-32
1019e: ec22 sd s0,24(sp)
101a0: 1000 add s0,sp,32
101a2: e5fef7a7 fact a5,0 <exit-0x100e8>
101a6: fef42623 sw a5,-20(s0)
101aa: 4781 li a5,0
101ac: 853e mv a0,a5
101ae: 6462 ld s0,24(sp)
101b0: 6105 add sp,sp,32
101b2: 8082 ret
If I change the constant:
#include <stdint.h>
int main(void)
{
uint32_t fact_result_ptr;
asm volatile("fact %0, %1\n":"=r"(fact_result_ptr):"i"(254):);
return 0;
}
The assembly becomes:
000000000001019c <main>:
1019c: 1101 add sp,sp,-32
1019e: ec22 sd s0,24(sp)
101a0: 1000 add s0,sp,32
101a2: f5def7a7 fact a5,fe <exit-0xffea>
101a6: fef42623 sw a5,-20(s0)
101aa: 4781 li a5,0
101ac: 853e mv a0,a5
101ae: 6462 ld s0,24(sp)
101b0: 6105 add sp,sp,32
101b2: 8082 ret
So every single instruction is the same except for the one fact
. That narrows down the possibilities of what the compiler is doing.
Here’s a table with some calculations:
What I also had to consider was the endianness. After converting the difference between the disassembled immediate and the immediate at 0
, I then have to swap the endian value to arrive correctly at the original program input.
I added in a few question marks, as some edge-cases like 1000
don't seem to be correctly converted back to its original value even though that seems to be the case in most of them.
The compiler is doing something here that’s too clever for me to understand. It is adding in an extra constant to every number I pass into the program in a way that’s consistent.
Opcodes
To add this behavior into Gem5, I need an opcode. We saw before that Gem5’s decoder.isa
file can disambiguate instructions like gcd
based on the opcode
, FUNCT3
, and FUNCT7
fields. For a command like fact
, there are no disambiguities, so this has to be unique.
I added a MASK_GCD
and a MATCH_GCD
values. In the instruction decoder, the equation (ins & MASK_GCD) == MATCH_GCD
in order for the command to be processed.
101e2: 00e7e7a7 gcd a5,a5,a4
00e7e7a7
0000 0000 1110 0111 1110 0111 1010 0111
#define MATCH_GCD 0x6027
// 0000 0000 0000 0000 0110 0000 0010 0111
#define MASK_GCD 0xfe00707f
// 1111 1110 0000 0000 0111 0000 0111 1111
0000 0000 1110 0111 1110 0111 1010 0111
& 1111 1110 0000 0000 0111 0000 0111 1111
= 0000 0000 0000 0000 0110 0000 0010 0111
= 0x6027
So that’s cool. In the tutorial I got this from, 0x9
is defined as the opcode and we could verify that with the logic above. However, adding:
0x09: UOp::fact({{
Rd = (uint64_t)(sext<20>(imm) << 12); // FIXME: Update behavior later
}}, IsDirectControl, IsUncondControl);
will not work because that opcode already has a lot of ambiguity, so I get a compiler error in Gem5:
build/RISCV/arch/riscv/generated/decode-method.cc.inc:865:9: error: duplicate case value
865 | case 0x9:
| ^~~~
build/RISCV/arch/riscv/generated/decode-method.cc.inc:859:9: note: previously used here
859 | case 0x9:
| ^~~~
scons: *** [build/RISCV/arch/riscv/generated/decoder.o] Error 1
scons: building terminated because of errors.
Looking further down in decoder.isa
, I see that 0x0a
is not taken by any opcode so I want to use that instead. This will mean having to update my values of MASK_FACT
and MATCH_FACT
. So, knowing the formula, I can work backwards.
101ea: f11ef7a7 fact a5,fa <exit-0xffee>
f11ef7a7
1111 0001 0001 1110 1111 0111 1010 0111
The opcode
bits need to be 0xa
, so the final bits need to be 101011
including the quadrant
. Without a FUNCT3
or FUNCT7
, those fields should be equal to 0
in the MATCH, and zeroed out in the MASK since we may never get them.
As such, the MATCH_FACT
field should be set to 0x2B
and MASK_FACT
should be 0x7F
. This will require a slight update in the riscv-gnu-toolchain
project and our compiled instruction will be moderately different as well.
./configure --prefix=/opt/riscv --host=riscv64-unknown-elf
sudo make clean
sudo make -j16
With the new fields, the program needs to be rebuild:
/opt/riscv/bin/riscv64-unknown-elf-gcc factorial.c -o factorial.o
/opt/riscv/bin/riscv64-unknown-elf-objdump -D factorial.o | grep "fact"
101a2: e57ef7ab fact a5,fffffffffffffff8 <__global_pointer$+0xfffffffffffee370>
1110 0101 0111 1110 1111 0111 1010 1011
& 0000 0000 0000 0000 0000 0000 0111 1111
= 0000 0000 0000 0000 0000 0000 0010 1011
= 0x2B
Now we can jump back to Gem5 to update our decoder.isa
:
0x0a: JOp::fact({{
Rd = Rd + 1;
}}, IsDirectControl, IsUncondControl);
Then I build Gem5 once again and run the basic configuration I defined previously:
scons build/RISCV/gem5.opt -j 16
Before I actually run the program, I’m going to do a quick rewrite so that I can actually print the result in the terminal as the original program did not do that.
#include <stdint.h>
#include <stdio.h>
#define FACT_DIGITS 10000
int main(void)
{
uint32_t num1 = 3, num2 = 5, gcd = 0;
uint32_t fact_test_val = 10;
uint32_t fact_result_ptr;
uint8_t fact_result[FACT_DIGITS];
fact_result_ptr = (uint64_t)fact_result;
asm volatile("gcd %0, %1,%2\n":"=r"(gcd):"r"(num1),"r"(num2):);
asm volatile("fact %0, %1\n":"=r"(fact_result_ptr):"i"(250):);
printf("\nGCD is %i\n", gcd);
printf("FACT is %i\n", fact_result_ptr);
return 0;
}
Now I can recompile and send that over to my Gem5 directory:
/opt/riscv/bin/riscv64-unknown-elf-gcc factorial.c -o factorial.o
cp ~/Desktop/riscv-gnu-toolchain/factorial.o ~/Desktop/gem5
cd ~/Desktop/gem5
build/RISCV/gem5.opt configs/learning_gem5/part1/simple-riscv.py
Whereas in the last post this produced an error, I finally get the expected output:
Beginning simulation!
GCD is 8
FACT is 9
Exiting @ tick 246117000 because exiting with last active thread context
You will see that the results are not mathematically correct. That’s because I didn’t implement the instructions to be mathematically correct. gcd
actually just sums the two parameters. fact
takes the register and just increments it by one. Following those rules, the program does in fact work as I expected.
Conclusion
At this point, I have learned a lot, but it admittedly won’t be too useful. Most of my research is heading in the direction of in-memory computing. This typically will mean that the destination register isn’t going to use a single immediate but rather incorporate some register-based parameters: either a R-type or I-type instruction.
I-type instructions are like addi
, adding an immediate (12-bits) to a source register and then sticking that result into a destination register.
I don’t quite know what I want to do yet, but this deep dive into immediates within RISC-V might just come in handy.