Skip to main content

A tiny RISC-V instruction set simulator

Project description

tinyRV

A RISC-V instruction decoder, instruction set simulator and basic system emulator in less than 1000 lines of python.

Mission: Make the most useful RISC-V disassembler/simulator for understanding the ISA and reverse-engineering binaries with the least amount of easily extendable code. Simulation performance is secondary.

  • Uses official RISC-V specs to decode every specified RISC-V instruction.
  • Simulates the base ISAs and is easily extendable.
  • RV32GC and RV64GC compliance validated using riscof and riscv-tests (see Testing below).
  • IEEE754 compliant single-precision and double-precision floating point.
  • Emulates basic user environment for running ELFs. Supports some linux system calls, argc/argv, semihosting, HTIF.
  • Emulates a virt system similar to qemu with: UART, CLINT, PLIC, basic boot-loader, DTB generation.
  • Boots nommu Linux images! Big thanks to CNLohr for mini-rv32ima.

Getting Started

pip install tinyrv

Print all RISC-V instructions in a binary:

tinyrv-dump firmware.bin

Outputs for firmware.bin from picorv32:

00000000: custom0                                  # INVALID data=0x800400b
00000004: custom0                                  # INVALID data=0x600600b
00000008: jal        zero, 0x3e0                   # rv_i
0000000c: addi       zero, zero, 0                 # rv_i
00000010: custom0                                  # INVALID data=0x200a10b
00000014: custom0                                  # INVALID data=0x201218b
00000018: lui        ra, 0                         # rv_i
0000001c: addi       ra, ra, 0x160                 # rv_i
00000020: custom0                                  # INVALID data=0x410b
00000024: sw         sp, 0(ra)                     # rv_i
00000028: custom0                                  # INVALID data=0x1410b
0000002c: sw         sp, 4(ra)                     # rv_i
00000030: custom0                                  # INVALID data=0x1c10b
00000034: sw         sp, 8(ra)                     # rv_i
00000038: sw         gp, 12(ra)                    # rv_i
0000003c: sw         tp, 16(ra)                    # rv_i
...

picorv32 uses some custom instructions for IRQ handling.

Decode instructions from data:

tinyrv-dump 0xf2410113 0xde0ec086 0x2013b7

or in python:

import tinyrv
for op in tinyrv.decoder(0xf2410113, 0xde0ec086, 0x2013b7):
    print(op)

Outputs four instructions (the second word contains actually two 16-bit compressed instructions):

addi       sp, sp, -220
c.swsp     ra, 0x40(sp)
c.swsp     gp, 0x3c(sp)
lui        t2, 0x201000

Each decoded instruction comes with a lot of metadata and parsed arguments:

op = tinyrv.decode(0xf2410113)
print(hex(op.data), op.name, op.extension, op.variable_fields, bin(op.mask), bin(op.match), op.valid())
print(op.args, op.rd, op.rs1, op.imm12)
print(op.arg_str())
0xf2410113 addi ['rv_i'] ['rd', 'rs1', 'imm12'] 0b111000001111111 0b10011 True
{'rd': 2, 'rs1': 2, 'imm12': -220} 2 2 -220
sp, sp, -220

Simulate a binary:

rv = tinyrv.sim(xlen=32)  # xlen affects overflows, sign extensions
rv.copy_in(0, open('firmware.bin', 'rb').read())
print(rv.x)  # print registers
print()
rv.step()  # simulate a single instruction at rv.pc

Outputs:

x00(ro)=00000000  x08(fp)=00000000  x16(a6)=00000000  x24(s8)=00000000
x01(ra)=00000000  x09(s1)=00000000  x17(a7)=00000000  x25(s9)=00000000
x02(sp)=00000000  x10(a0)=00000000  x18(s2)=00000000  x26(10)=00000000
x03(gp)=00000000  x11(a1)=00000000  x19(s3)=00000000  x27(11)=00000000
x04(tp)=00000000  x12(a2)=00000000  x20(s4)=00000000  x28(t3)=00000000
x05(t0)=00000000  x13(a3)=00000000  x21(s5)=00000000  x29(t4)=00000000
x06(t1)=00000000  x14(a4)=00000000  x22(s6)=00000000  x30(t5)=00000000
x07(t2)=00000000  x15(a5)=00000000  x23(s7)=00000000  x31(t6)=00000000


00000000: unimplemented: 0800400b custom0
00000000: custom0                                  # [0]

Simulation halts at the first instruction that is not implemented. Just set the pc and carry on:

rv.pc = 8
rv.run(50)
00000008: jal        zero, 0x3e0                   # [1]

000003e0: addi       ra, zero, 0                   # [2] ra=00000000
000003e4: addi       sp, zero, 0                   # [3] sp=00000000
(... boring initialization stuff skipped ...)
00000454: addi       t5, zero, 0                   # [31] t5=00000000
00000458: addi       t6, zero, 0                   # [32] t6=00000000
0000045c: lui        sp, 0x20000                   # [33] sp=00020000
00000460: jal        ra, 0xbdc                     # [34] ra=00000464

00000bdc: lui        a0, 0xc000                    # [35] a0=0000c000
00000be0: addi       a0, a0, 0x79c                 # [36] a0=0000c79c
00000be4: jal        zero, 0xb08                   # [37]

00000b08: lui        a4, 0x10000000                # [38] a4=10000000
00000b0c: lbu        a5, 0(a0)                     # [39] mem[0000c79c]->68 a5=00000068
00000b10: bne        a5, zero, 0xb18               # [40]

00000b18: addi       a0, a0, 1                     # [41] a0=0000c79d
00000b1c: sw         a5, 0(a4)                     # [42] 00000068->mem[10000000]
00000b20: jal        zero, 0xb0c                   # [43]

00000b0c: lbu        a5, 0(a0)                     # [44] mem[0000c79d]->65 a5=00000065
00000b10: bne        a5, zero, 0xb18               # [45]

00000b18: addi       a0, a0, 1                     # [46] a0=0000c79e
00000b1c: sw         a5, 0(a4)                     # [47] 00000065->mem[10000000]
00000b20: jal        zero, 0xb0c                   # [48]

00000b0c: lbu        a5, 0(a0)                     # [49] mem[0000c79e]->6c a5=0000006c
00000b10: bne        a5, zero, 0xb18               # [50]

Each jump, taken branch produces a newline, right-hand side has register changes and memory transactions. Memory is paged, allocated on demand and persists. This loop writes ascii chars to address 0x10000000 - the firmware apparently expects an UART there. Now let's get past this loop by setting a breakpoint:

rv.run(1000, bpts={0xb14})
rv.run(10)
...
00000b0c: lbu        a5, 0(a0)                     # [89] mem[0000c7a6]->64 a5=00000064
00000b10: bne        a5, zero, 0xb18               # [90]

00000b18: addi       a0, a0, 1                     # [91] a0=0000c7a7
00000b1c: sw         a5, 0(a4)                     # [92] 00000064->mem[10000000]
00000b20: jal        zero, 0xb0c                   # [93]

00000b0c: lbu        a5, 0(a0)                     # [94] mem[0000c7a7]->0a a5=0000000a
00000b10: bne        a5, zero, 0xb18               # [95]

00000b18: addi       a0, a0, 1                     # [96] a0=0000c7a8
00000b1c: sw         a5, 0(a4)                     # [97] 0000000a->mem[10000000]
00000b20: jal        zero, 0xb0c                   # [98]

00000b0c: lbu        a5, 0(a0)                     # [99] mem[0000c7a8]->00 a5=00000000
00000b10: bne        a5, zero, 0xb18               # [100]
00000b14: jalr       zero, 0(ra)                   # [101]

00000464: addi       ra, zero, 0x3e8               # [102] ra=000003e8

00000468: unimplemented: 0a00e00b custom0
00000468: custom0                                  # [103]

Another custom IRQ instruction. Let us ignore those and add a UART by subclassing tinyrv.sim. This code example is from tests/fwsim.py:

import tinyrv, struct
from tinyrv.system import uart8250

class fwsim(tinyrv.sim):
    def __init__(self, xlen=64, trap_misaligned=True):
        super().__init__(xlen, trap_misaligned)
        self.uart = uart8250(self)
    def _custom0   (self, **_): self.pc+=4
    def notify_stored(self, addr):
        if addr == 0x10000000: self.uart[0] = struct.unpack_from('B', *self.page_and_offset(addr))[0]

def main():
    rv = fwsim(xlen=32, trap_misaligned=False)
    rv.copy_in(0, open('tinyrv-test-blobs/picorv32_fw/firmware.bin', 'rb').read())
    rv.pc = 0
    rv.run(10, trace=False)
    rv.run(0, bpts={0x3e0}, trace=False)

if __name__ == '__main__': main()

It shows several features of tinyRV:

  • sub-classes can implement additional instructions by simply defining methods named '_'+instruction_name. All parameters are passed as kwargs - ignore all unused arguments with **_.
  • The custom0 instruction is defined as nop. It only advances the PC.
  • Several callbacks and hooks are available for sub-casses. Here we use notify_stored to catch writes to 0x10000000 and forward them to a UART. Other callbacks are: notify_loading, hook_csr, hook_exec.
  • copy_in loads binary data into memory at a given address. Here: 0.
  • First run will get past the reset-vector (at 0x3e0), the second run continues simulation until the next reset.

Running this code will output data sent to the UART:

hello world
lui..OK
auipc..OK
j..OK
jal..OK
jalr..OK
beq..OK
bne..OK
blt..OK
bge..OK
...

TinyRV comes with two virtual machines that can be lauched from the command line. Call with -h for more info:

  • tinyrv-user-elf simulates ELFs in a minimal user environment. Use this for running cross-compiled user programs.
  • tinyrv-system-virt emulates a system similar to qemu's virt. Use this to boot kernels.

Dev Setup

All core simulator code is in tinyrv/tinyrv.py and has no external dependencies. tinyRV loads opcode specs from tinyrv/opcodes.py, which is auto-generated from riscv-opcodes by tinyrv_opcodes_gen.py.

Do this to re-generate:

git clone https://github.com/riscv/riscv-opcodes.git
make -C riscv-opcodes
python3 tinyrv_opcodes_gen.py

Some VMs use external libraries:

Testing

Install riscv-gnu-toolchain or homebrew-riscv (for MacOS).

RISCOF

Install the RISC-V compatibility framework RISCOF:

pip3 install setuptools wheel
git clone https://github.com/riscv/riscof.git
cd riscof
pip3 install -e .

Install the Sail ISA specification language:

brew install opam zlib z3 pkg-config
opam init
opam switch create ocaml-base-compiler
opam install sail
eval $(opam config env)

Install the RISCV Sail Model:

git clone https://github.com/riscv/sail-riscv.git
cd sail-riscv
ARCH=RV32 make c_emulator/riscv_sim_RV32
ARCH=RV64 make c_emulator/riscv_sim_RV64
# copy / link c_emulator/riscv_sim_RV{32,64} into $PATH location

Optionally, install Spike RISC-V ISA Simulator:

git clone https://github.com/riscv-software-src/riscv-isa-sim.git
cd riscv-isa-sim
mkdir build
cd build
../configure --prefix=/path/to/install  # /path/to/install/bin must be in $PATH
make
make install
spike  # test

Then, run the tests:

make -C tests run_riscof

riscv-software-src/riscv-tests

make -C tests run_riscv_tests

This will automatically clone and build the test suite if necessary.

Run Coremark

make -C tests run_coremark

This will automatically clone and build coremark if necessary.

Boot Linux

make -C tests run_linux

Will download an Image and boot it. Boottime is about a minute on a reasonable fast machine and recent Python.

The Image was made using buildroot. The configuration is based on qemu_riscv64_nommu_virt_defconfig with FPU, compressed instructions and other extensions turned off.

To (re-)create this Image, run:

make -C tests Image

This will download buildroot and build the kernel (takes some time). The build configuration is in tests/br2_external_tinyrv.

Related

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tinyrv-0.0.9.tar.gz (90.8 kB view details)

Uploaded Source

Built Distribution

tinyrv-0.0.9-py3-none-any.whl (82.2 kB view details)

Uploaded Python 3

File details

Details for the file tinyrv-0.0.9.tar.gz.

File metadata

  • Download URL: tinyrv-0.0.9.tar.gz
  • Upload date:
  • Size: 90.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for tinyrv-0.0.9.tar.gz
Algorithm Hash digest
SHA256 f751c7f123058a0454f16e15589148d5c789b23c8c000be12cf928eab1665a74
MD5 4ae696f99690db32f42e3b9a27b64f6e
BLAKE2b-256 a1463d57dd7d6cd6e91e12e621ee956e4bd81bf0404ec9663997b4012398d8c5

See more details on using hashes here.

File details

Details for the file tinyrv-0.0.9-py3-none-any.whl.

File metadata

  • Download URL: tinyrv-0.0.9-py3-none-any.whl
  • Upload date:
  • Size: 82.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for tinyrv-0.0.9-py3-none-any.whl
Algorithm Hash digest
SHA256 b4c19d628cb8852a201113805bacefe6a169c7407d8bc1f11e95b312d5821e5a
MD5 9adbf8b3a999b86a621929eff0707575
BLAKE2b-256 ea5587a2684b45c57d7778f054fb51ba7f037b76298ae6a8b7f201b2a31ee806

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page