Part 1: Custom Rust toolchain

Published on 2023-03-27 by Noxim. 2073 words, about 11 minutes

In the previous post we covered some groundwork on what RV32E is and what we need to do to build Rust code targeting it. After some experimentation we found LLVM patch differential D70401, which implements all the required codegen changes for us. In this post we will go over how to build our own Rust toolchain to make use of it.

Custom LLVM

To get started, we first need to acquire the sources for Rust's compiler. All of these steps are fairly well detailed in the Rustc Dev Guide, so if you are having trouble following along you can refer to it for more details. Note that you should be working somewhere with plenty of space: we will be compiling LLVM and Rust from scratch, and the build artifacts can take up tens of gigabytes.

$ cd ~
$ git clone https://github.com/rust-lang/rust.git
$ cd rust

By default Rust will build against a precompiled LLVM build. The build script, simply called x.py, will download the binaries and start the build process. We are however interested in modifying LLVM itself, so we need to get its sources as well and configure x.py to build it from scratch.

LLVM lives in a git submodule of the Rust project. Let's head to src/llvm-project and get ourselves a copy of the sources.

$ git submodule update --init --recursive
$ cd src/llvm-project

Now we have a clean starting point for our work. Let's get started with D70401. Download the differential from LLVM's Phabricator. It is a plain old Unix diff, which you can apply with git. Apply it and cross your fingers that the merge conflicts won't be too painful. Depending on the status of D70401 and the LLVM version used by Rust, there can be quite a lot of conflicts to merge. If everything gets very unwieldy, you can of course use my LLVM branch with the patch already applied.

$ wget "https://reviews.llvm.org/D70401?download=true" -O D70401.diff
$ git apply D70401.diff

If you survived the battlefield of merge conflicts, you are ready to build our fork. To make sure everything picks up the LLVM changes correctly, let's commit them to our submodule.

# Apply changes to LLVM
$ git add .
$ git commit -m "Apply D70401"
$ cd ../..
# Apply changed LLVM to Rust
$ git add src/llvm-project
$ git commit -m "Use custom LLVM"

The Rust project's build script x.py requires a configuration file to function correctly. Normally we could use the ./x.py setup subcommand, but I am going to show you the entire configuration file here. Go back to the project's root directory and save the configuration as config.toml. If you want to read more about what each of the options does, check out the extensive documentation available in config.toml.example.

$ cd ~/rust
$ cat config.toml
# Use defaults for codegen affecting custom builds
profile = "codegen"

[llvm]
# Use our own LLVM build instead of downloading from CI
download-ci-llvm = false

[rust]
# Enable building LLD for our target as well
lld = true

Now we are ready to start a build. We have configured a full from-source build of both LLVM and Rust, so this will take quite a while. I recommend preparing a hot beverage of your choice.

$ python x.py build
...
Build completed successfully in 0:31:27

Hey, that worked! Amazing. We have successfully built Rust against our custom LLVM. As a side effect there are a bunch of bintools in the build directory. We will need some of these for debugging later on, so don't clear it. For convenience, let's add our new Rust toolchain to rustup and check that it's working.

$ rustup toolchain link custom-rv32e ~/rust/build/host/stage1
$ rustc +custom-rv32e --version --verbose
rustc 1.70.0-dev
binary: rustc
commit-hash: unknown
commit-date: unknown
host: x86_64-pc-windows-msvc
release: 1.70.0-dev
LLVM version: 15.0.7

Armed with new tools, let's take the minimal Rust crate from the previous post and try building it again.

$ cd ~/hello-wch
$ cargo +custom-rv32e build
   Compiling core v0.0.0 (~/rust/build/x86_64-pc-windows-msvc/stage1/lib/rustlib/src/rust/library/core)
   Compiling compiler_builtins v0.1.87
   Compiling rustc-std-workspace-core v1.99.0 (~/rust/build/x86_64-pc-windows-msvc/stage1/lib/rustlib/src/rust/library/rustc-std-workspace-core)
   Compiling hello-wch v0.1.0 (~/hello-wch)
error: `#[panic_handler]` function required, but not found

Nice, we got further. Panics are a language-level feature in Rust that requires some sort of handler to be present. In your usual code the handler prints a message and optionally dumps a backtrace. This is achieved through panic unwinding, a method that undoes the function call stack. It is somewhat complex to implement and uses precious bytes on our small microcontroller, so let's just choose to handle panics by stopping. The panic-halt crate will simply enter an infinite loop in the case of a panic. Add the following import to main.rs:

use panic_halt as _;
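
As an aside, if you want to see what unwinding actually does, here is a small sketch you can run on your desktop. It needs std, so it will not build for our microcontroller target, and the names Guard, fails and unwinding_demo are made up for illustration. When fails panics, the stack is unwound, the destructor of Guard runs on the way out, and catch_unwind turns the panic into an Err.

```rust
use std::panic;
use std::sync::atomic::{AtomicBool, Ordering};

/// Set to true when the guard's destructor runs.
static CLEANED_UP: AtomicBool = AtomicBool::new(false);

/// Hypothetical resource whose cleanup we want to observe.
struct Guard;

impl Drop for Guard {
    fn drop(&mut self) {
        CLEANED_UP.store(true, Ordering::SeqCst);
    }
}

/// Panics while a Guard is alive on the stack.
fn fails() {
    let _guard = Guard;
    panic!("something went wrong");
}

/// Returns (panic was caught, destructor ran during unwinding).
/// Assumes the default panic=unwind strategy of a hosted target.
fn unwinding_demo() -> (bool, bool) {
    CLEANED_UP.store(false, Ordering::SeqCst);
    let caught = panic::catch_unwind(fails).is_err();
    (caught, CLEANED_UP.load(Ordering::SeqCst))
}
```

On a hosted target this should report that the panic was caught and the guard was dropped. A halting handler has none of this machinery: it never returns, so there is nothing to unwind and no unwinding tables to carry around.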

Then also add panic-halt to your Cargo.toml and build again with our toolchain:

$ cargo add panic-halt
$ cargo +custom-rv32e build
   Compiling hello-wch v0.1.0 (~/hello-wch)
error: linking with `rust-lld` failed: exit code: 1
  = note: "rust-lld" "-flavor" "gnu" "~/AppData/Local/Temp/rustc3OWSgx/symbols.o" 
  "~/hello-wch/target/riscv32ec-unknown-none-elf/debug/deps/hello_wch-e87bc7451cecb7dc.1u8c89d3z18nbgyn.rcgu.o" "--as-needed" "-L" 
  "~/hello-wch/target/riscv32ec-unknown-none-elf/debug/deps" "-L" "~/hello-wch/target/debug/deps" "-L" 
  "~/rust/build/x86_64-pc-windows-msvc/stage1/lib/rustlib/riscv32ec-unknown-none-elf/lib" "-Bstatic" 
  "~/hello-wch/target/riscv32ec-unknown-none-elf/debug/deps/libpanic_abort-4b4e2d6913f294a2.rlib" 
  "~/hello-wch/target/riscv32ec-unknown-none-elf/debug/deps/librustc_std_workspace_core-ff579916435eeff7.rlib" 
  "~/hello-wch/target/riscv32ec-unknown-none-elf/debug/deps/libcore-127215e2b9f4f97d.rlib" 
  "~/hello-wch/target/riscv32ec-unknown-none-elf/debug/deps/libcompiler_builtins-b07df0b4d59d00d7.rlib" "-Bdynamic" "-z" "noexecstack" "-L" 
  "~/rust/build/x86_64-pc-windows-msvc/stage1/lib/rustlib/riscv32ec-unknown-none-elf/lib" "-o" 
  "~/hello-wch/target/riscv32ec-unknown-none-elf/debug/deps/hello_wch-e87bc7451cecb7dc" "--gc-sections"
  = note: rust-lld: error: ~/hello-wch/target/riscv32ec-unknown-none-elf/debug/deps/hello_wch-e87bc7451cecb7dc.1u8c89d3z18nbgyn.rcgu.o: cannot link object files with different EF_RISCV_RVE
          rust-lld: error: ~/hello-wch/target/riscv32ec-unknown-none-elf/debug/deps/libpanic_abort-4b4e2d6913f294a2.rlib(panic_abort-4b4e2d6913f294a2.panic_abort.6694c8aa-cgu.0.rcgu.o): cannot link object files with different EF_RISCV_RVE

Making rustc RV32E-aware

Okay, that's a bit weird. Rust is at least producing some code now, but is failing to link the build files together. More specifically, the error is saying that our build files have conflicting flags. This is where our custom bintools come in handy. We can access them in ~/rust/build/host/llvm/bin/. You can either add them to your PATH temporarily or just invoke with the full path. While you're at it, set our custom toolchain as the default for our directory so we don't have to write the +custom-rv32e every time we invoke Cargo. Now we are ready to start debugging. Let's check the mentioned files with readelf:

$ llvm-readelf --file-header target/riscv32ec-unknown-none-elf/debug/deps/hello_wch-*
File: hello-wch
ELF Header:
  Magic:   7f 45 4c 46 01 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              EXEC (Executable file)
  Machine:                           RISC-V
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          52 (bytes into file)
  Start of section headers:          6156 (bytes into file)
  Flags:                             0x9, RVC, RVE
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         3
  Size of section headers:           40 (bytes)
  Number of section headers:         14
  Section header string table index: 12
$ llvm-readelf --file-header %LOCALAPPDATA%/Temp/rustc3OWSgx/symbols.o
~/rust/build/host/llvm/bin/llvm-readelf.exe: error: '~/AppData/Local/Temp/rustc3OWSgx/symbols.o': no such file or directory

Huh, the other file, which oddly does not live in our target directory, does not even seem to exist after compilation has ended. On the other hand, we see that our object file in target/ is at least built correctly with the RVE ELF flag. But what is this mysterious symbols.o that lives in %LOCALAPPDATA%?
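
As a side note, the Flags line that readelf prints is just a 32-bit field in the ELF header. Here is a rough sketch of reading it by hand, assuming a little-endian ELF32 file like ours. The function names are made up for this post; the offsets follow the ELF32 header layout (52 bytes, e_flags at byte 36) and the two flag values come from the RISC-V ELF psABI.

```rust
/// RISC-V bits of the ELF `e_flags` field (values from the RISC-V ELF psABI).
const EF_RISCV_RVC: u32 = 0x0001;
const EF_RISCV_RVE: u32 = 0x0008;

/// Reads `e_flags` from the start of a little-endian ELF32 file.
/// Returns None if the bytes do not look like such a header.
fn elf32_flags(file: &[u8]) -> Option<u32> {
    // The ELF32 header is 52 bytes long.
    let header = file.get(..52)?;
    let is_elf = header[..4] == [0x7f, b'E', b'L', b'F'];
    let is_32_bit = header[4] == 1; // EI_CLASS == ELFCLASS32
    let is_little_endian = header[5] == 1; // EI_DATA == ELFDATA2LSB
    if !(is_elf && is_32_bit && is_little_endian) {
        return None;
    }
    // e_flags sits at byte offset 36 in the ELF32 header.
    Some(u32::from_le_bytes([
        header[36], header[37], header[38], header[39],
    ]))
}

/// Names of the flag bits we care about, in the order readelf lists them.
fn flag_names(flags: u32) -> Vec<&'static str> {
    let mut names = Vec::new();
    if (flags & EF_RISCV_RVC) != 0 {
        names.push("RVC");
    }
    if (flags & EF_RISCV_RVE) != 0 {
        names.push("RVE");
    }
    names
}
```

Feeding it a header with flags 0x9 gives RVC and RVE, which matches the readelf output above. The linker error boils down to two input files disagreeing on that RVE bit.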

We can bust out everyone's favorite search tool, ripgrep, and go hunting. Based on the file path we can assume the file is created by rustc and not LLVM, so let's start there.

$ rg symbols.o ~/rust/compiler/
compiler/rustc_codegen_ssa/src/back/link.rs
1787:    let path = tmpdir.join("symbols.o");

Alright, this looks promising. Opening the source file and going to the appropriate line, we find the suspect along with a long comment explaining what is going on. In short: There was a bug where public static items got removed if they were not referred to in the Rust code. The workaround was to create this temporary symbols.o object file that contains references to said items so that they will not be accidentally dropped during build, but also will get correctly garbage collected if unused.

So, the temporary symbols.o is being created without the proper ELF flag. The file is produced in self::metadata::create_object_file, which already contains some flag handling for RISC-V. The ELF handling here is mostly provided by the object crate, which conveniently is also missing definitions for the E extension. After a quick patch in that project we can add the correct flag bit to our symbols.o:

// Check if embedded base extension is in use
if features.contains("+e") {
    e_flags |= elf::EF_RISCV_RVE;
}
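
To see how the pieces fit together, here is a standalone sketch of deriving the whole flag word from a target feature string. It is not the actual rustc code: riscv_e_flags is a name made up for this post, the constants are copied from the RISC-V ELF psABI, and unlike the snippet above it matches whole comma-separated features instead of substrings.

```rust
// Values from the RISC-V ELF psABI.
const EF_RISCV_RVC: u32 = 0x0001;
const EF_RISCV_FLOAT_ABI_SINGLE: u32 = 0x0002;
const EF_RISCV_FLOAT_ABI_DOUBLE: u32 = 0x0004;
const EF_RISCV_RVE: u32 = 0x0008;

/// Hypothetical helper: builds `e_flags` from a feature string such as "+e,+c".
fn riscv_e_flags(features: &str) -> u32 {
    // Exact match on each comma-separated feature (an assumption of this sketch).
    let has = |feature: &str| features.split(',').any(|f| f == feature);

    let mut e_flags = 0;
    // Compressed instructions
    if has("+c") {
        e_flags |= EF_RISCV_RVC;
    }
    // Floating-point ABI; soft float is encoded as no bits set
    if has("+d") {
        e_flags |= EF_RISCV_FLOAT_ABI_DOUBLE;
    } else if has("+f") {
        e_flags |= EF_RISCV_FLOAT_ABI_SINGLE;
    }
    // Embedded base extension, the bit that symbols.o was missing
    if has("+e") {
        e_flags |= EF_RISCV_RVE;
    }
    e_flags
}
```

For our "+e,+c" feature string this yields 0x9, the same value readelf reported for the object file in target/.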

Let's rebuild the compiler and try it again. This one should go by a lot faster than the last compilation.

$ python x.py build
Build completed successfully in 0:02:37
$ cargo build
   Compiling hello-wch v0.1.0 (~/hello-wch)
   Finished dev [unoptimized + debuginfo] target(s) in 0.30s
$

With a new build of the compiler, and a bit of wishing, our build finally goes through! We can now take a look at the fresh binary and sanity check that the assembly looks at least a little bit correct. After a little bit of poking around, the binary does not seem to contain any references to registers x16-x31. We have successfully compiled code for a new target!
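
If you would rather not eyeball the disassembly, the check can be roughly automated. The sketch below is a heuristic under a few assumptions: the input is text in the style of llvm-objdump -d --no-show-raw-insn (an address, a colon, a mnemonic, then operands), and both function names are made up. It looks for the numeric names x16-x31 and their ABI aliases a6-a7, s2-s11 and t3-t6.

```rust
/// True if `name` is a register that RV32I has but RV32E lacks (x16..x31).
fn is_upper_register(name: &str) -> bool {
    // Numeric names x16..x31
    if let Some(n) = name.strip_prefix('x').and_then(|d| d.parse::<u32>().ok()) {
        return (16..=31).contains(&n);
    }
    // ABI names: a6-a7 (x16-x17), s2-s11 (x18-x27), t3-t6 (x28-x31)
    let numbered = |prefix: char, range: std::ops::RangeInclusive<u32>| {
        name.strip_prefix(prefix)
            .and_then(|d| d.parse::<u32>().ok())
            .map_or(false, |n| range.contains(&n))
    };
    numbered('a', 6..=7) || numbered('s', 2..=11) || numbered('t', 3..=6)
}

/// Collects the upper registers mentioned as operands, in order of first use.
fn upper_registers_used(disassembly: &str) -> Vec<String> {
    let mut found: Vec<String> = Vec::new();
    for line in disassembly.lines() {
        // Drop the "address:" prefix so hex addresses like "a6" are not misread.
        let code = match line.split_once(':') {
            Some((_, rest)) => rest,
            None => line,
        };
        // Skip the mnemonic and keep only the operands.
        let mut parts = code.trim().splitn(2, char::is_whitespace);
        let _mnemonic = parts.next();
        let operands = parts.next().unwrap_or("");
        for token in operands.split(|c: char| !c.is_ascii_alphanumeric()) {
            if is_upper_register(token) && !found.iter().any(|f| f.as_str() == token) {
                found.push(token.to_string());
            }
        }
    }
    found
}
```

An empty result is what we want to see for an RV32E binary. A symbol that happens to be named like a register would trip the check, so treat a hit as a reason to look closer rather than as proof.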

happy ferris interjects: Hooray! Yet another computer has been carcinizated

Making it official

Using a JSON target specification file is generally fine, but if we are forking Rust anyway, let's make RV32E an official target. rustc's target specifications live (very surprisingly) in a crate called rustc_target. Each target specification is just a function called target that returns a struct that matches our JSON file. Let's create one for RV32EC in compiler/rustc_target/src/spec/riscv32ec_unknown_none_elf.rs:

use crate::spec::{Cc, LinkerFlavor, Lld, PanicStrategy, RelocModel, Target, TargetOptions};

pub fn target() -> Target {
    Target {
        data_layout: "e-m:e-p:32:32-i64:64-n32-S32".into(),
        llvm_target: "riscv32".into(),
        pointer_width: 32,
        arch: "riscv32".into(),

        options: TargetOptions {
            linker_flavor: LinkerFlavor::Gnu(Cc::No, Lld::Yes),
            linker: Some("rust-lld".into()),
            cpu: "generic-rv32".into(),
            features: "+e,+c".into(),
            llvm_abiname: "ilp32e".into(),
            max_atomic_width: Some(0),
            atomic_cas: false,
            panic_strategy: PanicStrategy::Abort,
            relocation_model: RelocModel::Static,
            emit_debug_gdb_scripts: false,
            eh_frame_header: false,
            ..Default::default()
        },
    }
}

Then to include our new specification in the build we add our module name to the big supported_targets! macro in compiler/rustc_target/src/spec/mod.rs:

...
("x86_64-unknown-hermit", x86_64_unknown_hermit),
("riscv32ec-unknown-none-elf", riscv32ec_unknown_none_elf),
("riscv32i-unknown-none-elf", riscv32i_unknown_none_elf),
...
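
If you are wondering how a list of tuples turns into something the compiler can look up, here is a heavily simplified sketch of the idea. It is not the real macro from rustc_target: the Target struct is cut down to two fields, the modules are written inline instead of living in their own files, and the riscv32i entry is only a stand-in with made-up contents.

```rust
/// Cut-down stand-in for rustc's Target struct.
#[derive(Debug, Clone, PartialEq)]
pub struct Target {
    pub llvm_target: &'static str,
    pub features: &'static str,
}

mod riscv32ec_unknown_none_elf {
    use super::Target;

    pub fn target() -> Target {
        Target { llvm_target: "riscv32", features: "+e,+c" }
    }
}

mod riscv32i_unknown_none_elf {
    use super::Target;

    // Placeholder contents, for illustration only.
    pub fn target() -> Target {
        Target { llvm_target: "riscv32", features: "" }
    }
}

/// Simplified sketch: each (triple, module) pair becomes one entry in
/// TARGETS and one match arm in load_builtin.
macro_rules! supported_targets {
    ( $( ($triple:literal, $module:ident), )+ ) => {
        /// Every built-in target triple, in declaration order.
        pub const TARGETS: &[&str] = &[ $( $triple ),+ ];

        /// Looks up a built-in target by its triple.
        pub fn load_builtin(triple: &str) -> Option<Target> {
            match triple {
                $( $triple => Some($module::target()), )+
                _ => None,
            }
        }
    };
}

supported_targets! {
    ("riscv32ec-unknown-none-elf", riscv32ec_unknown_none_elf),
    ("riscv32i-unknown-none-elf", riscv32i_unknown_none_elf),
}
```

With this in place, load_builtin("riscv32ec-unknown-none-elf") returns our specification, while an unknown name returns None. That is the point where a JSON file would otherwise be needed.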

With this, Rust should now have internal knowledge of our new target and we can say goodbye to managing target spec JSON files. Rebuild the compiler with x.py and let's build our own crate again, this time with our newly built-in target. Remove the JSON specification file and change .cargo/config.toml to use the bare riscv32ec-unknown-none-elf (without .json) as the target.

$ python x.py build
Build completed successfully in 0:03:03
$ cargo build
   Compiling hello-wch v0.1.0 (~/hello-wch)
   Finished dev [unoptimized + debuginfo] target(s) in 0.42s
$

In the next post we will put our new Rust toolchain to work and bring up an actual CH32V003 board.
