---
layout: post
title: Booting the Bootloader
date: 2024-08-11
---

a.k.a "How I wrote a FreeBSD Bootloader"

A few days ago I wrote [a FreeBSD bootloader](https://git.mildlyfunctional.gay/artemist/freeloader/).
There are a few reasons why I did so (I want a better bootloader for [NixBSD](https://github.com/nixos-bsd/nixbsd) and
I want something like [lanzaboote](https://github.com/nix-community/lanzaboote) for FreeBSD),
but mostly I wrote it because I could.

It runs as a UEFI application, reads a kernel from the filesystem, loads it into memory, sets arguments,
and executes it, all without using any of the upstream FreeBSD "stand" code. The code sucks, changing any
settings requires recompiling, it's missing features, and it certainly won't be portable. But it works.
As far as I can tell, the only other project that has done that is GRUB.

After I posted about it [on fedi](https://social.mildlyfunctional.gay/@artemist/112907665770518105) someone asked
if I was going to write about my project. That seemed like a fun idea, but I'm not sure how useful
a rant about weird FreeBSD design decisions would be to anybody, so instead I'll talk more about my thought
process for reimplementing

## 1. Setting the Scope
When I program I try to take everything into account. I'll constantly be trying to answer questions like:
* "What happens if the firmware is buggy?"
* "What if I want to port this to ARM later?"
* "What if meow?"

This can be useful when I'm trying to write secure, fault-tolerant production code,
but it's mostly a hindrance when I'm trying to just get something to work.

Therefore, the first step is setting the smallest scope where I've still accomplished something.
This can be a bit flexible, but for this project I wanted: "load a FreeBSD kernel with serial or graphical output
from a fixed path in an x86_64 VM". I didn't even put "boot from a root filesystem" in scope, but it turned out
that was trivial.

This gives me a sense of accomplishment early in the process and helps banish the "what if" demons [^demon] in my head.

## 2. Understanding the Problem
Before starting any programming I like to have a good idea of what I'm interfacing with.
This tends to mean first learning more general "How do I use $thing" information
and then moving on to "How does $thing work".
It's no use knowing how to encode the kernel environment if you have no idea what the kernel environment is.

While reading documentation sometimes gives me a starting point, it's rarely enough, so I quickly
end up experimenting, trying debug features, tracing, and reading the code.

A lot of these suggestions apply whether or not you have source code.
You can try a bunch of inputs, `strace`, dump memory, find important functions, and sometimes enable debug logging
either way; code is just easier to search than binaries.

In this case I already had a good idea of the user-visible parts of the boot process [^user-visible]
from working on the NixBSD bootloader, so it was immediately time to figure out how the process works.

I spent around 2 days on this project just reading code and writing notes.
My notes skip general concepts I already know and just include reminders and lists of
information I might forget. They're probably not useful to anyone but me, but
could be useful in the future if I want to write documentation.

It would probably behoove me to add important code references to my notes,
but I mostly end up looking through my search history trying to find what I was looking at.
Please don't do this.

(Have a sample of my [notes](https://git.mildlyfunctional.gay/artemist/freeloader/src/commit/fb7dcf0f401cad2fb124044df8104747c008a2ed/notes.md) to get an idea of what they include.)
```markdown
## Modinfo
Loader must provide modinfo to kernel, a TLV structure

* Dump from normal FreeBSD with `sysctl debug.dump_modinfo`
* Tag is `MODINFO_*` or `MODINFO_METADATA | MODINFOMD_*`
* Tag and length are 4 bytes native endian
* Value is padded to align to `sizeof(size_t)`
* Strings are null-terminated
* Encodes multiple modules in sequence, separated by `MODINFO_NAME` string

### Fields
* `MODINFO_NAME`: string with path to file if available
```

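
The TLV layout in those notes can be sketched in a few lines of Python. This is only an illustration, not freeloader's code. The numeric tag values are what I recall from FreeBSD's `sys/sys/linker.h` and should be treated as assumptions, as should the choice to count the trailing NUL in a string's length. The function names are made up for this example, and padding the value to a multiple of 8 only keeps records aligned if the buffer itself starts aligned.

```python
import struct

# Assumed tag values, recalled from sys/sys/linker.h; verify before relying on them.
MODINFO_END = 0x0000
MODINFO_NAME = 0x0001
SIZE_T = 8  # sizeof(size_t) on amd64


def encode_record(tag: int, value: bytes) -> bytes:
    """Encode one record: 4-byte tag, 4-byte length, then the padded value."""
    header = struct.pack("=II", tag, len(value))  # native endian, 8 bytes total
    padding = -len(value) % SIZE_T  # pad the value up to a multiple of 8
    return header + value + b"\x00" * padding


def encode_string(tag: int, text: str) -> bytes:
    """Strings are null-terminated; the length here includes the NUL (assumption)."""
    return encode_record(tag, text.encode() + b"\x00")


def decode_records(blob: bytes):
    """Yield (tag, value) pairs until the buffer runs out."""
    pos = 0
    while pos + 8 <= len(blob):
        tag, length = struct.unpack_from("=II", blob, pos)
        pos += 8
        yield tag, blob[pos:pos + length]
        pos += length + (-length % SIZE_T)  # skip the value and its padding
```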
FreeBSD keeps the loader ("stand") and kernel ("sys") code mostly separate, so I simultaneously reverse engineered the
loader serialization and kernel deserialization code.

Before I could do much of anything though, I needed to know where to look.
The easiest starting points are often the beginning or end of a program,
in this case the kernel's entry point and the part of the loader that jumps to it.

The kernel's entry point (`btext`) was relatively easy to find with `readelf -Wa kernel`.
The readelf command gave me the address of the entry point. Since I was using a kernel with
debug symbols, the address is linked to the function name later in the readelf output,
so a quick search gave me the name, and from there [the function](https://cgit.freebsd.org/src/tree/sys/amd64/amd64/locore.S?h=release/14.1.0#n63). [^script]

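
The entry point address that readelf prints is just the `e_entry` field of the ELF header, so it can also be read directly. Below is a minimal sketch, assuming a little-endian ELF64 file. `read_entry` is a name invented for this example, and it expects the first 64 bytes of the file, e.g. `open("kernel", "rb").read(64)`.

```python
import struct

# Elf64_Ehdr: e_ident[16], e_type, e_machine, e_version, e_entry, e_phoff,
# e_shoff, e_flags, e_ehsize, e_phentsize, e_phnum, e_shentsize, e_shnum,
# e_shstrndx. 64 bytes in total.
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")


def read_entry(header: bytes) -> int:
    """Return e_entry, the virtual address the loader eventually jumps to."""
    fields = EHDR.unpack_from(header)
    ident = fields[0]
    if ident[:4] != b"\x7fELF" or ident[4] != 2:  # EI_CLASS 2 means ELFCLASS64
        raise ValueError("not an ELF64 file")
    return fields[4]  # e_entry is the fifth field
```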
The loader's exit point was also easy to find. In the standard ELF header the entry point is called
`e_entry`, so I used [ripgrep](https://github.com/BurntSushi/ripgrep) with `rg e_entry`
and immediately found [the function](https://cgit.freebsd.org/src/tree/stand/efi/loader/arch/amd64/elf64_freebsd.c?h=release/14.1.0#n91).

From there I traced where important variables are changed, which quickly led me to the

TODO

## 3. Writing the code
My goal when writing the code is to get something that superficially produces the right output
that I can fix later.

TODO

## 4. Debugging
TODO

## 5. Cleaning up the code
At this point, I generally have code that works, but is terrible. It might use hardcoded constants, have tons of unnecessary debug statements, have no configuration, or just barely work.

From here I have 3 options:
* Don't clean up the code, because I have no plans to use it anymore
* Iteratively clean up the code
* Rewrite the code from scratch with more foresight, maybe copying some parts over

A lot of my projects end up in the first category because they were just experiments to see if I could.

However, if I have any plans to use it in the future, the best option is normally to take a break.
A few days or weeks of thinking it over and talking normally help me figure out how to rewrite or improve
the code.

This is not always advice that I follow myself. The day after I got freeloader working,
I tried to refactor the `Serialize` trait, but ended up spending hours just making the code worse
and threw my work away.

A few days later I realized there was a much better way and could have avoided all that trouble.

## The Boot Process
With all that out of the way, here's what I discovered about the boot process:

The loader stuffs the kernel and all its dependencies into a contiguous block of physical memory,
which it calls several things including `modulep` or just `addr`.
I call it the "staging buffer" since it's as good a name as any.
On x86 [^x86] it must be aligned on a 2MiB boundary. [^buffer]

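
One common way to meet an alignment requirement like this is to allocate more than needed and round the start address up. This is a sketch of the arithmetic only, not a description of how the FreeBSD loader or freeloader actually allocates the buffer; `align_up` and `STAGING_ALIGN` are names invented here.

```python
STAGING_ALIGN = 2 * 1024 * 1024  # 2MiB


def align_up(addr: int, align: int = STAGING_ALIGN) -> int:
    """Round addr up to the next multiple of align, which must be a power of two."""
    return (addr + align - 1) & ~(align - 1)
```

An address that is already aligned comes back unchanged, so over-allocating by one alignment unit is always enough.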
### The kernel
The first thing the loader puts in the staging buffer is the kernel.
Conveniently, the kernel is an [ELF](https://en.wikipedia.org/wiki/Executable_and_Linkable_Format) file,
the same format used for programs on Linux and FreeBSD, so there's plenty of existing code for parsing it. [^interp]

Like other ELF programs, the kernel specifies the location of metadata and code in its
Program Headers. [^phdr] Although there are a few types here, the loader only cares about `LOAD`
headers, each representing a segment of memory to copy.

Readelf's interpretation of my kernel's program header table:
```
Program Headers:
  Type           Offset    VirtAddr           PhysAddr           FileSiz  MemSiz   Flg Align
  PHDR           0x000040  0xffffffff80200040 0x0000000000200040 0x000268 0x000268 R   0x8
  INTERP         0x0002a8  0xffffffff802002a8 0x00000000002002a8 0x00000d 0x00000d R   0x1
      [Requesting program interpreter: /red/herring]
  LOAD           0x000000  0xffffffff80200000 0x0000000000200000 0x17baa0 0x17baa0 R   0x200000
  LOAD           0x17baa0  0xffffffff8037baa0 0x000000000037baa0 0xd5efd8 0xd5efd8 R E 0x200000
  LOAD           0xedaa80  0xffffffff810daa80 0x00000000010daa80 0x425e1c 0x425e1c R   0x200000
  LOAD           0x1400000 0xffffffff81600000 0x0000000001600000 0x000180 0x001000 RW  0x200000
  LOAD           0x1600000 0xffffffff81800000 0x0000000001800000 0x1868b0 0x600000 RW  0x200000
  DYNAMIC        0x1400000 0xffffffff81600000 0x0000000001600000 0x000180 0x000180 RW  0x8
  GNU_RELRO      0x1400000 0xffffffff81600000 0x0000000001600000 0x000180 0x001000 R   0x1
  GNU_STACK      0x000000  0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW  0
  NOTE           0x1300648 0xffffffff81500648 0x0000000001500648 0x0001c0 0x0001c0 R   0x4
```

Before it can copy though, the loader takes the `VirtAddr` of the first `LOAD` segment and
keeps it as an offset. That offset lets the loader place the first segment at the beginning
of the staging buffer but keep the other segments at the correct relative positions. For example, if the staging buffer was at `0xacab_0000_0000`, then the loader would put the first segment of my kernel at `0xacab_0000_0000` and the second at `0xacab_0017_baa0`.

With that offset, the loader looks at each `LOAD` segment and copies from the kernel file (from `Offset` to `Offset + FileSiz` bytes in) to the staging buffer (starting `VirtAddr - <load offset>` bytes in).

Note that in some cases `MemSiz > FileSiz`. The loader zeroes the excess amount in the staging buffer,
and the kernel uses it for uninitialized global variables (placed in the section `.bss`).

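
The copy loop described above is small enough to sketch in Python. This is an illustration under assumptions rather than the real loader code: `Phdr` and `load_segments` are invented names, the program headers are assumed to be parsed already, and the staging buffer is a plain `bytearray` instead of physical memory.

```python
from typing import NamedTuple

PT_LOAD = 1  # program header type that readelf prints as LOAD


class Phdr(NamedTuple):
    """Only the program header fields this sketch needs."""
    type: int
    offset: int  # readelf's Offset
    vaddr: int   # readelf's VirtAddr
    filesz: int  # readelf's FileSiz
    memsz: int   # readelf's MemSiz


def load_segments(kernel: bytes, phdrs: list[Phdr], staging: bytearray) -> int:
    """Copy every LOAD segment into staging and return the load offset."""
    loads = [p for p in phdrs if p.type == PT_LOAD]
    load_offset = loads[0].vaddr  # VirtAddr of the first LOAD segment
    for p in loads:
        dest = p.vaddr - load_offset  # position inside the staging buffer
        if dest + p.memsz > len(staging) or p.offset + p.filesz > len(kernel):
            raise ValueError("segment does not fit")
        staging[dest:dest + p.filesz] = kernel[p.offset:p.offset + p.filesz]
        # MemSiz > FileSiz: zero the remainder, which the kernel uses as .bss
        staging[dest + p.filesz:dest + p.memsz] = bytes(p.memsz - p.filesz)
    return load_offset
```

With the table above, the second `LOAD` segment lands at `0xffffffff8037baa0 - 0xffffffff80200000 = 0x17baa0` bytes into the buffer, matching the example addresses.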
### The Kernel 2: Electric Boogaloo
At this point all the kernel's code is in RAM,
but it's missing the `.symtab` and `.strtab` sections [^symtab] that the kernel will need later to load modules.

The loader finds these sections by looking at the aptly-named Section Header Table.
Sections include info about the purpose of different parts of the file (e.g. `.text` for code, `.rodata` for constants)
that are useful for linkers but not normally needed to run a program.

Readelf's interpretation of my kernel's section header table:
```
Section Headers:
  [Nr] Name              Type            Addr             Off     Size   ES Flg Lk Inf   Al
  [ 0]                   NULL            0000000000000000 000000  000000 00      0 0     0
  [ 1] .interp           PROGBITS        ffffffff802002a8 0002a8  00000d 00   A  0 0     1
...
  [ 9] .text             PROGBITS        ffffffff8037c000 17c000  d5ea78 00  AX  0 0     4096
...
  [58] .SUNW_ctf         PROGBITS        0000000000000000 1abdde8 105984 00     59 0     4
  [59] .symtab           SYMTAB          0000000000000000 17869a8 189d38 18     60 43442 8
  [60] .strtab           STRTAB          0000000000000000 1910a49 1ad39c 00      0 0     1
```

The loader only needs to give the kernel `.strtab` (which lists the names of functions, global variables, and other "symbols") and `.symtab` (which provides the address and type of those symbols).
The two sections are only useful with one another, so `.symtab` includes a link to its `.strtab`.
Readelf shows this with the `Lk` field, as in the table above (`60` for `.symtab`, the index of `.strtab`).

Readelf's interpretation of my kernel's symtab and strtab:
```
Symbol table '.symtab' contains 67213 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
     1: ffffffff8037c05a     0 NOTYPE  LOCAL  DEFAULT    9 l1
     2: ffffffff8037c080     0 NOTYPE  LOCAL  DEFAULT    9 l2
     3: ffffffff8037c570    10 FUNC    LOCAL  DEFAULT    9 camstatusentrycomp
     4: ffffffff81808000   112 OBJECT  LOCAL  DEFAULT   48 sysctl___kern_features_scbus
...
```

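
To make the relationship between the two sections concrete, here is a rough sketch of how one symbol table entry is resolved to a name. It assumes the standard little-endian `Elf64_Sym` layout, which is 24 bytes per entry. That matches the `ES` column (`18` hex) in the section table, and 67213 entries of 24 bytes is exactly the `.symtab` size of `0x189d38`. `symbol_at` is a name invented for this example.

```python
import struct

# Elf64_Sym: st_name, st_info, st_other, st_shndx, st_value, st_size
SYM = struct.Struct("<IBBHQQ")


def symbol_at(symtab: bytes, strtab: bytes, index: int):
    """Return (name, value, size) for entry number index of the symbol table."""
    st_name, _info, _other, _shndx, value, size = SYM.unpack_from(symtab, index * SYM.size)
    # st_name is a byte offset into strtab; the name runs up to the next NUL
    end = strtab.index(b"\x00", st_name)
    return strtab[st_name:end].decode(), value, size
```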
With the "why" out of the way, the "how" is relatively simple. The loader:
* Searches the section header table for an entry with type `SYMTAB`
* Copies the length of the symtab section immediately after the kernel
* Copies the symtab section immediately after its length
* Copies the length of the linked strtab section after the symtab
* Copies the strtab section immediately after its length

This leaves the following structure immediately after the kernel (lower addresses on the bottom):
<table style="max-width: fit-content;">
<tr><td>strtab contents</td></tr>
<tr><td>strtab length</td></tr>
<tr><td>symtab contents</td></tr>
<tr><td>symtab length</td></tr>
</table>

The loader then remembers the start and end address of this structure for later.

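
The steps above can be sketched as follows. This is a guess at the shape, not the loader's real code: I'm assuming each length is stored as an 8-byte little-endian integer and that nothing is padded between the pieces, neither of which is stated above. `symbol_blob` is an invented name, and `shoff` and `shnum` stand for the ELF header's `e_shoff` and `e_shnum`.

```python
import struct

SHT_SYMTAB = 2  # section type that readelf prints as SYMTAB
# Elf64_Shdr: sh_name, sh_type, sh_flags, sh_addr, sh_offset, sh_size,
# sh_link, sh_info, sh_addralign, sh_entsize. 64 bytes in total.
SHDR = struct.Struct("<IIQQQQIIQQ")


def symbol_blob(kernel: bytes, shoff: int, shnum: int) -> bytes:
    """Build length + symtab + length + strtab, lowest address first."""
    shdrs = [SHDR.unpack_from(kernel, shoff + i * SHDR.size) for i in range(shnum)]
    symtab = next(s for s in shdrs if s[1] == SHT_SYMTAB)  # match on sh_type
    strtab = shdrs[symtab[6]]  # sh_link (readelf's Lk) is the strtab's index
    out = b""
    for section in (symtab, strtab):
        offset, size = section[4], section[5]  # sh_offset, sh_size
        # Assumed length format: 8 bytes, little-endian, no padding afterwards
        out += struct.pack("<Q", size) + kernel[offset:offset + size]
    return out
```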
TODO
### Environment
TODO
### Modinfo
TODO
### Booting
TODO

#### Footnotes
[^demon]: Wait, this is BSD, it's named "beastie" and I want to load it, not banish it
[^user-visible]: Things that a knowledgeable system administrator might know about, like kernel environment, module loading, and memdisks
[^script]: I think I did this, but it's also possible that I used the [linker script](https://cgit.freebsd.org/src/tree/sys/conf/ldscript.amd64?h=release/14.1.0#n3)
[^x86]: I think the 2MiB alignment limitation is x86-specific because of the [horrible code](https://cgit.freebsd.org/src/tree/sys/amd64/amd64/machdep.c?h=release/14.1.0#n1273) that causes it, but I haven't actually tried any other architectures
[^buffer]: Historically on x86 this would start at 2MiB (physical address `0x20_0000`) but this isn't possible on modern systems where part could be reserved by EFI.
[^interp]: In fact it's so similar that users could accidentally run it as a program and get confused. To stop this, the kernel's interpreter is set to `/red/herring`.
[^elf]: or is it "elves"?
[^phdr]: Confusingly, each entry is called a "Program Header" and is in the "Program Header Table"
[^symtab]: Technically `.symtab` and `.strtab` could be copied as part of a `LOAD` segment and the kernel will know where to look, but I haven't seen that happen.