-
Work on PC, Memory leaks, headdesks, and Valgrind
08/24/2014 at 20:47 • 0 comments
Time for another long, rambly post about dhcpext! Yay! Since last time, the first thing I did was start PC: specifically, have RX offload all of its work to PC, over a stream created by CT and handed to both threads. RX is now simply a network endpoint, forwarding traffic on to PC (although initially, in case I needed to roll back in a hurry, I left all the old code in rx.c, commented out). This has the unfortunate, though temporary, side effect that, until RP is written, PC will contain some network code.
This happened without incident, so I moved on to having TX also pass every heartbeat it sends over the network to PC. This exposed a nasty bug in the stream API: if messages are sent too quickly, they have a tendency to overwrite each other, resulting in a memory leak and lost information. In this instance the impact was small (the heartbeat wouldn't be printed to stdout), but it could easily have become more serious if left unchecked, so a rewrite of the stream.c internals ensued (the API stayed largely the same), using linked lists so an arbitrary number of messages can be stored.
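In sketch form, the idea behind the rewrite is something like this (illustrative names, not the actual stream.c internals; the mutex that guards the list in a real inter-thread stream is omitted for brevity):

#include <stdlib.h>
#include <string.h>

/* Each message gets its own heap-allocated node appended to a singly
 * linked list, so a fast sender can never overwrite a message that the
 * receiver hasn't consumed yet. */
struct msg_node {
    void *data;
    size_t len;
    struct msg_node *next;
};

struct stream {
    struct msg_node *head; /* oldest unread message */
    struct msg_node *tail; /* newest message */
};

/* Append a copy of the message; O(1) thanks to the tail pointer. */
int stream_push(struct stream *s, const void *data, size_t len)
{
    struct msg_node *n = malloc(sizeof *n);
    if (!n)
        return -1;
    n->data = malloc(len);
    if (!n->data) {
        free(n);
        return -1;
    }
    memcpy(n->data, data, len);
    n->len = len;
    n->next = NULL;
    if (s->tail)
        s->tail->next = n;
    else
        s->head = n;
    s->tail = n;
    return 0;
}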
The heartbeats themselves are stored in a linked list, which will be used later on to correlate heartbeats with the replies received, and to track timeouts. This is important, and caused me a lot of pain, as I'll discuss in just a second.
The next thing I did was move from usleep() to gettimeofday() in TX (rather than sleeping, compare the current time to when the heartbeats were last sent). The theory was that this would be more accurate; instead, it exposed a massive memory leak somewhere in my code, which caused dhcpext to lock up entirely, followed by the kernel OOM killer picking off every process barring init. Cue much head scratching.
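The pacing change itself looked roughly like this (a sketch, with an illustrative interval; note that, as written, the loop spins rather than sleeps):

#include <sys/time.h>

#define HEARTBEAT_INTERVAL_US 100000L /* illustrative: 100 ms */

static long usec_since(const struct timeval *then, const struct timeval *now)
{
    return (now->tv_sec - then->tv_sec) * 1000000L
         + (now->tv_usec - then->tv_usec);
}

void tx_loop(void)
{
    struct timeval last, now;
    gettimeofday(&last, NULL);
    for (;;) {
        gettimeofday(&now, NULL);
        if (usec_since(&last, &now) >= HEARTBEAT_INTERVAL_US) {
            /* send_heartbeat(); */
            last = now;
        }
        /* other TX work can happen here, instead of being stuck in usleep() */
    }
}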
Needless to say, I quickly rolled back this change. The problem persisted, though, which is how I discovered that during one of my tests I had neglected to scp the new build to the router. D'oh.
With that done, I set about finding the memory leak. Initially, this consisted of my "standard" debugging technique of littering printfs around the place, looking into various important variables. After a few days of getting nowhere with this technique, it was clear I needed something better. After some research, I found mtrace, which reports various issues with malloc() and free(), among other things.
After getting a trace on dhcpext using mtrace (it had to be compiled for my desktop for this to work), I had some idea of where the errors were occurring (although not an entirely accurate one, due to mtrace's limited reporting). After looking at the places it pointed me to, and at their callers and callees, some errors were fixed, but the majority remained. I needed still more information.
After more digging, and much time spent on Stack Overflow reading up on people with similar issues, I found Valgrind. This is, to give the understatement of the year, an absolute godsend. There were several small-fry issues with the source that Valgrind pointed out to me, most of which added up to another major rewrite of stream - this time with a new API to follow. Now, stream_rcv(_nblock) handles the allocation of memory, rather than the caller doing it via stream_wait_full or stream_size. The caller still frees the data, however, as it could contain pointers that stream isn't aware of, if a data structure is sent.
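In outline, the new contract looks like this (my paraphrase of the signatures, not the exact ones):

#include <stdlib.h>

struct stream; /* opaque handle from the stream library */

void *stream_rcv(struct stream *s, size_t *len);        /* blocks until a message arrives */
void *stream_rcv_nblock(struct stream *s, size_t *len); /* returns NULL if nothing is queued */

void consume_one(struct stream *s)
{
    size_t len;
    void *msg = stream_rcv(s, &len); /* buffer allocated by the stream... */
    if (msg) {
        /* ...process the message... */
        free(msg);                   /* ...but freed by the caller */
    }
}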
This fixed most issues, but a few still remained. Firstly, I stopped sending void**s, and started sending plain void*s and freeing them. This fixed a bunch more errors, leaving me with about 8 at a time, all produced by 3 actual bugs. The first: the linked list of sent heartbeats in PC (it has a companion linked list for received heartbeats, which I'm not sure why I included and which needs deleting) was being re-initialized on every iteration of PC's main loop. Memory usage therefore grew rapidly, and the pointer to the previous list was lost each time.
Secondly, at no point were the linked list's contents actually freed. This should have been done when the thread was shut down. Later on, the list will be emptied in real time, as replies are received or as entries live past 400ms. But not yet.
Thirdly, a race condition: if PC shut down just before TX (async, remember), data could be sent through the stream between the two and never actually be received and freed. The fix here was a reordering of the shutdown instructions in main.c.
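The shape of the fix, sketched with hypothetical helper names (not the real main.c):

#include <pthread.h>

extern pthread_t tx_tid, pc_tid;
extern void request_shutdown(pthread_t tid); /* hypothetical helper */

void shutdown_all(void)
{
    request_shutdown(tx_tid);  /* stop the producer first: nothing new in flight */
    pthread_join(tx_tid, NULL);
    request_shutdown(pc_tid);  /* the consumer drains the stream, then exits */
    pthread_join(pc_tid, NULL);
}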
So, there you have it. I've learned a lot about debugging tools, and gained a newfound respect for them, and the code was cleaned up dramatically. Next, I want to finish off PC as best I can without UCI integration, merge into the master branch (which is happening regularly right now, as I clean up the code), try again with gettimeofday(), and clean the code up some more with other tools like Coverity, plus enabled compiler warnings (and no, not just -Wall). I think we could also do with a bug tracker at this point.
-
Starting the Implementation of the Protocol
08/17/2014 at 15:18 • 0 comments
Implementation of the protocol began recently (see proto.c in the Github repository, for reference). Of course, as with any project log, this had its ups and downs (and it currently isn't finished; part of the protocol is implemented, but what remains cannot be completed until later, when the code begins to talk to UCI). You may have noticed a lot of issues and problems cropping up in the project log, often several per post. That's not us being incompetent (I don't think), but rather us trying to be 100% transparent.
The last time I posted on here, the heartbeats and replies were literally the strings "heartbeat" and "reply". Obviously, that's not much help for sending any useful information over the network, so with communication established, it was time to start on the actual protocol.
2 new data structures were defined: one for the heartbeat, and one for the reply. They share the same base set of fields: an identification number (used to discern between the two when they're received over the network), flags, and a magic number. The optional fields that follow differ between the two; right now they aren't used (although they are present in the structures), so I won't go over them here.
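In rough shape, the common layout looks like this (field names are illustrative guesses; the real definitions live in proto.c):

#include <stdint.h>

struct heartbeat {
    uint32_t id;    /* identification number: tells heartbeat and reply apart */
    uint32_t flags;
    uint32_t magic;
    /* optional fields follow; present, but unused for now */
};

struct reply {
    uint32_t id;
    uint32_t flags;
    uint32_t magic;
    /* its own optional fields follow */
};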
4 routines were also added per data structure: one to generate the structure, one to serialize it for sending over the network, one to deserialize it back into a data structure, and one to output it to stdout, for debugging purposes. A 9th routine (remember, 4*2, because there are 2 data structures) was added to tell the two apart. These were already stubbed out in proto.c, but as I hadn't gone over them before, I've listed them here.
The first implementation of these did not work, and transmitted 0s for all fields. As did the second version, and the third as well (none of which were committed to Github, on the principle of not breaking the build tree). The iterations we went through were: copying bytes directly from structure to buffer (and back again) using some pointer magic; then memcpy; then the types declared in stdint.h, at which point it at last worked, just about. This happened in stages, hence the 3 versions above. The move to memcpy should be pretty obvious, but the move to stdint.h is a bit less so, and it's an issue I'd certainly never run into before. As far as I'd ever been concerned, an int in C was 32 bits, and I'd always assumed it to be so. However, and I should have known this before, different architectures define their basic types with different sizes (MIPS, which is what we're developing for, incidentally uses 64 bits, I believe, for ints). Hence, a 64 bit magic number field was being created, its lower 32 bits stuffed with a randomly generated value (from /dev/urandom), and (thanks to the way little-endian layout works) the empty upper 32 bits copied into the serialized bitstream and sent over the network. Using fixed-width types fixed this. (A 4th change followed: from int32_t to uint32_t. Fundamentally it makes no difference, as it works either way, but it makes debugging easier - no stray minus signs floating around the output.)
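For the record, the working approach looks something like this in sketch form (illustrative names, not the actual proto.c routines):

#include <stdint.h>
#include <string.h>

struct heartbeat { uint32_t id, flags, magic; };

/* Fixed-width stdint.h types, copied field by field with memcpy, so the
 * wire format no longer depends on what the host thinks an "int" is. */
size_t heartbeat_serialize(const struct heartbeat *hb, uint8_t *buf)
{
    size_t off = 0;
    memcpy(buf + off, &hb->id,    sizeof hb->id);    off += sizeof hb->id;
    memcpy(buf + off, &hb->flags, sizeof hb->flags); off += sizeof hb->flags;
    memcpy(buf + off, &hb->magic, sizeof hb->magic); off += sizeof hb->magic;
    return off; /* always 12 bytes, on every architecture */
}

/* Note: memcpy still preserves host byte order; wrapping each field in
 * htonl()/ntohl() would pin the wire format down completely. */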
That made the protocol as fully implemented as it's going to get without linking to UCI, which won't happen until much, much later. The next step is to build the 3rd thread, PC, to process the generated and received messages.
-
Multithreading, libpthread, and stream closing
08/02/2014 at 12:41 • 0 comments
Now that we're back in business, and working on the code again, I find myself lost for work; there's a lot to do, but no clear direction. After some much needed coffee, however, the direction becomes clear - split the project into 5 threads: tx, which manages the transmission of heartbeats; rx, which manages receiving everything; pc, which processes the incoming packets; and rp, which replies to any and all heartbeats it needs to. The fifth thread, ct, will orchestrate the others - shutting them down, setting options, and communicating with UCI.
Now that we have a clear architecture, the place to start is obvious - get a working tx thread, and a stub rx which also replies (for now...). That's easy enough, and takes only a couple of hours of work. I use my inter-thread stream library, developed for another project, and hook SIGINT to properly shut down the streams.
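In outline, the scaffolding looks something like this (a minimal sketch under my own naming, not the dhcpext source):

#include <pthread.h>
#include <signal.h>

static volatile sig_atomic_t shutting_down = 0;

static void on_sigint(int sig)
{
    (void)sig;
    shutting_down = 1; /* threads poll this, then close their streams and exit */
}

extern void *tx_main(void *arg); /* worker entry points, defined elsewhere */
extern void *rx_main(void *arg);

int main(void)
{
    pthread_t tx, rx;
    signal(SIGINT, on_sigint);
    pthread_create(&tx, NULL, tx_main, NULL);
    pthread_create(&rx, NULL, rx_main, NULL);
    pthread_join(tx, NULL);
    pthread_join(rx, NULL);
    return 0;
}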
I won't go into any more depth with regards to the code, as most of it was simple coding, flowing from my brain, through my fingertips, and into vi, in that much coveted state of 'flow'. We've all done it before. What is interesting, however, is the snag I encountered upon compiling the project - no libpthread in my SDK. After much Googling, I concluded that my SDK needed to be recompiled, with libpthread selected in menuconfig. I was surprised this wasn't the default behaviour, but oh well. Now that it compiles, I scp it over to the router, only to find libpthread missing yet again.
This one was harder to find the solution for; just as I began to think I'd need to build a custom image for the router (I'd already tried, to no avail, manually copying libpthread.so over), I stumbled upon this resource, and my prayers were answered. Downloading libpthread.ipk and scping it over, it finally installs, and DHCPExt is on the router. Repeat for nodes #2 and #3.
To test, I ran dhcpext 10.255.255.255 10.55.223.2 1 on node #1, and dhcpext 10.255.255.255 10.0.0.1 0 on node #2. The parameters are, in order: the address to broadcast heartbeats on, the address to transmit heartbeats directly to, and whether (1) or not (0) to broadcast the heartbeats. Success! Heartbeats are sent and received just fine, although there's no provision yet for checking whether they're being replied to. That's easily enough added, though, so I'll check it later.
Closing both pieces of software highlights an issue that, in hindsight, should have been obvious. Because recvfrom(2) blocks, there's only a window of a few microseconds - between receiving one heartbeat and blocking for the next - in which the close message from ct can be acted on; meaning, expert timing on Ctrl-C. This needs to be made nonblocking somehow, so the thread can be closed properly and cleanly, rather than having to run kill from a separate SSH session.
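One possible fix - a sketch, not necessarily what will end up in the code - is to give the socket a receive timeout, so recvfrom() wakes up periodically and the thread gets a chance to notice the close message from ct:

#include <sys/socket.h>
#include <sys/time.h>

int set_rcv_timeout(int fd, long ms)
{
    struct timeval tv;
    tv.tv_sec  = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;
    return setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof tv);
}

/* recvfrom() now returns -1 with errno set to EAGAIN/EWOULDBLOCK after
 * `ms` milliseconds of silence, and the receive loop can check for a
 * pending shutdown before calling it again. */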
-
Starting on the Code, Massive issue
07/31/2014 at 10:28 • 0 comments
As you can probably tell from the Github repo, work started on the code a long time ago. Initially, a broadcast heartbeat could be sent, and a reply received. This in itself carried a small problem, owing to my limited experience in socket programming - a special permission must be set on the socket before it will successfully broadcast traffic. With that out of the way, a simple heartbeat message (literally, the string "heartbeat") could be passed back and forth. At this stage, we're only one-shotting; there is no loop sending the packet repeatedly yet.
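For anyone hitting the same wall: the special permission in question is the SO_BROADCAST socket option, which must be set before sendto() will accept a broadcast address (it fails with EACCES otherwise):

#include <sys/socket.h>

int enable_broadcast(int fd)
{
    int yes = 1;
    return setsockopt(fd, SOL_SOCKET, SO_BROADCAST, &yes, sizeof yes);
}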
The next thing to try was with all 3 nodes (the last test used only 2). This exposed an issue that, in hindsight, should have been obvious - without extra processing, a heartbeat reply can only be received from one host. Something for the todo list. Through multiple SSH sessions, though, I can confirm that both nodes set to receive did indeed receive (and reply to) the heartbeat. There is massive potential for race conditions here, and care must be taken when we work on this area of the code.
The next thing I tried was reading in the DHCP lease file. This was an absolute PITA. It wasn't hard in any way, but owing to C's lackluster string processing capabilities, it took many more lines of code than it should have to parse a simple text database into an internal structure and output it as a (kind-of) neatly formatted table. An exercise in patience, more than anything.
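The parsing boils down to something like this - a sketch assuming the dnsmasq lease format OpenWRT uses (one lease per line: expiry time, MAC, IP, hostname, client-id; the struct and its field names are mine):

#include <stdio.h>

struct lease {
    long expiry;       /* lease expiry, as a Unix timestamp */
    char mac[18];
    char ip[16];
    char hostname[64];
};

int read_leases(const char *path, struct lease *out, int max)
{
    FILE *f = fopen(path, "r");
    char line[256];
    int n = 0;
    if (!f)
        return -1;
    while (n < max && fgets(line, sizeof line, f))
        if (sscanf(line, "%ld %17s %15s %63s", &out[n].expiry,
                   out[n].mac, out[n].ip, out[n].hostname) == 4)
            n++;
    fclose(f);
    return n;
}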
Now onto the issue we hit. After doing all of this (the DHCP client was enabled on wifi for the 2 nodes not acting as the active DHCP server), the next thing to do, or so I thought, was to bridge wlan0 and eth0 together; this would simplify things later down the line, as it removes the requirement for NAT, which would make routing much harder. However, it brought issues of its own - namely, DHCP seems to want to go over eth0 rather than wlan0 (or both, as I expected it to, this being broadcast traffic). After almost 2 weeks of head-scratching, I stumbled upon a resource informing me that, for this to work, the macaddr option must be set in /etc/config/network, to the MAC address of wlan0. I did this, and nothing changed. It's then I noticed that the MAC addresses of eth0 and wlan0 are the same, whether bridged or not. Damn.
There will be a solution, at least I hope, but for now I've unbridged them, and will work like that. It's not ideal, but it's better than not having an IP on 2 of the 3 nodes. One thing I haven't tried yet, and must at some point (as it's roughly how our software will work), is to use a static IP address, and then use the UCI tool (OpenWRT's network manager) to disable static IPs and enable DHCP. The reason this /may/ work is that UCI may enable the DHCP client before it brings wifi online. I could be very wrong about this, but it would be nice if it were so simple.
-
Protocol for Dynamic Address Allocation
07/18/2014 at 20:50 • 2 comments
Time to handle task 2 in the sequence of events that needs to happen to get address allocation working on the mesh - namely, drawing up a specification for a protocol. We had one suggestion, random address allocation (a node assigns its own address and checks whether it's free; if not, goto 1). We considered this, but decided against it: we want to work with both IPv4 and IPv6 (with IPv4 the priority right now due to its ubiquity, even though IPv6 would be more useful long term), and in IPv4's comparatively small address space too many collisions would occur, making this incredibly inefficient. Furthermore, it would only push the problem elsewhere - how to get the parameters needed for allocating an IP address, such as the address range, subnet mask, etc., to the node in the first place. Supposing we don't, then the node has free roam over the entire address space, meaning the network cannot be linked to other networks!
So, with that done, we set about coming up with a way of dynamically allocating IP addresses that would stay reliable and fault tolerant. We quickly converged on the idea of a heartbeat signal on top of vanilla DHCP - the server currently providing DHCP emits a heartbeat, regularly, and if it dies, somebody else takes over providing DHCP. But how to decide who gets this role? If we don't decide it, there's a race condition, and we could end up with 2 or more DHCP servers active at any given time, which can cause major issues on a network. We toyed with the idea of going in IP address order, but then decided it would be easier if there was a determined order - the order in which the nodes connected to the network (or rather, got their IP addresses).
With that done, there was a new issue - if a node somewhere in the middle of this list goes down, the other nodes need informing, and the list needs keeping up to date. We decided to put that information in the heartbeat itself. Another issue, solved in the same way, was synchronising the lease file that every DHCP server keeps - if we don't, a newly promoted server could allocate addresses the first one already had. So: put it in the heartbeat.
But how to detect when a node that isn't the active DHCP server goes down? Why, keep a circular buffer of nodes, of course. Each node sends a heartbeat signal to the next node along, and when a node goes down, the node after it detects this and informs the active DHCP server, meanwhile starting to receive heartbeats from the node before the dead one in the buffer. Simple, right?
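In sketch form, with the nodes held in joining order, the ring arithmetic is trivial (illustrative, not project code):

/* Node i receives heartbeats from prev_node(i); if that node falls
 * silent past the timeout, node i reports the death to the active DHCP
 * server and starts listening to the node one further back. */
int next_node(int i, int n) { return (i + 1) % n; }
int prev_node(int i, int n) { return (i - 1 + n) % n; }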
But what about when two networks merge? This was a bit of a deeper issue. It was obvious that one node needed to shut down its DHCP server so the other could take over. But how to decide which one, in a way that's reliable every time? We toyed with the idea of a coin toss (or its digital counterpart), but if both decide to shut down, or both decide the other should, then what? Then we considered that, in all likelihood, one server would receive the other's heartbeat before the reverse happened - the chance of both heartbeats reaching their destinations at the same instant can be considered pretty much 0. So, the first to receive a heartbeat from the other should shut down, right? Wrong. There's another optimization, thought of while writing the formal specification - the server with the smaller number of nodes should shut down, and if they have the same number of nodes, then do a random thing.
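Expressed as code, the rule is simply this (a sketch; coin_toss() is a hypothetical stand-in for whatever the spec's "random thing" ends up being):

extern int coin_toss(void); /* hypothetical tie-breaker */

/* On receiving a heartbeat from a foreign DHCP server: the server
 * presiding over fewer nodes shuts down. */
int should_shut_down(int my_nodes, int their_nodes)
{
    if (my_nodes != their_nodes)
        return my_nodes < their_nodes;
    return coin_toss(); /* equal counts: the spec's "random thing" */
}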
With the above in place, it's now possible to start a network up automatically - simply have a node bring its DHCP server online, start a heartbeat, and go into the loop above. If another DHCP server is already on the network, the new one will shut down, and that's the end of the matter. If not, there's now an active DHCP server.
There is a little more to it than that, and the specification in its current form can be found here.
-
Compiling for OpenWRT
07/17/2014 at 19:57 • 0 comments
Recently, we ran into a serious issue - namely, how exactly to go about allocating IP addresses to nodes in a way that satisfies all the properties this project needs: easy to set up, little-to-no maintenance, fault tolerant, and generally out of the way. This immediately ruled out static IP addresses, which are difficult to orchestrate and demand extra knowledge from the user. It also, however, ruled out DHCP, which introduces a single point of failure - namely, the DHCP server itself.
Given this, we've got a tradeoff to make, right? Do we sacrifice ease of setup, or fault tolerance? Honestly, I'd sooner keep both, given the chance. So that's what we've set out to do - build a system that dynamically allocates addressing schemes, is fault tolerant, and is fully automatic - that is to say, requires no maintenance, setup, or anything else end users really do NOT want to be doing.
This leaves us with 3 tasks - get compilation for OpenWRT working, so we can write software for it; write a protocol capable of performing this role; and write the software for OpenWRT that implements the protocol. This post deals with the first of the three tasks, the end result of which is a "Hello World" implementation running on the router.
Fairly obviously, I'll need a cross development system. Already I'm apprehensive, after what happened *last* time (and believe me when I say, there was a lot more to that story than I posted about; it was heavily condensed). Turns out, though, that the fine people who develop OpenWRT release an SDK - a toolchain and packaging system rolled into one that requires hardly any work to set up and use. Or so the theory goes, in any case.
They do have the SDK released as a bzipped tarball on their site, under downloads, for exactly the processor I'm compiling for. Surely it can't be as easy as that? Nope, 'fraid not. Turns out they only release their toolchains and SDKs for x86_64. I still use 32-bit, as I can't be bothered to go through the rigmarole of reinstalling an OS just to double the number of bits I get (and given that I only have 2GB of RAM anyway, what's the point?)
So, in light of this, I need to source the SDK from elsewhere - which means compiling the Buildroot system. Thought I'd gotten away with that, too. So I download it, update the feeds, and enter the menuconfig. I check the box saying "Build SDK" (or words to that effect), and run make. It builds, and I cannot find the SDK. Anywhere.
I go back into the menuconfig, check "Build Toolchain" as well, and remake. Then I find the SDK - in the one place I didn't look. D'oh. I get this installed on my system, and write a simple "Hello world" application, with its associated makefiles (you need 2 - one for the environment setup, and one to actually build the thing). Running the SDK over this yielded me a package, which itself took all my time to find.
I scp this onto the router, fairly painlessly, and use ipkg to install it, only to find there is no ipkg on my router. This time, the fix is simple: I was using outdated instructions, and ipkg has since been renamed to opkg. Using this, I install my newly minted package, successfully this time. Now for the real test.
root@OpenWrt:~# helloworld
Hello, world!
root@OpenWrt:~#
Success! Never thought I'd be quite so happy to get hello world running.
-
On Address Allocation
07/15/2014 at 14:46 • 2 comments
One of the large problems with mesh nets, one that has plagued experts for many years, is the allocation and management of IP addresses. A short summary of the issue follows.
For the allocation of IP addresses in a functioning network, 2 options exist:
- Static IP allocation through the network administrator
- Dynamic IP allocation using DHCP or DHCPv6
In a mesh network, both of the above solutions carry serious flaws. Static IPs are insufficient for our purposes because they raise the barrier to entry; detailed instructions could work around this, but users are typically not willing to go to those lengths. Static allocation also loses significant flexibility, in that setting up a new node requires knowledge of every existing node on the network - meaning the set of nodes cannot change second by second, as it would in, for example, a municipal network consisting partly of roaming mobile telephones.
The other option, running a DHCP server, introduces a single point of failure: should the network become segregated in two for any reason, or the node hosting the DHCP server fail, the network is left in a state wherein no new node can request an IP address - meaning that, for all intents and purposes, the network is no longer functional. There is currently no way of synchronising 2 DHCP servers for automatic failover, barring the current data-center practice of giving one server 70% of the addresses in the pool and a backup server the remaining 30%.
Since, in a mesh environment, there is no way of knowing which nodes will be accessible from where at a given time - nodes can fail, and the network can segregate - an alternative, or extension, to DHCP must be available to allow the splitting, merging, and reallocation of active DHCP servers, address pools, and node addresses.
-
Working Mesh
07/14/2014 at 20:18 • 1 comment
After the mishaps of a few weeks ago trying to compile a full Linux install from scratch for ARM, OpenWRT was adopted instead, at the suggestion of a reader here on Hackaday Projects (mschafer). This may change later, but for now it's the OS we are going with.
With that said, as mentioned in our previous post, we're also using existing hardware, again for the time being. As a result, we decided to go with a TP-Link TL-WR841N, for no more reason than that it was the cheapest OpenWRT-capable router available anywhere convenient to me. So I bought one, brought it home, double- and triple-checked it really did run OpenWRT before opening it (thank you, student budget), then went ahead, opened it, and wired it to my dev box over ethernet.
Issues with having DHCP from my home router and DHCP from the TP-Link giving me 2 default routes notwithstanding, the flash itself was a relatively painless process: download the firmware image, run a firmware upgrade with that image as the patch, sit back with a coffee for a few minutes, and watch as the default route issue comes back - on 192.168.1.0/24 this time, rather than 192.168.0.0/24. Still, an easy fix.
A couple of moments later, and I'm telnetting into my shiny new router - for which the warranty lasted barely half an hour - and setting a password. As per the instructions, I log out to ssh back in, only to find that ssh did not like that, due to key conflicts. That's right - I forgot 192.168.1.1 was where I had my server a while ago. Delete the key and try again.
Back in the router, time to get wireless up and running. Following the instructions on the wiki, this again doesn't take too long. Looks like it's time to up the ante - buy 2 more, for a fully functioning mesh net.
Doing so, flashing as before, plus a lot of swearing over having to delete ssh keys every 2 minutes, and I have a basic setup in which all the routers have a password and OpenWRT. They still need wireless enabling, though, so let's do that. This time, I feel confident that I know what I'm doing without consulting the OpenWRT wiki. I vi into /etc/config/wireless and delete the line disabling wireless, only to find that that line is needed. Quick fix: use the UCI tool to re-enable it, and bob's your uncle - 3 routers with wifi enabled. Now what?
Turns out, the OpenWRT wiki is very, very comprehensive, and the UCI tool very flexible. Setting up the mesh wifi was again a matter of following the wiki instructions: editing /etc/config/wireless, reloading the file with UCI, throwing an IP onto the interface, and pinging to my heart's content. Repeat for all 3, and a very simple 3-node mesh network now exists on a set of shelves to my right. Now, to make all this happen in about 3 seconds, with no human intervention.
-
Altering Scope and Aim (Slightly)
07/10/2014 at 20:33 • 0 comments
As was pointed out to us in the comments, it would lower the barrier to entry if we were to provide the firmware for the project in a form compatible with existing routers. This would allow people to use their own hardware, rather than having to spend money on new devices (and, as we all know, the main people who will be setting mesh nets up have enough boxen lying around as is...)
We thought this was a fantastic idea, but it comes with a caveat - not everyone will want to flash their own firmware, and as a result, we feel we should provide a device of our own as well. This is, however, a secondary goal - the firmware itself will take focus, at least in the beginning.
This also comes with a change in the method used to connect to the host - initially, we were going to use USB. Now, to utilize this existing hardware, we must use Ethernet. This comes with its own challenges, not least of which is addressing - do we forward packets straight to the network (as we would have done with USB), or do we provide some sort of NAT and DHCP? Or something else entirely?
This exposes another issue, one that existed already anyway: the allocation of addresses on the mesh itself. Each device - which I'm now going to refer to as an access point, for convenience - will need its own IP address for routing purposes. These addresses will need to be allocated in some way, and if the net is to stay truly failure resistant, a single AP hosting a DHCP server will not work, for reasons that should be obvious. For now, though, the goal is to get a device booting OpenWRT, so this isn't yet a concern. All things in due time.
-
Embedded Linux
06/19/2014 at 15:30 • 0 comments
Now that we've decided on an embedded ARM computer running Linux, we need to decide which ARM chip to use, and build a system image for it. For this, we need to know how large the load is and, for a given chip, whether Linux has been ported. As previously stated, the maximum load will be around the 2400 kframe/s mark - about 2.4 million routing operations per second. Of course, the chances of running the network flat-out like that are low, especially in our tests, so it won't matter too much if we can't hit the full 2400. This is only a prototype, and we live by the philosophy of getting things working first, then making them work well.
Using all of this, we converged on the ARM926 processor, although we have not yet decided on the specific chip to use. From there, we can generate an image and test it using QEMU. This involves a number of steps, as I shall outline below.
The first thing needed is a compiled kernel, and for that, a full GNU toolchain (minus glibc). Therefore, we first download GCC 4.9.0 and Binutils 2.24, the latest versions at the time of writing, and compile them for ARM. After much web scouring, I found a set of what seemed like comprehensive tutorials. They were slightly out of date, but that shouldn't matter too much. I started with Binutils, as instructed: first making a usr directory in the project folder to host all of this, then switching to the binutils directory and running:
# mkdir binutils-build
# cd binutils-build
# export TARGET=arm-linux
# export PREFIX=ProjDir/usr
# ../binutils-2.24/configure --target=$TARGET --prefix=$PREFIX --disable-nls --disable-werror
# make
# make install
All worked fine, and I had myself a brand new local Binutils installation. Great! Now for GCC. I did something similar (changing the configure options slightly), only to receive errors: apparently, this variant of the ARM architecture isn't supported any longer. Back to the drawing board. After much googling, I found that the TARGET tuple was wrong - outdated, in fact. Damn. So much for the age of the doc I was using not mattering. Changing $TARGET to arm-none-eabi, I recompiled Binutils, then GCC. It worked fine this time, thankfully, but little did I know, there was more to come.
Next, I downloaded the Linux 3.15.1 sources. After adding my new binaries to the $PATH, I configured the kernel, setting CROSS_COMPILE to arm-none-eabi- and telling it to build as embedded. Upon making it, I received several complaints from my newly minted, fresh-out-of-the-oven compiler that several flags were not recognized. One of these had something to do with MMX, which indicated that the kernel was trying to build as x86. More Googling, and I exported ARCH=arm. A few more configuration changes, and it works again. We have a kernel image.
Now, upon trying to boot this image, nothing would happen. I reconfigured QEMU, trying every permutation of options I could think of, and got nothing. Looking back at my kernel configuration, I had the platform set to ARMv6, whereas my chosen processor architecture was ARMv5TE. A few changed variables and we were off again. Sort of - it got further than last time, but still wouldn't boot. It was around then that I noticed the option for ARM Versatile (which was the machine I was using to test in QEMU). Setting this, it finally booted, giving me a nice error about how it wasn't able to find a root filesystem, or an init process to start. Success!