The Most Dangerous Memory

Recently I embarked on a fun little debugging journey.

Embedded work, of course. A working debugger? Never heard of one. (In theory I could use the GDB stub, but setting a breakpoint was liable to crash the whole system about 75% of the time, and otherwise stepping a line took upwards of 30 seconds.)

My story begins with a change to the build environment. Goodbye -O2; hello -O0 -g! The goal’s to use what little remote debugging I have to fix a silly ol’ bug. Easy fix. But now there’s this weird hang just before my program’s supposed to quit.

Weird. The code sends a message and waits for a reply, which never comes. I look at the thread on the other side. It’s waiting to receive the message. So… the message passing code is broken now? But I haven’t touched it, and it’s been working for just about ever.

I look a little closer. One thing I can actually reliably do with my debugger is pause, and read out some structures wherever the roulette wheel lands (useful when things hang). Turns out my channel_id is now 0. What? When I started executing, it was a nice, friendly 149. But something’s reverted it to zero. Is deinitialization happening too early? Nope. Maybe memory corruption?

And so, like a seasoned engineer, I sprinkle assert(foo->channel_id) about the codebase. A beautifully round 100 asserts did the trick. They spanned about 7 or 8 abstraction layers, and the deepest pair of asserts told me that channel_id becomes zero somewhere in the middle of some peripheral driver code. The culprit function was a long, long sequence of updates to memory-mapped I/O registers. Initialization code, I suppose (this was way underneath any abstraction I’d yet dealt with). All in all, that autogenerated peripheral-init function was probably about 5000 lines long, with exactly one local variable. Pretty cute code.
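The assert bisection is crude but effective. Below is a slightly gentler variant. It is a sketch that assumes a hypothetical `struct channel` and a hypothetical `CHECK_CHANNEL` macro. Instead of aborting, it records the last check that passed and the first check that failed. That way a single run reports the window in which the value got clobbered.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the structure whose field got clobbered. */
struct channel {
    uint32_t channel_id;
};

/* Most recent check that saw a nonzero id, and the first that saw zero. */
static const char *last_good_file = NULL;
static int last_good_line = 0;
static const char *first_bad_file = NULL;
static int first_bad_line = 0;

/* Like assert(ch->channel_id), but records instead of aborting, so one
   run reports the window (in execution order) of the clobber. */
#define CHECK_CHANNEL(ch)                               \
    do {                                                \
        if (first_bad_file != NULL) {                   \
            break; /* window already found */           \
        }                                               \
        if ((ch)->channel_id != 0) {                    \
            last_good_file = __FILE__;                  \
            last_good_line = __LINE__;                  \
        } else {                                        \
            first_bad_file = __FILE__;                  \
            first_bad_line = __LINE__;                  \
        }                                               \
    } while (0)

/* Print the window the clobber must lie in. */
static void report_clobber_window(void) {
    if (first_bad_file == NULL) {
        printf("no clobber observed\n");
    } else if (last_good_file == NULL) {
        printf("already zero at %s:%d\n", first_bad_file, first_bad_line);
    } else {
        printf("clobbered between %s:%d and %s:%d\n",
               last_good_file, last_good_line,
               first_bad_file, first_bad_line);
    }
}
```

The checks only tell you when they ran. The culprit is whatever executed between the two reported locations, which may span several functions or files.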

So somewhere in the middle of this function, which has absolutely nothing to do with my clobbered channel_id variable on the heap, there must be a sketchy memory write or two. I put my thinking cap on.

Of course. It’s got to be a timing bug related to some interrupt, right? But across some 10 runs of my code, the exact same memory was clobbered at exactly the same point, between the same pair of asserts every time. No timing bug I’ve ever seen has been so consistent. So maybe it wasn’t timing after all.

Aha! One of the register writes must set up a faulty DMA write to my memory. So I pull out the massive spec sheet, flip through to the relevant component. Hmmmm. No DMA writes. No DMA reads. Nothing about this peripheral touches memory.

Foiled again, I can’t think of anything to do but pull up the assembly output for that function. Again, since it’s 5000 lines of register writes I doubt I’ll be able to glean anything very interesting from the assembly output. But I look.

Wait. What?

You’re telling me this function, which uses exactly one (1) local variable, a uintptr_t to be exact, reserves 10 KiB (!!!!!) of stack space? With the spark of an idea, I go back and print out the exact addresses of my clobbered data (in the heap) and of my stack pointer in the reg-write function.

Isn’t that something. The stack is overflowing into the heap, isn’t it?
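The address comparison boils down to one check. This sketch assumes a downward-growing stack that sits above an upward-growing heap, which is a common bare-metal layout but is not taken from this project's linker script. The helper `frame_overlaps_heap` is hypothetical. It asks whether a new frame's lowest byte lands below the end of the heap once the prologue subtracts the frame size from the stack pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: the heap grows up from low addresses and the stack
   grows down toward it; heap_end is one past the highest heap byte in
   use. A new frame occupies [sp - frame_size, sp), so it collides with
   the heap when its lowest byte falls below heap_end. */
static bool frame_overlaps_heap(uintptr_t sp, uintptr_t frame_size,
                                uintptr_t heap_end) {
    if (frame_size > sp) {
        return true; /* frame would wrap below address zero */
    }
    return sp - frame_size < heap_end;
}
```

Here is an example with made-up numbers. A stack pointer of 0x20008000 and a heap ending at 0x20006000 leave 8 KiB of headroom. So `frame_overlaps_heap(0x20008000, 10 * 1024, 0x20006000)` is true, while a 4 KiB frame still fits.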

Come on, `clang`!

I was very surprised to learn that a sequence of function calls, even without any local variables around, might occupy stack space proportional to the number of calls. This is with clang on ARM, with no optimizations. You can also see this behavior with clang on x86_64. GCC doesn’t have this problem, probably because of some GPL fairy dust or something.

Instead, I expected that the arguments of each call (there were only two args) would be computed in reused registers. After all, ARM has no shortage of general-purpose registers, and the ABI specifies that the first 4 arguments of a function are passed in registers r0–r3.

I also expected that if any stack space was used to call a function, clang would reuse the same words in the stack such that the reserved stack space wouldn’t grow with the number of function calls.
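A stripped-down stand-in for the generated function makes the pattern easy to poke at. This is a sketch, not the vendor's code. The names `reg_write`, `periph_init`, and `fake_regs` are all hypothetical, and the fake register array exists only so the code runs on a host. To check your own toolchain, build it with `clang -O0 -S` for your target and note how much the prologue subtracts from the stack pointer. Then add more calls and check again to see whether frame size tracks call count.

```c
#include <assert.h>
#include <stdint.h>

/* Fake register file so the sketch runs on a host. On real hardware
   this would be the peripheral's memory-mapped base address. */
static volatile uint32_t fake_regs[64];

/* Hypothetical two-argument register write, standing in for the helper
   the generated init function called thousands of times. */
static void reg_write(uintptr_t addr, uint32_t value) {
    *(volatile uint32_t *)addr = value;
}

/* Same shape as the generated code: one local, many calls with
   constant arguments. Add calls and compare the -O0 frame size. */
static void periph_init(void) {
    uintptr_t base = (uintptr_t)fake_regs;

    reg_write(base + 0x00, 0x00000001u);
    reg_write(base + 0x04, 0x0000FF00u);
    reg_write(base + 0x08, 0x00000010u);
    reg_write(base + 0x0C, 0x80000000u);
    /* ...thousands more in the real thing... */
}
```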

Moral of the story?

The edge of the stack is a spooky place to be. This time it was C code, and in C you generally take your life into your own hands. But this could just as easily have happened in a language like Rust, which doesn’t do anything fancy to prevent excess stack usage and in fact heavily encourages you to rely on the stack as your primary store of data. So:

Audit your autogenerated code, and better yet, audit your stack usage with compiler flags like -fstack-usage. But that's a GCC option, not something you can easily use with Clang. And even when you can use GCC, such tools can't help you with dynamically determined stack usage, like recursion or variable-length arrays. If you're using those features without reliable stack protection (guard pages, etc.), you're pretty much accepting responsibility for your program exploding in unexpected ways.
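For the dynamic cases, one common low-tech measurement is stack painting. It is a swapped-in technique, not part of the story above. You fill the stack region with a known pattern at boot, run the workload, and then scan from the far end for the first byte that no longer holds the pattern. The sketch below simulates the stack with a plain array so it runs on a host. `STACK_PAINT`, `stack_region`, and the function names are hypothetical. On target you would use the real stack bounds from your linker script.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define STACK_PAINT 0xA5u  /* arbitrary fill pattern */
#define STACK_SIZE  1024u  /* simulated stack size in bytes */

/* Simulated stack. Index 0 is the lowest address; a downward-growing
   stack starts using memory at the top (index STACK_SIZE - 1). */
static uint8_t stack_region[STACK_SIZE];

/* Fill the whole region with the pattern (done once, at boot). */
static void stack_paint(void) {
    memset(stack_region, STACK_PAINT, sizeof stack_region);
}

/* Scan upward from the lowest address for the first overwritten byte.
   Everything from there to the top was touched at some point, so that
   span is the high-water mark in bytes. */
static size_t stack_high_water(void) {
    size_t i = 0;
    while (i < STACK_SIZE && stack_region[i] == STACK_PAINT) {
        i++;
    }
    return STACK_SIZE - i;
}
```

One caveat: if the deepest byte written happens to equal the pattern, the mark reads slightly low. Leave some margin when sizing the stack from it.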

I suppose if there's any single takeaway from my sad story, it would be a three-word mantra:

Stack considered harmful!

So just use Java, where everything's on the heap? D:
