Everyone Needs Secure Usability

There’s an interesting article in LWN about C11 atomic variables and the kernel that struck me for a few reasons:

In Chrome Land, we’re always trying to improve people’s ability to understand and effectively use Chrome’s security indicators and controls, such as the Location Icon (the green padlock, the red padlock, et c.), the Origin Info Bubble (the bubble you get when you click on the Location Icon), the Security Panel in the Dev Tools, the permissions and Content Settings, the security exceptions in the JavaScript Console, and so on. (By “people”, you can see that I mean everyone: ‘end-users’, web developers, system and network administrators, et al. Those categories are fluid.)

Similarly, the LWN article describes how the Linux kernel developers and C11 standardizers are trying to improve C developers’ ability to understand and effectively use atomic variables so that we can all enjoy performant yet safe concurrent programs. If the mechanisms that enable concurrency (like locks and atomics) are confusing, we’re going to get buggy (and vulnerable) programs. But if they’re too slow or heavyweight, people won’t use them — and we’ll get buggy (and vulnerable) programs.

So the first thing to do is to find out what it is that the community using the product needs and can effectively use. So the C11 standards people, academic programming language researchers, and kernel developers are all working together to figure out the shape of the thing.

Here is a representative example of the kinds of problems that crop up:

Another area of concern is control dependencies: situations where atomic variables and control flow interact. Consider a simple bit of code:

x = atomic_load(&a, memory_order_relaxed);
if (x)
    atomic_store(&y, 42, memory_order_relaxed);

The setting of y has a control dependency on the value of x. But the C11 standard does not currently address control dependencies at all, meaning that the compiler or processor could play with the order of the two atomic operations, or even try to optimize the branch out altogether; see this explanation from GCC developer Torvald Riegel for details. Again, the results of this kind of optimization in the kernel context could be disastrous.

For cases like this, Paul suggested that some additional source-code markup and a new memory_order_control memory model could be used in the kernel to make the control dependency explicit:

x = atomic_load(&a, memory_order_control);
if (control_dependency(x))
    atomic_store(&b, 42, memory_order_relaxed);

But this approach is unlikely to be taken, given just how unhappy Linus was with the idea. From his point of view, the control dependency should be obvious — the code is testing the value of x, after all. Any compiler that would move the atomic_store() operation in an externally visible way, he said, is simply broken.

Just as with ‘end-user’ interfaces, these low-level APIs must adapt to the needs and expectations of the people who use them. Despite the tendency to consider ‘end-users’ as ‘ignorant’ or ‘not computer-literate’, and the tendency to consider kernel hackers as ‘elite’, these different group have concerns about and problems with the usability of their every-day tools that are more similar than people sometimes imagine.

Another thing that struck me is that magic simply may not be possible, and our best recourse might just be to live with the limitations of simplicity. Linus’ way of saying this is arresting:

Yet another concern is global optimization. Compiler developers are increasingly trying to optimize programs at the level of entire source files, or even larger groups of files. This kind of optimization can work well as long as the compiler truly understands how variables are used. But the compiler is not required to understand the real hardware that the program is running on; it is, instead, required to prove its decisions against a virtual machine defined by the standard. If the real computer behaves in ways that differ from the virtual machine, things can go wrong.

Consider this example raised by Linus: the compiler might look at how the kernel accesses page table entries and notice that no code ever sets the “page dirty” bit. It might then conclude that any tests against that bit could simply be optimized out. But that bit can change; it’s just that the hardware makes the change, not the kernel code. So any optimizations made based on the notion that the compiler can “prove” that bit will never be set will lead to bad things. Linus concluded: “Any optimization that tries to prove anything from more than local state is by definition broken, because it assumes that everything is described by the program.”

Indeed, the article describes the existing concurrency tools that the Linux kernel developers built for themselves, that the C11 mechanisms would attempt to replace. The kernel developers understand the tools they made, and they seem to work, and maybe that’s sufficient.

I can’t help but be reminded of how a web browser cannot always ‘decide’ for the person that a certificate validation error is or is not important: it depends on context that is simply not available to the program interpreting its impoverished input. Whether it’s the compiler trying to interpret kernel source code or the browser trying to interpret an X.509 certificate chain, there’s often simply not enough information to produce an output that is both correct and optimal.

There is also a problem of unaligned incentives that contribute to the secure usability problem, just as we often see happen in the Web PKI and in the design, implementation, and use of the Open Web Platform: different groups of people have different interests and are under different pressures, and sometimes their goals come into conflict.

The problem is that compilers tend to be judged on the speed of the code they generate, so compiler developers have a strong incentive to optimize code to the greatest extent possible. Sometimes those optimizations can break code that is not written with an attentive eye toward the standard; the kernel developers’ perspective is that compiler developers will often rely on a legalistic reading of standards to justify “optimizations” that (from the kernel developer’s viewpoint) make no sense and break code needlessly. Highly concurrent code, as is found in the kernel, tends to be more susceptible to optimization-caused problems than just about anything else. So kernel developers have learned to be careful.

This pressure — speed, speed, speed! — leads compiler developers to make the language opaque, and to break the conceptual integrity of the language in a way that can actually make it unsafe and unusable:

There are too many compiler optimisations for people to reason directly in terms of the set of all transformations that they do, so we need some more concise and comprehensible envelope identifying what is allowed, as an interface between compiler writers and users.

The C11 standard is meant to be that “envelope,” though, as Peter admitted, it is “not yet fully up to that task.” But if the remaining uncertainties and problems can be addressed, C11 atomics could become a common language with which developers can reason about concurrency and allowable optimizations. Developers might come to understand the issues better, and kernel code might become a bit more widely accessible to developers who understand the standard.

Cryptographer Dan Bernstein has been saying a similar thing: Do compilers actually need to produce high-performance code, in the general case? Where needed, can they? And if not, is the damage to the language’s conceptual integrity worth it? He contends (as I read him, at least) that the answer to these questions is No: Most code is not hot (run often and hence in need of optimization); and, true hot spots require optimizations that compilers cannot yet perform — only humans, locally optimizing, understand their application domain and their machine well enough to get a good result. (Bernstein is generally working on cryptographic algorithms that he optimizes to work absolutely as well as possible on each distinct chip — local optimizables that the general-purpose (‘global’, if you will) compiler optimizations don’t really help with.) From Bernstein’s presentation:

Mike Pall, LuaJIT author, 2011: “If you write an interpreter loop in assembler, you can do much better... There’s just no way you can reasonably expect even the most advanced C compilers to do this on your behalf.”

Thus it might not be that people need optimizing compilers; instead, they may benefit more from dumb-but-conceptually-integral compilers, and more usable tools for writing, profiling, and testing hand-written assembly code.

I think we can learn a few widely-applicable UX design lessons from all this compiler talk: