This post is an attempt to pin down my intuition that an “interface”, broadly defined, can be a productive conceptual frame for a wide variety of security problems and solutions. I can’t promise that this post makes total sense; it’s just thinking out loud at this point.
There are many ways to understand software security engineering. One (all-too-)prevalent view is of security as a cat-and-mouse game: by hook or by crook, any little thing you can do to attack or avoid being attacked counts as “security engineering”. Especially for defenders, this view leads directly to failure. It’s analogous to micro-optimizing a fragment of code (a) before profiling it to see if it’s really a hot spot; (b) without testing to see if the micro-optimizations help or hurt; and (c) without any quantified performance target.
For example, consider a web application firewall (WAF). People often buy these to “secure” their web applications, saying things like, “Hey, even if the web application is well-engineered, belt and suspenders, right?! Belt and suspenders!” But ask: How much does the WAF cost to buy? How much does it cost to install, configure, and run? Who looks at its logs and reports, and how much does that person’s time cost? (Don’t forget the opportunity cost.)
How does the WAF affect the application’s performance and reliability? Possibly not well.
How much attack surface does the WAF itself create and expose? Often, a WAF can create significant new risk. I once found an XSS vulnerability in a web application, and ran a demonstration exploit so I could document that it worked. No big surprise there. After a while, a guy came up to me and said he was he WAF operator for that app, and did these weird pop-ups he kept seeing have anything to do with my security testing? I didn’t even know the app was (supposedly) being protected by a WAF, but I had accidentally exploited both the app and the WAF in one shot.
A correct WAF configuration is equivalent to fixing the bug in the original application. Why not just do that?
I want to forget all about both belts and suspenders; instead, I want to buy pants that actually fit.
I prefer to think of security as a class of interface guarantee. In particular, security guarantees are a kind of correctness guarantee. At every interface of every kind — user interface, programming language syntax and semantics, in-process APIs, kernel APIs, RPC and network protocols, ceremonies — explicit and implicit design guarantees (promises, contracts) are in place, and determine the degree of “security” (however defined) the system can possibly achieve.
Design guarantees might or might not actually hold in the implementation — software tends to have bugs, after all. Callers and callees can sometimes (but not always) defend themselves against untrustworthy callees and callers (respectively) in various ways that depend on the circumstances and on the nature of caller and callee. In this sense an interface is an attack surface — but properly constructed, it can also be a defense surface.
Here are some example security guarantees in hypothetical and real interfaces:
bool isValidEmailAddress(String address, Set
knownTLDs) returns true if the email address is syntactically valid for
SMTP addresses according to RFC 3696, and if the domain part is in a known
All array accesses are checked at run time; an attempt to use an index
that is less than zero or greater than or equal to the length of the array
ArrayIndexOutOfBoundsException to be thrown. (From the Java
DNS queries and responses can be read, copied, deleted, altered, and forged by an attacker on any network segment between client and server.
Within a single goroutine, the happens-before order is the order expressed by the program. (From the Go language documentation.)
The true technical security guarantee that an interface’s implementation provides is not necessarily the same as the guarantee the caller perceives. I’ll call this the interface perception gap, for lack of a less-awful term. The gap could exist for many reasons, including at least:
the guarantee is implicit (i.e. not in the interface definition)
the guarantee is explicit, but the caller did not read or understand the interface definition
possibly because the interface definition is too complex for the caller to understand
possibly because the guarantee is not in the caller’s mental model of the interface or of the caller’s own requirements
the interface misuses terms in its own definition
the interface definition is so poor that the caller must imagine their own implicit definition
Gaps in contracts tend, over time, to become implicit guarantees and non-guarantees. It can be possible to assert new technical guarantees in the gaps. Consider address space layout randomization (ASLR). The executable loaders of operating systems never specified the precise location in memory of the program text, heap, stack, libraries, et c. in memory; this freed up implementors to randomize those locations to thwart exploit developers, cat-and-mouse style. When it was invented, ASLR was a decent way to buy some time (a couple years at most) for the authors of programs written in unsafe languages to fix their bugs or port to safe languages. However, it was never going to be possible for ASLR to fully solve the problems of unsafe languages, for many reasons, including at least:
ASLR was a new technical guarantee retrofitted into the interface perception gap of pre-existing executable loaders that had to be compatible with existing code, and thus not all program components could be randomized with a high degree of entropy.
Programs generally must be recompiled with new options, or at least with old options previously thought of as being exclusively for dynamically-loadable library code — that is, there wasn’t enough of a perception gap in the toolchains’ interfaces! As a result, the guarantee of ASLR is still not ubiquitous, more than a decade later.
Many program errors are still exploitable due to the limited granularity of what program parts can be efficiently randomized — there is an implicit guarantee of run-time efficiency that extreme ASLR could violate.
In applications that give attackers significant but not directly malicious control over run-time behavior — for example, as any dynamic programming environment like a web browser must do — the attacker can significantly reduce the effective entropy of ASLR, thus weakening the already-weak guarantee.
Previously low-severity bugs, like single-word out-of-bounds read errors, become information leaks that can undo all the benefits of ASLR and enable an attacker to craft a reliable exploit. The implied ‘interface’ of an out-of-bounds read primitive changes: while an OOB read should be guaranteed not to happen, the ‘guarantee’ changes from “likely possible but mostly harmless” to ”there goes ASLR... now all those ROP exploits are back in scope.” Oops.
Perhaps because ASLR was not (to my knowledge) clearly documented as a temporary cat-and-mouse game, engineers have come to rely on it as being the thing that makes the continued use of unsafe languages acceptable. Unsafe (and untyped) languages will always be guaranteed to be unsafe, and we should have used the time ASLR bought us to aggressively replace our software with equivalents implemented in safe languages. Instead, we linger in a zone of ambiguity, taking the (slight) performance hit of ASLR yet not effectively gaining much safety from it.
Sometimes, interface perception gaps are surfaced, and the interface and implementation change to close the gap. A classic example is the denial-of-service problem in hash tables: If an attacker can influence or completely control the keys of the pairs inserted into a hash table, they can cause the performance to degrade from the (widely perceived — but usually explicitly disclaimed!) ~ O(1) performance guarantee for hash table lookup. Defenders can either explicitly claim the performance guarantee by randomizing the hash function in a way the attacker cannot predict, or (if they specified a more abstract interface) switch to an implementation (such as a red-black tree) that does not suffer from the problem.
The technical strength of a security mechanism is limited when it is not backed by an explicit contract. Explicit, understandable, tested, and enforced guarantees, which could reasonably fit into the caller’s mental model, are best.
A guarantee that is not also perceived by its callers is limited in effectiveness. Consider an interface for a map data structure: If the implementation is guaranteed to be a sorted tree, callers can trust that they can iterate over the keys in sorted order without having to do any extra work. But if they don’t understand that part of the interface definition, they might mistakenly waste time and space by extracting all the keys into an array and pointlessly re-sorting it. The problem is reversed if the interface is explicitly defined to be (say) a hash table, but the caller does not realize that.
Similarly, a security guarantee that callers do not perceive — but which is present — can cause callers to miscalculate their risk as being higher than it is. While it might seem that is OK, because callers will “err on the side of caution”, in fact the misperception can have an opportunity cost. (In a sense, a self-denial-of-service.)
A non-guarantee that is not perceived can also become dangerous. For example, although documentation explicitly disclaims it, users often perceive that programs can maintain (e.g.) confidentiality for the user’s data even when the underlying platform is under the physical control of an attacker. Such an attacker’s capabilities tend to be well outside the users’ mental models; and in any case, documentation (a secondary interface definition) is a poor substitute for a user-visible interface definition in the GUI (a primary definition).
Interface misperceptions are sometimes widely or strongly held, and can become implicit or even explicit guarantees, and can force brittleness or even breakage into the interface. As an extreme example, consider the User Account Control feature introduced in Windows Vista. After it was released, Microsoft published a blog post (a secondary interface definition) and tried to roll back the expectations that callers developed when reading the primary definitions (the GUI and aspects of the API):
It should be clear then, that neither UAC elevations nor Protected Mode IE define new Windows security boundaries. Microsoft has been communicating this but I want to make sure that the point is clearly heard. Further, as Jim Allchin pointed out in his blog post Security Features vs Convenience, Vista makes tradeoffs between security and convenience, and both UAC and Protected Mode IE have design choices that required paths to be opened in the IL wall for application compatibility and ease of use.
Perhaps the core problem with UAC, Integrity Levels, and User Interface Privilege Isolation is that one interface, the security principal (in Windows, represented by the access token), is too hard to compose with another interface: the traditional multi-process/single principal windowing environment for presenting user interfaces. Modern platforms require a 2-part security principal (see the Background section in that document), composable with a user interface paradigm that allows users to distinguish the many cooperating principals. (Consider the EROS Trusted Windowing System as an example alternative.)
At the beginning of this blog post, I poked a little fun at WAFs. Making fun of WAFs is traditional picnic banter in my tribe (application security engineers), so I feel it is only fair to put a little sacred cow hamburger on the grill, too. Here are 2 examples.
array comparison to defeat timing side-channel
attacks. Consider for example the HMAC defense against CSRF:
HMAC_SHA256(secret_key, session_token + action_name). It should be
computationally infeasible for the attacker to ever guess or learn the token
value, but a timing side-channel, such as that introduced by a naïve byte array
comparison allows the attacker to guess the token in a feasible amount of time
and attempts (proportional to N = number of bits in token). A canonical solution
is to use an array comparison function that always takes the same amount of
time, rather than returning as soon as it finds a mismatch.
The trouble with this is that, apart from the code being slightly subtle, there is no interface guaranteeing that the code will indeed take the same amount of time on all inputs. Several things are permissible, given the documented interfaces between the programmer and the ultimate execution context:
the compiler might find a way to optimize the function;
XOR instruction might not take the same amount of
time to compute all inputs; or
the machine (real, or virtual!) might even transform and optimize the code before running it.
Does the expected timing guarantee still hold, given these interfaces and their non-guarantee? As Lawson says, the solution is fragile and you have to test it every time the execution environment changes.
An additional, essentially fatal problem is that many real-world applications are implemented in very high-level languages like Python and Java, where there are even more layers of abstraction and therefore even less of a constant-time interface guarantee.
An alternative solution, which I learned from Brad Hill, is to forget about
trying to run in constant time, and instead to blind
the attacker by making what timing information they learn useless. Rather
than directly comparing the timing-sensitive tokens (say, SAML blob signatures
or CSRF tokens), HMAC the received blob and the expected blob again (with a new,
separate HMAC key), and then compare those HMAC outputs (with any comparison
function you want, even
memcmp). The attacker may indeed observe a
timing side-channel — but the timing information will be random relative to the
input. This is due to the straightforward, documented, and tested interface
guarantee of the HMAC function as a pseudo-random function. And it works as
expected in any language, on any computing substrate.
Consider another cryptography-related security conundrum: the supposed need to clear secrets from RAM when the secrets are no longer needed, or even to encrypt the RAM (presumably decrypting it in registers?). This is supposed to ensure that live process RAM never hits the disk (as in e.g. swap space), nor is available to an attacker who can read the contents of RAM. The usual threat scenario invoked to warrant this type of defense is that of a physically-local forensic attacker, usually of relatively high capability (e.g. capable of performing a cold boot attack or a live memory dump). The goal is to not reveal secrets (e.g. Top Secret documents, passwords, encryption keys, et c.) to such an attacker.
The trouble with this goal is that there can be no interface guarantee that clearing memory in one area will fully erase all copies of the data. The virtual memory managers of modern operating systems, and the dynamic heap allocators of modern language run-times, in fact guarantee very little in the way of memory layout or deterministic behavior. Instead they provide guarantees of more-or-less high performance, which additional security guarantees could complicate or render infeasible.
realloc memory, the
userland run-time or the kernel might make a copy that you can no longer
reliably reference (so you can’t reliably clear it).
When you free memory, the kernel might not zero the pages out until the last second before giving them to the next requestor. Thus, the time window in which they are prone to discovery by the forensic attacker increases.
Kernel APIs like
which purport to lock memory into physical RAM pages (stopping the pages from
being swapped out to disk), do not necessarily work the way you expect, or even
In a garbage-collected run-time, essentially any amount of copying, moving, and reallocating is possible. There can be no guarantee that a piece of data is stored in exactly 1 location in RAM, and that you can clear it.
The same holds for virtual machines, of course.
Essentially, there can be no guarantee that a high-capability forensic attacker cannot find secrets in RAM or swapped-out process memory; the more complex the operating system and run-time, the less likely it is that you can even probabilistically defeat such an attacker.
The most you can realistically do in the general case is mitigate the problems with full disk encryption and whatever degree of physical security you can get. In specific cases, such as cryptographic keys, you can keep the keys in a tamper-resistant, tamper-evident hardware security module.
This post is partly an attempt to investigate why the “security vs. convenience” dichotomy is false. I think it’s worse than a false dichotomy, really; it’s a fundamental misconception of what security is and of what an interface is — and of what ‘convenience’ (an impoverished view of usability) is.
But also it’s an attempt to re-frame security engineering in a way that allows us to imagine more and better solutions to security problems. For example, when you frame your interface as an attack surface, you find yourself ever-so-slightly in a panic mode, and focus on how to make the surface as small as possible. Inevitably, this tends to lead to cat-and-mouseism and poor usability, seeming to reinforce the false dichotomy. If the panic is acute, it can even lead to nonsensical and undefendable interfaces, and a proliferation of false boundaries (as we saw with Windows UAC).
If instead we frame an interface as a defense surface, we are in a mindset that allows us to treat the interface as a shield: built for defense, testable, tested, covering the body; but also light-weight enough to carry and use effectively. It might seem like a semantic game; but in my experience, thinking of a boundary as a place to build a point of strength rather than thinking of it as something that must inevitably fall to attack leads to solutions that in fact withstand attack better while also functioning better for friendly callers.
The safest interface is still no interface — don’t multiply interfaces unnecessarily. But when you must expose something, expose a well-tested shield rather than merely trying to narrow your profile or hide behind a tree.