Update 25 Aug: Removed 2 paragraphs that made more sense in an older draft than in the post as finally published.
Recoverability is my word for a desirable computer safety property: the ability to fully assert ownership and control over the machine and all its operations. (This includes the ability to fully relinquish control of the machine, too.)
Recoverability is crucial in many everyday situations:
You can see that recoverability really is an everyday problem, for everyone, when you consider how many kinds of devices require recoverability. A sampling:
Recoverability is mostly about code integrity, but maintaining data confidentiality (usually by ensuring it’s destroyed) is also important.
Our goal should be to make recoverability and relinquishment first-class, well-supported, documented, discoverable ceremonies that people can easily and regularly use. For example, resetting a digital assistant really should delete all user data storage, and should affirmatively reset all system software (including all peripheral firmware!) to a known-good state.
Of course, that’s harder than it sounds. If your SSD’s firmware is compromised, it can simply lie to you about having updated the operating system and the firmware itself.
In most computing devices, there are tons of places where no-longer-wanted data and code can remain, thwarting our ability to recover the device. Malware might hide in the firmware. (Many peripherals, including keyboards, network interfaces, storage devices, cameras, the Mac Touchbar, and more have updatable firmware.) Many printers and scanners keep a copy of what they’ve printed and scanned — how do you wipe your tax records off your printer before selling it on Craigslist? And so on.
It might seem that we could ease recoverability by designing the system to enforce W^X — writable data can never become executable code. However, for a variety of reasons, this is very close to impossible to achieve. Among others:
Perhaps the only real way to achieve W^X is on a pure-ROM system: no writes at all. As great as the NES was, a system designed on that principle has very limited utility. (But more than none! And maybe sufficient for some of the use cases?) Although NES Game Paks eventually gained RAM, it was volatile and hence recoverable.
That suggests another option: volatile installation. Even if the code is writable, as long as the memory is exclusively volatile, the device is recoverable. For example, Apple Lightning cables work this way. (More fun from Lisa Braun.) CPU microcode updates can work the same way.
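As a toy model of why volatile installation helps (all names here are hypothetical, and a real device is of course not a Python class), consider a device that accepts updates only into volatile memory. Whatever gets installed — even malware — cannot survive a power cycle:

```python
class VolatileInstallDevice:
    """Toy model of volatile installation: updates live only in RAM."""

    ROM_IMAGE = "factory-code-v1"  # mask ROM: immutable, known-good

    def __init__(self):
        self.power_cycle()

    def install(self, code: str) -> None:
        # The update lands only in volatile memory; nothing persistent changes.
        self.running_code = code

    def power_cycle(self) -> None:
        # RAM contents are lost, so the device always boots back to ROM.
        self.running_code = self.ROM_IMAGE


dev = VolatileInstallDevice()
dev.install("update-v2")   # or even malware: it cannot persist
assert dev.running_code == "update-v2"
dev.power_cycle()
assert dev.running_code == VolatileInstallDevice.ROM_IMAGE  # recovered
```

The recovery ceremony here is as simple and observable as it gets: unplug it.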
We can also achieve a certain degree of recoverability if there are code updates, including in non-volatile memory, but all updates are authenticated (such as by cryptographic code and keys from ROM or a TPM). This gives us a good degree of recoverability until the non-updatable crypto is cracked. (See also USB-C authentication.)
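The control flow can be sketched roughly as follows. Note that this is a simplified, hypothetical illustration: real systems verify an asymmetric signature (e.g. Ed25519) against a public key, or a hash of one, burned into ROM or fuses; the symmetric HMAC below just keeps the sketch self-contained with the standard library.

```python
import hashlib
import hmac

# Hypothetical immutable "ROM" key. A real device would anchor trust in a
# *public* key in ROM/fuses and verify an asymmetric signature instead.
ROM_KEY = bytes.fromhex("00112233445566778899aabbccddeeff")


def sign_update(image: bytes, vendor_key: bytes) -> bytes:
    """Run by the vendor: attach an authentication tag to the image."""
    return hmac.new(vendor_key, image, hashlib.sha256).digest()


def install_update(image: bytes, tag: bytes) -> bool:
    """Run on the device: refuse any image the ROM-anchored key rejects."""
    expected = hmac.new(ROM_KEY, image, hashlib.sha256).digest()
    if not hmac.compare_digest(expected, tag):
        return False  # unauthenticated image: never written to flash
    # ...write image to non-volatile memory here...
    return True


good = b"firmware v2"
assert install_update(good, sign_update(good, ROM_KEY))
assert not install_update(b"malware", sign_update(b"malware", b"wrong key"))
```

The whole scheme is only as recoverable as the non-updatable verifier: if the ROM key or the verification code is ever broken, so is everything anchored to it.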
For data confidentiality — mainly, rendering data unusable upon relinquishment — the only real way is to always write only ciphertext into non-volatile memory, then to destroy the key when relinquishing. Modern storage technology does not give us a way to be sure that data is deleted. (See e.g. wear-leveling.) We can only hope to make it indecipherable.
But all these ‘easy’ mechanisms leave us with a question: do we have to trade off updatability for recoverability? Even volatile installation depends on the integrity of the installation source (usually your primary operating system).
What are you actually, really supposed to do to recover your computer after it has been compromised?
Recovering and re-verifying the integrity of all your data and network accounts could be the topic of several books. For this post, I just mean the computer itself. After a successful attack, is your computer merely e-waste?
Unfortunately, all your software and forgotten firmware is not only potentially relevant attack surface, but also a potential persistence mechanism that breaks recoverability. We have to assume the worst in case of actual compromise. And depending on the hardware and firmware design, we may not have a way to recover all the firmware, including the firmware in the storage devices, which can break our ability even to recover the primary OS.
Thus recoverability is an unsolved privacy, usability, economic, and even environmental problem. It’s a fun and important problem for these reasons and (especially to me) because solving it requires a holistic, general view of computer systems. It’ll never be enough to ‘just’ design a good update protocol, or kernel, or UX, or memory subsystem. All those pieces (and more) must fit together in a coherent narrative and ceremony that people can observe, believe, and rely on every day.
Mara Tam reminded me in conversation that the shared device use case demonstrates a particularly acute need for recoverability and relinquishment. Any other errors or omissions are mine, of course.
Someday I’d like to write at greater length about this use case. My colleagues and I have spent significant time chewing on it, and although it’s not easy it’s crucial that all platform developers handle it. Not only is it possible to do more than nothing, there may be some relatively straightforward improvements to be made.