Maybe We Can Have Nice Things

18 Feb: See below for some nice updates!

Programming languages advance by introducing new constraints. A key reason we don’t use assembly language for everything is that the lack of constraints makes it too hard to use for everyday programming. Before goto was considered harmful, people wrote machine code that jumped all over the place, and programmers had to maintain a mental model of the complete machine state and the full implications of each jump: a recipe for bugs.

Then, structured programming was introduced: structured languages still compiled down to gotos (or arbitrary jumps), but the programmer could think in terms of more limited jumps: if, switch/case, call, return, for. These constrained jumps are much easier to understand; for example, when you’re reading code, you can know that return doesn’t return just anywhere. It returns only to the caller, as identified by a pointer on the stack. Later, language designers added additional constrained jumps like throw/catch, and virtual function calls.

(throw is a little bit too goto-y for my taste, since you can’t tell locally where the relevant catch block is. But that’s a story for another time.)

A key innovation of C++ was to introduce RAII, which essentially ‘piggybacks’ on the value of the stack and enriches it with a lot more power. (The additional complexity is usually manageable, and worth it.) It allows you to extend the automatic memory management that the stack provides, initializing and cleaning up complex resources instead of just primitive values like integers and floats. You can automatically close open files, release dynamic storage, and so on. And it’s deterministic.
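To make the deterministic part concrete, here is a minimal sketch of the same idea. It is written in Rust rather than C++, using the Drop trait, which plays the role of a C++ destructor. The Guard type, its shared log, and the run function are hypothetical names invented for this illustration.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical resource that records its acquisition and release in a shared log.
struct Guard {
    name: &'static str,
    log: Rc<RefCell<Vec<String>>>,
}

impl Guard {
    fn acquire(name: &'static str, log: Rc<RefCell<Vec<String>>>) -> Guard {
        log.borrow_mut().push(format!("acquire {}", name));
        Guard { name, log }
    }
}

impl Drop for Guard {
    // Called automatically when the guard goes out of scope.
    fn drop(&mut self) {
        self.log.borrow_mut().push(format!("release {}", self.name));
    }
}

/// Acquires two guards in an inner scope and returns the recorded events.
fn run() -> Vec<String> {
    let log = Rc::new(RefCell::new(Vec::new()));
    {
        let _a = Guard::acquire("a", Rc::clone(&log));
        let _b = Guard::acquire("b", Rc::clone(&log));
        // Both guards are released here, in reverse order of declaration.
    }
    let events = log.borrow().clone();
    events
}
```

Calling run() should yield the events in a fixed order: both acquisitions, then the releases in reverse. Nobody had to remember to call a cleanup function, and the release point is visible from the scope's closing brace.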

But there was still the problem of the heap: a free-fire zone with no constraints, riddled with memory leaks (heap resources allocated but never released) and use-after-free bugs (heap resources re-used even after having been released).

A key innovation of Rust has been to statically constrain the lifetimes of heap resources, enabling us to more completely solve the worst remaining memory unsafety problem. (Previous solutions to the heap lifetime problem were dynamic, not static, and hence expensive in space and time, as well as being non-deterministic. These limitations restrict dynamically managed languages to applications and environments where these costs are affordable.)
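A small sketch of what "statically constrain" means in practice. The function names longer and demo are hypothetical, chosen for this example. The lifetime parameter 'a tells the compiler that the returned reference may borrow from either argument, so it can check at compile time that neither is freed too early.

```rust
/// Returns the longer of two string slices. The result borrows from one of
/// the arguments, so it must not outlive either of them.
fn longer<'a>(x: &'a str, y: &'a str) -> &'a str {
    if x.len() >= y.len() { x } else { y }
}

/// Borrows from a heap-allocated String and returns the length of the result.
fn demo() -> usize {
    let heap_string = String::from("heap-allocated");
    let result = longer(&heap_string, "short");
    // Uncommenting the next line is rejected at compile time, because
    // `heap_string` would be freed while `result` still borrows from it:
    // drop(heap_string);
    result.len()
}
```

The would-be use-after-free never makes it into a binary. There is no garbage collector or reference count involved in this check; it happens entirely in the compiler.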

And, of course, taming object lifetimes greatly eases the problem of safe, efficient concurrency. Concurrency is the key to improving performance in modern systems.
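As a rough illustration of how lifetimes help here, this sketch uses scoped threads from the standard library (std::thread::scope, available since Rust 1.63). The parallel_sum function is a hypothetical example, not anything from a real codebase. The compiler can verify that both threads finish before the borrowed slice goes away, so no copying or reference counting is needed.

```rust
use std::thread;

/// Sums a slice by splitting it in half and summing each half on its own thread.
fn parallel_sum(data: &[u64]) -> u64 {
    let (left, right) = data.split_at(data.len() / 2);
    thread::scope(|s| {
        // Both threads borrow from `data`; the scope guarantees they are
        // joined before `parallel_sum` returns.
        let l = s.spawn(|| left.iter().sum::<u64>());
        let r = s.spawn(|| right.iter().sum::<u64>());
        l.join().unwrap() + r.join().unwrap()
    })
}
```

For example, parallel_sum(&[1, 2, 3, 4, 5]) adds 1 + 2 on one thread and 3 + 4 + 5 on the other. Two threads is an arbitrary choice to keep the sketch short.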

Beyond memory safety, Rust makes more use of typefulness than I typically see in other mainstream languages in its niche. For example, Rust’s rich enums and pattern matching make it easier to write state machines, the newtype idiom makes it easier to get additional type safety (and improves the interface-as-documentation factor), and so on. You can work to get similar benefits in other languages, but Rust’s syntactic mechanisms and idiomatic usage make these patterns the easy path.
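Here is a minimal sketch that combines both ideas. Everything in it (the State and Event enums, the RetryCount newtype, the MAX_ATTEMPTS limit, and the step function) is hypothetical, made up to show the shape of the pattern rather than any particular protocol.

```rust
/// Newtype: wraps a plain u32 so it cannot be confused with other integers.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct RetryCount(u32);

/// Each state carries only the data that makes sense in that state.
#[derive(Debug, PartialEq, Eq)]
enum State {
    Idle,
    Connecting { attempts: RetryCount },
    Connected,
    Failed,
}

#[derive(Debug, Clone, Copy)]
enum Event {
    Start,
    Success,
    Timeout,
}

const MAX_ATTEMPTS: u32 = 3;

/// Consumes the current state and an event, and returns the next state.
fn step(state: State, event: Event) -> State {
    match (state, event) {
        (State::Idle, Event::Start) => State::Connecting { attempts: RetryCount(1) },
        (State::Connecting { .. }, Event::Success) => State::Connected,
        (State::Connecting { attempts: RetryCount(n) }, Event::Timeout) if n < MAX_ATTEMPTS => {
            State::Connecting { attempts: RetryCount(n + 1) }
        }
        (State::Connecting { .. }, Event::Timeout) => State::Failed,
        // Any other combination leaves the state unchanged.
        (state, _) => state,
    }
}
```

Starting from State::Idle, a Start event followed by three Timeout events ends in State::Failed. Because the attempt counter lives inside the Connecting variant, there is no way to read a stale count while idle or connected.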

Another freeing constraint Rust has introduced has been to systematize and automate dependency management: the Cargo package management system. Good dependency management is a monstrously hard problem. Any dependency management system, including manual or ad hoc management, poses a variety of problems:

The NPM ecosystem provides the clearest modern illustration of these problems. (See page 11 of GitHub’s report on security, for example.)

However, for all of NPM’s problems, at least it is a package management system at all! It’s easy to pick on NPM (or predecessors like CPAN, or CTAN, or...), but even at its worst it’s a huge improvement over manually managing dependencies (such as by manually vendoring them into your source tree, or just telling the user to install such-and-such libraries before attempting to compile).

Life is better with NPM, and with Rust’s Cargo, Go’s go get, and so on. Even when they aren’t perfect yet, they provide a framework for improvement, by constraining where dependencies come from and how we maintain them.

But a lot of work is still necessary. As an example of a Nice Thing Indeed, Cargo has this add-on package called supply-chain, which will show you all the packages a given package depends on. It will also estimate how many individual publishers author those dependencies. Here is what happens when you run supply-chain on itself:

~/src/rust/cargo-supply-chain % cargo supply-chain publishers

The following crates will be ignored because they come from a local directory:
 - cargo-supply-chain

The `crates.io` cache was not found or it is invalid.
  Run `cargo supply-chain update` to generate it.

Fetching publisher info from crates.io
This will take roughly 2 seconds per crate due to API rate limits
Fetching data for "adler" (0/79)
[77 items, including some surprising ones, elided...]
Fetching data for "xattr" (78/79)

The following individuals can publish updates for your dependencies:

 1. alexcrichton via crates: flate2, wasm-bindgen-backend, wasi, bitflags, proc-macro2, wasm-bindgen-macro, wasm-bindgen, openssl-probe, unicode-xid, wasm-bindgen-macro-support, filetime, semver, tar, unicode-normalization, libc, js-sys, bumpalo, log, wasm-bindgen-shared, cfg-if, cc, web-sys
 [55 authors elided...]
 57. zesterer via crates: spin

Note: there may be outstanding publisher invitations. crates.io provides no way to list them.
Invitations are also impossible to revoke, and they never expire.
See https://github.com/rust-lang/crates.io/issues/2868 for more info.

All members of the following teams can publish updates for your dependencies:

 1. "github:rustwasm:core" (https://github.com/rustwasm) via crates: web-sys, js-sys, wasm-bindgen-macro, wasm-bindgen-macro-support, wasm-bindgen-backend, wasm-bindgen, wasm-bindgen-shared
 2. "github:servo:cargo-publish" (https://github.com/servo) via crates: core-foundation-sys, percent-encoding, form_urlencoded, unicode-bidi, core-foundation, idna, url
 3. "github:servo:rust-url" (https://github.com/servo) via crates: percent-encoding, form_urlencoded, idna, url
 4. "github:rust-bus:maintainers" (https://github.com/rust-bus) via crates: security-framework-sys, security-framework, tinyvec
 5. "github:rust-lang-nursery:libs" (https://github.com/rust-lang-nursery) via crates: bitflags, log, lazy_static
 6. "github:serde-rs:owners" (https://github.com/serde-rs) via crates: serde_derive, serde, serde_json
 7. "github:rust-lang:libs" (https://github.com/rust-lang) via crates: libc, cfg-if
 8. "github:rust-lang-nursery:log-owners" (https://github.com/rust-lang-nursery) via crates: log
 9. "github:rust-random:maintainers" (https://github.com/rust-random) via crates: getrandom

Github teams are black boxes. It's impossible to get the member list without explicit permission.

~/src/rust/cargo-supply-chain % cargo supply-chain update
Note: this will download large amounts of data (approximately 250Mb).
On a slow network this will take a while.

Now, that’s a lot of dependencies by a lot of publishers whom I don’t know. (Although it’s not automated, if you dig around you’ll find that many of those authors are well-established members of the Rust development team, so trusting them is an easier sell.) Another bummer is that, when I built supply-chain, my default $CFLAGS broke the build (Update 18 Feb: with an almost certainly spurious and not security-relevant warning, -Wunused-macros). (My flags are quite persnickety: -Weverything -Werror -std=c11. Very little code builds with these flags. 😇) Apparently, some of supply-chain’s own dependencies depend on C code. Alas.

But that’s OK! Cargo provides a framework for working on these problems. Over time, I’d like to see things move along these lines:

Another good thing about Rust is its friendly community. Not all systems programming communities are as welcoming as Rust’s is. Rust, and some other communities, have taken proactive steps to maintain a healthy community. I think it’s fair to say the Rust community is doing relatively well, especially in the systems programming niche.

As with all language communities, whether of natural languages or artificial languages, what matters is the community, its body of literature, and its oral tradition. In its niche, Rust looks like the option with the most momentum around a more positive, healthier community. The community and the language are probably not perfect (nothing is, if perfect is even a thing), but Rust looks like the community most open to solving its problems, and most capable of solving systems programming problems.

Thanks to Adrian Taylor for reminding me to mention typefulness, concurrency, and Safety Dance.

Thanks to Sergey Davidoff, supply-chain maintainer, for pointing me at crev and noting that Safety Dance is more about reducing unsafe than C.