Improving Software ‘Numbers’

Update: Thanks again to Saleem Rashid, this time for pointing me at Saturating in Rust Nightly.

Programming languages should expose a flexible variety of explicit types and operators for handling arithmetic overflow and related problems. As language design problems go, this might be relatively less difficult to achieve. Rust is closest to where programming languages need to be, but not all the way there yet.

I believe that software should be able to reliably compute arithmetic expressions. You may say I’m a dreamer, but I’m not the only one.

To represent an infinite set in finite space is problematic. This is not news to most software engineers. We always need a coherent policy for how to deal with our inability to represent some elements of infinite sets, or expressions that would evaluate to representable elements if only we had space.

We can consistently apply some policy (whatever it may be) using the language’s type system: by encoding policy in the set’s type, and operators on and functions of it, we can get (if nothing else) consistent representation of and behavior in error states. (I’ll call them representation errors generically.)

Surely, we can all agree that this is the minimum necessary for program correctness.

Surely, we can all agree that correctness is the minimum necessary for program safety.

Surely, we can all agree that providing people correct and safe software is our duty as engineers.

Even the most fundamental objects of computation, the reals and the integers and arithmetic on them, require some policy for unrepresentable values — even if it is simply to crash when no more memory is available to an arbitrary-precision arithemtic library (for example).

Machine words are more limited still, and have far less range to represent the reals and integers than do arbitrary-precision types. Even so, for practical efficiency, we typically use machine words to represent elements of these sets. For many circumstances, this is not a problem.

The machines we typically use implement reasonable error policies, at least for the integer types. For example, ARM’s and Intel’s policy is to make arithmetic modular, and to indicate carry, borrow, and overflow in a status register. (See e.g. carry flag, overflow flag, and conditional execution.)

Unfortunately, most of the programming languages that aim for machine-level efficiency — e.g. C, C++, and Rust — do not provide a standard way to access the machine status register directly, or otherwise indicate representation errors. (Rust does provide indirect access, however; see below.) Programmers must switch to machine language, or must use non-standard, implementation-specific features such as __builtin_add_overflow.

It would be helpful if these languages’ developers defined standardized syntax and semantics for accessing the error state that the machine provides. (If the programmer uses such a mechanism, but compiles for a machine that doesn’t signal representation errors, the compiler could emulate such a signal, otherwise attempt to accommodate the programmer, or raise a compilation error.)

However, some languages often violate and obscure the clear and simple policy that most machines define.

Language Unrepresentable Signed Integer Behavior Unrepresentable Unsigned Integer Behavior
C, C++ Undefined behavior Modular arithmetic
Rust
(default Release configuration)
Modular arithmetic Modular arithmetic
Rust
(default Debug configuration)
Machine trap Machine trap
Java, Go Modular arithmetic Modular arithmetic

There are a number of problems with this state of affairs:

The usual reason given for the use of undefined behavior is that it can enable certain micro-optimizations. This is potentially useful and a legitimate policy option. (Language designers hesitate to pay the micro-cost of overflow checking even in special cases. But to me it seems like unchecked_add is available for the few people whose application is unacceptably affected by a check.)

In any case, and in any language, “potentially (but untestably) fast, hard to predict, possibly wrong, and possibly unsafe”, even if occasionally desirable, should not be the default policy, as it is for signed arithmetic in C and C++.

Fortunately, all of the languages in mainstream use have active standards committees and change processes. They could define ways to access the machine state, and could define standard types and operators implementing a wide range of reasonable representation error policies. (In addition to the suite of special policy operators that Rust provides, it’d help to have types whose default operators implement each of those policies, as well — not just Wrapping.)

In some cases, it may even be possible to change the default policies. I think language designers and standardizers should do so.

(Perhaps something similar could be done for the error modes of floating-point arithmetic, too.)


Thanks to Saleem Rashid for pointing out some gaps, and to Alex Gaynor for pointing out an inaccuracy!