Expose The Correct Interface (Even When Taking Shortcuts)

When you’re designing a data structure in C, you often want to be space-efficient. For example, consider a struct of bools:

typedef struct {
    bool isAwesome;
    bool isBiped;
    bool drinksCoffee;
    bool isFurry;
    bool hasRabies;
} Friend1;

typedef struct {
    uint8_t isAwesome:1;
    uint8_t isBiped:1;
    uint8_t drinksCoffee:1;
    uint8_t isFurry:1;
    uint8_t hasRabies:1;
} Friend2;

Clearly, Friend2 is (or, can be) more space-efficient. On my machine, sizeof(Friend1) is 5 and sizeof(Friend2) is 1. In an application that requires allocating millions of friends, this can make a big difference.
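If you want to check the numbers on your own machine, a minimal sketch along these lines should do. The helper name printFriendSizes is made up for illustration, and bitfield layout is implementation-defined, so your compiler may report different sizes than mine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    bool isAwesome;
    bool isBiped;
    bool drinksCoffee;
    bool isFurry;
    bool hasRabies;
} Friend1;

typedef struct {
    uint8_t isAwesome:1;
    uint8_t isBiped:1;
    uint8_t drinksCoffee:1;
    uint8_t isFurry:1;
    uint8_t hasRabies:1;
} Friend2;

/* Hypothetical helper: prints both sizes so they can be compared. */
void printFriendSizes(void) {
    printf("sizeof(Friend1) = %zu\n", sizeof(Friend1));
    printf("sizeof(Friend2) = %zu\n", sizeof(Friend2));
}
```

Calling printFriendSizes() from main is enough to see the difference. Multiply it by the number of friends you expect to allocate to judge whether the saving matters.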

Similarly, consider two approaches to implementing a string:

typedef struct {
    size_t count;
    uint8_t* bytes;
} String1;

typedef struct {
    uint32_t count;
    uint8_t bytes[1];
} String2;

As before, String2 is much more space-efficient: on my machine, sizeof(String1) is 16 and sizeof(String2) is 8. Additionally, by having bytes be immediate instead of a pointer to a (potentially far away) array, we can improve data locality. (But it means that we need to be a bit trickier in the implementation of our constructor.)
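To give an idea of what "trickier" means, here is a rough sketch of such a constructor. It is an assumption on my part, not the only way to do it: the name newString2 is hypothetical, and I have swapped the bytes[1] idiom for a C99 flexible array member (bytes[]), which expresses the same single-allocation layout in a way the standard explicitly supports.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Variant of String2 using a C99 flexible array member in place of bytes[1]. */
typedef struct {
    uint32_t count;
    uint8_t bytes[];
} String2;

/* Hypothetical constructor: a single allocation holds the header and the bytes.
   Returns NULL if n does not fit in count, or if the allocation fails. */
String2* newString2(const uint8_t* src, size_t n) {
    const size_t header = offsetof(String2, bytes);
    if (n > UINT32_MAX || n > SIZE_MAX - header) {
        return NULL;
    }
    String2* s = malloc(header + n);
    if (s == NULL) {
        return NULL;
    }
    s->count = (uint32_t)n;
    if (n > 0) {
        memcpy(s->bytes, src, n);
    }
    return s;
}
```

The caller releases the whole string with a single free(s), since the header and the bytes live in the same block.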

Notice how, unlike with the example of Friend2, String2 significantly changes the data structure. On a 64-bit machine, the use of uint32_t instead of size_t means that we cannot have strings with more than 2^32 - 1 bytes. Since we’re trying to save memory, that can make sense, but it does mean we have a limitation we must enforce when interfacing with other code. (Thankfully, a uint32_t always converts to a size_t without loss on machines where size_t is at least 32 bits wide.)

As long as we expose correct and abstract interfaces to these data structures, we can freely change between implementation strategies — we can swap Friend1 and Friend2 if necessary, and we can swap String1 and String2.
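One way to picture that freedom is a pair of accessors that compile unchanged against either layout. This is only a sketch: the FRIEND_USE_BITFIELD switch and the friendIsFurry / friendSetFurry names are hypothetical, chosen for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical switch: define FRIEND_USE_BITFIELD to pick the compact layout. */
#ifdef FRIEND_USE_BITFIELD
typedef struct {
    uint8_t isAwesome:1;
    uint8_t isBiped:1;
    uint8_t drinksCoffee:1;
    uint8_t isFurry:1;
    uint8_t hasRabies:1;
} Friend;
#else
typedef struct {
    bool isAwesome;
    bool isBiped;
    bool drinksCoffee;
    bool isFurry;
    bool hasRabies;
} Friend;
#endif

/* Callers only ever see bool, whichever layout is compiled in. */
bool friendIsFurry(const Friend* f) {
    return f->isFurry != 0;
}

void friendSetFurry(Friend* f, bool isFurry) {
    f->isFurry = isFurry;
}
```

Code that sticks to the accessors does not need to change when the switch is flipped; only code that reaches into the struct directly is tied to one layout.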

To see what I mean by “correct”, consider these hypothetical and incorrect Friend constructors:

Friend* newFriend_bitfield(int options) {
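    Friend* f = calloc(1, sizeof(Friend));
    if (f == NULL) return NULL;
    // Compare against 0 so a flag above bit 0 is not truncated by a 1-bit field.
    f->isAwesome = (options & F_IS_AWESOME) != 0;
    f->isBiped = (options & F_IS_BIPED) != 0;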
    // ...
    return f;
}

Friend* newFriend_ints(int isAwesome,
                       int isBiped,
                       int drinksCoffee,
                       int isFurry,
                       int hasRabies)
{
    Friend* f = calloc(1, sizeof(Friend));
    if (f == NULL) return NULL;
    f->isAwesome = isAwesome;
    f->isBiped = isBiped;
    // ...
    return f;
}

In newFriend_bitfield, we are exposing to the caller the bitfield space optimization that we used in Friend2, and requiring the caller to learn and use constants like F_IS_AWESOME — not too complicated, but it is one more thing for the programmer to learn. Although well-behaved callers will have readable call-sites such as

Friend* f = newFriend_bitfield(F_IS_AWESOME | F_IS_FURRY);

it will be possible to have nonsensical call-sites such as

Friend* f = newFriend_bitfield(42);

If an interface allows something, no matter how silly, rest assured that some programmer somewhere will indeed invoke it that way.
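To make the problem concrete, suppose the flags were defined as below. The values are my assumption (the constants are never spelled out above), and F_DRINKS_COFFEE, F_HAS_RABIES, F_ALL_FLAGS and friendOptionsAreValid are hypothetical names added for the sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values, one bit per property. */
enum {
    F_IS_AWESOME    = 1 << 0,
    F_IS_BIPED      = 1 << 1,
    F_DRINKS_COFFEE = 1 << 2,
    F_IS_FURRY      = 1 << 3,
    F_HAS_RABIES    = 1 << 4,
    F_ALL_FLAGS     = (1 << 5) - 1
};

/* True only if options uses nothing but the defined flag bits. */
bool friendOptionsAreValid(int options) {
    return (options & ~F_ALL_FLAGS) == 0;
}
```

Under these assumed values, 42 is binary 101010: a furry biped, plus one bit that means nothing at all. A runtime check like friendOptionsAreValid can reject the stray bit, but the compiler will still happily accept the call.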

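Similarly, newFriend_ints appears to allow any integer value, even though each argument ends up stored as a single true-or-false value. Worse, what a value like 2 turns into depends on the layout: assigned to a bool it becomes true, but assigned to a 1-bit uint8_t field as in Friend2, only the lowest bit is kept and it becomes 0.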
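What happens to an out-of-range int depends on the layout hiding behind the interface. The sketch below uses two cut-down, single-field structs (BoolFlag and BitFlag, both hypothetical) to show the difference. The bitfield behavior described in the comments is what I would expect from mainstream compilers such as GCC and Clang; uint8_t bitfields are an implementation-defined extension, so treat it as an assumption elsewhere.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool isAwesome; } BoolFlag;        /* Friend1-style field */
typedef struct { uint8_t isAwesome:1; } BitFlag;    /* Friend2-style field */

/* Stores an int in a bool field: any nonzero value becomes true. */
bool storeInBool(int isAwesome) {
    BoolFlag f = {0};
    f.isAwesome = isAwesome;
    return f.isAwesome;
}

/* Stores an int in a 1-bit unsigned field: only the lowest bit is kept. */
bool storeInBitfield(int isAwesome) {
    BitFlag f = {0};
    f.isAwesome = isAwesome;
    return f.isAwesome != 0;
}
```

So a caller who passes 2 gets an awesome friend with one layout and a decidedly un-awesome one with the other, which is exactly the kind of leak an int-typed interface invites.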

The correct interface is immediately understandable, and allows either implementation:

Friend* newFriend(bool isAwesome,
                  bool isBiped,
                  bool drinksCoffee,
                  bool isFurry,
                  bool hasRabies)
{
    Friend* f = calloc(1, sizeof(Friend));
    if (f == NULL) return NULL;
    f->isAwesome = isAwesome;
    f->isBiped = isBiped;
    // ...
    return f;
}
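For completeness, here is one way the whole thing might look when compiled against the bitfield layout. Treat it as a sketch: the remaining assignments are my guess at what the "// ..." stands for, and the NULL check on calloc is an addition of mine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* The Friend2 layout under the public name; the Friend1 layout works as well. */
typedef struct {
    uint8_t isAwesome:1;
    uint8_t isBiped:1;
    uint8_t drinksCoffee:1;
    uint8_t isFurry:1;
    uint8_t hasRabies:1;
} Friend;

/* Returns a heap-allocated Friend, or NULL if allocation fails. */
Friend* newFriend(bool isAwesome,
                  bool isBiped,
                  bool drinksCoffee,
                  bool isFurry,
                  bool hasRabies)
{
    Friend* f = calloc(1, sizeof(Friend));
    if (f == NULL) {
        return NULL;
    }
    /* Each bool is 0 or 1, so it always fits in a 1-bit field. */
    f->isAwesome = isAwesome;
    f->isBiped = isBiped;
    f->drinksCoffee = drinksCoffee;
    f->isFurry = isFurry;
    f->hasRabies = hasRabies;
    return f;
}
```

A call site then reads as newFriend(true, true, false, true, false), and the caller releases the result with free. Because the parameters are bool, even a sloppy argument like 42 is converted to true at the call boundary, before it ever reaches the struct.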