Code Hygiene

Here are some random thoughts about how to organize code on the small scale (organizing code at a large scale is a whole different thing). I’ve broken it up into 2 sections. This post is by no means complete; for a much more complete story, I highly recommend The Practice of Programming by Kernighan and Pike.

Factoring (And Re-Factoring) Code

Make each function/method do one thing and do it well. This is the principle of compositionality: It’s easier to compose minimal, single-purpose functions to get an understandable complex solution than it is to bake all the complexity into 1 function. It’s also much easier to test minimal functions.
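As a small illustration, here is a hypothetical word counter built by composing three single-purpose functions. The names and the punctuation rule are assumptions made up for this sketch. Each piece is small enough to read and test on its own.

```python
from collections import Counter


def normalize_word(word):
    """Returns `word` lower-cased, with surrounding punctuation stripped."""
    return word.strip(".,;:!?\"'()").lower()


def split_words(line):
    """Returns the list of normalized, non-empty words in `line` (a str)."""
    words = (normalize_word(word) for word in line.split())
    return [word for word in words if word]


def count_words(lines):
    """Returns a Counter mapping each normalized word in `lines` (an iterable of str) to its count."""
    return Counter(word for line in lines for word in split_words(line))
```

Changing the punctuation rule touches only normalize_word, and count_words stays a one-line composition of the other two.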

Write unit tests for each unit (function/method). With tests, you can ensure that bugs are really fixed, and can later change the code with more confidence that you are not introducing new bugs. Writing tests seems like annoying, boring work at first, but you’ll be glad you did.
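A minimal sketch of unit tests using Python's standard unittest module. The function under test, clamp, is a hypothetical example rather than anything from this post; each test method checks one behavior, including the edge cases.

```python
import unittest


def clamp(value, low, high):
    """Returns `value` limited to the closed range [`low`, `high`].

    Raises ValueError if `low` is greater than `high`.
    """
    if low > high:
        raise ValueError(f"low ({low}) must not exceed high ({high})")
    return max(low, min(value, high))


class ClampTest(unittest.TestCase):
    def test_value_inside_range_is_unchanged(self):
        self.assertEqual(clamp(5, 0, 10), 5)

    def test_value_below_range_is_raised_to_low(self):
        self.assertEqual(clamp(-3, 0, 10), 0)

    def test_value_above_range_is_lowered_to_high(self):
        self.assertEqual(clamp(42, 0, 10), 10)

    def test_bounds_are_inclusive(self):
        self.assertEqual(clamp(0, 0, 10), 0)
        self.assertEqual(clamp(10, 0, 10), 10)

    def test_inverted_range_is_rejected(self):
        with self.assertRaises(ValueError):
            clamp(5, 10, 0)
```

Saved as, say, test_clamp.py, these run with "python -m unittest test_clamp". When a bug turns up, adding a test that reproduces it first is how you know it is really fixed.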

Prefer functions that only read their inputs. Some functions cause side effects (changes in state) on their arguments; such functions are hard to fully understand and test. If you need to create a new value from the inputs, prefer to read the inputs and return a new value. Many languages, like Python and Go, allow you to return multiple values and assign them to multiple variables:

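def read_foo_object(input_stream):
    """Reads 1 `Foo` object from `input_stream` (a readable file object). Returns the `Foo` and/or any errors."""
    # ...
    return foo, error

# Python 3 has no built-in file(); use open(), and the `with` block closes the file afterwards.
with open("foos.txt", "rb") as input_stream:
    foo, error = read_foo_object(input_stream)
if error is not None:
    # Oops: handle the error here.
    pass
else:
    # Use foo.
    pass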
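To make the difference concrete, here are two hypothetical ways to apply a discount to a list of prices. The first mutates its argument, so every caller sharing that list sees the change; the second only reads its input and returns a new list, which makes it easier to reason about and to test.

```python
def apply_discount_in_place(prices, rate):
    """Side-effecting version: overwrites each element of `prices` (a list of floats)."""
    for index, price in enumerate(prices):
        prices[index] = price * (1 - rate)


def apply_discount(prices, rate):
    """Read-only version: returns a new list and leaves `prices` unchanged."""
    return [price * (1 - rate) for price in prices]


original = [100.0, 50.0]
discounted = apply_discount(original, 0.25)
# original is still [100.0, 50.0]; discounted is [75.0, 37.5].
```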

Document each input and each output of each function. It’s the polite thing to do.
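For example, a docstring can state what each argument must be, what comes back, and which errors can be raised. The function parse_version below is hypothetical, and the Args/Returns/Raises layout is just one common convention (the one used in Google's Python style guide), not a requirement.

```python
def parse_version(version_string):
    """Parses a dotted version string such as "1.10.2".

    Args:
        version_string: A str of dot-separated non-negative integers,
            for example "1.10.2". Surrounding whitespace is ignored.

    Returns:
        A tuple of ints, for example (1, 10, 2). Tuples compare
        element-wise, so parsed versions sort correctly ("1.10" > "1.9").

    Raises:
        ValueError: If any component is not a non-negative integer.
    """
    parts = version_string.strip().split(".")
    if not all(part.isdigit() for part in parts):
        raise ValueError(f"not a dotted version string: {version_string!r}")
    return tuple(int(part) for part in parts)
```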

Names are documentation, and crucial to understanding. Object class names (and instance names!) should be simple nouns or noun phrases (Vehicle, EmergencyVehicle, SerializableSet, UtilityMuffin). Function/method names should be infinitive verb phrases (extract_juice, inspect_vehicle, is_purple, get_color, set_color). The names of arguments to functions should be complete words, not terse abbreviations.

Types are documentation, and crucial to understanding. Use the language’s type declaration system to ease the reader’s understanding of your code. (The reader might be you, 6 months from now.) If, like JavaScript and Python, your language does not require explicit type labels, you’ll have to emulate them in your documentation, or in optional annotations where the language supports them.
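Python does have optional type annotations (PEP 484, Python 3.5 and later). They are not checked when the program runs, but they document intent right in the signature, and static checkers such as mypy can verify them. This sketch reuses the example names from the naming advice above (Vehicle, is_purple); the fields and select_purple are assumptions for illustration. Writing list[Vehicle] in annotations needs Python 3.9 or later.

```python
from dataclasses import dataclass


@dataclass
class Vehicle:
    """A hypothetical vehicle record; the fields exist only for this illustration."""
    name: str
    color: str
    passenger_count: int


def is_purple(vehicle: Vehicle) -> bool:
    """Returns True if `vehicle` is purple."""
    return vehicle.color == "purple"


def select_purple(vehicles: list[Vehicle]) -> list[Vehicle]:
    """Returns the purple vehicles in `vehicles`, preserving their order."""
    return [vehicle for vehicle in vehicles if is_purple(vehicle)]
```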

Don’t bake in hidden parameters. Prefer this:

def compute_ema(values, starting_value, alpha=0.5):
    """Given `values` (any iterable), yields the running exponential moving average."""
    previous_value = starting_value
    for value in values:
        value = alpha * value + (1 - alpha) * previous_value
        yield value
        previous_value = value

to this:

def compute_ema(values):
    """Given `values` (any iterable), yields the running exponential moving average."""
    previous_value = values[0]
    for value in values:
        value = 0.5 * value + (1 - 0.5) * previous_value
        yield value
        previous_value = value

When you pull hidden parameters out into the call interface — potentially using default values for convenience, if the language supports that feature — you make the function more general and more useful to more callers.
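With starting_value and alpha in the interface, one function covers cases the hard-coded version cannot: a different smoothing factor, a start value other than the first element, and inputs that cannot be indexed, such as generators. A small usage sketch; it repeats the first version of compute_ema so the snippet runs on its own, and the readings are made-up numbers.

```python
def compute_ema(values, starting_value, alpha=0.5):
    """Given `values` (any iterable), yields the running exponential moving average."""
    previous_value = starting_value
    for value in values:
        value = alpha * value + (1 - alpha) * previous_value
        yield value
        previous_value = value


readings = [10.0, 20.0, 20.0]

# Default alpha=0.5, starting from 0.0.
print(list(compute_ema(readings, 0.0)))  # [5.0, 12.5, 16.25]
# Heavier smoothing: alpha=0.25 gives the history (0.75) more weight than each new value.
print(list(compute_ema(readings, 10.0, alpha=0.25)))  # [10.0, 12.5, 14.375]
# A generator works too; it could not be indexed with values[0].
print(list(compute_ema((x * 2 for x in range(3)), 0.0)))  # [0.0, 1.0, 2.5]
```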

Version Control

Commit to your revision control system early and often. This way, you can recover any previous working version, figure out exactly which change went wrong, and repair the damage much more easily.

Make each commit do one thing. Don’t commit 3 different changes in 1 commit. A commit log is much easier to understand and use when each commit, small or large, makes 1 meaningful change to your code than when commits mix unrelated changes in unrelated areas of the code.

Write meaningful and complete commit messages. Don’t use the -m option to “git commit ...”, because it’s hard to write complete sentences and (yes) paragraphs in your commit message from the command line. Instead, associate your favorite text editor with git (for example, through git’s core.editor setting), so that writing a full commit message is easy.

The top line of the commit message should be 1 concise sentence that would fit in the subject line of an email. (Many development teams set up a bot that sends emails for each commit.) The top line should summarize the change; if you can’t summarize the change in 1 line, the change may be too large for 1 commit.

Then there should be a blank line, followed by 0 or more paragraphs explaining the change in more detail. (These paragraphs might, for example, make up the body of an email message.) This is documentation for your future self and your teammates about what this change does, so be clear and complete (but not verbose). Prefer commit messages like this:

Ensure files are always closed when copy_music_files returns.

In some cases, the function copy_music_files would return before closing all files. This change fixes that.

to messages like this:

close feiles
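Parts of these conventions can be checked mechanically. Below is a minimal sketch of a checker for the form of a commit message: a non-empty first line, a length limit on it, and a blank second line. The 72-character limit is an assumption (many teams use 50 or 72), and check_commit_message is a hypothetical helper, not part of git. It could be called from a commit-msg hook, which git runs with the path of the file holding the proposed message as its single argument, aborting the commit if the hook exits with a non-zero status.

```python
# Assumed limit for the summary line; adjust to your team's convention.
MAX_SUMMARY_LENGTH = 72


def check_commit_message(message):
    """Checks the layout of `message` (a str): summary line, blank line, optional body.

    Returns a list of problem descriptions (str); an empty list means the form is OK.
    It does not judge whether the message is meaningful or spelled correctly.
    """
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["the summary (first) line is empty"]
    problems = []
    if len(lines[0]) > MAX_SUMMARY_LENGTH:
        problems.append(f"the summary line is longer than {MAX_SUMMARY_LENGTH} characters")
    if len(lines) > 1 and lines[1].strip():
        problems.append("the second line should be blank")
    return problems
```

A check like this only enforces form: the terse "close feiles" passes it. Making a message meaningful and complete is still up to the author.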