Asserting The Value Of Music

Musicians are like Republicans: Unaware that they are often acting against their own economic interest. Rare is the musician who insists on being paid as much as a plumber. Even rarer is the musician who understands the basics of contracts and copyrights.

Music fans are also like Republicans: Convinced that roads and clean water are magical and need not be paid for. Why pay when you can infringe on copyrights for free? Why buy CDs or vinyl when you can stream for free? Who cares if the musicians can’t eat and have no health insurance?

Now that I’ve gotten your attention with partially- or fully-warranted insults, I’ll challenge you to prove that I am wrong and an ass. :)

If you are a musician, take this vow with me: I vow not to play any gig unless it comes with a written contract and pays at least some amount of actual dollars.

If you are a fan, take this vow with me: I vow to pay for all the music I listen to more than once (thus allowing for try-before-you-buy browsing on Bandcamp and YouTube). I also vow to buy merchandise at any free (…“free”) performances I happen to go to.

Because I’m fortunate, I can afford to spend $15 per week on new (non-used — musicians don’t get paid when we buy used at Amoeba, even though buying used is efficient in other ways), independent music in DRM-free (Bandcamp, direct download, vinyl, CD) formats. I’ll blog (probably on Monday evenings usually) about all my discoveries so you can make sure I’m honest, and I’ll tweet with the hashtag #ValueMusic.

I hope you will value music too, and buy as much as your budget allows. Music doesn’t come from the Music Fairy, it comes from people who tend to have more dreams than food.

Posted in music, personal

Code Hygiene

Here are some random thoughts about how to organize code (on the small scale — organizing code at a large scale is a whole different thing). I’ve broken it up into 2 sections. This post is by no means complete; for a much more complete story, I highly recommend The Practice Of Programming by Kernighan and Pike.

Factoring (And Re-Factoring) Code

Make each function/method do one thing and do it well. This is the principle of compositionality: It’s easier to compose minimal, single-purpose functions to get an understandable complex solution than it is to bake all the complexity into 1 function. It’s also much easier to test minimal functions.
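Here is a minimal sketch of that idea. The function names are hypothetical, chosen only for illustration: three single-purpose functions, each easy to test alone, compose into a word counter.

```python
def split_words(text):
    """Returns the whitespace-separated words in `text` (a string)."""
    return text.split()


def normalize_word(word):
    """Returns `word` lowercased, with surrounding punctuation removed."""
    return word.strip(".,;:!?\"'()").lower()


def count_words(words):
    """Returns a dictionary mapping each item of `words` (any iterable) to its count."""
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


def count_normalized_words(text):
    """Composes the 3 functions above to count the normalized words in `text`."""
    return count_words(normalize_word(word) for word in split_words(text))
```

Each piece can be replaced or reused on its own; for example, a different `normalize_word` could be swapped in without touching the counting logic.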

Write unit tests for each unit (function/method). With tests, you can ensure that bugs are really fixed, and can later change the code with more confidence that you are not introducing new bugs. Writing tests seems like annoying, boring work at first, but you’ll be glad you did.
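As a rough sketch of what this can look like in Python, here is a small function with tests written using the standard library's `unittest` module. The function `clamp` is a hypothetical example, not something from a real project.

```python
import unittest


def clamp(value, low, high):
    """Returns `value` limited to the range [`low`, `high`]."""
    return max(low, min(value, high))


class ClampTest(unittest.TestCase):
    """Each test method checks 1 behavior of `clamp`."""

    def test_value_inside_range_is_unchanged(self):
        self.assertEqual(clamp(5, 0, 10), 5)

    def test_value_below_range_is_raised_to_low(self):
        self.assertEqual(clamp(-3, 0, 10), 0)

    def test_value_above_range_is_lowered_to_high(self):
        self.assertEqual(clamp(42, 0, 10), 10)
```

If this were saved in a file, running `python -m unittest` in the same directory should discover and run the tests, assuming the file name matches the default `test*.py` pattern.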

Prefer functions that only read their inputs. Some functions cause side-effects (changes in state) to their arguments; it is hard to fully understand and test such functions. If you need to create a new value from the inputs, prefer to read from the inputs and return a new value. Many languages, like Python and Go, allow you to return multiple values and assign multiple variables:

def read_foo_object(input_stream):
    """Reads 1 `Foo` object from `input_stream` (a readable file object). Returns the `Foo` and/or any errors."""
    # ...
    return foo, error

with open("foos.txt", "rb") as input_stream:
    foo, error = read_foo_object(input_stream)
if error is not None:
    pass  # Oops: report or handle the error here.
else:
    pass  # Use foo here.
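To make the difference concrete, here is a small sketch with 2 hypothetical functions that compute the same thing. The first changes its argument; the second only reads it and returns a new value.

```python
def append_total_in_place(prices):
    """Mutates `prices` (a list of numbers) by appending their sum. Returns nothing."""
    prices.append(sum(prices))


def with_total(prices):
    """Returns a new list: the numbers in `prices` followed by their sum.

    Does not modify `prices`.
    """
    return list(prices) + [sum(prices)]


original = [1, 2, 3]
extended = with_total(original)
# `original` is still [1, 2, 3]; `extended` is a separate list.
```

A caller of `with_total` never has to wonder whether the list it passed in is still the same afterward, which is what makes the function easier to understand and test.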

Document each input and each output of each function. It’s the polite thing to do.

Names are documentation, and crucial to understanding. Object class names (and instance names!) should be simple nouns or noun phrases (Vehicle, EmergencyVehicleSerializableSet, UtilityMuffin). Function/method names should be infinitive verb phrases (extract_juice, inspect_vehicle, is_purple, get_color, set_color). The names of arguments to functions should be complete words, not terse abbreviations.

Types are documentation, and crucial to understanding. Use the language’s type declaration system to ease the reader’s understanding of your code. (The reader might be you, 6 months from now.) If, like JavaScript and Python, your language does not have explicit type labels, you’ll have to emulate them in your documentation.
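Newer versions of Python (3.5 and later) support optional type annotations, which can serve the same purpose as type declarations for the reader. A minimal sketch, using a hypothetical function:

```python
from typing import Iterable, Iterator


def compute_running_total(values: Iterable[float], starting_value: float = 0.0) -> Iterator[float]:
    """Given `values`, yields the running sum, beginning from `starting_value`."""
    total = starting_value
    for value in values:
        total += value
        yield total
```

Python does not enforce these annotations at run time; they are documentation that a separate type checker can also verify.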

Don’t bake in hidden parameters. Prefer this:

def compute_ema(values, starting_value, alpha=0.5):
    """Given `values` (any iterable), yields the running exponential moving average."""
    previous_value = starting_value
    for value in values:
        value = alpha * value + (1 - alpha) * previous_value
        yield value
        previous_value = value

to this:

def compute_ema(values):
    """Given `values` (any iterable), yields the running exponential moving average."""
    previous_value = values[0]
    for value in values:
        value = 0.5 * value + (1 - 0.5) * previous_value
        yield value
        previous_value = value

When you pull hidden parameters out into the call interface — potentially using default values for convenience, if the language supports that feature — you make the function more general and more useful to more callers.
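Here is a short usage sketch of the explicit-parameter version (repeated so that the example runs on its own). The input numbers are made up for illustration.

```python
def compute_ema(values, starting_value, alpha=0.5):
    """Given `values` (any iterable), yields the running exponential moving average."""
    previous_value = starting_value
    for value in values:
        value = alpha * value + (1 - alpha) * previous_value
        yield value
        previous_value = value


# One caller accepts the default smoothing factor...
smooth = list(compute_ema([10, 20, 30], starting_value=10))

# ...while another wants a more responsive average, and passes an iterator
# rather than a list.
responsive = list(compute_ema(iter([10, 20, 30]), starting_value=0, alpha=0.75))
```

Note that the second call would not work with the hidden-parameter version at all: `values[0]` requires an indexable sequence, so that version rejects plain iterators and generators.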

Version Control

Commit to your revision control system early and often. This way, you can recover any previous working version, figure out exactly which change went wrong, and repair much more easily.

Make each commit do one thing. Don’t commit 3 different changes in 1 commit. It’s much easier to understand and use your commit log when it contains many commits (small or potentially large) that each make 1 meaningful change to your code, than when it contains commits (small or large) that make unrelated changes in unrelated areas of the code.

Write meaningful and complete commit messages. Don’t use the -m option to “git commit …”, because it’s hard to write complete sentences and (yes) paragraphs in your commit message from the command line. Associate your favorite text editor with git, which makes it easier to write full commit messages.

The top line of the commit message should be 1 concise sentence that would fit in the subject line of an email. (Many development teams set up a bot that sends emails for each commit.) The top line should summarize the change; if you can’t summarize the change in 1 line, the change may be too large for 1 commit.

Then there should be a blank line, followed by 0 or more paragraphs explaining the change in more detail. (These paragraphs might, for example, make up the body of an email message.) This is documentation for your future self and your teammates about what this change does, so be clear and complete (but not verbose). Prefer commit messages like this:

Ensure files are always closed when copy_music_files returns.

In some cases, the function copy_music_files would return before closing all files. This change fixes https://bugtracker.example.org/123.

to messages like this:

close feiles
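The layout rules above (a short top line, then a blank line, then paragraphs) are simple enough to check mechanically. Here is a rough sketch of such a check. The function name and the 72-character limit are my assumptions for illustration, not a git rule; and no script can tell you whether a message is meaningful.

```python
def find_commit_message_problems(message, max_subject_length=72):
    """Returns a list of problems (strings) found in `message`, a commit message.

    An empty list means no layout problems were found. `max_subject_length`
    is an assumed limit, not something git enforces.
    """
    lines = message.splitlines()
    if not lines or not lines[0].strip():
        return ["The message has no top line."]
    problems = []
    if len(lines[0]) > max_subject_length:
        problems.append("The top line is too long.")
    if len(lines) > 1 and lines[1].strip():
        problems.append("The second line is not blank.")
    return problems
```

For example, the good message above passes this check, but so does "close feiles": the check only covers layout.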

 

Posted in Hackbright, software

Privacy And Security Settings in Chrome

Chrome has a lot of handy privacy and security options, but it isn’t always obvious how to use them. In this post I’ll demonstrate my favorites, and try to explain a bit about what they do.

My goal with these configuration changes is to get Chrome to expose less attack surface to potentially malicious web pages, and to be less chatty on the network. I definitely can’t and don’t guarantee that they will work for you or solve any particular problem you have. But maybe you’ll find this to be a fun learning experience. (Also, although I work for Google on the Chrome Security team, I am not blogging in any official capacity, and I don’t have an omniscient view of Chrome security.)

Chrome has a feature that allows you to create multiple “profiles”, each with their own distinct settings. Because we want to change the settings in a way that will make some web sites work less well (or even not at all), we won’t want to be locked in that mode. Therefore, we need to create a new, distinct profile to use as the private/secure mode. That way, you can always go back to a regular profile easily, to get normal web functionality.

First, create a new profile:

Create a new profile.

After creating the new profile, you get a new window running that profile (note the cat icon in the upper right corner):

After creating a new profile.

In this privacy- and security-sensitive special profile, do not sign in to Chrome. Signing in to Chrome, also known as Chrome Sync, is a convenient feature that syncs all your settings across all your signed-in Chrome profiles on all your devices, and makes it easier to log in to Google services. You might like it in your regular mode profile, but we want this profile to be more loosely coupled to the cloud.

Go to the Settings page in the new profile’s window, and click on “Show advanced settings…” (shown here at the bottom):

Show Advanced Settings.

Scroll down to the Privacy section of the Settings page, and check or un-check the various options as you see fit. Here’s how I set them for this profile:

My preferred Privacy settings.

These options (except for Do Not Track) cause Chrome to send extra traffic on the network (some of that traffic is encrypted), so they are prime candidates for un-checking, especially if you intend to use Chrome with Tor. For more information, see the Chrome Privacy Whitepaper. (In particular, think carefully about disabling phishing and malware protection; see its section in the privacy whitepaper.)

Click on that Content settings… button here in the Privacy section, as well:

Block 3rd party data and clear all upon exit.

I’ve changed the Cookies and site data settings, as you can see: “Block third-party cookies and site data” means that when you are reading e.g. http://blog.example.com, an ad included in the page from http://ad-company.com cannot set new cookies or site data. “Keep local data only until I quit my browser” means that Chrome will clear the locally-stored data (like cookies and HTML5 LocalStorage) when you quit. (This is similar to, but not exactly the same as, what Chrome’s Incognito mode provides.)

Scroll down and you will see many more options for Content settings. I’ll highlight some that are particularly important. First, block JavaScript by default:

Block JavaScript by default.

However, you can optionally re-enable JavaScript in HTTPS pages:

Enable JavaScript on HTTPS page loads.

I like to do this so that I can get rich JavaScript functionality in web sites like Twitter and Gmail that go to the trouble of authenticating themselves (and their code) using HTTPS — but sites serving unauthenticated junk cannot run JavaScript. It’s interesting how many sites still work without JavaScript. (Sometimes they even work slightly better.)

Next, we disallow external protocol handlers, and we block all plug-ins:

Disallow external protocol handlers and block all plugins.

Important note about blocking plug-ins: The “Click to play” option means that plug-ins are disabled by default, but that you can (left-)click on their area on the screen to run them. However, that left-click is clickjackable. It’s better to select “Block all”, which is really “right-click to play” — yes, you can still run plug-ins when you want to. To run plug-ins, right-click on their screen area, which brings up a native-type (operating system) context menu, and select Run This Plug-in:

Run This Plug-in

Thus, you can be assured that plug-ins run only when you want them to.

Next, we disable location services and notifications:

Disable location services and notifications.

Disallow sites from taking over the mouse or capturing data from media sensors:

Disallow sites from taking over the mouse or capturing data from media sensors.

Turn off un-sandboxed plugins and don’t allow automatic downloads:

No un-sandboxed (NPAPI) plugins and no automatic downloads.

Do not remember passwords or form field entries:

Do not remember passwords or form field entries.

Tell Chrome not to auto-detect what language the page is in, to ask where to place each download, and not to fetch certificate revocation data:

Don't auto-translate, ask where to place each download, and don't fetch certificate revocation data.

Note that you can still use Google Translate by right-clicking on a page and selecting Translate to English (or whatever your native language is). Un-checking “Offer to translate…” disables the automatic language detection functionality.

We leave certificate revocation disabled by default because the protocol that does it can leak information about your browsing to a server.

Finally, visit chrome://plugins and affirmatively disable the ones you don’t need, for good measure:

Disable plug-ins.

Have fun!

Posted in software

Followup To Downloading Software Safely

I’ve received some emails, tweets, and Hacker News comments about my post Downloading Software Safely Is Nearly Impossible. Thanks for reading and I hope you got a kick out of my mumblings.

I’d like to address some of the comments and questions people had, as briefly as possible.

  • Yes, you need a trusted computing base (TCB). I alluded to this when I said “You’re pretty sure the NSA did not interdict it during shipment, and thus that it comes only with the flaky goatware Microsoft, Lenovo, and any number of Lenovo’s business partners intended for it to have.” Our goal as security engineers is to limit the size of the TCB. It is, after all, quite goaty already…
  • The TCB includes the set of X.509 trust anchors for our TLS library.
  • Yes, I harp on and on about HTTPS. That is because authenticating the delivery channel (while not necessarily sufficient to indicate code integrity) is the bare minimum effort we should require from our software sources, especially for software that is related to cryptography and security. Here is GnuPG’s bug tracker; would you want to log into it or report security-sensitive bugs using it?

    GnuPG's bug tracker: Not inspiring confidence.


  • Also note that there are at least 2 different problems with HTTPS in that post: HTTPS not being available, and the HTTPS site differing in contents from the HTTP site. Again, for a software distribution site, we’d like something that smells a bit better.
  • Some people claim that PGP keyservers don’t need to use HTTPS, because the keys authenticate themselves with the web of trust. And it’s true, the WoT does allow us to fairly easily distinguish this fake key from this real key for EFF’s Seth Schoen. But,
    • Seth is one of the most well-connected people in the WoT, so a key with only 1 signature stands out as odd. Would a fake key for a normal person stand out so well? The WoT is not as good an authentication mechanism as we might hope it to be. As nice as it is, verifying software packages based on PGP keys we grab from key servers is thus not a slam-dunk alternative to or replacement for HTTPS, and never mind the usability delta between HTTPS and PGP.
    • By now, we understand that metadata for communications is at least as valuable as the contents, in many cases. Shouldn’t PGP users have confidentiality in their directory lookups? Yes.
  • Yes, as many commentators noted, we should use something like Authenticode: binaries should be signed, and their signatures checked at run- or install-time. However, that still requires a TCB of code-signing trust anchors (the same companies that are your TLS trust anchors), and verifying the code authors is at least as difficult for users as verifying the authenticity of an HTTPS web origin. I.e., not super easy, but definitely better than nothing.
  • This is an extremely hard problem, no doubt about it. Although my post is very snarky and sarcastic, I don’t think it’s an easy problem. I also fight the problem uphill.
  • A big part of the solution is to isolate sources of code based on their cryptographic identity. This is how Android works, and it is how the open web works (when you use HTTPS or other authenticated origins). I’m not very knowledgeable about iOS, but I understand they also rely on code-signing and on sandboxing. If the isolation is strong, much of the risk is reduced — remember, a big part of my problem was that PuTTY (or any program) runs with the full privilege of my user account on the platform. Reduce the privilege, reduce the problem.
    • Of course, now the problem is exposing a privilege-granting UI to users so that applications can share with explicit approval. One size does not fit all, and that continues to be a hard secure UX problem.
  • Finally, I’m not really a fan of Web Crypto. I think more mistakes will be made with it than successes; it’s just that I also think that of native-code crypto. The problem does not lie with the implementation environment, but with the (often perverse) incentives developers have combined with the high level of expertise needed to use cryptography appropriately and well. Clearly, all the people who hope to use web crypto to replace TLS, implement DRM, achieve security against the server that sent the JS, implement homebrew challenge-response protocols, and so on are in for a heartbreak. But still, there are potentially good applications for cryptographic algorithms exposed to JavaScript, and native code does not have a privileged place in crypto. If anything, due to the lack of privilege separation in legacy platforms, native code is a worse place to put powerful code.
Posted in software

TrustyCon Recap and Video

Last Thursday we had a great time at TrustyCon, the trustworthy alternative to the RSA Conference. Many thanks are due to the conference sponsors and organizers: iSEC Partners, EFF, and DEFCON. My old iSEC boss Alex Stamos, and lots of EFF employees, put in a lot of volunteer hours to make the conference a success. Thank you!

I really like single-track conferences. Everyone has a shared experience, and there is much less of the “lobby-con” or “bar-con” phenomenon, which I don’t enjoy much. This only works if all the talks are good, and at TrustyCon they sure were. My favorite was Ed Felten’s talk that closed out the day. Annalee Newitz has a good write-up of it on io9.

The entire conference is on YouTube. My talk, co-presented with Dan Boneh, starts at about 4:33:00 — yes, 4 hours 33 minutes; the entire conference is 1 long video. The sound cuts out at the beginning but then it comes back quickly, don’t worry. Boneh’s topic, cryptographic software obfuscation, should amuse and disturb you equally. :)

Posted in personal, software

Downloading Software Safely Is Nearly Impossible

Let’s say you have a brand-new Windows laptop and you’re just oh, so happy. You’re pretty sure the NSA did not interdict it during shipment, and thus that it comes only with the flaky goatware Microsoft, Lenovo, and any number of Lenovo’s business partners intended for it to have. Now all you need is an SSH client so that you can connect to your Linux machines, and all will be peachy. Here is how to get an SSH client.

  1. Do a web search for [ windows ssh client ].
  2. Follow the first hit to http://www.putty.org/. Now, since you want to get the good and true PuTTY that Simon Tatham wrote, and not some unauthenticated malware, you check for the lock icon and the “https://” URL scheme. It’s not there — worrying, considering that Tatham is supposedly an encryption software developer.
  3. No need to worry, though; putty.org is not even owned by Tatham. It’s currently owned by someone named “denis bider”, who presumably just likes to domain-squat on other people’s product names and provide links. OK. Let’s follow the link to…
  4. http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html. Ahh, this has Tatham’s name right in the path part of the URL, so… wait, is that good? Actually, no; only the hostname can indicate site ownership. Richard Kettlewell currently owns greenend.org.uk.
  5. Look for, and fail to find, the lock icon and the “https://” URL scheme. Again, shouldn’t cryptography and security software — like all software — be delivered always and only via an authenticated service?
  6. Manually add the “https://”. Note that the site does not respond to HTTPS. Begin to doubt that this is the right site.

    PuTTY is not available via HTTPS.


  7. Not to worry! Scroll down and note that Tatham offers links to RSA and DSA cryptographic signatures of the binaries, e.g. http://the.earth.li/~sgtatham/putty/latest/x86/putty.exe.RSA. Note that earth.li is currently owned by Jonathan McDowell. When you click the link to the signature, you do indeed get an RSA signature of something, but there is no way to know for sure who the signer was or what they signed — any attacker who could have compromised the site to poison the executable PuTTY programs (or performed a man-in-the-middle attack on your connection to the site) could also just as easily have compromised the signatures.
  8. Attempt to download the signature via HTTPS instead, https://the.earth.li/~sgtatham/putty/latest/x86/putty.exe.RSA, and note that the server responds with a 404. Become increasingly suspicious.

    Is this a bad sign? It feels bad.


  9. Take a breather to read Tatham’s explanation of how overly-complex his signing infrastructure is, but not why the delivery channel is anonymous.
  10. Briefly wonder if Tatham’s PGP keys are noted in a central registry, such as MIT’s PGP key server. Nope.
  11. Briefly wonder if it matters that MIT’s PGP key server is unauthenticated.

    The MIT key server is unauthenticated.


  12. Recall that even if you could get Tatham’s PGP key from an authenticated key server, you’d still need to download a PGP program. Rather than repeat the steps in this tutorial for GnuPG, give up and decide to download an unauthenticated copy of PuTTY.
  13. Note that Tatham refers you to http://www.pc-tools.net/win32/freeware/md5sums/ for an MD5 calculator for Windows, and briefly consider at least checking the anonymous (hence useless) MD5 digest for PuTTY. Noting that http://www.pc-tools.net also does not respond to HTTPS, forgo that waste of time.
  14. Having downloaded putty.exe, think long and hard before clicking on it. Note that when you execute it, it will run with the full privilege of your user account on this Windows machine. It will have the ability to read, delete, and modify all your documents and emails, and will be able to post your porn collection to Wikipedia.
  15. Hope that it does not.
  16. Click on putty.exe anyway. Connect to your account on your Linux server, which is now also under the control of an unauthenticated program from the internet. Consider that, if the download was not poisoned, this thing calling itself “PuTTY” was written by a developer who might know how to implement RSA in C, but who does not know how or why to use RSA. (Are you even connected to your real Linux server, at this point? Hard to know.)
  17. Note that, suddenly, Web Crypto is starting to look damn good despite the objections of the native code chauvinists. At least JavaScript runs under the same origin policy and is sandboxed by Chrome’s multi-process model, so it wouldn’t have the full run of your Windows user account.
  18. Despair.
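The check in steps 2 through 5 (look at the scheme and the hostname, and ignore the path) can be sketched in a few lines. The helper names here are hypothetical; `urlsplit` is from Python's standard library.

```python
from urllib.parse import urlsplit


def describe_origin(url):
    """Returns (scheme, hostname) for `url`.

    Only these parts, and not the path, say anything about who serves the page.
    """
    parts = urlsplit(url)
    return parts.scheme, parts.hostname


def uses_https(url):
    """Returns True if `url` uses the HTTPS scheme."""
    return urlsplit(url).scheme == "https"
```

Given the download page's URL, `describe_origin` returns `http` and `www.chiark.greenend.org.uk`; the `~sgtatham` in the path plays no part. Note that an `https` scheme in a URL string is only the start: the server still has to answer and present a valid certificate.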
Posted in software

Maps And Their Applications

This morning I was hanging out with my Hackbright mentee, and we discussed how one of her programming problems could be solved using a Python dictionary or JavaScript object. In fact, you can use a dictionary in lots of ways. Here are some:

  • As a set (an unordered group of elements, each element appearing only once). For example, you can get the unique elements of a list or array by collapsing them into the keys of a dictionary.
  • As the underlying storage for the fields of a dynamic object. “Dynamic object” here means “an object that gets fields added or removed”. E.g. in JavaScript, you can add new fields and values at any time to an object, while in Java or C the fields of an object are static and unchanging once defined. (And, for that reason, accessing the fields of a Java or C object can be done in a much faster way.)
  • As a sparse array.
  • As the storage for a memoized function (which can, in turn, be a way to optimize expensive functions, including expensive recursive functions). With Python decorators, you can easily memoize any function, and the canonical way to do that is with a dictionary.
  • Many other cache applications.
  • Data compression.
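A few of the uses above can be sketched with plain Python dictionaries. The function names are hypothetical, and this is an illustration rather than production code.

```python
import functools


def unique_elements(elements):
    """Returns the unique items of `elements`, using dictionary keys as a set."""
    seen = {}
    for element in elements:
        seen[element] = True
    return list(seen)


def memoize(function):
    """Returns a version of `function` that caches its results in a dictionary.

    The cache is keyed by the positional arguments, which must be hashable.
    """
    cache = {}

    @functools.wraps(function)
    def memoized(*arguments):
        if arguments not in cache:
            cache[arguments] = function(*arguments)
        return cache[arguments]

    return memoized


@memoize
def fibonacci(n):
    """Returns the nth Fibonacci number, where fibonacci(0) is 0."""
    return n if n < 2 else fibonacci(n - 1) + fibonacci(n - 2)


# A sparse array: only the non-zero positions take up storage.
sparse_array = {3: 1.5, 1000000: 2.5}
value_at_7 = sparse_array.get(7, 0.0)
```

Without the decorator, the recursive `fibonacci` takes exponential time; with it, each value is computed once. In real code, the standard library's `functools.lru_cache` does the same job.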

For the rest of this post I’ll use the term map to refer to what various languages/APIs/sources in the literature call dictionaries, hashes, “objects” (only? in a JavaScript context), associative arrays, symbol tables, or tables. A map is any data structure that groups a dynamic number of key-value pairs together, and allows us to retrieve values by key, to insert new key-value pairs, and to update the values associated with keys. We almost always require that the data structure allow us to do these operations very quickly. (After all, we’re going to be using them all the time to solve lots of problems!)

Most, but not all, languages come with some kind of map interface built-in. Notably, the C language does not (but C++ does).

You can implement a map in many ways. Here are some examples:

  • A simple linked list of pairs. This will be slow (O(n)), and insufficient for general use.
  • A hash table. This is generally very fast (roughly O(1)), although it requires you to have a good hash function for your key type. Most languages allow you to provide a custom hash function for your object types (e.g. the __hash__ method in Python, or the hashCode method in Java).
  • A binary search tree. Fast (O(lg n)). Unlike in a hash table, the keys will be ordered. This could be a useful property.
  • A skip list.
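As an example of the custom hash function mentioned in the hash table item, here is a minimal sketch of a Python class whose instances can be used as map keys. The `Point` class is hypothetical.

```python
class Point:
    """A 2-D point that can be used as a key in a dictionary."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Objects that compare equal must also hash equal.
        return isinstance(other, Point) and (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Delegate to the hash of a tuple of the same fields used by __eq__.
        return hash((self.x, self.y))


labels = {Point(0, 0): "origin"}
```

A lookup with a different but equal object, such as `labels[Point(0, 0)]`, finds the entry because `__hash__` and `__eq__` agree. Keys should not be modified while they are in the map.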

Note that although a general-purpose map is useful for many problems, it is not always ideal for a particular problem. For example, although you can use a map as a set, it’s a bit of a waste of space — it’s a set of key-value pairs, but you only need the key. Similarly, maybe you need not only to take the unique elements from a group of elements, but also to print them out in sorted order. If your dictionary type is implemented as a hash table, you’ll have to sort the keys (at an additional cost of O(n lg n) plus the memory allocation to store the keys in an array). By contrast, if your dictionary is implemented as a binary search tree, after you are done inserting all the elements, they’ll already be sorted. (The trade-off is that it might have cost more to insert the elements. When in doubt, test!)

Because one size does not fit all, many languages provide a variety of map and set implementations. For example, C++ has map, set, unordered_map, and unordered_set. The Java Map interface has many implementations. Other languages, like Python, Perl, Ruby, and JavaScript, provide just one map implementation — usually a hash table of some kind. Some, like Python, also provide a distinct set type or API, and you should use it when it fits your needs.
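In Python, for instance, the unique-and-sorted problem described earlier is a one-liner with the built-in set type. This is only a sketch, and the function name is hypothetical.

```python
def sorted_unique(elements):
    """Returns the unique items of `elements` (any iterable) as a sorted list."""
    # The set removes duplicates; sorted() then pays the O(n lg n) sorting cost.
    return sorted(set(elements))
```

Because Python's set is hash-based, the sort is a separate step, just as described above for hash tables.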

It’s a good exercise to implement a map yourself, in at least one way, in at least one language. I recommend starting with a simple hash table, and then working up to a good binary search tree like a red-black tree.
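To give a feel for the starting point, here is a minimal sketch of a hash table that resolves collisions by chaining. The class and method names, the initial bucket count, and the resize rule are all my assumptions for illustration; real implementations are considerably more careful.

```python
class HashTable:
    """A minimal map: a list of buckets, each a list of [key, value] pairs."""

    def __init__(self, bucket_count=8):
        self._buckets = [[] for _ in range(bucket_count)]
        self._size = 0

    def _bucket_for(self, key):
        """Returns the bucket (a list) in which `key` belongs."""
        return self._buckets[hash(key) % len(self._buckets)]

    def get(self, key, default=None):
        """Returns the value stored for `key`, or `default` if there is none."""
        for stored_key, value in self._bucket_for(key):
            if stored_key == key:
                return value
        return default

    def put(self, key, value):
        """Inserts `key` with `value`, or updates the value if `key` is present."""
        bucket = self._bucket_for(key)
        for pair in bucket:
            if pair[0] == key:
                pair[1] = value
                return
        bucket.append([key, value])
        self._size += 1
        # Keep buckets short so that lookups stay roughly O(1).
        if self._size > 2 * len(self._buckets):
            self._resize(2 * len(self._buckets))

    def _resize(self, bucket_count):
        """Redistributes all pairs into `bucket_count` fresh buckets."""
        old_pairs = [pair for bucket in self._buckets for pair in bucket]
        self._buckets = [[] for _ in range(bucket_count)]
        for key, value in old_pairs:
            self._bucket_for(key).append([key, value])

    def __len__(self):
        return self._size
```

Deleting keys and iterating over the pairs are left out; adding them is a good next step before moving on to a tree.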

Posted in Hackbright, software