Philip Potter

Open for extension?

Posted on 23 September 2014

Suppose you’re writing Java, but because you don’t like pain all that much you’re using Google’s Guava library to make managing collections easier. You have a class which needs to return a concatenation of two lists, so you use Guava’s Iterables.concat() static method:

public class CertificateStore {
    private List<Certificate> myCertificates;
    private List<Certificate> partnerCertificates;

    public Iterable<Certificate> getCertificates() {
        return Iterables.concat(myCertificates, partnerCertificates);
    }
}

Now a consumer wants to add these certificates to an existing collection. So they try to use the Collection.addAll() method:

List<Certificate> certs = new ArrayList<>();
certs.add(otherCertificate);
certs.addAll(store.getCertificates()); // ERROR!

The problem is that Collection.addAll() takes a Collection argument, not an Iterable. The frustrating thing is that addAll() isn’t doing anything which requires a Collection instead of an Iterable: it’s just iterating through the elements of the collection, adding each one in turn.

This is almost certainly because Collection was introduced in Java 1.2, but Iterable was only introduced in 1.5.

The functionality we want is implementable using only public methods of Collection, so you could write your own static method to do it, as Guava have with Iterables.addAll(Collection,Iterable). Then the consuming code would look like this:

List<Certificate> certs = new ArrayList<>();
certs.add(otherCertificate);
Iterables.addAll(certs,store.getCertificates()); // okay now

This works, but uses a very different syntax. Guava’s extension to Collection just looks different. It’s not a first-class usage of Collection. It’s not obvious at a glance that certs is the thing being modified here.

What we ended up doing instead was this:

public class CertificateStore {
    private List<Certificate> myCertificates;
    private List<Certificate> partnerCertificates;

    public List<Certificate> getCertificates() {
        return ImmutableList.<Certificate>builder()
            .addAll(myCertificates)
            .addAll(partnerCertificates)
            .build();
    }
}

By returning a List instead of an Iterable, we could use Collection.addAll() directly, resulting in clean, consistent code for the consumer, at the cost of slightly more verbose (and slightly less efficient) code on the producer side.


What this example demonstrates is that being open for extension isn’t just about what’s possible. It’s also about making extensions look just like the code they are extending. Users will know the core language features and libraries; code that doesn’t resemble those will add a roadblock to readability.

The obvious fix to this problem is to allow people to extend classes with their own method, a practice known in the Ruby community as monkey-patching. This is undesirable for a number of reasons: methods are not namespaced, so if two different consumers of a library create a method with the same name, they will conflict; further, monkey-patching often allows access to private internals of a class, which is not what is wanted here.

Let’s imagine what the example would look like using Clojure’s protocols. Suppose I am using a Collection protocol, with an add function but no addAll function at all:

(defprotocol Collection
  (add [c item])
  ;;... other methods
  )

Now I want to add all elements from a Java Iterable to my collection. If I wrote the code out in full, it might look like this:

(def certs (create-list)) ;; implements Collection protocol
(add certs other-certificate)
(doseq [cert (seq (get-certificates store))]
  (add certs cert))

If I want to define a function to encapsulate this iteration, it might look like this:

(defn add-all [coll items]
  (doseq [item (seq items)]
    (add coll item)))

And now, my consuming code looks like this:

(def certs (create-list))
(add certs other-certificate)
(add-all certs (get-certificates store))

There’s no visual difference between the core protocol function add, and my extension function add-all. It’s as if the protocol has gained an extra feature. But because my extension is an ordinary function, it is namespaced. If we add explicit namespaces in, it might look like:

(ns user (:require [collection.core :as c]
                      [iterable.core :as i]))

(def certs (c/create-list))
(c/add certs other-certificate)
(i/add-all certs (get-certificates store))

Namespaces are a honking great idea! And in this case, my extension is namespaced, so even if someone else does the same (or a similar but incompatible) extension, there won’t be a name collision.

Clojure’s protocols aren’t unique in being open for extension in a transparent way. Haskell’s typeclasses and the Common Lisp Object System (CLOS) both also exhibit this feature. Being open for extension isn’t enough, if your extensions look like warts when placed next to core code. Extensions should be as beautiful as the code they extend.