Philip Potter

  • Open for extension?

    Posted on 23 September 2014

    Suppose you’re writing Java, but because you don’t like pain all that much you’re using Google’s Guava library to make managing collections easier. You have a class which needs to return a concatenation of two lists, so you use Guava’s Iterables.concat() static method:

    public class CertificateStore {
        private List<Certificate> myCertificates;
        private List<Certificate> partnerCertificates;
    
        public Iterable<Certificate> getCertificates() {
            return Iterables.concat(myCertificates, partnerCertificates);
        }
    }
    

    Now a consumer wants to add these certificates to an existing collection. So they try to use the Collection.addAll() method:

    List<Certificate> certs = new ArrayList<>();
    certs.add(otherCertificate);
    certs.addAll(store.getCertificates()); // ERROR!
    

    The problem is that Collection.addAll() takes a Collection argument, not an Iterable. The frustrating thing is that addAll() isn’t doing anything which requires a Collection instead of an Iterable: it’s just iterating through the elements of its argument, adding each one in turn.

    This is almost certainly because Collection was introduced in Java 1.2, but Iterable was only introduced in 1.5, by which point the signature of addAll() could not be changed without breaking every existing implementation of Collection.

    The functionality we want is implementable using only public methods of Collection, so you could write your own static method to do it, as Guava have with Iterables.addAll(Collection, Iterable). A minimal sketch of such a method (not Guava’s actual source, but the same shape) might look like this:
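
    public static <T> boolean addAll(Collection<T> addTo, Iterable<? extends T> elementsToAdd) {
        // All we need from the collection is its public add() method:
        // walk the iterable, adding each element in turn, and report
        // whether the collection changed as a result.
        boolean changed = false;
        for (T element : elementsToAdd) {
            changed |= addTo.add(element);
        }
        return changed;
    }
    

    Then the consuming code would look like this: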

    List<Certificate> certs = new ArrayList<>();
    certs.add(otherCertificate);
    Iterables.addAll(certs, store.getCertificates()); // okay now
    

    This works, but the syntax is jarringly different. Guava’s extension to Collection doesn’t read as a first-class usage of Collection, and it’s not obvious at a glance that certs is the thing being modified here.

    What we ended up doing instead was this:

    public class CertificateStore {
        private List<Certificate> myCertificates;
        private List<Certificate> partnerCertificates;
    
        public List<Certificate> getCertificates() {
            return ImmutableList.<Certificate>builder()
                .addAll(myCertificates)
                .addAll(partnerCertificates)
                .build();
        }
    }
    

    By returning a List instead of an Iterable, we could use Collection.addAll() directly, resulting in clean, consistent code for the consumer, at the cost of slightly more verbose (and slightly less efficient) code on the producer side.
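
    With getCertificates() returning a List, the consuming code from the start of the post now works unchanged:

    List<Certificate> certs = new ArrayList<>();
    certs.add(otherCertificate);
    certs.addAll(store.getCertificates()); // okay: a List is a Collection
    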


    What this example demonstrates is that being open for extension isn’t just about what’s possible. It’s also about making extensions look just like the code they are extending. Users will know the core language features and libraries; code that doesn’t resemble those will add a roadblock to readability.

    The obvious fix to this problem is to allow people to extend classes with their own methods, a practice known in the Ruby community as monkey-patching. This is undesirable for a number of reasons: methods are not namespaced, so if two different consumers of a library create a method with the same name, they will conflict; further, monkey-patching often allows access to the private internals of a class, which is not what is wanted here.

    Let’s imagine what the example would look like using Clojure’s protocols. Suppose I am using a Collection protocol, with an add function but no addAll function at all:

    (defprotocol Collection
      (add [c item])
      ;;... other methods
      )
    

    Now I want to add all elements from a Java Iterable to my collection. If I wrote the code out in full, it might look like this:

    (def certs (create-list)) ;; implements Collection protocol
    (add certs other-certificate)
    (doseq [cert (seq (get-certificates store))]
      (add certs cert))
    

    If I want to define a function to encapsulate this iteration, it might look like this:

    (defn add-all [coll items]
      (doseq [item (seq items)]
        (add coll item)))
    

    And now, my consuming code looks like this:

    (def certs (create-list))
    (add certs other-certificate)
    (add-all certs (get-certificates store))
    

    There’s no visual difference between the core protocol function add and my extension function add-all. It’s as if the protocol has gained an extra feature. But because my extension is an ordinary function, it is namespaced. If we add explicit namespaces in, it might look like this:

    (ns user (:require [collection.core :as c]
                       [iterable.core :as i]))
    
    (def certs (c/create-list))
    (c/add certs other-certificate)
    (i/add-all certs (get-certificates store))
    

    Namespaces are a honking great idea! And in this case, my extension is namespaced, so even if someone else does the same (or a similar but incompatible) extension, there won’t be a name collision.

    Clojure’s protocols aren’t unique in being open for extension in a transparent way: Haskell’s typeclasses and the Common Lisp Object System (CLOS) exhibit the same property. Being open for extension isn’t enough if your extensions look like warts when placed next to core code. Extensions should be as beautiful as the code they extend.

  • Some gluten-free baking recipes

    Posted on 13 September 2014

    Just baked this afternoon: two gluten-free baking recipes. Both turned out amazingly, so I had to document them for posterity!

    Gluten-free nankhatai

    These are almond biscuits spiced with a heavy dose of cardamom.

    This is based on this recipe. The original is awkward because the ingredients are measured in cups, whereas I much prefer to measure by weight, particularly for baking.

    Ingredients:

    • 50 g gram flour
    • 100 g ground almonds
    • 10 g cardamom pods
    • pinch of salt
    • 75 g icing sugar
    • 110 g butter
    • almonds to garnish

    For this recipe, I bought flaked almonds (ie thinly sliced almonds) and chopped them up in a food processor. This resulted in a delightfully irregular texture in the final product.

    Grind the cardamom with a mortar & pestle, sift with the gram flour and add the ground almonds and salt. Cream the butter and sugar together, then mix in the dry ingredients and knead. Put in the fridge for 15 minutes.

    Take the dough out, make into 16 small balls and put onto a greased baking tray. These balls will spread out during cooking so give them plenty of elbow room! Put a flake of almond on each and press gently. Put the baking tray back into the fridge for a further 15 minutes.

    Bake at 150℃ for around 20 minutes, taking care to ensure they don’t turn brown. Place on a cooling rack for 10 minutes. They will still be very soft when you take them out of the oven, but they will firm up as they cool.

    [Photo: the nankhatai, crowded together on their baking tray]

    As you can see, we were stuck with a baking tray that was far too small, and our nankhatai soon invaded each other’s personal space. It didn’t stop them being delicious though!

    Gluten-free banana bread

    For this, we followed this recipe but with two substitutions:

    • instead of cherries and sultanas, we used figs
    • instead of rice flour, we used gram flour

    It worked an absolute treat – would definitely do again. It was good to have two recipes to use up the gram flour.

  • Being a developer on support

    Posted on 31 August 2014

    One of the things that gets talked about a lot in the devops world is that developers should support the applications that they build. However, something I’ve seen is developers getting severe culture shock from their first experience of support. Support work is a totally different style of work from ordinary development, and there’s often little or no explanation or instruction as to what you should be doing and how you should do it.

    The first problem: interruptions

    Ordinary development work eschews interruptions. You work on hard problems that often don’t have immediately obvious solutions (or worse, have immediately obvious wrong solutions). You may need to go through several iterations of design before finding one that solves the problem adequately. You hold a lot of context in your head, so an interruption can be incredibly disruptive.

    Support work is made of interruptions. You will get interrupted by alerts going off, integration partners emailing, people warning you of upcoming deploys, people needing help with something, people wanting reports taken from the production systems. In between this, you will want to try to get other work done, such as increasing the signal/noise ratio of the alerting system, cleaning up code, automating support processes, or updating support documentation. Which leads us on to:

    The second problem: multitasking

    Ordinary development work focuses on a single task at a time. Humans are bad multitaskers, particularly when it comes to complex tasks like development work. The context you store in your head when working on a feature gets lost when switching to another task.

    In support work, multitasking is a fact of life. If a production alert goes off while you’re updating documentation, you have to stop writing and start fixing. When the alert is resolved, you may then have to schedule a post-mortem, before finally getting back to the documentation you were writing. Many of your tasks will require a small amount of effort, but will block on something external – such as sending off a certificate signing request to a CA, or waiting for a release to be signed off by the release manager before deploying it. In my experience, you’ll have three to six tasks in progress at any one time.

    When on support, I’ve seen developers try to work on features in between interruptions. For the reasons listed above, this is a bad idea. Development work and interrupt-driven work do not mix well. Development work requires a lot of context to be held in the head at once, and interruptions and task-switching are disproportionately more disruptive to it than to other kinds of work.

    The third problem: juggling priorities

    In my experience as a developer, priorities are decided for us at our morning standup meeting. Everyone gets assigned to work on a story card for the day. If you finish a card, you pick the next one up off the backlog. It’s nice and simple, it requires almost no effort, and you don’t have to do it that frequently, so you can focus on writing code. There’s normally a product manager or business analyst whose job is worrying about priorities so that you can worry about getting work done.

    In support work, new tasks can appear throughout the day: as user tickets, production alerts, or requests from colleagues. They will appear when you are in the middle of something else. Some of them will be less important than the task you are currently working on, so need to be captured and filed for later. Some will be more important, and so you will need to drop your current work to field the new task. You will need to make frequent snap decisions about whether or not to park the current task to work on the new urgent task, or to just capture and file it for later that day. Making frequent, rapid prioritization decisions – often with your head still in the context of some other task – is an alien experience to someone used to more typical development work.

    The solution

    If you want to have a good time as a developer on support, you are going to need a system of some sort. You won’t be able to keep track of all incoming tasks in your head, so you need a way to capture them. You also need a way of keeping track of tasks that are in progress but blocked on some external dependency or parked while a higher-priority task is performed. Importantly, your system must be very lightweight: it should be optimized for speed of the most common operations: task capture, prioritization, and identifying work in progress and blocked tasks. Speed is far more important than extensive features. I recently worked for a 5-day period on the GOV.UK support rota, during which my system recorded that I got through around 25 individual tasks. If my system were too heavyweight, I might spend as much time on task capture and prioritization as on real work!

    Once you have rapid task capture and prioritization, then you may wish to think about capturing context: how much work has already been done on this task? Which source files need to be edited to fix this bug? Who am I waiting for a response from before I can continue working on this task? Not every task needs context recorded, but storing even a tiny amount of context can help manage a multitasking workload where tasks often get preempted by higher-priority incoming tasks.

    The simplest possible system is pen and paper. It supports efficient capture by scribbling a note down; it supports identifying work-in-progress by drawing a box by each item and ticking it when it gets done. Prioritization is not supported so well, as it’s not as easy to reorder items in a paper list as in some sort of digital form; but if the list doesn’t grow too big, this isn’t much of a problem.

    Example paper ticklist.

    There are plenty of digital systems available too. Some people on my team have been using Trello to track support work. This has the added benefit that you can share the same Trello board between multiple people: useful if you have two or more support workers on the rota concurrently. It is supremely lightweight – task capture, tracking, and prioritization are all operations which take seconds rather than minutes.

    Example Trello wall.

    I definitely wouldn’t use a more heavyweight project management tool such as JIRA for tracking tasks. Remember that you will normally be capturing new tasks as a result of an interruption while working on an existing task; the longer capture takes, and the more complex its workflow, the more context you will lose on what you’re currently working on. Task capture in Trello is extremely fast; in JIRA it’s painfully slow, taking you through multiple screens with screeds of options and requiring more focus to navigate. Your system needs to be fast and lightweight.

    Whatever system you end up with, it has to be one that works for you. Try something out, find out what works and what doesn’t, experiment with it by making changes, and keep reevaluating and iterating. For further ideas on evolving your system, the book Time Management for System Administrators is full of useful suggestions.

  • London emacs meetup, 20th May 2014

    Posted on 21 May 2014

    Last night was the second London emacs meetup. Here are my rough-and-ready notes, taken in org-mode, like all my notes.

    emacspeak: emacs for visually impaired people

    intro, @bodil & @dotemacs

    • welcome!
    • format:
      • one or two short talks
      • about things that are useful for all emacs users
      • then afterwards, gather around particular fields of interest

    talk: writing major modes (by @dotemacs)

    intro

    • last time, @bodil gave a talk
    • definition:
      • “encapsulating a set of editing behaviours”
    • book: GNU Emacs Extensions, O’Reilly
    • Yukihiro Matsumoto (matz)
      • “I started Ruby development with influence from emacs implementation”
      • “but as an emacs addict I needed a language mode”
      • “auto-indent was a must”
      • “back in 1993, there was no auto-indenting language mode for a language with such syntax”
      • “If I couldn’t make ruby-mode work, the syntax of ruby would have become more C-like”
    • Bob Glickstein & Scott Andrew Borton

    modes

    • two types: major & minor
    • (define-minor-mode ...)
      • keymap (specific for the minor mode)
      • variable <name>-mode
      • command called <name>-mode
        • toggles the mode
      • add it to a hook
      • (add-hook 'some-mode-hook 'your-mode)
    • major mode:
      • memorable name :)
      • hooks (<name>-mode-hook)
      • syntax table
        • defines how things should appear (highlighting)
      • entry function (<name>-mode)
      • syntax highlighting; two options:
        • optimized regexes
          • Q: what does “optimized” mean here?
            • eg matching keywords would mash together into a mega-regexp
        • use regexp-opt
          • takes a list of strings, and outputs a regexp which matches all of those things
          • aside: rx module compiles sexps to regexes
      • indentation
      • syntax table “behaviour” & movement
        • how you jump between functions, classes, & other constructs the language has
      • remove all buffer-local variables
      • set the variables
        • major-mode to <name>-mode
        • mode-name to string “name”
      • keymaps; two options:
        • sparse-keymap
          • suitable for up to half a dozen
        • define your own, otherwise
      • bind mode to files
        • (add-to-list 'auto-mode-alist '("\\.bar" . foo-mode))
      • run user defined hooks for the mode
        • defined with <name>-mode-hook
      • provide mode
        • (provide '<name>)
    • cookies
      • ;;;###autoload
    • “this sounds like a lot of work, why reinvent the wheel”?
      • cookie cutters
        • sample-mode (available on emacswiki)
        • derived-mode (part of emacs)
    • checkdoc
      • good tool to run once you have defined a mode
      • kind of a lint tool
    • ecukes
      • framework that allows you to write Cucumber-style tests for emacs
    • espuds
      • step definitions

    questions:

    • keybinding conventions?
      • Who gets to claim the C-c space?
      • it’s really complicated
        • there’s seven or eight layers
        • minor modes override major
      • C-x normally reserved by emacs
      • C-c C-<something> major mode generally
      • C-c <something> own use (shouldn’t be modes)
      • super and hyper of course
        • windows key & os x command
    • why do I get so many compilation warnings when installing modes through package-install?

    tangent: eshell

    • written by someone who was working on windows and didn’t have a decent shell
    • redirecting straight into a buffer is nifty
    • all interactive commands are bound as shell commands
    • Plan9 features built in

    tangent: testing

    • why use ecukes and espuds?
      • why can’t I just do my testing in a repl?
        • you’ll have to restart your emacs a lot because your tests will be messing with global mutable state
        • automating a test suite would make things more convenient than manually using C-x C-e
    • is there a good mode for editing these tests?
      • you could use cucumber-mode
      • is there a way to jump to the step-definitions?
    • is there something between cucumber and C-x C-e for testing?
    • is there a way to test keybindings without using ecukes?
      • ecukes has the most momentum (seemingly)
    • dash.el does tests nicely
      • the README has some examples which are the unit tests
      • see magnars/dash.el

    tangent 2: magnars emacsrocks talk

    ideas for birds-of-a-feather groups

    • highlighting (first)
    • multiple emacs woes (OS-supplied vs emacs 24)
    • eshell (dotemacs)
    • bulletproof emacs for the lightweight users (mickey)
    • sql in emacs (mickey)
    • org-mode (everyone later)
      • org-reveal?
    • haskell (later)
    • flymake/flycheck
      • jfdi (apart from java)
      • need to be aware of the lang-specific linter you need

    making the most of paredit/smartparens (bodil)

    from earlier discussion

    • “I made paredit work for python!”
    • comparison
      • paredit works out of the box
      • smartparens configurable to work in any mode
    • paredit for haskell!

    bodil’s config

    • bodil-smartparens (find it on github.com/bodil)
      • make smartparens behave as much like paredit as possible
      • turn on smartparens-strict-mode in your lisp mode hook
    • paredit’s M-<up> and M-r (splice-sexp-killing-backward and raise)
      • both useful for pulling expressions out of let bindings
    • bodil: I’d actually recommend paredit for lisps, but smartparens is useful for other langs
      • I’m actually using it for haskell
    • html tagedit – like paredit for html
      • syntax-aware killing
      • slurp and barf
      • magnar again :)
    • autoindent in curly-brace languages
      • ie when pressing RET in function(){|}
      • want to create new line indented, with curly brace on line following that
    • structural-haskell mode
      • sort of like paredit for haskell syntax

    eshell

    • interactive commands available as regular shell commands (eg find-file)
    • lots of commands replaced with emacs-friendly modes (eg man)
    • can work with tramp (eg cd /sudo::/etc; find-file passwd)
    • configuration goes in emacs.d/eshell
      • in particular aliases in emacs.d/eshell/aliases
    • buffer redirection
      • use C-c M-b to select and insert a reference to a buffer
      • for example:
        • echo foo >> #<buffer *scratch*>
    • plan 9 features

    tangent: webkit.el

    • takes an external webkit window and puts it on top of the appropriate emacs window

    tangent: fish

    • why fish?
    • sick of bash
    • completion is really cool

  • Refactoring systems

    Posted on 21 February 2014

    “Refactoring is the process of changing a software system in such a way that it does not alter the external behavior of the code yet improves its internal structure.” – Martin Fowler, in Refactoring

    As a developer who fell into operations work by accident, I often relate what I see in production systems to my experiences manipulating code in an editor. One thing I’ve been thinking about recently is refactoring. When I was working on Java day-to-day, I would routinely use IntelliJ IDEA’s refactoring support: extract method, extract class, rename method, move method. Refactoring was an intrinsic part of good design: you never hit the perfect design first try, so you rely on refactoring to gradually improve your code from the first slap-dash implementation.

    It actually took some time before I read the original refactoring book. If you haven’t read it, go read it now! Even 14 years after it was written, it’s still relevant. It describes how to do refactoring when you don’t have fancy refactoring tools: through a series of small, incremental changes, which at no point cause the code to break, you can evolve from a poor design towards a better design.

    If I have one criticism of the book, it’s that it doesn’t reach far enough. The book’s subtitle is “Improving the Design of Existing Code”; but I think it should have been “Improving the Design of Existing Systems”. The approach of making small, incremental changes to an existing system, improving the overall design while at no point breaking anything, is by no means limited to code. This is acknowledged by the 2007 book Refactoring Databases; but it also applies to other aspects of system design, such as DNS naming, network topology, and the mapping of functions to particular applications.

    The importance of maintaining a working system throughout a refactoring is high when developing, but it’s critical when modifying a production system. After all, when you’re developing code, the worst that can happen is a failing test, but the consequences in production are much more severe. And you don’t get any of IntelliJ’s automagical refactoring tools to help you.

    Sometimes you will need to make a deployment to production in between steps of a refactoring: perhaps to ensure you have introduced a new service name before configuring any consumers of that service to use it. As a result, refactoring whole systems can be slower and more laborious than refactoring code. Nevertheless, it’s the only way to improve the design of an existing system, short of building an entirely new production environment in parallel and switching traffic over to it.

    Here are some recent refactorings I have performed on production systems:

    Rename service

    Problem: the name of a service, as it appears in DNS, does not describe its purpose.

    Solution: introduce a new name for the service (for example, as a DNS alias such as a CNAME for the existing record). Gradually find each of the service’s consumers, and reconfigure them to use the new name. When the old name is no longer needed, remove it.

    Rename entire domain

    Problem: the name of a domain does not describe its purpose.

    Solution: introduce a new domain as a copy of the old. Modify the search domain of resolvers in that domain to search the new domain as well as the old. Find any references in configuration to fully-qualified domain names mentioning the domain, and change them to use the new domain. When no more consumers reference the old domain name, remove it from the search path, and finally remove the old domain entirely.
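
    On Linux hosts, the search domains typically live in /etc/resolv.conf; during the transition the relevant line might read something like “search perf.internal test.internal” (hypothetical domain names), so that shortnames resolve in either domain until the old one is retired.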

    I recently performed this dance to rename an entire environment from “test” to “perf”, to give it a more intention-revealing name. Using shortnames as much as possible, rather than fully-qualified names, made the job much easier. (There are other reasons for using shortnames: by standardizing on particular shortnames for services, you reduce the amount of environment-specific configuration needed.)

    Change IP address of service

    Problem: you want to shift a service from one public IP address to another.

    Solution: introduce a route from the new public IP to the service. Test the service works at the new IP – the curl --resolve option is very useful for testing this without having to modify DNS or /etc/hosts entries. Ensure any firewall rules which applied to the old IP address are copied to apply to the new address. When certain that the new IP address works, change your public DNS record from the old IP address to the new. Wait for the TTL to expire before using the old IP address for any other purpose. Finally, remove any stale firewall rules referring to the old address.
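
    For example, if the service is normally reached at service.example.com, a command along the lines of curl --resolve service.example.com:443:203.0.113.42 https://service.example.com/ (hypothetical hostname and address) sends a real HTTPS request to the new address while still presenting the right hostname for TLS and any virtual hosting.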

    This might seem like an odd thing to want to do, so here’s the context: in one of our environments, we had three public IP addresses listening on port 443. I wanted to expose a new service on port 443, but restricted at the firewall to only certain source addresses, so it couldn’t be a shared virtual host on an existing IP. One of our IPs was used by a reverse proxy with several virtual hosts proxying various management services – graphite, sensu, kibana. Our CI server, however, was occupying a public IP all to itself. If I created a new virtual host on our reverse proxy to front the CI server, I could serve it from the same IP address as our other management services, freeing up a public IP for a new service with separate access control restrictions.

    Merge sensu servers

    Problem: you have two sensu servers monitoring separate groups of clients, when you would rather have one place to monitor everything.

    Solution: choose one of the servers to keep. Reconfigure the clients of the other server to point at the original server. Remove the now-unused sensu server.

    Note: this refactoring assumes that the two servers have equivalent configuration in terms of checks and handlers, which in our situation was true. If not, you may need to do some initial work bringing the two sensu servers into alignment, before reconfiguring the clients.

    Remove intermediate http proxy

    Problem: On your application servers, you are using nginx to terminate TLS and reverse proxy a jetty application server. You wish to remove this extra layer by having jetty terminate TLS itself.

    Solution: Add a second connector to jetty, listening for https connections. Reconfigure consumers to talk to jetty directly, rather than talking to nginx. Once nothing is configured to talk to nginx, remove it.

    In our specific situation, we were using DropWizard 0.7, which makes it easy to add a second https connector. DropWizard 0.6 assumes that you have exactly one application connector, and it’s either http or https, but not both. We have some apps that are running DropWizard 0.6; our refactoring plan for them involves first upgrading to DropWizard 0.7, followed by repeating the steps above.
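
    For illustration, the relevant fragment of a DropWizard 0.7 YAML config with both connectors might look something like this (a sketch; the ports, path and password are hypothetical):

    server:
      applicationConnectors:
        - type: http
          port: 8080
        - type: https
          port: 8443
          keyStorePath: /etc/pki/app.keystore   # hypothetical keystore location
          keyStorePassword: changeit            # hypothetical password
    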

    It’s not a catalog, it’s a way of thinking

    The original refactoring book presented refactoring as a catalog of recipes for achieving certain code transformations. This is a fantastic pedagogical device: by showing you example after example, you can immediately see the real-world benefit. Then once you’ve seen a few examples of refactoring, it starts to become natural to come up with more. The same tricks turn up again and again: introduce new way of doing things, migrate consumers from old way to new way, remove old way. This general scheme applies to all sorts of refactoring from simple method renames to system-wide reconfiguration.

    Data makes everything harder

    The easiest part of the system to refactor is the application server. Given a good load balancer, a properly stateless application with a good healthcheck, and a good database, you can create a completely new design of application server, add it to the load balancer pool, test everything still works correctly, and remove the old design.

    Refactoring the system at the data layer is much harder. Migrating data from one DBMS to another is painful. Splitting a single database server shared between two applications into two database servers is painful. For certain types of migration, the ability to put the system into read-only mode might be necessary (or at least incredibly helpful).

    You won’t get the design right first time

    The reason that refactoring is important is that you won’t get the design right first time. You will inevitably need to make changes to accommodate new information. And so when you wistfully imagine the system as you’d like it to be, you have to discover the small steps which will get you there from the system you currently have. The way to reach a truly great design is to start with an okay design and evolve it.

    I don’t claim any of this is new

    I’m sure that these strategies have been used by operations people for years, without calling it “refactoring”. My main point is that the activities of code refactoring and system refactoring are based on the same underlying principles, of identifying small changes which do not change external behaviour in order to improve internal structure.