Philip Potter

  • Why does GPG need so much entropy?

    Posted on 14 March 2013

    Today I have a case of conflicting advice from different tools.

    I’m trying to generate GPG keys on virtual machines. The usual way of generating cryptographically secure keys is to use a good entropy source, such as /dev/random. However, this source gets most of its entropy from physical devices attached to the machine, such as keyboards, mice, and disks, none of which are available to a VM. Therefore, when generating GPG keys, it’s quite common to find yourself starved of entropy, with a message like:

    We need to generate a lot of random bytes. It is a good idea to perform
    some other action (type on the keyboard, move the mouse, utilize the
    disks) during the prime generation; this gives the random number
    generator a better chance to gain enough entropy.
    
    Not enough random bytes available.  Please do some other work to give
    the OS a chance to collect more entropy! (Need 280 more bytes)
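
    As an aside, you can watch the starvation happen: on Linux the kernel exposes its current entropy estimate, so a quick check on the VM is:

    # number of bits of entropy the kernel currently thinks it has
    cat /proc/sys/kernel/random/entropy_avail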

    However, things got more curious when I looked at the manpage for /dev/random (which you can find with man 4 random):

    While some safety margin above that minimum is reasonable, as a guard against flaws in the CPRNG algorithm, no cryptographic primitive available today can hope to promise more than 256 bits of security, so if any program reads more than 256 bits (32 bytes) from the kernel random pool per invocation, or per reasonable reseed interval (not less than one minute), that should be taken as a sign that its cryptography is not skilfully implemented.

    I did a bit of reading into the meaning behind this. The argument goes something like this:

    • there is no cryptographic primitive available today which can guarantee more than 256 bits of protection
    • the attack we’re trying to protect against by using true entropy rather than a pseudorandom number generator (PRNG) is that somebody guesses the seed value
    • if guessing the seed value is at least as hard as just brute-forcing the key, then your PRNG is not detracting from the security of the system.

    Breaking a GPG key requires less than 256 bits’ worth of effort: I don’t remember the details, but a 2048-bit GPG key takes something in the region of 100-200 bits’ worth of effort to crack, since the attack is a prime factorization.
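
    (As a back-of-the-envelope sketch of where such figures come from: the best known attack against a 2048-bit RSA modulus n is the general number field sieve, with heuristic running time

    L_n\left[\tfrac{1}{3}\right] = \exp\!\left(\left(\left(\tfrac{64}{9}\right)^{1/3} + o(1)\right)(\ln n)^{1/3}(\ln \ln n)^{2/3}\right)

    which for a 2048-bit modulus works out to a little over 2^110 operations, hence the commonly quoted figure of roughly 112 bits of security for 2048-bit RSA, comfortably below 2^256.)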

    So, why is GPG greedily asking /dev/random for 280 bytes of entropy, when all it conceivably needs is 32? I’m not sure, and I’d be delighted to learn, but it seems that /dev/random and GPG fundamentally disagree on what the contract is between them. What this means for me as a user, however, is that GPG is massively gorging itself on entropy from my entropy-starved VM, which means it takes forever and a day to generate GPG keys on a VM.

    Interestingly, OS X implements its /dev/random device differently; it uses Schneier, Kelsey and Ferguson’s Yarrow algorithm, which operates on a similar basis to the argument above: once you have gathered a certain minimum amount of true entropy, you can use it as the seed to a PRNG which feeds cryptographic key generators with no loss of security in the system. That means that once it has gathered its initial 256 bits (or whatever) of entropy, OS X’s /dev/random will continue generating random bits effectively forever, making it a much better source of randomness on a VM.

    PS: Instead of brute-forcing the seed, there is a potential alternative attack against the PRNG, which is that someone finds a way to predict the PRNG output with much less computational effort than brute-force guessing the seed. But this is much the same kind of attack as “someone finds a problem with AES” or “someone finds a problem with GPG” — ie we presume our cryptographic primitives are good because no known attack against them has been discovered, not because we are able to prove that no attack is possible. Using true entropy instead of a PRNG guards against attacks against your PRNG, but you still need to worry about attacks against your crypto algorithm if you’re being that paranoid. IOW, I don’t think GPG’s strategy here is the right tradeoff.

  • Interview with John Allspaw on continuous delivery

    Posted on 01 October 2012

    I just finished watching this great interview with John Allspaw on Devops and Continuous Delivery. John Allspaw is SVP of Tech Operations at Etsy.

    It’s worth watching the full talk, but here are some of the things I took from it:

    Gradually approaching continuous delivery

    “We may deploy 20 times a day, but we wouldn’t deploy 20 times a day if we went down 20 times a day. The only reason we got to 20 times a day, is that the first time we deployed 5 times a day, it worked out.”

    I have nothing to add to this.

    Devops does not mean one big team

    I particularly liked the question “What’s the role of operations in an organization that wants to practice devops?” There is an idea floating around that devops means that there should no longer be separate development and operations teams — and while there is a lot of merit in forming cross-functional teams, this doesn’t necessarily mean that we can (or should) do away with operations entirely.

    Certainly, product-focussed teams should be taking on a lot of what was traditionally operational responsibility — but they don’t have to take it all on. For example, John describes this process at Etsy as freeing the operations team from “reactive work” — eg deployments — and allowing them to focus instead on “proactive work” — eg designing infrastructure.

    Continuous delivery and database migrations

    I have previously been a developer, writing Java code and using dbdeploy to migrate the database schema in line with deploying a new version of code which requires a schema change. I have also been on a more operational team, deploying other people’s ruby code to production using capistrano, whose deploy:migrations task handles database migrations in sync with an application deployment. Database migrations have always made me nervous — they are in general irreversible, and the database is such a core part of the system that a failed migration can be disastrous to try to recover from.

    John Allspaw has worked at places that deploy 50 times a day. He has fielded the question “If you deploy 50 times a day, how do you change the database 50 times a day?” The answer is quite simple: you don’t.

    He instead describes his previous experience at Flickr, where frequent code deployments were enabled by separating code and database deployment. The database is migrated maybe once a week, and the schemas are in place before the code that needs to use those schemas is deployed.

    This is one of those obvious-in-hindsight revelations. Working with Clojure has previously encouraged me to look for entangled concerns in code and to decompose code into simple pieces. John’s solution to the database migration problem is the same approach in a different sphere — we want to deploy frequently, but database deployment is risky. Ergo, we should decouple code deployment from database deployment.


    That’s what I got from the talk, but he spoke about a whole bunch more topics beyond that. Give it a watch!

    Link again: John Allspaw on Devops and Continuous Delivery

  • SSL_do_handshake errors with nginx and haproxy

    Posted on 26 September 2012

    A short post about a problem we were having.

    If you are load balancing https traffic with haproxy in tcp mode, and you are fronting this with nginx, and you get 502 errors accompanied by these SSL errors in nginx’s error log:

    SSL_do_handshake() failed (SSL: error:1408C095:SSL routines:SSL3_GET_FINISHED:digest check failed)

    then you need to turn off the proxy_ssl_session_reuse option:

    proxy_ssl_session_reuse off;

    By default, nginx tries to reuse ssl sessions for an https upstream; but when HAProxy is round-robining the tcp connections between different backends, the ssl session will not be valid from one tcp connection to the next.
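
    For context, a minimal sketch of the relevant nginx configuration (the address and port are placeholders, not our real setup):

    server {
        listen 80;

        location / {
            # haproxy (in tcp mode) listens here and hands each new
            # connection to a different ssl-terminating backend
            proxy_pass https://127.0.0.1:8443;

            # so don't try to resume an ssl session across connections
            proxy_ssl_session_reuse off;
        }
    }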


    UPDATE: @zaargy points out that the development branch of haproxy has https support. Awesome!

  • Specifying dependencies in rspec-puppet

    Posted on 28 June 2012

    This doesn’t seem to be specified anywhere in the rspec-puppet documentation, so I thought I’d leave it here for the moment. Suppose you have a puppet type which always depends on another:

    define foo () {
        file { "/etc/${name}": }
        Bar[$name] -> Foo[$name]
    }

    If you want to write an rspec-puppet unit test for this, it will fail because it can’t find the resource Bar[$name], unless you define it as a precondition:

    describe 'foo', :type => :define do
        let(:title) { 'my-foo' }
        let(:pre_condition) { 'bar { "my-foo" }' }
        it { should contain_file('/etc/my-foo') }
    end
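
    Assuming the spec lives in the conventional spec/defines/foo_spec.rb location, and that rspec comes from your bundle, you can run it with:

    bundle exec rspec spec/defines/foo_spec.rb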

  • Rpm, Ruby, and Bundler

    Posted on 06 June 2012

    I was on a project recently where we wanted to deploy a Ruby Sinatra application to a CentOS 6.2 production environment. Our means of distributing software to all our environments was RPM – we took our sinatra app, packaged it into an RPM, and stuck it in production. Installing all our software via RPM has certain advantages:

    • On a production system, all installed files belong to some RPM. This means that any given file in production can be traced, via the RPM it belongs to, back to the Jenkins job which created it and the source code version which defined it (see the sketch after this list).
    • We were using puppet for configuration management, and puppet has good support for installing RPM packages via the package resource type. This resource type ensures idempotency and lets us roll versions of software forward or backward with confidence.
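
    To make the first point concrete, tracing a file back to its package is a one-liner on the target machine (the path and package name here are illustrative):

    # which package owns this file?
    rpm -qf /usr/lib/node-api/config.ru
    # show that package's metadata: version, release, build host, build date...
    rpm -qi node-api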

    Any nontrivial ruby application will want to depend on some gems, and ours was no different. We used bundler to manage our gem dependencies. This carries its own advantages:

    • We don’t have to care about transitive dependencies. Bundler pulls them in for us.
    • Conversely, we can lock down the transitive dependencies we’ve pulled in by using Gemfile.lock and checking our gems into vendor/cache.
    • Our gems are isolated from those belonging to other applications, so different apps can use different versions of the same gem in safety.
    • Bundler is quite capable of managing different sets of gems for build, test, and deployment. This means we can also control which version of rake, rspec, rack-test etc will be used to build and test our application in CI.
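
    A sketch of the kind of Gemfile this implies (the gem list is illustrative rather than our exact one):

    source 'https://rubygems.org'

    # gems needed at runtime in production
    gem 'sinatra'
    gem 'passenger'

    # gems only needed to build and test the app in CI
    group :test do
      gem 'rake'
      gem 'rspec'
      gem 'rack-test'
    end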

    We were also following the advice of vendor everything — we were running bundle package to download gem files to vendor/cache and checking them into source control. This practice means:

    • CI doesn’t need to talk to rubygems.org to build our app. We have faster and more reliable builds.
    • We are firmly locked to a particular set of gems. You might think that Gemfile.lock does this, but you’d be wrong. Gemfile.lock doesn’t by default carry specific versions of transitive dependencies, only constrained versions. For example, we use the passenger gem, which pulls in fastthread. Our Gemfile.lock has a dependency on fastthread (>= 1.0.1), meaning that we don’t know specifically which version of fastthread will be used. By vendoring everything, we know exactly which version of which gem is used by any given version of our source code, because it is the version saved in vendor/cache.
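
    Mechanically, the vendoring step is just (assuming git is the source control in question):

    # download every gem in the bundle, transitive dependencies included,
    # as .gem files under vendor/cache
    bundle package
    # check the cached gems in alongside the Gemfile and Gemfile.lock
    git add Gemfile Gemfile.lock vendor/cache
    git commit -m "Vendor gem dependencies"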

    However, we quickly hit a number of issues with bundler which made it difficult to package up our RPM satisfactorily:

    • Bundler wants to be run on the target machine at deployment time, not on the CI server at build time
    • Bundler has lots of implicit configuration in .bundle/config and in environment variables

    RPM and Bundler’s competing installation practices

    In order to get bundler to install the gems that we have previously stored in the vendor/cache directory, we need to run bundle install --deployment. The --deployment option combines all sorts of desirable options for a production environment:

    • Bundler normally installs gems to the system gem path. In deployment mode, they are installed to vendor/bundle instead. This provides isolation from gems used by other apps on the same machine.
    • Bundler normally will update Gemfile.lock if you have made changes to your Gemfile. In deployment mode, this is considered an error, as it indicates that the checked-in gems do not match those specified in the Gemfile.

    A fundamental question we had was: at what point in the build/test/deploy process should we run bundle install --deployment? The bundler docs are pretty clear about this: all of the deployment examples run bundle install on the target machine; the bundle install overview page says of --deployment: “Do not use this flag on a development machine.”, though it offers no reason why. (The man page says: “it will cause in an error when the Gemfile is modified”, but it doesn’t say why this will happen. For that, see the next section.)

    Conversely, the philosophy of RPM is pretty clear too: bundle install --deployment should not be run on the target machine, because it creates a vendor/bundle directory which does not belong to any RPM. This means that when we uninstall or upgrade the RPM, the vendor/bundle directory will be left behind, potentially poisoning the bundle for future versions of the app. We could add a %preun script in our RPM specfile to remove the bundle and the .bundle/config file, but it’s a hack. What we really want is to deploy our gems into their final locations on the CI server, and package them up into an RPM.
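
    For completeness, the %preun hack would have looked something like this (we didn't ship it):

    %preun
    # remove the files bundler created outside of RPM's control, so they
    # don't poison the bundle for the next version of the app
    rm -rf /usr/lib/%{name}/vendor/bundle
    rm -f /usr/lib/%{name}/.bundle/config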

    It seems that bundler and RPM have competing design principles, so they don’t want to play nicely together.

    Bundler’s implicit state

    Bundler also has a confusing habit of implicitly creating and storing all sorts of state. There are two main culprits here: the .bundle/config file, and environment variables.

    The .bundle/config file, which lives in the same place as the Gemfile, is the reason that you shouldn’t run bundle install --deployment on a development system. Bundler will save state to this file about the installation that it has done: location of installed gems, excluded groups, whether or not the gemfile is frozen, etc.

    Bundler also sets up some environment variables which mean that bundler is not reentrant. We came up against problems during our build process, where within our Rakefile we had the line:

    bundle install \
      --path %{buildroot}/usr/lib/%{name}/vendor/bundle/ \
      --deployment \
      --binstubs %{buildroot}/usr/lib/%{name}/vendor/bin/ \
      --without test

    If we ran the rakefile using plain old rake package, it would create our package with no issues. However, we want to use bundler to manage all of our gems — build, test and production dependencies. We want to use a bundler-provided rake, not a system-installed one. But if we ran rake using bundle exec rake package, it would fail with the following errors:

    $ bundle exec rake package
    # ... lots of output ...
    + bundle install --path /home/ppotter/src/node-api/BUILDROOT/node-api-0.0.3-9001.x86_64/usr/lib/node-api/vendor/bundle/ --deployment --binstubs /home/ppotter/src/node-api/BUILDROOT/node-api-0.0.3-9001.x86_64/usr/lib/node-api/vendor/bin/ --without test
    Could not find rake-0.9.2.2 in any of the sources
    Run `bundle install` to install missing gems.

    This is confusing — bundle install --deployment shouldn’t care about the rake gem, because in our Gemfile we’ve declared it in the test group, which we are excluding using --without test. Furthermore, the working directory for this command is /home/ppotter/src/node-api/BUILD/node-api, which is different from the directory where we are running bundle exec rake package, so any /home/ppotter/src/node-api/.bundle/config file which the outer bundler process has created should not conflict.

    The error occurs because bundler achieves much of its magic by setting various environment variables. To prevent the outer bundler instance — the one that runs rake — from interfering with the inner bundler instance — the one that installs our gems in deployment mode to the BUILDROOT directory — we need to unset those environment variables:

    env -u BUNDLE_GEMFILE -u BUNDLE_BIN_PATH -u RUBYOPT -u GEM_HOME -u GEM_PATH \
    bundle install \
      --path %{buildroot}/usr/lib/%{name}/vendor/bundle/ \
      --deployment \
      --binstubs %{buildroot}/usr/lib/%{name}/vendor/bin/ \
      --without test

    Modifying the .bundle/config file

    The .bundle/config file (which lives in the same place as the Gemfile) contains configuration which tells bundler where it has installed its gems. If we want to package our bundler-installed gems into an RPM, we need to also package .bundle/config so that bundler will know where the gems live on the target machine. Here is mine, after running the above command:

    ---
    BUNDLE_WITHOUT: test
    BUNDLE_FROZEN: "1"
    BUNDLE_BIN: /home/ppotter/src/node-api/BUILDROOT/node-api-0.0.3-9001.x86_64/usr/lib/node-api/vendor/bin/
    BUNDLE_PATH: /home/ppotter/src/node-api/BUILDROOT/node-api-0.0.3-9001.x86_64/usr/lib/node-api/vendor/bundle/
    BUNDLE_DISABLE_SHARED_GEMS: "1"

    This is clearly going to cause problems if we package this file as-is, because the gems are not going to live in these directories but instead in /usr/lib/node-api/vendor/bundle. We need to strip the leading BUILDROOT path from the directories in this file before we can package it. We do this with a sed script in the %install section of the RPM specfile:

    sed -i -e 's,%{buildroot},,' %{buildroot}/usr/lib/%{name}/.bundle/config
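
    After that substitution, the packaged file should read:

    ---
    BUNDLE_WITHOUT: test
    BUNDLE_FROZEN: "1"
    BUNDLE_BIN: /usr/lib/node-api/vendor/bin/
    BUNDLE_PATH: /usr/lib/node-api/vendor/bundle/
    BUNDLE_DISABLE_SHARED_GEMS: "1"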

    I’m sure that this is not the “bundler way” of doing things, but as I have said before, bundler’s and RPM’s worldviews are seemingly irreconcilable, and something like this is necessary to get them to work together.

    Architecture-specific code

    The process of installing gems can also install system-specific extensions which will not be as portable as pure ruby code, nor as portable as the source .gem files from the vendor/cache directory. This is another reason for recommending that you run bundle install --deployment on the target machine rather than in the build environment.

    RPM, however, also has a way of coping with this portability problem, by marking packages as architecture-specific. If you don’t specify an architecture yourself, rpmbuild will even autodetect any system-specific binaries in your RPM and give it an appropriate tag. We relied on this behaviour and sure enough, our resultant RPM is considered x86_64 code rather than noarch. This is fine for our production environment, where all machines run the same hardware and OS.
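
    You can check what rpmbuild decided by querying the built package directly (the filename here is illustrative):

    # print the architecture tag of the built rpm
    rpm -qp --queryformat '%{ARCH}\n' node-api-0.0.3-9001.x86_64.rpm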

    Installing bundler itself

    We still need bundler to be present on the target machine. We used the fantastic fpm tool to create a rubygem-bundler RPM, and made our node-api RPM depend on rubygem-bundler.
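
    The fpm invocation for that is roughly the following (fpm fetches the gem from rubygems and, by default, names the resulting package rubygem-bundler):

    # build an rpm of the bundler gem, named rubygem-bundler-<version>
    fpm -s gem -t rpm bundler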

    We could have used fpm to package every single gem as a separate RPM. We didn’t go with this option, because it doesn’t enable separation of sets of gems used by different applications, and it doesn’t allow us to know exactly which gems will be used by a particular source code version.

    Future possibilities

    Bundler 1.1 (which hasn’t yet been released) provides a --standalone option which allows you to install gems in such a way that they don’t depend on bundler. I’d be very interested to investigate this option for packaging ruby apps as RPMs, although since we were running our ruby through Phusion Passenger I wonder whether it would work for us, or if it only works for the bundler-created binstubs.

    Outcome

    After all of this work, we have a solution which combines the advantages of both RPM and bundler:

    • Every file we create on the target machine belongs to an RPM, and can be traced back to a particular Jenkins job which created that RPM
    • The gems our application uses are isolated from any other gems on the target machine
    • We can use bundler to handle transitive dependencies, while at the same time locking in the versions of all gems for a particular source code version. This means we have confidence that exactly the same gems were used in testing on the CI server as are used in production.