Dockerization pt 3: Deploybot

One of the key components to making a good integration between products is understanding the mental model of each product. What one product calls a “counter” another could call a “metric” or a “stat”, for example; or worse, one product could be reporting, say, the amount of free memory, while another is reporting the percentage.

The same goes for integration points between teams. When I built our previous release pipeline, I discovered very quickly that while developers were comfortable talking about repositories, operations thought in terms of applications, and neither knew nor cared how many moving parts went into a single app so long as it all ended up on a server together. The result was a system in which devs had to write a long, complex manifest explaining to ops exactly what code the night shift should deploy and when, and a number of errors came out of that process.

I knew if I had per-container deploy buttons in production, like I did for pre-production, we’d have similar issues. Developers want to push single containers out into test environments, but operations wants to deploy a single application, with as little fuss as possible, and a quick painless rollback if they notice problems. So when it came time to move to containers in production, I designed them a custom application to do just that. I called it Deploybot.

Rancher ships with a catalog of common apps that you might want to deploy in your container environment. It also comes with the ability to make your own catalogs. Each catalog entry consists of a docker-compose and a rancher-compose; the former is the standard format for spinning up a set of interdependent containers as a single unit, while the latter is a Rancher-specific extension that adds extra metadata to the containers once they’re spun up. In the GUI, the catalog makes it really simple to deploy a whole application at once: just a few clicks and you can have a running app within seconds. Furthermore, when a new version is released to the catalog, it’s only two clicks to upgrade to it, and only the containers that have changed get updated.

Not only does this let us quickly upgrade, it keeps a history of everything that’s ever been deployed. This means we can easily spin up a version in develop that is identical to this week’s production environment — or last week’s, or next week’s. Furthermore, we can roll back easily to a given deployment in the event of an emergency, and we have a record in the UI of every deploy that’s been done.

I determined quickly that the API would let me upgrade from the catalog if a new version had been pushed. From there, my biggest problem was logistics. Let’s say an app has four containers in it. At the time, we had four build plans, each of which built a single container and pushed it to Artifactory. When deploying one container at a time, we could easily use the Rancher API to push that container to our test environment. But for a catalog update, things were more complicated.

Step 1: Mark for Release

The first problem: how do we determine if a container has passed testing when the final phase of our testing is still manual? Artifactory has a system of properties on a given artifact; you can do a search using the API for only containers that have a given property. So I added a deploy target called “Mark for Release” after our development and test targets; when a given asset is ready to go to production, the lead developer for the team “deploys” to that “target”. Because I’m a masochist, I did this in node.js:

addLabel (args) { /* eslint-disable complexity, no-console */
    if (!args || !args.container || !args.version || !args.label || typeof args.value === 'undefined') {
        return Promise.reject(new Error('Incorrect parameters: please supply container, version, label, and value'));
    }
    console.log(`Adding label ${args.label}=${args.value} to ${args.container}:${args.version}`);

    const {container, version, label, value} = args;

    return request({
        method: 'PUT',
        uri: `${opts.uri}/artifactory/api/storage/docker-local/${container}/${version}?properties=${label}=${value}`,
        auth: {
            user: opts.user,
            pass: opts.password,
            sendImmediately: true
        }
    });
}

This is just a wrapper around a PUT request to Artifactory; docker-local is our local Docker repository, as you can tell by the oh-so-creative name. Later logic will pull the list of items with that label and find the highest-numbered container that’s ready to be released. At first, this worked great… right up until someone decided they wanted to fall back to an older version of a container and we had to manually remove the label from the newer one. Then I added guard conditions:

markForRelease (artif, args) {
    return artif.addLabel({
        container: args.container,
        version: args.ver,
        label: 'release',
        value: 'true'
    }).then(() => artif.getAllItemsByLabel('release', 'true'))
        .then((items) => artif.filterDockerResults(items))
        .then((containers) => {
            const promises = containers.map((item) => {
                const [, name, version] = artif.parseVersion(item);

                // Guard: only replace the current container, with versions higher than this one
                if (name !== args.container) return Promise.resolve();
                if (Number(version) <= Number(args.ver)) return Promise.resolve();

                return artif.addLabel({
                    container: args.container,
                    version: item.substring(item.lastIndexOf('/') + 1),
                    label: 'release',
                    value: 'false'
                });
            });

            return Promise.all(promises);
        })
        .catch((err) => {
            console.error(err);
            process.exit(1);
        });
}

After adding the label, the function fetches all the items carrying it. (filterDockerResults removes anything that’s not actually a container, because the layers of a container get labeled along with the container itself.) At that point, anything with a higher number is un-marked, so the highest-numbered container still carrying the label is the right one to deploy.
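
parseVersion itself isn’t shown in this post; as a rough sketch of the idea, assuming the filtered URIs end in /<container-name>/<version> (which matches how the version is sliced off above), it boils down to something like this:

parseVersion (uri) {
    // Hypothetical sketch only -- the real implementation may differ.
    // match[1] is the container name and match[2] is the version, so
    // `const [, name, version] = parseVersion(uri)` destructures cleanly.
    return uri.match(/\/([^/]+)\/(\d+)$/) || [];
}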

Step 2: Update catalog

The next problem is how to get the right version numbers and edit the catalog. To do that, I had to deep dive into the catalog format for Rancher.

Any given catalog is a git repository. Inside the repository is a series of folders, one per application. Each application folder contains a couple of descriptive files plus a series of version folders, one per release. Each version folder holds two files: a docker-compose and a rancher-compose. When you make changes, you’re meant to add a new folder with a new version number so that consumers can pull in the update at their leisure. When Rancher detects that there’s a newer version than the one you’re on, it displays a little icon in the UI to indicate that an upgrade is available.
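
Concretely, a catalog repository laid out that way looks roughly like this (the names are illustrative, and the descriptive files are things like a config.yml and an icon):

my-catalog/
  my-app/
    config.yml              <- descriptive metadata about the app
    0/
      docker-compose.yml
      rancher-compose.yml
    1/
      docker-compose.yml
      rancher-compose.yml
  another-app/
    ...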

What I needed was a way to take an existing configuration file, change the numbers of the specific containers, and make a new version with that updated information. I could have used Regular Expressions to do the parsing, but that just smelled like the wrong solution. So instead, I added a folder called “_template” to the list of versions. Rancher will ignore this, because it’s not a numbered version; furthermore, the files inside are “docker-compose.hbs” and “rancher-compose.hbs”, which Rancher does not think it can parse. However, my Node.JS application can read these in as Handlebars templates.

This means I can write a template like this:

version: '2'
services:
  nginx:
    image: artifactory:5000/nginx-fram:{{nginx-fram}}
    stdin_open: true
    volumes:
    - /var/log:/var/log
    tty: true
    labels:
      io.rancher.container.pull_image: always
      io.rancher.scheduler.affinity:container_label_soft_ne: io.rancher.stack_service.name=$${stack_name}/$${service_name}
      io.rancher.container.hostname_override: container_name
  fram:
    image: artifactory:5000/fram:{{fram}}
    stdin_open: true
    tty: true
    labels:
      io.rancher.container.pull_image: always
      io.rancher.scheduler.affinity:container_label_soft_ne: io.rancher.stack_service.name=$${stack_name}/$${service_name}
      io.rancher.container.hostname_override: container_name

You see the double curly braces? Those get replaced by the templating engine with the correct version number for that container name. This is easily accomplished: I hydrate the template with an object whose keys are container names and whose values are version numbers pulled from the Artifactory API, using functions like these:

getAllItemsByLabel (label, value) {
    return request(
        `${opts.uri}/artifactory/api/search/prop?${label}=${value}&repos=${opts.repository}`,
        {
            auth: {
                user: opts.user,
                pass: opts.password,
                sendImmediately: true
            }
        }
    ).then((response) => JSON.parse(response).results);
},
filterDockerResults (containers) {
    return new Promise((resolve) => {
        const uris = containers.map((item) => item.uri);

        resolve(uris.filter((item) => item.indexOf('sha256') === -1 && item.indexOf('manifest.json') === -1));
    });
},
getVersions (urls) {
    return new Promise((resolve, reject) => {
        if (!urls) {
            return reject(new Error('Could not get container versions from Artifactory'));
        }
        const retVal = {};

        urls.forEach((url) => {
            const [, name, version] = artifactory.parseVersion(url);

            if (retVal[name] && Number(retVal[name]) >= Number(version)) {
                return;
            }
            retVal[name] = version;
        });
        return resolve(retVal);
    });
},

When I compose these three functions together, they get all the containers that have been marked for release, remove anything that isn’t a real container, and return only the highest numbered one of each. This makes a neat little object to pass into the Handlebars engine. I then take the hydrated template, write it out to disk as a new catalog version, commit, and push. Voila!
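
The templating step itself is only a few lines of Node; here’s a minimal sketch (the function and path names are illustrative rather than the exact ones from my library):

const fs = require('fs');
const Handlebars = require('handlebars');

// Hydrate the _template compose files with the versions object from
// getVersions() and write them out as a new numbered catalog version.
function renderNewVersion (appDir, newVersionNumber, versions) {
    fs.mkdirSync(`${appDir}/${newVersionNumber}`);

    ['docker-compose', 'rancher-compose'].forEach((name) => {
        const template = Handlebars.compile(
            fs.readFileSync(`${appDir}/_template/${name}.hbs`, 'utf8')
        );

        fs.writeFileSync(`${appDir}/${newVersionNumber}/${name}.yml`, template(versions));
    });
}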

Step 3: Update Rancher

Now that there’s a new catalog version, I have to tell Rancher to update the stack to that version. First, I tell it to poll the catalog for updates. I do so with this neat little function (note the bonus fun comment):

refreshCatalog: function(environment) {
    log('Refreshing Rancher catalog');

    //This API is synch, so it won't return until the catalog is refreshed
    //...in theory
    //I'm pretty sure the above is a lie. A vicious, vicious lie. 
    //But it is the lie told to me by my ancestors, so I will repeat it.
    return request({
        uri: `${opts.url}/v1-catalog/templates?action=refresh`,
        auth: {
            username: opts.auth[environment].key,
            password: opts.auth[environment].secret
        },
        json: true// Automatically stringifies the body to JSON
    });
},

Since the function was definitely returning well before the version showed up in the catalog, I wrote a wait for this particular step:

waitForCatalog: function(catalogId, version, environment, stackName) {
    update('Waiting for catalog to show up', 'info', stackName, environment);
    let that = this;
    return new Promise((resolve, reject) => {
        //Wait for the new version to appear in the catalog
        let retries = opts.retries.catalog;
        let secondsBetweenRetries = 3;
        function checkCatalog() {
            that.refreshCatalog(environment)
            .then(() => request({
                method: 'GET',
                uri: `${opts.url}/v1-catalog/templates/DealerTire:${catalogId}`,
                auth: {
                    username: opts.auth[environment].key,
                    password: opts.auth[environment].secret
                },
                json: true
            })).then((body) => {
                let versionList = body.versionLinks;

                if (versionList[version]) {
                    log('Found version', 'info', stackName);
                    log('Catalog update took ' + (opts.retries.catalog - retries) * secondsBetweenRetries + ' seconds.');
                    return resolve();
                } else {
                    retries--;
                    log(`${retries} retries remaining`);
                    if (retries < 0) {
                        log('Did not find version in catalog. Check Bitbucket?');
                        return reject(new Error('Timed out waiting for version to appear in catalog.'));
                    }
                    setTimeout(checkCatalog, secondsBetweenRetries * 1000);
                }
            });
        }
        setTimeout(checkCatalog, 500);
    });
},

This is basically the same as the waits I mentioned in part 2, of course, just waiting for the version to show up in the catalog instead of waiting for the service upgrade to finish.

Once the catalog entry is showing up, I can submit a stack-level “upgrade” action. Since I knew I’d also have to do a “complete” action and a “rollback” action, I went ahead and genericized this function as well:

performStackAction: function (stackId, environment, action, body = {}) {
    return request({
        method: 'POST',
        uri: `${opts.url}/v2-beta/projects/${opts.projectIDs[environment]}/stacks/${stackId}?action=${action}`,
        body: body,
        auth: {
            username: opts.auth[environment].key,
            password: opts.auth[environment].secret
        },
        json: true
    });
},

Now, in the case of an upgrade action specifically, you need to pass the compose files into the API. I’m not sure what happens if they disagree; this catalog portion may not be strictly necessary if the passed-in configs are preferred. That said, we knew we wanted the record in the catalog, and it’s easy enough to hand it the same data we just wrote to disk a moment ago.

performStackUpgrade: function (stackName, environment, newnum, composes) {
    update(`Upgrading ${stackName} in ${environment} to version ${newnum}`, 'info', stackName, environment)
    return findStack(stackName, environment).then((body) => {
        if (body.data.length <= 0) {
            throw new Error(`Could not find stack ${stackName} in ${environment}`);
        }
        let stackId = body.data[0].id;
        let catalogId = getCatalogId(stackName);
        return this.performStackAction(stackId, environment, 'upgrade', {
            dockerCompose: composes.docker,
            rancherCompose: composes.rancher,
            externalId: `catalog://Dealertire:${catalogId}:${newnum}`
        }).then(() => stackId);
    });
},

The “externalId” key is what’s telling it that we’re doing a catalog upgrade. However, passing in blank composes results in nothing changing; it won’t read them out of the catalog to determine what to change. It’d be great if it did, though, hint hint.
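
When operations later hit “complete” or “roll back”, it’s the same generic action function with a different action name; something like this, where I’m assuming the stack-level action names finishupgrade and rollback from the v2-beta API:

// Finish or abandon an in-flight catalog upgrade. The action names here
// ('finishupgrade', 'rollback') are assumptions based on the v2-beta API.
completeStackUpgrade: function (stackId, environment) {
    return this.performStackAction(stackId, environment, 'finishupgrade');
},
rollbackStackUpgrade: function (stackId, environment) {
    return this.performStackAction(stackId, environment, 'rollback');
},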

We then wait for upgrade in much the same way we did when updating a single service, then report success back to the user. The overall sequence thus looks something like this:

return updStack(`Upgrading stack ${stack}...`, 'pending')
.then(() => gitLib.clone())
.then(() => gitLib.config())
.then(() => updStack('Creating new version...', 'pending'))
.then(() => artLib.getContainers())
.then((urls) => artLib.getVersions(urls))
.then((versions) => gitLib.update(catalogID, versions))
.then(retval => {releaseData = retval})
.then(() => gitLib.commit(`Release for ${stack} on ${releaseData.internalID}`, catalogID, releaseData.folder))
.then(() => updStack(`Created version ${releaseData.id}. Waiting for update...`, 'pending'))
.then(() => ranLib.refreshCatalog(env))
.then(() => ranLib.waitForCatalog(catalogID, releaseData.id, env))
.then(() => updStack('Performing upgrade...', 'pending'))
.then(() => ranLib.performStackUpgrade(stack, env, releaseData.folder, releaseData))
.then((id) => releaseData.stackID = id)
.then(() => updStack('Waiting for upgrade to complete...', 'pending'))
.then(() => ranLib.waitForActionCompleteStack(releaseData.stackID, env, 'upgraded', stack.display))
.then(() => updStack('Waiting for health checks...', 'pending'))
.then(() => ranLib.waitForHealthCheckStack(releaseData.stackID, env, stack.display))

“updStack” here is my function to send a message back through the websocket to the client; gitLib is my git library, artLib is the Artifactory library, and ranLib is the Rancher library. You have to wait for the health checks here or you’ll run into issues if you complete the upgrade right away, but again, that’s pretty straightforward.
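
For completeness, updStack itself is nothing fancy; here’s a sketch of the sort of thing it does, where the websocket connection (ws) and the message shape are illustrative rather than my exact implementation:

// Illustrative sketch: push a progress message for this stack back over the
// open websocket so the browser UI can update its status display.
function updStack (message, status) {
    ws.send(JSON.stringify({
        stack: stack.display,
        environment: env,
        status, // 'pending', 'success', or 'error'
        message
    }));

    return Promise.resolve();
}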

Deploybot

The interface to my final product, therefore, is really straightforward: a password prompt with AD authentication so only our Operations team can deploy, then three buttons per application. This screenshot was taken during the initial rollout; we have dozens of applications now, each with their own set of three buttons. Ops will do an upgrade, then leave the stack in the upgraded state while they confirm that the application is running correctly; when that’s done, they can complete or roll back, according to the test result. All the functionality for rollback and complete comes from Rancher; I’ve just exposed it here in a series of handy buttons to avoid them having to dig through the UI in an emergency or at 3 in the morning.

Is it overcomplicated? Yeah, totally. There’s a lot I plan to change in version 2.0, which will probably need to be tweaked when we upgrade Rancher and move to a Kubernetes-backed system. But it gets the job done, and after the first few weeks of finding edge cases and smoothing them out, it works like a charm. Sometimes that’s what you’re looking for in a devops world: a reliable, straightforward tool that gets the job done so you can get back to releasing new functionality.

Dockerization part 2: Deploying

Now that we have containers, we need to push them to our subprod environments so they can be tested. Bear with me, this is where things get a little complicated.

Docker Setup

Most people take the easy way out when they move to docker: they ship their containers to the cloud and let someone else manage the installation, upgrades, and maintenance on the docker hosts. We don’t do things the easy way around these parts, though, so we have our own server farm: a series of VMs in our datacenter. Everything below the VM is maintained by another team; my team is responsible for the software layer of the VM, and the containers that run on top. We have a handful of servers in our sub-prod environments, and then a handful more in our various production DMZs.

For management, most people seem to choose Kubernetes, but again, we don’t do things the easy way around here, so we went with a less popular product called Rancher. Now, Rancher is a management interface that can sit on top of a number of underlying technologies, including Kubernetes, but we chose to use their house-brand management system, called Cattle, instead. They were nice enough to give us a bunch of training in Docker, including the advice that forms the basis for their theme: if servers were pets, carefully maintained and fed over the years, containers should be like cattle, slaughtered and replaced as soon as they seem to be ill so they don’t infect the whole herd.

Rancher is a really great tool if you’re working in the GUI. It has the concept of an Environment (which we use to separate dev from QA from demo), which spans across one or more Hosts (the servers that run Docker and manage the containers). Inside the Environment are Stacks, which are a collection of related containers with a name. It also handles a lot of the networking between containers, as it comes with its own DNS for the internal container network so you can just resolve Stackname/ContainerName to find a given container in your Environment. You can upload a docker-compose.yml file to create a stack if you’re using Compose, and the extra metadata Rancher uses can be stored in a rancher-compose.yml that also can be uploaded when you make a stack.
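
That internal DNS is handy in code, too: a container can call a sibling service by name without caring which Host it landed on. For example, from Node code running in a container (the service name here is made up):

const request = require('request-promise');

// Within the same Stack, the service name alone resolves via Rancher's DNS;
// across Stacks you include the stack name as well.
request('http://api-service/health')
    .then((body) => console.log('api-service responded:', body))
    .catch((err) => console.error('health check failed:', err.message));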

Rancher running on my local machine, showing a project I have in progress

Deployment

Manual deployment is super easy in Rancher: create a new stack, add services, paste in the container name from our build step, and let it handle everything else. Moving between environments manually once it works in dev is also easy: download the compose files, then upload them into the next environment. But we’re doing CI/CD, and the developers are constantly asking how they can speed up their release schedule. How do we do this automatically?

There are two tools that come with Rancher that can help here. One is the extensive API; pretty much everything you can do in the GUI can be done via the JSON-based REST API. The other is the pair of command-line tools they produce: Rancher-compose and Rancher CLI. Since I was also trying to release quickly, I used the API for my initial round of deployment scripts; in a later post, I’ll talk through how I’ve begun to convert to using the CLI commands instead, as I feel they’re faster and cleaner.

For Bamboo, I needed something that could run in a Deploy Project and update the stack in a given environment. I decided to write a Node.JS script, because when all I have is a node-shaped hammer, every build script becomes a nail 😉 (actually, it was so our Node developers could read the script themselves). I didn’t do much special here, just your standard API integration using a promise-based architecture; however, this is a chunk of a bigger library I decided to write around Rancher, so you’ll see a lot of config options:

function findStack(stackName, environment) {
    return request({
        uri: `${opts.url}/v2-beta/projects/${opts.projectIDs[environment]}/stacks?name=${stackName}`,
        auth: {
            username: opts.auth[environment].key,
            password: opts.auth[environment].secret
        },
        json: true// Automatically stringifies the body to JSON
    });
}

function getContainerInfo(environment, stackName, containerName) {
    log(`Getting container info for ${containerName}`);
    return findStack(stackName, environment)
    .then((body) => request({
        method: 'GET',
        uri: `${opts.url}/v1/services/?environmentId=${body.data[0].id}&name=${containerName}`,
        auth: {
            username: opts.auth[environment].key,
            password: opts.auth[environment].secret
        },
        json: true // Automatically stringifies the body to JSON
    }));
}

function performAction(serviceId, action, environment, launchConfig, stackName) {
    update(`Performing action ${action} on service ${serviceId}`, 'info', stackName);
    return request({
        method: 'POST',
        uri: `${opts.url}/v1/services/${serviceId}/?action=${action}`,
        body: {
            'inServiceStrategy': {
                'batchSize': 1,
                'intervalMillis': 2000,
                'startFirst': true,
                'launchConfig': launchConfig
            }
        },
        auth: {
            username: opts.auth[environment].key,
            password: opts.auth[environment].secret
        },
        json: true
    });
}

    performServiceUpgrade: function (stackName, containerName, environment, image) {
        update(`Upgrading ${containerName} in stack ${stackName} in ${environment} to image ${image}`, 'info', stackName, environment)
        return getContainerInfo(environment, stackName, containerName).then((body) => {
            if (body.data.length <= 0) {
                throw new Error(`Could not find service ${containerName} in stack ${stackName} in ${environment}`);
            }
            let serviceId = body.data[0].id;
            let launchConfig = body.data[0].launchConfig;
            launchConfig.imageUuid = image;

            return performAction(serviceId, 'upgrade', environment, launchConfig, stackName)
            .catch((err) => {
                if ((err.statusCode == 422 || err.status == 422) && opts.retries.on422) {
                    log('Detected invalid state. Rolling back to retry.', 'info', stackName)
                    return performAction(serviceId, 'rollback', environment, launchConfig, stackName)
                        .then(() => this.waitForActionComplete(stackName, containerName, environment, 'active', stackName))
                        .then(() => performAction(serviceId, 'upgrade', environment, launchConfig, stackName));
                } else {
                    log('Detected error condition. Aborting', 'error', stackName)
                    throw err;
                }
            })
            .then(() => this.waitForActionComplete(stackName, containerName, environment, 'upgraded', stackName))
            .then(() => serviceId);
        });
    }

I want to call out the two lines that grab the existing launchConfig and swap in the new image, however, because they are a little strange. Rancher lets you update anything about a service using the same endpoint, which is kind of nice and kind of rough: I need to specify every single attribute of the service, or it’ll assume I meant to blank out that setting (rather than assuming I meant to leave it unchanged). To make this easier, I capture the existing launch configuration, change just the image it points at, and send the whole thing back.

Upgrading a service in Rancher is a two-step process: first, you upgrade, which launches a copy of the new container for every copy of the existing container, and then you “finish” the upgrade, which removes the old containers. This is so that if there’s a problem with the new container, you can issue a “rollback” action, which turns the old containers back on and removes the new ones — much faster than trying to pull a fresh copy of the old container back. However, this means sometimes you’ll be trying to upgrade while it’s in an “upgraded” state, waiting for you to finish or roll back. When that happens, Rancher issues a status code 422. My library optionally rolls back and issues the upgrade action again if it encounters this state.
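
Finishing is just another POST to the same action endpoint once you’re happy with the new containers. Roughly, assuming the finish action is exposed as finishupgrade (which is what the Cattle API calls it):

// Sketch: once the upgraded containers look good, tell Rancher to clean up
// the old ones, then wait for the service to settle back into 'active'.
finishServiceUpgrade: function (stackName, containerName, environment) {
    return getContainerInfo(environment, stackName, containerName)
        .then((body) => request({
            method: 'POST',
            uri: `${opts.url}/v1/services/${body.data[0].id}/?action=finishupgrade`,
            auth: {
                username: opts.auth[environment].key,
                password: opts.auth[environment].secret
            },
            json: true
        }))
        .then(() => this.waitForActionComplete(stackName, containerName, environment, 'active'));
},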

The hardest part was figuring out when Rancher was actually done upgrading. Some of our images are huge, particularly the ones that contain monoliths we’re still in the process of breaking up; it can take several minutes for these containers to download and start up. Eventually, I settled on a polling-based strategy:

waitForActionComplete: function(stackName, containerName, environment, desiredState) {
    update('Waiting for upgrade to complete', 'info', stackName, environment);
    return new Promise((resolve, reject) => {
        //Wait for the service to finish upgrading
        let retries = opts.retries.actionComplete;
        function checkState() {
            getContainerInfo(environment, stackName, containerName).then((body) => {
                let container = body.data[0];
                log('Current state: ' + container.state);

                //Check if upgrade is done
                if (container.state == desiredState) {
                    log('Action complete');
                    return resolve();
                } else {
                    retries--;
                    if (retries < 0) {
                        return reject(new Error('Timed out waiting for action to complete'));
                    }
                    log(`${retries} left, running again`);
                    return setTimeout(checkState, 1000);
                }
            });
        }
        setTimeout(checkState, 500);
    });
}

This will keep running the checkState function until either the container’s state enters the desired state, or it runs out of retries (configured in the config for the library). I’ve had to tune the number of retries several times; right now, for our production deploy, it’s something outrageous like 600.
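
For reference, the config object (opts) these snippets lean on looks roughly like this; only the keys used above are shown, and the values are placeholders:

// Illustrative config shape for the Rancher library; values are placeholders.
module.exports = {
    url: 'https://rancher.example.com',
    projectIDs: {
        dev: '1a5',
        qa: '1a7'
    },
    auth: {
        dev: {key: 'ACCESS_KEY', secret: 'SECRET_KEY'},
        qa: {key: 'ACCESS_KEY', secret: 'SECRET_KEY'}
    },
    retries: {
        actionComplete: 600, // polling attempts before giving up on an upgrade
        catalog: 30,         // polling attempts while waiting for a catalog version
        on422: true          // roll back and retry when Rancher returns a 422
    }
};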

This library is called from a simple wrapper for Bamboo’s sub-prod deploys; for production, however, I got a lot trickier. Stay tuned for that write-up next week!

How to force Bamboo to build on Linux

So let’s talk about build servers for a minute. I manage the company’s Bamboo server, which we use to do builds and continuous integration. I don’t know if this is an unusual use case or what, but some of my builds require Windows and others perform best on Linux. So we have Windows agents and Linux agents.

Some things you would think are intuitive are not. For example, there’s no way to differentiate in a Script Task between CMD and Bash. How many scripts are actually cross-compatible between the two? Not many, in my experience. Often, I’d write out script tasks for Bash and they’d get farmed out to a Windows server by mistake and fail to, say, create a tar archive or wget a resource. So how can I force those to execute on Linux?

The solution I hit upon is pretty simple: I created a new executable definition called Bash, located at /bin/bash. This will auto-detect on new Linux agents, but not on Windows agents. Then I can put my scripts into the repo (which is probably a best practice anyway) and use the Bash command to run them. I can even run one-liner scripts with this task if I use the “-c” flag before the command, like “-c grep --BROKEN-- results.txt | tee broken.txt” (a command I used just yesterday to pull results out of my broken link checker). Plus, you can still use the script task as normal, as long as there’s at least one Bash task in your job to force it to build on Linux.

The inverse is simple as well: I created a Powershell executable and use Powershell scripts for my Windows builds. Problem solved, plus I get the power of Powershell to use in my scripts.

Does anyone out there have any other cool tips? Let me know in the comments!

Teatime: Containers and VMs

Welcome back to Teatime! This is a weekly feature in which we sip tea and discuss some topic related to quality. Feel free to bring your tea and join in with questions in the comments section.

Tea of the week: Ceylon by Sub Rosa Tea. This is a nice, basic, bold tea, very astringent; it’s great for blending so long as you don’t choose delicate flavors to blend with. It really adds a kick!

Today’s Topic: Containers and virtualization

Today, I’m going to give you a brief overview of a technology I think might be helpful when running a test lab. So often as testers we neglect to follow trends in development; we figure, devs love their fancy toys, but the processes for testing software really don’t change, so there’s no need to pay much heed to what they’re doing. Too often we forget that, especially as automation engineers, we are writing software and using software and immersing ourselves in software just like they are. So it’s worth taking the time to attend tooling talks from time to time, see if there’s anything worth picking up.

Vagrant

A tool I’ve picked up and put down a few times over the past year or so is Vagrant. Vagrant makes it very easy to provision VMs; you can store the configuration for the server needed to run software right with the source code or binaries. Adopting a system in which developers keep the vagrantfiles up to date and testers use them to spin up test instances can ensure that every test we run is on a valid system configuration, and both teams know what the supported configurations entail.

At a high level, the workflow is simple:

  1. Create a Vagrantfile
  2. On the command line, type “vagrant up”
  3. Wait for your VM to finish booting

In order for this to work, however, you have to have what’s called a “provider” configured with Vagrant. This is the specific VM technology that you’re using at your workplace; in my experiments, I’ve used VirtualBox, but if you’re already using something like VMware or a cloud provider like AWS for your test lab, there are integrations with those systems as well.

When creating the Vagrantfile, you first select a base image to use. Typically, this will be a machine with a given version of a given OS and possibly some software that’s more complex to install (to save time). HashiCorp, makers of Vagrant, provide a number of base machines that can be used, or you can create your own. This of course means that every VM you bring up has the same OS and patch level to begin with.

The next step is provisioning the box with the specific software you’re using. This is where you would install your application, any dependencies it has, and any dependencies of those dependencies, and so on. Since everything is installed automatically, everything is installed at the same version and with the same configuration, making it really easy to load up a fresh box with a known good state. Provisioning can be as simple as a handful of shell scripts, or it can use any of a number of provisioning systems, such as Chef, Ansible, or Puppet.

Here is a sample vagrantfile:

# -*- mode: ruby -*-

  $provisionScript = <<SCRIPT
    #Node & NPM
    sudo apt-get install -y curl
    curl -sL https://deb.nodesource.com/setup | sudo bash -  #We have to install from a newer location, the repo version is too old
    sudo apt-get install -y nodejs
    sudo ln -s /usr/bin/nodejs /usr/bin/node
    cd /vagrant
    sudo npm install --no-bin-links
SCRIPT

# vi: set ft=ruby :

# All Vagrant configuration is done below. The "2" in Vagrant.configure
# configures the configuration version (we support older styles for
# backwards compatibility). Please don't change it unless you know what
# you're doing.
Vagrant.configure(2) do |config|
  # The most common configuration options are documented and commented below.
  # For a complete reference, please see the online documentation at
  # https://docs.vagrantup.com.

  # Every Vagrant development environment requires a box. You can search for
  # boxes at https://atlas.hashicorp.com/search.
  config.vm.box = "hashicorp/precise64"

  config.vm.provider "virtualbox" do |v|
    v.customize ["setextradata", :id, "VBoxInternal2/SharedFoldersEnableSymlinksCreate/v-root", "1"]
  end
  config.vm.network "private_network", ip: "192.168.33.11"
  
  #Hosts file plugin
  #To install: vagrant plugin install vagrant-hostsupdater
  #This will let you access the VM at servercooties.local once it's up
  config.vm.hostname = "servercooties.local"
  
  config.vm.provision "shell",
  inline: $provisionScript

end

I left a good deal of the tutorial text in place, just in case I needed to reference it. We’re using Ubuntu Precise Pangolin 64-bit as the base box, distributed by HashiCorp, and I use a plugin that modifies my hosts file so that I can always find the machine in my browser at a known host. The provision script is just a simple shell script embedded within the config; I’ve placed it at the top so it’s easy to find.

One other major feature that I haven’t yet played with is the ability for a single Vagrantfile to bring up multiple machines. If your cluster generally consists of, say, two web servers, a database server, and a load balancer, you can encode that all in a single vagrantfile to bring up a fresh cluster on demand. This makes it simple to bring up new testing environments with just one command.

Docker

I haven’t played much with Docker, but everyone seems to be raving about it, so I figured I’d touch on it as an alternative to Vagrant. Docker takes the metaphor of shipping containers, which revolutionized the shipping industry by abstracting away the handling of specific types of goods from the underlying business of moving goods around, and extends it to software. Before standard shipping containers, different goods were packed differently, required different packaging materials to keep them safe, and shipped in different amounts and weights; cargo handlers had to learn all of these things, and merchants were a little wary of trusting their precious goods to anyone less experienced. The invention of the standard shipping container changed all that: shipping companies just had to understand how to load and transport containers, and it was up to the manufacturers to figure out how to pack them. Docker does the same thing for software: operations staff just have to know how to deploy containers, while it’s up to the application developers to understand how to pack them.

Inside a docker container, the application, its dependencies, and its required libraries reside, all pinned to the right versions and nestled inside the container. Outside, the operating system and any system-wide dependencies can be maintained by the operational staff. When it’s time to upgrade, they just remove the existing container and deploy the new one over top. Different containers with different versions of the same dependency can live side by side; each one can only see its own contents and the host’s contents.

And thus, we reach the limit of my knowledge of Docker. Do you have more knowledge? Do you have experience with Vagrant? Share in the comments!