Dockerization Part 1: Building

I’ve been long overdue for a series of articles explaining how our current build system works. One of the major projects I was involved with before this recent reorg involved overhauling our manual build process into a shiny new CI/CD system that would take the code from commit to production in a regulated, automated fashion. As always, the reward for doing a good job is more work like that; when we decided to move to Docker to better support our new team structure, I ended up doing a lot of the foundational work on our new build-test-deliver pipeline. Part one of that pipeline is, of course, building and storing containers.

Your mission, if you choose to accept it

In the old world, before we dockerized our applications, we followed a fairly typical system (that I designed): our CI server ran tests against the code, then bundled it up as an archive file. After that, one environment at a time and on request, it would SCP the tarball down to the server, stop the running process, remove the old codebase, and unpack the new one before starting the process again. There were configuration files that had to be saved off and moved back in afterward in a few cases, but we had all those edge cases ironed out. It was working, and there were almost no changes to it in the year before we launched Docker.

As we were preparing to go live, I didn’t want to lose the build pipelines we had worked so hard on. And yet, docker containers are fundamentally different than tarballs of code files. Furthermore, our operators (who are responsible for putting code into production) complained of having too many buttons to click: often, our servers had 3-4 codebases on them, meaning 3-4 buttons to click to update one server. They definitely didn’t want to do one button per container. On the other hand, our developers were clear on what they wanted: more deploys, faster deploys, and breaking out their monoliths into modules and microservices so they could go even faster. How to balance these concerns?

Another wrinkle emerged as well once I got my hands on our environment: we chose Rancher as our docker management tool of choice. Rancher is a great little tool, and I enjoy working with its GUI, but when most companies seem to be standardizing on Kubernetes, it was hard to find good examples and tutorials for how to work with Rancher instead.

With all those pressures bearing down on me, my task was straightforward, but far from simple.

How to build a container in 30 days

The promise of containers seemed like it resolved a lot of our headaches overall: developers control the interior of the container, and Platform Ops controls the outside of it. In this brave new world, I don’t have to care what goes in a container, but it’s my job to ensure they get to where they’re going every time without fail. In practice, however, I found I need to understand quite a bit about containers themselves.

For the purposes of this article, you don’t need to know or care about the virtualization layer; just trust that a container is isolated from everything around it, until and unless you drill holes in it (which we do. A lot. But I understand that’s common). You will need to know a little about how they’re built, however.

Picture a repository of source code. At some point, to dockerize the application contained within, you need a Dockerfile: a file of instructions on how to build this container. Almost every container begins with an instruction to extend from another image, much like classes extending from a base class. This was really handy for us, since it means we can put anything we need into a custom base image and all the developers will have it pre-installed.

From there, there’s a series of customizations to the container. Generally, one step involves copying the code into the container, and another tells the container what executable to run when it starts. For Node.js, we ask our developers to put their code in a standard location, then execute “npm start” when the container boots up, letting them define what that means for their application.

Once you’re happy with what the container contains, it’s time to seal it up and ship it. In this case, that means two commands: a “tag” command, which gives it a name more interesting than the default (which will be something like 2b9c0185251d), and a “push” command, which uploads the docker container to a remote repository. If the container is intended to live in a central repository, it has to be tagged with that repository as part of the name (including a port number, which usually defaults to 5000 for a Docker registry unless you put an Nginx in front to make it 80): something like “artifactory.internal:5000/dt-node-base”. Appended to that is a version: this can be a sequential number, or a word or anything else. By convention, each container is tagged twice: once with a sequential number, and once with the word “latest”. That makes it so you can always pull down the very latest node base container from our Artifactory repository by asking it for “artifactory.internal:5000/dt-node-base:latest”.

The system

So we have a number of parts to this build system that the CI/CD server has to integrate with. The first piece is to begin with raw source code, including a Dockerfile; we had been using Subversion, but the developers had been asking for Git for so long we finally broke down and bought a Bitbucket server and let them migrate.

The next piece is to build the containers with Docker. Since we were using Bamboo as our CI/CD server, I installed Docker on all the remote agents; this required an OS upgrade for them to Red Hat 7, but I was able to script the install using Ansible to make doing it across our whole system less painful.

The next piece is somewhere to store the containers when we’re done with them. As you can guess by the previous example, we decided to use Artifactory for this; this is mostly because, as the developers moved to Node, they were asking for a private NPM server, and Artifactory is able to do double duty and hold both types of artifacts.

For the communication between them, my coworker put together a script we could put on each build server that the plans could use to ensure they didn’t miss any steps. It’s straightforward, looking something like this:

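#!/bin/sh -e
# $1 Project Name (dt-nodejs)
# Assumes the plan runs this from the directory containing the Dockerfile.

# Build once, tagging the image with both the Bamboo build number and "latest".
# The trailing "." is the build context; docker build fails without it.
docker build -t "artifactory.internal:5000/$1:$bamboo_buildNumber" \
 -t "artifactory.internal:5000/$1:latest" .

# Push both tags to the registry.
docker push "artifactory.internal:5000/$1:$bamboo_buildNumber"
docker push "artifactory.internal:5000/$1:latest"
echo "$1:$bamboo_buildNumber and $1:latest pushed to Artifactory on artifactory.internal:5000"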

This means that every build tags the container with the number of the build, giving us an easy source of sequential numbers for the containers without thinking about it. It does mean, however, that building a new pipeline for an existing container name will start the numbering over from 1 and overwrite old containers, but we encourage developers to edit their build plans instead of starting over where possible. If you have any ideas on how to prevent that, I’d love to hear them.

(I’ve actually enhanced this script since, but I’ll talk about that in a future entry)


Teatime: Testing Large Domains

Welcome back to Teatime! This is a (semi-)weekly feature in which we sip tea and discuss some topic related to quality. Feel free to bring your tea and join in with questions in the comments section.

Tea of the week: Dragon Pearls by Teavana. My grandmother gave me some of this for my birthday a few years back, and it’s become one of my favorite (and most expensive!) teas since. Definitely a special occasion tea!

Edit: Teavana has stopped selling tea online after being bought by Starbucks. You know how I love a good chai, so how about the Republic Chai from Republic of Tea?

Today’s topic: Testing large domains

One challenge that intrigues me as much as it scares me is the idea of testing a product with a large domain of test inputs. Now, I’m not talking about a domain name or “big data”; instead, I mean a mathematical domain, as in the set of potential inputs to a function (or process). If you try to test every combination of multiple sets of inputs, or even every relevant one (barring a few you have decided won’t happen in nature), you’ll quickly run afoul of one of the key testing principles: exhaustive testing is impossible. Sitting down and charting out test cases without doing some prep-work first can quickly lead to madness and excessively large numbers of tests. That’s about where a BA I work with was when I offered to help, using the knowledge I’ve gained from my QA training courses.

The Project

The project’s central task was to automate the entry and processing of warranty claims for our products. We facilitate the collection of data and the shipping of the product back to the manufacturer as an added service for our customers, as well as handling the financial rules involved to ensure that everyone who should be paid is paid in a timely fashion. However, the volume of warranty claims was growing too large for our human staff to handle alone. Therefore, we set out to construct an automated system that would check certain key rules and disallow any claim to be entered that was likely to be rejected by the manufacturer.

The domain for this process is the cartesian join of the possible inputs: every manufacturer, every customer of ours, every warehouse that can serve the customer, every specific product (in case it’s on a recall), and every possible reason a customer might return a product (as they each have different rules). Our staff did a wonderful job of boiling them down to a test set that includes a variety of situations and distinct classes, but we were still looking at over 30,000 individual test cases to ensure that all the bases were covered by our extensive rules engine. What’s a test lead to do?
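To get a feel for how quickly a Cartesian product grows, here is a minimal sketch. The factor names mirror the inputs listed above, but the counts are made-up placeholders, not our real numbers.

```javascript
// Hypothetical domain sizes; the real counts are not given in this post.
const domainSizes = {
  manufacturers: 20,
  customers: 10,
  warehouses: 5,
  products: 6,
  returnReasons: 8,
};

// The size of a Cartesian product is the product of the sizes of its factors.
function combinationCount(sizes) {
  return Object.values(sizes).reduce((total, n) => total * n, 1);
}

console.log(combinationCount(domainSizes)); // 20 * 10 * 5 * 6 * 8
```

Adding one more value to any single factor multiplies the total, which is why trimming each factor down to a few representatives pays off so quickly.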

Technique: Equivalence partition

The first technique is pretty straightforward and simple, but if you’ve never used it before, it can be a lightbulb moment. The basic idea is to consider the set of inputs and figure out what distinguishes one subset from another. For example, instead of trying to enter every credit card number in the world, you can break them out into partitions: a valid Visa card, a valid Mastercard, a valid American Express, a card number that is not valid, and a string of non-numeric characters. Suddenly, your thousands of test cases can cut down to a mere five!

In essence, this is what the business folks did to arrive at 30,000 from literally infinite: they isolated a set of warehouses that represent all warehouses, a set of customers that represent all types of customers, and a set of SKUs that represent all types of SKUs.
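The credit card example can be sketched in a few lines. This is only an illustration: the prefix rules are simplified, card length is ignored, and any brand outside the three named ones is lumped in with the invalid numbers. The card numbers used are the widely published test numbers, not real cards.

```javascript
// Luhn checksum: the standard validity check for card numbers.
function passesLuhn(digits) {
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    // Walk from the rightmost digit, doubling every second one.
    let d = Number(digits[digits.length - 1 - i]);
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// Map any input string onto one of the five partitions from the example.
// Simplification: brands other than the three below count as "invalid number".
function partitionOf(input) {
  if (!/^\d+$/.test(input)) return "non-numeric";
  if (!passesLuhn(input)) return "invalid number";
  if (/^4/.test(input)) return "Visa";
  if (/^5[1-5]/.test(input)) return "Mastercard";
  if (/^3[47]/.test(input)) return "American Express";
  return "invalid number";
}

// One representative per partition is enough to exercise each class.
const representatives = [
  "4111111111111111",
  "5555555555554444",
  "378282246310005",
  "4111111111111112",
  "not a card",
];
for (const input of representatives) {
  console.log(input, "->", partitionOf(input));
}
```

The point is that every other input you could think of lands in one of these same buckets, so a sixth test from the same bucket tells you nothing new.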

Technique: Separation of concerns

The next thing I did isn’t so much a testing technique as a development technique I adapted for testing. I realized that we were trying to do too much in one test: combinatorial testing, functional testing, data setup verification, and exploratory testing. By separating them into explicitly different concerns, we could drastically cut down on the number of test cases. I suggested to the BA that as part of go-live we get a dump of the data setup and manually verify it, eliminating the need to test all the possible rule scenarios for all possible manufacturers. I split my test cases into combinatorial happy-path tests that make sure every potential input is tested at least once, and functional tests to verify that each rule works correctly. That cut way down on the number of cases. Divide and conquer!

Technique: Decision Tables

To create the functional tests, I used a technique called a decision table. Or well, a whole set of them, but I digress. Essentially, you identify each decision point in your algorithm, using them as conditions in the top portion. You then identify each action taken as a result, and list them in the bottom portion. You input test values (often true/false or yes/no, but sometimes numeric; you could have written C3 in the example as “transaction amount” and done “<$500” and “>$500” as your values).

If any of you have written out a truth table before, this is essentially the testing version of that. In the long form, this would have a truth table of the conditions, with the actions specified based on the algorithm. You can then take any two test cases that produce identical output and differ in only a single input, and merge them into one, marking that input as “don’t care”.
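Here is a rough sketch of the long form and the merging step described above. The rule itself is hypothetical (it is not one of our real warranty rules), and the greedy merge is just one simple way to do the collapsing, not the only one.

```javascript
// Hypothetical rule for illustration: accept a claim only when the product
// is under warranty and a receipt is present; recall status is irrelevant.
const conditions = ["underWarranty", "hasReceipt", "onRecall"];
function decide(c) {
  return c.underWarranty && c.hasReceipt ? "accept" : "reject";
}

// Long form: one column per combination of Y/N values, like a truth table.
function fullTable() {
  const columns = [];
  for (let bits = 0; bits < 2 ** conditions.length; bits++) {
    const flags = conditions.map((_, i) => Boolean(bits & (1 << i)));
    const named = Object.fromEntries(conditions.map((name, i) => [name, flags[i]]));
    columns.push({ inputs: flags.map((f) => (f ? "Y" : "N")), action: decide(named) });
  }
  return columns;
}

// Find two columns with the same action that differ in exactly one input.
function findMergeable(columns) {
  for (let a = 0; a < columns.length; a++) {
    for (let b = a + 1; b < columns.length; b++) {
      if (columns[a].action !== columns[b].action) continue;
      const differing = columns[a].inputs
        .map((v, i) => (v === columns[b].inputs[i] ? -1 : i))
        .filter((i) => i !== -1);
      if (differing.length === 1) return { a, b, index: differing[0] };
    }
  }
  return null;
}

// Repeatedly merge such pairs, replacing the differing input with "-" (don't care).
function collapse(columns) {
  const result = columns.map((c) => ({ inputs: [...c.inputs], action: c.action }));
  let pair;
  while ((pair = findMergeable(result)) !== null) {
    result[pair.a].inputs[pair.index] = "-";
    result.splice(pair.b, 1);
  }
  return result;
}

const collapsed = collapse(fullTable());
console.log(collapsed.map((c) => c.inputs.join(" ") + " => " + c.action));
```

Each column that survives the collapse is one functional test case; the "-" entries are inputs you are free to pick arbitrarily.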

I started putting together a decision table for each return reason, with every rule down the left and every manufacturer across the top:


As you can see, it got really messy really fast! That was when I decided to try and use equivalence partitioning on the decision table itself. I figured, not every manufacturer cares about every rule for every reason. If I did one table per reason, and only considered the test cases that could arise from the actual data, I would have something manageable on my hands.

I sat down with a big list of manufacturers and their rules, and I divided that into a set of rules which can have a threshold (giving us two cases: valid or invalid) or a “don’t care” (giving two more cases: valid but the rule does not apply, and invalid but the rule does not apply). That cut down the number of manufacturers needed to test considerably, and allowed me to begin constructing a decision table.

A list of what manufacturers consider what rules.

The output of that was a lot cleaner and easier to read:

One of eight decision tables that generated the new tests

Technique: Classification Trees

The next technique is an interesting one. When I learned it, I didn’t think I’d ever use it; however, I found it to be immensely valuable here. A classification tree begins life as a tree, the top half of the diagram you’re seeing: you break out all the possible inputs, and break out the equivalence partitions of the domain of each in a nice flat tree like this. Then you draw a table underneath it.

By OMPwiki - Own work, CC BY-SA 3.0,

The ISTQB syllabus suggested using a specialized tool that can generate pairs, triples, et cetera according to rules you punch in, but I didn’t use one for this; my coverage criterion was just to cover each factor at least once, so I figured I needed at least as many tests as the largest domain (the OEMs). I then went through and marked off items to make sure each one was covered at least once. You can do more with it, but that’s all I needed.

My makeshift classification tree
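The "cover each value at least once" criterion I used can be sketched like this. The factor names and values are hypothetical stand-ins; I did mine by hand in a spreadsheet, so treat this as an illustration of the idea rather than the tool I used.

```javascript
// Hypothetical factors and representative values, for illustration only.
const factors = {
  manufacturer: ["OEM-A", "OEM-B", "OEM-C", "OEM-D"],
  customer: ["retail", "wholesale"],
  warehouse: ["east", "west", "central"],
  reason: ["defective", "wrong item"],
};

// Build just enough tests that every value of every factor appears at least once.
// The number of tests equals the size of the largest domain; shorter lists wrap around.
function eachChoiceTests(domains) {
  const names = Object.keys(domains);
  const count = Math.max(...names.map((name) => domains[name].length));
  const tests = [];
  for (let i = 0; i < count; i++) {
    const test = {};
    for (const name of names) {
      test[name] = domains[name][i % domains[name].length];
    }
    tests.push(test);
  }
  return tests;
}

const tests = eachChoiceTests(factors);
console.log(tests.length); // one test per value in the largest domain
console.log(tests);
```

Stronger criteria (all pairs, all triples) need more tests and really do want a generator tool; this weakest level is cheap enough to do by marking off cells.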

At last, we had a lovely set of combinatorial tests we could run:


These tests, if you recall above, were to verify that various customer-reason-warehouse-manufacturer combinations were configured correctly. This would ensure that each of our representative samples was used in at least one test case, regardless of its data setup.


Have you ever faced a problem like this? What did you do?

New Year’s Resolutions: 2015

My job title says that I work in SQA: Software Quality Assurance, or maybe Quality Analysis if you want to get pedantic, since we don’t actually assure quality so much as kermitflail when it’s not present.

The tests are failing!

But what is quality? How do I know when something is quality or not? I’ve been pondering the nuances this month, being as it is the first month of the year and the time when everyone tries to lay out their goals. Where do I have authority to advise, and when am I overstepping my bounds?

One of my coworkers went to Velocity last year, and came back fired up about performance and Real User Metrics. I can’t find a single definition of quality that doesn’t include performance. If I can make changes that can get metrics in front of him so he can see the realtime impact of his changes, is that SQA?

Our promotions process is slow and buggy and prone to errors. If I can get a system in place that automatically runs unit tests after a code promotion, is that Quality? What if it does linting, checking the style of the code? What if it simplifies the process of making branches to move code onto our demo servers in the first place? Where does Quality become Process? Or is there even a distinction?

Our database development team has trouble keeping their sandboxes in sync. If I poke my hands into their Subversion practices to turn deploying a new sandbox into a half-hour routine maintenance task instead of an all-day chore, is that Quality? Why or why not? Cite your sources.

Ultimately, my goals this year aren’t around improving the codebase. I’m not a developer. I don’t fix anything. What I can do, where I can do the most help, is around helping other people streamline their daily tasks so that they have the energy to make things better. If it takes someone less time to move code, they’ll be less afraid to make fixes that improve the quality. If there’s a safety net of tests, they’ll be able to do some refactors that have been on the wishlist for years. The best way to improve our codebase is to apply grease to the wheels until they turn smoothly and efficiently.

So that’s my new year’s resolution for 2015. Maybe I’ve been watching too much Emma Approved, but maybe I really can help people make their lives better.


CI with Jenkins for Javascript: Part 3: Scheduling and reporting

In Part One, we set up a Jenkins server and some unit testing. In Part Two, we added some static analysis tools to our build. But we’re still manually running all this, even if it’s all tied together now. Let’s talk about some of the features Jenkins brings to the table.

Building automatically

Our code release pipeline is going through some revisions to make better use of branching, so I have the good fortune of being able to detail for you two different build strategies for two different types of branching strategies. Today I will detail our old style, and in a future post, I will detail the updates we did to make a more branch-heavy system work.

Our original strategy involved a branch for each codebase representing our demo environment; to promote a project to demo, the code would be merged into the demo branch using Subversion. This is the easier strategy to set up, because you always know where in the repository to point Jenkins.

The first change we made was to symlink the location where Jenkins checks out our repo to a network share on a demo server. This allows Jenkins to check out the code directly to the server, where it can then run the unit tests. That was the simplest way for us to get the code onto our servers, but there are many ways you can go about this step, including using FTP or SSH to update the server; if you have many servers you want Jenkins to update, that’s probably the best way to do it. We used a symlink because it plays nicely with Jenkins’ preferred build pipeline: first it checks out the code, then it runs the tests, then it deploys to other servers. Our code does not need to be compiled before being deployed, and Jenkins was not running on a machine configured to run as a ColdFusion server, so by checking out the code directly onto a server, we had it up and running as fast as possible.

Once you’ve figured out your deployment strategy, you’re ready to trigger Jenkins to automatically build based on code promotion. There are two strategies to accomplish this task: polling and a post-commit hook. Polling is the easiest to set up; there’s literally a checkbox under “Build Triggers” called “Poll SCM”. This allows you to set up a polling schedule using a syntax similar to the one used to configure cron jobs; for example, to poll every fifteen minutes, you use the string “H/15 * * * *”. This can be configured without ever leaving Jenkins, and it will only build when there are new changes.

Post-commit hooks require some work in Subversion. With this strategy, you configure Subversion to notify Jenkins whenever a commit is made. I didn’t do this myself, but there are some details in the Subversion plugin notes about how you might set this up. Honestly, the more I read about it, the less interesting it looked. Polling every ten minutes or so would achieve the same level of detail for my organization; remember, I’m talking about major code promotions to demo that happen probably no more often than once a day.


Information Radiators

So, you have your Jenkins server pointed to your repo. It’s polling every fifteen minutes, and it reports out on the unit tests, linting results, and code complexity. You’re feeling pretty proud of yourself: this is a nice spiffy setup, capable of giving a good sense of the long-term health of the project.

Too bad nobody looks at it.

Oh sure, you can give them the dashboard link. Maybe one or two of them will poke at it every week or so. For a while. Until they get bored and wander off. IT people are humans too, and humans are notoriously averse to reading anything or seeking out information on their own. How can you make the information more in-their-face?

One answer is to present the information in a pretty easily-understood graph or chart and display that on a monitor in the hallway. As people walk past it, the information is thrust into their face, and they tend to stop and take a look at it. The nicer the visualization, the more likely people will stop to look at it and accidentally ingest the information you’re trying to get across 🙂

Jenkins has a lovely API for retrieving information about a build: on any page, add “/api” to the end. If you just add /api, it gives you a description of the formats you can retrieve the api information in; to get the JSON data, you add /api/json to any page. For human-readability, add “?pretty=true”. You can also get the data in xml format using the same method.
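As a minimal sketch of that URL convention: the helper below builds the JSON API URL for any Jenkins page and reads the test counts from a job's test report. The host name and job name are placeholders, and it assumes a runtime with a global fetch (a modern browser or Node 18+), which is not what my dashboard used.

```javascript
// Build the JSON API URL for any Jenkins page, optionally pretty-printed.
function jenkinsApiUrl(pageUrl, pretty = false) {
  const base = pageUrl.replace(/\/+$/, ""); // drop trailing slashes
  return base + "/api/json" + (pretty ? "?pretty=true" : "");
}

// Fetch the test report for a job's last completed build and pull out the counts.
// jenkinsRoot and jobName are supplied by the caller; nothing here is specific to our setup.
async function fetchTestCounts(jenkinsRoot, jobName) {
  const page = `${jenkinsRoot}/job/${encodeURIComponent(jobName)}/lastCompletedBuild/testReport`;
  const response = await fetch(jenkinsApiUrl(page));
  if (!response.ok) throw new Error(`Jenkins returned HTTP ${response.status}`);
  const report = await response.json();
  return { pass: report.passCount, fail: report.failCount, skip: report.skipCount };
}

// Placeholder host and job, just to show the URL shape.
console.log(jenkinsApiUrl("http://jenkins.example.internal/job/my-project/", true));
```

Calling fetchTestCounts from a page on another origin will run into cross-origin restrictions, which is why my own version went the JSONP route.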

With that in mind, I wrote a quick javascript app that polls Jenkins for data about unit tests using Backbone to abstract away all the details. The model is something like:

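var TestResult = Backbone.Model.extend({
    baseURL: "",
    build: "lastCompletedBuild",
    url: function() {
        // The two expressions between the slashes did not survive in this listing.
        // Assumption: they were the job name (taken here from the model id) and this.build.
        return this.baseURL + "/job/" + this.get("id") + "/" + this.build + "/testReport/api/json?jsonp=?";
    }
});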

And the view something like:

var PlatformView = Backbone.View.extend({
    initialize: function(options) {
        this.options = options;
        // Re-render on new data; show the error state if the fetch fails.
        this.model.on("change", this.render, this);
        this.model.on("error", this.renderErrorState, this);
    },
    render: function() {
        var tpl = Handlebars.compile($("#platformTemplate").html());
        var data = this.model.toJSON();
        var ts = new Date(this.options.timestamp);
        // Format as M-D-YYYY H:MM, zero-padding the minutes.
        data.timestamp = (ts.getMonth() + 1) + "-" + ts.getDate() + "-" + ts.getFullYear() + " " + ts.getHours() + ":" + (ts.getMinutes() < 10 ? "0" : "") + ts.getMinutes();
        var html = tpl(data);
        // Assumption: the line that inserted the markup is missing from this listing;
        // this is the usual Backbone idiom.
        this.$el.html(html);

        var model_id = this.model.get("id");
        var chartdata = [
                {label: "Pass", value: this.model.get("passCount")},
                {label: "Fail", value: this.model.get("failCount")},
                {label: "Skip", value: this.model.get("skipCount")}
        ];
        testResultsPieChart.drawTestResultsGraph(chartdata,"#" + model_id + "-chart");
        return this;
    },
    renderErrorState : function() {
        var tpl = Handlebars.compile($("#errorTemplate").html());
        var data = this.model.toJSON();
        var html = tpl(data);
        this.$el.html(html); // same assumption as in render

        var model_id = this.model.get("id");
        // Draw an all-fail chart so a broken test run is impossible to miss.
        var chartdata = [
                {label: "Pass", value: 0},
                {label: "Fail", value: 1},
                {label: "Skip", value: 0}
        ];
        testResultsPieChart.drawTestResultsGraph(chartdata,"#" + model_id + "-chart");
    }
});

Where testResultsPieChart uses the d3 library to convert the data into a pie chart. I tossed all this into a basic Bootstrap page, because I’m not much of a designer 🙂 The result ends up looking like:

Note that one project had managed to break their test runner while I was taking this screenshot. You’ll see that result if QUnit never finishes executing.


And that’s where I’d gotten when someone told me we were changing branching strategy to remove the idea of a single demo branch 😀 Moving goalposts keeps life interesting.

CI with Jenkins for Javascript: Part 2: Static Analysis

Part one

So. We’re up, we’re unit testing, we’re publishing results. But unit testing is only as good as the tests themselves, and that depends heavily on the programmers’ ability to write good tests. Maybe we want more than that. Maybe we want a metric that isn’t essentially self-reported. Maybe we want static analysis.

What is Static Analysis

Static Analysis is a category of testing techniques that covers any metric of code that can be collected without executing the code. These techniques can be used to measure code against an agreed-upon standard without needing anything more from a developer than the code they’ve already written. This is a great way to check in on the quality of the code when your developers are already over-worked, stressed, and up against the wall; they can put off writing unit tests until “later”, but they can’t prevent you from looking at the code and evaluating it. Obviously, you don’t want to just spring new metrics on people, but once a standard is in place, holding people to it shouldn’t be unreasonable.


Linting with ESLint

The most common static analysis tool you’ll hear JS developers talking about is linting. If you already know about this, feel free to skip down to the practical section, but if you’re picturing the fuzzy stuff that comes out of your dryer after you wash a load of blankets, allow me to dispel the confusion a little. The basic metaphor comes from using a lint roller to clean little bits of lint (or cat hair) off a sweater so you look neater and more presentable. Linting, therefore, is the act of cleaning up little stylistic issues to make the overall code look neat and tidy.

The most widely known JavaScript linter is JSLint, though the name itself goes back further, to the original lint tool for C. You can test out how JSLint works at their website. Notice the checkboxes below the input box; JSLint is configurable, but not overly so. It was designed to enforce the Crockford Conventions, which some JS developers hold to be the best possible standard for Javascript code style. However, as with all things in JS land, the “standard” is hotly debated, and in many places rejected entirely. Therefore, for linting, I prefer a tool called ESLint. Every single rule in ESLint is configurable; at the minimum, this means it has three levels of enforcement: Do not enforce, Warn, or Error. Many rules also have configurable options, such as whether spaces should be before a comma, after a comma, both, or neither.
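To illustrate the three enforcement levels and a rule with options, here is a small example configuration written as a JavaScript module. The particular rules and settings are picked for illustration, not our team's actual standard; ESLint also accepts the numeric levels 0, 1, and 2 in place of "off", "warn", and "error".

```javascript
// Example ESLint configuration; the specific rule choices are illustrative only.
const config = {
  rules: {
    // Do not enforce: allow == as well as ===.
    "eqeqeq": "off",
    // Warn: report unused variables without failing the lint run.
    "no-unused-vars": "warn",
    // Error: using an undeclared variable fails the lint run.
    "no-undef": "error",
    // A rule with options: no space before a comma, one space after.
    "comma-spacing": ["error", { before: false, after: true }],
  },
};

module.exports = config;
```

Rules with options take an array: the enforcement level first, then the options object.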

So let’s say you’ve got ESLint, talked with your team, and come up with a configuration file that enforces your standards. We can fairly simply add that into our gruntfile for Jenkins to execute, using a package like grunt-eslint. However, we now have a problem. Unlike grunt-qunit-junit, grunt-eslint does NOT allow for writing to a file. We’d have to pipe the output, and that includes any output from grunt itself, which might make our file no longer conform to the desired output format without more massaging. So I prefer to install eslint as a standalone console application, as detailed here.

Now our buildfile has two items:



That command line breaks down as follows:

  • eslint calls the linter
  • -c eslint.conf points it to our custom configuration file
  • -f checkstyle outputs the results in checkstyle format. This can be other formats like jslint, junit, or tap, but I found the checkstyle plugin to be to my liking.
  • file paths indicate what files should be linted. Here I’m only linting the models and views for my project
  • > lintresults.xml is the shell way to redirect the output into a file. This can be any file.
  • || echo is, as with last time, a way to ensure that the build does not fail when linting fails. Again, the reporting plugin will take care of marking the build as unstable when the linting fails. Without this, linting errors will prevent Jenkins from moving on to the unit tests.

We can then use the Checkstyle plugin (or any other plugin that can process Checkstyle reports) to display the results:



And voila!



Complexity with Plato

Another static analysis that can be useful to shed some light on code quality is complexity analysis. Now, if you’re planning to write angry comments, please keep in mind that all of these metrics measure one aspect of quality, and I don’t believe any of them are infallible be-all end-all measures. But complexity can tell you a little about what parts of your application are going to be more troublesome to maintain.

The most common metric for complexity is cyclomatic complexity. This is a rough measure of code complexity created in 1976 by Thomas McCabe, defined as the count of the number of linearly independent paths through the source code. Basically, this tells you how much branching, looping, and nesting is present in a piece of code. Lower is easier to understand and maintain, but obviously, code with a complexity of 1 doesn’t do very much that’s interesting at all; it’s 100% deterministic, and will always do exactly the same thing, with no change in behavior based on inputs. Your basic “Hello World” program has a cyclomatic complexity of 1; FizzBuzz tends to be around 6 or so.
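Here is a FizzBuzz with the decision points counted by hand. I'm using the common convention that each loop, each if, and each short-circuit operator adds one to a baseline of one; tools differ on exactly what they count, so a given analyzer may report a slightly different number.

```javascript
function fizzBuzz(limit) {                 // baseline path: 1
  const out = [];
  for (let i = 1; i <= limit; i++) {       // +1 (loop)
    if (i % 3 === 0 && i % 5 === 0) {      // +1 (if), +1 (&&)
      out.push("FizzBuzz");
    } else if (i % 3 === 0) {              // +1 (else if)
      out.push("Fizz");
    } else if (i % 5 === 0) {              // +1 (else if)
      out.push("Buzz");
    } else {
      out.push(String(i));
    }
  }
  return out;                              // hand count: 6
}

console.log(fizzBuzz(15).join(" "));
```

Rewriting the first test as a single check against 15 would drop the count by one without changing the behavior, which is a good reminder that the number measures the shape of the code, not the difficulty of the problem.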

Another metric of complexity is Halstead Complexity. This is a more robust set of measures proposed in 1977 by Maurice Halstead. These are calculated by counting the number of distinct operators, total number of operators, number of operands, and other such analysis to produce a slew of numbers. One such number is the difficulty index, which is half the number of distinct operators times the total number of operands divided by the number of distinct operands. In theory, this measures how difficult code is to maintain over time.

As both of these metrics are strongly correlated with lines of code, the Maintainability Index seeks to relate them to each other and to the LOC to get an overall quick-and-dirty number to represent how difficult code is to maintain. This index was created in 1991 by Paul Oman and Jack Hagemeister, and it ranges from negative infinity to a “perfect” score of 171, achieved only by an empty file with 0 lines of code. They proposed that code scoring above about 65 should be considered easy to maintain.
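For the curious, the two formulas can be written out as follows. The difficulty function follows the description above; the Maintainability Index uses the commonly cited form of the formula, and individual tools (Plato included) may use a variant with different inputs or coefficients, so treat the second function as an assumption rather than what the tool computes.

```javascript
// Halstead difficulty, as described above:
// (distinct operators / 2) * (total operands / distinct operands).
function halsteadDifficulty(distinctOperators, totalOperands, distinctOperands) {
  return (distinctOperators / 2) * (totalOperands / distinctOperands);
}

// Commonly cited form of the Maintainability Index (assumed here, tools vary):
// 171 - 5.2 * ln(Halstead volume) - 0.23 * cyclomatic complexity - 16.2 * ln(lines of code).
function maintainabilityIndex(halsteadVolume, cyclomatic, linesOfCode) {
  return 171
    - 5.2 * Math.log(halsteadVolume)
    - 0.23 * cyclomatic
    - 16.2 * Math.log(linesOfCode);
}

// Hypothetical inputs, chosen only to show the functions being called.
console.log(halsteadDifficulty(10, 20, 5));
console.log(maintainabilityIndex(1000, 10, 200));
```

Every term after the 171 is subtracted, so more volume, more branching, or more lines can only push the score down.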

These metrics are all measured with a tool called JSComplexity, written by Phil Booth to easily measure the complexity of JavaScript code. The command-line version of this tool is complexity-report, and there’s a nicely formatted HTML reporter using that tool called Plato. From that we have a Grunt wrapper called grunt-plato, which we can use to generate an HTML report that can be included in Jenkins automatically. Still with me? 🙂

The grunt setup is pretty straightforward, as before. We can add it to our existing file with a few lines:

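module.exports = function(grunt) {
  // Project configuration.
  grunt.initConfig({
    // ...the qunit and qunit_junit settings from Part 1 stay here...
    plato: {
      complexity: {
        options: {
          jshint: false
        },
        files: {
          'reports': ['../src/source/model/*/*.js', '../src/source/ui/*/*.js']
        }
      }
    }
  });
  // These plugins provide necessary tasks (alongside the QUnit ones already loaded).
  grunt.loadNpmTasks('grunt-plato');
  // Default task.
  grunt.registerTask('default', ['plato','qunit_junit','qunit']);
};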

I’ve turned off JSHint because I’m using ESLint above. If you like JSHint, you can leave it included and skip the whole section above about checkstyle.

We already have the grunt file being run by Jenkins, so we just add the report like so:

And voila! It’ll appear on the left as a link:


Which takes you right to the report.


There’s a lot more in the world of static analysis that I’d love to be able to show you. There are tools to generate dependency analyses, tools to find common bugs, tools for finding duplicated or dead code, tools to find potential security holes… but unfortunately, the tooling for javascript is rather limited. Compiled languages are always easier to analyse than interpreted ones, and strongly typed languages are easier to analyse than weakly typed ones. Frankly, though, with all the wonders javascript developers are able to produce, I have to wonder: does the community really care about quality? Maybe the tooling is limited because beyond linting and maybe some complexity, javascript developers aren’t interested in writing these kinds of tools.

Maybe I’ll write some myself.

CI with Jenkins for Javascript: Part 1: Unit Testing

In a lot of ways, the JavaScript world feels like it’s trapped in the year 2k: the dot com bubble is swelling huge, and nobody has time for best practices; it’s time to reinvent everything and strike it rich. As an SQA professional, it’s immensely frustrating to outline a technique and be told “JavaScript doesn’t do that.” (That’s one of three answers that ought to be banned from a webdev’s vocabulary; the other two are “I think jQuery does that” and “Maybe with Node?” Protip: using the latest shiny library is no substitute for using your brain. But I digress.)

Anyway, so let’s set the scene: A young, frazzled SQA professional, trying to get a sandbox install of Jenkins full of shiny things to prove to the Directors that really, we do need more tools, we’re not just being lazy. Jenkins’ install was a tale for another blog, but it’s up and running. Now what?


Unit testing with QUnit

The first thing, the key component of any sort of continuous-testing exercise (never mind the integration part for now, this is only a demo), is to automate the unit testing. In my case, we used QUnit for our tests, which is pretty standard. Or was it? Since we don’t do TDD and we’re backporting testing into legacy apps that weren’t built with testability in mind, I ended up putting ColdFusion to work for us. I created a series of drop-down menus, customized for each platform we were testing, that would let you drill down to a specific component to test (model, view, library element, et cetera). It would then read a JSON file to find any required dependencies and include them on the page, followed by the item under test and then the test code. (Yes, the developers were given a lecture about minimizing dependencies. No, that doesn’t mean our legacy apps would detangle themselves magically overnight. Yes, they really insist on testing views with the real Handlebars templates stored in separate files. Sure, why not.)
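The include order is the part that matters here, so a rough JavaScript sketch of it may help. This is not the ColdFusion code: the manifest shape, the file paths, and the function names are all made up for illustration, and only the ordering (dependencies, then the item under test, then the test code) comes from the setup described above.

```javascript
// Hypothetical manifest: maps each component to the scripts it depends on.
// The real one lived in a JSON file; these names and paths are invented.
const manifest = {
  'model/account': ['lib/util.js'],
  'ui/accountView': ['lib/util.js', 'lib/handlebars.js']
};

// Build the ordered script list for one component:
// dependencies first, then the item under test, then its test code.
function scriptsFor(component, deps) {
  const required = deps[component] || [];
  return [...required, `src/${component}.js`, `tests/${component}.test.js`];
}

// The "all" option: every component's scripts, de-duplicated.
// A Set keeps the first occurrence, so the load order is preserved.
function scriptsForAll(deps) {
  const all = Object.keys(deps).flatMap((name) => scriptsFor(name, deps));
  return [...new Set(all)];
}

console.log(scriptsFor('ui/accountView', manifest));
console.log(scriptsForAll(manifest));
```

Shared libraries only get included once in the “all” case, which is roughly what you want when every test ends up on a single page.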

How do I make Jenkins do this? The QUnit “way” seems to be to generate this page on the fly, but there ended up being quite a bit of logic implemented in ColdFusion that I didn’t want to remake. And why should I? Add an “all” option to the dropdown and I had the exact result-set I wanted to include in Jenkins. What I really needed was a way to hit that page and retrieve the results.

Enter Grunt.

Grunt is an automation system made to run JavaScript tasks, particularly in a Node environment. I found it works pretty well to treat it like Ant or Maven in your JavaScript stack: migrate the nitty-gritty logic to Grunt, then execute the Grunt script from Jenkins. I didn’t see a plugin to do this, so I used the shell plugin to execute grunt. That gets the tests run and result files generated.

To get Grunt installed, you need Node (and npm, the Node Package Manager). Once you have a working install of Node, you can install grunt with npm install -g grunt, which does a global install of grunt using npm. Of course, this isn’t enough to get running. You then have to install the command-line interface for grunt, which is packaged separately (because nothing shiny can be simple): npm install -g grunt-cli. One caveat: grunt-cli runs the copy of grunt installed in the project folder, so if it complains that it can’t find a local grunt, run npm install grunt --save-dev in that folder as well.

The Tao of Node involves creating projects, much like you do with Java and Eclipse. We’re not actually building anything here, and there’s no asset pipeline involved at this stage, but you still have to have a project. So we create one. The simplest way to create a node package is with npm init, which will create a file called “package.json”. You’ll probably never need to edit this file directly, but you can if you want. It’s just a JSON file.

The Tao of Grunt then involves a file you’ll be editing heavily: “Gruntfile.js”. This is where the instructions for Grunt go. These instructions come in two flavors: a list of configuration options for the specific plugin you’re using, and a list of plugins to activate for a given build target. This is kind of like ant, but with JSON instead of XML. Your basic gruntfile looks like:

module.exports = function(grunt) {

  // Project configuration.
  grunt.initConfig({
    // JSON for config options here
  });

  // Load the plugins
  // (one grunt.loadNpmTasks('plugin-name') call per plugin)

  // Default task(s).
  grunt.registerTask('default', ['sometask']);

};
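To make the two flavors concrete, here is a small sketch that runs a Gruntfile-shaped function against a stand-in for the grunt object and records what it asks for. The stub, the 'ci' target, and the 'othertask' name are all hypothetical; only grunt.initConfig and grunt.registerTask are real Grunt calls, and the real grunt object does far more than this.

```javascript
// A Gruntfile is an ordinary function that receives the grunt object.
function gruntfile(grunt) {
  // Flavor 1: configuration options, keyed by task name.
  grunt.initConfig({
    sometask: { options: { verbose: true } } // made-up option for illustration
  });

  // Flavor 2: which tasks run for a given build target.
  grunt.registerTask('default', ['sometask']);
  grunt.registerTask('ci', ['sometask', 'othertask']); // 'ci' is a hypothetical target
}

// Minimal stand-in for grunt that just records the calls it receives.
function makeStubGrunt() {
  const recorded = { config: null, targets: {} };
  return {
    recorded,
    initConfig(config) { recorded.config = config; },
    registerTask(name, tasks) { recorded.targets[name] = tasks; }
  };
}

const stub = makeStubGrunt();
gruntfile(stub);
console.log(Object.keys(stub.recorded.config)); // task names that have config
console.log(stub.recorded.targets);             // build targets and their task lists
```

Running grunt with no arguments executes the 'default' list, while grunt ci would execute the other one.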


For this use case, we want to use grunt-contrib-qunit to run my tests and snag the results. So we set that up first. Since I want to use an existing runner, I use the urls option to pass in the URL of the runner. My gruntfile then looks like:

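module.exports = function(grunt) {

  // Project configuration.
  grunt.initConfig({
    qunit: {
      all: {
        options: {
          urls: [
            // URL(s) of the existing test runner page go here
          ]
        }
      }
    }
  });

  // Load the plugins
  grunt.loadNpmTasks('grunt-contrib-qunit');

  // Default task(s).
  grunt.registerTask('default', ['qunit']);

};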


From the command line, I can then run sudo grunt from the folder with the gruntfile and bam, there’s my results. (Make sure there’s no authentication required to get to your test runner. That’s for the advanced class. Also, if you find someone teaching the advanced class, I’d love to sign up 🙂 ).

Of course, I can’t actually do that until I install grunt-contrib-qunit. Thankfully, these plugins are all available via npm. This is probably a good time for an aside about how npm works. See, we installed grunt globally because we want it to be available to multiple Jenkins projects, but you’re not supposed to do that often. The better way to install dependencies is without the -g flag and with the --save-dev flag. This will do two things:

  • Download the module to the project’s node_modules folder
  • Add the module to the devDependencies in your package.json file

So in this case, we want to do an npm install grunt-contrib-qunit --save-dev. Do that for any other plugin I discuss and you’ll be up and running in no time.
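If you want to double-check which plugins have actually been saved, a few lines of Node will do it. This is only a sketch: the function name, the sample package contents, and the version range are invented for illustration, and in a real project you would load the object with require('./package.json') instead of writing it inline.

```javascript
// Return the names from `required` that are absent from devDependencies.
function missingDevDependencies(pkg, required) {
  const installed = pkg.devDependencies || {};
  return required.filter((name) => !(name in installed));
}

// Example package.json contents after a single --save-dev install.
// The project name and version range are illustrative only.
const pkg = {
  name: 'ci-demo',
  version: '0.1.0',
  devDependencies: { 'grunt-contrib-qunit': '^1.0.0' }
};

console.log(missingDevDependencies(pkg, ['grunt-contrib-qunit', 'grunt-qunit-junit']));
```

Anything the function returns still needs its own npm install with --save-dev.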
Now, that’s all fine and dandy, except that qunit prints the output in a human-readable format to the screen. We want the output in a jenkins-readable file instead. Jenkins can read xUnit, TAP, and HTML reports with the help of some readily-available plugins, so basically anything standard will do; luckily, there’s another grunt plugin that takes the output from grunt-contrib-qunit and massages it into JUnit format before saving to a file. It’s called grunt-qunit-junit, because naming conventions are weird.
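For anyone who hasn’t looked inside a JUnit report, the sketch below shows roughly what the conversion produces. It is not how grunt-qunit-junit is implemented; the result shape ({ module, name, passed, message }) and both function names are assumptions made for this example, and real reports usually carry extra attributes such as timings.

```javascript
// Escape the characters that are special inside XML attribute values.
function escapeXml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Turn a list of test results into a JUnit-style <testsuite> document.
// Each result is { module, name, passed, message } (a made-up shape).
function toJUnitXml(suiteName, results) {
  const failures = results.filter((r) => !r.passed).length;
  const cases = results.map((r) => {
    const open = `  <testcase classname="${escapeXml(r.module)}" name="${escapeXml(r.name)}"`;
    if (r.passed) return `${open}/>`;
    return `${open}>\n    <failure message="${escapeXml(r.message)}"/>\n  </testcase>`;
  });
  return [
    '<?xml version="1.0" encoding="UTF-8"?>',
    `<testsuite name="${escapeXml(suiteName)}" tests="${results.length}" failures="${failures}">`,
    ...cases,
    '</testsuite>'
  ].join('\n');
}

const xml = toJUnitXml('model', [
  { module: 'model.account', name: 'loads defaults', passed: true },
  { module: 'model.account', name: 'rejects bad id', passed: false, message: 'expected error' }
]);
console.log(xml);
```

Jenkins reads the tests and failures counts plus the individual testcase entries, which is why a plain file in this shape is enough for the reporting step.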

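module.exports = function(grunt) {

  // Project configuration.
  grunt.initConfig({
    qunit_junit: {
      options: {
        // plugin options go here (for example, the output folder)
      }
    },
    qunit: {
      all: {
        options: {
          urls: [
            // URL(s) of the existing test runner page go here
          ]
        }
      }
    }
  });

  // Load the plugins
  grunt.loadNpmTasks('grunt-contrib-qunit');
  grunt.loadNpmTasks('grunt-qunit-junit');

  // Default task(s).
  grunt.registerTask('default', ['qunit_junit', 'qunit']);
};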


Note that the qunit_junit task wants to run before qunit, which is why it comes first in the task list. This will write the output in JUnit format to a folder called test-reports. Make sure that’s writeable by Jenkins!

Speaking of Jenkins…

[Screenshot: jenkins_grunt.png]

The or (||) and the output after it aren’t strictly necessary; what they do is allow Jenkins to continue building if the unit tests failed, because the shell step as a whole still exits successfully. Later, in the reporting step, a build will be marked “unstable” if the tests failed, but if you don’t have this you won’t be able to execute any later steps.

Speaking of reporting, here’s how I configured the JUnit reporter for Jenkins:

[Screenshot: jenkins_junit.png]

And voila! Click “build” and you’ll see your results right away.