Teatime: Performance Testing Basics with jMeter

Welcome back to Teatime! This is a weekly feature in which we sip tea and discuss some topic related to quality. Feel free to bring your tea and join in with questions in the comments section.

Tea of the week: Black Cinnamon by Sub Rosa tea. It’s a nice spicy black tea, very like a chai, but with a focus on cinnamon specifically; perfect for warming up on a cold day.

Today’s topic: Performance Testing with jMeter

This is the second in a two-part miniseries about performance testing. After giving my talk on performance testing, my audience wanted to know about jMeter specifically, as they had some ideas for performance testing. So I spent a week learning a little bit of jMeter, and created a Part 2. I’ve augmented this talk with my learnings since then, but I’m still pretty much a total novice in this area 🙂

And, as always when talking about performance testing, I’ve included bunnies.

What can jMeter test?

jMeter is a very versatile application. It works at the protocol level (it even has a raw TCP sampler), so it can test both your basic websites (by issuing a GET over HTTP) and your API layer (with SOAP or XML-RPC calls). It also ships with a JDBC connector, so it can test your database performance specifically, and it comes with basic configuration for testing LDAP, email (POP3, IMAP, or SMTP), and FTP. It’s pretty handy that way!

A handy bunny

Setting up jMeter

The basic unit in jMeter is called a test plan; you get one per file, and it outlines what will run when you hit “go”. Within that plan is the next smaller unit: the thread group. Thread groups contain one or more actions and zero or more listeners (reporters) that report on the results. A test plan can also have listeners that are global to the entire plan.

The thread group controls the number of threads allocated to the actions underneath it; in layman’s terms, how many things it does simultaneously, i.e. how many users it simulates. There are also settings for a ramp-up period (how long it takes to go from 0 users/threads to the full number) and a loop count (how many times each thread runs its actions). The example in the documentation lays it out like so: if you want 10 users to hit your site, and you have a ramp-up time of 100 seconds, each thread will start 10 seconds after the previous one, so that after 100 seconds you have 10 threads going at once, performing the same action.
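The arithmetic in that example is easy to sanity-check. Here’s a small Python sketch, assuming jMeter spreads thread starts evenly across the ramp-up period (delay = ramp-up ÷ thread count); `thread_start_times` is my own helper for illustration, not part of jMeter.

```python
def thread_start_times(num_threads: int, ramp_up_seconds: float) -> list[float]:
    """Approximate start offsets (in seconds) for each thread, assuming
    starts are spaced evenly across the ramp-up period."""
    if num_threads <= 0:
        raise ValueError("num_threads must be positive")
    delay = ramp_up_seconds / num_threads  # gap between consecutive thread starts
    return [i * delay for i in range(num_threads)]


starts = thread_start_times(num_threads=10, ramp_up_seconds=100)
print(starts)
```

With 10 threads and 100 seconds, the delay is 10 seconds, so the first thread starts immediately and the tenth starts at the 90-second mark: all ten are running by the time the 100 seconds are up.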

Actions are implemented via a unit called a “Controller”. Controllers come in two basic types: samplers and logical controllers. A sampler sends a request and waits for the response; this is how you tell the thread what it’s doing. A logical controller performs some basic logic, such as “only send this request once per thread” (the Once Only Controller, useful for things like logging in) or “alternate between these two requests” (the Interleave Controller).
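To make the difference between those two kinds of logic concrete, here’s a plain-Python model (not jMeter code) of a single thread looping three times, with jMeter’s Once Only Controller wrapped around a login request and its Interleave Controller alternating between two requests. The request names are placeholders.

```python
from itertools import cycle


def run_thread(iterations: int) -> list[str]:
    """Model the requests one thread sends: a once-only login, then one
    interleaved request per loop iteration."""
    sent = []
    once_only_done = False                      # Once Only state lives per thread
    interleave = cycle(["request A", "request B"])
    for _ in range(iterations):
        if not once_only_done:                  # Once Only: first iteration only
            sent.append("login")
            once_only_done = True
        sent.append(next(interleave))           # Interleave: one child per iteration
    return sent


print(run_thread(3))
```

Because the once-only state is tracked per thread, a thread group with ten threads would log in ten times: once for each simulated user.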

Multiple bunnies working together

You can see here an example of some basic logic:


In this example, once only, I log in and set my dealership (required for this request to go through successfully). Then, in a loop, I send a request to our service (here called PQR), asking for our products. I then verify that the request returned successfully, and wait 300 milliseconds between requests (to simulate the interface doing something with the result). In the thread group, I have it set to one user, ramping up in 1 second, looping once; this is where I’d tweak it to do a proper load test, or leave it as-is for a simple response check.
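If it helps to see that plan as code, here’s a rough Python equivalent. It assumes a hypothetical `send(path)` function that issues a request and returns the HTTP status code; the step names (`login`, `set-dealership`, `pqr-products`) are placeholders, not real endpoints. In jMeter terms, the pieces roughly correspond to a Once Only Controller, a Loop Controller, a Response Assertion, and a Constant Timer.

```python
import time
from typing import Callable


def run_plan(send: Callable[[str], int], loops: int,
             think_time_s: float = 0.3) -> list[float]:
    """Set up once, then repeat the product request, checking each
    response and pausing between requests. Returns response times (s)."""
    # Once only: log in and set the dealership (placeholder step names).
    for step in ("login", "set-dealership"):
        if send(step) != 200:
            raise RuntimeError(f"setup step {step!r} failed")

    timings = []
    for _ in range(loops):
        start = time.perf_counter()
        status = send("pqr-products")       # hypothetical request under test
        timings.append(time.perf_counter() - start)
        if status != 200:                   # like a Response Assertion
            raise RuntimeError(f"unexpected status {status}")
        time.sleep(think_time_s)            # like a 300 ms Constant Timer
    return timings


calls = []


def fake_send(path: str) -> int:
    """Stand-in for a real HTTP call: records the path, always succeeds."""
    calls.append(path)
    return 200


print(len(run_plan(fake_send, loops=3, think_time_s=0)))
```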

Skeptical bunny is skeptical

In this test, I believe I changed it to 500 requests and then ran the whole thing three times, until I felt I had enough data to take a reasonable average. The Graph Results listener gave me a nice, easy way to see how the results were trending, which gave me a feel for whether or not the graph was evening out. My graph ended up looking something like this:

The blue line is the average; the breaks are where I ran another set of tests. The purple line is the median, which you can see is basically levelling out here. The red line is the standard deviation, and the green line is the throughput, in requests per minute.
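For reference, here’s a sketch of how those four figures can be computed from a batch of response times. The sample values are made up for illustration (they are not my results), I’ve used the population standard deviation, and throughput is simply requests divided by elapsed minutes; jMeter’s exact formulas may differ slightly.

```python
import statistics


def summarize(elapsed_ms: list[float], duration_s: float) -> dict[str, float]:
    """Average, median, deviation and throughput for a batch of samples."""
    return {
        "average": statistics.mean(elapsed_ms),
        "median": statistics.median(elapsed_ms),
        "deviation": statistics.pstdev(elapsed_ms),       # population std dev
        "throughput_per_min": len(elapsed_ms) / duration_s * 60,
    }


samples = [120, 135, 110, 400, 125]   # made-up response times in milliseconds
print(summarize(samples, duration_s=2.0))
```

Note how the single 400 ms outlier drags the average up to 178 ms while the median stays at 125 ms; that’s why the median tends to be the steadier line to watch for levelling out.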

Good result? bad result? Bunnies.

Have you ever done anything with jMeter, readers? Any tips on how to avoid those broken graph lines? Am I doing everything wrong? Let me know in the comments 🙂

Teatime: Testing Application Performance


Tea of the week: Still on a chai kick from last week, today I’m sipping on Firebird’s Child Chai from Dryad Teas. I was first introduced to Dryad Teas at a booth at a convention; I always love being able to pick out teas in person, and once I’d found some good ones I started ordering online regularly. It’s a lovely warm chai, with a great kick to it.

Today’s topic: Testing Application Performance

This is the first in a two-part miniseries about performance testing. The first time I gave this talk, it was a high-level overview of performance testing, and when I asked (as I usually do) if anyone had topic requests for next week, they all wanted to know about jMeter. So I spent a week learning a little bit of jMeter, and created a Part 2.

This talk, however, remains high-level. I’m going to cover three main topics:

  • Performance Testing
  • Load Testing
  • Volume Testing

I have a bit of a tradition when I talk about performance testing: I always illustrate my talks with pictures of adorable bunnies. After all, bunnies are fast, but also cute and non-threatening. Who could be scared of perf tests when they’re looking at bunnies?

White Angora Bunny Rabbit
Aww, lookit da bunny!

Performance Testing

Performance testing is any testing that assesses the performance of the application. In that sense it’s an umbrella over the other two categories: load testing and volume testing are both types of performance testing. However, when used without qualifiers, it typically means measuring the response time under a typical load, to determine how fast the application will perform in the average, everyday use case.

You can measure the entire application end to end, but it’s often valuable to test small pieces of functionality in isolation instead. Typically, we do this the same way a user would: we make an HTTP request (for a web app), or a series of HTTP requests, and measure how long it takes to come back. Ideally, we do this many times, simulating a number of simultaneous users of our site, so the result reflects typical use rather than one lucky (or unlucky) request. For a desktop application, instead of an “average load”, we care about average hardware: we run the application on an “average” system and measure how long it takes to, say, repaint the screen after a button click.
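A bare-bones version of “make the request a lot of times and see how long it takes” might look like this in Python. The `action` argument is whatever you’re measuring; for a web page you might pass something like `lambda: urllib.request.urlopen(url).read()` with your own URL. This is only a sketch, not a replacement for a real load tool: it runs the requests one after another rather than simultaneously.

```python
import statistics
import time
from typing import Callable


def time_action(action: Callable[[], object], runs: int = 50) -> dict[str, float]:
    """Run `action` `runs` times in a row and summarize its duration in seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        action()                                  # e.g. one HTTP request
        durations.append(time.perf_counter() - start)
    return {
        "mean": statistics.mean(durations),
        "median": statistics.median(durations),
        "max": max(durations),                    # the worst single run
    }


print(time_action(lambda: sum(range(10_000)), runs=20))
```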

But what target do you aim for? The Nielsen Norman Group outlines some general guidelines:

  • A response time of one tenth of a second feels like the direct result of a user’s action. When the user clicks a button, for example, the button should animate within a tenth of a second, and ideally the entire interaction should complete within that time. Then the user feels in control of the situation: they did something and it made the computer respond!
  • One second feels like a seamless interaction. The user did something, and it made the computer go do something complicated and come back with an answer. It’s less like moving a lever or pressing a button, and more like waiting for an elevator’s door to close after having pressed the button: you don’t doubt that you made it happen, but it did take a second to respond.
  • Ten seconds and you’ve entirely lost their attention. They’ve gone off to make coffee. This is a slow system.
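If you want to turn these limits into a check in an automated test, a rough sketch might bucket measured times like this. The thresholds come from the three limits above; the label for the one-to-ten-second range is my own wording.

```python
def classify_response(seconds: float) -> str:
    """Bucket a response time using the three limits described above."""
    if seconds <= 0.1:
        return "instant"           # feels like a direct result of the action
    if seconds <= 1.0:
        return "seamless"          # noticeable, but the interaction still flows
    if seconds < 10.0:
        return "noticeable delay"  # the user is clearly waiting
    return "attention lost"        # 10 s or more: they've gone to make coffee


for t in (0.05, 0.8, 4.0, 12.0):
    print(f"{t:>5} s -> {classify_response(t)}")
```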
Baby Bunny Rabbit

Load Testing

Load testing is testing the application under load: you simulate the effects of a number of users all using your system at once. This generally serves one of two goals: either you’re trying to determine the maximum capacity of your system, or you’re trying to find out whether the system degrades gracefully when it exceeds that maximum capacity. Either way, you typically start with a few users and “ramp up” the number of users over a period of time. Before you begin, decide what maximum capacity you’re aiming for, so you know whether you’re on target or need to do some tuning.
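Here’s a sketch of the “ramp up until you miss your target” idea. It assumes a hypothetical `measure_latency(users)` function that runs one load step with that many simulated users and returns a representative response time in seconds; a toy linear model stands in for it here, so the numbers say nothing about any real system.

```python
from typing import Callable


def find_capacity(measure_latency: Callable[[int], float], target_s: float,
                  step: int = 10, max_users: int = 1000) -> int:
    """Ramp up in steps of `step` users and return the largest user count
    whose measured latency stays within `target_s` (0 if none does)."""
    capacity = 0
    for users in range(step, max_users + 1, step):
        if measure_latency(users) > target_s:
            break                  # first step over target: stop ramping
        capacity = users
    return capacity


def fake_latency(users: int) -> float:
    """Toy model (hypothetical): latency grows linearly with users."""
    return 0.2 + users * 0.004


print(find_capacity(fake_latency, target_s=1.1))
```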

Like the performance tests above, this can be done at any layer; you can fire off a ton of requests at your API server, for example, or simulate a user loading front-end pages that fire requests at the API tier. Often, you’ll mix approaches: you’ll generate load using API calls, then measure a simulated user’s (possibly degraded) experience browsing the site while that load is running.
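A minimal sketch of that mixed approach in Python, assuming two hypothetical functions you’d supply: `call_api()` makes one background API request and `browse()` performs one simulated user session. A thread pool generates the background load, and the browsing session is timed once that load has been started.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def browse_under_load(call_api: Callable[[], None],
                      browse: Callable[[], None],
                      workers: int = 20,
                      calls_per_worker: int = 50) -> float:
    """Start background API load on a thread pool, time one browsing
    session started while that load is running, and return its duration."""

    def hammer() -> None:
        # One simulated API client making back-to-back requests.
        for _ in range(calls_per_worker):
            call_api()

    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(hammer) for _ in range(workers)]
        start = time.perf_counter()
        browse()                       # the user experience we care about
        elapsed = time.perf_counter() - start
        for future in futures:
            future.result()            # re-raise any error from the load
    return elapsed
```

With stand-ins such as `lambda: time.sleep(0.001)` it runs as-is; swap in real requests to measure an actual system.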

An interesting twist on load testing is testing for sustained load (often called soak or endurance testing): what happens to your application over a few days of peak usage, rather than a few hours of peak followed by some downtime? This can catch memory leaks and similar problems that a shorter test would miss.
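One rough way to spot a leak during a sustained-load (soak) run is to sample the application’s memory use periodically and check whether it keeps trending upward. This sketch fits a least-squares line to hypothetical (hour, megabytes) samples; how you collect those samples depends on your platform and monitoring tools, and what slope counts as “a leak” is your call.

```python
def memory_growth_per_hour(samples: list[tuple[float, float]]) -> float:
    """Least-squares slope of (hour, memory_mb) samples, in MB per hour."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    cov = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples need at least two distinct times")
    return cov / var


# Made-up readings every 12 hours over two days of sustained load.
samples = [(0, 500), (12, 518), (24, 536), (36, 554), (48, 572)]
print(memory_growth_per_hour(samples))  # → 1.5
```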

Bunny under load

Volume Testing

While load testing covers a large number of users at once, volume testing is more focused on the data: what happens when there’s a LOT of data involved? Does the system slow down finding search results when there are millions or billions of records? Does the front end slow to a crawl when the API returns large payloads of data? Does the database run out of memory executing complex query plans when there are a lot of records?
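Here’s a small volume-test sketch using Python’s built-in `sqlite3` in-memory database as a stand-in for a real data store; the table and column names are made up. It times the same lookup on an unindexed column as the table grows, which is the kind of trend a volume test is looking for.

```python
import sqlite3
import time


def time_search(row_counts: list[int]) -> dict[int, float]:
    """Time one unindexed lookup against tables of increasing size."""
    results = {}
    for n in row_counts:
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, name TEXT)")
        conn.executemany(
            "INSERT INTO records (name) VALUES (?)",
            ((f"record-{i}",) for i in range(n)),
        )
        start = time.perf_counter()
        # No index on `name`, so SQLite scans every row to answer this.
        conn.execute("SELECT id FROM records WHERE name = ?", ("missing",)).fetchall()
        results[n] = time.perf_counter() - start
        conn.close()
    return results


for n, secs in time_search([1_000, 10_000, 100_000]).items():
    print(f"{n:>7} rows: {secs * 1000:.2f} ms")
```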

A high volume of bunnies

Do you load test at your organization? How have you improved your load testing over the years? What challenges do you face?