One problem with building applications that talk to external dependencies like APIs is that, well, the applications are talking to external dependencies. This opens up a whole can of worms when it comes to testing the application.

You may well have heard developers saying: don’t let tests hit external dependencies!

Some folks take that to different extremes, and don’t even let their tests talk to a database they control. That might make sense in some situations and not in others, but something pretty much everyone agrees on is that a test suite hitting an actual API over the wire is not ideal.

If the test suite is hitting a production API, you could end up sending "funny" (offensive) test emails to a bunch of customers.

If a special testing API exists, then multiple developers hitting that test server could cause state to bleed from one test to another, causing race conditions, false positives, false negatives, or all sorts of nonsense.

Trying to reset an external API back to a specific state for each test is a fool's errand. If you somehow manage it, your test suite now requires the internet, meaning anyone on your team is gonna be screwed next time they try working from a coffee shop, busy conference, plane, etc.

Here are a bunch of solutions that not only help you cut the cord, but help you get the application into specific states, improving the quality of your tests.

Mocking Code Dependencies with Unit Tests

Hopefully your application is not littered with HTTP calls to this API or their SDK directly, because that would be some tight coupling and make it reeeeal hard to switch the API for another one if the company yanks it for some reason.

You probably have some thin layer wrapping their logic, giving you the chance to swap things out without changing too much of your own code. Maybe it looks a bit like this:

class Geocoder
  # Thin wrapper around the Google Maps SDK, so the rest of the
  # application never talks to the SDK directly.
  def self.address(str)
    google_sdk.geocode(str)
  end
end

The application code has VenueService which is talking to Geocoder and using the address method, which pops off to the Google Maps API to do the thing.

To avoid the test suite hitting the external API, the most likely move is to mock the Geocoder in the VenueService tests.

RSpec.describe VenueService do

  describe '#update' do
    it 'will geocode address to lat lon' do
      allow(Geocoder).to receive(:address).with('123 Main Street').and_return(
        lat: 23.534,
        lon: 45.432
      )
      subject.update(address: '123 Main Street')
      expect(subject.lat).to eql(23.534)
      expect(subject.lon).to eql(45.432)
    end
  end
end

Basically what we have here is a test (using RSpec, but whatever, it’s all the same) which describes how the VenueService should work. The update method is being tested, and the Geocoder is being stubbed out 🙈 to respond in a certain way.

For the VenueService unit tests this is fine, because the intent is to make sure VenueService works with what we think Geocoder is going to return. Unit tests for VenueService only focus on that class, so what can we do to make sure Geocoder is working properly?

Well, unit testing that class is one option, but it’s not really doing much other than talking to the Google Maps SDK, and we really don’t want to mock that. Why? Because we don’t own it, and mocking things you don’t own means making guesses that might not be correct now, and might not be correct later. The Google Maps SDK might change, and if all we have are tests saying the SDK works one way when really it works another, then you are in false positive world: a broken application with a lovely green test suite.

This will often be less of a problem for typed languages like Go, TypeScript, PHP 7, etc., but changes can happen which those type systems do not notice. For example, a foo property can still be a string, but suddenly have different contents within that string.
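As a contrived Ruby sketch of that kind of drift (the payloads here are entirely invented): the id field is a string in both versions, so no type checker would blink, but code assuming numeric contents breaks at runtime.

```ruby
# Hypothetical API drift: `id` is still a string in both payloads, so the
# contract looks unchanged, but the contents have shifted shape.
def numeric_id(payload)
  Integer(payload.fetch('id'))
end

numeric_id('id' => '12345') # the old payload parses fine, giving 12345
# numeric_id('id' => 'usr_12345') would raise ArgumentError with the new payload
```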

Integration tests are very important to make sure things work altogether.

Web Mocking in Integration Tests

Integration tests will be a bit more realistic as they hit more real code, so the behaviour is closer to what is actually likely to happen in production. This does mean integration tests can be slower than unit tests.

Some developers avoid integration tests for this reason, but that is reckless and daft premature optimization. Would you rather work on speeding up a slow but reliable test suite, or have a broken production with an untrustworthy test suite?

As integration tests hit more code, some folks think hitting the external APIs is just going to happen, but that is not the case!

One approach to avoid hitting the wire, yet still have realistic interactions, is to use something like WebMock for Ruby, Nock for JavaScript, or the baked-in httptest package in Go.

These tools are another type of mock, different from the class mocking discussed so far. Instead of mocking a class in your programming language, they mock an HTTP server. They are also very different from API specification based mocking tools like Prism, which is a whole other article.

Web mocking tools can be configured to respond in certain ways based on what URL, HTTP method, or body params are sent, depending on how complex things need to get. Most of the time this is used for simple stuff.
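To make the idea concrete, here is a hand-rolled toy version in Ruby (everything here is made up for illustration; real tools like WebMock intercept the HTTP client itself, so application code does not change at all):

```ruby
# A toy web mock: canned responses are matched by HTTP method and path.
# Real tools register these stubs against the actual HTTP client.
class FakeHttp
  def initialize
    @stubs = {}
  end

  # Register a canned response for a method + path pair.
  def stub_request(method, path, status:, body:)
    @stubs[[method, path]] = { status: status, body: body }
  end

  # "Requests" never hit the wire; they just look up the canned response.
  def request(method, path)
    @stubs.fetch([method, path]) { { status: 404, body: 'no stub registered' } }
  end
end

http = FakeHttp.new
http.stub_request(:get, '/geocode', status: 200, body: '{"lat":23.534,"lon":45.432}')
http.request(:get, '/geocode') # => { status: 200, body: '{"lat":23.534,"lon":45.432}' }
```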

Here’s a real-world example taken from the CLI tool Spectral.

const validOas3SpecPath = resolve(__dirname, '__fixtures__/openapi-3.0-no-contact.yaml');

describe('when loading specification files from web', () => {
  test.nock('http://foo.local', api =>
    api.get('/openapi').replyWithFile(200, validOas3SpecPath, {
      'Content-Type': 'application/yaml',
    }),
  )
  .stdout()
  .command(['lint', 'http://foo.local/openapi'])
  .it('outputs no issues', ctx => {
    expect(ctx.stdout).toContain('No errors or warnings found!');
  });
});

This test is setting up a server on the arbitrary fake hostname http://foo.local, with a GET path /openapi that returns a YAML file with some specific content.

Then other tests can confirm what Spectral will do if it tries to load an unsupported file type, the response contains a 404 status code, or any other number of edge cases.

Web mocking is great for when you want to control the response, but once again you should only mock things you own. Using this approach for the Google Maps API example would only be confirming that the Geocoder works with an assumption of what the Google Maps API is going to do. When things change in the API there is no programmatic way to know about it.

Even if the change is noticed, updating these mock setups can be time consuming. What we really want is something like Jest Snapshots, but for HTTP requests…

Record & Replay in Integration Tests

There is a tactic called "record and replay", and it is available in pretty much every programming language in one form or another. Record & replay has been around for years, but I did not discover it until I started using Ruby, which has a great tool called VCR ("Video Cassette Recorder").

For younger developers, a VCR is like Blu-ray but terrible quality, and the data is printed on a chunk of plastic you shove in a box under your TV. It was mostly used for recording telly you weren’t able to watch at the time, which is no longer a thing.

VCR explains the goals nicely, so I will use their words:

Record your test suite’s HTTP interactions and replay them during future test runs for fast, deterministic, accurate tests.

The basic approach is to put your test suite in "record mode", which will actually make real requests to the external services, but then it records the response. All the headers, body content, status code, the whole thing.

Then when the test suite is run not in record mode, it will reuse the recorded responses instead of going over the wire, meaning it is quick, always going to give the same result, and the entire response is being used, so you know it is accurate.

require 'rubygems'
require 'test/unit'
require 'vcr'

VCR.configure do |config|
  config.cassette_library_dir = "fixtures/vcr_cassettes"
  config.hook_into :webmock
end

class VCRTest < Test::Unit::TestCase
  def test_example_dot_com
    VCR.use_cassette("synopsis") do
      response = Net::HTTP.get_response(URI('http://www.iana.org/domains/reserved'))
      assert_match(/Example domains/, response.body)
    end
  end
end

This is a rather verbose Ruby example for clarity. It includes the config which would normally be tucked away in a helper, and it is manually using a cassette block, but the idea is this: you can define multiple cassettes, and switch them out to see the code working differently.

How exactly it works under the hood might be a bit too much of how the sausage is made, but it is very clever so I am going to nerd out a little. In Ruby once again there is some monkey patching going on. It knows to look out for common HTTP clients, and actually messes with their definitions a little (only in the test suite). This sounds a bit scary, but it means VCR can hijack the HTTP requests and use the recorded versions instead.
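A very rough sketch of that hijacking trick in plain Ruby, no gems involved (the cassette contents and URL here are made up; real VCR also handles recording, serialising cassettes to YAML, and patching several different HTTP clients):

```ruby
require 'net/http'

# Prepend a module onto Net::HTTP so any request matching a "cassette" is
# answered from memory and never touches the network.
module CassetteDeck
  CASSETTES = {
    ['GET', '/domains/reserved'] => 'Example domains page body'
  }

  def request(req, body = nil)
    recording = CASSETTES[[req.method, req.path]]
    return super unless recording # nothing recorded: fall through to the wire

    Net::HTTPOK.new('1.1', '200', 'OK').tap do |res|
      res.instance_variable_set(:@body, recording)
      res.instance_variable_set(:@read, true) # mark the body as already read
    end
  end
end

Net::HTTP.prepend(CassetteDeck)

res = Net::HTTP.new('www.iana.org').request(Net::HTTP::Get.new('/domains/reserved'))
res.body # => "Example domains page body", with no network involved
```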

Most of these record & replay tools can be configured to use the more static web mocking tools mentioned previously. Ruby VCR, for example, can use WebMock; just think of VCR as a helper for creating these accurate web mocks.

Another convenient thing about record & replay is the ability to have expiring cassettes. You can configure these recordings to automatically expire (vanish) after a certain amount of time, and then the test suite goes back into record mode. Or you can have them throw warnings, and hope some developers actually pay attention. This can be very annoying, but you would not believe how often I have seen client application developers use year old stubs with fields that did not exist anymore.
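In Ruby VCR this is the re_record_interval cassette option, which takes a number of seconds; once a recording is older than the interval, the next run re-records it:

VCR.use_cassette('geocode', re_record_interval: 604_800) do
  # Once this cassette is more than a week old, the next test run will
  # go over the wire again and record a fresh response. With ActiveSupport
  # loaded you could write 7.days instead of 604_800.
end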

When recorded responses expire, clients need to go over the wire and record new responses. This can be tricky, as the API might have different data now. Some amount of effort can go into getting good data onto the API for recording, which might be a case of building a sort of seed script. This annoyance is worth it in the long run, but certainly takes some getting used to.

Expiring recordings go hand in hand with deprecations and evolution, especially Sunset and Deprecated headers. If your applications are using reasonably up-to-date recordings, then your test suite can start throwing deprecation warnings, and loudly report about the code hitting URLs marked for removal with Sunset.
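As a sketch of what such a check could look like (the helper name and URL are invented; Sunset is RFC 8594), a test suite could scan a replayed response's headers and build a warning when the API has announced a removal date:

```ruby
require 'time'

# Hypothetical helper: returns a warning string if the response headers
# contain a Sunset header (RFC 8594), or nil when there is nothing to report.
def sunset_warning(headers, url)
  sunset = headers['Sunset']
  return nil unless sunset

  date = Time.httpdate(sunset).strftime('%Y-%m-%d')
  "#{url} is marked for removal on #{date}, update the client before then!"
end

sunset_warning({ 'Sunset' => 'Sat, 01 Jan 2028 00:00:00 GMT' }, '/geocode')
```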

The Ruby VCR was initially inspired by Chris Young’s NetRecorder, and the two have gone on to inspire a lot of other record and replay tools; the VCR maintainers keep an impressive list of ports to other languages.

If you are a JavaScript user then check out Polly.js, comically written by Netflix. It has some great config options.

polly.configure({
  recordIfMissing: true,
  recordIfExpired: false,
  recordFailedRequests: false,

  expiresIn: null,
  timing: Timing.fixed(0),

  matchRequestsBy: {
    method: true,
    headers: true,
    body: true,
    order: true,
  }
})

recordIfMissing is a good option: when folks add new tests, it will try to record the request the first time the test is run. This can catch developers out if they are not expecting it, and can lead to a rubbish response being recorded so they have to delete it and try again, but again it is worth getting used to.

Another one I like is recordFailedRequests: true. This is yet another reminder that if the API is ignoring HTTP conventions like status codes, this will not work. Ask the API developers to stop ignoring conventions and build their APIs properly. Maybe send them a copy of Build APIs You Won’t Hate if they need convincing.

All this and more in Surviving Other People's APIs, currently available for pre-order, with roughly 80% of the book available for download.