What is API Rate Limiting All About?

Something API clients (applications talking to APIs) are not often aware of, but
run into often, is rate limiting; the API telling you to calm down a bit, and
slow down how many requests are being made in a certain timeframe. The most
basic rate limiting strategy is often "clients can only send X requests per
second."

Many APIs implement rate limiting to ensure relative stability when unexpected
things happen. If for some reason one client causes a spike in traffic, the API
has to continue running smoothly for other users instead of crashing. A
misbehaving (or malicious script) could be hogging resources, or the API systems
could be struggling and they need to cut down the rate limit for "lower
priority" traffic. Sometimes it is just because the company providing the API
has grown beyond their wildest dreams, and want to charge money for increasing
the rate limit for high capacity users.

Often the rate limit will be associated to an "API key" or "access token", and
our friends over at Nordic APIs very nicely explain
some other rate limiting strategies:

Server rate limits are a good choice as well. By setting rates on specific
servers, developers can make sure that common use servers, such as those used to
log in, can handle a lot more requests than specialized or seldom used servers,
such as data conversion devices.

Finally, the API developer can implement regional data limits, which limit
calls by region. This is especially useful when implementing behavior-based
limiting; for instance, a developer would expect the number of requests during
midnight in North America to be lower than the baseline daytime rate, and any
behavior contrary to this without a good reason would suggest questionable
activity. By limiting the region for a period of time, this can be
prevented. — Nordic
APIs

All fair reasons, but for the client it can be a little pesky.

Throttling Your API Calls

There are a lot of ways to go about throttling your API calls, and it very much
depends on where the calls are being made from.

One of the hardest things to limit are API calls to a third party being made
directly to the client. For example, if your iOS/web/etc clients are making
Google Map API calls directly from the application, there is very little you can
do to throttle that. You’re just gonna have to pay for the appropriate usage
tier for how many users you have.

Other setups can be a little easier. If the rate limited API is being spoken to
via some sort of backend process, and you control how many of those processes
there are, you can limit often that function is called in the backend code. For
example, if you are hitting an API that allows only 20 requests per second, you
could have 1 process that allows 20 requests per second to pass through.

If this process is handling things synchronously that might not quite work out,
and you might need to have something like 4 processes handling 5 requests per
second each, but you get the idea.

If this process was being implemented in NodeJS, you could use
Bottleneck.

const Bottleneck = require("bottleneck");

// Never more than 5 requests running at a time.
// Wait at least 1000ms between each request.
const limiter = new Bottleneck({
 maxConcurrent: 5,
 minTime: 1000
});

const fetchPokemon = id => {
 return pokedex.getPokemon(id);
};

limiter.schedule(fetchPokemon, id).then(result => {
 /* ... */
});

Ruby users who are already using tools like Sidekiq can add plugins like
Sidekiq::Throttled, or pay
for Sidekiq
Enterprise, to get
rate limiting functionality. Worth every penny in my books.

Every language will have some sort of throttling, job queue limiting, etc.
tooling, but you will need to go a step further. Doing your best to avoid
hitting rate limits is a good start, but nothing is perfect, and the API might
lower its limits for some reason.

Am I Being Rate Limited?

The appropriate HTTP status code for rate limiting has been argued over about as
much as tabs vs spaces, but there is a clear winner now; RFC 6585 defines it as
429, so APIs should be using 429.

http cat is an amazing resource.

Twitter’s API existed for a few years before this standard, and they chose "420 — Enhance Your Calm". They’ve dropped this and moved over to 429, but some others copied them at the time, and might not have updated since. You cannot rule out bumping into a copycat API, still using that outdated unofficial status.

http cat shows us how a cat can enhance their calm.

Google also got a little "creative" with their status code utilization. For a long time were using 403 for their rate limiting, but I have no idea if they are still doing that. GitHub v3 (a RESTful API that was replaced with a GraphQL, but is still floating around) is still using 403:

HTTP/1.1 403 Forbidden
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 1377013266
{
  "message": "API rate limit exceeded for xxx.xxx.xxx.xxx. (But here’s the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)",
  "documentation_url": "https://developer.github.com/v3/#rate-limiting"
}

Getting a 429 (or a 420) is a clear indication that a rate limit has been hit, and a 403 combined with an error code, or maybe some HTTP headers can also be a thing to check for. Either way, when you’re sure it’s a rate limit error, you can move onto the next step: figuring out how long to wait before trying again.

Proprietary Headers

Github here are using some proprietary headers, all beginning with X-RateLimit-. These are not at all standard (you can tell by the X-), and could be very different from whatever API you are working with.
Successful requests with Github here will show how many requests are remaining, so maybe keep an eye on those and try to avoid making requests if the remaining amount on the last response was 0.

$ curl -i https://api.github.com/users/octocat
HTTP/1.1 200 OK
X-RateLimit-Limit: 60
X-RateLimit-Remaining: 56
X-RateLimit-Reset: 1372700873

You can use a shared key (maybe in Redis or similar) to track that, and have it expire on the reset provided in UTC time in X-RateLimit-Reset.

Retry-After

According to the RFCs for HTTP/1.1 (the obsoleted and irrelevant RFC 2616, and the replacement RFC 7230–7235), the header Retry-After is only for 503 server errors, and maybe redirects. Luckily RFC 6584 (the same one which added HTTP status code 429) says it’s totally cool for APIs to use Retry-After there.

So, instead of potentially infinite proprietary alternatives, you should start to see something like this:

HTTP/1.1 429 Too Many Requests
Retry-After: 3600
Content-Type: application/json

{
 "message": "API rate limit exceeded for xxx.xxx.xxx.xxx.",
 "documentation_url": "https://developer.example.com/#rate-limiting"
}

An alternative value for Retry-After is an HTTP date:

Retry-After: Wed, 21 Oct 2015 07:28:00 GMT

Same idea, it just tells the client to wait until then before bothering the API
further. By checking for these errors, you can catch then retry (or re-queue)
requests that have failed, or if thats not an option try sleeping for a bit to
calm workers down.

Warning: Make sure your sleep does not block your background processes
from processing other jobs. This can happen in languages where sleep sleeps the
whole process, and that process is running multiple types job on the same
thread. Don’t back up your whole system with an overzealous sleep!

Faraday, a Ruby gem I work with often, is now aware of Retry-After. It uses the
value to help calculate the interval between retry requests. This can be useful
for anyone considering implementing rate limiting detection code, even if you
aren’t a Ruby fan.

To learn how to implement rate limiting, Google it! Nginx can help you, API Management gateways
like Kong and
Tyk do a great job,
or you can try to implement it at the application level for aaaaallllll of your
APIs, which is a bit of a pain in the butt. Some frameworks like Laravel (PHP) support basic rate limiting out of the box, which is
pretty darn cool.

All this and more in Surviving Other Peoples APIs, currently available
for pre-order, with roughly 80% of the book available for download.