API Design Basics: Pagination

How to handle pagination in your REST API, and what are the pros and cons of each method?

Pagination is a massively important concept in REST APIs, often forgotten about at first because the API is just fine with a few hundred records, but it starts crumbling like shortbread as soon as there are thousands. Pagination means breaking a large dataset into smaller chunks, which a client can fetch incrementally if they really want all that data. This improves the performance of your web server, and makes for a far better user experience because users can work with some of the data sooner, instead of waiting for every record ever.

Introducing pagination to an API after launch can be really difficult, and it's usually a breaking change, so get in there first and figure out your pagination strategy before you deploy. To help you pick a pagination strategy, let's look at some examples and talk through the pros and cons.

  1. Page-Based Pagination
  2. Offset-Based Pagination
  3. Cursor-Based Pagination

Page-Based Pagination

Page-based pagination uses page and size parameters to navigate through pages of data.

GET /items?page=2&size=10

This request fetches the second page, with each page containing 10 items.

There are two main ways to show pagination data in the response.

{
  "data": [
    ...
  ],
  "page": 2,
  "size": 10,
  "total_pages": 100
}

This is pretty common but forces the client to know a whole lot about your pagination implementation. As always when you want to move logic to the server-side instead of forcing clients to read lots of docs, you can add links, also known as HATEOAS, or "Hypermedia Controls" for short.

{
  "data": [
    ...
  ],
  "meta": {
    "page": 2,
    "size": 10,
    "total_pages": 100
  },
  "links": {
    "self": "/items?page=2&size=10",
    "next": "/items?page=3&size=10",
    "prev": "/items?page=1&size=10",
    "first": "/items?page=1&size=10",
    "last": "/items?page=100&size=10"
  }
}

Hypermedia controls are often seen as controversial, with people choosing to skip them and, in the process, pretty much ignoring one of the main benefits that makes a REST API a REST API. Interestingly, pagination seems to be the one use case where everyone is happy with them. If there's a next link, you can show a next button. If the next link returns data, you can show the data. You could even remove the meta object entirely and let people use links alone.
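
As a rough sketch of how a server might assemble that response, the helper below wraps a page of items in meta and links. The function name, its parameters, and the choice to leave out next and prev when those pages do not exist are assumptions for illustration, not something a particular framework gives you.

```python
import math
from urllib.parse import urlencode


def build_page_response(items, page, size, total_items, base_path="/items"):
    """Wrap one page of items with meta and links (hypothetical helper)."""
    total_pages = max(1, math.ceil(total_items / size))

    def link(p):
        return f"{base_path}?{urlencode({'page': p, 'size': size})}"

    links = {
        "self": link(page),
        "first": link(1),
        "last": link(total_pages),
    }
    # Only advertise next/prev when those pages actually exist,
    # so a client can show or hide its buttons based on the links alone.
    if page < total_pages:
        links["next"] = link(page + 1)
    if page > 1:
        links["prev"] = link(page - 1)

    return {
        "data": items,
        "meta": {"page": page, "size": size, "total_pages": total_pages},
        "links": links,
    }
```

Calling build_page_response([], page=2, size=10, total_items=1000) produces the same links as the JSON above, and on the last page the next link simply is not there.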

Ease of Use

  • Pro: Simple to implement and understand.
  • Pro: Easy for users to navigate through pages.
  • Pro: UI can show page numbers and know exactly how many pages there are.
  • Pro: Can optionally show a next/previous link as you know if there are more pages available.

Performance

  • Con: Involves counting all records in the dataset which can be slow and hard to cache depending on how many variables are involved in the query.
  • Con: Gets slower the more records you have, because the database has to count and skip over more and more rows. Hundreds are fine. Thousands are rough. Millions are horrendous.

Consistency

  • Con: If you load the latest 10 records, then a new record is added to the database, then a user loads the second page, they'll see one of those records twice. This is because there is no such concept as a "page" in the database, just saying "grab me 10, now the next 10" does not differentiate which records they actually were.
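
The duplicate problem is easy to reproduce without a database. This is a small simulation, assuming a newest-first list of record IDs and a hypothetical fetch_page helper that slices it the way a page query would.

```python
def fetch_page(records, page, size):
    """Return one page from a newest-first list, like a page/size query would."""
    start = (page - 1) * size
    return records[start:start + size]


# Newest first: ids 30, 29, ... 1
records = list(range(30, 0, -1))

page_one = fetch_page(records, page=1, size=10)  # ids 30 down to 21

# A new record arrives before the client asks for page 2.
records.insert(0, 31)

page_two = fetch_page(records, page=2, size=10)  # ids 21 down to 12

# Record 21 was pushed from the end of page 1 onto the start of page 2.
duplicates = set(page_one) & set(page_two)
print(duplicates)  # {21}
```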

Offset-Based Pagination

Offset-based pagination is a more straightforward approach. It uses offset and limit parameters to control the number of items returned and the starting point of the data, which avoids the concept of counting everything and dividing by the limit, and just focuses on using offsets to grab another chunk of data.

GET /items?offset=10&limit=10

This request fetches the second page of items, assuming each page contains a maximum of 10 items, and does not worry itself with how many pages there are. This can help with infinite scrolls or automatically "importing" lots of data one chunk at a time.

There are two main ways to show pagination data in the response.

{
  "data": [
    ...
  ],
  "meta": {
    "total": 1000,
    "limit": 10,
    "offset": 10
  }
}

Or with hypermedia controls in the JSON:

{
  "data": [
    ...
  ],
  "meta": {
    "total": 1000,
    "limit": 10,
    "offset": 10
  },
  "links": {
    "self": "/items?offset=10&limit=10",
    "next": "/items?offset=20&limit=10",
    "prev": "/items?offset=0&limit=10",
    "first": "/items?offset=0&limit=10",
    "last": "/items?offset=990&limit=10"
  }
}
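
On the server, a response like this maps almost directly onto SQL. Here is a minimal sketch using Python's built-in sqlite3 module, assuming a hypothetical items table with id and name columns and an in-memory database standing in for your real data store.

```python
import sqlite3


def fetch_offset_page(conn, offset, limit):
    """Return one chunk of items plus the meta block shown above."""
    rows = conn.execute(
        "SELECT id, name FROM items ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()
    # The total is a separate COUNT query, which is where the cost creeps in.
    total = conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
    return {
        "data": [{"id": r[0], "name": r[1]} for r in rows],
        "meta": {"total": total, "limit": limit, "offset": offset},
    }


# Hypothetical sample data: 35 items with ids 1 through 35.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO items (name) VALUES (?)",
    [(f"item {n}",) for n in range(1, 36)],
)

page = fetch_offset_page(conn, offset=10, limit=10)
print([item["id"] for item in page["data"]])  # ids 11 through 20
```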

Ease of Use

  • Pro: Simple to implement and understand.
  • Pro: Easily integrates with SQL LIMIT and OFFSET clauses.
  • Pro: Like page-based pagination this approach can also show next/previous buttons dynamically when it's clear there are more records available.
  • Con: Does not help the UI build a list of pages if they want to show "Page 1, 2, ... 20." They can awkwardly do maths on the total / limit, but it's a bit weird.
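
For what it's worth, the awkward maths a client would need looks something like this. The function names are made up for the example, and it assumes offsets always land on a multiple of the limit.

```python
import math


def page_numbers(total, limit, offset):
    """Derive (current_page, total_pages) from offset-style metadata."""
    total_pages = math.ceil(total / limit)
    current_page = offset // limit + 1
    return current_page, total_pages


def offset_for_page(page, limit):
    """Turn a 1-based page number back into an offset."""
    return (page - 1) * limit


print(page_numbers(total=1000, limit=10, offset=10))  # (2, 100)
print(offset_for_page(page=100, limit=10))            # 990
```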

Performance

  • Con: Can become inefficient with large datasets due to the need to scan through all previous records.
  • Con: Performance degradation is significant as the offset increases.

Consistency

  • Con: Offset pagination has the same problem as page-based pagination: if more data has been added between two requests, you could see the same record returned twice.

Cursor-Based Pagination

Cursor-based pagination uses an opaque string (often a unique identifier) to mark the starting point for the next set of items. It's often more efficient and reliable for large datasets.

GET /items?cursor=abc123&limit=10

Here, abc123 represents the last item's unique identifier from the previous page. This could be a UUID, but it can be more dynamic than that.

APIs like Slack will base64 encode information with a field name and a value, so you can send an order-by field and an ID, all wrapped up in an opaque string like dXNlcjpXMDdRQ1JQQTQ= to represent user:W07QCRPA4. This stops API consumers hard-coding values, so your pagination logic can change, and consumers can pass the cursor around to do the job without any worry about what it actually involves.
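
A minimal sketch of that encoding, assuming a simple field:value layout. The helper names are invented here, and this is not a claim about how Slack actually implements its cursors.

```python
import base64


def encode_cursor(field, value):
    """Pack a sort field and the last seen value into an opaque string."""
    raw = f"{field}:{value}".encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor):
    """Unpack a cursor back into (field, value). Only the server should do this."""
    raw = base64.urlsafe_b64decode(cursor.encode("ascii")).decode("utf-8")
    field, _, value = raw.partition(":")
    return field, value


print(encode_cursor("user", "W07QCRPA4"))     # dXNlcjpXMDdRQ1JQQTQ=
print(decode_cursor("dXNlcjpXMDdRQ1JQQTQ="))  # ('user', 'W07QCRPA4')
```

Base64 is encoding, not encryption, so anybody can decode it. The point is only to signal to clients that the value is not theirs to construct.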

Ease of Use

  • Pro: API consumers don't have to think about anything and you can change the logic easily.
  • Con: Slightly more complex to implement than offset-based pagination.
  • Con: API does not know if there are more records available after the last one in the dataset so has to show a next/previous link which may return no data.*

Performance

  • Pro: Generally more efficient than offset-based pagination depending on your data source.
  • Pro: Avoids the need to count records to perform any sort of maths, which means larger datasets can be paginated without the slowdown that counting and skipping rows brings.

Consistency

  • Pro: Cursor-based pagination data remains consistent, even if new data is added or removed, because the cursor acts as a stable marker identifying a specific record in the dataset instead of "the 10th one", which might change between requests.

It can look a bit like this:

{
  "data": [...],
  "next_cursor": "xyz789",
  "limit": 10
}

Or again if you want to save the client doing the heavy lifting you can leverage hypermedia controls:

{
  "data": [
    ...
  ],
  "links": {
    "self": "/items?cursor=abc123&limit=10",
    "next": "/items?cursor=xyz789&limit=10",
    "prev": "/items?cursor=prevCursor&limit=10",
    "first": "/items?cursor=firstCursor&limit=10",
    "last": "/items?cursor=lastCursor&limit=10"
  }
}
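
Behind the scenes, a cursor usually turns into a "give me rows after this one" query rather than "skip N rows". Here is a rough sketch with sqlite3, assuming a hypothetical items table sorted newest first by id, and using the raw id as the cursor to keep things short (a real API would likely wrap it in something opaque).

```python
import sqlite3


def fetch_after(conn, last_seen_id, limit):
    """Keyset query: newest first, starting just below the last seen id."""
    if last_seen_id is None:
        rows = conn.execute(
            "SELECT id FROM items ORDER BY id DESC LIMIT ?", (limit,)
        ).fetchall()
    else:
        rows = conn.execute(
            "SELECT id FROM items WHERE id < ? ORDER BY id DESC LIMIT ?",
            (last_seen_id, limit),
        ).fetchall()
    ids = [r[0] for r in rows]
    next_cursor = str(ids[-1]) if ids else None
    return {"data": ids, "next_cursor": next_cursor, "limit": limit}


# Hypothetical sample data: ids 1 through 30.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(n,) for n in range(1, 31)])

first = fetch_after(conn, None, 10)                 # ids 30 down to 21
conn.execute("INSERT INTO items (id) VALUES (31)")  # a new record arrives
second = fetch_after(conn, int(first["next_cursor"]), 10)  # ids 20 down to 11

print(set(first["data"]) & set(second["data"]))  # set()
```

Unlike the page and offset approaches, the new record does not shift anything: the second request picks up exactly where the first one stopped.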

Update from the Slack community:

I mentioned that cursors cannot conditionally show next/previous links only when there is more data available, but Adam Altman suggests it's possible to mitigate this con. If the backend requests limit+1 records (for example, 11) and 11 results come back, it knows there is more data and can include a next link in the API response. If not, it can leave the next link out.

This is a clever approach. Yes, it's a bit of over-fetching, but it could be benchmarked against general usage to see which produces more efficient page transactions.
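
The limit+1 trick could be sketched like this. The paginate_with_lookahead function and the fetch_ids data source are both hypothetical, with fetch_ids standing in for whatever query your backend actually runs.

```python
def paginate_with_lookahead(fetch, cursor, limit):
    """Fetch limit + 1 rows; the extra row only signals that more data exists."""
    rows = fetch(cursor, limit + 1)
    has_more = len(rows) > limit
    page = rows[:limit]  # never return the lookahead row to the client

    response = {"data": page, "links": {}}
    if has_more:
        # The last item actually returned becomes the next cursor.
        response["links"]["next"] = f"/items?cursor={page[-1]}&limit={limit}"
    return response


def fetch_ids(cursor, count):
    """Hypothetical data source: ascending ids 1..25 after the cursor."""
    start = 0 if cursor is None else cursor
    return [n for n in range(1, 26) if n > start][:count]


print(paginate_with_lookahead(fetch_ids, None, 10)["links"])  # {'next': '/items?cursor=10&limit=10'}
print(paginate_with_lookahead(fetch_ids, 20, 10)["links"])    # {}
```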

Choosing the right pagination strategy depends on your specific use case and dataset size. Offset-based pagination is simple but may suffer from performance issues with large datasets. Cursor-based pagination offers better performance and consistency for large datasets but comes with added complexity. Page-based pagination is user-friendly but shares similar performance concerns with offset-based pagination.

Where should pagination go?

In all of these examples there's been the choice between sending some metadata back for the client to construct their own pagination controls, or sending them links in JSON to avoid the faff.

Using links is probably the best approach, but they don't have to go in the payload. Using RFC 8288: Web Linking might be the better choice.

Link: <https://api.example.com/items?page=1&size=10>; rel="first",
      <https://api.example.com/items?page=3&size=10>; rel="next",
      <https://api.example.com/items?page=100&size=10>; rel="last"

Popping them into HTTP headers seems like the cleaner choice instead of littering your resources with metadata. As well as feeling cleaner, headers are better compressed as of HTTP/2 thanks to HPACK, and because Web Linking is a standard it can be supported by generic HTTP clients like Ketting.
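
To give a feel for both sides of that header, here is a small sketch that builds a Link header value on the server and reads it back on the client. Both function names are made up for this example, and the parser only handles the simple form shown above, so treat it as an illustration rather than a complete Web Linking implementation.

```python
import re


def build_link_header(links):
    """Turn {"next": url, ...} into a Link header value."""
    return ", ".join(f'<{url}>; rel="{rel}"' for rel, url in links.items())


def parse_link_header(value):
    """Parse the simple <url>; rel="name" form into {rel: url}.

    Deliberately minimal: ignores other link parameters and
    space-separated multi-valued rel attributes.
    """
    pairs = re.findall(r'<([^>]*)>\s*;\s*rel="([^"]*)"', value)
    return {rel: url for url, rel in pairs}


header = build_link_header({
    "first": "https://api.example.com/items?page=1&size=10",
    "next": "https://api.example.com/items?page=3&size=10",
    "last": "https://api.example.com/items?page=100&size=10",
})

links = parse_link_header(header)
print(links["next"])  # https://api.example.com/items?page=3&size=10
```

A client can then keep following links["next"] until it is missing, without knowing whether the API uses pages, offsets, or cursors underneath.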

Either way, pick the right pagination strategy for your dataset, document it well with a dedicated guide in your API documentation, and make sure it scales up with the dataset you're expecting to have instead of testing with a handful of records, because if you want to change pagination later it could be a whole mess of backwards compatibility breaks.