API Design Basics: Resources & Collections

Learn how to use resources and collections in a REST API, getting stuck into some real-world examples, using links to get between them all, and avoiding some pitfalls along the way.

HTTP APIs that represent data generally break it down into two concepts: resources and collections. Let's take a look at these two concepts, and see how they’re used in real-world APIs.

What is a Resource?

In the context of REST/HTTP APIs, a resource represents a specific piece of data or object that can be accessed via a unique URI (Uniform Resource Identifier). This could be anything: a user, a blog post, a product, or an order.

Imagine you have an API which handles invoices and payments. Each invoice would be a resource, and each resource would have its own URI:

GET /invoices/645E79D9E14

{
    "id": "645E79D9E14",
    "invoiceNumber": "INV-2024-001",
    "customer": "Acme Corporation",
    "amountDue": 500.00,
    "dateDue": "2024-08-15",
    "dateIssued": "2024-08-01",
    "items": [
        {
            "description": "Consulting Services",
            "quantity": 10,
            "unitPrice": 50.00,
            "total": 500.00
        }
    ],
    "links": {
        "self": "/invoices/645E79D9E14",
        "customer": "/customers/acme-corporation",
        "payments": "/invoices/645E79D9E14/payments"
    }
}

Here, /invoices/645E79D9E14 is the endpoint that uniquely identifies a single resource, in this case, the invoice with the unique ID 645E79D9E14.

The resource contains loads of data, including the customer name, an array of items on the invoice, various dates, and how much of the invoice is left to be paid.

It also has "links", which can point to related resources or collections. A link could lead to pure data or to an action, like a "pay" link which allows you to make a payment, or a "send" link which helps you send an invoice. The one we've gone with here is "payments", which still allows you to create a payment, but also supports viewing a list of partial and failed payments.
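As a rough sketch of how a client might use those links, the snippet below turns a named link from the invoice above into an absolute URL. The base URL `https://api.example.com` and the helper name `resolve_link` are assumptions made up for illustration, not part of any real API.

```python
from urllib.parse import urljoin

# Hypothetical base URL, assumed purely for illustration.
BASE_URL = "https://api.example.com"

# A trimmed copy of the invoice resource shown above.
invoice = {
    "id": "645E79D9E14",
    "links": {
        "self": "/invoices/645E79D9E14",
        "customer": "/customers/acme-corporation",
        "payments": "/invoices/645E79D9E14/payments",
    },
}


def resolve_link(resource, rel, base_url=BASE_URL):
    """Return the absolute URL for a named link, or None if it is absent."""
    href = resource.get("links", {}).get(rel)
    if href is None:
        return None
    return urljoin(base_url, href)


print(resolve_link(invoice, "payments"))
print(resolve_link(invoice, "pay"))  # No "pay" link on this invoice.
```

A client written this way can treat a missing link as "this is not available here" rather than guessing at a URL.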

What is a Collection?

A collection is a group of resources. It’s essentially a list or set of all the items of a particular type. Collections also have their own unique URLs.

Using the invoices example again, if you wanted the API to let users retrieve all invoices, you would have an /invoices collection:

GET /invoices

[
  {
    "id": "645E79D9E14",
    "invoiceNumber": "INV-2024-001",
    "customer": "Acme Corporation",
    "amountDue": 500.00,
    "dateDue": "2024-08-15"
  },
  {
    "id": "646D15F7838",
    "invoiceNumber": "INV-2024-002",
    "customer": "Monsters Inc.",
    "amountDue": 750.00,
    "dateDue": "2024-08-20"
  }
]

In JSON this collection is represented with an array, where each item in the list is a representation of a resource.

Usually the API returns some basic information about each resource in the collection, so the client can work out which resources it's interested in and load up more data for those.

The vast majority of web APIs are built like this, but how can anyone know where the resources are? They could guess, go off searching around the Internet for some API documentation, or you could just... you know... tell them.

GET /invoices

[
  {
    "id": "645E79D9E14",
    "invoiceNumber": "INV-2024-001",
    "customer": "Acme Corporation",
    "amountDue": 500.00,
    "dateDue": "2024-08-15",
    "links": {
      "item": "/invoices/645E79D9E14"
    }
  },
  {
    "id": "646D15F7838",
    "invoiceNumber": "INV-2024-002",
    "customer": "Monsters Inc.",
    "amountDue": 750.00,
    "dateDue": "2024-08-20",
    "links": {
      "item": "/invoices/646D15F7838"
    }
  }
]

If you have a collection of things, and want clients to be able to load more data about each one by going to its URL, it makes sense to give them that URL.

A website and a web API are not all that different. They both help access data, make lists of things, show more details for things, and make more actions available to users, and that's exactly how you can think about popping links in.
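On the server side, adding those links can be as small as the sketch below. The function name `to_collection` is hypothetical, and the sketch assumes every invoice lives at `/invoices/{id}` as in the examples above.

```python
def to_collection(invoices):
    """Build a collection representation, adding an "item" link to each entry."""
    return [
        {**invoice, "links": {"item": f"/invoices/{invoice['id']}"}}
        for invoice in invoices
    ]


# Trimmed invoices, using the IDs from the examples above.
invoices = [
    {"id": "645E79D9E14", "invoiceNumber": "INV-2024-001"},
    {"id": "646D15F7838", "invoiceNumber": "INV-2024-002"},
]

collection = to_collection(invoices)
for entry in collection:
    print(entry["links"]["item"])
```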

HTTP Methods and How They Work with Resources and Collections

REST APIs typically use standard HTTP methods to interact with resources and collections:

GET: Retrieve data.

  • /posts - Get a collection of all blog posts.
  • /posts/123 - Get a single blog post by its ID.

POST: Create a new resource.

  • /posts - Add a new blog post to the collection.

PUT: Replace an entire existing resource.

  • /posts/123 - Replace the blog post with ID 123.

PATCH: Update part of an existing resource.

  • /posts/123 - Update part of the blog post with ID 123.

DELETE: Remove a resource.

  • /posts/123 - Delete the blog post with ID 123.

APIs are about a whole lot more than just CRUD, but when thinking about collections and resources this is a simple way to start thinking about it.
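To make the difference between these methods a bit more concrete, here is a minimal in-memory sketch of how they might map onto a /posts collection and its resources. The `handle` function, the status codes it picks, and the starting ID of 123 are all assumptions for illustration. A real API would sit behind a web framework and a datastore, and might make different choices (PATCH here is a simple field merge, for example).

```python
import itertools

# In-memory stand-in for a /posts collection, keyed by post ID.
posts = {}
_next_id = itertools.count(123)


def handle(method, path, body=None):
    """Dispatch a request to /posts or /posts/{id} and return (status, body)."""
    body = body or {}
    parts = path.strip("/").split("/")
    if parts[0] != "posts" or len(parts) > 2:
        return 404, None

    if len(parts) == 1:  # Collection: /posts
        if method == "GET":
            return 200, list(posts.values())
        if method == "POST":  # Add a new resource to the collection.
            post = {**body, "id": next(_next_id)}
            posts[post["id"]] = post
            return 201, post
        return 405, None

    # Resource: /posts/{id}
    if not parts[1].isdigit() or int(parts[1]) not in posts:
        return 404, None
    post_id = int(parts[1])
    if method == "GET":
        return 200, posts[post_id]
    if method == "PUT":  # Replace the whole resource with what was sent.
        posts[post_id] = {**body, "id": post_id}
        return 200, posts[post_id]
    if method == "PATCH":  # Change only the fields that were sent.
        posts[post_id] = {**posts[post_id], **body, "id": post_id}
        return 200, posts[post_id]
    if method == "DELETE":
        del posts[post_id]
        return 204, None
    return 405, None


_, created = handle("POST", "/posts", {"title": "Understanding REST APIs", "author": "Jane Doe"})
url = f"/posts/{created['id']}"
print(handle("PATCH", url, {"title": "REST Basics"}))  # The author field is kept.
print(handle("PUT", url, {"title": "REST Basics"}))    # The author field is dropped.
```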

Best Practices

URI Structure

The structure of URIs in REST APIs is crucial for consistency and readability. Here are some common conventions:

  • Nouns over Verbs: URIs typically use nouns (like /posts) rather than verbs (like /getPosts), because HTTP methods (GET, POST, etc.) already imply the action.
  • Pluralization: Collections are usually plural (e.g.: /posts), while resources are identified with a unique identifier (e.g.: /posts/123).

Minimal Data in Collections

When retrieving a collection, APIs often return minimal information about each resource to save bandwidth and speed up responses. This allows you to quickly scan the collection and then retrieve more detailed information if needed.

GET /posts

[
    {
        "id": 123,
        "title": "Understanding REST APIs",
        "author": "Jane Doe",
        "link": "/posts/123"
    },
    {
        "id": 124,
        "title": "Introduction to HTTP Methods",
        "author": "Sally Smith",
        "link": "/posts/124"
    }
]

There's plenty of debate about how much detail you should put in your collections.

If you put everything in there, you bloat the collections horrendously, wasting time, money, and carbon emissions stressing your servers sending massive JSON payloads around.

If you trim them down to the bare minimum then you force consumers to make more requests to get even the most basic data.

Some even go as far as putting no information at all in their collections because it can all be fetched directly from the resources, which means that if cached data does change, you don't get the strange outcome of a collection and a resource showing different data.

GET /posts

[
    {
        "link": "/posts/123"
    },
    {
        "link": "/posts/124"
    }
]

There is no one simple answer here, but if you are using a bit of common sense and talking to your consumers, you should be able to find something that works for you.

I generally strike a reasonable middle-ground, where "summary" data is in the collection: name, ID, status, and a few key bits of data that you know from talking to consumers are the most important bits they want access to when they're building an index of data.

Then if people want more data, they can go fetch it, but it's up to them. There's a lot we can do to make this more performant with sensible HTTP caching and better API design, but those are all topics for another day.
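One way to keep that "summary" consistent is to derive it from the full resource in a single place. The sketch below assumes the summary fields are `id`, `title`, and `author`, which is just a guess at what consumers of a blog API might ask for. The names `SUMMARY_FIELDS` and `summarize` are hypothetical.

```python
# Fields assumed to be the "summary" consumers asked for; adjust per API.
SUMMARY_FIELDS = ("id", "title", "author")


def summarize(post):
    """Return the summary representation of a post for use in a collection."""
    summary = {field: post[field] for field in SUMMARY_FIELDS if field in post}
    summary["link"] = f"/posts/{post['id']}"
    return summary


full_post = {
    "id": 123,
    "title": "Understanding REST APIs",
    "author": "Jane Doe",
    "content": "This is a detailed tutorial on REST APIs...",
    "datePublished": "2023-10-01",
}

print(summarize(full_post))
```

Changing what the collection shows then means editing one tuple, rather than hunting through every endpoint that builds a list.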

Collections linking to resources is helpful, letting clients follow various links throughout your API like a user browsing a website. Resources can also link to other related resources and collections, which might be data but could also be considered "actions", all handled through the same conventions.

GET /posts/123

{
    "id": 123,
    "title": "Understanding REST APIs",
    "author": "Jane Doe",
    "content": "This is a detailed tutorial on REST APIs...",
    "datePublished": "2023-10-01",
    "links": {
        "self": "/posts/123",
        "author": "/authors/jane-doe",
        "comments": "/posts/123/comments"
    }
}

In this response:

  • The self link points to the resource itself, like a canonical URL, which is a handy convention for knowing where something came from even if you're just seeing a JSON blob of it or it's available on multiple URLs.
  • The author link points to the resource representing the author of the post because it's quite likely you'll want to load that, but it's also going to have its own caching rules, so it makes no sense to munge that data into the post resource.
  • The comments link points to a collection of comments related to this post if you want to load that, and any application loading that up is going to want to do it after it's got the post showing to users, so it doesn't matter if it loads later.

Splitting up API data into multiple endpoints that can be grabbed if needed is really handy, upgrading a REST API from basically a set of functions which grab some data, into an Object-Relational Mapping (ORM) where relationships can be navigated easily, but we can go a step further.
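A client-side sketch of that ORM-like navigation might look like this. Everything here is hypothetical: the `Resource` class, its `follow` method, and the `FAKE_API` dictionary, which stands in for real HTTP requests so the example runs without a network. The author's body is made-up sample data.

```python
# A fake API: maps URLs to JSON bodies so the sketch runs without a network.
FAKE_API = {
    "/posts/123": {
        "id": 123,
        "title": "Understanding REST APIs",
        "links": {
            "self": "/posts/123",
            "author": "/authors/jane-doe",
            "comments": "/posts/123/comments",
        },
    },
    # Invented sample body for the author resource.
    "/authors/jane-doe": {"name": "Jane Doe", "links": {"self": "/authors/jane-doe"}},
    "/posts/123/comments": [],
}


class Resource:
    """Wraps a JSON document and follows its links on demand."""

    def __init__(self, data, fetch):
        self.data = data
        self._fetch = fetch  # Callable taking a URL and returning parsed JSON.

    def follow(self, rel):
        """Fetch whatever sits behind the named link."""
        url = self.data["links"][rel]
        return Resource(self._fetch(url), self._fetch)


post = Resource(FAKE_API["/posts/123"], FAKE_API.__getitem__)
author = post.follow("author")
print(author.data["name"])
```

Swapping `FAKE_API.__getitem__` for a function that makes a real HTTP request would leave the calling code unchanged, which is much of the appeal.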

Later articles in the series will show you how to upgrade that ORM to a State Machine, so make sure you subscribe.

Don't Confuse Resource Design & Database Design

A key aspect of API design is not tying your resources and collections directly to your database tables. Your database needs to be able to change and evolve rapidly as data structures change, but your API needs to evolve slowly (or not at all), meaning the more tied your API customers are to your internal database structure, the more they're going to have to rewrite their applications.

So, the customer might be showing up in the invoice resource even though it's in a separate table, and could be INNER JOIN'ed in the background (for those using SQL). Then if that query starts to get really slow, you could reduce a level of normalization and bung that customer name directly into the invoices, which is also going to help if the customer changes their name, because then you have a history of invoices with names correct at the time.

There's lots to think about, but the quick point here is to avoid letting your database design influence your resource design too heavily. Your clients should always come first.
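A small mapping layer between the two is one way to keep them apart. In the sketch below the rows and column names (`number`, `customer_id`, `amount_due_cents`, `slug`) are invented for illustration. The point is only that the public shape stays the same if those columns are renamed or denormalized later.

```python
# Hypothetical database rows; the column names are invented for this sketch.
invoice_row = {
    "id": "645E79D9E14",
    "number": "INV-2024-001",
    "customer_id": 42,
    "amount_due_cents": 50000,
}
customer_row = {"id": 42, "slug": "acme-corporation", "name": "Acme Corporation"}


def invoice_resource(invoice, customer):
    """Map internal rows onto the public invoice representation."""
    return {
        "id": invoice["id"],
        "invoiceNumber": invoice["number"],
        "customer": customer["name"],
        "amountDue": invoice["amount_due_cents"] / 100,
        "links": {
            "self": f"/invoices/{invoice['id']}",
            "customer": f"/customers/{customer['slug']}",
        },
    }


resource = invoice_resource(invoice_row, customer_row)
print(resource)
```

If the customer name later moves into the invoices table, only `invoice_resource` needs to change, and clients never notice.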

Real-World Examples

GitHub API

When retrieving a list of repositories, each repository item includes a url field that links to the full details of that repository.

GET /users/octocat/repos

[
  {
      "id": 1296269,
      "name": "Hello-World",
      "url": "https://api.github.com/repos/octocat/Hello-World"
  }
]

Twitter API

When retrieving a user's timeline, each tweet includes a url that links to the specific tweet’s details.

GET /statuses/user_timeline.json?screen_name=apisyouwonthate

[
  {
      "created_at": "Wed Oct 10 20:19:24 +0000 2018",
      "id": 1050118621198921728,
      "text": "Just setting up my Twitter. #myfirstTweet",
      "url": "https://api.twitter.com/1.1/statuses/show/1050118621198921728.json"
  }
]

Stripe API

Stripe has a collection which is a bit different. Instead of returning a JSON array directly in the response, it wraps it in an object with a data property:

{
  "object": "list",
  "url": "/v1/charges",
  "has_more": false,
  "data": [
    {
      "id": "ch_3MmlLrLkdIwHu7ix0snN0B15",
      "object": "charge",
      "amount": 1099,
      "amount_captured": 1099,
      "amount_refunded": 0,
      "application": null,
      "application_fee": null,
      "application_fee_amount": null,
      "balance_transaction": "txn_3MmlLrLkdIwHu7ix0uke3Ezy",
      "billing_details": {
        "address": {
          "city": null,
          "country": null,
          "line1": null,
          "line2": null,
          "postal_code": null,
          "state": null
        },
        "email": null,
        "name": null,
        "phone": null
      },
      "calculated_statement_descriptor": "Stripe",
      "captured": true,
      "created": 1679090539,
      "currency": "usd",
      "customer": null,
      ... snip because it's HUGE...
    },
    {...},
    {...}
  ]
}

They do this so they can add in various other bits of metadata, but much of this metadata comes down to pagination, which can be handled in other ways (like popping pagination links into the Link header), so this practice is somewhat dying out.
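For a feel of the header-based alternative, here is a sketch that builds a `Link` header value for a paginated collection. The query parameter names `page` and `per_page`, the function name, and the choice of `first`, `last`, `prev`, and `next` relations are assumptions. Real APIs vary, and many use cursors rather than page numbers.

```python
from urllib.parse import urlencode


def pagination_link_header(path, page, per_page, total):
    """Build a Link header value with first/last/prev/next relations."""
    last_page = max(1, -(-total // per_page))  # Ceiling division.

    def url(number):
        return f"{path}?{urlencode({'page': number, 'per_page': per_page})}"

    links = {"first": url(1), "last": url(last_page)}
    if page > 1:
        links["prev"] = url(page - 1)
    if page < last_page:
        links["next"] = url(page + 1)
    return ", ".join(f'<{href}>; rel="{rel}"' for rel, href in links.items())


header = pagination_link_header("/invoices", page=2, per_page=20, total=95)
print(f"Link: {header}")
```

With the pagination in a header, the response body can stay a plain JSON array like the earlier examples.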

Summary

  • Use Consistent Naming: Stick to conventions like using plural nouns for collections. It shouldn't matter, but it drives people mad.
  • Keep it Simple: Start with basic endpoints and add complexity only when necessary. It's easier to add things to an API if they're needed later than to take them away once they're in production.
  • API model is not a database model: Do not try to recreate your database model over HTTP, because it will be a big heaving waste of time and will be almost immediately wrong, making clients upset.

By understanding and applying these concepts, you'll be able to design and work with RESTful APIs effectively, ensuring that your API interactions are intuitive, efficient, and scalable.