You Might Not Need GraphQL

Phil Sturgeon

12 Aug 2017 — 10 min read

Do you like the look of GraphQL, but have an existing REST/RPC API that you don’t want to ditch? GraphQL definitely has some cool features and benefits. Those are all bundled in one package, with a nice marketing site, documenting how to do all the cool stuff, which makes GraphQL seem more attractive to many.

Obviously seeing as GraphQL was built by Facebook, makers of the RESTish Graph API, they’re familiar with various endpoint-based API concepts. Many of those existing concepts were used as inspiration for GraphQL functionality. Other concepts were carbon copied straight into GraphQL.

Facebook has experimented with various different approaches to sharing all their data between apps; remember FQL? Executing SQL-like syntax over a GET endpoint was a bit odd.

GET /fql?q=SELECT%2Buid2%2BFROM%2Bfriend%2BWHERE%2Buid1%3Dme()&access_token=...

Facebook got a bit fed up with having a one-endpoint-based approach to get data and this other totally different thing, as they both require different code. As such, GraphQL was created as a middle-ground between endpoint-based APIs and FQL, the latter being an approach most teams would never consider — or want.

That said, Facebook (and others like GitHub) switching from RESTish to GraphQL makes folks consider GraphQL as a replacement for REST. It is not. It is an alternative.

Whilst the use-cases for the sort of API you’d build in GraphQL are quite different from those you’d build with REST (more on this) or RPC, that does not stop some folks jumping on the hot new thing. If you can withhold the urge to jump on the shiny new thing but are interested in some of the functionality GraphQL has to offer, you can brush up your endpoint-based APIs with these excellent existing concepts.

Note: This article uses the term REST as defined in Roy Fielding’s dissertation, RESTish to mean it’s somewhere on the Richardson Maturity Model but didn’t make it to the top, and endpoint-based APIs to mean any REST/RESTish/RPC/etc. API that uses endpoints instead of POSTing to a single /graphql endpoint.

Sparse Fieldsets / Partials

GraphQL allows you to specify the fields you would like to be returned, allowing you to skip all data that is not relevant to your response. This makes the request a little bit faster to download over the network, as the tubes do not get quite so full.

REST certainly does not talk about this out of the box, because REST does not concern itself with such implementation specifics. The practice is however very common in the endpoint-based API world.

A common standard for REST APIs to implement is JSON-API, which talks about sparse fieldsets. The basic idea is that you can specify the fields in the request:

GET /articles?fields[articles]=title,body

In the past YouTube had some really whacky partial syntax for this sort of thing:

GET /feeds/api/users/default/uploads?fields=entry(title,gd:comments,yt:statistics)

Facebook also has this in their Graph API:

GET /frankcarter?fields=id,name,picture

Read more on building your own implementation of this if you’re interested, but I would stick to using existing solutions like ActiveModel::Serializer (Rails), tobscure/json-api (PHP), etc. which handle standard implementations for you.

Types / Schemas

Folks talking about GraphQL get really excited about the concept of Types. JSON can be a little vague when it comes to types. Requests require a lot of validation, and responses require clarification.

Some weakly typed languages might send a numeric string when they really meant to send an integer. A numeric JSON response field could look like an integer one minute, but then a wild decimal place appears.

GraphQL APIs are organized in terms of types and fields, not endpoints. Access the full capabilities of your data from a single endpoint. GraphQL uses types to ensure Apps only ask for what’s possible and provide clear and helpful errors.
Source: graphql.org

Many endpoint-based API developers solve this with HTTP documentation using tools like MSON to describe their data, but another approach is to use JSON Schema.

JSON Schema is very cool and lets you describe your JSON using a JSON metadata file. This metadata file can be linked into your main JSON outputs, letting JSON Schema-aware clients discover the metadata over the wire. One of the many benefits of this is allowing client apps (like a web frontend) to validate their form data using the exact same rules as the server-side, without needing to go over the wire. 😲

Taken from JSON Schema examples:

{
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "type": "string"
    },
    "lastName": {
      "type": "string"
    },
    "age": {
      "description": "Age in years",
      "type": "integer",
      "minimum": 0
    }
  },
  "required": [
    "firstName",
    "lastName"
  ]
}

Comically, all of these concepts (type systems, schemas, metadata in general, etc.) is exactly what most people hated about SOAP. Instead of just the payload, you had to mess about with a WSDL, and that lead folks to enjoy REST. Well, in REST, using something like JSON Schema, you have the choice of using them, as do your clients.

Aren’t interested in JSON? Look into Protocol Buffers like ProtoBuff or Cap’n Proto . They’re an identical concept to the GraphQL type system, and have been implemented in a bajillion languages too.

Evolution / Versioning

Versioning is a really big, muddy, awful topic in the world of APIs, and most of us agree that every approach is a minefield .

Some folks will version in the URI, making /v1/foos and /v1/bars, then make /v2/foos and /v2/bars. If nothing changed between v1 and v2 for bars, anyone using the API doesn't know that, and has to go read some documentation to find out before they can upgrade. That leads to slow uptake, and now you have two endpoints for ages.

Some folks will start the same, with /v1/foos and /v1/bars, but then only create /v2/foos, leaving /v1/bars in place. That can be slightly better, but if (for example) the Api::v1::AppController and Api::v2::AppController have ever-so-slightly different error formats and your client is not aware of that, then a v1 or v2 error might pop up when the client is only coded to support v2 or v1. I've seen this cause a JavaScript error in production that broke the app.

That whole mess of nonsense can be avoided by not versioning your API. As daft as this might initially sound, if you can avoid changing a contract, that means less work for clients. The server can handle converting data from one format to another, or a new representation can eventually be created to replace the old representation. Replacing representations and fields carefully over time as things change is called evolution .

At a previous company offering crowd-sourced carpooling, we switched from /matches to /riders, and internally those two representations shared a lot of code. "Matches" was deprecated, and clients started using the new "riders" concept. Over a few months, the internals changed to a much cleaner solution for riders. We eventually dropped the matches endpoint/serializers/logic entirely, without the riders contract changing. That helped us move fast on the code, but keep our contracts relevant for our clients until they didn't need them anymore.

That is not always possible in a rapid-application environment where things change drastically all the time. Startups love their "agile philosophies”, pivots, etc., which can throw evolution out the window. As somebody who has worked for those companies, we would simply make a new API (copy, paste, tweak) and throw the old one out once a new iPhone app was launched and usage dropped below acceptable levels. There was no interest in "keeping the service alive for decades”, which is a core tenant of REST, but that’s ok, we were RESTish.

As pointed out in my GraphQL vs REST: Overview , easily being able to track field usage by clients is something that gives GraphQL an edge on most endpoint-based APIs. If a client wants to use field "foo” and you want to remove it, you know that their app will break.

Earlier in the article, we looked at sparse fieldsets, which can help here. If the endpoint-based API offers sparse fieldsets as an option, and clients use them, there will be trackable insight into which clients are using specific fields just like GraphQL. One downside here is that only clients requesting ?fields= will be detectable, unless the API goes a step further and requires the use of ?fields=!

I don’t know if I would do it, but it’s an option.

Query Language

GraphQL is primarily a query language, but if you’d like an endpoint-based API to have a query language, you can give it one.

OData has a strong slogan: the best way to REST. That’s a big claim, but OData provides a lot of stuff that people seem to like about GraphQL. Not only does it offer machine-readable metadata similar to JSON Schema, and a powerful explorer similar to GraphiQL , but it provides a syntax and tooling to allow you to use a query language:

GET /Airports?$filter=contains(Location/Address, 'San Francisco')

Or even:

GET serviceRoot/People?$filter=Emails/any(s:endswith(s, 'contoso.com'))

OData has some other boring grown-up benefits, like allowing your endpoint-based API to be the source of truth for Salesforce, utilizing External Objects. Basically, instead of making janky sync logic between your applications and theirs, a custom object can be created to let the data live only in the one OData API, but still look like it’s sitting in Salesforce. Guess what I’m working on at the moment.

Data Inclusion / Compound Documents

JSON-API suggests an ability to include related data from multiple resources in a single HTTP request. That is a big pro for some, as it reduces the number of HTTP requests, which under the right conditions can often speed things up for the client. It could also slow it down a bunch, but usually, it’s a helper.

These includes are very similar to the nested field queries possible in GraphQL. When using JSON-API not only can you fetch related resource representations, but you can trim those related representations down using sparse fieldsets too:

GET /articles?include=author&fields[articles]=title,body&fields[people]=name

That will get you a list of articles with only the title and body fields, then the authors will be included with only their name.

Endpoint-based APIs offering compound documents handle it in a myriad of ways, but again the JSON-API approach is a common one. The JSON-API approach makes some clients sad because data is "side-loaded”, which basically means included resources are jammed into a single array.

Excuse the large JSON blob, but it’s important to understand the concept:

{
  "data": [
    {
      "type": "articles",
      "id": "1",
      "attributes": {
        "title": "JSON API paints my bikeshed!"
      },
      "links": {
        "self": "http://example.com/articles/1"
      },
      "relationships": {
        "author": {
          "links": {
            "self": "http://example.com/articles/1/relationships/author",
            "related": "[http://example.com/articles/1/author](http://example.com/articles/1/author)"
          },
          "data": {
            "type": "people",
            "id": "9"
          }
        },
        "comments": {
          "links": {
            "self": "http://example.com/articles/1/relationships/comments",
            "related": "http://example.com/articles/1/comments"
          },
          "data": [
            {
              "type": "comments",
              "id": "5"
            },
            {
              "type": "comments",
              "id": "12"
            }
          ]
        }
      }
    }
  ],
  "included": [
    {
      "type": "people",
      "id": "9",
      "attributes": {
        "first-name": "Dan",
        "last-name": "Gebhardt",
        "twitter": "dgeb"
      },
      "links": {
        "self": "http://example.com/people/9"
      }
    },
    {
      "type": "comments",
      "id": "5",
      "attributes": {
        "body": "First!"
      },
      "relationships": {
        "author": {
          "data": {
            "type": "people",
            "id": "2"
          }
        }
      },
      "links": {
        "self": "http://example.com/comments/5"
      }
    },
    {
      "type": "comments",
      "id": "12",
      "attributes": {
        "body": "I like XML better"
      },
      "relationships": {
        "author": {
          "data": {
            "type": "people",
            "id": "9"
          }
        }
      },
      "links": {
        "self": "http://example.com/comments/12"
      }
    }
  ]
}

That article "1” has two relationship types, author and comment. These relationships might have one name, and the actual resource type could be another, so author is the relationship name but people is the data type. Cool.

So, if they have "data": { "type": "people", "id": "9" } and { "type": "comments", "id": "5" }, { "type": "comments", "id": "12" }, that means if ?include=author,comments is in the query string, it will expand those relationships, "including" the data in the "included" section of the JSON body.

These included items are all just shoved into a single array, with no hierarchy or concept of how they relate to the article. Anyone who calls this API will not just be able to call the data:

response = client.get('/articles?include=author,comments') 
article = response.body 
comments = article.comments 
author = article.author

Instead, there needs to be some logic that pulled out all the comments and all the authors, then in a loop you could stitch them back together. Writing this is awful, but of course loads of people have built generic abstraction layers in various languages to allow you not to have to.

I used to really hate this, but respect the fact that it reduces data going over the wire, by de-duplicating the expanded forms of the same resource. E.g.: Instead of returning the same author details multiple times, it’s just there once.

Side-loading is a common convention for many endpoint-based APIs, but it has nothing to do with REST. That is simply how JSON-API happens to do things and is one common standard amongst a few. An endpoint-based API can easily nest data in a more relational way just like GraphQL. I built a tool to do this in PHP years back called Fractal (before going over to the JSON-API-side).

I don’t think people need to spend a huge amount of time trying to make their endpoint-based APIs do all this stuff just to be cool, but I do think there are concepts in GraphQL that resonate with people who might not know they’re already available to them. As impressed as I am with GraphQL, it is not always the shiny magical hammer-for-everything some people think it is, and making informed decisions is important.

Knowing about these tools and concepts can help people push back against an overzealous switch to a whole new system, because the folks advocating it were unaware that a lot of the concepts used in GraphQL are not entirely new or unique to GraphQL itself.

That said, I do of course think having all of these concepts implemented and documented in one single "package” like GraphQL is super handy. GraphQL removes the arguing or confusion about what is "the most RESTful way to do something”, as it has a spec and example implementations.

A lot of APIs that are merely RESTish could certainly be GraphQL, but an actual REST API with HATEOAS would not make sense trying to jam itself into the GraphQL paradigm. State Transfer and Query Languages are different things, but as I’ve shown, you can blur the lines a little if you’re interested.