Popular OpenAPI Bundling Tools Compared

When OpenAPI or JSON Schema documents get massive or repetitive, the contents can be split across multiple documents (on the filesystem, at URLs, or in memory somewhere) and joined together with $ref. These split-up API descriptions can then be combined back into one document, with each $ref pointing to an internal location instead of an external one. This is called "bundling".

Swagger CLI, JSON Schema Ref Parser, and Redocly CLI

Bundling has been struggling lately, with the most popular tools mostly being abandoned or taken in-house. I was begging for help with the popular JavaScript json-schema-ref-parser for years, and was entirely ignored - despite it getting 2,395,495 downloads a week. Thankfully JonLuca De Caro has stepped in to take over, and that project is now under some amount of maintenance again.

The json-schema-ref-parser package is a low-level JavaScript/TypeScript library for bundling $ref, and you need to be writing JavaScript to use it as there is no CLI. Another popular package for that is swagger-cli, which wraps the bundle logic with a CLI layer so you don't need to write any code.

I was also responsible for maintaining Swagger CLI, but thankfully I was able to deprecate it after reviewing Redocly CLI and discovering that Redocly CLI did a far better job of everything Swagger CLI set out to do. Its approach to bundling lines up far more closely with what most people seem to actually want.

Bundling Output Compared

Bundling can mean anything. Grab all the stuff and squish it into a file somehow. This behaviour being undefined leads to everyone being upset when their unique expectations are not matched, so let's have a look at what these tools output and see which you prefer. Here's the sample openapi.yaml, which references ./schemas/thing.yaml in two places.

openapi: 3.0.0
info:
  title: My API
  version: 1.0.0

paths:
  /things:
    get:
      responses: 
        '200':
          description: 'OK'
          content:
            application/json:
              schema:
                properties:
                  data:
                    type: array
                    items:
                      $ref: './schemas/thing.yaml'
          
  /things/{id}:
    get:
      parameters: 
        - name: id 
          in: path
          required: true
          schema: 
            type: string
            format: uuid
      responses: 
        '200':
          description: 'OK'
          content:
            application/json:
              schema:
                $ref: './schemas/thing.yaml'       

The json-schema-ref-parser library bundles by pulling in the content from external references, replacing one of the $refs with a subschema containing that content, then pointing any other references to the same file at that location, wherever it happens to be.
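To make that strategy concrete, here's a rough Python sketch of it. This is not the library's actual code: the function names are made up, the "files" are just an in-memory dict, and the rule that the shallowest location gets the inlined copy is my assumption, chosen because it is consistent with the Swagger CLI output in this post. It also ignores external references nested inside the files being pulled in.

```python
import copy
from urllib.parse import quote

def encode_pointer(segments):
    """Build an internal $ref from path segments (JSON Pointer escapes, then percent-encoding)."""
    escaped = (s.replace("~", "~0").replace("/", "~1") for s in segments)
    return "#/" + "/".join(quote(s, safe="") for s in escaped)

def find_external_refs(node, path=()):
    """Yield (path, container, key, target) for every $ref that points outside the document."""
    if isinstance(node, dict):
        items = node.items()
    elif isinstance(node, list):
        items = enumerate(node)
    else:
        return
    for key, child in items:
        child_path = path + (str(key),)
        ref = child.get("$ref") if isinstance(child, dict) else None
        if isinstance(ref, str) and not ref.startswith("#"):
            yield child_path, node, key, ref
        else:
            yield from find_external_refs(child, child_path)

def bundle_inline(document, files):
    """Inline each external file once, then point the other references at that spot.

    Mutates and returns `document`. Assumption: shallowest location wins.
    """
    refs = sorted(find_external_refs(document), key=lambda r: len(r[0]))
    inlined_at = {}
    for path, container, key, target in refs:
        if target in inlined_at:
            container[key] = {"$ref": inlined_at[target]}
        else:
            container[key] = copy.deepcopy(files[target])
            inlined_at[target] = encode_pointer(path)
    return document

# Hypothetical in-memory stand-ins for openapi.yaml and ./schemas/thing.yaml
files = {"./schemas/thing.yaml": {"type": "object", "properties": {"id": {"type": "string"}}}}
document = {"paths": {
    "/things": {"get": {"responses": {"200": {"content": {"application/json": {"schema": {
        "properties": {"data": {"type": "array", "items": {"$ref": "./schemas/thing.yaml"}}}}}}}}}},
    "/things/{id}": {"get": {"responses": {"200": {"content": {"application/json": {
        "schema": {"$ref": "./schemas/thing.yaml"}}}}}}},
}}

bundled = bundle_inline(document, files)
list_schema = bundled["paths"]["/things"]["get"]["responses"]["200"]["content"]["application/json"]["schema"]
print(list_schema["properties"]["data"]["items"]["$ref"])
```

The point is that the name of the shared location falls out of wherever the first copy happened to land, which is why the resulting $ref can end up looking so odd.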

Swagger CLI - Bundling

$ swagger-cli bundle -t yaml openapi.yaml
openapi: 3.0.0
info:
  title: My API
  version: 1.0.0
paths:
  /things:
    get:
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                properties:
                  data:
                    type: array
                    items:
                      $ref: '#/paths/~1things~1%7Bid%7D/get/responses/200/content/application~1json/schema'
  '/things/{id}':
    get:
      parameters:
        - name: id
          in: path
          required: true
          schema:
            type: string
            format: uuid
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                type: object
                properties:
                  id:
                    type: string
                    format: uuid
                  name:
                    type: string
                  type:
                    type: string
                    enum:
                      - type1
                      - type2

This functionality was constantly reported as a bug despite working as intended. Having a $ref of '#/paths/~1things~1%7Bid%7D/get/responses/200/content/application~1json/schema' is generally fine for computers, it's perfectly valid, but if this is something humans are meant to look at it can be rather confusing. If tools are using that $ref to build a name, or you want your API Reference Documentation to have a "Components" or "Models" section, then it's not going to work.
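If that $ref looks like line noise, it helps to see how a tool reads it. The sketch below decodes it in Python using only the standard library: percent-decode each segment, then undo the JSON Pointer escapes (~1 is a slash, ~0 is a tilde). The function names and the cut-down document are mine, purely for illustration.

```python
from urllib.parse import unquote

def decode_ref_fragment(ref):
    """Split an internal $ref like '#/a/b' into its unescaped path segments."""
    if not ref.startswith("#/"):
        raise ValueError("expected an internal reference starting with '#/'")
    segments = ref[2:].split("/")
    # Order matters: percent-decode first, then ~1 -> '/', and ~0 -> '~' last.
    return [unquote(s).replace("~1", "/").replace("~0", "~") for s in segments]

def resolve(document, ref):
    """Walk the document one segment at a time and return whatever the $ref points to."""
    node = document
    for key in decode_ref_fragment(ref):
        node = node[int(key)] if isinstance(node, list) else node[key]
    return node

ref = "#/paths/~1things~1%7Bid%7D/get/responses/200/content/application~1json/schema"
print(decode_ref_fragment(ref))
# ['paths', '/things/{id}', 'get', 'responses', '200', 'content', 'application/json', 'schema']

# A cut-down stand-in for the bundled document above
document = {"paths": {"/things/{id}": {"get": {"responses": {"200": {"content": {
    "application/json": {"schema": {"type": "object"}}}}}}}}}
print(resolve(document, ref))
# {'type': 'object'}
```

So the reference is just "the schema of the 200 JSON response for GET /things/{id}", which is fine for a machine but gives a documentation tool nothing sensible to call the model.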

Redocly CLI - Bundling

Let's compare how Redocly CLI handles the same API description document.

$ redocly bundle openapi.yaml
openapi: 3.0.0
info:
  title: My API
  version: 1.0.0
paths:
  /things:
    get:
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                properties:
                  data:
                    type: array
                    items:
                      $ref: '#/components/schemas/thing'
  /things/{id}:
    get:
      parameters:
        - name: id
          in: path
          required: true
          schema:
            type: string
            format: uuid
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/thing'
components:
  schemas:
    thing:
      type: object
      properties:
        id:
          type: string
          format: uuid
        name:
          type: string
        type:
          type: string
          enum:
            - type1
            - type2

Lovely! It's created a components section, added a shared schema, given it a name (inferred from the filename, minus the .yaml), and pointed all instances of that same subschema to the shared schema. Chef's kiss.
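For comparison, here's a minimal Python sketch of that "hoist it into components" idea. To be clear, this is my own illustration of the strategy and not how Redocly CLI is implemented: the function name is made up, the external files are a plain dict, and two different files with the same basename would collide, which a real tool has to handle.

```python
import copy
from pathlib import PurePosixPath

def bundle_to_components(document, files):
    """Return a copy of `document` with external $refs moved into components.schemas."""
    bundled = copy.deepcopy(document)
    schemas = bundled.setdefault("components", {}).setdefault("schemas", {})

    def visit(node):
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and not ref.startswith("#"):
                # './schemas/thing.yaml' -> 'thing' (assumes basenames are unique)
                name = PurePosixPath(ref).stem
                if name not in schemas:
                    schemas[name] = copy.deepcopy(files[ref])
                    visit(schemas[name])  # hoist external refs nested in the file too
                node["$ref"] = f"#/components/schemas/{name}"
            else:
                for child in list(node.values()):
                    visit(child)
        elif isinstance(node, list):
            for child in node:
                visit(child)

    visit(bundled)
    return bundled

# Hypothetical in-memory stand-ins, trimmed right down from the sample document
files = {"./schemas/thing.yaml": {"type": "object", "properties": {"id": {"type": "string"}}}}
document = {"paths": {
    "/things": {"schema": {"properties": {"data": {"items": {"$ref": "./schemas/thing.yaml"}}}}},
    "/things/{id}": {"schema": {"$ref": "./schemas/thing.yaml"}},
}}

result = bundle_to_components(document, files)
print(result["paths"]["/things/{id}"]["schema"])
print(list(result["components"]["schemas"]))
```

Every reference to the same file ends up pointing at one predictable, human-readable location, which is exactly what naming and documentation tools want.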

Ye Oldé Dereferencing

Both tools also support "dereferencing", which is the rough alternative to bundling, where all $refs are replaced with what they point to, regardless of whether they were pointing to external or internal locations. This makes for a much larger output, because it's super repetitive, and should only be used if you absolutely need to use some old busted tool which doesn't understand what a $ref is at all.

Swagger CLI - Dereferencing

openapi: 3.0.0
info:
  title: My API
  version: 1.0.0
paths:
  /things:
    get:
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                properties:
                  data:
                    type: array
                    items:
                      type: object
                      properties:
                        id:
                          type: string
                          format: uuid
                        name:
                          type: string
                        type:
                          type: string
                          enum:
                            - type1
                            - type2
  '/things/{id}':
    get:
      parameters:
        - name: id
          in: path
          required: true
          schema:
            type: string
            format: uuid
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                type: object
                properties:
                  id:
                    type: string
                    format: uuid
                  name:
                    type: string
                  type:
                    type: string
                    enum:
                      - type1
                      - type2

Ooof yeah you can see why that can be problematic for large documents. It's literally going to repeat stuff every time. That doesn't just lead to large file sizes, it's also ripe for tripping over circular references, which definitively cannot be represented this way.
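Here's a small Python sketch of naive dereferencing to show both problems. It's an illustration only, not what Swagger CLI does internally: it handles internal references into plain dicts, skips percent-decoding, and the CircularReference exception is something I've invented for the example.

```python
class CircularReference(Exception):
    """Raised when a $ref leads back to a schema that is already being expanded."""

def dereference(node, root, seen=()):
    """Return a copy of `node` with every internal $ref replaced by its target."""
    if isinstance(node, dict):
        ref = node.get("$ref")
        if isinstance(ref, str) and ref.startswith("#/"):
            if ref in seen:
                raise CircularReference(" -> ".join(seen + (ref,)))
            target = root
            for key in ref[2:].split("/"):
                target = target[key.replace("~1", "/").replace("~0", "~")]
            # Expand the target too, remembering which refs we are already inside.
            return dereference(target, root, seen + (ref,))
        return {k: dereference(v, root, seen) for k, v in node.items()}
    if isinstance(node, list):
        return [dereference(v, root, seen) for v in node]
    return node

doc = {
    "components": {"schemas": {
        "thing": {"type": "object", "properties": {"id": {"type": "string"}}},
        # A tree node that points at itself: fine as a $ref, impossible to inline.
        "node": {"type": "object", "properties": {"parent": {"$ref": "#/components/schemas/node"}}},
    }},
    "list": {"items": {"$ref": "#/components/schemas/thing"}},
    "single": {"$ref": "#/components/schemas/thing"},
}

first = dereference(doc["list"], doc)["items"]
second = dereference(doc["single"], doc)
print(first == second, first is second)  # equal content, but two separate copies

try:
    dereference(doc["components"]["schemas"]["node"], doc)
except CircularReference as error:
    print("cannot inline:", error)
```

Every use of the schema becomes its own full copy, and a self-referencing schema has no finite inlined form at all, so a tool has to either bail out or leave some $ref in place.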

Redocly CLI - Dereferencing

I was expecting more of the same for Redocly CLI but was pleasantly surprised to see they've got a trick to make this better.

$ redocly bundle -d openapi.yaml
openapi: 3.0.0
info:
  title: My API
  version: 1.0.0
paths:
  /things:
    get:
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                properties:
                  data:
                    type: array
                    items:
                      type: object
                      properties: &ref_0
                        id:
                          type: string
                          format: uuid
                        name:
                          type: string
                        type:
                          type: string
                          enum:
                            - type1
                            - type2
  /things/{id}:
    get:
      parameters:
        - name: id
          in: path
          required: true
          schema:
            type: string
            format: uuid
      responses:
        '200':
          description: OK
          content:
            application/json:
              schema:
                type: object
                properties: *ref_0
components:
  schemas:
    thing:
      type: object
      properties: *ref_0

HA! Using YAML anchors and aliases they've made this smaller, but again this is not nice for humans and circular references will probably just fail at a later point instead of breaking the dereference. Regardless, this is better for the majority of cases where you'd want to dereference, so long as you're using YAML and the tool you're punting it into understands anchors/aliases.

Other Tools

Redocly CLI is, as the name suggests, CLI only, but the same logic is available in the @redocly/openapi-core NPM package. It's still considered somewhat internal, but the safe bits are documented, and I hear they'll be putting more effort into stabilising the API over time.

The Redocly CLI bundling strategy is very similar to the way the Export feature works in Stoplight Studio, meaning if you're using Stoplight Platform or still have a copy of the rug-pulled Studio Desktop then you can get a pretty similar experience by clicking buttons. Unfortunately the way they've implemented that was an undocumented in-house fork of json-schema-ref-parser, so once again it's a "hunt through code and figure it out" sort of situation.

I tried to find some other tools, but the sample code for udamir/api-ref-bundler was not working and I couldn't fix it.

Python has an OpenAPI CLI Tool but that was throwing exceptions running the hello world command.

If you know of any other bundling tools that you think people should know about, try them out on that sample OpenAPI and see what the output looks like. See if you can find any awkward edge cases and post a gist. Links in the comments, or post it on our Slack community.

Summary

As I said in the recent review, Redocly CLI is really impressive, and its approach to bundling is a godsend for users, and for me alike as I can deprecate Swagger CLI with a "hey check this out" instead of an abrupt rug-pull.

Generally it's lovely to see tooling vendors stepping up and making awesome, open, free, reusable tools that are free from walled gardens. It makes sense for the people being paid to make tools to be the ones maintaining tools, and so long as they can keep the right balance between open-source and paid stuff then they should be able to keep these tools stable and growing for years to come, instead of relying on open-source-only maintainers running themselves into the ground, suffocating under a deluge of obscure edge case demands.

Let me know how you get on with Redocly CLI in the comments, and if you're going to battle on with Swagger CLI do let me know why. 😅