TRUE Single Sourcing with YAML and AsciiDoc

Stephen Colbert from The Colbert Report inventing 'truthiness'.

Documentation and application should derive all key data from a true single source of truth (TSST) defined once and conveyed across all product and documentation builds.

This approach offers more advantages than you might think at first glance, including testability and cooperative design through Git-tracked specification and definition.

My method is to use an AsciiDoc README.adoc file and YAML files to define all kinds of key product data, including structured data for interface definition and reference documentation.

This post is a doozy, so let’s TOC it out.

Table of Contents

The Example of OpenAPI
Why Stop There?
- About Those Other Interface Types
- The Advantages of YAML-based TSST
Single Sourcing in `README.adoc`
Generating Docs with Templating Engines
Truth and Purism

The Example of OpenAPI

Few technologies are universally known across programming and software technical writing, but one of them is the OpenAPI Specification (OAS), a standardized data format used to “describe” or “define” server application interfaces that follow the standard RESTful HTTP architecture and protocol.

That is, one “language” can be used to detail just about everything anyone would need to know about how a given REST API is supposed to work, at least in terms of what endpoints do what with a given set of data and a given method (POST, GET, PUT, DELETE).

OAS is a great example of a “true single source of truth” (TSST) when used to define the API itself as well as the documentation downstream developers use to make informed connections to the API.

If you’re having trouble recalling just what OpenAPI code looks like, here’s a simple example:

OpenAPI Example

openapi: 3.1.0
info:
  title: Sample API
  description: A simple API to illustrate OpenAPI concepts
  version: '2'
servers:
  - url: https://api.example.com/v1
paths:
  /items:
    get:
      summary: Retrieve a list of items
      responses:
        '200':
          description: A JSON array of items
          content:
            application/json:
              schema:
                type: array
                items:
                  type: object
                  properties:
                    id:
                      type: integer
                    name:
                      type: string
                    score:
                      type: integer
    post:
      summary: Create a new item
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              properties:
                name:
                  type: string
                score:
                  type: integer

This code defines only two operations on one endpoint of a hypothetical API, but it should serve to illustrate the format.

At its best, OpenAPI is a data format (written in YAML or JSON) that can be used to generate documentation, client libraries, server stubs, and more. At the very least, it can be used as an authoritative reference for API development and testing.

In fact, non-developers, such as technical writers and product managers, can use or even contribute to an OpenAPI document (OAD). The RESTful interface architecture is relatively simple, and anyone who comes to understand it can help design and define such an API using OAS. There is no reason at all to leave this to developers, though they may have critical feedback during the planning or implementation stages.
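
To make the “authoritative reference” idea a little more concrete, here is a minimal Ruby sketch that loads a trimmed copy of the example above and lists the operations it declares. The `list_operations` helper and the inline `OAD` string are assumptions made for illustration; a real project would more likely read the document from disk and lean on a dedicated OpenAPI library.

```ruby
require 'yaml'

# A trimmed copy of the sample document above, inlined so the sketch is self-contained.
OAD = <<~SPEC
  openapi: 3.1.0
  info:
    title: Sample API
    version: '2'
  paths:
    /items:
      get:
        summary: Retrieve a list of items
      post:
        summary: Create a new item
SPEC

# Keys under a path item that denote operations rather than metadata.
HTTP_METHODS = %w[get put post delete options head patch trace].freeze

# Hypothetical helper: returns one "METHOD path: summary" string per operation.
def list_operations(spec)
  spec.fetch('paths', {}).flat_map do |path, item|
    item.select { |key, _| HTTP_METHODS.include?(key) }
        .map { |verb, operation| "#{verb.upcase} #{path}: #{operation['summary']}" }
  end
end

puts list_operations(YAML.safe_load(OAD))
```

The same traversal could just as easily feed a documentation template or a contract test, which is the point of keeping the definition in one place.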

Why Stop There?

Defining relatively complex interfaces in YAML can apply to much more than just REST APIs.

Full-stack application development involves much more “coding” than most would consider actual “programming”. Consider interface design and database design, for instance. Both are best done in code, but neither is really programming, per se.

Now, I have admittedly never seen a non-developer design a database schema, but I can readily imagine savvy technical writers and product managers creating sensible YAML documents to convey structured data when a relational database would be overkill. And I certainly have seen non-developers contribute to REST API design via OAS.

So that non-programming coding category certainly includes editing YAML files, and I would imagine it even includes templating languages such as Jinja, Liquid, and Handlebars. These can involve data processing and logic, but they are relatively simple and purpose-built for text transformation.

Liquid was specifically designed for non-developers, and generative-AI coding tools are adept at writing templates in nearly all popular syntaxes. This means those savvy non-programmers can help author data files and help turn them into good, auto-generated reference documentation.

This non-programmer involvement is but one of the key advantages of stepping away from “native” programming code (Python, Java, Rust, JavaScript, Ruby, Go, etc.) and into a truly cross-language, human-writable data format like YAML.

While YAML trails only JSON and XML in current popularity, it is far more user- and Git-friendly than either of those leading formats. Meanwhile, YAML-like formats such as TOML, CSON, and HJSON are lesser-known alternatives.

About Those Other Interface Types

It turns out YAML is a terrific format for all kinds of interface definition coding.

It can be used for defining YAML/JSON configuration files. It can be used to define command-line interfaces (CLIs), HTML forms, file/directory structures, and much more, always allowing for extensive auxiliary metadata for each element so defined, whether it be a REST API endpoint or a form input field.

Here is an example of how I use YAML to define YAML-formatted configuration files for my Ruby applications:

properties:
  log_level:
    type: String
    desc: The logging level for the application.
    dflt: info
    opts: [debug, info, warn, error, fatal]
  output_format:
    type: String
    desc: The format for output data.
    opts: [json, yaml, xml]
    dflt: json
  max_retries:
    type: Integer
    desc: The maximum number of retry attempts for failed operations.
    span: '0..5'
    dflt: 3

This definition supports some automated validation, and it allows me to generate documentation directly from this very source.

`log_level`::
The logging level for the application.

[horizontal]
Default::: `info`
Options::: `debug`, `info`, `warn`, `error`, `fatal`

`output_format`::
The format for output data.

[horizontal]
Default::: `json`
Options::: `json`, `yaml`, `xml`

`max_retries`::
The maximum number of retry attempts for failed operations.

[horizontal]
Default::: `3`
Range::: `0-5`
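
The automated validation mentioned above could look something like the following. This is a minimal sketch, not the actual implementation behind these definitions: the `validate_config` helper, the `TYPES` lookup, and the error messages are all hypothetical, and the definition is inlined (minus the `desc` fields) so the example runs on its own.

```ruby
require 'yaml'

# The definition from above, abridged to the fields the checks need.
DEFINITION = YAML.safe_load(<<~DEFN)
  properties:
    log_level:
      type: String
      dflt: info
      opts: [debug, info, warn, error, fatal]
    output_format:
      type: String
      opts: [json, yaml, xml]
      dflt: json
    max_retries:
      type: Integer
      span: '0..5'
      dflt: 3
DEFN

# Map the type names used in the definition to Ruby classes (assumed set).
TYPES = { 'String' => String, 'Integer' => Integer }.freeze

# Hypothetical validator: returns a list of error strings (empty when valid).
def validate_config(config, definition = DEFINITION)
  props = definition.fetch('properties')
  errors = config.keys.reject { |key| props.key?(key) }.map { |key| "#{key}: unknown property" }
  props.each do |name, rule|
    value = config.fetch(name, rule['dflt']) # fall back to the documented default
    errors << "#{name}: expected #{rule['type']}" unless value.is_a?(TYPES.fetch(rule['type']))
    if rule['opts'] && !rule['opts'].include?(value)
      errors << "#{name}: must be one of #{rule['opts'].join(', ')}"
    end
    next unless rule['span'] && value.is_a?(Integer)

    low, high = rule['span'].split('..').map(&:to_i)
    errors << "#{name}: must be within #{rule['span']}" unless value.between?(low, high)
  end
  errors
end

p validate_config({ 'log_level' => 'info', 'max_retries' => 9 })
# => ["max_retries: must be within 0..5"]
```

Because the rules and the documentation come from the same file, a change to `span` or `opts` updates both the validator and the generated reference at once.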

The Advantages of YAML-based TSST

So what are the key advantages of using YAML-based TSST?

Enables truly cooperative design and definition. Non-programmers can contribute to the design and definition of interfaces, data structures, and more.
Sources documentation right where the interface is defined. Developers are used to this for REST and native APIs, though the latter is usually (sensibly) handled in the language’s official or dominant “inline” format. For less language-specific interfaces, YAML is a great way to define the interface and its documentation in one place.
Informs automated testing. Integration tests can ingest YAML definition data to test against a single data source maintained by all stakeholders.

For these reasons, YAML is my go-to source format for defining nearly all interfaces, as I will explore in this blog in future posts.
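
As a rough sketch of the testing advantage listed above, the definition file can also be mined for test cases. The `derive_cases` helper below is hypothetical, as is the `__not_an_option__` placeholder value; it assumes the `opts` and `span` fields shown earlier and simply emits values an integration test should expect the application to accept or reject.

```ruby
require 'yaml'

# Abridged copy of the definition shown earlier, inlined for self-containment.
SPEC = YAML.safe_load(<<~SPEC_YAML)
  properties:
    output_format:
      opts: [json, yaml, xml]
    max_retries:
      span: '0..5'
SPEC_YAML

# Hypothetical helper: derive [property, value, expectation] triples
# that an integration test could feed to the application under test.
def derive_cases(definition)
  definition.fetch('properties').flat_map do |name, rule|
    cases = []
    if rule['opts']
      rule['opts'].each { |opt| cases << [name, opt, :accept] }
      cases << [name, '__not_an_option__', :reject]
    end
    if rule['span']
      # Exercise both boundaries and the first value beyond each of them.
      low, high = rule['span'].split('..').map(&:to_i)
      cases.push([name, low, :accept], [name, high, :accept],
                 [name, low - 1, :reject], [name, high + 1, :reject])
    end
    cases
  end
end

derive_cases(SPEC).each do |name, value, expectation|
  puts "#{name}=#{value.inspect} should #{expectation}"
end
```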

Single Sourcing in `README.adoc`

The other place I love to define global application data is in the root README.adoc file of the project. Only data that actually appears in the README is best stored here; for everything else, YAML is a more flexible and precise data-serialization format.

But user-defined AsciiDoc attributes are a great way to ensure that ALL documentation and even the product itself are deriving data from the same single source.

For example, all of my Ruby APIs and CLIs derive their canonical version number from an attribute in the README.adoc file. It’s called this_prod_vrsn, and I can express it anywhere in the documentation as {this_prod_vrsn}, as well as ingest it into the product at build time.

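require 'asciidoctor'

# Load the README so that its document attributes are available as a Hash
doc = Asciidoctor.load_file('README.adoc', safe: :safe)
ATTRS = doc.attributes
# The canonical product version, defined once in README.adoc
VERSION = ATTRS['this_prod_vrsn']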

AsciiDoc attributes unfortunately do not support nested data structures or even arrays, but they are sufficient for core data such as default values, general product data, and anything else you might wish to report in your README itself as well as throughout the product and user documentation.

One big advantage of AsciiDoc attributes is that they can build on one another, much like native variables: an attribute defined earlier can be referenced in the value of a later one.

:product_base_url: https://example.org
:product_api_url: {product_base_url}/api
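
Asciidoctor resolves these references itself when it loads the document. Purely to illustrate the behavior, here is a dependency-free Ruby sketch that reads simple attribute entries and substitutes earlier values into later ones. It is a simplified stand-in, not Asciidoctor's implementation: the `read_attributes` helper, the sample README text, and the `1.4.2` version value are all made up for the example, and only single-line `:name: value` entries are handled.

```ruby
# Matches single-line attribute entries such as ":name: value".
ATTR_ENTRY = /^:([\w-]+):[ \t]+(.*)$/

# Hypothetical helper: collects attribute entries in document order and
# resolves {name} references against attributes defined earlier.
def read_attributes(source)
  source.each_line.with_object({}) do |line, attrs|
    match = ATTR_ENTRY.match(line.chomp)
    next unless match

    name, value = match.captures
    # References to undefined attributes are left untouched.
    attrs[name] = value.gsub(/\{[\w-]+\}/) { |ref| attrs.fetch(ref[1..-2], ref) }
  end
end

README = <<~ADOC
  = Example Product
  :this_prod_vrsn: 1.4.2
  :product_base_url: https://example.org
  :product_api_url: {product_base_url}/api
ADOC

attrs = read_attributes(README)
puts attrs['product_api_url'] # https://example.org/api
```

Order matters here, just as it does with variables: `product_api_url` can only pick up `product_base_url` because the base URL is defined first.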

Generating Docs with Templating Engines

You may have been wondering how our YAML data turned into AsciiDoc source code in the previous examples.

The trick is a templating engine, such as Liquid, which parses a template and renders textual output from the data provided.

{% for property in properties %}
`{{ property[0] }}`::
{{ property[1].desc }}

[horizontal]
Default::: `{{ property[1].dflt }}`
{% if property[1].opts %}
Options::: {% for opt in property[1].opts %}`{{ opt }}`{% unless forloop.last %}, {% endunless %}{% endfor %}
{% endif %}
{% if property[1].span %}
Range::: `{{ property[1].span | replace: "..", "-" }}`
{% endif %}
{% endfor %}

That is all the markup required to generate the AsciiDoc source code shown earlier, which I’ll repeat here for convenience.

`log_level`::
The logging level for the application.

[horizontal]
Default::: `info`
Options::: `debug`, `info`, `warn`, `error`, `fatal`

`output_format`::
The format for output data.

[horizontal]
Default::: `json`
Options::: `json`, `yaml`, `xml`

`max_retries`::
The maximum number of retry attempts for failed operations.

[horizontal]
Default::: `3`
Range::: `0-5`
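
For comparison, roughly the same transformation can be sketched with ERB from Ruby's standard library. To be clear, this swaps ERB in for Liquid and is only an illustration of the technique; the `PROPS` and `TEMPLATE` names are hypothetical, and the definition is abridged to two properties to keep the example short.

```ruby
require 'erb'
require 'yaml'

# Two of the properties from the definition above, inlined for self-containment.
PROPS = YAML.safe_load(<<~DEFN).fetch('properties')
  properties:
    log_level:
      desc: The logging level for the application.
      dflt: info
      opts: [debug, info, warn, error, fatal]
    max_retries:
      desc: The maximum number of retry attempts for failed operations.
      span: '0..5'
      dflt: 3
DEFN

# ERB counterpart to the Liquid template; the '-' trim mode used below lets
# lines that hold only control tags produce no output.
TEMPLATE = <<~'TPL'
  <%- props.each do |name, prop| -%>
  `<%= name %>`::
  <%= prop['desc'] %>

  [horizontal]
  Default::: `<%= prop['dflt'] %>`
  <%- if prop['opts'] -%>
  Options::: <%= prop['opts'].map { |opt| "`#{opt}`" }.join(', ') %>
  <%- end -%>
  <%- if prop['span'] -%>
  Range::: `<%= prop['span'].sub('..', '-') %>`
  <%- end -%>

  <%- end -%>
TPL

# Render the AsciiDoc source from the YAML-derived data.
puts ERB.new(TEMPLATE, trim_mode: '-').result_with_hash(props: PROPS)
```

Whichever engine is used, the template stays small because the definition file already carries everything the reference entry needs.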

Truth and Purism

It’s a funny coincidence that “TSST” reads and sounds like a scolding. The very concept is strict and somewhat cold, I have to admit.

It is unwise to be a “purist” about nearly anything, especially in software development, so of course there may be exceptions where a piece of product data has to be defined twice.

But it is a great principle to aim for, as it offers true benefits along the design → definition → documentation → validation pipeline.