Writing a DSL in Lua

DSLs, or domain-specific languages, are programming languages designed around the features needed for a particular problem or field. One example is Make, the build tool, a special-purpose language for combining commands and files while managing dependencies.

A lot of modern programming languages have so much flexibility in their syntax that it’s possible to build libraries that expose their own mini-languages within the host language. The definition of DSL has broadened to include these kinds of libraries.

In this guide we'll build a DSL for generating HTML. It looks like this:

html {
  body {
    h1 "Welcome to my Lua site",
    a {
      href = "http://leafo.net",
      "Go home"
    }
  }
}

Before jumping in, here are some DSL building techniques:

Dropping the parentheses

One of the selling points of Lua, as described in its initial public release (1996), is that it makes a good configuration language. That’s still true today, and the same qualities make Lua friendly to building DSLs.

One distinctive part of Lua’s syntax is that parentheses are optional in some function calls. Terseness is important when building a DSL, and removing superfluous characters is a good way to achieve it.

When calling a function with a single argument that is either a table literal or a string literal, the parentheses are optional.

print "hello" --> print("hello")
my_function { 1,2,3 } --> my_function({1,2,3})

-- whitespace isn't needed, these also work:

print"hello" --> print("hello")
my_function{ 1,2,3 } --> my_function({1,2,3})

This syntax has very high precedence, the same as if you were using parentheses:

tonumber "1234" + 5 --> tonumber("1234") + 5
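The tonumber example gives the same number whichever way it parses, because Lua coerces "1234" + 5 to a number anyway. A comparison makes the precedence visible: the call is applied first, and only then is == evaluated.

```lua
-- The parenthesis-less call binds tighter than ==, so this parses as
-- type("hello") == "string". If == bound first, it would be
-- type("hello" == "string"), which returns "boolean".
local ok = type "hello" == "string"
print(ok) --> true
```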

Chaining

Parenthesis-less invocation can be chained as long as each expression from the left evaluates to a function (or a callable table). Here’s some example syntax for a hypothetical web routing framework:

match "/post-comment" {
  GET = function ()
    -- render the form
  end,

  POST = function ()
    -- save to database
  end
}

If it’s not immediately obvious what’s going on, adding the parentheses back in clears things up. Parenthesis-less calls are applied from left to right, so the above is equivalent to:

match("/post-comment")({ ... })

The pattern we would use to implement this syntax would look something like this:

local function match(path)
  print("match:", path)

  return function(params)
    print("params:", params)
    -- both path and params are now available for use here
  end
end

Using a recursive function constructor, it’s possible to make chaining work for chains of any length.
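Here is a minimal sketch of that idea. It assumes a convention that the original text doesn't specify: string arguments keep the chain going, and a table argument ends it and hands every collected argument to a handler. The names chain and route are hypothetical.

```lua
-- Hypothetical recursive function constructor for chains of any length.
-- Assumption: a table argument terminates the chain; anything else continues it.
local function chain(handler, collected)
  collected = collected or {}
  return function(arg)
    -- copy the arguments so far so that separate chains never share state
    local args = {}
    for i = 1, #collected do
      args[i] = collected[i]
    end
    args[#args + 1] = arg

    if type(arg) == "table" then
      -- a table literal ends the chain: pass everything to the handler
      return handler(args)
    end
    -- otherwise return a new function that keeps collecting
    return chain(handler, args)
  end
end

-- a match that accepts any number of paths before its handler table
local match = chain(function(args)
  local paths = {}
  for i = 1, #args - 1 do
    paths[i] = args[i]
  end
  return { paths = paths, params = args[#args] }
end)

local route = match "/post-comment" "/comment" {
  GET = function() return "render the form" end,
}

print(#route.paths)        --> 2
print(route.params.GET())  --> render the form
```

Copying the collected arguments on every call costs a little allocation, but it means a partially applied chain such as match "/a" can be reused safely as the start of several different chains.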

Using function environments

When using a Lua module you usually have to bring its functions or values into scope with require. When working with a DSL, it’s nice to have all the functionality available without having to manually load anything.

One option would be to make all the functions and values global variables, but that’s not recommended because they might interfere with other libraries.

A function environment can be used to change how a function resolves global variable references within its scope. This can be used to automatically expose a DSL’s functionality without polluting the regular global scope.

For the sake of this guide, I'll assume that setfenv exists in the version of Lua we're using, as it does in Lua 5.1 and LuaJIT. If you're using 5.2 or above, you'll need to provide your own implementation: Implementing setfenv in Lua 5.2, 5.3, and above
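A minimal sketch of such an implementation, assuming Lua 5.2 or later with the standard debug library available. In 5.2+ a function's globals are looked up through an upvalue named _ENV, so this version finds that upvalue and rebinds it to the new environment with debug.upvaluejoin. On 5.1 and LuaJIT the built-in setfenv is kept unchanged.

```lua
-- Sketch: setfenv fallback for Lua 5.2+, built on the debug library.
-- Lua 5.1 and LuaJIT already have setfenv, so it is reused there.
local setfenv = setfenv or function(fn, env)
  local i = 1
  while true do
    local name = debug.getupvalue(fn, i)
    if name == "_ENV" then
      -- make fn's _ENV upvalue refer to the only upvalue (env) of a new closure
      debug.upvaluejoin(fn, i, function() return env end, 1)
      break
    elseif not name then
      -- no _ENV upvalue: fn never reads or writes a global, nothing to change
      break
    end
    i = i + 1
  end
  return fn
end

-- greeting and name are globals, so they resolve through the environment
local function greet()
  return greeting .. ", " .. name
end

setfenv(greet, { greeting = "hello", name = "lua" })
print(greet()) --> hello, lua
```

Because upvaluejoin only changes the upvalue of the function passed in, other functions that shared the same original _ENV keep seeing the normal globals.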

Here’s a function run_with_env that runs another function with a particular environment.

local function run_with_env(env, fn, ...)
  setfenv(fn, env)
  fn(...)
end

The environment passed will represent the DSL:

local dsl_env = {
  move = function(x,y)
    print("I moved to", x, y)
  end,

  speak = function(message)
    print("I said", message)
  end
}

run_with_env(dsl_env, function()
  move(10, 10)
  speak("I am hungry!")
end)

In this trivial example the benefits might not be obvious, but typically your DSL would be implemented in another module. Rather than bringing each of its functions into scope manually wherever you use it, you activate the whole scope at once with run_with_env.
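One consequence worth knowing: code inside the DSL block can only see names in dsl_env, so calling print or tostring directly inside it would fail. A minimal sketch, assuming you want the standard globals to stay reachable, gives the environment a metatable whose __index falls back to _G. The setfenv fallback is repeated only so the block runs on Lua 5.2+ as well as 5.1, and run_with_env here returns the function's results, a small change from the version above.

```lua
-- Lua 5.1/LuaJIT have setfenv built in; on 5.2+ use a debug-library fallback.
local setfenv = setfenv or function(fn, env)
  local i = 1
  while true do
    local name = debug.getupvalue(fn, i)
    if name == "_ENV" then
      debug.upvaluejoin(fn, i, function() return env end, 1)
      break
    elseif not name then
      break
    end
    i = i + 1
  end
  return fn
end

local function run_with_env(env, fn, ...)
  setfenv(fn, env)
  return fn(...)
end

local dsl_env = setmetatable({
  speak = function(message)
    return "I said " .. message
  end,
}, { __index = _G }) -- names missing from dsl_env fall back to the regular globals

local result = run_with_env(dsl_env, function()
  -- speak comes from dsl_env; tostring and string come from _G via __index
  return speak(tostring(42)) .. string.rep("!", 2)
end)
print(result) --> I said 42!!
```

Since the metatable has no __newindex, any global assigned inside the block is stored in dsl_env rather than leaking into _G.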

Function environments also let you generate functions on the fly. If the environment has a metatable whose __index metamethod is a function, that function runs for every global name not found in the environment and can compute any value to return. This is how the HTML builder DSL will be created.
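A small illustration of an __index function generating values by name. The shout_env table and what it generates are hypothetical; the lookups are done directly on the table here, but inside a function whose environment was set to shout_env, a bare alice "hello" would resolve the same way.

```lua
-- Sketch: a table whose missing keys are generated on demand.
local shout_env = setmetatable({}, {
  -- __index runs only when the key is absent from the table itself
  __index = function(self, name)
    return function(text)
      return name:upper() .. ": " .. text
    end
  end,
})

print(shout_env.alice("hello")) --> ALICE: hello
print(shout_env.bob("hi"))      --> BOB: hi
```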

Implementing the HTML builder

Our goal is to make the following syntax work:

html {
  body {
    h1 "Welcome to my Lua site",
    a {
      href = "http://leafo.net",
      "Go home"
    }
  }
}

Each HTML tag is represented by a Lua function that returns the HTML string for that tag, with the correct attributes and content if there are any.

Although it would be possible to write code to generate all the HTML tag builder functions ahead of time, a function __index metamethod will be used to generate them on the fly.

In order to run code in the context of our DSL, it must be packaged into a function. The render_html function will run that function inside the DSL environment and return the resulting HTML string:

render_html(function()
  return div {
    img { src = "http://leafo.net/hi" }
  }
end) --> <div><img src="http://leafo.net/hi" /></div>

The img tag is self-closing: it has no separate closing tag. HTML calls these “void elements”, and the implementation will treat them differently.

render_html might be implemented like this:

local function render_html(fn)
  setfenv(fn, setmetatable({}, {
    __index = function(self, tag_name)
      return function(opts)
        return build_tag(tag_name, opts)
      end
    end
  }))

  return fn()
end

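The build_tag function is where all the actual work is done. It takes the name of the tag and a single opts argument, which is either a table holding both the attributes and the content or a plain string of content. Because render_html refers to build_tag (which in turn uses void_tags and append_all), those locals must be defined above render_html in the actual source file; otherwise the name resolves to a nil global.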

render_html could be optimized by caching each generated tag function in the environment table, so that __index runs only once per tag name.
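A sketch of that caching, assuming rawset is used so the assignment bypasses any metamethods. make_tag_env is a hypothetical helper, and the build_tag here is a simplified stand-in for the real one, included only so the block runs on its own.

```lua
-- Simplified stand-in for the real build_tag (hypothetical, for illustration).
local function build_tag(tag_name, opts)
  return "<" .. tag_name .. ">" .. tostring(opts) .. "</" .. tag_name .. ">"
end

-- Hypothetical helper: an environment that caches generated tag functions.
local function make_tag_env()
  return setmetatable({}, {
    __index = function(self, tag_name)
      local fn = function(opts)
        return build_tag(tag_name, opts)
      end
      -- store the function so later lookups of this name hit the table
      -- directly and never reach __index again
      rawset(self, tag_name, fn)
      return fn
    end,
  })
end

local env = make_tag_env()
print(env.p == env.p) --> true
print(env.p("hi"))    --> <p>hi</p>
```

Without the rawset, every lookup would build a fresh closure, so env.p == env.p would be false.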

The void elements, as mentioned above, are defined as a simple set:

local void_tags = {
  img = true,
  -- etc...
}
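For reference, a complete set built from the void elements listed in the WHATWG HTML standard might look like this:

```lua
-- Void elements from the WHATWG HTML standard: no content, no closing tag.
local void_tags = {
  area = true, base = true, br = true, col = true, embed = true,
  hr = true, img = true, input = true, link = true, meta = true,
  source = true, track = true, wbr = true,
}
```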

The most efficient way to concatenate many strings in plain Lua is to accumulate them in a table and then call table.concat. Repeated calls to table.insert could append to this buffer table, but I prefer the following function, which appends multiple values at once:

local function append_all(buffer, ...)
  for i=1,select("#", ...) do
    table.insert(buffer, (select(i, ...)))
  end
end

-- example:
--   local buffer = {}
--   append_all(buffer, "a", "b", "c")
-- buffer is now {"a", "b", "c"}

append_all uses Lua’s built-in function select to read the varargs directly instead of packing them into a new table, which avoids an extra allocation.

Now the implementation of build_tag:

local function build_tag(tag_name, opts)
  local buffer = {"<", tag_name}
  if type(opts) == "table" then
    for k,v in pairs(opts) do
      if type(k) ~= "number" then
        append_all(buffer, " ", k, '="', v, '"')
      end
    end
  end

  if void_tags[tag_name] then
    append_all(buffer, " />")
  else
    append_all(buffer, ">")
    if type(opts) == "table" then
      append_all(buffer, unpack(opts))
    else
      append_all(buffer, opts)
    end
    append_all(buffer, "</", tag_name, ">")
  end

  return table.concat(buffer)
end

There are a couple of interesting things here:

The opts argument can either be a string literal or a table. When it’s a table it takes advantage of the fact that Lua tables are both hash tables and arrays at the same time. The hash table portion holds the attributes of the HTML element, and the array portion holds the contents of the element.

Checking whether the key in a pairs iteration is numeric is a quick way to approximate separating out the array-like elements. It’s not perfect, but it works for this case.

for k,v in pairs(opts) do
  if type(k) ~= "number" then
    -- access hash table key and values
  end
end
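To make the split concrete, here is the opts table from the a tag in the opening example, plus one case showing where the approximation falls short: a non-integer numeric key is neither an attribute nor part of the array, so its value is silently dropped.

```lua
-- opts for: a { href = "http://leafo.net", "Go home" }
local opts = { href = "http://leafo.net", "Go home" }

local attributes = {}
for k, v in pairs(opts) do
  if type(k) ~= "number" then
    attributes[k] = v -- hash part: element attributes
  end
end

print(attributes.href) --> http://leafo.net
print(opts[1])         --> Go home

-- Where it falls short: 1.5 is numeric, so it is skipped as an attribute,
-- but it is not in the array part either, so unpack never sees it.
local odd = { [1.5] = "lost", "kept" }
local odd_attribute_count = 0
for k in pairs(odd) do
  if type(k) ~= "number" then
    odd_attribute_count = odd_attribute_count + 1
  end
end
print(odd_attribute_count, #odd) -- no attributes, one content item
```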

When opts is a table, the tag’s content is added to the buffer with this line:

append_all(buffer, unpack(opts))

Lua’s built-in function unpack returns the array elements of a table as multiple values, which fits perfectly into the varargs of the append_all function defined above.

unpack is table.unpack in Lua 5.2 and above.
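Putting the pieces together, the sketch below collects the code from this section into one file, with each local declared before render_html uses it. It adds two assumptions not in the code above: a local unpack that works on both Lua 5.1 and 5.2+, and the debug-library setfenv fallback for Lua 5.2+. void_tags is trimmed to a few entries for brevity.

```lua
-- Lua 5.1/LuaJIT provide unpack and setfenv as globals; 5.2+ needs fallbacks.
local unpack = table.unpack or unpack
local setfenv = setfenv or function(fn, env)
  local i = 1
  while true do
    local name = debug.getupvalue(fn, i)
    if name == "_ENV" then
      debug.upvaluejoin(fn, i, function() return env end, 1)
      break
    elseif not name then
      break
    end
    i = i + 1
  end
  return fn
end

-- a few of the void elements
local void_tags = { img = true, br = true, hr = true }

local function append_all(buffer, ...)
  for i = 1, select("#", ...) do
    table.insert(buffer, (select(i, ...)))
  end
end

local function build_tag(tag_name, opts)
  local buffer = {"<", tag_name}
  if type(opts) == "table" then
    -- non-numeric keys are attributes
    for k, v in pairs(opts) do
      if type(k) ~= "number" then
        append_all(buffer, " ", k, '="', v, '"')
      end
    end
  end

  if void_tags[tag_name] then
    append_all(buffer, " />")
  else
    append_all(buffer, ">")
    if type(opts) == "table" then
      append_all(buffer, unpack(opts)) -- array part is the content
    else
      append_all(buffer, opts)
    end
    append_all(buffer, "</", tag_name, ">")
  end

  return table.concat(buffer)
end

-- declared last, so build_tag above is an in-scope local, not a nil global
local function render_html(fn)
  setfenv(fn, setmetatable({}, {
    __index = function(self, tag_name)
      return function(opts)
        return build_tag(tag_name, opts)
      end
    end,
  }))
  return fn()
end

print(render_html(function()
  return div {
    img { src = "http://leafo.net/hi" }
  }
end)) --> <div><img src="http://leafo.net/hi" /></div>
```

With only one attribute per tag the output is deterministic; with several, pairs makes no ordering guarantee, so attribute order in the output may vary between runs.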

Closing

This simple HTML builder should give you a good introduction to building your own DSLs in Lua.

The HTML builder above performs no HTML escaping, so it’s not suitable for rendering untrusted input. If you're looking for a way to enhance the builder, try adding HTML escaping. For example:

local unsafe_text = [[<script type="text/javascript">alert('hacked!')</script>]]

render_html(function()
  return div(unsafe_text)
end)

-- should not return a functional script tag:
-- <div>&lt;script type=&quot;text/javascript&quot;&gt;alert('hacked!')&lt;/script&gt;</div>
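One possible starting point, sketched here as an assumption rather than a finished design, is a small helper (escape_html, a hypothetical name) that replaces &, <, > and " with entities using string.gsub with a lookup table. It escapes exactly the characters that the expected output above escapes.

```lua
-- Hypothetical helper: map each special character to its HTML entity.
local html_escape_entities = {
  ["&"] = "&amp;",
  ["<"] = "&lt;",
  [">"] = "&gt;",
  ['"'] = "&quot;",
}

local function escape_html(text)
  -- gsub also returns a match count; the outer parentheses keep only the string
  return (tostring(text):gsub('[&<>"]', html_escape_entities))
end

local unsafe_text = [[<script type="text/javascript">alert('hacked!')</script>]]
print(escape_html(unsafe_text))
--> &lt;script type=&quot;text/javascript&quot;&gt;alert('hacked!')&lt;/script&gt;
```

Attribute values can be passed through a helper like this directly. Content is trickier: nested tags also return plain strings, so escaping every string in build_tag would escape child markup too. A real implementation needs some way to tell generated markup apart from text, for example having build_tag return a wrapper table instead of a bare string.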