Writing a DSL in Lua
DSLs, or domain-specific languages, are programming languages designed to implement a set of features specific to a particular problem or field. An example is Make, the build tool, which is a specially designed language for combining commands and files while managing dependencies.
A lot of modern programming languages have so much flexibility in their syntax that it’s possible to build libraries that expose their own mini-languages within the host language. The definition of DSL has broadened to include these kinds of libraries.
In this guide we'll build a DSL for generating HTML. It looks like this:
html {
  body {
    h1 "Welcome to my Lua site",
    a {
      href = "http://leafo.net",
      "Go home"
    }
  }
}
Before jumping in, here are some DSL building techniques:
Dropping the parentheses
One of the use cases for Lua, as described in its initial public release (1996), is that it makes a good configuration language. That’s still true to this day, and Lua is friendly to building DSLs.
A distinctive part of Lua’s syntax is that parentheses are optional in some scenarios when calling functions. Terseness is important when building a DSL, and removing superfluous characters is a good way to achieve it.
When calling a function with a single argument that is either a table literal or a string literal, the parentheses are optional.
print "hello" --> print("hello")
my_function { 1,2,3 } --> my_function({1,2,3})
-- whitespace isn't needed, these also work:
print"hello" --> print("hello")
my_function{ 1,2,3 } --> my_function({1,2,3})
This syntax has very high precedence, the same as if you were using parentheses:
tonumber "1234" + 5 --> tonumber("1234") + 5
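As a small illustration of that precedence, here is a sketch using a hypothetical helper named shout (it is not part of the original guide). The call binds to the literal right after the function name before any operator is applied.

```lua
-- shout is a hypothetical helper used only for illustration
local function shout(text)
  return text:upper()
end

-- the call happens first, then the operator is applied to its result
local greeting = shout "hello" .. "!"   -- same as shout("hello") .. "!"
local total = tonumber "1234" + 5       -- same as tonumber("1234") + 5

print(greeting, total)
```

If the intent were to pass the whole expression as the argument, explicit parentheses would be needed, for example shout("hello" .. "!").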
Chaining
Parenthesis-less invocation can be chained as long as each expression from the left evaluates to a function (or a callable table). Here’s some example syntax for a hypothetical web routing framework:
match "/post-comment" {
  GET = function ()
    -- render the form
  end,
  POST = function ()
    -- save to database
  end
}
If it’s not immediately obvious what’s going on, writing the parentheses in will clear things up. Parenthesis-less invocations are evaluated from left to right, so the above is equivalent to:
match("/post-comment")({ ... })
The pattern we would use to implement this syntax would look something like this:
local function match(path)
  print("match:", path)
  return function(params)
    print("params:", params)
    -- both path and params are now available for use here
  end
end
Using a recursive function constructor it’s possible to make chaining work for any length.
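One way to read "recursive function constructor" is a function that returns a new function, which in turn calls the constructor again with the values collected so far. The sketch below assumes that reading; make_chain and the convention of ending the chain with an empty call are hypothetical choices for illustration.

```lua
-- make_chain returns a function that records its argument and then
-- returns another collector, so the chain can continue indefinitely
local function make_chain(collected)
  return function(arg)
    if arg == nil then
      -- an empty call, written as (), ends the chain
      return collected
    end
    -- copy the values so earlier links in the chain are left untouched
    local next_collected = {}
    for i, value in ipairs(collected) do
      next_collected[i] = value
    end
    next_collected[#next_collected + 1] = arg
    return make_chain(next_collected)
  end
end

local path = make_chain({})
local parts = path "users" "42" "edit" ()

print(table.concat(parts, "/"))
```

Each link only ever receives a string or table literal, so some explicit signal (here the empty call) is needed to tell the chain it has finished.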
Using function environments
When interacting with a Lua module you regularly have to bring any functions or values into scope using require. When working with a DSL, it’s nice to have all the functionality available without having to manually load anything.
One option would be to make all the functions and values global variables, but it’s not recommended as it might interfere with other libraries.
A function environment can be used to change how a function resolves global variable references within its scope. This can be used to automatically expose a DSL’s functionality without polluting the regular global scope.
For the sake of this guide I'll assume that setfenv exists in the version of Lua we're using. If you're using 5.2 or above you'll need to provide your own implementation: Implementing setfenv in Lua 5.2, 5.3, and above
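For readers on Lua 5.2 or later, the following is a minimal sketch of one common way to emulate setfenv with the debug library. It is an assumption about how such a shim can look, not necessarily the implementation from the guide mentioned above. It relies on the function having an _ENV upvalue, which is the case when the function references at least one global name.

```lua
-- use the built-in setfenv when it exists (Lua 5.1, LuaJIT),
-- otherwise fall back to a shim based on the _ENV upvalue
local setfenv = setfenv or function(fn, env)
  local i = 1
  while true do
    local name = debug.getupvalue(fn, i)
    if name == "_ENV" then
      -- point the function's _ENV upvalue at a fresh upvalue holding env
      debug.upvaluejoin(fn, i, function() return env end, 1)
      break
    elseif not name then
      -- no _ENV upvalue: the function uses no globals, nothing to change
      break
    end
    i = i + 1
  end
  return fn
end

local function greet()
  return greeting  -- a global lookup, resolved through the environment
end

setfenv(greet, { greeting = "hello from the DSL" })
print(greet())
```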
Here’s a function run_with_env that runs another function with a particular environment.
local function run_with_env(env, fn, ...)
  setfenv(fn, env)
  fn(...)
end
The environment passed will represent the DSL:
local dsl_env = {
  move = function(x,y)
    print("I moved to", x, y)
  end,
  speak = function(message)
    print("I said", message)
  end
}
run_with_env(dsl_env, function()
  move(10, 10)
  speak("I am hungry!")
end)
In this trivial example the benefits might not be obvious, but typically your DSL would be implemented in another module. At each place you invoke it, you don't need to bring each function into scope manually. Instead, you activate the whole scope with run_with_env.
Function environments also let you dynamically generate methods on the fly. Using the __index metamethod implemented as a function, any value can be programmatically created. This is how the HTML builder DSL will be created.
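To make the idea concrete without involving function environments yet, here is a minimal sketch of a plain table whose __index metamethod builds a function for whatever key is requested. The table name dsl and the output format are hypothetical.

```lua
-- any key that is missing from this table is answered by __index
local dsl = setmetatable({}, {
  __index = function(self, name)
    -- build a function on demand, named after the key that was looked up
    return function(arg)
      return name .. ": " .. tostring(arg)
    end
  end
})

-- neither shout nor whisper was ever defined explicitly
print(dsl.shout "hi")
print(dsl.whisper "bye")
```

Using a table like this as a function's environment means every unknown global name inside that function resolves to a generated function.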
Implementing the HTML builder
Our goal is to make the following syntax work:
html {
  body {
    h1 "Welcome to my Lua site",
    a {
      href = "http://leafo.net",
      "Go home"
    }
  }
}
Each HTML tag is represented by a Lua function that will return the HTML string representing that tag with the correct attribute and content if necessary.
Although it would be possible to write code to generate all the HTML tag builder functions ahead of time, a function __index metamethod will be used to generate them on the fly.
In order to run code in the context of our DSL, it must be packaged into a function. The render_html function will take that function and convert it to an HTML string:
render_html(function()
  return div {
    img { src = "http://leafo.net/hi" }
  }
end) --> <div><img src="http://leafo.net/hi" /></div>
The img tag is self-closing; it has no separate close tag. HTML calls these “void elements”. These will be treated differently in the implementation.
render_html might be implemented like this:
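-- note: build_tag is shown further down; because it is a local function,
-- it has to be defined above render_html in a real file
local function render_html(fn)
  setfenv(fn, setmetatable({}, {
    __index = function(self, tag_name)
      return function(opts)
        return build_tag(tag_name, opts)
      end
    end
  }))
  return fn()
end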
The build_tag function is where all actual work is done. It takes the name of the tag, and the attributes and content as a single table.
This function could be optimized by caching the generated functions in the environment table.
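A minimal sketch of that caching idea follows. It uses a stand-in build_tag so the block runs on its own, and a hypothetical lookups counter to show that __index only fires once per tag name.

```lua
-- stand-in for the real build_tag, just enough to make the sketch run
local function build_tag(tag_name, opts)
  return "<" .. tag_name .. ">" .. tostring(opts) .. "</" .. tag_name .. ">"
end

local lookups = 0  -- counts how often __index runs, for demonstration only

local env = setmetatable({}, {
  __index = function(self, tag_name)
    lookups = lookups + 1
    local tag_fn = function(opts)
      return build_tag(tag_name, opts)
    end
    -- store the function on the table so later lookups skip __index
    rawset(self, tag_name, tag_fn)
    return tag_fn
  end
})

local first = env.div
local second = env.div
print(first == second, lookups)
```

The cache only pays off if the same environment table is reused. Since render_html as written creates a fresh table on every call, the environment would also need to be created once and shared for the optimization to have much effect.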
The void elements, as mentioned above, are defined as a simple set:
local void_tags = {
  img = true,
  -- etc...
}
The most efficient way to concatenate strings in regular Lua is to accumulate them into a table then call table.concat. Many calls to table.insert could be used to append to this buffer table, but I prefer the following function to allow multiple values to be appended at once:
local function append_all(buffer, ...)
  for i=1,select("#", ...) do
    table.insert(buffer, (select(i, ...)))
  end
end
-- example:
-- local buffer = {}
-- append_all(buffer, "a", "b", "c")
-- buffer now is {"a", "b", "c"}
append_all uses Lua’s built-in function select to avoid any extra allocations by querying the varargs object instead of creating a new table.
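For readers less familiar with select, this short sketch shows the two forms used by append_all. The helper names count_args, nth, and from are hypothetical. Note that select(i, ...) returns every argument from position i onward, which is why append_all wraps the call in an extra pair of parentheses to keep only the first value.

```lua
-- select("#", ...) returns the number of arguments, including nils
local function count_args(...)
  return select("#", ...)
end

-- the extra parentheses truncate the result to a single value
local function nth(n, ...)
  return (select(n, ...))
end

-- without them, all arguments from position n onward are returned
local function from(n, ...)
  return select(n, ...)
end

print(count_args("a", nil, "c"))   -- 3
print(nth(2, "a", "b", "c"))       -- b
local second, third = from(2, "a", "b", "c")
print(second, third)
```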
Now the implementation of build_tag:
local function build_tag(tag_name, opts)
  local buffer = {"<", tag_name}
  if type(opts) == "table" then
    for k,v in pairs(opts) do
      if type(k) ~= "number" then
        append_all(buffer, " ", k, '="', v, '"')
      end
    end
  end
  if void_tags[tag_name] then
    append_all(buffer, " />")
  else
    append_all(buffer, ">")
    if type(opts) == "table" then
      append_all(buffer, unpack(opts))
    else
      append_all(buffer, opts)
    end
    append_all(buffer, "</", tag_name, ">")
  end
  return table.concat(buffer)
end
There are a couple of interesting things here:
The opts argument can either be a string literal or a table. When it’s a table it takes advantage of the fact that Lua tables are both hash tables and arrays at the same time. The hash table portion holds the attributes of the HTML element, and the array portion holds the contents of the element.
Checking if the key in a pairs iteration is numeric is a quick way to approximate isolating array-like elements. It’s not perfect, but will work for this case.
for k,v in pairs(opts) do
  if type(k) ~= "number" then
    -- access hash table key and values
  end
end
When the content of the tag is inserted into the buffer for the table-based opts, the following line is used:
append_all(buffer, unpack(opts))
Lua’s built-in function unpack converts the array values in a table to varargs. This fits perfectly into the append_all function defined above.
unpack is table.unpack in Lua 5.2 and above.
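A small sketch of a version-independent alias, and of how unpack treats a table that mixes attributes and content. The opts table here is a made-up example in the same shape the builder receives.

```lua
-- pick whichever name this Lua version provides
local unpack = table.unpack or unpack

local opts = { href = "http://leafo.net", "Go ", "home" }

-- only the array portion is expanded; the href key is ignored
local first, second = unpack(opts)
local count = select("#", unpack(opts))

print(first, second, count)
```

This is why build_tag can pass unpack(opts) straight to append_all: the attributes in the hash portion never reach the content buffer.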
Closing
This simple implementation of an HTML builder should give you a good introduction to building your own DSLs in Lua.
The HTML builder provided performs no HTML escaping. It’s not suitable for rendering untrusted input. If you're looking for a way to enhance the builder, try adding HTML escaping. For example:
local unsafe_text = [[<script type="text/javascript">alert('hacked!')</script>]]
render_html(function()
  return div(unsafe_text)
end)
-- should not return a functional script tag:
-- <div><script type="text/javascript">alert('hacked!')</script></div>
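As a possible starting point for that exercise, here is a minimal sketch of an escaping helper. The names escape_html and html_escapes are hypothetical, and the set of replaced characters is one reasonable choice rather than the only one.

```lua
-- characters that are unsafe in HTML text and attribute values
local html_escapes = {
  ["&"] = "&amp;",
  ["<"] = "&lt;",
  [">"] = "&gt;",
  ['"'] = "&quot;",
  ["'"] = "&#039;",
}

local function escape_html(text)
  -- the extra parentheses drop gsub's second return value (the match count)
  return (tostring(text):gsub("[&<>\"']", html_escapes))
end

print(escape_html([[<script>alert('hacked!')</script>]]))
```

Wiring this into build_tag needs some care: content produced by nested tag calls is already an HTML string, so escaping every child blindly would escape the nested tags as well. One option, stated here only as an idea, is to wrap builder output in a marker table so that trusted HTML can be told apart from plain text.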