Inspired by reading Lua - an extensible extension language, specifically the section on using fallbacks to implement an expression parser, I wanted to see if I could do so in a safe manner. With safe meaning parsed expressions must not be able to execute arbitrary code.
Half an hour later, I have to say it is surprisingly easy! Less than 50 lines (including comments) is all it takes:
function parse(expression)
local op, create
-- Calculate each subexpression at most once, otherwise just return the cached
-- result. Note that this is not clever enough to take advantage of
-- commutativity, i.e. a*a*b and b*a*a are not optimized to the same temporary.
local results = {}
local n = 0
function op(a, b, o)
local i = tostring(a) .. o .. tostring(b)
if results[i] == nil then
n = n + 1
results[i] = create("t" .. n)
print(results[i].name .. " = " .. i)
end
return results[i]
end
-- Intercept arithmetic operations.
local ops = {
__add = function(a, b) return op(a, b, "+") end,
__sub = function(a, b) return op(a, b, "-")end,
__mul = function(a, b) return op(a, b, "*") end,
__div = function(a, b) return op(a, b, "/") end,
__tostring = function(self) return self.name end,
}
-- Create a new table representing a user defined symbol from the expression
-- or a temporary variable.
function create(n)
v = {name = n}
setmetatable(v, ops)
return v
end
-- Create an empty environment so the parsed expression can't call 'bad'
-- functions.
local env = {}
setmetatable(env, {
-- Catch all symbol lookups and return tables instead so arithmetic
-- operations can be intercepted.
__index = function(_, key) return create(key) end,
})
local f = load("return " .. expression, "=(load)", "t", env)
print("return " .. tostring(f()))
end
The main trick here are Lua's metatables, they allow us to provide default behavior for variable and method lookups when no value is found. By simple defining an empty environment and returning empty tables for the variables in the expression we can intercept everything, and from there on it's pretty easy. As an example,
parse("(a*a+b*b)*(a*a-b*b)*42/(a*a+b*b+c)+(a*(b*b)*c)")
prints
t1 = a*a
t2 = b*b
t3 = t1+t2
t4 = t1-t2
t5 = t3*t4
t6 = t5*42
t7 = t3+c
t8 = t6/t7
t9 = a*t2
t10 = t9*c
t11 = t8+t10
return t11
Note how e.g. the multiplications in the first two subexpressions are only executed once and their results are then reused.
I mainly looked into Lua because torch7, the machine learning framework, uses it, but so far I'm quite impressed by the language. Even if it doesn't have static typing and Monads.
Tags: programming, lua