Minimalist interpreters have always attracted me. First of all, they are a good fit for embedded systems whenever you need a custom domain-specific language.
If one needs a scripting language for a microcontroller - there is not so many choices. Instead there are lots of constraints, like the size of the compiled binary code, RAM usage and memory management. Of course the language itself is important, too.
Lua, a very stable, pragmatic and minimal language. It is easy to learn and to extend. But the compiled code size would be around 200KB, so it barely fits even on the most powerful ST ARMs. Also, Lua uses garbage collector, which might not be good for realtime systems. Still, Lua is a perfect choice if you can afford it. Apart from the original interpreter there’s also a good implementation for microcontrollers called eLua.
JavaScript. I still can’t take it seriously and it feels more like a modern IoT hype rather than a serious alternative to Lua. There are many implementations (Espruino, duktape, v7, TinyJS,MuJS, JerryScript), but most of them are not stable enough and the minimal binary size easily grows over 200KB, RAM usage is high and garbage collection is slow. But might be good for hobbyists, though.
Forth. Well, this is totally the opposite. Very minimal and simple, one can create his own implementation over the weekends. Memory usage is close to zero and code size is a couple of kilobytes. The performance should rock, too, if you implement it properly. But the language itself quickly becomes unreadable, as well as its concatenative nature is far from what people expect to see in traditional programming languages. Absolutely worth trying, but unlikely to be practical. I’m afraid the same applies for various Lisps.
Another once-so-popular scripting language is Tcl, and let’s have a closer look at it.
Tcl
An extendable language, born out of frustration with having to build custom command languages again and again, is in fact very simple.
A script is a sequence of commands, like “turn led on”, here “turn” is the command name, “led” and “on” are just words passed as parameters to the command.
There is no data types other than a string. Square brackets inside the
strings are interpolated, e.g. the expression inside the brackets is evaluated
and its result is put into the original string. For example, in puts [add 1
2]
the square brackets part gets substituted by 3
and then puts 3
is
executed.
This approach is definitely slow, but it is what makes Tcl interpreters so simple. All the language does is string substitution. The only place where no substitution happens is braces. That’s why the following is just a 5-word command, and not a special language syntax:
if {$x > 0} {
puts "Positive"
} else {
puts "Negative"
}
The command is if
, the second word is $x > 0
(which is substituted later by
the command internally), the third word is {puts Positive}
, then else
, then{puts Negative}
. If
substitutes the second word and evaluates the third or
the fifth one depending on the result. Loops, procedures etc - everything is
just a command operating with strings.
Existing interpreters
The most powerful one is tclsh
, and it probably comes with your OS. It’s too
big for microcontrollers, but it has a large standard library of commands and
has lots of documentation.
There is also lil, the Little Interpreted Language. In fact, I got a chance to use it on a MIPS controller once, it was a pleasant experience, the source code is easy to understand. I met the performance bottleneck very quickly, but I’ve rewritten some critical procedures as C commands, so it’s not a big problem. Too bad the author have removed Git repos and his site looks broken now.
Finally, there’s picol made by Antirez (the guy who created Redis and JimTcl). It’s worth reading - only 600 lines of amazingly simple code.
Partcl
Inspired by these projects, I decided to build my own Tcl interpreter over the weekend. My goals were:
- Extreme minimalism. It should fit on an MCU with 16K of NAND.
- It should be easy to extend by writing your own commands in C.
- All parts should be isolated and should be easy to replace/customize.
- Default implementation should prefer size over performance.
- Lexer should tell when the end of the command is met so that we could read user input byte by byte and execute command only when it’s fully read.
- All parts of the interpreter should be covered with tests.
It took Antirez three hours to finish his Picol, it took me 3 days to finish my project. Apparently, I’m not that smart. Still, I’m very satisfied with the results - I ran my TCL interpreter on a STM32F051 microcontroller and firmware size was only 10KB!
Although partcl
is a toy language, you might want to give it a try in your
next project, so let me explain how it works.
Lexer
Any symbol can be a part of the partcl word, except for the following special symbols:
- Whitespace symbols: space and tab, used to delimit words
- Command terminators: newline, semicolon or EOF
- Grouping or substitution: square brackets, dollar sign, quotes, braces
Partcl lexer has special helper functions for these char classes:
static int tcl_is_space(char c);
static int tcl_is_end(char c);
static int tcl_is_special(char c, int q);
tcl_is_special
behaves differently depending on the quoting mode (q
parameter) because semicolon, braces and new line lose their special meaning
and become regular printable characters inside the quoted strings.
The whole lexer is implemented in a single function:
int tcl_next(const char *s, size_t n, const char **from, const char **to, int *q);
tcl_next
function finds the next token in the string s
. from
and to
are
set to point to the token start/end, q
denotes the quoting mode and is
changed if double quote is met. It’s very efficient because it doesn’t allocate
any memory and it doesn’t mutate the original string (making it possible to
store scripts in ROM).
A special macro tcl_each(s, len, skip_error)
can be used to iterate over all
the tokens in the string. If skip_error
is true - then unexpected EOF is not
considered to be an error. This allows to validate the input string without
evaluating it and detect when a full command has been read.
A good use case is reading commands from the serial port byte by byte. You can accumulate data in a buffer without executing it until the lexer reports the end of command. Then you can execute the command and reset the buffer.
Lexer allocates no memory, it even doesn’t use any of the libc functions. It also has 100% test coverage.
Data types
Picol uses char *
as the only data type. LIL uses a special structure that
keeps numeric values separately from string values and yet another data type
for lists. The first approach is very simple, the second one is more efficient,
but takes precious memory space.
Partcl has a special tcl_value_t
type and a number of functions to work with
it. By default it’s just a char pointer, but one can rewrite ~100 lines to
optimize it for their needs (e.g. use a pool of strings, or handle lists
separately). In other words, we have an abstract type and the implementation
may vary.
Here’s a full list of operations done with Partcl values:
/* Raw string values */
tcl_value_t *tcl_alloc(const char *s, size_t len);
tcl_value_t *tcl_dup(tcl_value_t *v);
tcl_value_t *tcl_append(tcl_value_t *v, tcl_value_t *tail);
int tcl_length(tcl_value_t *v);
void tcl_free(tcl_value_t *v);
/* Helpers to access raw string or numeric value */
int tcl_int(tcl_value_t *v);
const char *tcl_string(tcl_value_t *v);
/* List values */
tcl_value_t *tcl_list_alloc();
tcl_value_t *tcl_list_append(tcl_value_t *v, tcl_value_t *tail);
tcl_value_t *tcl_list_at(tcl_value_t *v, int index);
int tcl_list_length(tcl_value_t *v);
void tcl_list_free(tcl_value_t *v);
The default implementation uses malloc/free and keeps lists as raw strings with the items quoted. Despite the simplicity, it may break if you put some badly escaped strings into a list. So probably avoid using unpaired braces in strings if you use the default list implementation. Or rewrite it using proper linked lists.
Environments
Tcl interpreter uses a stack of environments where variables are stored. Environment is handled by 3 functions and it can be customized if needed (e.g. use hash maps to speed up variable lookup):
static struct tcl_env *tcl_env_alloc(struct tcl_env *parent);
static struct tcl_var *tcl_env_var(struct tcl_env *env, tcl_value_t *name);
static struct tcl_env *tcl_env_free(struct tcl_env *env);
Environments have links from child to parent, making it possible to implementupeval
and other commands.
Commands
The default set of commands includes “subst”, “set”, “while”, “if”, “proc”, “return”, “break” and “continue”. There is also “puts” which can be disabled if your target system has no stdout. Also there are some mathematical operations taken from Picol that can be optionally disabled if you’re going to use Partcl as a command shell and not as a programming language.
A typical command is just a C function of the following look:
static int tcl_cmd_puts(struct tcl *tcl, tcl_value_t *args, void *arg) {
tcl_value_t *text = tcl_list_at(args, 1);
puts(tcl_string(text));
return FNORMAL; /* continue to the next command normally */
}
It takes an interpreter object, a list of arguments and an optional pointer to some context. Some commands have fixed arity, then the interpreter controls it. For zero arity the command must control it internally, which makes it possible to use variadic arguments.
What’s next?
It was fun to write a real Tcl interpreter. It was even more fun to actually use it on a real MCU. Good test coverage and no memory leaks (according to valgrind) make it a good candidate for low-end scripting.
The whole interpreter is just a single file with no other dependencies than
libc (for <strings.h>
and malloc/free).
I’m not sure what to do next with it, but here are some ideas:
- Run some benchmarks to see how fast the lexer and the interpterer are.
- Make an alternative implementation for
tcl_value_t
that uses real lists and caches numbers. - Make a library of commands for lists, hash maps, strings,
for
loops etc.
If you find it interesting - feel free to contribute or report issues athttps://github.com/zserge/partcl
Posted on 2016-08-11