nullprogram.com/blog/2016/11/13/
As part of a demonstration in an upcoming article, I wrote a simpletrie implementation. A trie is a search tree where the keys are a sequence of symbols (i.e. strings). Strings with a common prefix share an initial path down the trie, and the keys themselves are stored implicitly by the structure of the trie. It’s commonly used as a sorted set or, when values are associated with nodes, an associative array.
This wasn’t my first time writing a trie. The curse of programming in C is rewriting the same data structures and algorithms over and over. It’s the problem C++ templates are intended to solve. This rewriting isn’t always bad since each implementation is typically customized for its specific use, often resulting in greater performance and a smaller resource footprint.
Every time I’ve rewritten a trie, my implementation is a little bit better than the last. This time around I discovered an approach for traversing, both depth-first and breadth-first, an arbitrarily-sized trie without memory allocation. I’m definitely not the first to discover something like this. There’sDeutsch-Schorr-Waite pointer reversal for binary graphs (1965) — which I originally learned from reading the Scheme 9 from Outer Space garbage collector source — and Morris in-order traversal (1979) for binary trees. The former requires two extra tag bits per node and the latter requires no modifications at all.
What’s a trie?
But before I go further, some background. A trie can come in many shapes and sizes, but in the simple case each node of a trie has as many pointers as its alphabet. For illustration purposes, imagine a trie for strings of only four characters: A, B, C, and D. Each node is essentially four pointers.
#define TRIE_ALPHABET_SIZE 4
#define TRIE_STATIC_INIT {.flags = 0}
#define TRIE_TERMINAL_FLAG (1U << 0)structtrie{structtrie*next[TRIE_ALPHABET_SIZE];unsignedflags;};
It includes a flags
field, where a single bit tracks whether or not
a node is terminal — that is, a key terminates at this node. Terminal
nodes are not necessarily leaf nodes, which is the case when one key
is a prefix of another key. I could instead have used a 1-bit
bit-field (e.g. int is_terminal : 1;
) but I don’t like bit-fields.
A trie with the following keys, inserted in any order:
Looks like this (terminal nodes illustrated as small black squares):
The root of the trie is the empty string, and each child represents a trie prefixed with one of the symbols from the alphabet. This is a nice recursive definition, and it’s tempting to write recursive functions to process it. For example, here’s a recursive insertion function.
inttrie_insert_recursive(structtrie*t,constchar*s){if(!*s){t->flags|=TRIE_TERMINAL_FLAG;return1;}inti=*s-'A';if(!t->next[i]){t->next[i]=malloc(sizeof(*t->next[i]));if(!t->next[i])return0;*t->next[i]=(structtrie)TRIE_STATIC_INIT;}returntrie_insert_recursive(t->next[i],s+1);}
If the string is empty (!*s
), mark the current node as terminal.
Otherwise recursively insert the substring under the appropriate
child. That’s a tail call, and any optimizing compiler would optimize
this call into a jump back to the beginning of of the function
(tail-call optimization), reusing the stack frame as if it were a
simple loop.
If that’s not good enough, such as when optimization is disabled for debugging and the recursive definition is blowing the stack, this is trivial to convert to a safe, iterative function. I prefer this version anyway.
inttrie_insert(structtrie*t,constchar*s){for(;*s;s++){inti=*s-'A';if(!t->next[i]){t->next[i]=malloc(sizeof(*t->next[i]));if(!t->next[i])return0;*t->next[i]=(structtrie)TRIE_STATIC_INIT;}t=t->next[i];}t->flags|=TRIE_TERMINAL_FLAG;return1;}
Finding a particular prefix in the trie iteratively is also easy. This would be used to narrow the trie to a chosen prefix before iterating over the keys (e.g. find all strings matching a prefix).
structtrie*trie_find(structtrie*t,constchar*s){for(;*s;s++){inti=*s-'A';if(!t->next[i])returnNULL;t=t->next[i];}returnt;}
Depth-first traversal is stack-oriented. The stack represents the path through the graph, and each new vertex is pushed into this stack as it’s visited. A recursive traversal function can implicitly use the call stack for storing this information, so no additional data structure is needed.
The downside is that the call is no longer tail-recursive, so a large trie will blow the stack. Also, the caller needs to provide a callback function because the stack cannot unwind to return a value: The stack has important state on it. Here’s a typedef for the callback.
typedefvoid(*trie_visitor)(constchar*key,void*arg);
And here’s the recursive depth-first traversal function. The top-level
caller passes the same buffer for buf
and bufend
, which must be at
least as large as the largest key. The visited key will be written to
this buffer and passed to the visitor.
voidtrie_dfs_recursive(structtrie*t,char*buf,char*bufend,trie_visitorv,void*arg){if(t->flags&TRIE_TERMINAL_FLAG){*bufend=0;v(buf,arg);}for(inti=0;i<TRIE_ALPHABET_SIZE;i++){if(t->next[i]){*bufend='A'+i;trie_dfs_recursive(t->next[i],buf,bufend+1,v,arg);}}}
Heap-allocated Traversal Stack
Moving the traversal stack to the heap would eliminate the stack overflow problem and it would allow control to return to the caller. This is going to be a lot of code for an article, but bear with me.
First define an iterator object. The stack will need two pieces of
information: which node did we come from (p
) and through which
pointer (i
). When a node has been exhausted, this will allow return
to the parent. The root
field tracks when traversal is complete.
structtrie_iter{structtrie*root;char*buf;char*bufend;struct{structtrie*p;inti;}*stack;};
A special value of -1 in i
means it’s the first visit for this node
and it should be visited by the callback if it’s terminal.
The iterator is initialized with trie_iter_init
. The max
indicates
the maximum length of any key. A more elaborate implementation could
automatically grow the stack to accommodate (e.g. realloc()), but I’m
keeping it as simple as possible.
inttrie_iter_init(structtrie_iter*it,structtrie*t,size_tmax){it->root=t;it->stack=malloc(sizeof(*it->stack)*max);if(!it->stack)return0;it->buf=it->bufend=malloc(max);if(!it->buf){free(it->stack);return0;}it->stack->p=t;it->stack->i=-1;return1;}voidtrie_iter_destroy(structtrie_iter*it){free(it->stack);it->stack=NULL;free(it->buf);it->buf=NULL;}
And finally the complicated part. This uses the allocated stack to explore the trie in a loop until it hits a terminal, at which point it returns. A further call continues the traversal from where it left off. It’s like a hand-coded generator. With the way it’s written, the caller is obligated to follow through with the entire iteration before destroying the iterator, but this would be easy to correct.
inttrie_iter_next(structtrie_iter*it){for(;;){structtrie*current=it->stack->p;inti=it->stack->i++;if(i==-1){/* Return result if terminal node. */if(current->flags&TRIE_TERMINAL_FLAG){*it->bufend=0;return1;}continue;}if(i==TRIE_ALPHABET_SIZE){/* End of current node. */if(current==it->root)return0;// back at root, doneit->stack--;it->bufend--;continue;}if(current->next[i]){/* Push on next child node. */*it->bufend='A'+i;it->stack++;it->bufend++;it->stack->p=current->next[i];it->stack->i=-1;}}}
This is much nicer for the caller since there’s no control inverse.
structtrie_iterit;trie_iter_init(&it,&trie_root,KEY_MAX);while(trie_iter_next(&it)){// ... do something with it.buf ...}trie_iter_destroy(&it);
There are a few downsides to this:
Initialization could fail (not checked in the example) since it allocates memory.
Either the caller has to keep track of the maximum key length, or the iterator grows the stack automatically, which would mean iteration could fail at any point in the middle.
In order to destroy the trie, it needs to be traversed: Freeing memory first requires allocating memory. If the program is out of memory, it cannot destroy the trie to clean up before handling the situation, nor to make more memory available. It’s not good for resilience.
Wouldn’t it be nice to traverse the trie without memory allocation?
Modifying the Trie
Rather than allocate a separate stack, the stack can be allocated
across the individual nodes of the trie. Remember those p
and i
fields from before? Put them on the trie.
structtrie_v2{structtrie_v2*next[TRIE_ALPHABET_SIZE];structtrie_v2*p;inti;unsignedflags;};
This automatically scales with the size of the trie, so there will always be enough of this stack. With the stack “pre-allocated” like this, traversal requires no additional memory allocation.
The iterator itself becomes a little simpler. It cannot fail and it doesn’t need a destructor.
structtrie_v2_iter{structtrie_v2*current;char*buf;};voidtrie_v2_iter_init(structtrie_v2_iter*it,structtrie_v2*t,char*buf){t->p=NULL;t->i=-1;it->current=t;it->buf=buf;}
The iteration function itself is almost identical to before. Rather
than increment a stack pointer, it uses p
to chain the nodes as a
linked list.
inttrie_v2_iter_next(structtrie_v2_iter*it){for(;;){structtrie_v2*current=it->current;inti=it->current->i++;if(i==-1){/* Return result if terminal node. */if(current->flags&TRIE_TERMINAL_FLAG){*it->buf=0;return1;}continue;}if(i==TRIE_ALPHABET_SIZE){/* End of current node. */if(!current->p)return0;it->current=current->p;it->buf--;continue;}if(current->next[i]){/* Push on next child node. */*it->buf='A'+i;it->buf++;it->current=current->next[i];it->current->p=current;it->current->i=-1;}}}
During traversal the iteration pointers look something like this:
This is not without its downsides:
Traversal is not re-entrant nor thread-safe. It’s not possible to run multiple in-place iterators side by side on the same trie since they’ll clobber each other.
It uses more memory — O(n) rather than O(max-key-length) — and sits on this extra memory for its entire lifetime.
Breadth-first Traversal
The same technique can be used for breadth-first search, which isqueue-oriented rather than stack-oriented. The p
pointers are
instead chained into a queue, with a head
and tail
pointer
variable for each end. As each node is visited, its children are
pushed into the queue linked list.
This isn’t good for visiting keys by name. buf
was itself a stack
and played nicely with depth-first traversal, but there’s no easy way
to build up a key in a buffer breadth-first. So instead here’s a
function to destroy a trie breadth-first.
voidtrie_v2_destroy(structtrie_v2*t){structtrie_v2*head=t;structtrie_v2*tail=t;while(head){for(inti=0;i<TRIE_ALPHABET_SIZE;i++){structtrie_v2*next=head->next[i];if(next){next->p=NULL;tail->p=next;tail=next;}}structtrie_v2*dead=head;head=head->p;free(dead);}}
During its traversal the p
pointers link up like so:
Further Research
In my real code there’s also a flag to indicate the node’s allocation type: static or heap. This allows a trie to be composed of nodes from both kinds of allocations while still safe to destroy. It might also be useful to pack a reference counter into this space so that a node could be shared by more than one trie.
For a production implementation it may be worth packing i
into theflags
field since it only needs a few bits, even with larger
alphabets. Also, I bet, as in Deutsch-Schorr-Waite, the p
field
could be eliminated and instead one of the child pointers is
temporarily reversed. With these changes, this technique would fit
into the original struct trie
without changes, eliminating the extra
memory usage.
Update: Over on Hacker News, psi-squared has interesting suggestions such as leaving the traversal pointers intact, particularly in the case of a breadth-first search, which, until the next trie modification, allows for concurrent follow-up traversals.