At TailorDev, we are polyglot developers and scientists: we are comfortable with numerous programming languages, and we use our Le lab sessions to play with new programming languages and alternative paradigms. Our scientific backgrounds also make it easier to talk with researchers, because we understand the underlying science as well as their needs.
In the following, we describe the xper-tools project, carried out in collaboration with a research team (the LIS team) hosted at the French National Museum of Natural History in Paris. The XPER tools are a set of very old programs (~30 tools), written in C, which are used for taxonomy purposes. By old, we mean almost 30 years old. You read that right: they are older than the first C Standard published by ANSI!
VOID main(argc, argv)
int argc;
char **argv;
{
    VOID loadkb();
    VOID freekb();
    VOID help();
    VOID evalxp();
    char *gets();

    fprintf(stderr, "EVALXP V1.02 (02/08/1987) J. LEBBE & R. VIGNES.\n");
    ...
}
Even though these programs work pretty well and are still used on a daily basis, they have some limitations. First, they were written for MS-DOS and therefore require a rather old computer to run them. This leads to two more issues: not everyone can easily use them, and it is nearly impossible to interface them with other software.
We were asked to build, in one week, a Proof of Concept (POC) that turns these programs into web services (it was a nice-to-have at the end of a bigger project that we will likely present in another blog post). Challenge accepted!
Analysis
We started by analyzing the different programs and decided on a first program to use in our POC. The source code of these tools also bundles different Makefiles and some documentation. Luckily, these programs are well written, even though some parts are cryptic. All tools are designed to be run from the command line, use the same set of data as input (knowledge bases), and some of them have options (flags) with a DOS-like syntax such as /B. In addition, all programs respond to the /H help option, providing interesting information for each program:
$ bin/chkbase
CHKBASE V1.06 (22/05/1988) J. LEBBE & R. VIGNES.
Syntax: CHKBASE name-of-base [/H] [/V]
/H Help
/V Verbose mode
Nom de fichier absent
Every time we work for/with a customer, we make sure that what we produce can easily be reused afterwards. In this context, we designed the POC as the foundation of production-ready software that could leverage all the existing programs. Hence, we decided to focus on two main tasks:
- being able to compile and run the programs on different platforms;
- proposing a unified solution to expose the programs over HTTP.
Instead of having to deal with many Makefiles and other files to build the different tools, why not use a common tool that does most of the job for us? Wouldn't it be super cool if we only had to run make to build all the tools at once? The Autotools (not to be confused with the Autobots) are the solution!
Even if you do not know what the Autotools are, you have probably already installed software from source with the following commands:
$ ./configure
$ make
$ (sudo) make install
The first line executes a shell script that, first, determines whether all requirements are met to build the software, and second, creates a Makefile based on a template (Makefile.in). If a mandatory dependency is missing on your system, the script will abort, forcing you to install that dependency. That is very useful to ensure reproducibility. The configure script is not written by hand, but generated by autoconf from yet another template file named configure.ac:
AC_INIT([xper-tools], [1.0.0], [author@example.org])
AM_INIT_AUTOMAKE # use `automake` to generate a `Makefile.in`
AC_PROG_CC # require a C compiler
AC_CONFIG_FILES([Makefile]) # create a `Makefile` from `Makefile.in`
AC_OUTPUT # output the script
The Makefile.in template is itself generated thanks to automake and a Makefile.am template. That is also why we had to use the AM_INIT_AUTOMAKE directive in the configure.ac file above.
A Makefile.am template usually starts by defining the layout of the project, which should be foreign if you are not using the standard layout of a GNU project (which is likely the case). In the example below, we provide global compiler and linker flags with the AM_CFLAGS and AM_LDFLAGS directives. Next, we tell automake that the Makefile should build the different programs using the bin_PROGRAMS directive:
# Makefile.am
AUTOMAKE_OPTIONS = foreign
# Global flags
AM_CFLAGS = -W -Wall -ansi -pedantic
AM_LDFLAGS =
# Target binaries
bin_PROGRAMS = chkbase \
               makey \
               mindescr
...
The bin prefix tells automake to “install” the listed files into the directory defined by the bindir variable, which should point to /usr/local/bin by default (/usr/local being the “prefix” directory). The PROGRAMS suffix is called a primary and tells automake which properties the listed files have. For instance, PROGRAMS are compiled. Hence, we must tell automake where to find the source files (we also add per-program compilation flags):
# Makefile.am
...
# -- chkbase
chkbase_CFLAGS = -D LINT_ARGS
chkbase_SOURCES = xper.h det.h loadxp.c detool.c chkbase.c
By adding more similar lines to the Makefile.am, we can support all the existing programs, leveraging a simple and uniform way to build all the tools. Now that the configuration templates/files have been written, we can use the Autotools to generate the ready-to-use files. Let’s start with the configure script:
$ autoreconf --verbose --install --force
Various files have been generated, but the most important one is the configure
script, which will be useful to generate the final Makefile
. You can pass
some options to this script such as --prefix
to specify the prefix directory.
For instance, to install all the files into your current directory, you could
run:
$ ./configure --prefix=$(pwd)
We can run make
to compile all the tools at once, and make install
to
“install” the binaries into the <PREFIX>/bin
folder. But we also get a
distribution solution for free by using make dist
. This target builds a
tarball of the project containing all of the files we need to distribute. End
users could download this archive and run the commands below without having to
worry about the Autotools:
$ ./configure
$ make
$ (sudo) make install
After having successfully ported one tool to this new(-ish) build system, we wrote a procedure to port the other programs and tested it by asking someone else to port another program. Fortunately, compiling these programs was not too difficult once we had figured out which encoding was used (hello CP 850), found all the required header files, and performed minor code changes such as adding proper exit codes and removing a case '/': line used for parsing the (DOS-style) options, because it caused an incompatibility with UNIX absolute paths.
Naturally, we added some smoke tests to ensure the compiled binaries were behaving correctly (based on the outputs given by the old computer in the research team’s lab) and automated the building and testing phases with GitLab CI; a sketch of such a test is shown below. With little effort, the different XPER tools can now be compiled and executed on any new system. The first goal is therefore satisfied, and in the next section we present how we designed an API to expose these tools over HTTP.
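To give a flavor of those smoke tests, here is a minimal, hypothetical example (not our actual test suite): it runs a freshly built binary without arguments and checks that its output matches what the original program prints, as shown in the Analysis section above.

# test_smoke.py -- hypothetical smoke test, run from the build directory
import subprocess

def test_chkbase_prints_usage():
    # with no argument, chkbase prints its banner and usage
    cp = subprocess.run(['bin/chkbase'], capture_output=True, text=True)
    assert 'Syntax: CHKBASE' in cp.stdout + cp.stderr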
RPC-style HTTP API
The different source codes are very application-oriented rather than library-oriented, which prevented us from compiling C libraries that we could have imported in Go or Python. Hence, we decided to “wrap” the C tools to integrate them with the API code. We chose the Python programming language as it is usually a good choice in Academia (and also because it is fast).
We wrote a generic yet smart wrapper that is able to:
- execute any C program and return its output thanks to the Python subprocess module;
- determine the options of any C program it wraps by invoking the program with the help (-H) flag (cf. the Analysis section);
- validate the supplied options. Since the wrapper knows which options a program can accept, it can easily reject invalid options and prevent invalid calls;
- provide a nice and simple programmatic API:
from api.wrappers import ToolWrapper

makey = ToolWrapper('makey')
cp = makey.invoke('/path/to/data', B=True)
# cp.stdout contains the output result
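Under the hood, such a wrapper can be sketched as follows. This is a simplified, hypothetical version (the bin directory and constructor arguments are illustrative); the real wrapper also discovers the available options via the help flag and validates them before running the program, which is omitted here.

# api/wrappers.py -- simplified, hypothetical sketch of the wrapper
import subprocess

class ToolWrapper:
    def __init__(self, name, bin_dir='bin'):
        # path to the compiled C program, e.g. bin/makey
        self.path = '{}/{}'.format(bin_dir, name)

    def invoke(self, knowledge_base, **options):
        # turn keyword arguments into command line flags, e.g. B=True -> -B
        args = [self.path, knowledge_base]
        args += ['-' + flag for flag, enabled in options.items() if enabled]
        # run the C program and capture its plain text output
        return subprocess.run(args, capture_output=True, text=True)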
Hat tip to Julien for this clever wrapper. Once we were able to call an XPER tool from Python, we started to write an HTTP API using a Python web framework such as Flask. At TailorDev, we like to write pragmatic HTTP APIs, and we always adopt a documentation-first approach. Apiary and API Blueprint are our favorite tools for that.
We drafted an HTTP API that speaks JSON and exposes two main endpoints:
- /knowledge-bases to manage the data for the different XPER tools;
- /tools.run to call the XPER tools.
The former responds to the GET and POST methods to return a set of data (a knowledge base) and to create such knowledge bases, respectively. The latter is a Remote Procedure Call (RPC) endpoint, which is perfectly fine for representing what we want to achieve: calling a function (over HTTP).
Each knowledge base is identified by a UUID, and the bases are persisted on the filesystem (which may evolve in the future). With both the tools ready to be executed and the data on the server, we only had to glue them together thanks to the /tools.run endpoint, which can be triggered by the POST method:
POST /tools.run/chkbase
Content-Type: application/json
Accept: application/json
{
"knowledge_base_id": "27d99b56-9327-4d28-a69c-31229bf971aa"
}
Nevertheless, the different programs do not output JSON content but formatted plain text. To achieve interoperability, keeping the output as is was not conceivable, hence the concept of parsers. Each program gets its own parser for transforming the plain text output into a Python data structure that we can later serialize as we wish. Using this approach, we were able to write a lot of unit tests based on different realistic outputs, and to keep enough flexibility in the application. We then created a configuration file for the supported tools and their associated parsers and options:
from .parsers import MindescrParser, ChkbaseParser

supported_tools = {
    'mindescr': {
        'parser': MindescrParser(),
        'options': [],
    },
    'chkbase': {
        'parser': ChkbaseParser(),
        'options': [
            ('verbose', 'V'),
        ],
    },
}
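To give an idea of what such a parser looks like, here is a hypothetical, much simplified version of a parser class (the parse() method name and the returned structure are illustrative; the real parsing rules are specific to each tool's output format):

# api/parsers.py -- hypothetical, much simplified parser
class ChkbaseParser:
    def parse(self, output):
        # turn the formatted plain text output into a serializable structure
        lines = [line.strip() for line in output.splitlines() if line.strip()]
        return {'messages': lines}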
The controller logic behind the /tools.run/<name> endpoint relies on this configuration to determine which tools (and options) are allowed, but also which parser to use. When all conditions are met, it runs the program with the knowledge base as input thanks to the wrapper, parses the output with the appropriate parser, and returns the result as a JSON response.
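Put together, the controller can stay remarkably small. The sketch below, written here with Flask as an example framework, illustrates that flow under a few assumptions: the api.config module, the data/ directory, and the way options are read from the JSON body are illustrative, and error handling is omitted.

# api/views.py -- simplified, hypothetical sketch of the /tools.run controller
from flask import Flask, abort, jsonify, request

from api.config import supported_tools  # the configuration shown above (module name assumed)
from api.wrappers import ToolWrapper

app = Flask(__name__)

@app.route('/tools.run/<name>', methods=['POST'])
def run_tool(name):
    if name not in supported_tools:
        abort(404)
    tool = supported_tools[name]
    payload = request.get_json()
    # knowledge bases are persisted on the filesystem and identified by a UUID
    # (the data/ directory is illustrative)
    kb_path = 'data/{}'.format(payload['knowledge_base_id'])
    # translate meaningful option names (e.g. "verbose") into tool flags (e.g. V)
    flags = {tool_opt: bool(payload.get(opt_name))
             for opt_name, tool_opt in tool['options']}
    cp = ToolWrapper(name).invoke(kb_path, **flags)
    # parse the plain text output and return it as JSON
    return jsonify(tool['parser'].parse(cp.stdout))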
Adding support for a new program only requires writing a parser for the output of that program and updating the configuration. As you may have noticed, the options array contains tuples (option_name, tool_option) that map more meaningful option names (e.g., verbose) to their corresponding tool options (e.g., -V). That way, we can completely hide the program details behind the API, which might also be handy in the future.
We ended this part by writing a small Node.js CLI to demonstrate how this API could be used, but also to give non-technical people a way to consume this API and understand what has been done.
Conclusion
Tackling technical challenges is usually not a problem. In this case, the most interesting yet complicated task was to strike a happy medium between a good software architecture and an easy way to upgrade all the existing XPER tools. All in all, it took us 8 days to design, implement, test and document this solution, including the CLI. We ported three programs to the new build system, and exposed two different tools on the HTTP API.
This project was awesome because we felt really proud of giving a second life to these very old C programs. It was challenging to come up, in a short amount of time, with a production-ready Proof of Concept that could easily be improved in the future.
That is the kind of thing we do and like to do!