
Start using your new project early


You've started a new project. When should you start using it "in anger", "for real", "in production"? My advice is to do so as early as you can.

I did this for my newest project. I've been developing it slowly, and it has matured enough that it can now actually do things. I set up two instances doing things that I, and my company, now rely on. If the software breaks, I will need to take action (disable the broken instances and do things another way until I fix the software).

The point of doing this early is that it gives me quick feedback on whether the software works at all, makes it easier to add new features, and makes it easier for others to try out the software once I announce it publicly (which I'm not yet doing).

  • I see quickly if something doesn't work. (For now, it does.)
  • I see at once if there's a missing feature that I urgently need. (I found two bugs yesterday.)
  • I see what is awkward and cumbersome about configuring and deploying the software. (So many interacting components.)
  • I see if it's nice to actually use. (I need to find someone to write a web interface, and need to improve the command line tool.)
  • I see if the performance is adequate and get an idea what the actual resource requirements are. (Not many resources needed for now. Even the cheapest VM I could choose is adequate.)
  • I get more confident in the software the more I actually use it. Writing a test suite is good, but real use is better. Real use always comes up with things you didn't think about writing tests for.

In order to set up not just one but two instances, I had to make the deployment automated. (I'm that lazy, and I don't apologise for that.)

Thanks to an automated setup, when I add features or fix bugs, they're easy to roll out. It's now almost as easy as just running the program from the source tree.

My development process is now: I write tests; I write code; the tests pass; I tag a release; I let CI build it; and I run Ansible to upgrade everywhere. That's about 15 seconds of work once the tests pass, though it takes a couple of minutes of wall-clock time, since I run CI on my laptop.

Apart from the benefits that come from the features of the software itself, getting to this stage is emotionally very rewarding. In one day, my little pet project went from "I like this idea" to "it's a thing".

I recommend you start using your stuff in production earlier rather than later. Do it now, do it every day.


How we saved money by replacing Mixpanel with BigQuery and K8S


Traditionally, a lot of companies rely on Mixpanel for product analytics to understand each user’s journey. However, if your product becomes a success and your volume of events grows, Mixpanel can become quite expensive. In this post we review one of our projects with Jelly Button: designing their own event-analytics solution on Google Cloud Platform in a very efficient way, one that is going to save Jelly Button about a quarter of a million dollars each year.

Jelly Button Games develops and publishes interactive mobile and web games. It was founded in 2011 and the company’s flagship title is Pirate Kings, where players battle their friends to conquer exotic islands, amass gold, and become the lord of the high seas in what Jelly Button Games calls a “mingle-player” experience. Launched in 2014, the game has achieved around 70 million downloads across iOS, Android, and Facebook.

Here at DoiT International, we help startups architect, build and operate robust data solutions based on Google Cloud Platform and Amazon Web Services. Together with Jelly Button’s team, led by Meir Shitrit and Nir Shney-Dor, we have built a global, robust and secure data pipeline solution which processes and stores millions of events every hour.

We have used Google’s Reference Architecture for Mobile Gaming Analytics as a baseline for our solution.

Google Reference Architecture for Mobile Gaming Analytics

The Requirements

Jelly Button builds games which are played by over 70 million people on their smartphones. Each player sends game-related events to the analytics backend, and these events must be stored for later processing and analysis. Jelly Button uses this data for game analytics, research and marketing.

The backend must maintain very low latency, so that the mobile application spends minimal time waiting when sending events, while processing as many events as possible every second. At the same time, the event data is crucial and no data loss can be tolerated.

Finally, the event data must be available for analysis as soon as possible, and the millions of events coming in every hour must not make it hard to perform complex analysis.

The Solution

From the start, it was obvious that the best tool to store and analyze this kind of data is Google BigQuery, as it offers virtually unlimited storage, blazing-fast queries and a built-in streaming ingestion mechanism that allows inserting up to 100,000 records per second per table.
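As a rough illustration of that streaming ingestion path (the production pipeline described below actually writes through Dataflow), here is a minimal sketch using the BigQuery Python client; the project, table and row fields are assumptions for illustration.

from google.cloud import bigquery

# Hypothetical project and table names.
client = bigquery.Client(project="my-project")
table_id = "my-project.analytics.events"

rows = [
    {"event_name": "level_complete", "player_id": "abc123", "payload": "{}"},
]

# Streaming insert: rows become queryable within seconds.
errors = client.insert_rows_json(table_id, rows)
if errors:
    print("Insert errors:", errors)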

On the other hand, we needed a fast, global and robust backend infrastructure that could automatically scale up to support low latency for millions of users, or scale down to keep costs to a minimum. We decided to use Google Container Engine (GKE), Google's managed Kubernetes service.

Using Kubernetes, we set up a very efficient backend deployed as a federated cluster: one cluster in the United States and another in Europe, both serving traffic coming from a single global Google HTTP(S) Load Balancer with built-in geo-awareness, thus providing minimal latency to mobile clients.

Another challenge was relaying the events received by the backend and reliably storing them in Google BigQuery. To preserve the backend's low latency, we decided to use Google Cloud Pub/Sub. Pub/Sub gives us a very fast messaging medium that supports a virtually unlimited rate, guarantees worldwide delivery, and retains messages for up to 7 days.

Cloud Pub/Sub publish subscribe model with persistent storage

The final component we needed was an ETL that can handle the large volume of messages coming from Google Pub/Sub and perform filtering, mapping and aggregations on the raw data before storing it in Google BigQuery for analysis. Luckily for us, Google Dataflow provides a fully managed cluster of workers that can run our ETL in streaming mode, handling these transformations and aggregations in near real time and making the data available for analysis almost instantly.

Solution Overview

Kubernetes

In our setup, each pod contains two containers: an nginx container and a Node.js container with the backend code.

The backend code is essentially a simple Node.js-based server.

The backend simply adds some metadata and pushes the payload along with it to Google Pub/Sub. No extra work is done here, which keeps latency low.
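The original backend is written in Node.js and is not reproduced here; purely to illustrate the "add metadata and publish" step, here is an equivalent sketch using the Pub/Sub Python client. The project, topic and attribute names are assumptions.

from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
# Hypothetical project and topic names.
topic_path = publisher.topic_path("my-project", "game-events")

def handle_event(raw_payload: bytes, player_id: str, country: str):
    # Attach a little metadata and hand the event off immediately; all heavy
    # work happens later in the Dataflow pipeline, which keeps request latency low.
    return publisher.publish(
        topic_path,
        data=raw_payload,
        player_id=player_id,   # message attributes travel alongside the payload
        country=country,
        source="backend",
    )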

The deployed service is automatically scaled using a Kubernetes Horizontal Pod Autoscaler.

The cluster itself is also autoscaled using the Google Container Engine node autoscaler, so nodes are automatically added and removed as well.

Google Cloud Dataflow

Most of the required transformation and aggregation logic is done inside the Dataflow pipeline. This way it can be executed asynchronously, in a non-blocking way, on Dataflow's distributed computing cluster.

The basic pipeline simply parses messages coming from Google Pub/Sub and maps some of the fields into BigQuery columns, while preserving the rest of the data as a JSON object stored in a BigQuery string column.

To add more functionality, we usually only need to edit the relevant Mapping class.

Please note the line:

options.setStreaming(true);

This makes the Dataflow pipeline start in streaming mode. It will not stop until it is stopped manually, and it will keep processing messages as they arrive in Google Pub/Sub, in near real time.
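The actual pipeline is written in Java with the Dataflow SDK, as the setStreaming snippet above suggests. Purely as a sketch of the same read-parse-write flow, here is what it roughly looks like in the Apache Beam Python SDK; the subscription, table and field names are illustrative assumptions.

import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions, StandardOptions

def to_bq_row(message_bytes):
    event = json.loads(message_bytes.decode("utf-8"))
    # Promote a few well-known fields to columns; keep the rest as a JSON string.
    return {
        "event_name": event.pop("event_name", None),
        "player_id": event.pop("player_id", None),
        "payload": json.dumps(event),
    }

options = PipelineOptions(project="my-project", runner="DataflowRunner")
options.view_as(StandardOptions).streaming = True  # same idea as options.setStreaming(true)

with beam.Pipeline(options=options) as p:
    (p
     | beam.io.ReadFromPubSub(subscription="projects/my-project/subscriptions/game-events")
     | beam.Map(to_bq_row)
     | beam.io.WriteToBigQuery(
           "my-project:analytics.events",
           schema="event_name:STRING,player_id:STRING,payload:STRING"))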

Costs

Jelly Button started this journey of building their own data pipeline to cut down on Mixpanel costs and to build a more flexible data analytics pipeline.

During July 2017, our new data analytics pipeline was processing about 500 events every second. Here is a summary of our Google Cloud Platform costs:

  • About $500 for Google Cloud Dataflow
  • About $500 for Google Container Engine
  • About $200 for Google Pub/Sub
  • About $100 for Google BigQuery

Jelly Button uses reOptimize, a free cost tracking and optimization tool, to track and control Google Cloud Platform costs. This is what Jelly Button's daily average spend looks like:

Summary

The whole project took us about 5 weeks to complete. We spent a significant amount of time together with Jelly Button architecting the solution, building a proof of concept and finally coding and testing it with real traffic. It has been running in production for a couple of months now, and we are confident Jelly Button now has a robust analytics pipeline to support the many new millions of people playing Pirate Kings and the games to come.

Deus Vult - Social Networks in Crusader Kings 2


Crusader Kings 2 was the last PC game that I bought on disk before surrendering fully to the Steam Gods. I played much of it in the days when I had a severely limited internet connection, which is why I have "only" 190 hours clocked up in game. Unlike other Paradox titles such as Stellaris or Europa Universalis 4, or other strategy titles like the Civilisation series, instead of playing as the abstract notion of a "nation", CK2 puts you at the head of a landed dynasty in the medieval world. Time passes day by day; the character you are playing marries, has children, fights wars, contracts diseases, has affairs, appoints councils and eventually dies. The titles this character held pass to the next in line through the often complicated succession laws of the time, and you take control of that character's heir, who may have inherited some or all of their predecessor's land. And so the game continues. You lose the game if your starting dynasty loses all its counties or all members of the dynasty die out.


The long form game lasts from the year 769 until 1453. There are over 1300 individual counties in the game, all ruled by a Count, and these merge in various ways into Duchies which merge into Kingdoms which merge into Empires, all of which rise and fall and are fought over throughout the 700 or so years that the game lasts. The starting conditions are true to real world history; if you take the 1066 start then William the Bastard is Duke of Normandy and can be chosen by the player. Once the game starts and runs for a few years things often diverge dramatically from real world history and there is even one expansion for the game in which the Aztecs invade Europe and North Africa from the West. The game has its own subreddit which is full of awesome images and stories that players have experienced, and lots of memes.

Like all Paradox titles the game contains a massive amount of data. Clicking on a character in game allows you to see portraits representing their parents, spouse, siblings and children. Clicking on any of these allows you to do the same for them. You can click on any county and get a chronological list of every character that held it and the same goes for Duchies, Kingdoms and Empires. I realised that the game must be storing these details in the save file and in a relational format that could be used to build a network. I had already done a project in which I pulled data from a Caves of Qud save file and I wanted to see if the same could be done here.

A non-Ironman, uncompressed CK2 save file can be opened with a text editor. Each character is represented by an entry that includes a unique identifier and values for the unique identifiers of their father, mother and dynasty, if those exist. It also includes details like how the character died (and, if they were murdered or died in battle, who was responsible), as well as data on their religion and culture. There are details on all dynasties, such as their culture and religion, and for each title there are details on who held the title, how they got it (inheritance, election, conquest etc.) and when their reign started.

I wrote a script that would pull out this data and store it in MongoDB. This script and the ones used for the below data can be found on my GitHub account. I allowed a full game to play out in observer mode from 769 but the game doesn’t end at 1453 in this mode so I stopped on the 1st of January 1460. You can click here to download a zip of the save file I used to make these networks. The save file is from version 2.7.2, the version before the Jade Dragon expansion released in November. I decided to use the previous version of the game as with Jade Dragon China tends to expand quickly and creates a lot of dynasties in the Western Protectorate that tend to die out in a generation or two.
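A minimal sketch of that extraction step, assuming the save's nested key=value format and using pymongo; the regular expressions and the mother/dynasty key names are assumptions for illustration (the article itself only names the fat and rfat fields).

import re
from pymongo import MongoClient

def parse_characters(save_text):
    characters = []
    # Hypothetical pattern for blocks like  123={ ... }  in the character section;
    # the real save layout needs a proper nested parser.
    for char_id, body in re.findall(r"(\d+)=\{(.*?)\n\t\}", save_text, re.S):
        fields = dict(re.findall(r'(\w+)="?([\w.\-]+)"?', body))
        characters.append({
            "_id": int(char_id),
            "father": int(fields["fat"]) if "fat" in fields else None,
            "mother": int(fields["mot"]) if "mot" in fields else None,   # assumed key
            "dynasty": int(fields["dnt"]) if "dnt" in fields else None,  # assumed key
        })
    return characters

client = MongoClient()
db = client["ck2"]
with open("observer_1460.ck2", encoding="latin-1") as f:
    db.characters.insert_many(parse_characters(f.read()))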

I was able to build 2 main types of networks from the save game data; marriage networks and kill networks. Links to interactive versions of all of the networks built with linkurious.js can be found here. Due to the size of these networks they can take a few seconds to load.

Marriage Networks

Marriage is a very important tool in CK2 for a number of different reasons, mainly to provide legitimate children to succeed your current ruler and to secure alliances with your in-laws. Unlike in other games, where you can sign treaties with anyone you are on good terms with, in CK2 you can often only form alliances with those you have close marriage ties to. Marrying members of your immediate family into the powerful dynasty threatening your border may make them like you more, and forming an alliance may stave off an invasion and provide you with an ally to attack your rivals.

I wrote some code that took all characters in the game who had a dynasty and a spouse and built a network of dynasties, with weighted edges between any two dynasties whose members had married each other. Clusters form in the network around geography and religion, and the image below shows the network colored using the modularity statistic in Gephi. The Indian subcontinent is isolated in the top right and Europe is in the bottom left in orange, with Italy splitting out into a cluster of its own just above it. The brown cluster above this is the Greek dynasties of Byzantium, which converted early in the game to Islam and expanded massively. The light blue cluster represents the dynasties of the Middle East and North Africa, the dark green cluster to the right is Spain and West Africa and the light green cluster in the middle is the pagan dynasties of Eastern Europe and the Steppes. An interactive version of this network can be found here.
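A minimal sketch of that construction with networkx, building on the characters collection from the earlier sketch; the spouse field name is an assumption.

import networkx as nx
from pymongo import MongoClient

db = MongoClient()["ck2"]
G = nx.Graph()

for char in db.characters.find({"dynasty": {"$ne": None}, "spouse": {"$ne": None}}):
    if char["_id"] > char["spouse"]:
        continue  # count each marriage once, not once per partner
    spouse = db.characters.find_one({"_id": char["spouse"]})
    if not spouse or spouse.get("dynasty") in (None, char["dynasty"]):
        continue
    a, b = char["dynasty"], spouse["dynasty"]
    if G.has_edge(a, b):
        G[a][b]["weight"] += 1  # one more marriage between these two dynasties
    else:
        G.add_edge(a, b, weight=1)

# The article works with the largest connected component of this graph.
largest = max(nx.connected_components(G), key=len)
marriage_net = G.subgraph(largest).copy()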


This is the largest connected component in the network and consists of almost 10,000 of the 29,000 dynasties that appear in the game. There are many smaller components consisting of only a handful of dynasties, and they do not appear here. Dynasties will cluster closer together if there are more edges (marriages) between them and will be far away if there are fewer. While there may be few marriages between Irish dynasties (on the right of the bottom orange cluster) and the Italian/Lombard dynasties (the smaller orange cluster to the left), they both marry into French and German dynasties and are therefore closer to each other than they are to the dynasties of distant India in the top right. There is also an interactive version of this network colored by culture and by religion.

The above network shows all marriages between all dynasties, but I wanted to see how the most powerful dynasties acted. I wrote code to find every character who had held a Kingdom- or Empire-level title and then got all their children. I then built a marriage network for them, and an interactive version can be found here, showing dynasties that held titles in orange and those that didn't in blue. Again Europe and India are separated on the left and right. The nodes are sized by PageRank, a measure of a node's importance in the network. The large orange nodes have clusters of blue nodes connected into them, but there are no large blue nodes that provide spouses to the reigning dynasties often. There are few blue nodes which sit between the larger clusters, but there are a number of marriages between ruling dynasties that cross religious divides. In the central column of clusters the top are the pagan dynasties, in the middle is Greece on the left and the Middle East on the right, and on the bottom is Spain and North Africa.
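Sizing nodes by PageRank is a one-liner in networkx; a small sketch, assuming royal_net is the ruler-and-children marriage network described above, built the same way as the full one.

import networkx as nx

# royal_net: assumed to be the marriage network restricted to characters who
# held a kingdom- or empire-level title, plus their children.
pagerank = nx.pagerank(royal_net, weight="weight")
node_sizes = {dynasty: 3000 * score for dynasty, score in pagerank.items()}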

Kill Networks

Characters can also be killed in the game. Battle, poison, a lone arrow or a basement full of manure and a lit match can all bring a life to an end early. When assassinations are carried out there's a chance that the character ordering the killing will go undiscovered. Even if this happens, the save file still registers, in the victim's data, the unique identifier of their killer. The kill networks are built using this data. An edge exists between two dynasties where a member of one killed a member of the other. While it would make more sense for this to be a directed graph, indicating which dynasty did the killing and which was killed, this caused difficulty when trying to pull out the largest connected component using networkx. What I should have done is build the network undirected, take out the largest component and then rebuild a directed network using only these nodes, and this is something I might do in future.
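That rebuild can be done along the following lines, assuming kill_edges is a list of (killer_dynasty, victim_dynasty) pairs pulled from the save; a sketch, not the code used in the article.

import networkx as nx

# kill_edges: assumed list of (killer_dynasty, victim_dynasty) pairs.
undirected = nx.Graph(kill_edges)
largest = max(nx.connected_components(undirected), key=len)

# Rebuild a directed network restricted to the dynasties in that component.
directed_kills = nx.DiGraph()
directed_kills.add_edges_from(
    (killer, victim) for killer, victim in kill_edges
    if killer in largest and victim in largest
)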

The kill networks (colored by cluster, by culture or by religion) follow along the same lines as the marriage networks. India is isolated in the top right, Europe in the bottom left. Greece is to its right and many of the pagan dynasties are above it to the left. Spain is in the centre, the Middle East is above them and North Africa is to the right. These networks take in all killings, and a lot of killings are carried out due to rivalry or spite. Religious leaders will demand that their liege burn heretics alive, or, often, prisoners' constitutions will not prove up to the task of surviving prison. Here is the code I used to focus just on those killed in battle. During a battle there is a chance that a named character on one side will kill a named character on the other side. On occasion kings will fall in battle, as happened with Conlang De Vannes, the founder of the Kingdom of Ireland. I built a network showing the connections between dynasties where one killed a member of another dynasty in battle (again, this should really be directed to make more sense) and took the largest component. Click here to see the network colored by culture, here by religion.


India forms a closed circle in the top right while small clusters form around the rest of the network, mainly consisting of multiple religions. Religious wars break out when a member of a religion declares a Holy War to take land belonging to a different faith. Members of both religions can flock to the aid of the attacker or defender. As the name of the game suggests the Pope can call Crusades for the conquest of lands not in Catholic control and it is in this fashion that the Republic of Greece, after electing an Islamic leader, was conquered by England in this game. Members of the same religion will also go to war with each other to enforce claims.

Character Statistics

There are generally only a small number of characters involved in a battle and only a small chance of a killing happening, so it is rare to rack up multiple kills during a military career. Here I pulled out all characters who had been killed in battle and grouped them by their killer. Both Mahipala Mahipalid and Samir Samirid had 4 kills during their lives. The Ayudha of northern India had the most kills of any dynasty with 18 but suffered heavy losses along with the southern Vengi Chalukya dynasty. There is also a list of all the knights who killed and were then killed themselves in battle; a number of them achieved 2 kills before falling.
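A rough sketch of that grouping step as a MongoDB aggregation, reusing the db handle from the earlier sketches; the death-reason value and field names are assumptions, not the exact keys in the save.

top_killers = db.characters.aggregate([
    {"$match": {"death_reason": "death_battle", "killer": {"$ne": None}}},
    {"$group": {"_id": "$killer", "kills": {"$sum": 1}}},
    {"$sort": {"kills": -1}},
    {"$limit": 10},
])
for row in top_killers:
    print(row["_id"], row["kills"])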

While building out the kill networks I also looked at the top killers in the game. Jochi Jochid, the Emperor of the Mongol Empire, was responsible for the deaths of 36 people, most of them dying in his prison. Not to be outdone, his son and successor Bilge finished off another 42! The save file also contains data on a character's father and an "rfat", or real father, in the event that they are born out of marriage. That sort of thing was important back in the days of hereditary titles. If a character has both a fat and an rfat value, it means the person bringing them up, their father, isn't their real father. In this notebook I got details on which characters had the most children. Uways Abbasid was bringing up 17 children who had an rfat value, meaning he wasn't their father. On the other hand, Abdul-Razzaq Hasan had 22 children with married women, and the children were being brought up as belonging to the husband. Amaneus de Carcassonne had 19 children, either with unmarried women or with women who were married but the affair was discovered. In total Muhammad Aleppo was the father of 44 children with 34 mothers and Angilbert Bouvinid had 41 children with 35 different mothers. It must be hard to remember all those birthdays!

Women don’t tend to have as many children as men but Piratamatevi Ay found time to have 10 children in between conquering south India and establishing the Ay as a major power. Amalfrida Liutprandingi had 6 children and was married to the Bishop of Oderzo. Chlotsuintha Lambertingi married the Mayor of Genoa and had 6 children. In both cases the husbands were not the fathers.

Charlemagne

Finally, I'll end where I began. At the start I aimed to focus mainly on the Irish dynasties in the game (and did up marriage networks of them). When I was writing the code with only a few years' worth of data there weren't any powerful Irish rulers, so I used Charlemagne to test if what I was trying to do was even possible. There are 2 notebooks based around using MongoDB's $graphLookup on Charlemagne. The second one takes a full count of all his direct descendants through the male and female lines. By the end of the game almost 50,000 characters have been born who are descendants of Charlemagne. Over 700 of them are members of the Italian Alachisling dynasty, 613 are members of the Greek Isauros that ruled Byzantium before its conversion to Islam and 506 are members of the Penikis dynasty that controls Finland, Rus and much of North Eastern Europe. There are also a few cells of code that find all descendants in common between two characters. I chose Charlemagne and Cobthach of the Eóganacht-Locha Léin dynasty, count of Thomand (Clare/Limerick) until his death in 789. Cobthach has over 14,000 descendants himself and he and Charlemagne share 10,000 in common.
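The core of a $graphLookup-based descendant walk looks roughly like this with pymongo, reusing the characters collection from the earlier sketches; the starting id and field names are assumptions, and a full male-and-female-line count would also have to follow the mother field.

# charlemagne_id: assumed to be Charlemagne's character id in the save.
pipeline = [
    {"$match": {"_id": charlemagne_id}},
    {"$graphLookup": {
        "from": "characters",
        "startWith": "$_id",
        "connectFromField": "_id",
        "connectToField": "father",   # follows children via their father field only
        "as": "descendants_via_father",
    }},
    {"$project": {"num_descendants": {"$size": "$descendants_via_father"}}},
]
result = list(db.characters.aggregate(pipeline))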


Conclusion

Crusader Kings 2 has a wealth of information that can be used for building networks and finding out interesting statistics about a game. While I mainly let the game run away on its own in observer mode, I did watch the progress of the Kingdom of Italy and the Knights Hospitaller, who controlled most of Germany. The networks and statistics pulled from the game show that there was an interesting story going on in India.

I'm happy with the way the networks turned out. It was my first time using Sigma.js and I don't have much experience with Javascript. The colors don't look great; bright colors looked fine in Gephi but were much too bright when drawn with Sigma.js. The culture networks had over 80 nodes which each needed a distinct color, so I got the palette to auto-generate colors for a dark background. While I'm happy to have gotten to use Sigma.js, I needed tooltips for the nodes and could only get these working by using linkurious.js, which is a fork of Sigma. Linkurious is deprecated and no longer being developed, and Bokeh in Python doesn't have as much functionality as Sigma for drawing networks, so I'm not fully sure what to focus on next.

I’m going to amend the code in the notebooks shortly to work with Jade Dragon as well. I’m also going to see about rebuilding the kill and battle networks as directed networks. I started this project on the 2nd of October so I have been working on it for a long time and am glad to be finished. The Social Network of the 1916 Rising project I did before this was a great help and I reused some of the code for getting the centrality measures.

If you have any suggestions on improvements or on other measures or statistics that could be drawn from the data please suggest them. I’d love to hear about other video games that have save files which allow for data science to be applied in a similar way.


The Lure of a Better Life, Amid Cold and Darkness


NORILSK, Russia — Blessed with a cornucopia of precious metals buried beneath a desert of snow, but so bereft of sunlight that nights in winter never end, Norilsk, 200 miles north of the Arctic Circle, is a place of brutal extremes. It is Russia’s coldest and most polluted industrial city, and its richest — at least when measured by the value of its vast deposits of palladium, a rare mineral used in cellphones that sells for more than $1,000 an ounce.

It is also dark. Starting about now, the sun stops rising, leaving Norilsk shrouded in the perpetual night of polar winter. This year that blackout began last Wednesday.

Built on the bones of slave prison laborers, Norilsk began as an outpost of Stalin’s Gulag, a place so harsh that, according to one estimate, of 650,000 prisoners who were sent here between 1935 and 1956, around 250,000 died from cold, starvation or overwork. But more than 80 years after Norilsk became part of the Gulag Archipelago, nobody really knows exactly how many people labored there in penal servitude or how many died.

The Norilsk camp system, known as Norillag, shut down in 1956, when Nikita Khrushchev began to dismantle the worst excesses of Stalinism. The legacy of repressive control, though, lives on in tight restrictions on access to the city. All foreigners are barred from visiting without a permit from Russia’s Federal Security Service, the post-Soviet successor to the K.G.B.


A signpost showing the distance to the closest cities. Norilsk is one of the most isolated spots on earth, accessible only by plane or boat.


Norilsk has prospered as the world’s largest producer of palladium, but workers pay a price in the brutal cold, winter darkness and isolation.


Participants in a painting class. Because of the isolation and harsh climate, cultural pursuits are popular in Norilsk.

“Norilsk is a unique city, it was put here by force,” said Alexander Kharitonov, owner of a printing house in the city. “It is like a survivor. If it had not been for Norilsk, there would have been another principle of life in the Arctic: You came, you worked, you froze — and you left.”

The residents of Norilsk have stayed, turning what until the 1930s had been an Arctic wilderness inhabited only by a scattering of indigenous peoples into an industrial city dotted with smoke-belching chimneys amid crumbling Soviet-era apartment blocks and the ruins of former prison barracks.

The population dropped sharply after the 1991 collapse of the Soviet Union, which sent the economy into a tailspin. It has risen again, along with Russia’s economic fortunes. Around 175,000 people now live year-round in Norilsk.

Beyond the city, which is 1,800 miles northeast of Moscow in northern Siberia, extends an endless, mostly uninhabited wilderness.


Russians are famed for swimming in any weather, even in Norilsk, 200 miles north of the Arctic Circle. Here, people at the swimming club Walruses of Taimyr emerging from a sauna to take a dip.

“Everything else is a vast wild land with a wild nature and no people,” said Vladimir Larin, a scientist who lives in Norilsk. “This is where the last wild mammoths died. When they dug the foundations of the buildings, they found the bones of mammoths.”

The bones of former prisoners also keep resurfacing, appearing each year when winter finally breaks in June and the melting snow carries to the surface these buried remains of the city’s grim and, in official accounts at least, still mostly smothered past.


Some residents are the descendants of former slave laborers who stayed on simply because it was too hard to leave a place so remote that locals refer to the rest of Russia as “the mainland.” There are no roads or railway lines connecting Norilsk to parts of Russia outside the Arctic. The only way to get in or out is by plane or by boat on the Arctic Ocean.

Many residents, however, came voluntarily, lured by the promise of relatively high salaries and steady work in the city’s metallurgical industry, a sprawling complex of mines and smelters owned by Norilsk Nickel. The business is a privatized former state company that is the world’s largest producer of palladium and also a major supplier of nickel, copper and other metals.

It is also one of the world’s biggest producers of pollution, turning an area twice the size of Rhode Island into a dead zone of lifeless tree trunks, mud and snow. At one point, the company belched more sulfur dioxide a year than all of France. It has since taken some steps to reduce its output of toxic waste but was last year blamed for turning the Daldykan, a river that runs by the plant, into a flow of red goo. Locals called it “blood river.”


People hurrying through the streets in Norilsk, which is pitched into perpetual darkness for two months of the year.


Workers on their way to the city of Norilsk. It can get so cold that people cannot wait at a bus stop for fear of freezing to death.


The barracks that housed the slave laborers who built Norilsk in the 1930s, at least 250,000 of whom died there.

The company gets its products to market through a port at Dudinka on the Yenisei River, the largest of three great Siberian rivers that flow north into the Arctic Ocean.

Dudinka, as well as providing Norilsk’s main outlet to the outside world, also offers a glimpse of the region’s past. The settlement’s natural history museum displays tents used by the four main indigenous peoples in the area. The biggest of these today are the Dolgans, a nomadic Turkic people that used to live off hunting and reindeer herding but were themselves herded into collective farms during the Soviet era.


There are now around 7,000 Dolgans, many of whom have given up their ancestors’ shamanistic beliefs in favor of Christianity. Smaller native groups include the Entsi, of which there are only around 227 left in the region, which is known as Taimyr.

Traditional belief in shamanism has been steadily eroded by the Russian Orthodox Church, which has been sending priests into the area since the 19th century and in 2000 built a new church on a bluff overlooking Dudinka port.

“Our children study their native language as if it were a foreign tongue,” said Svetlana Moibovna, who is a member of the Nganasan indigenous group. “Many local people were persecuted for shamanism. One shaman in his dreams saw that the Russian god would defeat the shaman god and that only the Russian god would rule in Taimyr.”


Dudinka, a small port city on the Yenisei River, is the main transportation hub for the palladium, nickel, copper and other metals produced in Norilsk.


Newlyweds being greeted by family members in Norilsk.


Girls practicing classic dance at the Norilsk College of Arts. There is plenty of time for indoor activities in Norilsk.

Despite the horrendously harsh climate, choking pollution and absence of sunlight from late November until January, many residents are fiercely proud of Norilsk — and their own ability to survive in an environment that even the hardiest of Russians living elsewhere would find intolerable.

Last winter, temperatures plunged to minus 62 Celsius (minus 80 Fahrenheit), and early winter this year has also been unforgiving, with temperatures in November already falling to around minus 20 Celsius, about 4 below Fahrenheit.


The cold has spawned a booming freelance taxi business because it is too cold to walk even short distances. Taxis charge a fixed price of 100 rubles (about $1.70) to go anywhere in the city. There are also buses, but it is too cold to wait outside so passengers crowd into nearby shops to shelter until their bus arrives.

But even the bitter cold is for some a source of delight, with the frigid waters of Lake Dolgoye attracting swimmers who revel in the bracing experience of bathing in ice. “After bathing, I have the feeling that I have been on vacation for a week,” said Natalia Karpushkina, a 42-year-old who runs a local walrus club. The lake freezes only partially because of hot water pipes from a nearby power plant.

The city also has a large indoor swimming pool for those less keen on bathing in ice water.


An aerial view of Norilsk, where most people still live in Soviet-era apartment buildings.


Young people listening to a rock band at the popular Zaboi (The Face) bar, the only establishment in the city that has its own brewery.


A man waiting for the bus inside a grocery store.

Most of the work and leisure takes place indoors, particularly in the winter period of perpetual darkness. Life inside became considerably less monotonous recently thanks to a long-awaited breakthrough: After decades of serving the digital economy by providing materials needed to make cellphones and computers, Norilsk got its first reliable internet service.

But even without the internet, it had replicated as best it could the amenities of a normal Russian city. The Norilsk College of Arts offers ballet lessons. Norilsk Greenhouse, a local company, grows cucumbers in heated shelters, while the Zaboi Bar offers revelers home brew and live music.

The bar's 30-year-old manager, Anton Palukhin, who moved to Norilsk with his parents from Kazakhstan when he was 5, said that he still struggles with the climate and that whenever he travels to warmer parts of Russia on vacation, he dreads having to return to the Arctic.

“I really do not want to go back and am ready to give anything so that I don’t have to fly,” he said. All the same, he keeps coming back.


Never a bad moment for a dip in Lake Dolgoye, which is prevented from freezing over by hot water pipes from a power plant and other nearby facilities.

The German Amateurs Who Discovered ‘Insect Armageddon’


That a group composed not just of biology Ph.D.s but also chemists, electrical engineers, a schoolteacher and a physicist, among others, would be the ones to do such groundbreaking research did not surprise Dave Goulson, a bee expert at the University of Sussex, and co-author of a scientific article based on the group’s research and published this fall.

The all-volunteer society shocked the world by documenting a 75 percent drop in flying insect populations. Credit: Gordon Welters for The New York Times

“In this field, amateurs are often the experts,” he said. “Most people don’t really pay attention to insects. With the exception of butterflies, because they’re pretty.”

Bugs have long gotten short shrift, scientifically. Insects are estimated to make up more than half of all animal life, yet only about 10 percent of insect species are thought to have even been named.

In addition, raw data about the creatures is hard to come by. “This kind of monitoring is unspectacular, so it usually doesn’t get done,” said Teja Tscharntke, a professor of agro-ecology at the University of Göttingen. “That’s where the hobbyists from Krefeld come in.”

Their study looks at 63 nature preserves mostly in the area around Krefeld. But experts say it is likely to reflect the insect situation in places like North America, where monoculture and pesticide use are widespread.

“People have been saying, ‘There just doesn’t seem to be as many ‘X’ as there used to be,’” said Steve Heydon, senior scientist at the University of California, Davis’s Bohart Museum of Entomology, of the Krefeld study. “It’s nice to have it documented. Figures change it from an opinion to a fact.”

Mr. Tscharntke agreed. “I was a little surprised, but it fits with what we know about, say, insect-eating birds disappearing,” he said.

Calling the Krefelders’ data “a rich treasure trove,” Mr. Tscharntke warned that entomology hobbyists are themselves a dying species. “These days, people who spend their free time looking at ladybugs and flies are about as common as stamp collectors.”

Members of the society use traps installed all over western Germany to track insect populations. Credit: Gordon Welters for The New York Times

In Germany and around the world, members of entomological societies tend to be elderly. And, in a field that has seen very little in the way of high-tech digitization, their expertise often dies with them.

In many ways, the Krefelders buck these trends. For one thing, the society, which is more than 100 years old, is keen on archiving.

“In a lot of places, everything gets thrown out — the papers, the insect collection,” said Martin Sorg, a longstanding member of the Krefeld society whose expertise includes wasps.

Gesturing around the group’s book-lined headquarters, Mr. Sorg said things were different, here. “When one of our members dies, we keep everything, even handwritten notes.”

They also focus on the future. About a third of the society's 59 members are newbies, and children as young as 12 can join the society's adults in poring over unsorted trays of translucent wings and delicate thoraxes, or carefully rifling through the wooden cabinets that hold over a million pinned wasps, bees, ants, sawflies, beetles, flies, mosquitoes, butterflies, moths, dragonflies, crickets, true bugs, lacewings and caddis flies.

Experienced entomologists take beginners on expeditions, and train them in the complex art of identifying insects. “Knowledge is passed down, from one generation to the next,” said Thomas Hörren, who is 28 and wears a large tattoo of his favorite insect, the beetle, on his neck. “Growing up, there was nobody to guide me.”

Krefeld’s youngest official member is 14. “He’s an ant guy,” said Mr. Schwan, himself a childhood bug lover. It was not until his late 20s that he discovered Krefeld’s entomological society and started learning about caterpillars.

The society maintains meticulous records. Credit: Gordon Welters for The New York Times

Back then, members, along with their bugs, met every two weeks at Krefeld pubs.

Gazing into a glass-covered box of ants, Mr. Schwan smiled and said he planned to take the teenager and the ants along to an upcoming entomological conference in Düsseldorf. There, an ant expert from Wuppertal promised to identify the boy's specimens.

Harald Ebner, a pesticide expert and politician with Germany’s Green Party, said it was “typically German” for people to spend free time doing club-based volunteer work.

“Without the efforts of the Krefeld insect researchers, we would only have the observation that, these days, your car’s windshield is almost totally free of insects,” wrote Mr. Ebner, in an email. “On the other hand, the lack of interest on the part of the state is horrifying, especially in a country where just about everything else is so precisely tested, overseen and counted.”

But Josef Tumbrinck, a society member who works as an environmental lobbyist, thinks the plight of insects is going to interest a wider audience soon.

“Right now, it’s ‘those nutty entomologists,’” Mr. Tumbrinck said. “But I think this is going to get more and more attention, not just from crazy people with long hair.”

Setting a glass box down on the table, he pointed to a hand-size butterfly that his wife hatched from eggs for the school where she teaches. Mr. Tumbrinck’s 10-year-old son tugged on his sleeve. “And I found this one in the lamp,” he whispered, pointing to a little gray moth mounted next to it.

As the scent of 82 proof alcohol that preserves the bugs wafted, just a little, through the room, a reporter asked if, at this rate, all the insects were going to disappear.

“Oh, don’t worry,” said Mr. Sorg, the wasp expert. “All the vertebrates will die before that.”


The Booming Server Market in the Wake of Skylake


December 1, 2017

The slowdown in server sales ahead of Intel’s July launch of the “Skylake” Xeon SP was real, and if the figures from the third quarter of this year are any guide, then it looks like that slump is over. Plenty of customers wanted the shiny new Skylake gear, and we think a fair number of them also wanted to buy older-generation “Broadwell” Xeons and the “Grantley” server platform given the premium that Intel is charging for Skylake processors and their “Purley” platform.

Server makers with older Broadwell machinery in the barn were no doubt happy to oblige customers and clear out the older inventory – a story we keep hearing again and again – as well as sell new Skylake systems. It is a win-win for server makers, and in the short term it is good for Intel, too, at least on paper. If the hyperscalers and cloud builders that are driving the server market are just getting deeper discounts, as we think is probably the case, then it means that Intel is shifting the burden of extracting profits to enterprise clients, who individually buy far less iron but who collectively still buy more gear than the hyperscalers and cloud builders. This shift would be happening while sales of systems to enterprises are still in a slump and customers are also reeling from higher prices on main memory and flash storage.

In the short run, this situation is tenable, but over the long haul, it will just drive more customers to clouds, not only because enterprises can’t compete on technology and flexibility, but because they cannot compete on price when it comes to building and running their own infrastructure. This, over the long haul, is not something that Intel wants to see. Intel needs tens of millions of companies to still buy servers and tens of thousands of midrange and large enterprises to still buy clusters if it is to retain its profit margins.

This seems increasingly unlikely given the situation and given the improving CPU alternatives from AMD, Cavium, Qualcomm, and IBM. But for now, according to the latest statistics from IDC showing what it thinks happened in the server market in the third quarter, it is all boom town.

The quarter ended in September came only nine weeks after AMD rolled out the Epyc chips, only six weeks after Intel launched the Skylake Xeons, and only two weeks after IBM started shipping its System z14 mainframes, and both the X86 and mainframe parts of the business skyrocketed after being in the doldrums for several quarters. IDC has restated its numbers for the year-ago period, showing that the original design manufacturers (ODMs) have made more sales than originally anticipated, and even with that restatement, IDC believes that server revenues in Q3 2017 rose by an astounding 19.9 percent to just a hair under $17 billion, the highest level for any quarter that we can remember, and shipments rose by 11.1 percent to 2.67 million units, also the highest level we can recall, besting the 2.6 million units shipped in the fourth quarter of 2015 when the “Haswell” Xeons were selling like hotcakes despite the impending Broadwell launch the following March.

Sometimes, the hyperscalers and cloud builders can wait, and sometimes they cannot.

For a while there, it looked like we might be repeating the slowdown in server sales from early 2011 through early 2014, when the “Sandy Bridge” Xeons were pushed out a bit and the “Ivy Bridge” Xeons were similarly delayed. But the Haswell Xeons saved the day, even if there was a flattening during the Broadwell generation. We think the elimination of uncertainty, not the Skylake Xeon launch per se, is what has unleashed some pent-up demand for servers. People now know what to do, based on the feeds, speeds, and needs. And the prices, at the list ceilings and out on the streets.

A couple of things stand out in the figures for the third quarter from IDC. Take a look at the revenue numbers by vendor:

And then the shipments by vendor:

Something very profound is going on at Dell, and we think it is more than benefitting from any synergies from its combination with enterprise storage juggernaut EMC and enterprise server virtualization behemoth VMware. We suspect that more than a few hyperscalers and cloud builders are buying machines from Dell’s custom server business unit, and the leverage it has with Intel because it sells both PCs and servers based on the X86 architecture – unlike IBM and now Hewlett Packard Enterprise – is one reason it can be so aggressive in the server racket. In the quarter ended in September, Dell grew its server unit shipments by 11.2 percent to 503,000 machines, beating HPE (along with its H3C Group partnership in China), which shipped 501,400 machines according to IDC, down 1.7 percent year on year. This is the first time that Dell has beaten HPE or its predecessors since the Compaq merger in 2001. And while HPE/H3C saw revenues decline by 1.1 percent in the period, Dell saw them rise by 37.9 percent to $3.07 billion, right on the heels of the $3.32 billion that HPE/H3C together raked in during the quarter.

Closing that much of the revenue gap should have taken years, and should have taken a lot more servers, we think. So Dell must be selling a lot of beefy machines to someone. To a lot of someones. And moreover, Dell’s average selling price is up by around 24 percent, and that probably means it sold a lot of really cheap boxes and a lot of really expensive ones, perhaps laden with GPU and FPGA accelerators and lots of memory.

The other thing that jumps out, especially in the following chart that plots server revenues since the end of the Great Recession, is the giant spike in ODM server sales – as we said, up 45.3 percent even after a significantly large restatement for Q3 2016. Look at that spike:

We think that IDC will eventually make public its restatement of ODM revenues for Q4 2016 and Q1 2017, but we do not have that data yet. This will smooth out that thick green curve, which is growing fast and shows again that the ODMs, as a group, push more iron than any single vendor in the market – HPE or Dell. This has been true since the second quarter of this year, by IDC’s reckoning. And that ODM figure is understated, we think, because parts of Dell, Lenovo, HPE, Inspur, and Sugon act like ODMs more than like traditional OEMs.

If you want to know why Meg Whitman is stepping down as CEO at HPE, it is not so much that the turnaround is done as it is that HPE and its Foxconn partnership has not helped boost its presence at hyperscalers and cloud builders, and Dell is making huge inroads along with the ODMs and the Chinese OEMs. Every one of these players is eating HPE’s server lunch right now, and it is a problem that HPE’s new CEO, Antonio Neri, is going to have to tackle.

IBM is back above $1 billion in sales in the third quarter, mostly because it closed so many System z14 mainframe deals in the quarter to ship a bunch of machines in the last two weeks of September, which allowed it to book $673 million in System z revenues, up 63.8 percent year-on-year. If you do the math, that means IBM’s Power Systems business generated $421 million in sales, down 7.2 percent. IBM is getting ready to fire off its first salvo of Power9 systems for commercial customers, and has already begun shipping Power9 iron for the “Summit” and “Sierra” supercomputers for the US Department of Energy, although the numbers are modest.

On the shipment side of the server business, Lenovo contracted by a gut-wrenching 33.1 percent while Inspur rose by 37.9 percent, and the two companies are neck-and-neck for the number three position in the market in terms of systems sold rather than revenues generated. Supermicro and Huawei Technologies are, technically speaking, in a statistical four-way tie with Lenovo and Inspur by IDC’s standards, which is when the vendors are within one point of share across vendors. By this metric, in terms of boxes shipped, Dell and HPE/H3C are tied, too.

Thanks to the System z spike, sales of non-X86 servers rose by 15.1 percent to $1.5 billion in the quarter, but sales of X86 servers grew at a faster pace of 20.4 percent to reach $15.4 billion.

The big question is what will happen in Q4 and beyond, and if this new level of shipment and revenues represents an apex or a new floor. The average ODM server price has been raised in the IDC numbers, which reflects, we think, the heftier GPU laden machines that hyperscalers and cloud builders are buying as well as more than a few HPC centers. But the ASP for ODMs did not really grow year on year – around $6,200 per node on average. We think there are plenty of $2,000 infrastructure boxes in that mix and a smaller but significant number of systems with lots of GPUs that probably cost out at something more like $50,000.

This may be the shape of things to come for all server buyers. But it has as much to do with the reaction to Intel’s Skylake Xeons as it does with the Skylake Xeons themselves.

Categories: Cloud, Compute, Enterprise, HPC, Hyperscale


Sysdig Inspect: A GUI for System Call Analysis


Sysdig is routinely praised by users for the richness of the data it's able to capture and for the ability to store system, application and container data in capture files that can be easily shared with other people.

However, extracting insights from rich data can be hard. Mastering the art of analyzing sysdig capture files requires dedication and skills. This is why we constantly try to improve workflows around sysdig and find ways to get better insights with less effort. Today, we bring these efforts to a new level with the release of Sysdig Inspect.

Sysdig Inspect is a powerful, intuitive tool for sysdig capture analysis that runs natively on your Mac or your Linux PC, with a user interface that has been designed for performance and security investigation.

Sysdig Inspect, and the experience of using it, are really best explained by trying it, which requires just a 30 second installation. But in case that’s too much for you, here’s a one minute video to get you started:

Let me share a few of the principles that we used to guide the design of Sysdig Inspect.

Instant highlights

Sysdig Inspect Instant Highlights

The overview page offers an out-of-the-box, at-a-glance summary of the content of the capture file. Content is organized in tiles, each of which shows the value of a relevant metric and its trend. Tiles are organized in categories to surface useful information more clearly and are the starting point for investigation and drill-down.

Sub-second microtrends and metric correlation

Sysdig Inspect Sub-second microtrends and metric correlation

Once you click on a tile, you will see the sub-second trend of the metric shown by the tile. Yes, sub-second. You will be amazed at how different your system, containers and applications look at this level of granularity. Multiple tiles can be selected to see how metrics correlate to each other and identify hot spots.

Intuitive drill-down-oriented workflow

Sysdig Inspect Intuitive drill-down-oriented workflow

You can drill down into any tile to see the data behind it and start investigating. At this point you can either use the timeline to restrict what data you are seeing, or further drill down by double clicking on any line of data. You will be able to see processes, files, network connections and much more.

All the details when you need them

Sysdig Inspect All the details when you need them

Every single byte of data that is read from or written to a file, a network connection or a pipe is recorded in the capture file, and Sysdig Inspect makes it easy to observe it. Do you need to troubleshoot an intermittent network issue or determine what a malware wrote to the file system? All the data you need is there. And, of course, you can switch at any time into sysdig mode and look at every single system call.

Conclusion

Most of all, the guiding principle when designing Sysdig Inspect was: make troubleshooting and security investigation easy, effective and, as much as possible, a pleasure. Whether you took the capture file manually or used Sysdig Monitor to launch a capture just when an alert triggered, we aim to provide all the tools you need to monitor, troubleshoot and do forensics in your container platform. Did we succeed in achieving this goal? You can judge for yourself by trying it! To make your life easy, here are some capture files that you can use to play with it.

What’s next?

Apple CEO backs China’s ‘open’ Internet vision as censorship reaches new heights


BEIJING — Reading headlines from the World Internet Conference in China, the casual reader might have come away a little confused. China was opening its doors to the global Internet, some media outlets optimistically declared, while others said Beijing was defending its system of censorship and state control.

And perhaps most confusing of all, Apple’s CEO Tim Cook stood up and celebrated China’s vision of an open Internet.

Say what?

China has more than 730 million Internet users, boasts the largest e-commerce market in the world, and has consumers who enthusiastically embrace mobile digital technology. But it censors many foreign news websites and keeps most Western social media companies out.

The World Internet Conference held in the eastern Chinese city of Wuzhen is meant to promote China’s vision of “cyber-sovereignty” — the idea that governments all over the world should have the right to control what appears on the Internet in their countries.

In practice, in China, that amounts to the largest system of censorship and digital surveillance in the world, where criticism of the Communist Party is sharply curtailed and can even land you in jail.

But that wasn’t mentioned when Cook delivered a keynote speech on the opening day of the gathering Sunday.

“The theme of this conference — developing a digital economy for openness and shared benefits — is a vision we at Apple share,” Cook said, in widely reported remarks. “We are proud to have worked alongside many of our partners in China to help build a community that will join a common future in cyberspace.”

Chinese media welcomed Cook’s endorsement, with the nationalist Global Times declaring in a headline that “Consensus grows at Internet conference.”

Alongside Cook in endorsing China’s digital vision were officials from countries such as Saudi Arabia and Serbia, it noted.

Free speech and human rights advocates were less impressed.

“Cook’s appearance lends credibility to a state that aggressively censors the internet, throws people in jail for being critical about social ills, and is building artificial intelligence systems that monitors everyone and targets dissent,” Maya Wang at Human Rights Watch in Hong Kong wrote in an email.

“The version of cyberspace the Chinese government is building is a decidedly dystopian one, and I don’t think anyone would want to share in this ‘common future.’ Apple should have spoken out against it, not endorsed it.”

And it wasn’t just Cook who some critics accused of indulging in a little doublespeak.

China’s President Xi Jinping opened the conference with written remarks that led to a flurry of arguably misleading headlines.

“The development of China’s cyberspace is entering a fast lane,” he said in remarks read out by an official. “China's doors will only become more and more open.”

Yet the audience was soon reminded that nothing that could possibly threaten the Communist Party would be allowed through those supposedly open doors.

China has tightened censorship and controls of cyberspace under Xi, with a new cybersecurity law requiring foreign firms to store data locally and submit to domestic surveillance.

Wang Huning, who serves as a member of the Communist Party’s elite standing committee and is Xi’s top ideologue, defended China’s system and even suggested that more controls could be in the offing.

“China stands ready to develop new rules and systems of internet governance to serve all parties and counteract current imbalances,” he said to the conference, according to Reuters.

But it was Cook’s words that prompted the strongest reaction, coming after Apple has also come under fire for its actions in China.

In a written response to questions from Senators Patrick J. Leahy (D-Vt.) and Ted Cruz (R-Tex.) last month, Apple said that it had removed 674 VPN apps from its app store in China this year — tools that allow users to circumvent censorship by routing traffic abroad — to comply with local laws. Skype was also removed from Apple’s China store, the New York Times reported.

In August, Cook said Apple hadn’t wanted to remove the apps but had to follow local laws wherever it does business.

“We strongly believe that participating in markets and bringing benefits to consumers is in the best interests of folks there and in other countries as well,” Cook said. “We believe in engaging in governments even when we disagree.”

But Greatfire.org, a group that combats Chinese censorship, argued Apple’s decision to agree to censorship put pressure on other companies to follow suit and could even mean that Chinese citizens could ultimately be subjected to Chinese censorship when they travel abroad.

“It is undeniable that Tim Cook and Xi Jinping have a shared vision of the internet. Xi wants to be able to control all information and silence those who may threaten his leadership. Cook helps him with vast, unaccountable, implementation of censorship across Apple products,” the group wrote to The Washington Post.

Critics saw simple business calculations in Cook’s appearance in Wuzhen.

‘Cook Kisses the Ring,’ Bloomberg columnist Tim Culpan wrote, arguing that Cook was “desperate to hold onto any remaining scraps of the China market” in the face of stiff competition from local rivals.

The head honchos of China’s main digital and Internet companies, Huawei, Baidu and Tencent, “ought to have been grinning like Cheshire cats,” Culpan added, since censorship has kept foreign companies like Facebook, Google and Twitter out of China and served as a “handy little tool of trade protectionism.”

Nor did it matter if Cook’s tongue was in his cheek, for his presence at the conference, along with Google’s Sundar Pichai and Cisco Systems’ Chuck Robbins, not only gave legitimacy to the authorities but also sent a signal to domestic Chinese rivals that their turf is safe, Culpan wrote.

Rights group Freedom House last month branded China the worst abuser of Internet freedom among 65 countries surveyed, followed by Syria and Ethiopia.

But in Wuzhen, that report was not about to be discussed.

Instead, a top state-backed Chinese think tank declared that the host nation ranked fifth among 38 nations globally in standards of cyber governance, as it called for a ‘democratic’ Internet governance system to eradicate inequalities that it said had marginalized developing nations.

“We should promote the establishment of a multinational, democratic and transparent global internet governance system,” the Chinese Academy of Cyberspace Studies said in a report, according to Reuters.


Mz is retired. DBA DoreenMichele

After 8+ years, I have concluded that it is a bad idea to use Mz as my handle on HN. Also, I am generally moving towards using my real full name more, both online and off, for professional reasons.

Details as to why I am retiring the Mz handle were posted elsewhere. For a longer explanation as to why I am suddenly using my actual name on the internet, you might find this post relevant.

I had hoped to be low key about the whole thing, but the matter is coming up repeatedly in comments, though it has only been a week. Hopefully, this post will help cut back on that. I consider it to be a derail in conversation.

This is a longstanding problem that I have. Other people ask me personal questions in forums. If I don't answer them, it is drama and I am accused of being rude, among other things. If I do answer them, it is drama and I am accused of making things about me, among other things.

So, I don't have a good solution here. I muddle through as best I can.

Using Artificial Intelligence to Augment Human Intelligence

What are computers for?

Historically, different answers to this question – that is, different visions of computing – have helped inspire and determine the computing systems humanity has ultimately built. Consider the early electronic computers. ENIAC, the world’s first general-purpose electronic computer, was commissioned to compute artillery firing tables for the United States Army. Other early computers were also used to solve numerical problems, such as simulating nuclear explosions, predicting the weather, and planning the motion of rockets. The machines operated in a batch mode, using crude input and output devices, and without any real-time interaction. It was a vision of computers as number-crunching machines, used to speed up calculations that would formerly have taken weeks, months, or more for a team of humans.

In the 1950s a different vision of what computers are for began to develop. That vision was crystallized in 1962, when Douglas Engelbart proposed that computers could be used as a way of augmenting human intellect. In this view, computers weren’t primarily tools for solving number-crunching problems. Rather, they were real-time interactive systems, with rich inputs and outputs, that humans could work with to support and expand their own problem-solving process. This vision of intelligence augmentation (IA) deeply influenced many others, including researchers such as Alan Kay at Xerox PARC, entrepreneurs such as Steve Jobs at Apple, and led to many of the key ideas of modern computing systems. Its ideas have also deeply influenced digital art and music, and fields such as interaction design, data visualization, computational creativity, and human-computer interaction.

Research on IA has often been in competition with research on artificial intelligence (AI): competition for funding, competition for the interest of talented researchers. Although there has always been overlap between the fields, IA has typically focused on building systems which put humans and machines to work together, while AI has focused on complete outsourcing of intellectual tasks to machines. In particular, problems in AI are often framed in terms of matching or surpassing human performance: beating humans at chess or Go; learning to recognize speech and images or translating language as well as humans; and so on.

This essay describes a new field, emerging today out of a synthesis of AI and IA. For this field, we suggest the name artificial intelligence augmentation (AIA): the use of AI systems to help develop new methods for intelligence augmentation. This new field introduces important new fundamental questions, questions not associated to either parent field. We believe the principles and systems of AIA will be radically different to most existing systems.

Our essay begins with a survey of recent technical work hinting at artificial intelligence augmentation, including work on generative interfaces – that is, interfaces which can be used to explore and visualize generative machine learning models. Such interfaces develop a kind of cartography of generative models, ways for humans to explore and make meaning from those models, and to incorporate what those models “know” into their creative work.

Our essay is not just a survey of technical work. We believe now is a good time to identify some of the broad, fundamental questions at the foundation of this emerging field. To what extent are these new tools enabling creativity? Can they be used to generate ideas which are truly surprising and new, or are the ideas cliches, based on trivial recombinations of existing ideas? Can such systems be used to develop fundamental new interface primitives? How will those new primitives change and expand the way humans think?

Using generative models to invent meaningful creative operations

Let’s look at an example where a machine learning model makes a new type of interface possible. To understand the interface, imagine you’re a type designer, working on creating a new font. (We shall egregiously abuse the distinction between a font and a typeface. Apologies to any type designers who may be reading.) After sketching some initial designs, you wish to experiment with bold, italic, and condensed variations. Let’s examine a tool to generate and explore such variations, from any initial design. For reasons that will soon be explained, the quality of results is quite crude; please bear with us.

Of course, varying the bolding (i.e., the weight), italicization and width are just three ways you can vary a font. Imagine that instead of building specialized tools, users could build their own tool merely by choosing examples of existing fonts. For instance, suppose you wanted to vary the degree of serifing on a font. In the following, please select 5 to 10 sans-serif fonts from the top box, and drag them to the box on the left. Select 5 to 10 serif fonts and drag them to the box on the right. As you do this, a machine learning model running in your browser will automatically infer from these examples how to interpolate your starting font in either the serif or sans-serif direction:

In fact, we used this same technique to build the earlier bolding, italicization, and condensing tool. To do so, we used the following examples of bold and non-bold fonts, of italic and non-italic fonts, and of condensed and non-condensed fonts:

To build these tools, we used what’s called a generative model; the particular model we use was trained by James Wexler. To understand generative models, consider that a priori describing a font appears to require a lot of data. For instance, if the font is 64 by 64 pixels, then we’d expect to need 64 × 64 = 4,096 parameters to describe a single glyph. But we can use a generative model to find a much simpler description.

We do this by building a neural network which takes a small number of input variables, called latent variables, and produces as output the entire glyph. For the particular model we use, we have 40 latent space dimensions, and map that into the 4,096-dimensional space describing all the pixels in the glyph. In other words, the idea is to map a low-dimensional space into a higher-dimensional space:

The generative model we use is a type of neural network known as a variational autoencoder (VAE). For our purposes, the details of the generative model aren’t so important. The important thing is that by changing the latent variables used as input, it’s possible to get different fonts as output. So one choice of latent variables will give one font, while another choice will give a different font:

You can think of the latent variables as a compact, high-level representation of the font. The neural network takes that high-level representation and converts it into the full pixel data. It’s remarkable that just 40 numbers can capture the apparent complexity in a glyph, which originally required 4,096 variables.
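To make the dimensionality of this mapping concrete, here is a minimal sketch of such a decoder network in Python. It is an illustration only: the fully-connected architecture and layer sizes are assumptions, not the model trained by James Wexler.

```python
import torch
import torch.nn as nn

LATENT_DIM = 40        # latent variables per glyph, as in the essay
GLYPH_SIDE = 64        # 64 x 64 pixel glyphs, i.e. 4,096 output values

# Hypothetical decoder: maps a 40-dimensional latent vector to 4,096 pixels.
decoder = nn.Sequential(
    nn.Linear(LATENT_DIM, 512),
    nn.ReLU(),
    nn.Linear(512, GLYPH_SIDE * GLYPH_SIDE),
    nn.Sigmoid(),                      # pixel intensities in [0, 1]
)

z = torch.randn(1, LATENT_DIM)         # one point in the latent space
glyph = decoder(z).reshape(GLYPH_SIDE, GLYPH_SIDE)
print(glyph.shape)                     # torch.Size([64, 64])
```

Two different choices of z decode to two different glyphs; training is what makes those glyphs look like fonts rather than noise.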

The generative model we use is learnt from a training set of more than 50 thousand fonts Bernhardsson scraped from the open web. During training, the weights and biases in the network are adjusted so that the network can output a close approximation to any desired font from the training set, provided a suitable choice of latent variables is made. In some sense, the model is learning a highly compressed representation of all the training fonts.

In fact, the model doesn’t just reproduce the training fonts. It can also generalize, producing fonts not seen in training. By being forced to find a compact description of the training examples, the neural net learns an abstract, higher-level model of what a font is. That higher-level model makes it possible to generalize beyond the training examples already seen, to produce realistic-looking fonts.

Ideally, a good generative model would be exposed to a relatively small number of training examples, and use that exposure to generalize to the space of all possible human-readable fonts. That is, for any conceivable font – whether existing or perhaps even imagined in the future – it would be possible to find latent variables corresponding exactly to that font. Of course, the model we’re using falls far short of this ideal – a particularly egregious failure is that many fonts generated by the model omit the tail on the capital “Q” (you can see this in the examples above). Still, it’s useful to keep in mind what an ideal generative model would do.
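For concreteness, the kind of training alluded to above (adjusting weights and biases so the network can reproduce the training fonts from suitable latent variables) might look roughly like the following variational-autoencoder step. Everything here, from the one-layer encoder and decoder to the hyperparameters and the random stand-in batch, is an assumption for illustration, not the actual model behind the font tool.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GlyphVAE(nn.Module):
    """A deliberately tiny VAE, only to illustrate the training idea."""
    def __init__(self, latent_dim=40, n_pixels=64 * 64):
        super().__init__()
        self.enc = nn.Linear(n_pixels, 2 * latent_dim)   # outputs mean and log-variance
        self.dec = nn.Linear(latent_dim, n_pixels)

    def forward(self, x):
        mu, logvar = self.enc(x).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization trick
        return torch.sigmoid(self.dec(z)), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction term: how closely the decoded glyph matches the input glyph.
    recon_term = F.binary_cross_entropy(recon, x, reduction="sum")
    # KL term: keeps the latent codes close to a standard normal prior.
    kl_term = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_term + kl_term

model = GlyphVAE()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
batch = torch.rand(32, 64 * 64)          # stand-in for a batch of training glyphs
recon, mu, logvar = model(batch)
loss = vae_loss(recon, batch, mu, logvar)
loss.backward()
optimizer.step()
```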

Such generative models are similar in some ways to how scientific theories work. Scientific theories often greatly simplify the description of what appear to be complex phenomena, reducing large numbers of variables to just a few variables from which many aspects of system behaviour can be deduced. Furthermore, good scientific theories sometimes enable us to generalize to discover new phenomena.

As an example, consider ordinary material objects. Such objects have what physicists call a phase – they may be a liquid, a solid, a gas, or perhaps something more exotic, like a superconductor or Bose-Einstein condensate. A priori, such systems seem immensely complex, involving perhaps 10^23 or so molecules. But the laws of thermodynamics and statistical mechanics enable us to find a simpler description, reducing that complexity to just a few variables (temperature, pressure, and so on), which encompass much of the behaviour of the system. Furthermore, sometimes it’s possible to generalize, predicting unexpected new phases of matter. For example, in 1924, physicists used thermodynamics and statistical mechanics to predict a remarkable new phase of matter, Bose-Einstein condensation, in which a collection of atoms may all occupy identical quantum states, leading to surprising large-scale quantum interference effects. We’ll come back to this predictive ability in our later discussion of creativity and generative models.

Returning to the nuts and bolts of generative models, how can we use such models to do example-based reasoning like that in the tool shown above? Let’s consider the case of the bolding tool. In that instance, we take the average of all the latent vectors for the user-specified bold fonts, and the average for all the user-specified non-bold fonts. We then compute the difference between these two average vectors:

We’ll refer to this as the bolding vector. To make some given font bolder, we simply add a little of the bolding vector to the corresponding latent vector, with the amount of bolding vector added controlling the boldness of the result. (In practice, sometimes a slightly different procedure is used. In some generative models the latent vectors satisfy some constraints – for instance, they may all be of the same length. When that’s the case, as in our model, a more sophisticated “adding” operation must be used, to ensure the length remains the same. But conceptually, the picture of adding the bolding vector is the right way to think.)

This technique was introduced by Larsen et al, and vectors like the bolding vector are sometimes called attribute vectors. The same idea is used to implement all the tools we’ve shown. That is, we use example fonts to create a bolding vector, an italicizing vector, a condensing vector, and a user-defined serif vector. The interface thus provides a way of exploring the latent space in those four directions.
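As a rough sketch of the attribute-vector recipe just described: average the latent vectors of the example fonts that have the attribute, subtract the average of those that don't, and then add a scaled copy of the difference to a font's latent vector. The encode/decode functions in the usage comment are hypothetical wrappers around the generative model, and the renormalization step reflects the caveat above about models whose latent vectors all have the same length.

```python
import numpy as np

def attribute_vector(latents_with, latents_without):
    """Difference of mean latent vectors, e.g. mean(bold fonts) - mean(non-bold fonts)."""
    return np.mean(latents_with, axis=0) - np.mean(latents_without, axis=0)

def apply_attribute(z, direction, strength=1.0, keep_norm=True):
    """Move a latent vector z along an attribute direction (e.g. the bolding vector)."""
    z_new = z + strength * direction
    if keep_norm:
        # Some models constrain latent vectors to a fixed length; rescale to respect that.
        z_new = z_new * (np.linalg.norm(z) / np.linalg.norm(z_new))
    return z_new

# Hypothetical usage, assuming encode()/decode() wrap the generative model:
#   bold_dir = attribute_vector([encode(f) for f in bold_fonts],
#                               [encode(f) for f in regular_fonts])
#   bolder_glyph = decode(apply_attribute(encode(my_font), bold_dir, strength=0.5))
```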

The tools we’ve shown have many drawbacks. Consider the following example, where we start with an example glyph, in the middle, and either increase or decrease the bolding (on the right and left, respectively):

Examining the glyphs on the left and right we see many unfortunate artifacts. Particularly for the rightmost glyph, the edges start to get rough, and the serifs begin to disappear. A better generative model would reduce those artifacts. That’s a good long-term research program, posing many intriguing problems. But even with the model we have, there are also some striking benefits to the use of the generative model.

To understand these benefits, consider a naive approach to bolding, in which we simply add some extra pixels around a glyph’s edges, thickening it up. While this thickening perhaps matches a non-expert’s way of thinking about type design, an expert does something much more involved. In the following we show the results of this naive thickening procedure versus what is actually done, for Georgia and Helvetica:

As you can see, the naive bolding procedure produces quite different results, in both cases. For example, in Georgia, the left stroke is only changed slightly by bolding, while the right stroke is greatly enlarged, but only on one side. In both fonts, bolding doesn’t change the height of the font, while the naive approach does.

As these examples show, good bolding is not a trivial process of thickening up a font. Expert type designers have many heuristics for bolding, heuristics inferred from much previous experimentation, and careful study of historical examples. Capturing all those heuristics in a conventional program would involve immense work. The benefit of using the generative model is that it automatically learns many such heuristics.
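For contrast, the naive thickening baseline discussed above really is only a few lines; here is a sketch using morphological dilation, assuming the glyph is stored as a 64 × 64 array of pixel intensities.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def naive_bold(glyph, iterations=1):
    """Thicken a glyph by growing every stroke outward uniformly.

    This ignores the heuristics a type designer (or the generative model)
    applies: it will happily fill in enclosed negative space and change
    the glyph's overall height.
    """
    return binary_dilation(np.asarray(glyph) > 0.5, iterations=iterations)
```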

For example, a naive bolding tool would rapidly fill in the enclosed negative space in the enclosed upper region of the letter “A”. The font tool doesn’t do this. Instead, it goes to some trouble to preserve the enclosed negative space, moving the A’s bar down, and filling out the interior strokes more slowly than the exterior. This principle is evident in the examples shown above, especially Helvetica, and it can also be seen in the operation of the font tool:

The heuristic of preserving enclosed negative space is not a priori obvious. However, it’s done in many professionally designed fonts. If you examine examples like those shown above it’s easy to see why: it improves legibility. During training, our generative model has automatically inferred this principle from the examples it’s seen. And our bolding interface then makes this available to the user.

In fact, the model captures many other heuristics. For instance, in the above examples the heights of the fonts are (roughly) preserved, which is the norm in professional font design. Again, what’s going on isn’t just a thickening of the font, but rather the application of a more subtle heuristic inferred by the generative model. Such heuristics can be used to create fonts with properties which would otherwise be unlikely to occur to users. Thus, the tool expands ordinary people’s ability to explore the space of meaningful fonts.

The font tool is an example of a kind of cognitive technology. In particular, the primitive operations it contains can be internalized as part of how a user thinks. In this it resembles a program such as Photoshop or a spreadsheet or 3D graphics programs. Each provides a novel set of interface primitives, primitives which can be internalized by the user as fundamental new elements in their thinking. This act of internalization of new primitives is fundamental to much work on intelligence augmentation.

The ideas shown in the font tool can be extended to other domains. Using the same interface, we can use a generative model to manipulate images of human faces using qualities such as expression, gender, or hair color. Or to manipulate sentences using length, sarcasm, or tone. Or to manipulate molecules using chemical properties:

Such generative interfaces provide a kind of cartography of generative models, ways for humans to explore and make meaning using those models.

We saw earlier that the font model automatically infers relatively deep principles about font design, and makes them available to users. While it’s great that such deep principles can be inferred, sometimes such models infer other things that are wrong, or undesirable. For example, White points out that the addition of a smile vector in some face models will make faces not just smile more, but also appear more feminine. Why? Because in the training data more women than men were smiling. So these models may not just learn deep facts about the world; they may also internalize prejudices or erroneous beliefs. Once such a bias is known, it is often possible to make corrections. But to find those biases requires careful auditing of the models, and it is not yet clear how we can ensure such audits are exhaustive.

More broadly, we can ask why attribute vectors work, when they work, and when they fail? At the moment, the answers to these questions are poorly understood.

For the attribute vector to work, it must be possible to take any starting font and construct the corresponding bold version by adding the same vector in the latent space. However, a priori there is no reason that displacing by a single constant vector will work. It may be that we should displace in many different ways. For instance, the heuristics used to bold serif and sans-serif fonts are quite different, and so it seems likely that very different displacements would be involved:

Of course, we could do something more sophisticated than using a single constant attribute vector. Given pairs of example fonts (unbold, bold) we could train a machine learning algorithm to take as input the latent vector for the unbolded version and output the latent vector for the bolded version. With additional training data about font weights, the machine learning algorithm could learn to generate fonts of arbitrary weight. Attribute vectors are just an extremely simple approach to doing these kinds of operations.
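A minimal sketch of that more sophisticated alternative: a small network trained on (unbold, bold) pairs of latent vectors rather than one constant displacement. The architecture, and the existence of such paired examples, are assumptions made for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

LATENT_DIM = 40

# Learn a mapping latent(unbold font) -> latent(bold font) from example pairs.
bolder = nn.Sequential(nn.Linear(LATENT_DIM, 128), nn.ReLU(), nn.Linear(128, LATENT_DIM))
optimizer = torch.optim.Adam(bolder.parameters(), lr=1e-3)

def train_step(z_unbold, z_bold):
    """z_unbold, z_bold: tensors of shape (batch, LATENT_DIM) for paired fonts."""
    optimizer.zero_grad()
    loss = F.mse_loss(bolder(z_unbold), z_bold)
    loss.backward()
    optimizer.step()
    return loss.item()
```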

For these reasons, it seems unlikely that attribute vectors will last as an approach to manipulating high-level features. Over the next few years much better approaches will be developed. However, we can still expect interfaces offering operations broadly similar to those sketched above, allowing access to high-level and potentially user-defined concepts. That interface pattern doesn’t depend on the technical details of attribute vectors.

Interactive Generative Adversarial Models

Let’s look at another example using machine learning models to augment human creativity. It’s the interactive generative adversarial networks, or iGANs, introduced by Zhu et al in 2016.

One of the examples of Zhu et al is the use of iGANs in an interface to generate images of consumer products such as shoes. Conventionally, such an interface would require the programmer to write a program containing a great deal of knowledge about shoes: soles, laces, heels, and so on. Instead of doing this, Zhu et al train a generative model using 50 thousand images of shoes, downloaded from Zappos. They then use that generative model to build an interface that lets a user roughly sketch the shape of a shoe, the sole, the laces, and so on:

Excerpted from Zhu et al.

The visual quality is low, in part because the generative model Zhu et al used is outdated by modern (2017) standards – with more modern models, the visual quality would be much higher.

But the visual quality is not the point. Many interesting things are going on in this prototype. For instance, notice how the overall shape of the shoe changes considerably when the sole is filled in – it becomes narrower and sleeker. Many small details are filled in, like the black piping on the top of the white sole, and the red colouring filled in everywhere on the shoe’s upper. These and other facts are automatically deduced from the underlying generative model, in a way we’ll describe shortly.

The same interface may be used to sketch landscapes. The only difference is that the underlying generative model has been trained on landscape images rather than images of shoes. In this case it becomes possible to sketch in just the colors associated to a landscape. For example, here’s a user sketching in some green grass, the outline of a mountain, some blue sky, and snow on the mountain:

Excerpted from Zhu et al.

The generative models used in these interfaces are different than for our font model. Rather than using variational autoencoders, they’re based on generative adversarial networks (GANs). But the underlying idea is still to find a low-dimensional latent space which can be used to represent (say) all landscape images, and map that latent space to a corresponding image. Again, we can think of points in the latent space as a compact way of describing landscape images.

Roughly speaking, the way the iGANs works is as follows. Whatever the current image is, it corresponds to some point in the latent space:

Suppose, as happened in the earlier video, the user now sketches in a stroke outlining the mountain shape. We can think of the stroke as a constraint on the image, picking out a subspace of the latent space, consisting of all points in the latent space whose image matches that outline:

The way the interface works is to find a point in the latent space which is near to the current image, so the image is not changed too much, but also coming close to satisfying the imposed constraints. This is done by optimizing an objective function which combines the distance to each of the imposed constraints, as well as the distance moved from the current point. If there’s just a single constraint, say, corresponding to the mountain stroke, this looks something like the following:

We can think of this, then, as a way of applying constraints to the latent space to move the image around in meaningful ways.
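A minimal sketch of that optimization, assuming a hypothetical generator that maps latent vectors to images and a constraint_loss function that measures how badly an image violates the user's strokes; the real iGAN objective and optimizer details differ.

```python
import torch

def edit_latent(generator, constraint_loss, z0, weight=1.0, steps=200, lr=0.05):
    """Find a latent vector near z0 whose image also satisfies the user's constraints."""
    z = z0.clone().detach().requires_grad_(True)
    optimizer = torch.optim.Adam([z], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        image = generator(z)
        # Trade off satisfying the constraints against staying near the current image.
        loss = constraint_loss(image) + weight * torch.sum((z - z0) ** 2)
        loss.backward()
        optimizer.step()
    return z.detach()
```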

The iGANs have much in common with the font tool we showed earlier. Both make available operations that encode much subtle knowledge about the world, whether it be learning to understand what a mountain looks like, or inferring that enclosed negative space should be preserved when bolding a font. Both the iGANs and the font tool provide ways of understanding and navigating a high-dimensional space, keeping us on the natural space of fonts or shoes or landscapes. As Zhu et al remark:

[F]or most of us, even a simple image manipulation in Photoshop presents insurmountable difficulties… any less-than-perfect edit immediately makes the image look completely unrealistic. To put another way, classic visual manipulation paradigm does not prevent the user from “falling off” the manifold of natural images.

Like the font tool, the iGANs is a cognitive technology. Users can internalize the interface operations as new primitive elements in their thinking. In the case of shoes, for example, they can learn to think in terms of the difference they want to apply, adding a heel, or a higher top, or a special highlight. This is richer than the traditional way non-experts think about shoes (“Size 11, black” etc). To the extent that non-experts do think in more sophisticated ways – “make the top a little higher and sleeker” – they get little practice in thinking this way, or seeing the consequences of their choices. Having an interface like this enables easier exploration, the ability to develop idioms and the ability to plan, to swap ideas with friends, and so on.

Two models of computation

Let’s revisit the question we began the essay with, the question of what computers are for, and how this relates to intelligence augmentation.

One common conception of computers is that they’re problem-solving machines: “computer, what is the result of firing this artillery shell in such-and-such a wind [and so on]?”; “computer, what will the maximum temperature in Tokyo be in 5 days?”; “computer, what is the best move to take when the Go board is in this position?”; “computer, how should this image be classified?”; and so on.

This is a conception common to both the early view of computers as number-crunchers, and also in much work on AI, both historically and today. It’s a model of a computer as a way of outsourcing cognition. In speculative depictions of possible future AI, this cognitive outsourcing model often shows up in the view of an AI as an oracle, able to solve some large class of problems with better-than-human performance.

But a very different conception of what computers are for is possible, a conception much more congruent with work on intelligence augmentation.

To understand this alternate view, consider our subjective experience of thought. For many people, that experience is verbal: they think using language, forming chains of words in their heads, similar to sentences in speech or written on a page. For other people, thinking is a more visual experience, incorporating representations such as graphs and maps. Still other people mix mathematics into their thinking, using algebraic expressions or diagrammatic techniques, such as Feynman diagrams and Penrose diagrams.

In each case, we’re thinking using representations invented by other people: words, graphs, maps, algebra, mathematical diagrams, and so on. We internalize these cognitive technologies as we grow up, and come to use them as a kind of substrate for our thinking.

For most of history, the range of available cognitive technologies has changed slowly and incrementally. A new word will be introduced, or a new mathematical symbol. More rarely, a radical new cognitive technology will be developed. For example, in 1637 Descartes published his “Discourse on Method”, explaining how to represent geometric ideas using algebra, and vice versa:

This enabled a radical change and expansion in how we think about both geometry and algebra.

Historically, lasting cognitive technologies have been invented only rarely. But modern computers are a meta-medium enabling the rapid invention of many new cognitive technologies. Consider a relatively banal example, such as Photoshop. Adept Photoshop users routinely have formerly impossible thoughts such as: “let’s apply the clone stamp to the such-and-such layer.” That’s an instance of a more general class of thought: “computer, [new type of action] this [new type of representation for a newly imagined class of object]”. When that happens, we’re using computers to expand the range of thoughts we can think.

It’s this kind of cognitive transformation model which underlies much of the deepest work on intelligence augmentation. Rather than outsourcing cognition, it’s about changing the operations and representations we use to think; it’s about changing the substrate of thought itself. And so while cognitive outsourcing is important, this cognitive transformation view offers a much more profound model of intelligence augmentation. It’s a view in which computers are a means to change and expand human thought itself.

Historically, cognitive technologies were developed by human inventors, ranging from the invention of writing in Sumeria and Mesoamerica, to the modern interfaces of designers such as Douglas Engelbart, Alan Kay, and others.

Examples such as those described in this essay suggest that AI systems can enable the creation of new cognitive technologies. Things like the font tool aren’t just oracles to be consulted when you want a new font. Rather, they can be used to explore and discover, to provide new representations and operations, which can be internalized as part of the user’s own thinking. And while these examples are in their early stages, they suggest AI is not just about cognitive outsourcing. A different view of AI is possible, one where it helps us invent new cognitive technologies which transform the way we think.

In this essay we’ve focused on a small number of examples, mostly involving exploration of the latent space. There are many other examples of artificial intelligence augmentation. To give some flavour, without being comprehensive: the sketch-rnn system, for neural network assisted drawing; the Wekinator, which enables users to rapidly build new musical instruments and artistic systems; TopoSketch, for developing animations by exploring latent spaces; machine learning models for designing overall typographic layout; and a generative model which enables interpolation between musical phrases. In each case, the systems use machine learning to enable new primitives which can be integrated into the user’s thinking. More broadly, artificial intelligence augmentation will draw on fields such as computational creativity and interactive machine learning.

Finding powerful new primitives of thought

We’ve argued that machine learning systems can help create representations and operations which serve as new primitives in human thought. What properties should we look for in such new primitives? This is too large a question to be answered comprehensively in a short essay. But we will explore it briefly.

Historically, important new media forms often seem strange when introduced. Many such stories have passed into popular culture: the near riot at the premiere of Stravinsky and Nijinsky’s “Rite of Spring”; the consternation caused by the early cubist paintings, leading The New York Times to comment: “What do they mean? Have those responsible for them taken leave of their senses? Is it art or madness? Who knows?”

Another example comes from physics. In the 1940s, different formulations of the theory of quantum electrodynamics were developed independently by the physicists Julian Schwinger, Shin’ichirō Tomonaga, and Richard Feynman. In their work, Schwinger and Tomonaga used a conventional algebraic approach, along lines similar to the rest of physics. Feynman used a more radical approach, based on what are now known as Feynman diagrams, for depicting the interaction of light and matter:

Image by Joel Holdsworth, licensed under a Creative Commons Attribution-Share Alike 3.0 Unported license.

Initially, the Schwinger-Tomonaga approach was easier for other physicists to understand. When Feynman and Schwinger presented their work at a 1948 workshop, Schwinger was immediately acclaimed. By contrast, Feynman left his audience mystified. As James Gleick put it in his biography of Feynman:

It struck Feynman that everyone had a favorite principle or theorem and he was violating them all… Feynman knew he had failed. At the time, he was in anguish. Later he said simply: “I had too much stuff. My machines came from too far away.”

Of course, strangeness for strangeness’s sake alone is not useful. But these examples suggest that breakthroughs in representation often appear strange at first. Is there any underlying reason that is true?

Part of the reason is because if some representation is truly new, then it will appear different than anything you’ve ever seen before. Feynman’s diagrams, Picasso’s paintings, Stravinsky’s music: all revealed genuinely new ways of making meaning. Good representations sharpen up such insights, eliding the familiar to show that which is new as vividly as possible. But because of that emphasis on unfamiliarity, the representation will seem strange: it shows relationships you’ve never seen before. In some sense, the task of the designer is to identify that core strangeness, and to amplify it as much as possible.

Strange representations are often difficult to understand. At first, physicists preferred Schwinger-Tomonaga to Feynman. But as Feynman’s approach was slowly understood by physicists, they realized that although Schwinger-Tomonaga and Feynman were mathematically equivalent, Feynman was more powerful. As Gleick puts it:

Schwinger’s students at Harvard were put at a competitive disadvantage, or so it seemed to their fellows elsewhere, who suspected them of surreptitiously using the diagrams anyway. This was sometimes true… Murray Gell-Mann later spent a semester staying in Schwinger’s house and loved to say afterward that he had searched everywhere for the Feynman diagrams. He had not found any, but one room had been locked…

These ideas are true not just of historical representations, but also of computer interfaces. However, our advocacy of strangeness in representation contradicts much conventional wisdom about interfaces, especially the widely-held belief that they should be “user friendly”, i.e., simple and immediately usable by novices. That most often means the interface is cliched, built from conventional elements combined in standard ways. But while using a cliched interface may be easy and fun, it’s an ease similar to reading a formulaic romance novel. It means the interface does not reveal anything truly surprising about its subject area. And so it will do little to deepen the user’s understanding, or to change the way they think. For mundane tasks that is fine, but for deeper tasks, and for the longer term, you want a better interface.

Ideally, an interface will surface the deepest principles underlying a subject, revealing a new world to the user. When you learn such an interface, you internalize those principles, giving you more powerful ways of reasoning about that world. Those principles are the diffs in your understanding. They’re all you really want to see; everything else is at best support, at worst unimportant dross. The purpose of the best interfaces isn’t to be user-friendly in some shallow sense. It’s to be user-friendly in a much stronger sense, reifying deep principles about the world, making them the working conditions in which users live and create. At that point what once appeared strange can instead become comfortable and familiar, part of the pattern of thought. (A powerful instance of these ideas is when an interface reifies general-purpose principles. An example is an interface one of us developed based on the principle of conservation of energy. Such general-purpose principles generate multiple unexpected relationships between the entities of a subject, and so are a particularly rich source of insights when reified in an interface.)

What does this mean for the use of AI models for intelligence augmentation?

Aspirationally, as we’ve seen, our machine learning models will help us build interfaces which reify deep principles in ways meaningful to the user. For that to happen, the models have to discover deep principles about the world, recognize those principles, and then surface them as vividly as possible in an interface, in a way comprehensible by the user.

Of course, this is a tall order! The examples we’ve shown are just barely beginning to do this. It’s true that our models do sometimes discover relatively deep principles, like the preservation of enclosed negative space when bolding a font. But this is merely implicit in the model. And while we’ve built a tool which takes advantage of such principles, it’d be better if the model automatically inferred the important principles learned, and found ways of explicitly surfacing them through the interface. (Encouraging progress toward this has been made by InfoGANs, which use information-theoretic ideas to find structure in the latent space.) Ideally, such models would start to get at true explanations, not just in a static form, but in a dynamic form, manipulable by the user. But we’re a long way from that point.

Do these interfaces inhibit creativity?

It’s tempting to be skeptical of the expressiveness of the interfaces we’ve described. If an interface constrains us to explore only the natural space of images, does that mean we’re merely doing the expected? Does it mean these interfaces can only be used to generate visual cliches? Does it prevent us from generating anything truly new, from doing truly creative work?

To answer these questions, it’s helpful to identify two different modes of creativity. This two-mode model is over-simplified: creativity doesn’t fit so neatly into two distinct categories. Yet the model nonetheless clarifies the role of new interfaces in creative work.

The first mode of creativity is the everyday creativity of a craftsperson engaged in their craft. Much of the work of a font designer, for example, consists of competent recombination of the best existing practices. Such work typically involves many creative choices to meet the intended design goals, but not developing key new underlying principles.

For such work, the generative interfaces we’ve been discussing are promising. While they currently have many limitations, future research will identify and fix many deficiencies. This is happening rapidly with GANs: the original GANs had many limitations, but models soon appeared that were better adapted to images, improved the resolution, reduced artifacts (so much work has been done on improving resolution and reducing artifacts that it seems unfair to single out any small set of papers, and to omit the many others), and so on. With enough iterations it’s plausible these generative interfaces will become powerful tools for craft work.

The second mode of creativity aims toward developing new principles that fundamentally change the range of creative expression. One sees this in the work of artists such as Picasso or Monet, who violated existing principles of painting, developing new principles which enabled people to see in new ways.

Is it possible to do such creative work, while using a generative interface? Don’t such interfaces constrain us to the space of natural images, or natural fonts, and thus actively prevent us from exploring the most interesting new directions in creative work?

The situation is more complex than this.

In part, this is a question about the power of our generative models. In some cases, the model can only generate recombinations of existing ideas. This is a limitation of an ideal GAN, since a perfectly trained GAN generator will reproduce the training distribution. Such a model can’t directly generate an image based on new fundamental principles, because such an image wouldn’t look anything like what it has seen in its training data.

Artists such as Mario Klingemann and Mike Tyka are now using GANs to create interesting artwork. They’re doing that using “imperfect” GAN models, which they seem to be able to use to explore interesting new principles; it’s perhaps the case that bad GANs may be more artistically interesting than ideal GANs. Furthermore, nothing says an interface must only help us explore the latent space. Perhaps operations can be added which deliberately take us out of the latent space, or to less probable (and so more surprising) parts of the space of natural images.

Of course, GANs are not the only generative models. In a sufficiently powerful generative model, the generalizations discovered by the model may contain ideas going beyond what humans have discovered. In that case, exploration of the latent space may enable us to discover new fundamental principles. The model would have discovered stronger abstractions than human experts. Imagine a generative model trained on paintings up until just before the time of the cubists; might it be that by exploring that model it would be possible to discover cubism? It would be an analogue to something like the prediction of Bose-Einstein condensation, as discussed earlier in the essay. Such invention is beyond today’s generative models, but seems a worthwhile aspiration for future models.

Our examples so far have all been based on generative models. But there are some illuminating examples which are not based on generative models. Consider the pix2pix system developed by Isola et al. This system is trained on pairs of images, e.g., pairs showing the edges of a cat, and the actual corresponding cat. Once trained, it can be shown a set of edges and asked to generate an image for an actual corresponding cat. It often does this quite well:

When supplied with unusual constraints, pix2pix can produce striking images:

This is perhaps not high creativity of a Picasso-esque level. But it is still surprising. It’s certainly unlike images most of us have ever seen before. How does pix2pix and its human user achieve this kind of result?

Unlike our earlier examples, pix2pix is not a generative model. This means it does not have a latent space or a corresponding space of natural images. Instead, there is a neural network, called, confusingly, a generator – this is not meant in the same sense as our earlier generative models – that takes as input the constraint image, and produces as output the filled-in image.

The generator is trained adversarially against a discriminator network, whose job is to distinguish between pairs of images generated from real data, and pairs of images generated by the generator.

While this sounds similar to a conventional GAN, there is a crucial difference: there is no latent vector input to the generator. (Actually, Isola et al experimented with adding such a latent vector to the generator, but found it made little difference to the resulting images.) Rather, there is simply an input constraint. When a human inputs a constraint unlike anything seen in training, the network is forced to improvise, doing the best it can to interpret that constraint according to the rules it has previously learned. The creativity is the result of a forced merger of knowledge inferred from the training data, together with novel constraints provided by the user. As a result, even relatively simple ideas – like the bread- and beholder-cats – can result in striking new types of images, images not within what we would previously have considered the space of natural images.
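To make the setup just described a little more concrete, here is a rough sketch of one adversarial training step for such a system. The two-argument discriminator, the loss weights, and the heavily weighted L1 term follow the general pix2pix recipe, but the details are assumptions rather than a faithful reimplementation.

```python
import torch
import torch.nn.functional as F

def train_step(G, D, opt_G, opt_D, constraint, real):
    """One adversarial step on a batch of (constraint image, real image) pairs."""
    fake = G(constraint)

    # Discriminator: score real (constraint, image) pairs high, generated pairs low.
    opt_D.zero_grad()
    real_scores = D(constraint, real)
    fake_scores = D(constraint, fake.detach())
    d_loss = (F.binary_cross_entropy_with_logits(real_scores, torch.ones_like(real_scores)) +
              F.binary_cross_entropy_with_logits(fake_scores, torch.zeros_like(fake_scores)))
    d_loss.backward()
    opt_D.step()

    # Generator: fool the discriminator while staying close to the real image.
    opt_G.zero_grad()
    fake_scores = D(constraint, fake)
    g_loss = (F.binary_cross_entropy_with_logits(fake_scores, torch.ones_like(fake_scores)) +
              100.0 * F.l1_loss(fake, real))   # heavily weighted L1 term
    g_loss.backward()
    opt_G.step()
    return d_loss.item(), g_loss.item()
```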

Conclusion

It is conventional wisdom that AI will change how we interact with computers. Unfortunately, many in the AI community greatly underestimate the depth of interface design, often regarding it as a simple problem, mostly about making things pretty or easy-to-use. In this view, interface design is a problem to be handed off to others, while the hard work is to train some machine learning system.

This view is incorrect. At its deepest, interface design means developing the fundamental primitives human beings think and create with. This is a problem whose intellectual genesis goes back to the inventors of the alphabet, of cartography, and of musical notation, as well as modern giants such as Descartes, Playfair, Feynman, Engelbart, and Kay. It is one of the hardest, most important and most fundamental problems humanity grapples with.

As discussed earlier, in one common view of AI our computers will continue to get better at solving problems, but human beings will remain largely unchanged. In a second common view, human beings will be modified at the hardware level, perhaps directly through neural interfaces, or indirectly through whole brain emulation.

We’ve described a third view, in which AIs actually change humanity, helping us invent new cognitive technologies, which expand the range of human thought. Perhaps one day those cognitive technologies will, in turn, speed up the development of AI, in a virtuous feedback cycle:

It would not be a Singularity in machines. Rather, it would be a Singularity in humanity’s range of thought. Of course, this loop is at present extremely speculative. The systems we’ve described can help develop more powerful ways of thinking, but there’s at most an indirect sense in which those ways of thinking are being used in turn to develop new AI systems.

Of course, over the long run it’s possible that machines will exceed humans on all or most cognitive tasks. Even if that’s the case, cognitive transformation will still be a valuable end, worth pursuing in its own right. There is pleasure and value involved in learning to play chess or Go well, even if machines do it better. And in activities such as story-telling the benefit often isn’t so much the artifact produced as the process of construction itself, and the relationships forged. There is intrinsic value in personal change and growth, apart from instrumental benefits.

The interface-oriented work we’ve discussed is outside the narrative used to judge most existing work in artificial intelligence. It doesn’t involve beating some benchmark for a classification or regression problem. It doesn’t involve impressive feats like beating human champions at games such as Go. Rather, it involves a much more subjective and difficult-to-measure criterion: is it helping humans think and create in new ways?

This creates difficulties for doing this kind of work, particularly in a research setting. Where should one publish? What community does one belong to? What standards should be applied to judge such work? What distinguishes good work from bad?

We believe that over the next few years a community will emerge which answers these questions. It will run workshops and conferences. It will publish work in venues such as Distill. Its standards will draw from many different communities: from the artistic and design and musical communities; from the mathematical community’s taste in abstraction and good definition; as well as from the existing AI and IA communities, including work on computational creativity and human-computer interaction. The long-term test of success will be the development of tools which are widely used by creators. Are artists using these tools to develop remarkable new styles? Are scientists in other fields using them to develop understanding in ways not otherwise possible? These are great aspirations, and require an approach that builds on conventional AI work, but also incorporates very different norms.

Barliman – real-time program synthesis to make the programmer's life easier

README.md

Joint work with Greg Rosenblatt.


He thinks less than he talks, and slower; yet he can see through a brick wall in time (as they say in Bree).

– Gandalf the Grey, about Barliman Butterbur.

From The Lord of the Rings: The Fellowship of the Ring by J.R.R. Tolkien


Barliman News

4 December 2016


Barliman overview

Barliman is a prototype "smart editor" that performs real-time program synthesis to try to make the programmer's life a little easier. Barliman has several unusual features:

  • Given a set of tests for some function foo, Barliman tries to "guess" how to fill in a partially-specified definition of foo to make all of the tests pass;

  • Given a set of tests for some function foo, Barliman tries to prove that a partially-specified definition of foo is inconsistent with one or more of the tests;

  • Given a fully or mostly-specified definition of some function foo, Barliman will attempt to prove that a partially-specified test is consistent with, or inconsistent with, the definition of foo.

Barliman is general enough to handle multiple programming languages. In fact, the user can even specify their own programming language, or change the semantics or syntax of one of the default languages that come with Barliman. The default language for Barliman is a small but Turing-complete subset of side-effect-free Scheme that supports recursion, list operations, higher-order functions, multi-argument and variadic functions, and a few other features.

Purpose of Barliman

  • Test the hypothesis that even very modest or slow program synthesis can be useful, if it is part of an interactive conversation with a programmer.

  • Explore utility of interactive synthesis.

  • Explore design for such an interactive tool.

  • Try to inspire others to build similar tools, perhaps using radically different implementation techniques.

Barliman in action

synthesizing append from scratch

Here are a few screenshots of Barliman, using the Mac implementation as of June 4, 2016:

Update: I've added a few newer screenshots from June 16, 2016. Once Barliman stops changing as rapidly I'll update all the screenshots.

Update – 10 October 2016: Please see the interesting_examples directory for more recent examples and screenshots.

Update – 4 December 2016: The Clojure/conj 2016 talk is currently the best source of up-to-date information on Barliman.

The following descriptions are based on an older and far less capable version. TODO: update them.

The first screenshot shows the main editor window. The Scheme Definition edit pane contains the complete (fully instantiated) and correct definition of append, the list concatenation function in Barliman's default "miniScheme" language. append will be our simple running example in these screenshots. The edit window also contains three tests; each test contains an input expression, and the expected value of that expression. The Best Guess pane, which is not editable by the user, contains the same fully instantiated definition of append as in the Scheme Definition edit pane.

All the text in the editor window is black, indicating that all the information in the editor is consistent and valid. The definition of append is a valid symbolic expression (s-expression), and is indeed a syntactically valid miniScheme definition. The test expressions and expected values are syntactically valid, and consistent with each other.

The editor window displayed in this first screenshot is similar in spirit to a modern integrated development environment (IDE) that runs tests whenever the code to be tested is modified.

Let's see how we might have gotten to the final version of append using Barliman.

Screenshot 1: Fully instantiated definition of append:

append example 1 -- fully instantiated code

Screenshot 2 shows the empty main editor window, immediately after starting Barliman. We know we want to define append, so in true test-driven development style we begin by writing our test cases.

Screenshot 2:

append example 2 --

Screenshot 3 shows the main editor window with our three tests. The first test says that if we append the empty list to the empty list, we should get back the empty list. You should be able to figure out the other two tests.

The text for all three tests is red, indicating that none of the tests pass. This isn't surprising, perhaps, since we haven't started to define the append function.

(From an interface design standpoint, whether to use colors, which colors to use, which text fields to hilight, etc., are all open questions in my mind. Over time I hope to make it much more clear exactly which part of the code is failing, and why.)

Screenshot 3:

append example 3 --

Screenshot 4 shows the main editor window after we have begun defining append in the Scheme Definition edit pane. Our parentheses are not balanced – we haven't yet typed a closing parenthesis for the define form. Because of the missing parenthesis, the definition is not a legal Scheme s-expression. The tests cannot pass, of course, since append isn't even an s-expression. Barliman recognizes this, and turns the text in the Scheme Definition edit pane, and the text in the test edit fields, a sickly green color.

(Future versions of Barliman should include a structured editor that will automatically insert balanced parentheses.)

Screenshot 4:

append example 4 --

Screenshot 5 shows the main editor window after we have added the closing parenthesis in our partial definition of append in the Scheme Definition edit pane. The partial definition of append is now a legal s-expression. However, the definition of append is not syntactically valid according to the rules of miniScheme. Of course, this invalid definition of append causes all the tests to fail as well. Barliman recognizes this, and turns the text in the Scheme Definition edit pane, and the text in the test edit fields, red.

(Currently Barliman doesn't actually check that definitions are grammatically correct. Rather, Barliman uses evaluation of the tests to check whether code is semantically legal, rather than syntactically legal. Future versions of Barliman will probably include explicit grammars that are checked, in addition to semantic rules.) (Update: Barliman now includes a relational parser for the miniScheme language, as is shown in the newer screenshots at the end.)

Screenshot 5:

append example 5 --

In screenshot 6 we can see that the programmer has partially specified the definition of append. The definition is a syntactically-correct s-expression, and indeed is a syntactically correct use of miniScheme's define form. Importantly, the definition of append is only partially specified, and contains four (logic) variables (A, B, C, and D) representing unknown subexpressions. In Barliman variables representing unknown subexpressions are single-letter upper-case variables A through Z. (Note to Schemers: The comma (,) that usually occurs before these letters is necessary because the code in the Scheme Definition edit pane is implicitly quasiquoted.)

Given the partially-specified definition of append in the Scheme Definition edit pane, along with the three tests, Barliman is able to correctly "guess" the code corresponding to these variables. The correct and complete definition of append is displayed in the Best Guess pane. Barliman guesses the correct code in this case in a second or less. All of the text in the main editor window is black, indicating that all of the code is syntactically correct, and that all three tests pass given the completed definition of append shown in the Best Guess pane.

Screenshot 6:

append example 6 -- partially instantiated code filled in

Screenshot 7 shows an incorrect partial definition of append. As in the previous screenshot, the partial definition of append contains variables representing unknown subexpressions (the A and B and C). However, in this case the first argument to cons is incorrect. The first argument to cons should be (car l), as shown in screenshot 1. Alternatively, the first argument to cons could be an incomplete expression containing a variable representing an unknown subexpression, such as (car ,B) from screenshot 6, provided that this incomplete expression is consistent with the expression (car l). Here, however, the first argument to cons is the expression (cdr l). The red text for tests 2 and 3 indicates that these tests are incompatible with the partial definition of append in the Scheme Definition edit pane. That is, there are no legal miniScheme expressions that could be substituted for the variables A, B, and C that would make tests 2 and 3 pass.

The spinning progress indicator to the upper-right of the Best Guess pane indicates that Barliman is trying to find expressions for variables A, B, and C that will make all of the tests pass. Of course this is impossible – Barliman should be a little smarter, and cut off the Best Guess computation when one of the individual tests fails.

The important thing about this example is that Barliman was able to prove that the partial definition of append is incorrect, without append being fully defined. More precisely, the partial definition of append is inconsistent with respect to tests 1 through 3 and the semantics of miniScheme (which can be edited by the programmer).

Screenshot 7:

append example 7 -- partially instantiated code incompatible with tests

Screenshot 8 shows another partially-instantiated, but incorrect, definition of append. The base case of append should be s instead of l, yet all the text is in black, indicating that the individual tests are compatible with the definition so far. The problem is that we don't have a test that exposes that this partial definition is wrong. We'll fix this in the next screenshot.

This is one danger of using tests for feedback, of course – in general, no finite number of tests is sufficient to prove our definition is correct. I hope that future versions of Barliman will include other ways to specify the behavior of programs, which might include specifying program properties, or providing a "reference" implementation of a function that is being redefined to perform faster, etc.

Screenshot 8:

append example 8 -- partially instantiated code missing a test

In screenshot 9 we add a new test, test 4, that shows that the base case of append is incorrect. Sure enough, test 4's text immediately turns red, indicating it is incompatible with our partial definition.

Screenshot 9:

append example 9 -- partially instantiated code with the missing test

Screenshot 10 shows a limitation of Barliman's program synthesis. Here the partially-specified definition of append contains only a single variable, A, representing an unknown subexpression. Ideally Barliman would quickly figure out that A should be the expression (cdr l). However, for this example Barliman seems to get "stuck" – we can see the spinning progress indicators to the upper-right of the Best Guess pane and the Test 2 and Test 3 edit fields, indicating that Barliman is still "thinking". I let Barliman run for a minute or two, but it didn't find a value for A in that time. (Adding the notion of "parsimony" to Barliman, so it tries to generate the smallest terms first, might help with this example.)

We could allow Barliman to keep thinking – perhaps it would find the answer in five minutes, or in an hour (provided our computer has enough RAM!). However, in practice we would probably try filling in A manually. If we were to type (cdr ,B) in place of ,A, Barliman would immediately guess the correct, trivial subexpression l for the variable B.

This example shows how program synthesis in Barliman can be much slower than we might hope in certain cases. However, since Barliman is a text editor, and since multicore computers with lots of RAM are now ubiquitous, I see these examples from a "glass half full" perspective. Sometimes Barliman can help you, either by guessing the rest of your incomplete definition, or by proving that there is no completion for your partially-specified definition that is consistent with your tests. In this case you win. Sometimes Barliman can't help you, in which case you use it like a regular text editor. In this case you use more CPU cycles and RAM on your machine, but otherwise edit text normally.

Of course, Barliman isn't currently a particularly good text editor, especially compared to Emacs with paredit mode, to take one example. This problem is only a matter of engineering – in fact, Barliman-like functionality could be added to Emacs, or to another editor. Or Barliman could get more sophisticated editing abilities.

A bigger drawback is that the semantics for the language you are writing in must be specified in miniKanren. This is fine if you are writing in a minimal Scheme subset, such as miniScheme. This isn't so great if you want to program in full Clojure or Racket or Javascript or Ruby. Finding ways to scale this technology is an open problem. The solution may not be miniKanren or constraint logic programming, but rather another synthesis approach. I don't know. I do hope, however, that Barliman will make people think about how synthesis capabilities can be integrated into editors, especially for dynamic languages.

Screenshot 10:

append example 10 --


Update: Here are a few newer screenshots, as of June 16, 2016, that show off the relational parser for miniScheme that I added a couple of days ago.

Screenshot 11 is an updated version of screenshot 5, showing the new relational parser at work. The definition of append is syntactically incorrect, which is represented by the purple text in the Scheme Definition edit pane. The tests are syntactically correct, but fail, and are therefore shown in red text.

Screenshot 11:

append example 11 --

Screenshot 12, like screenshot 11, shows a syntactically incorrect definition of append. In this case the keyword lambda appears as the body of a lambda expression. In miniScheme, as in Scheme, lambda is a special form rather than a function; the keyword lambda cannot appear "by itself". Hence the purple text in the Scheme Definition edit pane.

Once again, the tests are syntactically valid, but fail, and so are shown in red text.

Screenshot 12:

append example 12 --

Screenshot 13 is identical to screenshot 12, except that the formal parameter to the lambda expression in the Scheme Definition edit pane has been changed from x to lambda. This formal parameter shadows the lambda keyword, allowing lambda to appear by itself in the body of the lambda expression.

Once again, the tests are syntactically valid, but fail, and so are shown in red text.

This example shows that the relational parser keeps track of the environment, variable scope, and shadowing.

Screenshot 13:

append example 13 --

Screenshot 14 shows a syntactically legal partial definition of append in the Scheme Definition edit pane. Three of the tests are syntactically legal, and are (individually) consistent with the partial definition of append; therefore, the text for these tests is shown in black.

The third test, however, is syntactically incorrect. This is because in miniScheme, as in Scheme, and is a special form rather than a function, and therefore cannot be passed into the call to append. Since the third test is syntactically illegal, it is shown in purple text.

Screenshot 14:

append example 14 --


Advantages and limitations of Barliman

Advantages of Barliman

Barliman is flexible. Barliman can handle operational semantics for various programming languages. Users can add their own semantics, or modify the semantics for languages that are included with Barliman. Barliman does not require that the language be statically typed, or that the user has supplied enough tests to fully synthesize the function being defined.

Barliman is interactive. Any change to the definition of a function, the corresponding tests, or even the semantics immediately re-triggers the program synthesis solver.

Limitations of Barliman

Barliman can be extremely slow when it comes to program synthesis, and can easily get "stuck", possibly taking hours and tens of gigabytes of RAM to synthesize small fragments of code. Since the default "miniScheme" language is dynamically typed, Barliman cannot take advantage of types to limit the space of programs to be considered during synthesis. There are other synthesis tools that can synthesize the complete definition of append, for example, given append's type signature along with tests that properly cover the behavior of append. (In fact, Michael Ballantyne has been able to synthesize append by integrating types into a tiny Scheme-like language, which I'd like to explore in the context of Barliman.)

To me this is a tradeoff. Barliman is very flexible in its handling of languages and synthesis problems. At the same time, Barliman's synthesis is slow, which is why the tool is designed to work interactively with a programmer. I think this is a reasonable tradeoff to explore, since there are plenty of dynamically-typed languages in use (Javascript, Python, Ruby, Scheme/Racket/Clojure/Lisp, etc.). Also, Barliman doesn't require that the user specify every test necessary to synthesize the complete definition of the function being considered, which reduces the burden on the programmer.

In short, Barliman is flexible, and can handle Turing-complete dynamically-typed higher-order languages, and under-specified synthesis problems, but the tradeoff is that Barliman's synthesis is slow.

Barliman works best for big-step operational semantics. It is possible to implement small-step semantics in Barliman. However, the synthesis features of Barliman are likely to work poorly compared with semantics written in a big-step style.

Similarly, Barliman works best for side-effect-free languages, such as a pure subset of Scheme. Once again, Barliman can handle languages with side effects, such as variable mutation. However, Barliman's synthesis abilities are likely to suffer as a result.

I do not know how large a language, or how large a definition, Barliman can handle in practice. I will be experimenting with this...

Barliman can be resource hungry. Given six example programs and a definition, Barliman will launch eight instances of Chez Scheme, all running in parallel. Barliman tries to kill these processes when they are not needed, but it is possible for these processes to run for long periods of time (like, forever) and take up unbounded amounts of RAM.

Barliman currently isn't very good at standard text editing. For example, anyone used to paredit or structured text editing will miss those features in Barliman, at least for now. I do want to add these features to Barliman, especially since I expect they will make the synthesis aspects easier to explore.

Barliman currently doesn't support saving or loading files, definitions, tests, or anything else. I plan to add this feature soon.

Barliman is changing quickly, and definitely contains errors and interface quirks. To the best of my knowledge none of these problems are inherent in the design of Barliman, or the technology being used for synthesis. Still, since this is a rapidly evolving prototype, I expect I will be introducing errors about as quickly as I remove them, at least for a while.


How Barliman works

Barliman uses miniKanren (http://minikanren.org/), and a relational Scheme interpreter written in miniKanren (Byrd, Holk, and Friedman, 2012, http://dl.acm.org/citation.cfm?id=2661105 or http://webyrd.net/quines/quines.pdf), to provide real-time feedback to the code editor using program synthesis.

Barliman calls out to Chez Scheme to run its miniKanren queries; Chez Scheme in turn uses the miniKanren implementation and relational interpreter implementation contained in the mk-and-rel-interp directory.


The default "miniScheme" language

(give grammar and semantics for the default language)

As in Scheme, in miniScheme duplicate variable names of definitions at the same scoping level, or duplicate lambda or letrec bindings, are illegal. However, Barliman does not currently detect these violations. For example, Barliman will not complain about the expression ((lambda (x x) x) 3 4), the behavior of which is unspecified. Probably the parser should enforce that the variable names are distinct.

The lambda and letrec forms do not contain an implicit begin.

The lambda form supports multiple arguments, (lambda (x y z) y), and a single "variadic" argument, (lambda x x), but currently doesn't support the full Scheme variadic syntax, (lambda (x y . z) x).


Barliman implementation details

The cocoa version of the editor is written in Swift, and has been tested under OS X 10.11.4 and XCode 7.3.1. Eventually the editor will be cross-platform. I'm starting with cocoa since I'm developing on a Mac, and I want to make sure I don't box myself into a corner with the user interface/performance as I experiment with the design and the interface. The cocoa version of Barliman calls out to Chez Scheme (https://github.com/cisco/ChezScheme), which must be installed separately, and which is assumed to reside in /usr/local/bin/scheme.

IMPORTANT: The cocoa version of Barliman does its best to clean up the Scheme processes it launches. However, it is wise to run top -o cpu from the terminal after playing with Barliman, to make sure errant Scheme processes aren't running in the background. If these tasks are running, you can kill them using kill -9 <pid>, where <pid> is the process identifier listed from the top command.

To use the cocoa version of Barliman, first build and launch the application from XCode. The application currently has a single window, with a single editable pane. Enter Scheme expressions that run in the relational Scheme interpreter, such as:

((lambda (x) x) 5)

Each time the text changes, a Chez Scheme process will be launched, attempting to evaluate the expression in the relational Scheme interpreter. If the expression is not a valid Scheme s-expression, Chez will complain, and Barliman will display the code in red. If the expression is a legal s-expression, and the corresponding miniKanren query to the relational interpreter succeeds, the value of the query will be displayed below the editable text box. If the query fails, the empty list will be displayed in the text box.

For more interesting answers, you can use the logic variables A through G, upper-case. Make sure to unquote the logic variables:

((lambda (x) ,A) ,B)

Acknowledgements and thanks

Thanks to Michael Ballantyne, Kenichi Asai, Alan Borning, Nada Amin, Guannan Wei, Pierce Darragh, Alex Warth, Michael Adams, Tim Johnson, Evan Czaplicki, Stephanie Weirich, Molly Feldman, Joe Osborn, Nehal Patel, Andrea Magnorsky, Reid McKenzie, Emina Torlak, Chas Emerick, Martin Clausen, Devon Zuegel, Daniel Selifonov, Greg Rosenblatt, Michael Nielsen, David Kahn, Brian Mastenbrook, Orchid Hybrid, Rob Zinkov, Margaret Staples, Ziyao Wei, Matt Hammer, Hunter Hutchinson, Bryan Joseph, Cesar Marinho, Michael Bernstein, Bodil Stokke, Dan Friedman, Ron Garcia, Rich Hickey, Phil Wadler, Tom Gilray, Dakota Fisher, Gavin Whelan, Devon Zeugel, Jonas Kölker, Matt Might, participants of my 2016 PEPM tutorial on miniKanren, and participants of the 'As We May Thunk' group (http://webyrd.net/thunk.html), for suggestions, encouragement, and inspiration.

Thanks to Kent Dybvig, Andy Keep, and Cisco Systems for releasing Chez Scheme under an open source license.

The definition of letrec in the main interpreter is based on Dan Friedman's code, using the "half-closure" approach from Reynolds's definitional interpreters.

Greg Rosenblatt has been refining the search and the miniScheme interpreter, greatly improving performance on many of the synthesis problems.

Barliman is intended to be an improved version of the very crude 'miniKanren playground' I showed at my 2016 PEPM tutorial on miniKanren: https://github.com/webyrd/minikanren-playground


Barliman TODOs and limitations

TODO:

  • Matt Might suggests using properties like (append (cons a l) s) == (cons a (append l s)) for synthesizing append.
  • Devon Zeugel suggests using a monospace font, and perhaps swapping the tests and the definitions in the main window. Devon suggests using the number of tests that pass to guide search for auto-repair or other synthesis. Or, to try to maximize the number of tests that pass, then have the user take over. Maybe use the number of tests that pass as a score/heuristic for stochastic search.
  • Tom Gilray suggests being able to hover over a ,A logic variable to select/approve a suggested/guessed value for that particular subexpression. Michael Ballantyne and other people have suggested similar notions, including a scrubber for scrubbing over/selecting a generated/guessed value for a subexpression.
  • replace test input/output edit fields with multi-line edit capabilities similar to that of the 'Definitions' pane
  • add paren hilighting to editor
  • add "smart delete" of parens
  • add auto-indent
  • add forward/backward s-expression
  • add transpose s-expression
  • add pretty printing of "Best Guess" definitions
  • add smart editing/auto insert of gensyms in the test edit panes, similar to how smart editing/auto insert of logic variables works in the Definitions edit pane
  • for 'syntax error' and 'illegal sexpression' messages for a test, potentially show whether the input, the output, or both is the problem (could be complicated in that the output might be an illegal sexpression, while the input is a syntax error, for example)
  • have Barliman attempt to guess the result of a test, as the programmer types in the test (thanks Ziyao Wei!)
  • show the definition guessed for each individual successful test
  • show reified test inputs and outputs upon success, for all tests (would allow queries like 'quines')
  • mouse hover over ,A variable should display the variable's "Best Guess" value
  • allow resizing of Barliman main window
  • add let and cond.
  • add better error message for 'invalid syntax', at least indicating whether there is an unexpected paren/missing end paren
  • Possibly replace list call in the "best guess" query with nested cons calls instead. (Need to time this again with Greg's new improvements to the search.) This can be an order of magnitude faster in some cases, according to my testing (variadic application is more expensive than 'cons' in the current miniScheme interpreter, apparently: see times for append-gensym-synthesis-with-cons-1 versus append-gensym-synthesis-with-list-1 tests in test-interp.scm).
  • add an implicit begin to lambda, letrec, and let forms.
  • parser should enforce that the variable names are distinct in lambda formals, letrec bindings and formals, and define's within the same scope.
  • create a version of Barliman on an open platform (Electron, Clojurescript, Lighttable, whatever). Any help would be appreciated! :)
  • consider using ulimit or some other capability for keeping the running Scheme processes under control/keep them from using all the RAM and CPU cycles
  • consider adding fields for seeing the ground results of partially-instantiated test inputs/outputs
  • add full variadic syntax: (lambda (x y . z) x)
  • consider turning the background of the "guess" pane green, or otherwise indicating to the user when a guess can be made. Could also potentially change the code in the main definition edit pane, although this may not be friendly.
  • add STLC as an example, complete with type inferencer
  • perhaps be able to drag and drop subexpressions from the best guess pane onto variables in the definition pane. And also be able to replace an existing subexpression in the definition pane with a logic variable.
  • think about contextual menus/right click and also drag and shift-drag. What should these do?
  • make sure Semantics and the main Barliman windows can be reopened if the user closes them! Currently there doesn't seem to be a way to get the window back. Perhaps allow the user to hide the windows, but not close them? What is the preferred Mac way?
  • for the case in which a simple function is being used to generate test inputs and answers for a more complex version of the same function, may need or want a grounder to make sure answers are fully ground. May also want a grounder for code, esp for the best guess pane. Although grounding code may not be necessary or ideal.
  • would be smart to only re-run Scheme processes when the Scheme code actually changes– for example, white space characters outside of an S-expr shouldn't trigger re-evaluation. One way would be to compare "before" and "after" S-exprs to see if anything has changed. Could run a single Scheme instance and call equal? to see if the code has actually changed. This could be a big win for expensive computations.
  • add ability to save and load examples/tests/semantics, and include interesting examples, such as a tiny Scheme interpreter written in Scheme, state machine using mutual recursion, examples from pearls, etc.
  • add structured editor for semantics and for type inferencer (as an alternative to/in addition to the free-form editor)
  • possibly move as much work as possible into NSTasks, such as loading files.
  • possibly add pairs of tests as processes, once individual tests complete successfully
  • add syntax-directed auto-indentation of code
  • figure out how to do syntax-directed hilighting, and precise hilighting of syntax errors. May not be as important if I go the structured editor route. Although perhaps this should be an option, either way.
  • add documentation/tutorial
  • add paper prototype for desired features
  • move 'barliman-query.scm' temporary file to a more suitable location than 'Documents' directory, or get rid of the temp file entirely
  • experiment with store passing style and small step interpreters
  • get rid of hardcoded path to Chez executable
  • add input/output examples
  • find a cleaner and more flexible way to construct the program sent to Chez
  • add "accept suggested completion" button
  • would be smarter/less resource intense to not launch all the tests again when the text in a single test changes. Only that test and allTests need be re-run, in theory. Getting the UI to display the state of everything properly may be a little subtle, though.
  • differential relational interpreters
  • use a meta-interpreter to let the programmer know the deepest part of the search path upon failure, to try to give a better hint as to what went wrong (thanks Nada! and halp! :))

LONGER TERM:

  • Devon Zeugel suggested looking at Mutant (https://github.com/mbj/mutant).
  • mousing over a failing test should highlight subexpressions in the Definitions pane that are incompatible with that test.
  • mousing over a subexpression should hilight any tests which would be incompatible with the definitions, were a logic variable to be substituted for the expression being moused over. (perhaps do this only if a modifier key is held down)
  • improve editor so that typing '(' 'cons' auto completes to '(cons ,A ,B)', based on arity of cons (unless cons is shadowed).
  • consider placing each of the 'definition' forms in its own edit window, with 'uses mutation', 'uses call/cc', 'well-typed' checkboxes for each definition (inspired by Kenichi Asai's tool for teaching functional programming).
  • try adding contracts/properties/specs. For example, for append, could add the property that the sum of (length l) and (length s) must be equal to (length (append l s)). This could work with randomized testing, even for partially-instantiated definitions. In the case of length, would either need to use Oleg numbers, or CLP(FD).
  • related to properties, might want generators, such as a loso that generates flat lists of symbols, for example, or lovo, that generates flat lists of values, or treevo, that generates trees of values. Could use these generators for specifying and testing properties. One simple, "type" property is that append should work on any two lovos, and, in this case, return a lovo. Could extend this to talk about the lengths of the lovos, etc. Could then either enumerate or randomly generate lovos QuickCheck style to try to find counter-examples with respect to the current partial (or complete) definition, or perhaps to help with synthesizing the actual code.
  • automatic test generation/fuzzing
  • add arithmetic to the main interpreter
  • explore incremental computing with the editor
  • add type inferencer
  • test generation of typed test programs
  • partial evaluation of the interpreter to speed up evaluation
  • add support for macros
  • explore predicates/generators/QuickCheck-like functionality
  • explore other synthesis techniques, model checking, etc., as alternatives or additions to the miniKanren-based program synthesis in Barliman
  • add tree automata support to support grammars
  • add abstract interpretation for miniKanren to speed up the synthesis
  • use stochastic/probabilistic extensions to miniKanren to improve synthesis capabilities. For example, see:

Eric Schkufza, Rahul Sharma, and Alex Aiken. 2013. Stochastic superoptimization. In Proceedings of the eighteenth international conference on Architectural support for programming languages and operating systems (ASPLOS '13). ACM, New York, NY, USA, 305-316. DOI=http://dx.doi.org/10.1145/2451116.2451150https://cs.stanford.edu/people/sharmar/pubs/asplos291-schkufza.pdf

POSSIBLE USE CASES:

  • write simple implementation of a function, generate test from that function, then use those tests to guide the more sophisticated implementation. Or more generally, continually test the partially-implemented function vs the fully implemented but perhaps less efficient function.

  • Matt Might suggests as a use case, "automatic program repair after bug discovery," and points to work by Stephanie Forrest. I really like this idea. Here's how the use case might work:

You write tests and code. The tests pass. Later you find an error in the code, so you go back and add more tests, which fail.

Click a Barliman 'auto-repair' button. Barliman tries, in parallel, removing each subexpression and trying synthesis to fill in the rest.

If Barliman could use a Amazon server with dozens of hardware cores and 2TB RAM (like the new X1 server on AWS), this really could be done in parallel.

Or run locally until there's a timeout, then run again with the holes in other places. Could even try pairs of holes to keep the synthesis problem as small as possible.

Or, perhaps more practical short term until Barliman's synthesis improves...

Have Barliman try removing each subexpression and then check if any of the tests still fail. Then hilight these known bad subexpressions to help guide the user.

Greg Rosenblatt's suggestion for auto-repair: "The user may also want to mark some regions of the code as suspect, which would prioritize the area searched for problematic sub-expressions. If the user is right, the fix could be found much sooner."

SUSPECT IDEAS:

  • could just call out to Scheme once the program becomes grounded. However, the semantics and even the grammar may not match that of the interpreter used by miniKanren, so this seems difficult or impossible to do properly. However, could call a (non-relational) interpreter for miniScheme.

INTERESTING IDEAS:

  • Tom Gilray suggests using a simplified intermediate representation (IR) that disallows shadowing, has if but not cond, etc. Could have the IR be the macro expanded code. Could possibly reverse engineer/infer macro calls that could have produced the IR.
  • Tom Gilray suggests changing the interface to just have a single editor window, which allows for definitions and for test calls/results. I suspect this is the right way to go, although it will involve significant changes to Barliman. Tom also suggests having arrows to the right of each logic variable, showing the current value of each variable.
  • perhaps use delayed goals to implement arithmetic over floating point numbers, and other tricky operations. If the arguments do not become instantiated enough, Barliman should be non-committal (can't synthesize code, and can't prove tests are not consistent with the code until the code is more instantiated).
  • Greg Rosenblatt suggests dividing tests into a 'training' set and a 'test' set, as is done in machine learning to avoid overfitting. Of course this could also lead into 'property-based testing', generators, etc.
  • Jonas Kölker suggests synthesizing multiple definitions, or perhaps even all of the Haskell Prelude-style list functions, simultaneously, based on the relationships between the functions, and their Quickcheck-style properties. He also suggests using properties like reverse (xs++ys) == reverse ys ++ reverse xs and map f (xs++ys) == map f xs ++ map f ys for synthesis.

KNOWN LIMITATIONS:

  • The Best Guess pane cannot be resized vertically, which sometimes cuts off text.
  • Non-specific error indication (the color changes for the text, but does not show which part of the text caused the error)
  • Currently the UI only supports 6 tests. Should allow more tests to be added.
  • Test inputs and outputs are NSTextFields rather than NSTextViews, which makes writing longer and more complicated tests awkward.

KNOWN ERRORS:

  • let expressions are not parsed properly. In particular (let ((x)) y) parses. Probably other expressions are also parsed a little too laxly.
  • Shadowing of syntax is no longer working. (and and #t) => #t should result in a syntax error, unless the Definitions pane contains a defn named 'and': (define and (lambda x x)). In which case, (and and #t) should be legal syntax. (Wonder if we broke this when we messed with the environment in evalo.)
  • It is possible, rarely, to exit Barliman and still have a Scheme process running in the background. Need a way to better track which processes have been started and make sure to kill them. Or potentially use something like ulimit when launching a process.
  • The miniKanren queries constructed by Barliman expose several local variable names and a number of global variable names that could accidentally or intentionally be used by the programmer. Need to tighten this up.
  • closing one of the windows means the window cannot be reopened! Oops. I'm not going to worry about this until I decide what to do with the Semantics window.
  • sometimes the spinners stop spinning but are still visible

DONE (features on the TODO list implemented since the original release of Barliman)

  • Implemented monospace font, as recommended by Devon Zeugel.
  • Fixed error: An illegal s-expression in the 'Definitions' edit pane will make test input/output expressions that are legal expressions appear to be illegal.
  • Fixed error (by removing auto-insert of right parens): auto-insert of right parens and auto-insert of logic variables breaks 'undo'. Guess I need to learn how 'undo' works in Cocoa...
  • Fixed error: The '0' key no longer seems to work.
  • Fixed error: (lambda (,A ,B) ...) should always produce a disequality constraint between A and B.
  • Fixed error: Will modified the miniKanren reifier to remove unnecessary constraints involving gensyms. Alas, he goofed it, and the reifier removes too many constraints, including some =/= and absento constraints in play when gensyms are used.
  • Fixed error: scoping problem introduced by optimizing variable lookup.
  • Fixed error: Inserting a new logic variable with Control- doesn't replace highlighted text. Instead, it adds the variable and keeps the hilighted text.
  • changed reifier and main edit window to display constraints separately from "best guess" definition(s).
  • fixed: quickly tab-deleting a test with a syntax error (for example) can leave the 'syntax error message' on an empty test
  • added automatic addition of right parens, and auto-addition of logic variables (thanks Michael Ballantyne, Guannan Wei, Pierce Darragh, Michael Adams, for discussions on how this might work)
  • changed reifier so that constraints involving gensym are not displayed.
  • Fixed error: even if the "best guess" query terminates with success, individual test processes may still keep running, since those tests don't "know" that the best guess involving all the tests has succeeded. If the best guess query terminates with success, the individual test processes should be killed, and marked as successful (black text, stop the progress spinner).
  • updated letrec to allow for zero or more bindings, and updated begin to allow for zero or more definitions; this allows the creation of mutually-recursive functions.
  • define grammar for microScheme (and the other languages) as a miniKanren relation, and use this grammar to separately check and report whether the definition is grammatically correct.
  • cancel the allTests operation if any single test fails, since in that case allTests cannot possibly succeed
  • wait part of a second to see if there are more keystrokes before launching Scheme processes. Sort of like XCode (I assume XCode is doing this). Would be more resource friendly, less distracting, and would make typing quickly more responsive. Could probably do this using a timer.
  • add ability to change the evaluator rules and perhaps an explicit grammar as well

From Bezos to Walton, Big Investors Back Fund for ‘Flyover’ Startups


It turns out that while they were publicly crisscrossing America, they were also privately holding meetings with some of the wealthiest individuals and families in the country, urging them to not only invest in a new fund but become partners with some of the companies that will benefit from it.

On Tuesday, the fund, called Rise of the Rest, will disclose its investors, a list that has turned into a Who’s Who of American business. Among them: Jeff Bezos, the founder of Amazon and now the world’s richest person; Eric Schmidt, chairman of Google’s parent, Alphabet; Howard Schultz, chairman of Starbucks; Tory Burch, the fashion mogul; Ray Dalio, founder of the hedge fund Bridgewater Associates; Dan Gilbert, the founder of Quicken Loans who has remade Detroit; Henry Kravis, the co-founder of KKR; David Rubenstein, the co-founder of Carlyle Group; Michael Milken, the financier and philanthropist; John Doerr, the venture capitalist; Jim Breyer, one of the first investors in Facebook; as well as members of three wealthy families: the Waltons, the Kochs and the Pritzkers.

Also on the list are Sean Parker, a former president of Facebook; Sara Blakely, the founder of Spanx; Ted Leonsis, the sports team owner and investor; Jeff Vinik, the Florida billionaire and sports franchise owner; Byron Trott, Warren Buffett’s favorite banker; and Adebayo Ogunlesi, the lead director of Goldman Sachs and a large infrastructure investor. All told, it may be the greatest concentration of American wealth and power in one investment fund.

Photo: Mr. Case, right, aboard a bus in Columbus, Ohio, during a barnstorming tour to promote start-ups in the middle of the country. “While the network in Silicon Valley is obviously something that is great, it can also have an exclusionary effect,” he said. Credit: Maddie McGarvey for The New York Times

The idea — far grander than the money itself, which is only $150 million to start, pocket money for most of the investors — was to assemble a dream team and create a network effect for entrepreneurs in the middle of the country to align with the biggest names in business.

The fund, said Mr. Vance, was meant to construct an ecosystem like the one in Silicon Valley that will provide support and connections to entrepreneurs in small towns.

“People tend to follow their networks,” said Mr. Vance, who was recruited to the effort by Mr. Case. “While the network in Silicon Valley is obviously something that is great, it can also have an exclusionary effect that prevents investors from looking at and finding opportunities that exist outside of their networks and outside of their geographies.”

Mr. Case and Mr. Vance hope to seed investments in start-ups in underserved cities and then bring in some of their big names to invest even more money in them. “We’ll be curating interesting companies,” Mr. Case said.

In other words, if they discover a nascent but promising e-commerce company in Allentown, Pa., they will not only invest in it, but they might also help it establish a relationship with one of the fund’s investors — Mr. Bezos, for example — who might invest even more.

Mr. Case was quick to say that the new enterprise should not be considered a social impact fund, which has become the hot nomenclature for investors seeking to do good. Some consider social impact investing a sort of pseudophilanthropy because the returns aren’t always the main goal.

Mr. Case said he would only succeed in changing the way investors think about the rest of the country if he can produce significant financial success stories.

“We’re fans of impact investing,” he said. “But we actually didn’t position this as an impact fund. First and foremost, our goal was to generate top returns.”

Mr. Schmidt of Alphabet said he was sold on the idea from the moment he first heard about it. “I felt it was a no-brainer,” he said. “There is a large selection of relatively undervalued businesses in the heartland between the coasts, some of which can scale quickly.”

Photo: Mr. Case said the fund would be “curating interesting companies” for its investors, but should not be considered an exercise in social impact investing. “First and foremost, our goal was to generate top returns,” he said. Credit: Maddie McGarvey for The New York Times

And while it isn’t an altruistic investment, he said he hoped the effort creates “more jobs, more wealth, better products and helps our society deal with a lot of jarring employment changes.”

One of the great questions about American capitalism is why big investors, who are always looking for opportunities and are often willing to fly halfway around the world to find them in places like China and India, put so little effort into finding investment in the middle of the country. California, New York and Massachusetts attract 76 percent of all venture capital money, according to the National Venture Capital Association.

One answer may be that the nation’s top talent is drawn to the coasts, but Mr. Breyer said he has seen that changing.

“Entrepreneurs are now creating start-ups in lots of places,” he said. “The confluence of social networking technologies and cloud services makes it easier and less costly than ever before to develop and bring new technologies to market” for entrepreneurs outside of centers like Silicon Valley.

One example of a company that could be a model for the fund is one that Mr. Case has already invested in: 75f, located in Burnside, Minn., which works to make commercial buildings more comfortable and energy-efficient.

Critics may argue that the fund is a marketing exercise for elites trying to fix a problem of their own making. And there may be some truth to that. But like most great entrepreneurs, these investors have a genuine interest in finding the next Facebook or Google, and they get excited about the prospect of discovering a needle in the start-up haystack.

And they recognize that the next big idea may come from someplace totally unexpected — somewhere not on either coast or in one of the other pockets of entrepreneurship.

“There’s still more talent (engineers, execs) — and resources (funding), but the costs and competition for talent and housing are extreme” in Silicon Valley, Mr. Doerr wrote in an email, suggesting perhaps that it may be a less attractive location than before.

“In the end, don’t think of ‘the Valley’ as some ZIP codes in California,” he wrote. “It is a state of mind that can be anywhere, for everyone. Increasingly, for all of the rest of us.”


Building a Regex Engine in Fewer Than 40 Lines of Code


I stumbled upon an article the other day where Rob Pike implements a rudimentary regular expression engine in C. I converted his code to Javascript and added test specs so that someone can guide themselves through the creation of the regex engine. The specs and solution can be found in this GitHub repository. This blog post walks through my solution.

The Problem

Our regex engine will support the following syntax:

Syntax | Meaning                                      | Example | Matches
a      | Matches the specified character literal      | q       | q
*      | Matches 0 or more of the previous character  | a*      | “”, a, aa, aaa
?      | Matches 0 or 1 of the previous character     | a?      | “”, a
.      | Matches any character literal                | .       | a, b, c, d, e …
^      | Matches the start of a string                | ^c      | c, ca, caa, cbb …
$      | Matches the end of a string                  | a$      | ba, baaa, qwerta …

The goal is to provide a syntax robust enough to match a large portion of regex use cases with minimal code.
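To make the table concrete, here is a quick sketch of how the finished engine should behave, with one call per row of the table above. The search function used below is the entry point that gets built up over the rest of the post, so treat these as expected results rather than code you can run at this point:

// Expected behaviour of the finished engine (assumes the search function defined later in this post).
console.log(search("q", "q"))       // true (literal character)
console.log(search("a*b", "aaab"))  // true (* matches zero or more a's)
console.log(search("ab?c", "ac"))   // true (? makes the b optional)
console.log(search(".", "z"))       // true (. matches any single character)
console.log(search("^c", "cab"))    // true (^ anchors the match to the start)
console.log(search("a$", "ba"))     // true ($ anchors the match to the end)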

Matching One Character

The first step is to write a function that takes in a one character pattern and a one character text string and returns a boolean indicating if they match. A pattern of . is considered a wildcard and matches against any character literal.

Here are some examples

matchOne('a', 'a') -> true
matchOne('.', 'z') -> true
matchOne('', 'h') -> true
matchOne('a', 'b') -> false
matchOne('p', '') -> false

function matchOne(pattern, text) {
  if (!pattern) return true
  if (!text) return false
  if (pattern === ".") return true
  return pattern === text
}

Matching Same Length Strings

Now we want to add support for patterns and text strings of greater length. For now, let’s only consider a pattern/text pair of the same length. I happen to know that the solution lends itself very naturally to recursion, so we will use it here. We are going to want to repeatedly invoke matchOne on successive pairs of characters from the pattern/text combination.

function match(pattern, text) {
  if (pattern === "") return true
  else return matchOne(pattern[0], text[0]) && match(pattern.slice(1), text.slice(1))
}

The above code advances character by character across the pattern/text pair. It first compares pattern[0] to text[0] and then pattern[1] to text[1] and continues comparing pattern[i] to text[i] until i === pattern.length - 1. If they ever don't match, then we know that the pattern cannot match the text.

Let’s take an example. Suppose we invoke match('a.c', 'abc'), which returns matchOne('a', 'a') && match('.c', 'bc').

If we continue evaluating these functions, we get matchOne('a', 'a') && matchOne('.', 'b') && matchOne('c', 'c') && match("", ""), which is just equal to true && true && true && true, so we have a match!

The $ Character

Let’s add support for the special pattern character $ that allows us to match the end of a string. The solution simply requires adding an additional base case to the match function.

function match(pattern, text) {
  if (pattern === "") return true
  if (pattern === "$" && text === "") return true
  else return matchOne(pattern[0], text[0]) && match(pattern.slice(1), text.slice(1))
}

The ^ Character

Let’s add support for the special pattern character ^ that allows us to match the beginning of a string. I’m going to introduce a new function called search.

function search(pattern, text) {
  if (pattern[0] === "^") {
    return match(pattern.slice(1), text)
  }
}

This function will be the new entry point to our code. Up till this point, we were only matching patterns that began at the beginning of the text. We are simply making that more clear now by forcing the user to preface the pattern with a ^. But how do we support patterns that appear anywhere within the text?

Matches Starting Anywhere

Currently, the following return true

search("^abc", "abc")
search("^abcd", "abcd")

But search("bc", "abcd") will just return undefined. We want it to return true.

If the user does not specify that the pattern matches the beginning of the text, then we want to search for that pattern at every possible starting point within the text. We will default to this behavior if the pattern does not begin with ^.[1]

function search(pattern, text) {
  if (pattern[0] === "^") {
    return match(pattern.slice(1), text)
  } else {
    return text.split("").some((_, index) => {
      return match(pattern, text.slice(index))
    })
  }
}

The ? Character

We want to be able to match 0 to 1 of the character before ?.

Here are some examples

search("ab?c", "ac") -> true
search("ab?c", "abc") -> true
search("a?b?c?", "abc") -> true
search("a?b?c?", "") -> true

The first step is to modify match to detect when a ? character is present and then delegate to the matchQuestion function, which we will define shortly.

function match(pattern, text) {
  if (pattern === "") {
    return true
  } else if (pattern === "$" && text === "") {
    return true
  } else if (pattern[1] === "?") {
    return matchQuestion(pattern, text)
  } else {
    return matchOne(pattern[0], text[0]) && match(pattern.slice(1), text.slice(1))
  }
}

matchQuestion needs to handle two cases:

  1. Where the character before the ? is not matched but the text matches the remainder of the pattern (everything after the ?).
  2. Where the character before the ? is matched and the rest of the text (minus the 1 matched character) matches the remainder of the pattern.

If either of these cases is truthy, then matchQuestion can return true.

Let's consider the first case. How do we check if the text matches everything in the pattern except the _? syntax? In other words, how do we check if the character before the ? appears 0 times? We strip 2 characters off the pattern (the first character is the one before the ? and the second is the ? itself) and invoke the match function.

function matchQuestion(pattern, text) {
  return match(pattern.slice(2), text);
}

The second case is a little more challenging, but just like before, it reuses functions we've already written.

function matchQuestion(pattern, text) {
  if (matchOne(pattern[0], text[0]) && match(pattern.slice(2), text.slice(1))) {
    return true;
  } else {
    return match(pattern.slice(2), text);
  }
}

If the text[0] matches pattern[0], and the rest of the text (minus the part that is matched by matchOne) matches the remainder of the pattern, then we are golden. Note that we could rewrite the code like this:

function matchQuestion(pattern, text) {
  return (matchOne(pattern[0], text[0]) && match(pattern.slice(2), text.slice(1))) || match(pattern.slice(2), text);
}

The one thing I like about this latter approach is that the boolean OR makes it explicitly clear that there are two cases, either of which may be true.
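As a small illustration of those two cases (a sketch that assumes the matchOne and match functions defined above, using the b?c fragment from the ab?c examples):

// Both cases of matchQuestion in action.
console.log(matchQuestion("b?c", "c"))  // true: zero occurrences of b (case 1)
console.log(matchQuestion("b?c", "bc")) // true: one occurrence of b (case 2)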

The * Character

We want to be able to match the character before the * 0 or more times.

All of these should return true.

search("a*", "")
search("a*", "aaaaaaa")
search("a*b", "aaaaaaab")

Similar to what we did when supporting ?, we want to delegate to a matchStar function within our match function.

function match(pattern, text) {
  if (pattern === "") {
    return true
  } else if (pattern === "$" && text === "") {
    return true
  } else if (pattern[1] === "?") {
    return matchQuestion(pattern, text)
  } else if (pattern[1] === "*") {
    return matchStar(pattern, text)
  } else {
    return matchOne(pattern[0], text[0]) && match(pattern.slice(1), text.slice(1))
  }
}

matchStar, like matchQuestion, also needs to handle two cases:

  1. Where the character before the * is not matched but the text matches the remainder of the pattern (everything after the *).
  2. Where the character before the * is matched one or more times and the rest of the text matches the remainder of the pattern.

Since there are two cases that both result in a match (0 matches OR more matches), we know that matchStar can be implemented with a boolean OR. Furthermore, case 1 for matchStar is exactly the same as it was for matchQuestion and can be implemented identically using match(pattern.slice(2), text). That means we only need to formulate an expression that satisfies case 2.

function matchStar(pattern, text) {
  return (matchOne(pattern[0], text[0]) && match(pattern, text.slice(1))) || match(pattern.slice(2), text);
}
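A few quick checks of matchStar (again a sketch that assumes the functions defined above; the patterns echo the a* examples earlier in this section):

// matchStar matches the character before the * zero or more times.
console.log(matchStar("a*b", "b"))    // true: zero a's (case 1)
console.log(matchStar("a*b", "aaab")) // true: several a's (case 2)
console.log(matchStar("a*b", ""))     // false: the b is still required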

Refactoring

We can now go back and cleverly simplify search using a trick I learned in Peter Norvig’s class.

function search(pattern, text) {
  if (pattern[0] === "^") {
    return match(pattern.slice(1), text)
  } else {
    return match(".*" + pattern, text)
  }
}

We use the * character itself to allow for the pattern to appear anywhere in the string. The prepended .* says that any number of any character can appear before the pattern we wish to match.
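As a quick sanity check of the refactored version (a sketch that assumes all of the functions defined above), the ".*" trick gives the same results as the explicit index loop it replaced:

// Sanity checks for the refactored search.
console.log(search("bc", "abcd"))   // true: equivalent to match(".*bc", "abcd")
console.log(search("^bc", "abcd"))  // false: an anchored pattern must match at the start
console.log(search("a*b$", "aaab")) // true: the pieces compose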

Conclusion

It's remarkable how simple and elegant the code for such a sophisticated and generalized program can be. The full source is available in this GitHub repository.

[1]: There is a small bug in this code that I'm choosing to ignore. We don't account for the case that text is an empty string. Currently when text === '', text.split("") will return [] and will not appropriately call match.
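For readers who do want to handle that case, one possible fix (offered as a sketch, not as part of the original article) is to iterate over the starting indices directly in the split-based version of search, so an empty text still gets a single attempt at index 0:

// A sketch of one way to fix the empty-string case in the split-based search.
// Assumes the match function defined earlier in the post.
function search(pattern, text) {
  if (pattern[0] === "^") {
    return match(pattern.slice(1), text)
  }
  // Try every starting index, including text.length, so "" is still attempted once.
  for (let index = 0; index <= text.length; index++) {
    if (match(pattern, text.slice(index))) return true
  }
  return false
}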

The compelling case for working less


When I moved to Rome from Washington, DC, one sight struck me more than any ancient column or grand basilica: people doing nothing.

I’d frequently glimpse old women leaning out of their windows, watching people pass below, or families on their evening strolls, stopping every so often to greet friends. Even office life proved different. Forget the rushed desk-side sandwich. Come lunchtime, restaurants filled up with professionals tucking into proper meals.

Of course, ever since Grand Tourists began penning their observations in the seventeenth century, outsiders have stereotyped the idea of Italian ‘indolence’. And it isn’t the whole story. The same friends who headed home on their scooters for a leisurely lunch often returned to the office to work until 8pm.

Even so, the apparent belief in balancing hard work with il dolce far niente, the sweetness of doing nothing, always struck me. After all, doing nothing appears to be the opposite of being productive. And productivity, whether creative, intellectual or industrial, is the ultimate use of our time.

But as we fill our days with more and more ‘doing’, many of us are finding that non-stop activity isn’t the apotheosis of productivity. It is its adversary.

Researchers are learning that it doesn’t just mean that the work we produce at the end of a 14-hour day is of worse quality than when we’re fresh. This pattern of working also undermines our creativity and our cognition. Over time, it can make us feel physically sick – and even, ironically, as if we have no purpose.

Think of mental work as doing push-ups, says Josh Davis, author of Two Awesome Hours. Say you want to do 10,000. The most ‘efficient’ way would be to do them all at once without a break. We know instinctively, though, that that is impossible. Instead, if we did just a few at a time, between other activities and stretched out over weeks, hitting 10,000 would become far more feasible.

“The brain is very much like a muscle in this respect,” Davis writes. “Set up the wrong conditions through constant work and we can accomplish little. Set up the right conditions and there is probably little we can’t do.”

Do or die

Many of us, though, tend to think of our brains not as muscles, but as a computer: a machine capable of constant work. Not only is that untrue, but pushing ourselves to work for hours without a break can be harmful, some experts say.

“The idea that you can indefinitely stretch out your deep focus and productivity time to these arbitrary limits is really wrong. It’s self-defeating,” says research scientist Andrew Smart, author of Autopilot. “If you’re constantly putting yourself into this cognitive debt, where your physiology is saying ‘I need a break’ but you keep pushing yourself, you get this low-level stress response that’s chronic – and, over time, extraordinarily dangerous.”

 

One meta-analysis found that long working hours increased the risk of coronary heart disease by 40% – almost as much as smoking (50%). Another found that people who worked long hours had a significantly higher risk of stroke, while people who worked more than 11 hours a day were almost 2.5 times more likely to have a major depressive episode than those who worked seven to eight.

In Japan, this has led to the disturbing trend of karoshi, or death by overwork.

If you’re wondering if this means that you might want to consider taking that long-overdue holiday, the answer may be yes. One study that followed businessmen in Helsinki over 26 years found that taking fewer holidays in midlife predicted both earlier deaths and worse health in old age.

Holidays also can literally pay off. One study of more than 5,000 full-time American workers found that people who took fewer than 10 of their paid holiday days a year had a little more than a one-in-three chance of getting a pay rise or a bonus over three years. People who took more than 10 days? A two in three chance.

Productivity provenance

It’s easy to think that efficiency and productivity are an entirely new obsession. But philosopher Bertrand Russell would have disagreed.

“It will be said that while a little leisure is pleasant, men would not know how to fill their days if they had only four hours’ work out of the 24,” Russell wrote in 1932, adding, “it would not have been true at any earlier period. There was formerly a capacity for light-heartedness and play which has been to some extent inhibited by the cult of efficiency. The modern man thinks that everything ought to be done for the sake of something else, and never for its own sake.”

That said, some of the world’s most creative, productive people realised the importance of doing less. They had a strong work ethic – but also remained dedicated to rest and play.

“Work on one thing at a time until finished,” wrote artist and writer Henry Miller in his 11 commandments on writing. “Stop at the appointed time!... Keep human! See people, go places, drink if you feel like it.”  

Even US founding father, Benjamin Franklin, a model of industriousness, devoted large swathes of his time to being idle. Every day he had a two-hour lunch break, free evenings and a full night’s sleep. Instead of working non-stop at his career as a printer, which paid the bills, he spent “huge amounts of time” on hobbies and socialising. “In fact, the very interests that took him away from his primary profession led to so many of the wonderful things he’s known for, like inventing the Franklin stove and the lightning rod,” writes Davis.

Even on a global level, there is no clear correlation between a country’s productivity and average working hours. With a 38.6-hour work week, for example, the average US employee works 4.6 hours a week longer than a Norwegian. But by GDP, Norway’s workers contribute the equivalent of $78.70 per hour – compared to the US’s $69.60.

As for Italy, that home of il dolce far niente? With an average 35.5-hour work week, it produces almost 40% more per hour than Turkey, where people work an average of 47.9 hours per week. It even edges the United Kingdom, where people work 36.5 hours.

All of those coffee breaks, it seems, may not be so bad.

Brain wave

The reason we have eight-hour work days at all is that companies found that cutting employees’ hours had the opposite effect to the one they expected: it increased their productivity.

During the Industrial Revolution, 10-to-16-hour days were normal. Ford was the first company to experiment with an eight-hour day – and found its workers were more productive not only per hour, but overall. Within two years, their profit margins doubled.


If eight-hour days are better than 10-hour ones, could even shorter working hours be better still? Perhaps. For people over 40, research found that a 25-hour work week may be optimal for cognition; and when Sweden recently experimented with six-hour work days, it found that employees had better health and productivity.

This seems borne out by how people behave during the working day. One survey of almost 2,000 full-time office workers in the UK found that people were only productive for 2 hours and 53 minutes out of an eight-hour day. The rest of the time was spent checking social media, reading the news, having non-work-related chats with colleagues, eating – and even searching for new jobs.

When we’re pushing ourselves to the edge of our capabilities, we need more breaks than we think. Most people can only handle an hour of deliberate practice without taking a rest

We can focus for an even shorter period of time when we’re pushing ourselves to the edge of our capabilities. Researchers like Stockholm University psychologist K Anders Ericsson have found that when engaging in the kind of ‘deliberate practice’ necessary to truly master any skill, we need more breaks than we think. Most people can only handle an hour without taking a rest. And many at the top, like elite musicians, authors and athletes, never dedicate more than five hours a day consistently to their craft.

The other practice they share? Their “increased tendency to take recuperative naps,” Ericsson writes – one way, of course, to rest both brain and body.

Other studies have also found that taking short breaks from a task helped participants maintain their focus and continue performing at a high level. Not taking breaks made their performance worse.


Active rest

But ‘rest’, as some researchers point out, isn’t necessarily the best word for what we’re doing when we think we’re doing nothing.

As we’ve written about before, the part of the brain that activates when you’re doing ‘nothing’, known as the default-mode network (DMN), plays a crucial role in memory consolidation and envisioning the future. It’s also the area of the brain that activates when people are watching others, thinking about themselves, making a moral judgment or processing other people’s emotions.

In other words, if this network were switched off, we might struggle to remember, foresee consequences, grasp social interactions, understand ourselves, act ethically or empathise with others – all of the things that make us not only functional in the workplace, but in life.

“It helps you recognise the deeper importance of situations. It helps you make meaning out of things. When you’re not making meaning out of things, you’re just reacting and acting in the moment, and you’re subject to many kinds of cognitive and emotional maladaptive behaviours and beliefs,” says Mary Helen Immordino-Yang, a neuroscientist and researcher at the University of Southern California’s Brain and Creativity Institute.


We also wouldn’t be able to come up with new ideas or connections. The birthplace of creativity, the DMN lights up when you’re making associations between seemingly unrelated subjects or coming up with original ideas. It is also the place where your ‘ah-ha’ moments lurk – which means if, like Archimedes, you got your last good idea while in the bath or on a stroll, you have your biology to thank.

Perhaps most importantly of all, if we don’t take time to turn our attention inward, we lose a crucial element of happiness.

“We’re just doing things without making meaning out of it a lot of the time,” Immordino-Yang says. “When you don’t have the ability to embed your actions into a broader cause, they feel purposeless over time, and empty, and not connected to your broader sense of self. And we know that not having a purpose over time is connected to not having optimal psychological and physiological health.”

Monkey mind

But as anyone who has tried meditation knows, doing nothing is surprisingly difficult. How many of us, after 30 seconds of downtime, reach for our phones?

In fact, it makes us so uncomfortable that we’d rather hurt ourselves. Literally. Across 11 different studies, researchers found that participants would rather do anything – even administer themselves electric shocks – than nothing. And it wasn’t as if they were asked to sit still for long: between six and 15 minutes.

The good news is that you don’t have to do absolutely nothing to reap benefits. It’s true that rest is important. But so is active reflection, chewing through an issue you have or thinking about an idea.

In fact, anything that requires visualising hypothetical outcomes or imagined scenarios – like discussing a problem with friends, or getting lost in a good book – also helps, Immordino-Yang says. If you’re purposeful, you can even engage your DMN while looking at social media.

“If you’re just looking at a pretty photo, it’s de-activated. But if you’re pausing and allowing yourself to internally riff on the broader story of why that person in the photo is feeling that way, crafting a narrative around it, then you may very well be activating those networks,” she says.


It also doesn’t take much time to undo the detrimental effects of constant activity. When both adults and children were sent outdoors, without their devices, for four days, their performance on a task that measured both creativity and problem-solving improved by 50%. Even taking just one walk, preferably outside, has been proven to significantly increase creativity.

Another highly effective method of repairing the damage is meditation: as little as a week of practice for subjects who never meditated before, or a single session for experienced practitioners, can improve creativity, mood, memory and focus.

Any other tasks that don’t require 100% concentration can also help, like knitting or doodling. As Virginia Woolf wrote in A Room of One’s Own: “Drawing pictures was an idle way of finishing an unprofitable morning’s work. Yet it is in our idleness, in our dreams, that the submerged truth sometimes comes to the top.”

Time out

Whether it’s walking away from your desk for 15 minutes or logging out of your inbox for the night, part of our struggle is control – the fear that if we relax a grip for a moment, everything will come crashing down.

That’s all wrong, says poet, entrepreneur and life coach Janne Robinson. “The metaphor I like to use is of a fire. We start a business, and then after a year, it’s like, when can we take a week off, or hire someone to come in? Most of us don’t trust someone to come in for us. We’re like, ‘The fire will go out’,” she says.

“What if we just trusted that those embers are so hot, we can walk away, someone can throw a log on and it’ll burst into flames?”

That isn’t easy for those of us who feel like we have to constantly ‘do’. But in order to do more, it seems, we may have to become comfortable with doing less.


Introduction to logic programming with Prolog


The Great Wave off Kanagawa by Katsushika Hokusai

I am a language addict ~ I love learning new languages, spoken ones and those you can do funny things with on a computer. As part of an MSc degree, I have picked the "Programming Paradigms and Languages" module offered by Birkbeck University of London, and nicely delivered by Keith Mannock and Trevor Fenner. The aim of the module was to enable students to understand the fundamental differences between various programming paradigms and the applicability of these paradigms to different programming problems.

This module was the reason why I fell in love with Prolog, a logic programming language heavily used in computational linguistics and artificial intelligence in general.

In this post, I am going to provide a short introduction to the language and its essential features. After reading the post, you should be able to write simple programs in Prolog and understand the language’s underlying primary principles.

Installation

First things first, let us get a working Prolog installation to begin with. For this blog post I am going to use SWI-Prolog (version 7.2.3); please be aware that Prolog dialects may vary. macOS and Windows users can download executables here. On Ubuntu or any other Debian-based distribution, you can use apt-get.

sudo apt-get update && sudo apt-get install swi-prolog

Alternatively, you can also use the official docker image.

docker run -it swipl:stable

docker run -it swipl:latest

Now that we have set the stage, let us dive into Prolog. However, before it gets technical, what's the motivation behind it?

Motivation to learn Prolog (or any other language)

Why should you even bother to learn Prolog at all? Well, there are a couple of reasons, but at this point I'd rather refer to “The Pragmatic Programmer”.

Tip #8 “Invest Regularly in Your Knowledge Portfolio”:

Learn at least one new language every year. Different languages solve the same problems in different ways. By learning several different approaches, you can help broaden your thinking and avoid getting stuck in a rut. Additionally, learning many languages is far easier now, thanks to the wealth of freely available software on the Internet.

Having said that, try to solve a Sudoku puzzle in, for example, Java or C ~ in Prolog you can do it in a few lines of code, as the sketch below suggests.
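
To back up that claim a little, below is one well-known formulation of a Sudoku solver using SWI-Prolog's clpfd constraint library. Treat it as a sketch for motivation; it relies on rules, lists and libraries that the rest of this post introduces step by step.

:- use_module(library(clpfd)).

sudoku(Rows) :-
    length(Rows, 9),
    maplist(same_length(Rows), Rows),
    append(Rows, Cells), Cells ins 1..9,
    maplist(all_distinct, Rows),
    transpose(Rows, Columns),
    maplist(all_distinct, Columns),
    Rows = [As, Bs, Cs, Ds, Es, Fs, Gs, Hs, Is],
    blocks(As, Bs, Cs), blocks(Ds, Es, Fs), blocks(Gs, Hs, Is).

blocks([], [], []).
blocks([N1, N2, N3|Ns1], [N4, N5, N6|Ns2], [N7, N8, N9|Ns3]) :-
    all_distinct([N1, N2, N3, N4, N5, N6, N7, N8, N9]),
    blocks(Ns1, Ns2, Ns3).

A puzzle is then solved by calling sudoku/1 with a 9x9 list of lists in which unknown cells are left as variables, followed by labelling (for example maplist(label, Rows)) to search for concrete values.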

Cool, eh? I can sense that you are getting excited.

However, before we dive into Prolog let’s lay down the base for any further considerations. To start with, what is “Logical programming”?

Logical programming

Say what you want, not how you want it done.

Logical programming is a programming paradigm which has its foundations in mathematical logic. In contrast to languages like Java or C, programs written in a logic programming language such as Prolog are not composed of sequences of instructions but of a set of axioms, or rules, which define relationships between objects. They follow a declarative rather than an imperative approach. However, what does that mean?

Imperative vs declarative programming

Let us consider the “coffee-order” metaphor to make sure we are on the same page. Imagine you walk into your favourite coffee place and would like to order some coffee.

The imperative approach:

  1. Enter the coffee shop
  2. Queue in the line and wait for the barista asking you for your order
  3. Order
  4. Yes, for takeaway, please
  5. Pay
  6. Present your loyalty card to collect points
  7. Take your order and walk out

The declarative approach:

  1. A large latte for takeaway, please

So rather than providing step-by-step instructions to achieve x or y (imperative), you tell the system what you need and let it try to come up with a solution (declarative).

About Prolog

Prolog is based on Horn clauses (a subset of first-order logic) and it is probably the most famous language in the logic programming family. It was a collaborative project by Alain Colmerauer, Philippe Roussel (both University of Aix-Marseille) and Robert Kowalski (University of Edinburgh) and has been around for quite a while. Its first version appeared, like Smalltalk and C, in 1972. The name is an abbreviation for “Programmation en logique” (French for programming in logic).

Prolog has been enormously influential in the domains of theorem proving, expert systems, natural language processing and in the field of artificial intelligence (notably IBM’s Watson) in general. It also significantly influenced the development of the Erlang programming language.

Language Constructs

One thing in particular which appealed to me is its ridiculously simple execution model. Prolog has four building blocks: logical or, logical and, term rewriting, and unification. By combining these four blocks, we can perform any computation we care about.

Prolog, like SQL, has two main aspects, one to express the data and another to query it. The basic constructs of logic programming, terms and statements, are inherited from logic. There are three basic statements:

  • Facts are fundamental assertions about the problem domain (e.g. “Socrates is a man”)
  • Rules are inferences about facts in the domain (e.g. “All men are mortal.“)
  • Queries are questions about that domain (e.g. “Is Socrates mortal?“)

Facts and rules are stored in a knowledge base, which the Prolog compiler transpiles into a form which is more efficient to query against. When we “ask” a question, Prolog does an exhaustive search through the “database” of facts and rules until it finds a result, using backtracking internally.

Basic facts and queries

Prolog has a straightforward syntax. You will pick up the few rules very quickly. Let us transform the Socrates example from earlier into an actual Prolog program and analyse what is happening inside the compiler.

man(socrates).
mortal(X) :- man(X).

?- mortal(socrates).

The first line reads as “Socrates is a man”, it is a base clause, which represents a simple fact. The second line is a rule and translates to, “X is mortal if X is a man” or, “All men are mortal.” This rule is for determining when its input X is “mortal”. Rules are a key concept of the language and allow us to make general statements about objects and their relationships. They consist of a head and a body connected by the turnstile symbol :- (pronounced “if”). The third line reads as “Is Socrates mortal?”, the ?- is the Prolog prompt for a question.

If you have read those three lines of code thoughtfully, you may have observed the case-sensitivity. In contrast to most other languages, capitalisation matters in Prolog. Strings starting with a lowercase character are immutable and called atoms; you can compare them with the symbol type in Ruby. Strings starting with an uppercase character or an underscore are variables and can change their value. In our example socrates is an atom and the uppercase X a variable. Also, note that the full stop after each clause is mandatory.

Let us ask Prolog another question using our previously defined knowledge base.

?- mortal(plato).

Prolog will, of course, respond false, because we do not have Plato defined in our knowledge base. What about the next question:

?- mortal(X).

That is, “Who (X) is mortal?“. Prolog will respond X = socrates and bind Socrates to the variable X. Have you noticed the tremendous difference between the imperative and declarative styles?

In contrast to the imperative way of programming, we have not provided any instructions to the program how exactly the variable should be defined. We have just asked the engine a question, and it automatically bound a value to our variable X! This process of matching variables with items is called unification and is precisely where logic programming has its strengths.

Unification

The assignment statement is fundamental to most imperative programming languages. In Java or Ruby, the expression x = 10 means that the value 10 is assigned to the variable x. Variables in both languages are mutable, meaning that x = 20 re-assigns the value 20 to the variable and the previous value is lost. In Prolog and other declarative languages, variables are only “variable” until they are bound for the first time; from then on they become one with the term they were unified with. Hence we use the term unification: the process of being united or made into a whole. We can also find applications of unification in imperative languages, where it is typically used to enable type inference.

Let us have a look at an example to make sure we get the idea.

?- =(aristotle, aristotle).
true.

?- =(aristotle, socrates).
false.

The =/2 predicate determines whether the two given arguments unify. Since both arguments in the first example are the same atom, Prolog returns true; in the second case it returns false.

Hint: Most constructs in Prolog use prefix syntax by default. There are a couple of built-in predicates such as <, >, = which use infix notation, and you can even define your own pre-, in- and postfix operators. The above example could also have been written as aristotle = aristotle.
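
As a small illustration of the latter point (the operator name teaches is made up purely for this example):

?- op(600, xfx, teaches).
true.

?- socrates teaches plato = teaches(socrates, plato).
true.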

What about:

?- X = Y.

Well, that is an easy one. Both terms are variables, so of course they unify with each other (depending on your Prolog implementation you get true back). When Prolog resolves queries to come up with a conclusion, it tries to unify terms. Therefore a possible definition could be:

Two terms unify if they are the same term or if they contain variables that can be uniformly instantiated with terms in such a way that the resulting terms are equal.

Lists

Hitherto we have only considered basic terms such as atoms, variables, integers and floating point numbers as arguments for our programs. However, in Prolog, like in other logic programming languages, the most common data-structure is the list. Lists in Prolog are a special case of terms. The syntax is identical to Python, they start and end with square brackets, and a comma separates each list item. Here is a simple list:

['ancient philosophy', 1, socrates, 1.23, ['an atom in a nested list']].

Prolog provides a special facility to split the first part of the list (called the head) away from the rest of the list (known as the tail). We can place a special symbol | (pronounced ‘bar’) in the list to distinguish between the first item in the list and the remaining list. For example, consider the following.

?- ['ancient philosophy', 1, socrates, 1.23] = [H|T].
H = 'ancient philosophy',
T = [1, socrates, 1.23].

The unification here succeeds, H is bound to the first item in the list and T to the remaining list.

Prolog provides a bunch of handy operations for list manipulation such as append, flatten, reverse, etc. I will not go into more detail here; just have a look at the docs before you start writing your own predicates.

?- append([plato], ['ancient philosophy', 1, socrates, 1.23], X).
X = [plato, 'ancient philosophy', 1, socrates, 1.23].

The Prolog shell

One note about the Prolog shell. If you have tried to run the above snippets, you may have stumbled across the “Undefined procedure” error.

?- man(socrates).
ERROR: top-level: Undefined procedure: man/1 (DWIM could not correct goal)

The error is caused by Prolog's interactive top-level mode, in which you are only allowed to query the knowledge base. To define facts and rules in the shell, you have to either specify them in a separate file and load it with consult('my_knowledge_base.pl')., add them with assert (assert(man(socrates)).), or consult user by typing [user]. as in the following snippet:

?- [user].
|: man(socrates).
true.

So far so good, we have seen the language’s basics and can now express our problem in Prolog using simple facts and rules. We have also learned how to write queries for asking fundamental questions and how the engine derives conclusions by applying unification.

Let us now have a look at a slightly more specific problem and also introduce recursion, another fundamental concept in the language and logic programming in general.

Recursion

As is commonly the case in many programming tasks, we often like to perform certain operations repeatedly, either over a whole data structure or until certain conditions are met. The way we typically do this in logic programming languages is by recursion. It simply means a program calls itself, typically until some final condition is reached.

In Prolog, what this means is that we have a first fact that acts as a guard (base) condition, followed by one or more rules that perform some operation before invoking the predicate again, as in the sketch below.
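
Here is a minimal sketch of that pattern, summing the elements of a list (the predicate name list_sum is chosen here to avoid clashing with SWI-Prolog's built-in sum_list/2):

% Guard (base) case: the sum of an empty list is 0.
list_sum([], 0).
% Recursive case: sum the tail, then add the head.
list_sum([Head|Tail], Sum) :-
    list_sum(Tail, TailSum),
    Sum is Head + TailSum.

?- list_sum([1, 2, 3], X).
X = 6.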

Four colour theorem

One classic application is the four colour theorem (sometimes called Guthrie’s problem), which remained unsolved for approximately 124 years and was considered a hard problem until Kenneth Appel and Wolfgang Haken finally came up with a proof. The theorem states that, given any separation of a plane into contiguous regions, producing a plane map, no more than four colours are required to tint the regions of the map so that no two adjacent regions have the same colour. Two regions are called adjacent if they share a common boundary that is not a corner, where corners are the points shared by three or more regions.

So let us try to tint the map of all member states of the European Union using the four colour theorem.

First, we start with defining the land borders of each member country as facts in our knowledge base. The predicate neighbours/2 determines the list of neighbours of a country.

neighbours(austria,        [czech_republic, germany, hungary, italy, slovenia, slovakia]).
neighbours(belgium,        [france, netherlands, luxemburg, germany, united_kingdom]).
neighbours(bulgaria,       [romania, greece]).
neighbours(croatia,        [slovenia, hungary]).
neighbours(cyprus,         [greece]).
neighbours(czech_republic, [germany, poland, slovakia, austria]).
neighbours(denmark,        [germany, sweden]).
neighbours(estonia,        [finland, latvia, lithuania]).
neighbours(finland,        [estonia, sweden]).
neighbours(france,         [spain, belgium, luxemburg, germany, italy, united_kingdom]).
neighbours(germany,        [netherlands, belgium, luxemburg, denmark, france, austria, poland]).
neighbours(greece,         [bulgaria, cyprus]).
neighbours(hungary,        [austria, slovakia, romania, croatia, slovenia]).
neighbours(ireland,        [united_kingdom]).
neighbours(italy,          [france, austria, slovenia]).
neighbours(latvia,         [estonia, lithuania]).
neighbours(luxemburg,      [belgium, france, germany]).
neighbours(malta,          []).
neighbours(netherlands,    [belgium, germany, united_kingdom]).
neighbours(poland,         [germany, czech_republic, slovakia, lithuania]).
neighbours(portugal,       [spain]).
neighbours(romania,        [hungary, bulgaria]).
neighbours(slovakia,       [czech_republic, poland, hungary, austria]).
neighbours(slovenia,       [austria, italy, hungary, croatia]).
neighbours(spain,          [france, portugal]).
neighbours(sweden,         [finland, denmark]).
neighbours(united_kingdom, [ireland, netherlands, belgium, france]).

The predicate colour_countries/1 is our main entry point, which we will later use to invoke the program. It first uses setof/3 to create a list of terms of the form Country/Var. It then uses colours/1 to bind each Var in this list to an appropriate colour.

colour_countries(Colours) :-
    setof(Country/_, X^neighbours(Country, X), Colours),
    colours(Colours).

The base clause of colours/1 simply succeeds when there are no elements left in the given list:

colours([]).

For a list with head Country/Colour and tail Rest, the recursive clause of colours/1 first colours all of Rest, then selects a value for Colour from the list of candidates, and finally checks that there is no country in Rest which neighbours the Country just coloured and has the same Colour.

colours([Country/Colour|Rest]) :-
    colours(Rest),
    member(Colour, [green, yellow, red, purple]),
    \+ (member(CountryA/Colour, Rest), neighbour(Country, CountryA)).

The predicate neighbour/2 determines whether or not two given countries are neighbours.

neighbour(Country, CountryA) :-
    neighbours(Country, Neighbours),
    member(CountryA, Neighbours).

The member/2 predicate we have used in colours/1 and neighbour/2 is just a standard membership utility which checks whether X is a member of a given list.

member(X, [X|_]).
member(X, [_|Tail]) :- member(X, Tail).

Let us now execute the program by invoking colour_countries/1.

?- colour_countries(Map).
Map = [
  austria/yellow,
  belgium/purple, bulgaria/yellow,
  croatia/yellow, cyprus/yellow, czech_republic/purple,
  denmark/yellow,
  estonia/red,
  finland/yellow, france/yellow,
  germany/red, greece/green,
  hungary/red,
  ireland/yellow, italy/red,
  latvia/green, luxemburg/green,
  malta/green,
  netherlands/yellow,
  poland/yellow, portugal/yellow,
  romania/green, slovakia/green, slovenia/green, spain/green, sweden/green,
  united_kingdom/green].

That is really all: with just a few dozen lines of code we are done! The logic is ridiculously simple, and yet the program is still easy to follow. Think about how much code you would need to express the same problem in your favourite programming language!

Let us now visually check whether our program has produced a correct solution and plot a map with the colours retrieved from the colour_countries/1 predicate. Observe that no two adjacent regions have the same colour. You can also try to see whether you can solve the problem with, for example, just three colours!

Western Europe - Four Colour Theorem

The complete “Four Colour Theorem” code example is also available on my github account.

What Next?

If you find it as mind-blowing as I do how few lines it takes to write reasonable programs, why don't you try to solve one of those classic puzzles, such as the ”Tower of Hanoi” or ”Escape from Zurg”, on your own? Alternatively, try solving one of the 99 Prolog problems as your daily kata.



Dynamic Configuration with the HAProxy Runtime API


Over the past 15 years, HAProxy has become well known for its reliability, superior performance, extensible features, and advanced security. It is relatively less known that one of HAProxy’s core building blocks is the Runtime API which provides very powerful dynamic configuration capabilities with no service reloads or restarts.


The early improvements to the Runtime API’s dynamic configuration capabilities were driven by requests from advanced HAProxy users such as Airbnb and Yelp through their development of SmartStack, a pioneering automated service discovery and registration platform. Since then, we have been strongly committed to evolving the Runtime API based on feedback from our users, as exemplified by our recent blog post “Dynamic Scaling for Microservices with the HAProxy Runtime API” and the release of HAProxy version 1.8.

In this blog post we are going to take you on a tour of the HAProxy Runtime API and some of its key features, such as the ability to dynamically configure backend servers, maps, ACLs, stick tables, and TLS ticket keys. These features allow for improved integration with service discovery tools and orchestration systems, enhance geolocation functionality, enable adaptive security policies, increase the efficiency of SSL/TLS clusters, and much more.

Getting Started

The HAProxy Runtime API traces its origins back to our wishes to create a complete configuration and statistics API for HAProxy, whose commands would all take effect immediately, during runtime. One of our early features in this API was of course the ability to retrieve detailed real-time statistics. Also, unlike typical newer APIs which only support HTTP, the HAProxy Runtime API was always accessible through TCP and Unix sockets. That is why we sometimes still refer to it as the HAProxy “stats socket” or just “socket”, and why the configuration directive for enabling the Runtime API bears the same name.

The Runtime API is enabled in the HAProxy configuration as shown in the following example:

global
    ...
    stats socket ipv4@127.0.0.1:9999 level admin
    stats socket /var/run/hapee-lb.sock mode 666 level admin
    stats timeout 2m
    ...

If you are not using HAProxy Enterprise Edition, the default Unix socket name in your installation might be named “/var/run/haproxy.sock” instead of “/var/run/hapee-lb.sock”.

Then, for testing or executing commands interactively, the Runtime API can be conveniently accessed using the interactive command prompt provided by “socat”:

$ socat readline /var/run/hapee-lb.sock
$ socat readline tcp4-connect:127.0.0.1:9999

> help

Please note that for the above to work, your version of “socat” needs to be compiled with GNU Readline support. This could be verified by running “socat -V | grep READLINE”. If the output of “socat -V” doesn’t mention “READLINE”, or it mentions “undef READLINE”, or running the actual command produces an error “unknown device/address “readline””, it means you have a version without readline support.

For accessing the Runtime API from scripts, or as an alternative to the interactive use shown above, the following command could be used:

$ echo "help" | socat stdio /var/run/hapee-lb.sock
$ echo "help" | socat stdio tcp4-connect:127.0.0.1:9999

And finally, as an alternative to “socat” altogether, the command “nc” from the package “netcat” could be used. In that case, please use the “netcat-openbsd” version of “netcat”, which also supports the “-U” option for connecting to Unix domain sockets.

The best way to start with the Runtime API is to execute a simple request returning a list of all available commands. This is done by sending “help” or any unknown command to the API. Here is the actual output from “help”, executed on HAProxy version 1.8-dev2. If your HAProxy is missing any particular commands or options, please make sure you are using a recent HAProxy or HAPEE release.

> help

      help           : this message
      prompt         : toggle interactive mode with prompt
      quit           : disconnect
      show errors    : report last request and response errors for each proxy
      clear counters : clear max statistics counters (add 'all' for all counters)
      show info      : report information about the running process
      show stat      : report counters for each proxy and server
      show schema json : report schema used for stats
      disable agent  : disable agent checks (use 'set server' instead)
      disable health : disable health checks (use 'set server' instead)
      disable server : disable a server for maintenance (use 'set server' instead)
      enable agent   : enable agent checks (use 'set server' instead)
      enable health  : enable health checks (use 'set server' instead)
      enable server  : enable a disabled server (use 'set server' instead)
      set maxconn server : change a server's maxconn setting
      set server     : change a server's state, weight or address
      get weight     : report a server's current weight
      set weight     : change a server's weight (deprecated)
      show sess [id] : report the list of current sessions or dump this session
      shutdown session : kill a specific session
      shutdown sessions server : kill sessions on a server
      clear table    : remove an entry from a table
      set table [id] : update or create a table entry's data
      show table [id]: report table usage stats or dump this table's contents
      disable frontend : temporarily disable specific frontend
      enable frontend : re-enable specific frontend
      set maxconn frontend : change a frontend's maxconn setting
      show servers state [id]: dump volatile server information (for backend )
      show backend   : list backends in the current running config
      shutdown frontend : stop a specific frontend
      set dynamic-cookie-key backend : change a backend secret key for dynamic cookies
      enable dynamic-cookie backend : enable dynamic cookies on a specific backend
      disable dynamic-cookie backend : disable dynamic cookies on a specific backend
      show stat resolvers [id]: dumps counters from all resolvers section and
                              associated name servers
      set maxconn global : change the per-process maxconn setting
      set rate-limit : change a rate limiting value
      set timeout    : change a timeout setting
      show env [var] : dump environment variables known to the process
      show cli sockets : dump list of cli sockets
      add acl        : add acl entry
      clear acl  : clear the content of this acl
      del acl        : delete acl entry
      get acl        : report the patterns matching a sample for an ACL
      show acl [id]  : report available acls or dump an acl's contents
      add map        : add map entry
      clear map  : clear the content of this map
      del map        : delete map entry
      get map        : report the keys and values matching a sample for a map
      set map        : modify map entry
      show map [id]  : report available maps or dump a map's contents
      show pools     : report information about the memory pools usage

Retrieving Statistics

For a light introduction before going into the Runtime API’s more powerful commands, let’s demonstrate the method for obtaining real-time statistics from HAProxy.

(One of the already well-known methods for retrieving real-time statistics is HAProxy’s built-in HTML statistics page. These statistics, formatted in a simple and straightforward way, can be obtained after configuring the “stats uri” in HAProxy and visiting the statistics URL.)

The same complete and raw information can also be obtained through the HAProxy Runtime API. The API command is named “show stat”:

> show stat

The command’s output will be provided in CSV format. Support for JSON output has been included in HAProxy version 1.8 and HAProxy Enterprise Edition version 1.7r2.

The statistics output contains more than 80 different metrics. To quickly convert it to a shorter and human readable output, we could use standard command line tools. Here is an example that shows a chosen subset of data and refreshes it every 2 seconds:

$ watch 'echo "show stat" | socat stdio /var/run/hapee-lb.sock | cut -d "," -f 1-2,5-10,34-36 | column -s, -t'

# pxname  svname    scur  smax  slim  stot  bin    bout   rate  rate_lim  rate_max
fe_main   FRONTEND  10    10    7000  83    6347   15476  0     0         10
be_stats  BACKEND   0     0     700   0     0      0      0     0
be_app    websrv1   0     0     0     0     0      0      0
be_app    websrv2   0     0     0     0     0      0      0
be_app    BACKEND   0     0     700   0     0      0      0     0
be_other  websrv1   0     10    322   6347  15476  14     20
be_other  BACKEND   10    10    700   83    6347   15476  0     10

By using the command “cut” above, we have narrowed the selection down to the fields useful for quick and summarized monitoring:

  • scur: current number of sessions
  • smax: highest number of sessions seen since HAProxy was reloaded/restarted or had its statistics reset
  • slim: configured upper limit on the number of sessions
  • stot: cumulative number of sessions processed so far
  • bin: bytes in
  • bout: bytes out
  • rate: average number of sessions per second, calculated over the last second
  • rate_lim: configured upper limit on new sessions per second
  • rate_max: highest number of new sessions per second seen since HAProxy was reloaded/restarted or had its statistics reset

For a listing and description of all the fields available in the “show stat” output, please refer to the HAProxy Management Guide, section 9.1.

Dynamically Scaling Backend Servers

The topic of using the HAProxy Runtime API for dynamically scaling backend servers was covered in one of our previous blog posts titled “Dynamic Scaling for Microservices with the HAProxy Runtime API“. Please refer to it for the complete treatment of the subject, especially for use in microservices environments where the rate of changes is very high and where configuration updates must be done during runtime with absolutely no impact on user experience. A brief summary is included here for completeness.

The Runtime API allows changing backend server addresses, ports, weights, and states, and enabling and disabling backends.

The following configuration directives help us implement dynamic scaling using the Runtime API:

First, we could use “server templates” to quickly define template/placeholder slots for up to n backend servers:

server-template websrv 1-100 192.168.122.1:8080 check disabled

This configuration is equivalent to writing out server websrvX 192.168.122.1:8080 check disabled 100 times, with X automatically replaced by a number incrementing from 1 to 100, inclusive (see the sketch below). The servers are added in the disabled state, and your server template range (“1-100”) is expected to be larger than the number of servers you currently have, to allow for runtime/dynamic scaling up to n configured backend servers.
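
For illustration, the template behaves roughly like the following hand-written equivalent (only the first few of the 100 slots are shown):

server websrv1   192.168.122.1:8080 check disabled
server websrv2   192.168.122.1:8080 check disabled
server websrv3   192.168.122.1:8080 check disabled
# ... and so on, up to websrv100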

Then, after the backend servers are in place (either by using server-template or defining them manually), we could use the Runtime API to configure or update the servers. For example, we could update the IP address and port of ‘websrv1’, and change it from a ‘disabled’ to a ‘ready’ state:

> set server be_template/websrv1 addr 192.168.122.42 port 8080

> set server be_template/websrv1 state ready

In addition, we are also going to ensure that any runtime changes are written out to the state file, and that the state file is loaded on reload/restart:

global
    server-state-file /etc/hapee/haproxy.state

defaults
    load-server-state-from-file global

Please note that the state file is not updated automatically. To save state, you would run the command shown below after making changes to the server states and before invoking an HAProxy reload or restart.
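
For reference, the save step could look like this (the socket and state file paths follow the earlier examples):

$ echo "show servers state" | socat stdio /var/run/hapee-lb.sock > /etc/hapee/haproxy.state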

For the complete treatment of the subject, please refer to our blog post titled “Dynamic Scaling for Microservices with the HAProxy Runtime API“.

Updating Maps

Maps in HAProxy are a general mechanism for converting sets of input values onto corresponding output values. Maps are used in numerous cases, including for converting HTTP “Host” header values into backend (application) names and for mapping ranges of IP addresses to geographical locations. In the blog post titled “Web Application Name to Backend Mapping in HAProxy“, we have shown practical examples of using the map files and we have mentioned the standalone tool lb-update that can be used to periodically and automatically update map and ACL files. A similar functionality could be obtained by using the HAProxy Runtime API.
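
As a reminder of where such a map plugs in, a frontend might reference it roughly as follows (the frontend name, file path and default backend are assumptions for illustration):

frontend fe_main
    bind :80
    # Route based on the Host header; fall back to bk_default when there is no match
    use_backend %[req.hdr(host),lower,map(/etc/hapee/domain2backend.map,bk_default)]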

The process of updating map contents via the HAProxy Runtime API is simple: we are going to list maps and map contents, identify the entries we want to modify, and then perform the actual modifications. Modifications to existing entries via the Runtime API generally consist of two steps – first adding a new value, then deleting the old one to complete the transition.

A complete session for modifying the contents of a map could look like the following:

# List maps
> show map
# id (file) description 
-1 (/etc/hapee/domain2backend.map) pattern loaded from file '/etc/hapee/domain2backend.map' used by map at file '/etc/hapee/hapee-lb.cfg' line 64
-1 (/etc/hapee/domain2redirect.map) pattern loaded from file '/etc/hapee/domain2redirect.map' used by map at file '/etc/hapee1/hapee-lb.cfg' line 65


# Display contents of a particular map
> show map /etc/hapee/domain2backend.map

0x2163350  app1.domain1.com  bk_app1
0x21633d0  app1.domain2.com  bk_app1
0x2163450  app2.domain1.com  bk_app2
0x21634d0  app2.domain2.com  bk_app2
0x2163550  app.domain.com    bk_app1


# Replace the last entry "app.domain.com bk_app1" with "app.domain.com bk_app2"
# Perform the replacement by adding a new value, then deleting the old one
> add  map /etc/hapee/domain2backend.map app.domain.com bk_app2
> del  map /etc/hapee/domain2backend.map #0x2163550

# Verify new contents of the map
> show map /etc/hapee/domain2backend.map

0x2163350  app1.domain1.com  bk_app1
0x21633d0  app1.domain2.com  bk_app1
0x2163450  app2.domain1.com  bk_app2
0x21634d0  app2.domain2.com  bk_app2
0x2163550  app.domain.com    bk_app2

Updating GeoIP Databases

GeoIP databases map IP address ranges to geographical locations. Such databases can be used in HAProxy for different purposes. Oftentimes they are used for performing GeoIP lookups natively within HAProxy and serving the data to backend servers. The backend servers can then rely on the information being available as part of incoming requests, requiring no specific GeoIP code and causing no slowdowns in the application code itself.
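
For instance, here is a minimal sketch of how such a lookup result could be passed on to the backend servers as an HTTP header (the header name and the map file path are assumptions for illustration):

frontend fe_main
    bind :80
    # Look up the client address in the IP-to-country map and forward the result;
    # "ZZ" is an assumed default value for unknown addresses
    http-request set-header X-Client-Country %[src,map_ip(/etc/hapee/ip-country.lst,ZZ)]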

More information about using GeoIP with HAProxy can be found in one of our previous blog posts titled “Using GeoIP Databases with HAProxy“. Another common use case for GeoIP is to include the client country code in the HAProxy logs. This could be achieved using the following log-format directive:

log-format "%ci:%cp_%[src,map_ip(/etc/hapee/ip-country.lst)] [%tr] %ft %b/%s %TR/%Tw/%Tc/%Tr/%Ta %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r"

The above line will use the map_ip converter in order to get the country code from the map file “ip-country.lst”. Logs based on this format will then look like the following:

192.168.122.1:51000_FR [20/Nov/2017:19:34:11.688] fe_main be_app/websrv1 0/0/0/0/0 200 5720 - - --NI 1/1/0/1/0 0/0 "GET / HTTP/1.1"

Now, in terms of updating the GeoIP databases, let’s assume we have the following new GeoIP entries in a map file:

2.0.0.0/12 FR
2.16.0.0/24 FR
2.16.2.0/23 FR
2.16.10.0/24 FR

We could easily add them to the running configuration using the Runtime API command “add map”. Also, instead of adding entries one by one, we are going to use a little bit of bash shell scripting to automate the data import:

# Loop through the map entries on disk and add them to the running configuration
$ IFS=$'\n'
$ for ip_country in $(cat /tmp/more-ip-countries.lst); do
  echo "add map /tmp/ip-country.lst $ip_country"
done | socat stdio  /var/run/hapee-lb.sock

Updating Stick Tables

Stick tables are yet another very powerful and very applicable, but simple to use feature in HAProxy. Stick tables allow keeping track of arbitrary client information across requests, so they are used for purposes such as creating “sticky” sessions, implementing security protections, setting limits, and maintaining arbitrary counters and lists.

Stick tables (along with timers, schedulers, ACLs, and many other features and tools) are based on high-performance Elastic Binary Trees, invented by our HAProxy Technologies CTO, Willy Tarreau. Ebtrees allow these tables to contain millions of records while maintaining extremely fast access. We have blogged about stick tables and their uses in many of our previous blog posts.

The Runtime API allows stick table entries to be added, removed, searched, and updated during runtime. In the following example, we are going to create a stick table of 1 million entries to keep track of the request rates coming from individual client IPs. We will impose a limit of a maximum 5 requests per second, and the IP addresses which keep sending requests with a rate above that limit over a period of 10 seconds will be denied further requests for a period of 15 minutes.

frontend ft_web
    bind 0.0.0.0:8080
    # Stick table definition
    stick-table type ip size 1m expire 15m store gpt0,http_req_rate(10s)
    http-request allow if { src -f /etc/hapee/whitelist.lst }
    http-request track-sc0 src
    acl abuse src_http_req_rate ge 5
    http-request sc-set-gpt0(0) 1 if abuse
    http-request deny if { sc0_get_gpt0 ge 1 }

As mentioned, the above stick table can store up to 1 million entries, and each entry will contain 3 values: an IP address, the determined average HTTP request rate over 10 seconds, and a general purpose tag “GPT0”, which will have a value of “1” if the requests are to be denied. Entries in the table will be removed to not occupy space if the client IP makes no further requests within the configured expiration period (in our example, 15 minutes). Requests coming from the whitelisted IPs will be passed through without imposing any restrictions on them.

All of this will work automatically and require no manual intervention, thanks just to the configuration lines above.

However, we might still want to be able to manipulate the stick table during runtime, for example to change the entries’ GPT0 flag, to remove entries altogether, or to list currently blocked IPs. This can again be done using the Runtime API:

# List the contents of the stick table
> show table ft_web
# table: ft_web, type: ip, size:1048576, used:2
0x1c1efe0: key=127.0.0.1 use=0 exp=5600 gpt0=1 http_req_rate(10000)=50
0x1c0d8e0: key=192.168.122.1 use=0 exp=7713 gpt0=0 http_req_rate(10000)=1


# Set an entry's GPT0 value to 0
> set table ft_web key 127.0.0.1 data.gpt0 0


# Delete the entry
> clear table ft_web key 127.0.0.1

# Verify successful deletion
> show table ft_web
# table: ft_web, type: ip, size:1048576, used:2
0x1c0d8e0: key=192.168.122.1 use=0 exp=7700 gpt0=0 http_req_rate(10000)=0


# List remaining blocked client IPs (those with tag GPT0 value greater than 0)
> show table ft_web data.gpt0 gt 0

Updating ACLs

To extend the above example of rate-limiting the incoming HTTP requests, let’s assume we would now want to change the existing rate limit from 10 to 60 requests per second.

To do so, the HAProxy Runtime API will certainly allow us to change the ACLs during runtime. We will execute the command “show acl” to see the configured ACLs, then identify the one we want to modify, and finally perform the actual modification. The modification will consist of adding a new value and then deleting the old one to complete the transition.

A complete session could look like the following:

> show acl
# id (file) description               
0 () acl 'path_beg' file '/etc/hapee/hapee-lb.cfg' line 60
1 () acl 'nbsrv' file '/etc/hapee/hapee-lb.cfg' line 61
2 (/etc/hapee/whitelist.lst) pattern loaded from file '/etc/hapee/whitelist.lst' used by acl at file '/etc/hapee/hapee-lb.cfg' line 67
3 () acl 'src' file '/etc/hapee/hapee-lb.cfg' line 67
4 () acl 'src_http_req_rate' file '/etc/hapee/hapee-lb.cfg' line 69
5 () acl 'sc0_get_gpt0' file '/etc/hapee/hapee-lb.cfg' line 71
6 () acl 'path_beg' file '/etc/hapee/hapee-lb.cfg' line 80
7 () acl 'always_true' file '/etc/hapee/hapee-lb.cfg' line 106

# We are interested in the 5th ACL (ID #4)

# Display the ACL's current settings
> show acl #4
0xc9a3e0 10


# The current rate is 10; now add a new rate of 60
> add acl #4 60


# Verify that both values are now present
> show acl #4
0xc9a3e0 10
0xce8f10 60


# Delete the old value in any of the following two ways - by value, or by using the reference number
> del acl #4 10
> del acl #4 #0xc9a3e0


# Verify that only one value now remains in effect
> show acl #4
0xce8f10 60                         

Updating Whitelists and Blacklists

Again extending our original rate limiting example, please notice that we have used the following configuration directive to provision for whitelisted IP addresses:

http-request allow if { src -f /etc/hapee/whitelist.lst }

This configuration line tells HAProxy to fetch the source IP, check it against the list of IP addresses that were loaded from whitelist.lst, and pass the request through without further checks if the IP address is found in the list.

In this example we are going to modify the contents of the whitelist by using the Runtime API – we will be changing an IP address entry from “192.168.1.6” to “192.168.1.9”. As usual, to do so we are going to list the ACLs, identify the entries we want to modify, and then perform the actual modification. The modification will consist of adding the new value and then deleting the old one to complete the transition.

A complete session could look like the following:

> show acl

# id (file) description               
0 () acl 'path_beg' file '/etc/hapee/hapee-lb.cfg' line 60
1 () acl 'nbsrv' file '/etc/hapee/hapee-lb.cfg' line 61
2 (/etc/hapee/whitelist.lst) pattern loaded from file '/etc/hapee/whitelist.lst' used by acl at file '/etc/hapee/hapee-lb.cfg' line 67
3 () acl 'src' file '/etc/hapee/hapee-lb.cfg' line 67
4 () acl 'src_http_req_rate' file '/etc/hapee/hapee-lb.cfg' line 69
5 () acl 'sc0_get_gpt0' file '/etc/hapee/hapee-lb.cfg' line 71
6 () acl 'path_beg' file '/etc/hapee/hapee-lb.cfg' line 80
7 () acl 'always_true' file '/etc/hapee/hapee-lb.cfg' line 106

# We are interested in the third entry (ID #2)

# Display its content
> show acl #2

0x24de920 191.168.1.1                             
0x24de960 191.168.1.2
0x24de9a0 191.168.1.3
0x24de9e0 191.168.1.4
0x24dcf10 191.168.1.5
0x24de300 191.168.1.6
0x24de340 191.168.1.7


# Add the new entry
> add acl #2 192.168.1.9

# Delete the old entry
> del acl #2 #0x24de300

# Verify the result
> show acl #2
0x24de920 191.168.1.1                  
0x24de960 191.168.1.2
0x24de9a0 191.168.1.3
0x24de9e0 191.168.1.4
0x24dcf10 191.168.1.5
0x24de340 191.168.1.7
0x24de300 192.168.1.9

SSL/TLS

Updating TLS Tickets

HAProxy uses tls-ticket-keys to avoid the expensive key renegotiation when an existing client wants to start a new session after closing the previous one. If a client supports session tickets, HAProxy will send it a new session ticket record containing all of the negotiated session data (cipher suite, master secret, etc.). The record will be encrypted with a secret key only known to the HAProxy process, ensuring that the data won’t be read or modified by the client.

The secrets used to encrypt the TLS tickets are generated on HAProxy startup. If an active-active HAProxy cluster is set up and a client moves between nodes, it will have to renegotiate the keys often because the tickets, encrypted with node-specific keys, will not be understood by other nodes in the cluster. In order to avoid this, it is possible to place secrets in a file and also periodically rotate them to maintain perfect forward secrecy. That way all nodes in the cluster will use exactly the same set of secrets and clients will be able to move between HAProxy nodes without any side effects.

This could be done by updating the file containing the keys and reloading HAProxy manually, but it can also be done during runtime by using the HAProxy Runtime API.

A complete session could look like the following:

# List keys files
> show tls-keys
# id (file)               
0 (/dev/shm/hapee_ticket_keys.txt)


# List keys in a particular file
> show tls-keys #0
# id secret               
# 0 (/dev/shm/hapee_ticket_keys.txt)
0.0 vaL65b3ns0mAJbiQRGiQZ84H4C9N1dZAyDYrHXVqG6KRDuXCo8mpwfk6+xPtlM1m
0.1 fQxhSJT8sBKNb6JAFZT11UkzplfXEI1uUijPQUTBysZpNqzT26s2RVARxCoo5E52
0.2 QtxrjBbPrX6z/PljdHIFqmHMH2/Rc5zZzIKklcfBPJa01G6PU9Dp9ixcibeisZxU


# Update key 
> set ssl tls-keys #0 CaF7HpWr0gUByzxDqlbvYXCFT2zqmhnKFAdbM4MyQHfty974QO31Pn1OLJIR92rk
TLS ticket key updated!


# Verify successful update
> show tls-keys #0

# id secret
# 0 (/dev/shm/hapee_ticket_keys.txt)
0.0 fQxhSJT8sBKNb6JAFZT11UkzplfXEI1uUijPQUTBysZpNqzT26s2RVARxCoo5E52
0.1 QtxrjBbPrX6z/PljdHIFqmHMH2/Rc5zZzIKklcfBPJa01G6PU9Dp9ixcibeisZxU
0.2 CaF7HpWr0gUByzxDqlbvYXCFT2zqmhnKFAdbM4MyQHfty974QO31Pn1OLJIR92rk
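
To rotate keys periodically across an active-active cluster, a small script could push a freshly generated key to each node's admin socket, mirroring the command used above (the second socket path is an assumption; the key size matches the 48-byte keys shown in the listing):

# Generate a new 48-byte ticket key and push it to every node's admin socket
NEW_KEY=$(openssl rand -base64 48)
for SOCK in /var/run/hapee-lb.sock /var/run/hapee-lb-node2.sock; do
    echo "set ssl tls-keys #0 $NEW_KEY" | socat stdio "$SOCK"
done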

Configuring OCSP Responses

HAProxy also supports the TLS Certificate Status Request extension, also known as “OCSP stapling”. For more information on configuring the OCSP responses, please see the documentation section 5.1 – crt. OCSP stapling allows HAProxy to send certificate status information to the clients. The information is a signed OCSP response and will avoid the need for a client to send a separate request to the OCSP service hosted by the issuing Certificate Authority. Here is an example of a command used to display the OCSP response from a server:

$ openssl s_client -servername hapee-mh -connect 127.0.0.1:443 -tlsextdebug -status
CONNECTED(00000003)                                     
TLS server extension "server name" (id=0), len=0
TLS server extension "renegotiation info" (id=65281), len=1
0001 - <SPACES/NULS>
TLS server extension "EC point formats" (id=11), len=4
0000 - 03 00 01 02                                       ....
TLS server extension "session ticket" (id=35), len=0
TLS server extension "status request" (id=5), len=0
TLS server extension "heartbeat" (id=15), len=1
0000 - 01                                                .
depth=2 C = FR, ST = France, O = MMH-LAB, OU = SERVER, CN = MMH-CA
verify error:num=19:self signed certificate in certificate chain
verify return:0
OCSP response: 
======================================
OCSP Response Data:
    OCSP Response Status: successful (0x0)
    Response Type: Basic OCSP Response
    Version: 1 (0x0)
    Responder Id: C = FR, ST = France, O = MMH-LAB, OU = SERVER, CN = MMH-OCSP
    Produced At: Nov 20 16:00:00 2017 GMT
    Responses:
    Certificate ID:
      Hash Algorithm: sha1
      Issuer Name Hash: 1A7A927EC43135F4B6FF62065A09E1C7FD02557B
      Issuer Key Hash: A45FE1C0DDC03B42C34306A08DF14ED33B74B86C
      Serial Number: 1000
    Cert Status: good
    This Update: Nov 20 16:00:00 2017 GMT
    Next Update: Nov 30 16:00:00 2017 GMT

Near the end of the output, we can see the information “Cert Status: good”. We can also see in the last line of the output that the next update is expected in 10 days.

Now, if we would want to update the OCSP response in order to change the next update time from 10 to 20 days, we could prepare the response and then load it into the running HAProxy instance by using the Runtime API:

$ openssl ocsp -CAfile intermediate/certs/ca-chain.cert.pem -issuer intermediate/certs/intermediate.cert.pem -cert intermediate/certs/hapee-mmh.cert.pem -url http://localhost:8080 -respout  /tmp/ocsp-resp.der  

Response verify OK
intermediate/certs/hapee-mmh.cert.pem: good
        This Update: Nov 20 16:00:00 2017 GMT
        Next Update: Dec 10 16:00:00 2017 GMT

$ echo "set ssl ocsp-response $(base64 -w 10000 /tmp/ocsp-resp.der)" | socat stdio /var/run/hapee-lb.sock
OCSP Response updated!

And using the same approach as above, we can verify that HAProxy is now serving the updated response:

$ openssl s_client -servername hapee-mh -connect 127.0.0.1:443 -tlsextdebug -status

CONNECTED(00000003)
 ...
 ...
OCSP response: 
======================================
 ...
 ...
    Cert Status: good
    This Update: Nov 20 16:00:00 2017 GMT
    Next Update: Dec 10 16:00:00 2017 GMT

Troubleshooting

When it comes to troubleshooting the configuration or observed application behavior, looking into the HAProxy log files is probably one of the most common courses of action. And it is certainly a good one – HAProxy logs contain all sorts of information, including data about timers, sizes, connection counters, etc. But from time to time we may come across complicated issues or even bugs that need more detailed debugging output than even logs can provide.

Displaying Errors

As mentioned, HAProxy logs errors to its log files, as documented in section 1.3.1, “The Response line”. For example, if we send it simple, invalid traffic such as the following:

$ echo "This is invalid traffic\n\nMore invalid traffic" | nc -v hapee-lab 80
Connection to hapee-lab 80 port [tcp/http] succeeded!
HTTP/1.0 400 Bad request
Cache-Control: no-cache
Connection: close
Content-Type: text/html

400 Bad Request
Your browser sent an invalid request.

Then the logs will show the following error:

 hapee-lab hapee-lb[17817]: 192.168.122.1:53918 [20/Nov/2017:13:29:11.305] fe_main fe_main/ -1/-1/-1/-1/+0 400 +187 - - PR-- 0/0/0/0/0 0/0  ""

But to get even more information about the request in question and even see the contents of the request, we can use the Runtime API command “show errors”:

> show errors
Total events captured on [10/Nov/2017:13:41:58.421] : 3

[10/Nov/2017:13:41:43.328] frontend fe_main (#2): invalid request
  backend  (#-1), server  (#-1), event #2
  src 192.168.122.1:54154, session #15, session flags 0x00000080
  HTTP msg state MSG_RQVER(6), msg flags 0x00000000, tx flags 0x00000000
  HTTP chunk len 0 bytes, HTTP body len 0 bytes
  buffer flags 0x00808002, out 0 bytes, total 46 bytes
  pending 46 bytes, wrapping at 16384, error at position 8:

  00000  This is invalid traffic\n
  00024  \n
  00025  More invalid traffic\n

Dumping Sessions

Depending on the logging configuration, all connections can be logged to HAProxy’s log files – but, as usual, they are logged only after HAProxy gets a reply from the backend servers or when one of its timers expires. In situations where long timeouts are involved or where sessions are taking long to complete, this might cause the logs to never seem to arrive. In such situations, the Runtime API command “show sess” may be used to dump all current sessions and their related information:

> show sess
0xd38210: proto=tcpv4 src=192.168.122.1:55234 fe=fe_main be= srv= ts=02 age=11s calls=1 rq[f=400000h,i=0,an=34h,rx=,wx=,ax=] rp[f=80000000h,i=0,an=00h,rx=,wx=,ax=] s0=[7,8h,fd=1,ex=] s1=[0,10h,fd=-1,ex=] exp=
0xd24dd0: proto=unix_stream src=unix:1 fe=GLOBAL be= srv= ts=02 age=0s calls=1 rq[f=c08202h,i=0,an=00h,rx=10m,wx=,ax=] rp[f=80008002h,i=0,an=00h,rx=,wx=,ax=] s0=[7,8h,fd=2,ex=] s1=[7,4018h,fd=-1,ex=] exp=10m

In the above example, we can see two sessions: the first one is an IPv4 session; the second is a session related to our invoking of the Runtime API. From looking into the output, we can, for example, identify that the IPv4 session is in an early stage of processing because no backend (“be=”) was selected yet. Also, to help us further in troubleshooting complex issues, we may use the command “show sess” with a session ID provided as an argument (“show sess ID”) to get even more details about a particular session.

Closing Sessions

In addition to displaying active sessions using “show sess”, we can also use the Runtime API to close sessions at will by using “shutdown session”:

> shutdown session 0xd38210

Sessions closed this way will appear in the logs with the termination flag “K”, indicating that they were shut down.
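
If we ever need to do this in bulk (for example, to drop every session coming from a single client address), the two commands can be combined in a small shell pipeline. This is only a sketch, assuming the same socket path used throughout this article and using 192.168.122.1 purely as an example source address:

$ echo "show sess" | socat stdio /var/run/hapee-lb.sock \
    | grep 'src=192.168.122.1:' \
    | cut -d: -f1 \
    | while read sess; do
          echo "shutdown session $sess" | socat stdio /var/run/hapee-lb.sock
      done

Each matching line of “show sess” starts with the session pointer (e.g. “0xd38210:”), so cutting on the first colon yields exactly the identifier that “shutdown session” expects.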

We hope you have enjoyed this blog post providing an introduction to the HAProxy Runtime API and showing some of its most common and practical use cases. The complete HAProxy Runtime API documentation can be found in the HAProxy Management Guide, section 9.3.

If you have a subscription to HAProxy Enterprise Edition, we can provide you with authoritative support, scripts, and modules that will help you make the best use of the Runtime API, including its most advanced features listed among the commands but not specifically elaborated in this blog post.

The Runtime API is evolving along with all other features and improvements that we are adding to HAProxy. One of the planned Runtime API improvements is an HTTP REST interface to complement the existing access methods. Let us know what other features and improvements you would like to see included!

Introduction to Hierarchical State Machines


The formalism of Hierarchical State Machines (aka statecharts) makes the state machine approach truly applicable to real-life embedded systems.

In State Machines for Event-Driven Systems, I touched on the benefits of using state machines in programming reactive (event-driven) systems. However, while the traditional Finite State Machines (FSMs) are an excellent tool for tackling smaller problems, it's also generally known that they tend to become unmanageable even for moderately involved systems. Due to the phenomenon known as "state explosion," the complexity of a traditional FSM tends to grow much faster than the complexity of the reactive system it describes. This happens because the traditional state machine formalism inflicts repetitions.

For example, if you try to represent the behavior of just about any nontrivial system (such as the calculator I described in State Machines for Event-Driven Systems) with a traditional FSM, you'll immediately notice that many events (e.g., the Clear event) are handled identically in many states. A conventional FSM, however, has no means of capturing such a commonality and requires repeating the same actions and transitions in many states. What's missing in the traditional state machines is the mechanism for factoring out the common behavior in order to share it across many states.

The formalism of statecharts, invented by David Harel in the 1980s, addresses exactly this shortcoming of the conventional FSMs.1 Statecharts provide a very efficient way of sharing behavior, so that the complexity of a statechart no longer explodes but tends to faithfully represent the complexity of the reactive system it describes. Obviously, a formalism like this is a godsend to embedded systems programmers (or any programmers working on reactive systems), because it makes the state machine approach truly applicable to real-life problems.

Reuse of behavior in reactive systems

All reactive systems seem to reuse behavior in a similar way. For example, the characteristic look-and-feel of all Graphical User Interfaces (GUIs) results from the same pattern, which Charles Petzold calls the "Ultimate Hook".2 The pattern is brilliantly simple. A GUI system dispatches every event first to the application (e.g., Windows calls a specific function inside the application, passing the event as an argument). If not handled by the application, the event flows back to the system. This establishes a hierarchical order of event processing. The application, which is conceptually at a lower level of the hierarchy, has the first shot at every event; thus, the application can customize every aspect of its behavior. At the same time, all unhandled events flow back to the higher level (i.e., to the GUI system), where they are processed according to the standard look-and-feel. This is an example of programming-by-difference because the application programmer needs to code only the differences from the standard system behavior.

Harel statecharts bring the "Ultimate Hook" pattern to its logical conclusion by combining it with the state machine formalism. The most important innovation of statecharts over the classical FSMs is the introduction of hierarchically nested states (which is why statecharts are also called hierarchical state machines).

Figure 1. (a) UML notation for hierarchically nested states; (b) UML state diagram of a PELICAN crossing, in which states vehiclesEnabled and pedestriansEnabled share the common transition OFF to the off state.

The semantics associated with state nesting (shown in Figure 1(a)) are as follows: If a system is in the nested state s11 (called a substate), it also (implicitly) is in the surrounding state s1 (the superstate). This state machine will attempt to handle any event in the context of state s11 (which is at the lower level of the hierarchy). However, if state s11 does not prescribe how to handle the event, the event is not quietly discarded (as in a traditional "flat" state machine); rather, it is automatically handled at the higher level context of state s1. This is what is meant by the system being in state s11 as well as s1. Of course, state nesting is not limited to one level only, and the simple rule of event processing applies recursively to any level of nesting.

As you can see, the semantics of hierarchical state decomposition are designed to facilitate sharing of behavior through the direct support for the "Ultimate Hook" pattern. The substates (nested states) need only define the differences from the superstates (surrounding states). A substate can easily reuse the common behavior from its superstate(s) by simply ignoring commonly handled events, which are then automatically handled by higher level states. In this manner, the substates can share all aspects of behavior with their superstates. For example, in a state model of a PELICAN (PEdestrian LIght CONtrolled) crossing shown in Figure 1(b), states vehiclesEnabled and pedestriansEnabled share a common transition OFF to the off state, defined in their common superstate operational.3

Behavioral inheritance

The fundamental character of state nesting in Hierarchical State Machines (HSMs) comes from combining hierarchy with programming-by-difference, which is otherwise known in software as inheritance. In Object-Oriented Programming (OOP), the concept of class inheritance lets you define a new kind of class rapidly in terms of an old one by specifying only how the new class differs from the old class. State nesting introduces another fundamental type of inheritance, called behavioral inheritance.4 Behavioral inheritance lets you define a new state as a specific kind of another state, by specifying only the differences from the existing state rather than defining the whole new state from scratch.5

As class inheritance allows subclasses to adapt to new environments, behavioral inheritance allows substates to mutate by adding new behavior or by overriding existing behavior. Nested states can introduce new behavior by adding new state transitions or reactions (also known as internal transitions) for events that are not recognized by superstates. This corresponds to adding new methods to a subclass. Alternatively, a substate may also process the same events as the superstates but will do it in a different way. In this manner, the substate can override the inherited behavior, which corresponds to a subclass overriding a (virtual in C++) method defined by its parents. In both cases, overriding the inherited behavior leads to polymorphism.

Liskov substitution principle (LSP) for states

Because behavioral inheritance is just a specific kind of inheritance, the universal law of generalization—the Liskov Substitution Principle (LSP)—should be applicable to state hierarchies as well as class taxonomies. In its traditional formulation for classes, LSP requires that every instance of a subclass must continue to act as though it were also an instance of the superclass. From the programming standpoint, LSP means that any code designed to work with the instance of the superclass should continue to work correctly if an instance of the subclass is used instead.

Figure 2. Complete UML state diagram of the PELICAN crossing.

The LSP extends naturally for hierarchical states and requires in this case that every substate continue to behave as though it were also the superstate. For example, all substates nested inside the vehiclesEnabled state of the PELICAN crossing (such as vehiclesGreen or vehiclesYellow, shown in Figure 2) should share the same basic characteristics of the vehiclesEnabled state. In particular, being in the vehiclesEnabled state means that the vehicles are allowed (by a green or yellow light) and simultaneously the pedestrians are not allowed (by the DON'T WALK signal). To be compliant with the LSP, none of the substates of vehiclesEnabled should disable the vehicles or enable the pedestrians. In particular, disabling vehicles (by switching the red light), or enabling the pedestrians (by displaying the WALK signal) in any of the nested states vehiclesGreen or vehiclesYellow would be inconsistent with being in the superstate vehiclesEnabled and would be a violation of the LSP (it will also be a safety hazard in this case).

Compliance with the LSP allows you to build better (correct) state hierarchies and make efficient use of abstraction. For example, in an LSP-compliant state hierarchy, you can safely "zoom out" and work at the higher level of the vehiclesEnabled state (thus abstracting away the specifics of vehiclesGreen and vehiclesYellow). As long as all the substates are consistent with their superstate, such abstraction is meaningful. On the other hand, if the substates violate basic assumptions of being in the superstate, zooming out and ignoring specifics of the substates will be incorrect.

Guaranteed initialization and cleanup

Every state in a UML statechart can have optional entry actions, which are executed upon entry to a state, as well as optional exit actions, which are executed upon exit from a state.6 Entry and exit actions are associated with states, not transitions. Regardless of how a state is entered or exited, all of its entry and exit actions will be executed.

The value of entry and exit actions is often underestimated. However, entry and exit actions are as important in HSMs as class constructors and destructors are in OOP because they provide for guaranteed initialization and cleanup. For example, consider the vehiclesEnabled state from Figure 1(b), which corresponds to the traffic lights configuration that enables vehicles and disables pedestrians. This state has a very important safety-critical requirement: always disable the pedestrians (by turning on the DON'T WALK signal) when vehicles are enabled. Of course, you could arrange for such a behavior by adding an appropriate action (switching the DON'T WALK signal) to every transition path leading into the vehiclesEnabled state. However, such a solution would potentially cause the repetition of this action on many transitions. More importantly, such an approach is error-prone in view of changes to the state machine. For instance, a programmer upgrading a PELICAN crossing to a PUFFIN (Pedestrian User Friendly INtelligent) crossing might simply forget to turn on the DON'T WALK signal on all transitions into the vehiclesEnabled state or any of its substates.

Entry and exit actions allow you to implement the desired behavior in a safer, simpler, and more intuitive way. You could specify turning on the DON'T WALK signal upon the entry to vehiclesEnabled. This solution is superior because it avoids potential repetitions of this action on transitions and eliminates the basic safety hazard of leaving the WALK signal turned on while vehicles may be allowed into the crossing. The semantics of entry actions guarantees that, regardless of the transition path, the DON'T WALK signal will be turned on when the traffic controller is in the vehiclesEnabled state. (Note that an equally correct alternative design is to switch the DON'T WALK signal in the exit action from pedestriansEnabled and to switch the red light for vehicles in the exit action from vehiclesEnabled.)

Obviously, you can also use entry and exit actions in the classical (nonhierarchical) FSMs. In fact, the lack of hierarchy in this case makes the implementation of this feature almost trivial. For instance, one way of implementing entry and exit actions is to dispatch reserved signals (e.g., ENTRY_SIG and EXIT_SIG) to the state machine. The FsmTran_() method (see State Machines for Event-Driven Systems) could dispatch the EXIT_SIG signal to the current state (transition source) and then dispatch the ENTRY_SIG signal to the target. Such an implementation of the FsmTran_() method might look as follows (the complete source code in C and C++ is available with QP-nano downloads and further described in an application note):


static Event const entryEvt = { ENTRY_SIG };
static Event const exitEvt  = { EXIT_SIG };

void FsmTran_(Fsm *me, State target)
{
   FsmDispatch(me, &exitEvt);    /* exit the source          */
   me->state__ = target;         /* change the current state */
   FsmDispatch(me, &entryEvt);   /* enter the target         */
}

However, entry and exit actions are particularly important and powerful in HSMs because they often determine the identity of hierarchical states. For example, the identity of the vehiclesEnabled state is determined by the fact that the vehicles are enabled and pedestrians disabled. These conditions must be established before entering any substate of vehiclesEnabled because entry actions to a substate, such as vehiclesGreen (see Figure 2), rely on proper initialization of the vehiclesEnabled superstate and perform only the differences from this initialization. Consequently, the order of execution of entry actions must always proceed from the outermost state to the innermost state. Not surprisingly, this order is analogous to the order in which class constructors are invoked. Construction of a class always starts at the top of the class hierarchy and follows through all inheritance levels down to the class being instantiated. The execution of exit actions, which corresponds to destructor invocation, proceeds in the exact reverse order, starting from the innermost state (corresponding to the most derived class).

Hint: Try to make your state machine as much a Moore automaton as possible. (Moore automata associate actions with states rather than transitions.) That way you achieve a safer design (in view of future modifications) and your states will have better-defined identity.

A closer look at the Pelican HSM

Figure 2 shows a complete UML state diagram of the PELICAN crossing (a "zoomed in" version of the diagram from Figure 1(b)). I believe that this HSM represents quite faithfully the behavior of the pedestrian crossing in front of the midtown shopping center on Middlefield Road in Palo Alto. I have tested the crossing several times (the drivers sure loved me for it) and have determined that it operates as follows. Nominally, vehicles are enabled and pedestrians disabled. To activate the traffic light switch, a pedestrian must push a button (let's call this event PEDESTRIAN_WAITING). In response, the vehicles get the yellow light. After a few seconds, vehicles get a red light and pedestrians get the WALK signal, which shortly thereafter changes to a flashing DON'T WALK signal. When the DON'T WALK signal stops flashing, vehicles get the green light. After this cycle, the traffic lights don't respond to the PEDESTRIAN_WAITING event immediately, although the button "remembers" that it has been pushed. The traffic light controller always gives the vehicles a minimum of several seconds of green light before repeating the cycle.

Perhaps the most interesting element of the state model from Figure 2 (except the aforementioned use of entry actions to avoid basic safety hazards) is the way it guarantees the minimal green light time for the vehicles. The HSM models it with two states, vehiclesGreen and vehiclesGreenInt, as well as with the isPedestrianWaiting flag. The state vehiclesGreen corresponds to the uninterruptible green light for the vehicles. The occurrence of the PEDESTRIAN_WAITING event in this state doesn't trigger any transition but merely sets the isPedestrianWaiting flag to "remember" that the button was pressed. The only criterion for exiting vehiclesGreen is the occurrence of the TIMEOUT event, which triggers a transition either to vehiclesYellow (if the isPedestrianWaiting flag is set) or to the vehiclesGreenInt state, the latter corresponding to the interruptible green light for vehicles.

Another interesting aspect of this state model is the generation of the TIMEOUT events by means of only one timer object. (A timer is a facility that dispatches an event to the state machine after a preprogrammed time interval.) Typically, timers are scarce system resources that need to be allocated (here indicated by the SetTimer() action) and recycled (by the KillTimer() action). Note how the state machine enables using only one TIMEOUT event (different states handle the same event differently), and how entry and exit actions help to initialize and clean up the timer. In particular, please note that the timer is never leaked, even in the case of the always enabled OFF transition inherited from the operational superstate.

HSMs in C and C++

If I left this quick introduction to HSMs only at the level of the state diagram from Figure 2, you would probably walk away with the impression that the whole approach is just pie in the sky. The diagram in Figure 2 might look clever, perhaps, but how does it lead to better code?

Contrary to a widespread misconception, you don't need sophisticated CASE tools to translate UML state diagrams to efficient and highly maintainable C or C++.

Figure 3. A GUI application representing the PELICAN crossing. The traffic signals are activated by pedestrians pushing the control button. Pane (a) shows the GUI in the vehiclesEnabled state. Pane (b) shows the GUI in the pedestriansEnabled state.

I have embedded the HSM into a Windows application (see Figure 3), because I needed a timer to dispatch TIMEOUT events to the state machine. Note, however, that the underlying coding technique is not at all Windows- or GUI-specific. In particular, the HSM implementation can be used in embedded systems in conjunction with any infrastructure to execute state machines.

Exercise: Test the GUI application (either C or C++ version) and correlate its behavior with the state diagram from Figure 2 (you might find the current state display handy). Subsequently, set breakpoints at all exit actions from states and convince yourself that the application never leaks the Windows timer.

While I don't have room here for a detailed discussion of the HSM implementation that you'll find online, I'd like to mention that it is a straightforward extension of the FSM technique I presented in State Machines for Event-Driven Systems. As before, State is defined as a pointer-to-function in C (and pointer-to-member-function in C++); however, the signature of the state handler methods has changed. In an HSM, the state handler additionally returns the superstate, thus providing information about the nesting of a given state. More precisely, a state handler method must return a pointer to the superstate handler if it doesn't handle the event, or NULL if it does. To guarantee that all states have superstates, the QHsm class (the base class for derivation of HSMs, analogous to the Fsm class from State Machines for Event-Driven Systems) provides the top state as the ultimate root of the state hierarchy. The top state handler cannot be overridden and always returns NULL. These conventions make the implementation of the QHsmDispatch() method simple:


void QHsmDispatch(QHsm *me, QEvent const *e)
{
   me->source__ = me->state__;
   do
   {
      me->source__ = (QState)(*me->source__)(me, e);
   } while (me->source__ != 0);
}

As you can see, every event dispatched to an HSM passes through a chain of state handlers until a state handler returns NULL, which indicates that either the event has been handled or the top state has been reached.
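
To make the convention concrete, here is a minimal sketch of what a state handler for the vehiclesGreen state from Figure 2 might look like with the signature just described. Everything in it (the Pelican type, the signal values, the stubbed neighboring states) is an assumption made purely for illustration and is deliberately simplified; the actual example code available online differs in its details.

#include <stddef.h>

/* Sketch only: the names below are assumed for illustration and do not
 * reproduce the downloadable example code. */
enum { ENTRY_SIG = 1, EXIT_SIG, PEDESTRIAN_WAITING_SIG, TIMEOUT_SIG };

typedef struct { int sig; } QEvent;
typedef struct Pelican Pelican;

/* A handler returns its superstate's handler if it does not handle the event,
 * or NULL if it does; the self-referential return type is broken here with a
 * plain void pointer. */
typedef void *QState;
typedef QState (*QStateHandler)(Pelican *me, QEvent const *e);

struct Pelican {
    QStateHandler state__;       /* current state handler                    */
    int isPedestrianWaiting;     /* "button already pushed" flag of Figure 2 */
};

/* Neighboring states, reduced to empty stubs so the sketch stands alone. */
QState Pelican_vehiclesEnabled (Pelican *me, QEvent const *e) { (void)me; (void)e; return NULL; }
QState Pelican_vehiclesYellow  (Pelican *me, QEvent const *e) { (void)me; (void)e; return NULL; }
QState Pelican_vehiclesGreenInt(Pelican *me, QEvent const *e) { (void)me; (void)e; return NULL; }

QState Pelican_vehiclesGreen(Pelican *me, QEvent const *e)
{
    switch (e->sig) {
    case ENTRY_SIG:
        /* entry action, e.g. SetTimer() for the minimum green time */
        return NULL;                                     /* handled          */
    case PEDESTRIAN_WAITING_SIG:
        me->isPedestrianWaiting = 1;                     /* just remember it */
        return NULL;                                     /* no transition    */
    case TIMEOUT_SIG:
        /* A real implementation would route this through a transition helper
         * so that the exit and entry actions described above get executed. */
        me->state__ = me->isPedestrianWaiting ? &Pelican_vehiclesYellow
                                              : &Pelican_vehiclesGreenInt;
        return NULL;                                     /* handled          */
    }
    return (QState)&Pelican_vehiclesEnabled;             /* defer to superstate */
}

Dispatching PEDESTRIAN_WAITING to this state merely records the request, while any signal the handler does not list (such as OFF) falls through to vehiclesEnabled and, ultimately, to operational, which is exactly the "Ultimate Hook" behavior discussed earlier.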

As a client programmer, you don't need to know the internal details of the QHsm class. The main point is that the resulting HSM implementation provides a very straightforward mapping between the diagram and the code, and it is easy to modify at any stage of the development process. I encourage you to undertake the following two exercises:

Exercise: Modify the state machine to implement the alternative design of switching the DON'T WALK signal in the exit action from pedestriansEnabled and switching the red light for vehicles in the exit action from vehiclesEnabled. Recompile and test.
Exercise: Modify the initial transition in the vehiclesEnabled state to enter the vehiclesGreenInt substate. In addition, change the target of the TIMEOUT transition for state pedestriansFlash from vehiclesEnabled to vehiclesGreen.

Finally, I hope you experienced a strange feeling of deja vu when you read about programming-by-difference, inheritance, guaranteed initialization and cleanup, and the LSP in the context of state machines. Such a close analogy between the fundamental concepts of OOP and HSMs is truly remarkable. Indeed, the analogy adds another dimension to OOP. The traditional OO method seems to stop short at the boundary of a class, leaving the internal implementation of individual class methods to mostly procedural techniques. The concept of behavioral inheritance goes beyond that frontier. Behavioral inheritance allows applying OO concepts inside reactive classes.

There is a lot more information about building embedded software out of state machines at www.state-machine.com.

Related Barr Group Courses: 

Embedded Software Boot Camp

Event-Driven Programming and State Machines

Embedded Software Architecture

For a full list of Barr Group courses, go to our Course Catalog. See our Training Calendar for our latest public training calendar.

Endnotes

1. Harel, David. "Statecharts: A Visual Formalism for Complex Systems." Science of Computer Programming 8, North-Holland, 1987. [back]

2. Petzold, Charles. Programming Windows 95: The Definite Developer's Guide to the Windows 95 API. Microsoft Press, 1996. [back]

3. See "How 2 Do Pedestrian Crossing" for an explanation of Pelican and other types of pedestrian crossings. [back]

4. Samek, Miro, and Montgomery, Paul. "State-Oriented Programming." Embedded Systems Programming, August 2000. [back]

5. Many writings about HSMs cautiously use the term "inheritance" to describe sharing of behavior between substates and superstates (e.g., see the Object Management Group's UML Specification Version 2.2). Please note, however, that the term behavioral inheritance is not part of the UML vocabulary and should not be confused with the traditional (class) inheritance applied to entire state machines (classes that internally embed state machines). [back]

6. While some embedded programmers seem intimidated by Unified Modeling Language (UML), the concerns are largely exaggerated. It takes just a few hours to get acquainted with the most basic UML diagrams. (I get the most mileage from the sequence diagram, state diagram, and class diagram.) The ability to read basic UML diagrams belongs to every embedded programmer's skill set in the same way as the ability to read schematics. If I were to recommend a book, I'd probably start with: Fowler, Martin. UML Distilled: A Brief Guide to the Standard Object Modeling Language (3rd Edition). Addison-Wesley, 2003. [back]

Among the Tax Bill’s Biggest Losers: High-Income, Blue State Taxpayers


While the Republican tax overhaul would add up to an overall tax cut for individual taxpayers, at least through 2025, millions could still immediately receive a tax increase. For many, particularly those in Democratic areas who earn $200,000 or more, the increase would come from the repeal of the state and local tax deduction, known as SALT.

[Chart: Average state and local tax deduction for high-income taxpayers, by county, plotted against the county's margin of support in the 2016 presidential election (more Republican to more Democratic). Highly-populated counties in Democratic-leaning states like California and New York tend to claim much higher SALT deductions. Note: Tax returns for filers earning $200,000 or more. Circle size is proportional to the number of tax returns filed in each county in 2015.]

Because Democratic voters are more concentrated in high-tax states like New York and California, taxpayers in counties that voted for Hillary Clinton take much larger SALT deductions on average. Upper-class taxpayers are much more likely to claim more than the higher proposed standard deduction (roughly $24,000 for couples in both versions of the bill).

States with the Highest Average SALT Deduction for Households Earning $200,000 or More

State                 Average deduction     2016 margin
New York              $84,964               Clinton +22%
California            $64,771               Clinton +30%
Connecticut           $61,997               Clinton +14%
New Jersey            $51,259               Clinton +14%
Washington, D.C.      $51,202               Clinton +87%
Minnesota             $46,591               Clinton +2%
Oregon                $44,928               Clinton +11%
Massachusetts         $42,890               Clinton +27%
Vermont               $41,641               Clinton +26%
Maryland              $41,622               Clinton +26%
Wisconsin             $40,962               Trump +0.8%
Rhode Island          $39,394               Clinton +16%
Maine                 $38,089               Clinton +3%
Hawaii                $37,219               Clinton +32%
Idaho                 $36,641               Trump +32%
Illinois              $36,374               Clinton +17%
Nebraska              $35,957               Trump +25%
Montana               $35,823               Trump +20%
Arkansas              $35,755               Trump +27%
Ohio                  $35,666               Trump +8%
Missouri              $35,459               Trump +19%
Kentucky              $35,172               Trump +30%
Iowa                  $32,939               Trump +9%
Georgia               $32,881               Trump +5%
North Carolina        $32,574               Trump +4%
Delaware              $31,950               Clinton +11%
Pennsylvania          $31,708               Trump +0.7%
Virginia              $31,643               Clinton +5%
West Virginia         $31,387               Trump +42%
Utah                  $30,959               Trump +18%
Michigan              $30,204               Trump +0.2%
South Carolina        $29,916               Trump +14%
Indiana               $29,044               Trump +19%
Kansas                $27,867               Trump +20%
Oklahoma              $27,798               Trump +36%
Colorado              $27,644               Clinton +5%
Florida               $25,974               Trump +1.2%
Wyoming               $25,952               Trump +46%
Arizona               $25,855               Trump +4%
Mississippi           $24,109               Trump +18%
New Hampshire         $23,495               Clinton +0.4%
New Mexico            $23,471               Clinton +8%
Nevada                $23,300               Clinton +2%
Louisiana             $23,220               Trump +20%
North Dakota          $21,454               Trump +36%
Alabama               $21,218               Trump +28%
Texas                 $18,214               Trump +9%
Washington            $16,392               Clinton +16%
South Dakota          $15,943               Trump +30%
Tennessee             $15,886               Trump +26%
Alaska                $9,131                Trump +15%

There are a handful of Republican counties, particularly in New York and New Jersey, that claim high amounts of SALT deductions. Many of the 13 Republican representatives who voted against the House version of the tax bill cited the SALT changes.

The issue has been less controversial among Republican members in the Senate, because none of the senators representing the top 10 states taking the SALT deduction are Republicans.

Why the Wealthiest Are Hit Hardest

Even as it repeals several itemized deductions, the bill nearly doubles the standard deduction to roughly $12,000 for individuals and $24,000 for couples.

While many lower-income people take the SALT deduction, many would end up better off under the bill because the new standard deduction would be worth more than what they deducted in SALT and other itemizations. Experts predict that the share of taxpayers who itemize their deductions will fall to less than 10 percent from 30 percent currently.

Taxpayers earning $200,000 or more, who make up 4.5 percent of all returns, are at the highest risk of a tax increase from the SALT repeal, because many currently deduct much more in state and local taxes than the bill’s new standard deduction.

But not all groups above that income threshold would fare the same. Taxpayers earning from about $500,000 to $1,000,000 are most likely to pay the Alternative Minimum Tax (AMT), an alternative income tax calculation that ensures high-earners pay a minimum amount of taxes. The AMT does not allow the state and local tax deduction, so those who pay it are likely to be hurt less by the SALT repeal because they are not currently receiving the full benefit of the deduction.

Income vs. Property Taxes

The initial version of the Senate bill repealed all of the SALT deductions — for income, property and other taxes. But Senator Susan Collins of Maine, a critical Republican vote, pushed to retain a property tax deduction of up to $10,000, a provision also included in the House bill.

But the income tax deduction would be repealed in its entirety. There are several states, like Tennessee, Texas and Florida, that have no income tax, so people there do not currently benefit from that deduction unless they pay taxes on income earned elsewhere.

[Chart: Average income tax deduction for high-income taxpayers, by county, plotted against the county's margin of support in the 2016 presidential election (more Republican to more Democratic). Florida taxpayers that take the deduction pay taxes on income earned elsewhere. Note: Tax returns for filers earning $200,000 or more. Circle size is proportional to the number of tax returns filed in each county in 2015.]

The property tax deduction is claimed by many more people in places that voted for President Trump, like in Texas and Florida. Those who claim more than $10,000 — the new limit in the bill — are at the highest risk of seeing a tax increase.

[Chart: Average property tax deduction for high-income taxpayers, by county, plotted against the county's margin of support in the 2016 presidential election (more Republican to more Democratic). Note: Tax returns for filers earning $200,000 or more. Circle size is proportional to the number of tax returns filed in each county in 2015.]

The Cost Savings Of Netflix's Internal Spot Market


Netflix used their internal spot market to save 92% on video encoding costs. The story of how they did it is told by Dave Hahn in his now annual A Day in the Life of a Netflix Engineer. Netflix first talked about their spot market in a pair of articles published in 2015: Creating Your Own EC2 Spot Market Part 1 and Part 2.

The idea is simple:

  • Netflix runs out of three AWS regions and uses hundreds of thousands of EC2 instances; many are underutilized at various times of the day.

  • Video encoding is 70% of Netflix’s computing needs, running on 300,000 CPUs in over 1000 different autoscaling groups.

  • So why not create a spot market to process video encoding?

As background, Dave explained the video encoding process:


  • Netflix gets video from production houses and studios. First, Netflix validates the source file, looking for missing frames, digital artifacts, color changes, and any other problems. If problems are found, the video is rejected.

  • Video files are huge. Many, many terabytes in size. Processing a single multi-terabyte video file is not practical, so it’s broken up into chunks that can be operated on in parallel.

  • Chunks are sent through the media pipeline. There’s lots of encoding work to do. Netflix supports 2200 devices and lots of different codecs, so video is needed in many different file formats.

  • Once the chunks are encoded, they’re validated again to make sure new problems haven’t been introduced. The chunks are assembled back into a file and validated again.

  • Over 70 different pieces of software are involved with the pipeline.

Netflix believes static encoding produces a higher quality video viewing experience than dynamic encoding. The result is a lot of files.

Stranger Things season 2 is shot in 8K and has nine episodes. The source files are terabytes and terabytes of data. Each season required 190,000 CPU hours to encode. That’s the equivalent of running 2,965 m4.16xlarge instances for an hour. The static encoding process creates 9,570 different video, audio, and text files.
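
As a quick sanity check on those numbers (assuming the 64 vCPUs an m4.16xlarge provides): 190,000 CPU-hours divided by 64 vCPUs per instance is roughly 2,969 instance-hours, which lines up with the approximately 2,965 instances-for-an-hour figure quoted above.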

How do you encode video? There are two possible approaches.

The most obvious approach is setting up a static farm of instances reserved only for media encoding. The win of this approach is you always have the resources you need available.

But if you look, encoding has a bursty workload pattern. That means the farm would be underutilized a lot of the time.

Whenever you see bursty workloads the idea of leveraging the spot market should burst immediately to mind.

AWS has a spot market, why not use that?

Netflix has a huge baseline capacity of reserved instances. They autoscale tens of thousands of instances a day in and out of this pool. When Netflix autoscales down, they have unused capacity.

So Netflix did a genius thing: they built their own internal spot market to process the chunked encoding jobs. The Engineering Tools team built an API exposing real-time unused reservations at the minute level.

What were the cost savings of using an internal spot market versus a static encoding farm? 92%!

It's All About the Economics

Netflix started their own internal spot market for the same reason Amazon did: cloud economics are all about driving higher machine utilization. Reserving instances saves a lot of money in AWS, so it makes sense to extract as much value as possible out of those instances. Every microsecond CPUs are not working is a waste of money.

Back in the day, like many people, before it became clear AWS would become the eater of all infrastructure, I was brainstorming AWS startup ideas. I was pursuing some of the ideas I later detailed in Building Super Scalable Systems: Blade Runner Meets Autonomic Computing In The Ambient Cloud.

My favorite was creating a secondary market so developers could resell the unused capacity on their instances. Utilization of VMs is still abysmally low. There’s lots of unused CPU, network, and memory capacity on instances. So it seemed like a good idea. A cloud within a cloud.

The kicker was security. Who would run code and put their data on a random machine without a security guarantee? This was before containers. Though I had used jails on FreeBSD to good effect, the idea of containers never occurred to me.

My idea was something like lambda, which was why in What Google App Engine Price Changes Say About The Future Of Web Architecture, I was disappointed when GAE pivoted towards a higher granularity system:

I thought GAE would completely lose the instance image concept all together in favor of applications being written on one giant task queue container.

The basis of this conjecture/vision is the development and evolution of one of GAE's most innovative and far-reaching features: task queues. It's a great feature that allows applications to be decomposed into asynchronous flows. Work is queued and executed at some later time. Instead of the monolithic and synchronous model used originally by GAE, an application can be completely asynchronous and can be run on any set of machines. For a while now it has been clear that the monolithic front-end instances have become redundant with the fruition of task queues.

The problem is that task queues are still image based. Operations are specified by a URL that terminates inside a runtime instance whose code template is read from an image. An image contains all the code an application can execute. It's monolithic.

When a web client URL is invoked, it executes code inside a monolithic image. It's these large images that must be managed by GAE, and they are why Google needs to charge you more. They take resources to manage, take time to initialize, and while running take memory even if your app isn't doing anything.

A different idea is to ask: why can't a request terminate at a task queue work item instead? Then the monolithic image could be dropped in favor of an asynchronous coding model. Yes, GAE would still have to manage and distribute these code libraries in some fantastical way, no simple task, but this would solve the problem of matching work to resources at the right granularity, a problem they instead solved by going in the other direction, that is, making images the unit of distribution and instances the unit of execution. We'll talk more about the granularity problem next.

So with this super cool task queue framework and programming model being developed, I felt sure they were ready to announce that the monolithic images would disappear, instances would disappear, and there would be an even finer pay-for-what-you-use billing model as a replacement. I was wrong. Again.

It's A Problem Of Quanta Mechanics

Driving this upheaval is the fact that programs run on an abstract machine whose resources are quantized differently than those of the underlying physical machines. A server comes with only so much memory and CPU. Running programs use memory even when they are idle. Google must pay for the machine resources used by an instance. Charging only for the resources used by a program, instead of all the resources used to host the program, creates an unsustainable and unprofitable pricing friction between the two models.

In other words, programs are deployed in big quanta, but run in small quanta. A smaller work granularity would allow work to be scheduled in idle times, which is why I think the task queue model is superior.

 

It’s nice to see those ideas eventually worked out.

CryptoKitties craze slows down transactions on Ethereum

[Image: Cartoon cats (copyright www.cryptokitties.co)]

A new craze for virtual kittens is slowing down trade in one of the largest crypto-currencies.

CryptoKitties lets players buy and breed "crypto-pets" on Ethereum's underlying blockchain network.

The game's developers told the Bloomberg news agency that CryptoKitties was a "key step" to making blockchains more accessible.

But its popularity has underscored one of the technology's biggest downsides: its lack of scalability.

Etherscan has reported a sixfold increase in pending transactions on Ethereum since the game's release, by the Axiom Zen innovation studio, on 28 November.

"CryptoKitties has become so popular that it's taking up a significant amount of available space for transactions on the Ethereum platform," said Garrick Hileman, from the Cambridge Centre for Alternative Finance.

"Some people are concerned that a frivolous game is now going to be crowding out more serious, significant-seeming business uses."

An estimated $4.5m (£3.35m) has been spent on the cartoon cats at the time of writing, according to Crypto Kitty Sales.

[Image: CryptoKitties is the first game built on Ethereum (copyright www.cryptokitties.co)]

What is a CryptoKitty?

Think of these rather unpalatable cartoon kittens as unique digital Pokemon cards. The game's developers describe them as "breedable Beanie Babies", each with its own unique 256-bit genome.

These crypto-collectibles are also gender-fluid, able to play the role of either the "dame" or the "sire" when bred together. The kitties' unique DNA can lead to four billion possible genetic variations.

Some of the varieties created so far look lifelike, with grey striped fur and bulging green eyes. Others are speckled with neon-blue spots or magenta-patterned swirls.

[Image: One of the less attractive CryptoKitties (copyright www.cryptokitties.co)]

How much are CryptoKitties worth?

At the time of writing, the median, or mid-range, price of a CryptoKitty is approximately $23.06 (£17.19), according to Crypto Kitty Sales.

The game's top cat brought in $117,712.12 (£87,686.11) when it sold on Saturday, 2 December.

How can I pay for my own litter?

CryptoKitties can be bought using only Ether, a crypto-currency that acts as the fuel of the Ethereum blockchain network.

To get started, users must install a Chrome extension called MetaMask, which acts as a digital wallet and lets players send and receive Ether from their computers.

Ether must be purchased from a crypto-currency exchange before it can be added to MetaMask.

[Image: The sale page for a CryptoKitty (copyright www.cryptokitties.co)]

Where do the CryptoKitties come from?

Axiom Zen releases a new CryptoKitty every 15 minutes, but the rest of the supply is powered by the breeding of existing crypto-pets. Owners of kittens can put them up for sale and set their own price in ethers.

Why does it matter if CryptoKitties is slowing down Ethereum?

According to ETH Gas Station, the CryptoKitties game accounts for over 10% of network traffic on Ethereum. As traffic increases, transactions become more expensive to execute quickly.

"The real big issue is other major players looking for alternatives to Ethereum and moving to different systems," Mr Hileman said.

"There's definitely an urgency for Ethereum to try and address this issue."
