The capabilities of artificial intelligence have grown by leaps and bounds in the past half-decade. Some of this is driven by improvements in algorithm design, some by hardware, but the results are on the Internet for just about anyone to see: Facebook’s face recognition and Apple’s autocomplete are both supported by neural networks. And AI seems to be breaking new ground daily: thrashing Go champions, managing funds, and even writing news stories.
So is fiction next on the automation agenda? Like most of us on Team Human, I’m going to say no, at least for now. Here’s why.
Neural networks do have a legitimately eerie ability to mimic the surface features of text, where “surface” actually goes fairly deep. Andrej Karpathy’s modern classic on deep learning, “The Unreasonable Effectiveness of Recurrent Neural Networks,” presents the results of a Shakespeare generator that creates remarkably plausible formatting, vocabulary, character names, grammar, and even meter:
Why, Salisbury must find his flesh and thought
That which I am not aps, not a man and in fire,
To show the reining of the raven and the wars
To grace my hand reproach within, and not a fair are hand,
That Caesar and my goodly father’s world…
In my spare time, I trained a “Chaucerbot” on the Canterbury Tales to do something similar, with similar results:
With herte holy lotinge of the bagere
His wordes in my fekken it verealesage
Of the somm we good us, able noon ale up oyn,
wondo nat see clepte, in the pers, but,
See him proude, and doon the poina the of ese the boles.
No free mater, som a bren wef comes s hath it onle to lighge.
Chaucerbot didn’t train very long, so its Middle English isn’t as convincing as Karpathy’s Shakespearean English, but you get the idea.
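Chaucerbot was a character-level recurrent network, but the core idea—learning which characters tend to follow which—can be sketched with something far simpler. Here’s a character-level Markov chain in plain Python; this is a toy stand-in for illustration, not an RNN, and all the names in it are mine:

```python
import random
from collections import defaultdict

def train(text, order=4):
    """Map each `order`-character context to the characters observed after it."""
    model = defaultdict(list)
    for i in range(len(text) - order):
        context = text[i:i + order]
        model[context].append(text[i + order])
    return model

def generate(model, seed, length=200, rng=None):
    """Extend `seed` one character at a time by sampling from the model."""
    rng = rng or random.Random(0)
    order = len(next(iter(model)))  # all contexts have the same length
    out = seed
    for _ in range(length):
        choices = model.get(out[-order:])
        if not choices:  # context never seen in training; stop early
            break
        out += rng.choice(choices)
    return out
```

Feed it the Canterbury Tales and it, too, will spill out pseudo-Middle-English word salad—which is a reminder of how much of the “eeriness” is just local statistics of spelling and spacing.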
Let’s check out another couple of examples. Robin Sloan (author of Mr. Penumbra’s 24-Hour Bookstore) talks about his experiment training a neural network to generate 20th-century science fiction magazine stories and hooking it into a text editor. You can see the results in his GIFs—it’s the same sort of stuff: give it a chunk of “seed” text and it returns a continuation. What comes back, here, is grammatical modern English with a distinctly science-fictional choice of words (although “the high goathemaker” is a little desperate even for the pulps). But it’s not necessarily a sensible choice of words—what do servo-robots have to do with two men’s enmity? What do a woman’s (grinning?) arms have to do with sunrise?
One more. The “not a poet” Ross Goodwin has a couple of articles on art and machine-generated text that are worth reading in their entirety, but let’s focus on the second one, which begins with a brief disquisition on Sunspring, a science fiction screenplay produced by a neural network trained on science fiction screenplays. Here’s a stage direction from Sunspring and Goodwin’s commentary on its interpretation by the crew:
He is standing in the stars and sitting on the floor. He takes a seat on the counter and pulls the camera over to his back. He stares at it. He is on the phone. He cuts the shotgun from the edge of the room and puts it in his mouth. He sees a black hole in the floor leading to the man on the roof.
The machine dictated that Middleditch’s character should pull the camera. However, the reveal that he’s holding nothing was a brilliant human interpretation, informed by the production team’s many years of combined experience and education in the art of filmmaking…
Which is as good a place as any to stop and talk about where we are.
It’s tricky to say where, in language, form stops and meaning begins. But wherever that boundary may be, it’s easy to see that AI hasn’t made it all the way across. These networks can learn spelling, formatting, some grammar and meter, even diction and vocabulary. But they don’t understand how events go together. They don’t understand that you can’t stand and sit at the same time, that Salisbury and Caesar don’t go together, that you can’t take your eyes out of your mouth.
Francois Chollet, the author of a popular deep learning software library, recently posted an essay on “The limitations of deep learning” that gets at the heart of this. Chollet really understands the math in play, and I don’t, but his basic insight is this: All neural networks do is learn to warp points in one high-dimensional space into points in another high-dimensional space. That’s it. Any association that can’t be represented this way can’t be learned by a neural network—and it is very hard to represent reasoning and abstraction this way.
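Chollet’s geometric point can be made concrete in a few lines. Stripped to its skeleton, a feed-forward network is nothing but a chain of matrix multiplications and simple nonlinearities—each layer warps points from one space into another. A sketch in plain Python (toy weights chosen by hand, no training):

```python
def affine(x, W, b):
    """One layer: multiply the input vector by a weight matrix, add a bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi
            for row, bi in zip(W, b)]

def relu(x):
    """Elementwise nonlinearity: clamp negatives to zero."""
    return [max(0.0, xi) for xi in x]

def forward(x, layers):
    """A network is just these warps chained together, end to end."""
    for W, b in layers[:-1]:
        x = relu(affine(x, W, b))
    W, b = layers[-1]       # no nonlinearity on the output layer
    return affine(x, W, b)
```

Everything a network can ever express has to be built out of compositions like this—smooth, geometric bends of one high-dimensional space into another. Discrete, structured reasoning (“Salisbury and Caesar lived sixteen centuries apart”) has no obvious home in that picture.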
Related: People are good at making long-range connections, seeing similarities in things that are superficially very different. This is arguably one of the core elements of creativity. In contrast, neural networks have trouble with inputs that aren’t close to things they’ve trained on. If a neural network doesn’t see rap and musicals and the American Revolution in close proximity in its training data, it’s never going to produce them together when you let it off the leash, and that means it will never write Hamilton unless it’s already seen Hamilton (which, at this point in history, is presumably unavoidable). And, of course, actual humans synthesize so much more than just fiction when we produce fiction; we have relationships, emotions, and experiences, and even sensory processing streams (smell, taste, touch) that have nothing resembling analogues in computers right now.
AI is more than neural networks, of course, and there’s no reason to think we’ll stop getting better at what we can program computers to do. But, in my mind at least, the current state of the art in machine learning doesn’t threaten or even have a path to threatening human narrative creativity. We might be better off instead, as Ross Goodwin suggests, using the strange productions of computers as inspiration—not in the spirit of a muse, but in the spirit of the I Ching or the tarot, a way of throwing the mind open to a logic it can’t produce on its own, where eyes come out of mouths and Salisbury shares a stage with Caesar and men sit on the floor while standing in the stars.