When StarCraft came out in 1998, DeepMind artificial intelligence (AI) researcher Oriol Vinyals was a child in Spain. He fell in love with the game and rose through the competitive ranks, even placing third at the World Cyber Games (WCG) qualifiers. Since then, he’s become one of the foremost minds in AI research. His current project? Developing a bot, or “agent,” that can master StarCraft II.
Oriol: I’m Oriol Vinyals, a research scientist at DeepMind. I lead the StarCraft II effort. My focus is to advance the state of the art of artificial intelligence. StarCraft is a very nice vehicle for research—in fact, I did part of my PhD at Berkeley on StarCraft.
Oriol: A friend of mine said, “there’s a group of people at Berkeley who are going to enter this competition, AI versus AI, and since you used to play the game in a quite intense way, it would be great if you could go and see what was up with their approach.”
They were building a bot based on the Zerg Mutalisk. I started playing against the agent . . . I think they named me the “coach.”
That project’s approach to the game was based on expert rule-based systems: “We’re going to build a lot of Mutalisks, and we need to have a build order that hopefully will be resilient against a bunch of early rushes, and expands enough to keep up with production,” and so forth. That approach was very programmatic, though the actual Mutalisk micro was learned. It was a lot of fun.
Oriol: DeepMind is building what people call “AGI,” artificial general intelligence. You’re not specifically building an agent to play one game; you want to understand what the learning paradigm is, so that this agent could play any game without much prior knowledge. I thought it would be very challenging, and quite fun, to build a bot where, instead of writing the rules, we just provide the agent with the screen. “Here is a mouse and a keyboard. Go ahead, start interacting with the game, try to get better at it.”
Oriol: The game definitely offers certain challenges for AI. In Go, you always see the whole board, whereas in StarCraft you don’t, so you have to scout . . . and then of course the interface, it’s a great testbed to see if you can have an agent that can interact with a game in a point-and-click way, versus the fourteen actions we had on Atari. It’s quite a nice challenge.
Oriol: There are definitely things that, because of how you train these models, are very apparent, and might seem almost obvious in retrospect. For instance, one action is to move the camera, to look around the map. It turns out that random agents will move the camera away from their base and never go back to see what they should be looking at, which is their base, to build buildings and so on.
Just something as simple as that, for humans—the concept of the camera, that it helps to look in this minimap in the bottom left—those agents were all over the place, clicking the minimap and of course not getting anywhere, and suddenly they would be lucky enough to click back on their base, but their next action would be: select all the workers and send them away somewhere.
It's almost painful to see. From here they really need to start getting some signal, some reward. Hopefully they get lucky sometimes and they do something that is good, and then—only then—they can start learning. Unlike Atari games, where you’ll very quickly do something reasonable, StarCraft has such an exponential action space that it is quite hard to get off the ground, especially in the unrestricted full-game setup.
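The “hard to get off the ground” problem can be made concrete with a toy simulation (a hypothetical model, not DeepMind’s actual setup or the real StarCraft II interface): if exactly one action per step yields reward and the agent explores uniformly at random, the chance of ever seeing a reward collapses as the action space grows from Atari-scale to point-and-click scale.

```python
import random

def chance_of_first_reward(n_actions: int, horizon: int,
                           trials: int = 20000, seed: int = 0) -> float:
    """Fraction of random-play episodes that ever hit the one rewarding action.

    Toy model: each step, the agent picks one of `n_actions` uniformly at
    random; only action 0 gives reward. Purely illustrative numbers.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        # Episode succeeds if the rewarding action is chosen at any step.
        if any(rng.randrange(n_actions) == 0 for _ in range(horizon)):
            hits += 1
    return hits / trials

# Atari-like action space: about fourteen discrete actions.
atari = chance_of_first_reward(14, horizon=100)
# StarCraft-like: every pixel of an 84x84 screen is a distinct click target.
sc2 = chance_of_first_reward(84 * 84, horizon=100)
```

With these (made-up) sizes, the small action space stumbles onto reward in nearly every episode, while the combinatorial one almost never does, which is why random exploration alone rarely produces the learning signal the agent needs.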
We released a set of mini-games—simplified versions of StarCraft. We sliced certain aspects of the game into maps that consist of, let’s say, “expand and build a lot of workers,” or “move units around and try to cover as much of the map as possible,” and so on. In the mini-games, we have agents that were able to learn the basics of moving units around, things like combat situations…
Oriol: There’s this map where there are two Marines, and their mission is picking up minerals spread around the map. One thing that is surprisingly difficult for the agent to figure out is that it should use the Marines independently. But what the agent was able to learn was to move the Marines with Patrol. And I didn’t know this, but Patrol keeps the distance between the Marines consistent, allowing them to pick up additional minerals while still being controlled simultaneously. That’s the first time I said, okay, I just learned something new about the game.
Oriol: I don’t know. I definitely think the approach itself is very scalable. If you build the bot in the 2010 way that we did at Berkeley, the bot will do one build order, or two or three, but it doesn’t scale very much. At the end of the day, someone can study how it plays, and expose weaknesses. What I like about our approach is that, if it all works out, the agent has learned a very large variety of tactics and counters that couldn’t possibly be programmed, in the same way you couldn’t program a very good Go player.
In terms of beating the very best—I don’t know. That’s to be seen. I cannot predict whether we will be able to beat them or not.