Using ML to Play Pokémon Showdown

A Deep-COVID Project

December 2020

The Pandemic

It was sticky. It was hot and constrictive and relentless. It was the summer of 2020. I was trapped in a tiny New York apartment without very good air-conditioning, aging. I needed a win. America needed a win.

I played a lot of Pokémon Showdown during the pandemic. It's a website that lets you build your own team of Pokémon and battle others. You can experience all the fun of Pokémon battling with different teams without doing all the work to build the teams in-game. You can also opt to just be assigned a random team and battle others with random teams, which is what I normally do.

When I took my first machine learning class in college, I joked about training my computer to make me better at Showdown. Now, in my darkest hour, playing the game constantly, the idea returned to me — half-seriously this time. Did I finally have the programming and machine learning skills to make this happen? What if I failed? Could I really be the one to pull this off? I decided to risk it all and go for it.

The API

My goal for this project was always to stand up a bot that used ML to make decisions and could autonomously play a generation 1 (the simplest generation) Pokémon battle. Simple ML would be fine. I would make the program modular, so that swapping in a new ML algorithm later would be no harder than starting with that algorithm in the first place. Adding new predictors would be slightly harder, but I was fine starting with a small, manageable set.

The most difficult part, therefore, became dealing with the Showdown interface. I found an incomplete API wrapper and used that to start (thank you, ckw017). The wrapper was good at initiating battle rooms and sending messages, but it could not parse more complex battle information. When it didn't know what to do with some piece of information, it just dumped the JSON into the terminal. I wrote my own child class in Python to extend the wrapper. I played a lot of battles, and every time something fell through the cracks I wrote another few lines of code to handle that case. Now I think I have a wrapper that can parse anything that can happen in a gen 1 Pokémon battle. I also wrote a class that holds the entire state of the battle at any time, which my wrapper updates. Finally, I just needed to write a method that would tell the wrapper what move to make next given the state of the battle. I also had to learn how to use async/await properly.

The ML

What was I really trying to optimize for? A win. But a single move rarely leads immediately to a win. I needed a simple (at least at first) metric that could apply to any specific move. Inspired by net present value in finance, I created the "net present win" metric: it is zero if the battle is not won, 1 for a turn that results in a win, and smaller for each additional turn it takes to reach the win. So after each battle, I can calculate the net present win score for every turn.
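The post does not say exactly how the score shrinks with each extra turn. One natural reading, borrowing the discounting used in net present value, is geometric decay. The sketch below assumes that form; the function name `net_present_win` and the `discount` value of 0.9 are illustrative choices, not the project's.

```python
def net_present_win(num_turns, won, discount=0.9):
    """Score every turn of a finished battle.

    Returns one score per turn, first turn first. A lost battle scores 0.0
    everywhere. In a won battle the final turn scores 1.0 and each earlier
    turn is multiplied by `discount` once per turn still needed to win.
    The geometric decay is an assumption, not taken from the post.
    """
    if not won:
        return [0.0] * num_turns
    return [discount ** (num_turns - 1 - turn) for turn in range(num_turns)]


# A three-turn win with a discount of 0.5 is scored [0.25, 0.5, 1.0].
scores = net_present_win(3, won=True, discount=0.5)
```

A discount closer to 1 treats slow wins as nearly as good as fast ones. A smaller discount rewards quick finishes more strongly.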

I also wanted to represent the state of the game at each turn in a small number of columns, so the model would need less data to start getting oriented. I lost a lot of detail in the process, but I broke each turn down into things like expected damage done, damage received, status probability, outspeed probability, and so on. So now, given a game state and a set of possible moves, I can calculate those predictors for each possible action, use a model trained on previous battles to predict the net present win score of each move, and select the move with the highest predicted score, which functionally optimizes for the fastest win. The model currently uses k-nearest neighbors (KNN) to make its predictions, but it's very easy to swap out the model object for anything else, such as regression or random forests. My friend Jake helped me design and code some of the predictors by opening PRs into the repo.
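The selection step can be sketched in a few lines. This is an illustration under stated assumptions, not the repo's code: `KNNRegressor` and `choose_move` are hypothetical names, the four features are the ones named above in an arbitrary order, and every number is made up. The only thing `choose_move` needs from the model is a `predict` method, which is what makes the model swappable.

```python
import math


class KNNRegressor:
    """Tiny k-nearest-neighbors regressor: mean score of the k closest rows."""

    def __init__(self, k=3):
        self.k = k
        self.rows = []

    def fit(self, features, scores):
        self.rows = list(zip(features, scores))
        return self

    def predict(self, x):
        if not self.rows:
            raise ValueError("model has no training data yet")
        # Sort past turns by Euclidean distance to x and keep the k closest.
        nearest = sorted(self.rows, key=lambda row: math.dist(row[0], x))[: self.k]
        return sum(score for _, score in nearest) / len(nearest)


def choose_move(model, candidates):
    """Return the move whose predictors get the highest predicted score.

    `candidates` maps a move name to its feature vector. Any object with a
    `predict(x)` method can be passed as `model`.
    """
    return max(candidates, key=lambda move: model.predict(candidates[move]))


# Illustrative data. Feature order: (expected damage done, damage received,
# status probability, outspeed probability); targets are net present win scores.
history = [
    (0.6, 0.1, 0.0, 1.0),
    (0.5, 0.2, 0.0, 1.0),
    (0.1, 0.4, 0.3, 0.0),
    (0.0, 0.5, 0.0, 0.0),
]
targets = [0.9, 0.8, 0.2, 0.0]
model = KNNRegressor(k=2).fit(history, targets)

candidates = {
    "Thunderbolt": (0.55, 0.15, 0.0, 1.0),
    "Thunder Wave": (0.0, 0.45, 0.3, 0.0),
}
best = choose_move(model, candidates)
```

With unscaled Euclidean distance, a feature with a wide range will dominate the neighbor search, so putting all predictors on a comparable scale (here everything is between 0 and 1) matters for KNN in a way it would not for a tree-based model.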

Other Bells and Whistles

The tool reconnects and reorients itself if anything disrupts the connection, and it can handle an arbitrary number of battles at once with no cross-talk. It also has a "training mode" where it defers to a human operator on all decisions and learns from them. This is really the only way to build up a good kernel of data — otherwise it just loses every battle and never starts identifying actions associated with a good net present win score. Training mode has also been how I've found the tool most useful. Not that many people play gen 1 battles, and I don't want to inconvenience anyone by having the model play endless battles sub-optimally, so I prefer to battle, have the model learn, and then battle the model myself. This sometimes helps me learn things about my own battle strategy, though it's not that nuanced, due to the information loss I mentioned above.
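One common way to run many battles at once without cross-talk is to give each battle room its own queue and its own task, so that no state is shared between rooms. The post does not describe how the tool does this, so the sketch below is only a guess at the general pattern using Python's standard `asyncio` module. `run_battle`, `route`, and the room IDs are hypothetical.

```python
import asyncio


async def run_battle(room_id, inbox, results):
    """Consume messages for one room until a None sentinel arrives."""
    seen = []  # per-battle state lives only inside this task
    while True:
        message = await inbox.get()
        if message is None:
            break
        seen.append(message)
    results[room_id] = seen


async def route(messages):
    """Deliver (room_id, text) pairs to one task per room, in arrival order."""
    inboxes, tasks, results = {}, [], {}
    for room_id, text in messages:
        if room_id not in inboxes:
            # First message for this room: create its queue and its task.
            inboxes[room_id] = asyncio.Queue()
            tasks.append(
                asyncio.create_task(run_battle(room_id, inboxes[room_id], results))
            )
        await inboxes[room_id].put(text)
    for inbox in inboxes.values():
        await inbox.put(None)  # tell every battle task to finish
    await asyncio.gather(*tasks)
    return results


interleaved = [("battle-1", "a"), ("battle-2", "x"), ("battle-1", "b")]
per_room = asyncio.run(route(interleaved))
```

Even though the messages arrive interleaved, each task only ever sees the messages for its own room.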

Video

Here's what it looks like to battle against the model. Here is the code.