Index | Archive | Tags | Atom Feed | RSS Feed

Learning and Planning in Complex Action Spaces

We've been working on making MuZero as easy to use as possible, allowing it to be applied to any RL problem of interest. First and foremost, this required us to make some algorithmic extensions.

The original MuZero algorithm achieved good results in discrete action environments with a small action space - classical board games and Atari, with up to a few thousand different actions. However, many interesting problems have action spaces that cannot be easily enumerated, or are not discrete at all - just think of controlling a robot arm with many degrees of freedom, smoothly moving each joint. So far, MuZero …

Complexity Budget

No matter how smart, there's a limit to how much you can keep in mind at once before you start forgetting something. In the context of software engineering, a common term for this is complexity budget: the maximum complexity of a system you can understand at once.

Just as fitting an entire task into the cache of your CPU is orders of magnitude faster than paging in from RAM (or worse, disk!), fitting the entire problem you are working on into your brain can make you many times more productive.

Every line of code you write, every feature you add …

Rates of Growth

I've been enjoying [cached]A Map that Reflects the Territory, a collection of the best LessWrong essays from 2018. While in general the essays are thought provoking and interesting, one set of essays gave me pause: [cached]Hyperbolic Growth and related chapters on fast vs slow takeoff.

The discussion repeats tropes that are common in the rationalist and futurist community, describing how economic and technological [cached]growth have been accelerating and suggesting that they will soon increase so quickly that we will be unable to follow:

world gdp, growing exponentially ([cached]data source, csv)

However, we have to be careful not to mix up …

Stack Overflow Podcast

Last week, I had a chance to chat with [cached]Ben Popper and [cached]Cassidy Williams about MuZero and AI on the [cached]Stack Overflow podcast.

I had a lot of fun, thanks to Ben and Cassidy for being such great hosts!

MuZero Intuition

To celebrate the publication of our MuZero paper in [cached]Nature ([cached]full-text), I've written a high level description of the MuZero algorithm. My focus here is to give you an intuitive understanding and general overview of the algorithm; for the full details please read the paper. Please also see our [cached]official DeepMind blog post, it has great animated versions of the figures!

MuZero is a very exciting step forward - it requires no special knowledge of game rules or environment dynamics, instead learning a model of the environment for itself and using this model to plan. Even though it …

MuZero talk - ICAPS 2020

I gave a detailed talk about MuZero at ICAPS 2020, at the workshop "Bridging the Gap Between AI Planning and Reinforcement Learning".

In addition to giving an overview of the algorithm in general, I also went into more detail about reanalyse - the technique that allows MuZero to use the model based search to repeatedly learn more from the same episode data.

I hope you find the talk useful! I've also uploaded my slides for easy reference.

Interactive Voting System Simulator

Ka-Ping Yee's [cached]blog post about election methods and how to visualize them has long been one of my favourites. As always, a well chosen diagram or picture is much easier to understand than a verbose description, and can make corner cases directly leap into our face. The human visual system is a powerful pattern detector, we should make use of it whenever we can!

Inspired by Yee's blog post, I made an interactive simulator to allow you to explore the four main voting systems directly in your browser: Plurality (aka first past the post), Approval, Borda and Instant-runoff (Hare …

Static & Dynamic Typing

I've written and been paid to write code in a wide variety of languages - functional and imperative, statically and dynamically typed, verbose and concise. Over time, I came to appreciate the benefits of certain language features.

The one I want to talk about today is static typing. My programs in statically typed languages seem to be systematically more likely to work correctly on the first try, more robust against bugs introduced by refactoring or adding new features, and easier to reason about. In fact, the benefits are so large that I now use pytype even for Python scripts with just …

A simple way to run Rust WebAssembly in a browser

The [cached]Rust WebAssemply book has a detailed introduction to WebAssembly in Rust; unfortunately it's example setup is somewhat complicated and requires the use of npm just to run show a simple Hello World! message in the browser.

Luckily, there's a simpler way to get started if you don't care about npm modules.

First, follow the [cached]setup instructions to install the rust toolchain, wasm-pack and cargo-generate. You don't need npm.

Clone the example project template:

cargo generate --git https://github.com/rustwasm/wasm-pack-template

which will prompt you for a project name, in the following we'll assume you used …

Getting into Machine Learning - 2020

My previous Getting into Machine Learning post is one of my most popular; since then much has changed.

There's a new kid on the block: [cached]JAX. A thin, but powerful layer over Autograd and XLA, it makes it easy to concisely express algorithms with the same syntax as numpy while getting the full performance of TPUs and GPUs.

Especially in combination with higher level libraries such as [cached]Haiku, JAX makes it fun and easy to try new ideas. I've migrated all my own research to JAX and can only recommend it!

The resources I recommended in my previous …