
The Strong Turing Test

In the conventional Turing test (aka imitation game), an investigator tries to distinguish between a human and a computer solely by interacting with them.

This is an interesting setup and has inspired much research, but it doesn't immediately translate into practical usefulness - a computer system may pass as human, but may still not be able to help me accomplish any task.

Instead, I find I'm mostly interested in a stricter variety: in each interaction the investigator chooses a preferred response; the goal of the computer system is to be chosen as the preferred side as many times as possible …
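As a rough sketch of how such a score could be tallied: record which side the investigator preferred in each interaction, then report the fraction won by the computer. The labels "computer" and "human" and the function name are hypothetical, chosen only for illustration.

```python
def preference_rate(choices, system="computer"):
    """Fraction of interactions in which `system` was the preferred side.

    `choices` holds one label per interaction, naming the side the
    investigator preferred. The labels are illustrative assumptions.
    """
    if not choices:
        raise ValueError("need at least one interaction")
    wins = sum(1 for choice in choices if choice == system)
    return wins / len(choices)


# Four interactions, the computer's response preferred in three of them.
rate = preference_rate(["computer", "human", "computer", "computer"])
```

A rate above 0.5 would mean the system's responses were preferred more often than the human's.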

Saving Aranet4 data to Raspberry Pi

Recently I was curious about the amount of CO2 present while sleeping and working; I ended up buying an Aranet4 monitor. It's great to see the reading live, but I would also like to compare trends over time. The Aranet4 itself can only store a maximum of 7 days of data, so for longer term analysis I need to periodically export the data.

This is where a Raspberry Pi comes in handy! First, install the necessary packages:

sudo apt install bluetooth pi-bluetooth bluez blueman

Use bluetoothctl to pair the Aranet4:

sudo bluetoothctl
> scan on …

Discovering Matrix Multiplication Algorithms with AlphaTensor

Matrix multiplication is at the foundation of modern machine learning - whether transformers or convolutional networks, diffusion models or GANs, they all boil down to matrix multiplications, executed efficiently on GPUs and TPUs. So far the best known algorithms have been discovered manually by humans, often optimized for specific use cases.

The most famous is probably the Strassen algorithm to multiply two 2x2 matrices using only 7 instead of the naive 8 multiplications:

[Figure: illustration of the Strassen matmul algorithm]

Through clever addition and subtraction of the individual elements of the a and b matrices this algorithm is able to combine the intermediate results into the elements …
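As a concrete sketch of the seven-product scheme for the 2x2 case, here is the standard textbook formulation of Strassen's algorithm in plain Python. The function names are my own and not taken from any library; the naive version is included only as a check.

```python
def strassen_2x2(a, b):
    """Multiply two 2x2 matrices using 7 scalar multiplications."""
    (a11, a12), (a21, a22) = a
    (b11, b12), (b21, b22) = b

    # The seven intermediate products.
    m1 = (a11 + a22) * (b11 + b22)
    m2 = (a21 + a22) * b11
    m3 = a11 * (b12 - b22)
    m4 = a22 * (b21 - b11)
    m5 = (a11 + a12) * b22
    m6 = (a21 - a11) * (b11 + b12)
    m7 = (a12 - a22) * (b21 + b22)

    # Combine the products using only additions and subtractions.
    return [
        [m1 + m4 - m5 + m7, m3 + m5],
        [m2 + m4, m1 - m2 + m3 + m6],
    ]


def naive_2x2(a, b):
    """Reference implementation using the usual 8 multiplications."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
        for i in range(2)
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
result = strassen_2x2(a, b)
```

For scalar entries the saving is negligible; it matters when the entries are themselves matrix blocks and the scheme is applied recursively.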

Planning in Stochastic Environments with a Learned Model

After extending to arbitrary action spaces and offline RL, we recently published our next step in generalizing MuZero: Stochastic MuZero, which learns and plans with a stochastic model of the environment that can model not only a single deterministic outcome of our actions, but rather a full set of possibilities.

Previous approaches such as AlphaZero and MuZero have achieved very good results in fully-observed deterministic environments, but this type of environment is insufficiently general to model many problems and real world tasks. Beyond the stochasticity inherent in many domains (think roll of the dice or draw of the cards …

Our lives are measured in memorable moments

The reason our childhood years seemed to pass so much more slowly than our adulthood is that they were filled with novelty and excitement, memorable first experiences: learning to ride a bike, entering a new school, making new friends.

Events, experiences, things are memorable not by some inherent quality, but rather become memorable by virtue of being different from our everyday life. What matters is the magnitude, not the sign: both the best and the worst times in your life will stick out in your memory.

Memorable experiences come in all sizes: cooking a new dish, meeting up with old …

MuZero does YouTube

If you've been watching YouTube lately, the encoding settings for the video you watched might have been selected by MuZero - using less of your bandwidth for the same quality. The task here is rate control: selecting the quantization parameters within the VP9 codec to maximize the quality at a specified bitrate.

This is a constrained RL problem, requiring us to optimize two conflicting objectives of variable difficulty at the same time. To deal with this challenge we introduce a self-competition based reward mechanism, where the reward depends on how successful other recent episodes were at maximizing the quality while staying …

Competitive Programming with AlphaCode

I'm excited to share more on our latest project, AlphaCode - a system to write programs, able to reach roughly median human performance in coding competitions. And that's median competitive programmer performance, not median programmer or median human! In addition to our paper and official blog post, you can also find my personal take below.

Problem Setup

Coding competitions are difficult even for experienced programmers. Before writing the first character of a program, the first step is to understand the natural language problem description: often spanning several paragraphs in length, problem descriptions do not directly describe the required algorithm, but …

Mastering Atari Games with Limited Data

Another interesting paper based on MuZero was published at NeurIPS 2021: Mastering Atari Games with Limited Data, aka EfficientZero. This paper by Weirui Ye, Shaohuai Liu, Thanard Kurutach, Pieter Abbeel and Yang Gao focuses on the application of MuZero to very low data tasks, such as Atari 100k (only two hours of gameplay!) or DMControl 100k.

To tackle these tasks, the authors propose three main techniques:

First they introduce a Self-Supervised Consistency Loss, to ensure that the embeddings produced by MuZero's dynamics function are consistent with the embeddings from the representation function. This loss is inspired by SimSiam-style …

Online and Offline Reinforcement Learning by Planning with a Learned Model

After extending to arbitrary action spaces, our next step in generalizing MuZero was to work on data efficiency, and to extend it all the way to the offline RL case. We've now published this work as MuZero Unplugged at NeurIPS; below I will give a brief summary of the main ideas.

Environment interactions are often expensive or have safety considerations, while existing datasets frequently already demonstrate good behaviour. We want to learn efficiently from any such data source, without being restricted by off-policy issues or limited to the performance of the policy that generated the data: as always, we want …

Values, Pointers and References in C++

If you've primarily used high-level languages like Python, you may not be used to explicitly thinking about the ownership or representation of your values in memory. In systems languages like C++ or Rust, we have direct control over these aspects, and are able to use the type system to explicitly represent when a function takes ownership of a value, vs when it only takes a (temporary) reference.

First, different types of ownership, in order of preference:

  • T t. A normal owned value of type T, uniquely owned. If declared as a variable it is stored on the …

© Julian Schrittwieser. Built using Pelican. Theme by Giulio Fidente on GitHub.