Unravelling the Emergence Debate in AI

I’m currently preparing material for my next essay, which will discuss ‘The Emergence Debate’.

What is the ‘Emergence Debate’?

‘Emergence’ refers to complex behaviours arising from simpler rules or systems, without those behaviours ever being specifically programmed.
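The textbook illustration of emergence (my example here, not one specific to the AI debate) is Conway’s Game of Life: a couple of simple rules applied to a grid of cells produce gliders, oscillators and whole miniature ‘ecosystems’ that nobody programmed in. Here’s a minimal sketch in Python:

```python
from collections import Counter

# Conway's Game of Life: complex, lifelike behaviour emerging from
# simple rules that say nothing about gliders or oscillators.

def step(live_cells: set[tuple[int, int]]) -> set[tuple[int, int]]:
    """Apply one generation of the rules to a set of live (x, y) cells."""
    # Count how many live neighbours every cell has.
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for x, y in live_cells
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A dead cell with exactly 3 neighbours is born; a live cell with
    # 2 or 3 neighbours survives. That's the entire rule set.
    return {
        cell
        for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live_cells)
    }

# A 'glider': five cells that crawl diagonally across the grid forever.
# That behaviour appears nowhere in the rules themselves.
cells = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
for generation in range(4):
    print(f"gen {generation}: {sorted(cells)}")
    cells = step(cells)
```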

One side (team pro-emergence) argues that certain abilities of current language models (like ChatGPT) are emerging naturally as the models scale up in size.

The other side (team anti-emergence) argues that these abilities are not truly emergent, but are simply the result of extensive training on vast amounts of varied data.

There is a bunch of interesting research supporting both sides of the argument.

To make this easier to grasp, let’s create an example.

Here, I ask ChatGPT to ‘write me a humorous poem about a mundane task, like doing the laundry’.

Here’s the response.

Team anti-emergence might argue that the poem is simply the result of recognising and reproducing patterns: through training, the model has been exposed to countless poems and humorous texts.

Team pro-emergence might argue that GPT’s ability to write a humorous and clever poem about the laundry could only be explained by a deep understanding of the complexities of human emotion, humour and creative intelligence.

I have an inkling that the winner of the debate here is far less interesting than the fact that we’re having the debate at all.

An artificially designed (or discovered?) intelligence can communicate, in human language, in real time, in ways that are indistinguishable from human creativity? And we don’t know whether this behaviour emerges naturally from the system’s complexity, or whether it was simply learned from huge amounts of training data?

Holy shitballs.

At this point, you might be thinking: ‘But language models are just trained to predict the next word in a sentence … that can’t be that hard?’

Turns out, predicting the next word well enough to maintain coherent, human-like conversation (essentially teaching a robot to ‘talk’) is insanely complex. Whether emergent or not, this behaviour requires a deep understanding of the human condition.
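If you’re curious what ‘predicting the next word’ looks like mechanically, here’s a minimal sketch using the small, open GPT-2 model via the Hugging Face transformers library (ChatGPT’s own weights aren’t public, so GPT-2 is purely a stand-in for illustration):

```python
# pip install torch transformers
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "The laundry basket was full, so I"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    # The model outputs a score (logit) for every vocabulary token,
    # at every position in the prompt.
    logits = model(input_ids).logits

# The scores at the final position are the model's guesses for the next word.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top_probs, top_ids = torch.topk(next_token_probs, k=5)

for prob, token_id in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(int(token_id))!r}  p={prob.item():.3f}")
```

Everything a GPT-style model writes, from poems to conversation, is built by repeating this one step: score every possible next token, pick one, append it, and go again.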

To better understand ‘The Emergence Debate’, here are the questions I’m trying to answer:

  • How’d we get here? Exploring the evolution from early neural networks to the development of current transformer models.
  • How do the current models work? Understanding the mechanics of the transformer architecture and large language models (LLMs).
  • What do we actually know about intelligence? Investigating how intelligence functions in humans and comparing it to AI.

And here are the resources I’m exploring:

A conversation with David Silver, one of the pioneers behind AlphaGo, AlphaZero and deep reinforcement learning.

David Silver: AlphaGo, AlphaZero, and Deep Reinforcement Learning | Lex Fridman Podcast #86

A more recent animated video by ‘Art of the Problem’ exploring the journey of AI language models, from their modest beginnings through the development of OpenAI’s GPT models.

ChatGPT: 30 Year History | How AI Learned to Talk

Two new(ish) videos by 3Blue1Brown explaining the attention mechanism in transformers and how GPTs work.

Attention in transformers, visually explained | Chapter 6, Deep Learning

But what is a GPT? Visual intro to transformers | Chapter 5, Deep Learning

Andrej Karpathy’s conversation with Lex Fridman.

Andrej Karpathy: Tesla AI, Self-Driving, Optimus, Aliens, and AGI | Lex Fridman Podcast #333

Jason Wei’s Stanford presentation – Intuitions on Language Models

Stanford CS25: V4 I Jason Wei & Hyung Won Chung of OpenAI

The Dwarkesh Patel conversation with Sholto Douglas & Trenton Bricken.

Sholto Douglas & Trenton Bricken – How to Build & Understand GPT-7’s Mind

If you have any other recommended resources, please let me know.