At some point I'll get around to reading the paper, but meanwhile Jay Alammar has managed to explain how GPT-3 works in a short Twitter thread.

It doesn't seem all that different from a typical language model. It's trained to predict the next token given a sequence of tokens. The sequence of tokens (roughly words, or fragments of words) serves as context for deciding what to generate next, and can be up to 2,048 tokens long.
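As a rough illustration of that loop, here's a minimal sketch of autoregressive generation. `predict_next_token` is a hypothetical stand-in for the model itself, which is only available behind OpenAI's API:

```python
CONTEXT_WINDOW = 2048  # maximum number of tokens the model can attend to

def generate(prompt_tokens, predict_next_token, max_new_tokens=50):
    """Repeatedly predict the next token and append it to the context."""
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        # The model only ever sees the most recent 2,048 tokens.
        context = tokens[-CONTEXT_WINDOW:]
        next_token = predict_next_token(context)
        tokens.append(next_token)
    return tokens
```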

The main innovation seems to be the scale: 175 billion parameters, an estimated $4.6M to train, and that 2,048-token context window. It's really impressive (and a little scary) that such a crude technique can be scaled up to deliver results this good.