Chapter 2. The Bitter Lesson
-
Book Excerpt from "The Sweetest Lesson: Your Brain vs AI"
-
by David Spuler, Ph.D.
“AI is a marathon, not a sprint.”
— Demis Hassabis.
What is The Bitter Lesson?
Although the previous sections should have you on a high about your amazing brain, this is where we have to take a long, hard look at how such analyses have fared in the history of the computer industry. It’s very bitter.
The idea of the “bitter lesson” was coined in 2019 by Rich Sutton in relation to intelligent computing. The core of the idea is that a simple algorithm combined with brute-force computer power will always eventually outperform apparently smarter algorithms.
The reason this idea is “bitter” is that human researchers tend to assume that making computers more like themselves, by using tricky human-like rules of thumb (called “heuristics” by boffins), will be the best way. It usually works for a while, but then reality bites: these advanced research algorithms get overrun by much simpler and dumber algorithms, with no fancy logic at all, hooked up to a very fast computer.
When presented with a new problem, human researchers will try to solve it like humans. Hence, they will use things based on human-like intelligence, such as:
1. Heuristics
2. Logic
Heuristics are human-like “tricks” or “shortcuts” in solving a problem that are then coded up into a computer algorithm. Logic is part of this, where possible solutions are analyzed according to some rational metrics. So, this is the general approach:
How do we find some of the possible solutions? — Heuristics.
What’s the best solution? — Logic.
These types of methods often do well in the short-term in solving a problem, even on fast computers at the time. However, over time, computers get even faster, and the best algorithms often turn out to involve:
1. Brute-force computations, and
2. Very simple comparisons.
It’s kind of like that movie Everything Everywhere All at Once, only not quite as funny.
You don’t need heuristics to find the best solution. Instead, just get a faster computer and try all of them. The general approach changes to:
How do we find the possible solutions? — Try every single one.
What’s the best solution? — Compare them all.
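To make the contrast concrete, here is a toy Python sketch (all names and numbers are illustrative, not from any real system): a tiny travelling-salesman problem solved first with a human-like heuristic (“always visit the nearest city next”), then by brute force, literally trying every tour and comparing them all.

```python
import itertools
import math
import random

# Toy travelling-salesman instance: a handful of random city coordinates.
random.seed(0)
cities = [(random.random(), random.random()) for _ in range(8)]

def tour_length(order):
    """Total length of a round trip visiting the cities in this order."""
    legs = zip(order, order[1:] + order[:1])   # close the loop at the end
    return sum(math.dist(cities[a], cities[b]) for a, b in legs)

def nearest_neighbour():
    """Heuristic: always go to the nearest unvisited city next."""
    unvisited, order = set(range(1, len(cities))), [0]
    while unvisited:
        here = cities[order[-1]]
        nxt = min(unvisited, key=lambda c: math.dist(here, cities[c]))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

def brute_force():
    """Try every single tour, compare them all."""
    return min(itertools.permutations(range(len(cities))), key=tour_length)

best = tour_length(brute_force())
guess = tour_length(nearest_neighbour())
assert best <= guess   # the heuristic can never beat brute force
```

With eight cities the brute-force scan of all 40,320 tours finishes instantly; the catch is that the number of tours grows exponentially with the number of cities. The bitter lesson is, in effect, a bet that the computers keep getting fast enough anyway.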
When this solves a problem better than the early heuristic versions, it’s a “bitter lesson” for the initial human researchers. And I mean, at a level of bad taste in the mouth beyond what you get from a dandelion sandwich. Researchers think the solution should be something clever, but there’s nothing smart about this brute-force approach, and it’s just a dumb box cranking through the whole solution space. It shouldn’t work so well, but it does.
Bitterness ensues. Lots of musty research papers get torn up by librarians and tossed in the trash.
It’s happened in the computer industry, over and over again, and yet when researchers are presented with a new problem to solve, they still tend to try heuristics and logic. Why is it so hard to learn from these mistakes and accept that raw compute always wins? Maybe it’s because of what that implies at the deepest levels of the soul:
Computers are better than humans.
Surely, it can’t be true?
Bitter Chess
Chess-playing computers are the best example of the bitter lesson. The best chess computer in the world is named Stockfish, and it has a rating of over 3,000 Elo points. The best human grandmasters are “only” around 2,700 to 2,800 points on this rating scale.
The way it started was that programmers tried to copy how humans play chess. There are all sorts of rules of thumb that you learn at Chess Club, such as:
- Center your pieces.
- Queens are worth nine pawns.
- Make a breathing space for your king.
A computer can play a reasonably strong game of chess if you code up these sorts of rules. Its rating is maybe around 1,600 if you do this. My favorite reference on this is the 1988 book by David Levy.
The first major improvement came from raw speed. If you combine these rules with a very powerful computer that scans through lots of possible moves, it does better. Not smarter rules, just more grunt to scan the whole tree of possible moves that each player can make. This was IBM’s “Deep Blue” chess computer, and it played at about the level of the World Chess Champion at the time, Garry Kasparov, who beat Deep Blue in their first match in 1996, but lost the rematch a year later.
Not smarter, just faster, but still using human techniques. Half bitter.
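The “more grunt, whole tree” idea is easiest to see on a game small enough to search completely. Chess won’t fit in a few lines, so this hypothetical Python sketch uses single-pile Nim (take 1 to 3 stones per turn; whoever takes the last stone wins): plain exhaustive game-tree search, with no rules of thumb at all. It illustrates the technique, not Deep Blue’s actual program.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def wins(stones: int) -> bool:
    """True if the player to move can force a win in single-pile Nim
    (take 1-3 stones per turn; taking the last stone wins)."""
    # Scan the whole tree: try every legal move, and win if any of them
    # leaves the opponent in a position they are bound to lose.
    return any(not wins(stones - take)
               for take in (1, 2, 3) if take <= stones)

# Exhaustive search rediscovers the classic result with zero heuristics:
# the losing positions are exactly the multiples of four.
assert [n for n in range(1, 13) if not wins(n)] == [4, 8, 12]
```

Nothing in that code knows anything about Nim strategy; the “multiples of four lose” rule simply falls out of cranking through every possibility.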
What followed was even worse. It turns out that computers don’t even need the human rules of thumb. In 2017, an AI company called DeepMind created an even more massive system, AlphaZero, that was better still at chess. This would be fine, except for the way that it did so.
All of those rules of thumb about how to play chess well, crafted by humans over centuries of careful analysis: worthless. AlphaZero only needed the basic rules of chess and a lot of GPU chips to run the AI learning algorithm. Rather than requiring software code for these heuristic rules of thumb, AlphaZero just played lots of random games against itself, watching for what works, and what doesn’t. The computer played better just by figuring out what to do itself, learning all of the types of patterns in the game, using methods that humans cannot even understand.
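A toy analogue of that self-play idea, again on single-pile Nim rather than chess (AlphaZero’s real recipe involved deep neural networks and Monte Carlo tree search, none of which appears here): the program plays tens of thousands of purely random games against itself, records which positions the eventual winner moved from, and lets the strategy fall out of the statistics.

```python
import random

random.seed(42)
START, GAMES = 12, 50_000
wins = {n: 0 for n in range(1, START + 1)}    # wins for the player to move
visits = {n: 0 for n in range(1, START + 1)}  # times each position was faced

# Self-play with random moves: take 1-3 stones; taking the last stone wins.
# We just watch what works: for every position faced during a game, record
# whether the player to move there went on to win.
for _ in range(GAMES):
    stones, player, faced = START, 0, []
    while stones > 0:
        faced.append((stones, player))
        stones -= random.choice([t for t in (1, 2, 3) if t <= stones])
        player ^= 1
    winner = player ^ 1    # whoever just took the last stone won
    for position, mover in faced:
        visits[position] += 1
        wins[position] += (mover == winner)

win_rate = {n: wins[n] / visits[n] for n in visits if visits[n]}

def greedy_move(stones):
    """Pick the move that leaves the opponent in the worst position.
    (Leaving zero stones means the opponent has already lost: rate 0.)"""
    moves = [t for t in (1, 2, 3) if t <= stones]
    return min(moves, key=lambda t: win_rate.get(stones - t, 0.0))
```

With no coded knowledge of the game at all, the win-rate table quietly rediscovers the classic Nim rule: `greedy_move` always tries to leave the opponent a multiple of four stones.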
Now that’s bitter.
Intelligence and the Bitter Lesson
Will the achievement of human-level intelligence be another case of the bitter lesson? If the best models are at 2 trillion weights and the human brain has about 100 trillion, maybe the problem is just how to brute-force the AI engines with 50 times more power. Is the failure to reach true intelligence just that technology companies haven’t yet figured out how to run 100 trillion weights so as to match the true size of the human neural network?
Reasoning could be another bitter lesson. After all, the big kerfuffle around DeepSeek in early 2025 was mainly that reasoning could be done just with training. If you showed the LLM enough mathematical proofs in training, it got better at proving things.
If the AI companies put their minds to it, they’ll find training examples for every type of reasoning. Proof by mathematical induction, by contradiction, by algebraic transformations? It’s all been done a thousand times in the research papers. But those are just proofs, and proofs are just one way to do reasoning.
Are there really that many ways to do reasoning?
And if we crank up our 100 trillion weights in an ultra-massive AI model, and show it all those different examples of reasoning, maybe that’s job done. The AI engine has then seen countless examples of the hundreds or thousands of different ways to do reasoning, and it’s great at parroting them. Generalization, who needs it?
Is truly intelligent AI going to be the bitter lesson, all over again, one last time?
References
Articles and research papers on the “bitter lesson” include:
- Rich Sutton, March 13, 2019, The Bitter Lesson, http://www.incompleteideas.net/IncIdeas/BitterLesson.html, PDF: https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson.pdf
- Mojtaba Yousefi, Jack Collins, 12 Oct 2024, Learning the Bitter Lesson: Empirical Evidence from 20 Years of CVPR Proceedings, https://arxiv.org/abs/2410.09649
- Alberto Romero, Feb 19, 2025, Grok 3: Another Win For The Bitter Lesson: Congratulations to the xAI team—and the advocates of the scaling laws, https://www.thealgorithmicbridge.com/p/grok-3-another-win-for-the-bitter
- lucalp, June 24, 2025, The Bitter Lesson is coming for Tokenization: a world of LLMs without tokenization is desirable and increasingly possible, https://lucalp.dev/bitter-lesson-tokenization-and-blt/
- Leon Wu, July 2025 (accessed), The Bitter Lesson: How Your Intuition About AI Is Probably Wrong, https://leonwu.tech/posts/bitter-lesson
- Michal Nauman, Michał Bortkiewicz, Piotr Miłoś, Tomasz Trzciński, Mateusz Ostaszewski, Marek Cygan, 19 Jun 2024 (v2), Overestimation, Overfitting, and Plasticity in Actor-Critic: the Bitter Lesson of Reinforcement Learning, https://arxiv.org/abs/2403.00514
- Jesse Jing, Apr 12, 2023, The Bitter Lesson: What direction to avoid in the field of neural symbolic AI?, https://medium.com/towards-nesy/the-bitter-lesson-1a1d282ae1b9
- Hassaan Naeem, May 10, 2022, Thoughts: Sutton’s The Bitter Lesson: Ponderings on Richard Sutton’s the bitter lesson, https://hassaann.medium.com/thoughts-suttons-the-bitter-lesson-6248c2d7e8c2
- Minghao Wu, Weixuan Wang, Sinuo Liu, Huifeng Yin, Xintong Wang, Yu Zhao, Chenyang Lyu, Longyue Wang, Weihua Luo, Kaifu Zhang, 22 Apr 2025, The Bitter Lesson Learned from 2,000+ Multilingual Benchmarks, https://arxiv.org/abs/2504.15521
- Dulhan Jayalath, Gilad Landau, Brendan Shillingford, Mark Woolrich, Oiwi Parker Jones, 2 Jun 2025 (v5), The Brain’s Bitter Lesson: Scaling Speech Decoding With Self-Supervised Learning, https://arxiv.org/abs/2406.04328
- Zhen Huang, Haoyang Zou, Xuefeng Li, Yixiu Liu, Yuxiang Zheng, Ethan Chern, Shijie Xia, Yiwei Qin, Weizhe Yuan, Pengfei Liu, 25 Nov 2024, O1 Replication Journey -- Part 2: Surpassing O1-preview through Simple Distillation, Big Progress or Bitter Lesson?, https://arxiv.org/abs/2411.16489
- Julian Aron Prenner, Romain Robbes, 6 Mar 2025, Extracting Fix Ingredients using Language Models, https://arxiv.org/abs/2503.04214
- Warren Morningstar, Alex Bijamov, Chris Duvarney, Luke Friedman, Neha Kalibhat, Luyang Liu, Philip Mansfield, Renan Rojas-Gomez, Karan Singhal, Bradley Green, Sushant Prakash, 8 Mar 2024, Augmentations vs Algorithms: What Works in Self-Supervised Learning, https://arxiv.org/abs/2403.05726
- Martin Riedmiller, Tim Hertweck, Roland Hafner, 14 Dec 2023, Less is more -- the Dispatcher/ Executor principle for multi-task Reinforcement Learning, https://arxiv.org/abs/2312.09120
- Sharut Gupta, Stefanie Jegelka, David Lopez-Paz, Kartik Ahuja, 20 Sep 2023 (v2), Context is Environment, https://arxiv.org/abs/2309.09888
References on computer chess theory:
- David N. L. Levy, 1988, Computer Chess Compendium, https://www.amazon.com/dp/0387913319
- Tord Romstad, Marco Costalba, Joona Kiiski, and contributors, August 2025 (accessed), Stockfish chess, https://stockfishchess.org/about/, Code: https://github.com/official-stockfish/stockfish-web
- Pandolfini, Bruce, 1997, Kasparov and Deep Blue: The Historic Chess Match Between Man and Machine, Simon & Schuster, https://www.amazon.com/Kasparov-Deep-Blue-Historic-Between/dp/068484852X/
- David Silver, Thomas Hubert, Julian Schrittwieser, Ioannis Antonoglou, Matthew Lai, Arthur Guez, Marc Lanctot, Laurent Sifre, Dharshan Kumaran, Thore Graepel, Timothy Lillicrap, Karen Simonyan, Demis Hassabis, 5 Dec 2017, Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm, DeepMind, https://arxiv.org/abs/1712.01815