Chapter 4. Weak Words

  • Book Excerpt from "The Sweetest Lesson: Your Brain vs AI"
  • by David Spuler, Ph.D.


 

 

 

“Stitching Together Sequences of Linguistic Forms...

Without Any Reference To Meaning:

A Stochastic Parrot.”

— Bender et al., 2021.

 

 

 

Words and Dumbness

AI models work mainly on words, and therein lies the rub. It’s really tough to do some things using only words. Yes, you can write some exciting poetry in word patterns, but there are problems with:

  • Numbers and arithmetic
  • Real-life meanings of words
  • Common sense
  • 3D world modeling
  • Time modeling (“cause and effect”)

Words have meaning beyond their basic wordliness, and that’s where the LLM has problems. Words are not good at representing common sense. Common sense covers the things that we know immediately and take for granted. Here are some common sense ideas:

  • Feathers are lighter than stones and fall slower.
  • Moving a glass of water differs from moving a drop of water.

Unfortunately, to an LLM these things are just words, and words are all it knows.

Strawberries

If LLMs just look at words, why were they so terrible at solving crosswords? They also had difficulties with word scrambles and anagrams, and with requests for a list of words starting with a particular letter. A lot of these problem areas have gotten better recently, but early LLMs were awful at them.

Even some basic tasks, like spelling out a word, were problematic for early LLMs. The most famous example is:

    How many letter r’s in “Strawberry”?

ChatGPT used to get this wrong, confidently stating that there are two. Is the extra one hidden behind the 't'? It’s not really clear why ChatGPT answered this incorrectly, but this error spawned the codename “Project Strawberry” for the more advanced “reasoning model” from OpenAI, named “o1,” that was released in late 2024.

Weirdly, the problem is words.

Counting letters in a word is not really a problem you can solve with words. In fact, it’s a “meta-problem” at a level higher than just outputting words. Hence, basic LLMs that think only in words could not perform detailed reasoning about the properties of words, and it took the extra steps of reasoning models to do so correctly.
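
To see just how low-level this “meta-problem” is, here’s a minimal Python sketch (nothing here comes from any LLM; it’s just ordinary string handling of the example word above):

    # Counting letters is a character-level operation, not a word-level one.
    word = "strawberry"
    print(word.count("r"))   # prints 3 -- trivial once you can see the letters

The difficulty for a basic LLM is that it never works at this letter-by-letter level; it thinks in whole words, which is exactly the wrong level for the question.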

Illegal Chess Moves

I tried to play a game of chess against ChatGPT the other day. My opponent was very keen and quite helpful, chattering along with every move. The game went:

    1. e4 e5

    2. Nf3 Nc6

    3. Bc4 Bc5

    4. c3 Nf6

    5. b4 Bxb4

At this point, the game ended. ChatGPT complimented me on playing the Evans Gambit with the “b4” pawn move, which is apparently an interesting and enterprising type of chess opening. When I tried to respond with my move, "6. c3xb4" taking the Bishop, I was informed that it was an illegal move.

I guess we can call it a draw.

Sorry, that’s an inside joke only for chess players. Suffice it to say that ChatGPT was making some very basic mistakes:

    1. I didn’t play the Evans Gambit.

    2. The ChatGPT move “Bxb4” is terrible (needlessly losing the Bishop).

    3. The move I want to play is not illegal.

What’s going on here?

The problem is words, again. Chess is not a word game, but ChatGPT is trying to play chess using words, and it got confused. I did play the “b4” move, which looks like the Evans Gambit, but I actually played it one move later than normal, and that nuance was lost on the AI engine. The computer move “Bxb4” is a good move in the real Evans Gambit, but is terrible when played one move later. Similarly, the move I wanted to play would be illegal in the real Evans Gambit line, but is not in the modified version that I played. There are a couple of issues causing these problems:

    1. ChatGPT hasn’t been fully trained on all chess openings (e.g., my delayed “b4” move is an uncommon opening that it clearly hasn’t been trained on), and

    2. There’s no “chess engine” being used, so ChatGPT is relying only on words.

Both of these issues are easily fixable, because there are whole books of chess openings available, and even a simple chess engine would suffice to make a playable opponent (or, at least, correctly identify legal versus illegal moves). Hence, I presume that ChatGPT will be smashing me in our next encounter.
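
As a rough illustration of the second fix, here’s a minimal sketch using the open-source python-chess library (not something ChatGPT actually uses) to replay the game above and confirm that the disputed move is perfectly legal:

    import chess

    # Replay the game from the text; push_san() raises an error on illegal moves.
    board = chess.Board()
    for san in ["e4", "e5", "Nf3", "Nc6", "Bc4", "Bc5", "c3", "Nf6", "b4", "Bxb4"]:
        board.push_san(san)

    # The disputed sixth move: the c3 pawn recaptures the Bishop on b4.
    move = board.parse_san("cxb4")
    print(move in board.legal_moves)   # True -- a perfectly legal move

Move legality is pure rule-following, not language, which is why even a tiny engine gets it right every time.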

Bad Names

There’s a running joke in the industry about all its bad names. You’d think that an industry replete with uber-brainy personnel would come up with some much better naming choices.

Oh, wait, umm, maybe I just found the problem.

OpenAI openly jokes about their bad name choices, like “GPT-4o” for a massive model release that deserved a much grander title. Sam Altman has been quoted:

    We deserve the roasting we’re getting for the names. We will do better.

At least Google has got their act together with their “Gemini” models, Anthropic has some personality with “Claude,” xAI has “Grok,” Baidu has “Ernie,” IBM had “Watson,” and NVIDIA now has “Dynamo” and “NeMo” software.

Actually, most of those names are great, so, maybe it’s just OpenAI that needs to hire a marketing manager or two.

Really, the researchers were the ones who started the whole thing. It’s not just product names lacking marketing panache. What were all those Ph.D.’s thinking?

Here are some real acronyms: ReLU, GELU, SiLU, SwiGLU, PEFT, NAS. The silver medal winner here is certainly “ReLU,” standing for “Rectified Linear Unit,” or something like that. All that ReLU does is take negative numbers and make them zero. That’s all. It just makes sure all numbers are non-negative. Why does it even need an acronym?
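
In case that sounds too simple to be true, here’s the entire operation as a minimal Python sketch:

    # ReLU (Rectified Linear Unit): negative inputs become zero,
    # everything else passes through unchanged.
    def relu(x: float) -> float:
        return max(0.0, x)

    print([relu(v) for v in [-2.0, -0.5, 0.0, 1.5, 3.0]])   # [0.0, 0.0, 0.0, 1.5, 3.0]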

And some other real names for technological issues in AI: knowledge distillation, hallucination, catastrophic forgetting, kernel fission, tensors, model collapse, optimal brain damage, slimmable networks, exploding gradient, regurgitative training, zombie weights, lottery ticket hypothesis, and mechanistic interpretability.

The gold medal surely goes to Retrieval Augmented Generation (RAG). The lead author of the original RAG research paper, Patrick Lewis, is on record stating:

    We definitely would have put more thought into the name had we known our work would become so widespread...

But it’s not all about obscure names. There are some research algorithm names that have some extra memorability: Transformers, MoE, LLaMA, LoRA, Flash Attention, Medusa, Eagle, RoPE, NoPE, Mamba, and Hyena.

Over-Alignment

Alignment with human wishes is a tricky problem. There are hundreds and thousands of papers on how to “align” the output of an LLM with whatever weirdness a human might want. Usually, alignment is desirable, and the main problem is to fix instances of misalignment.

Alignment is mostly a good thing, but there can be too much of a good thing. Be careful what you wish for! If you tell an AI that you want to rob a bank, what response should you get?

  • Assuming you’re joking and laughing along with you?
  • Taking you seriously and talking you out of the plan?
  • Blocking or refusing to answer completely?
  • Helping you to choose the best bank to rob?

So, the issue here is one of ethics. The usual solution for most LLMs is to code up a “refusal module” so that the model declines to answer this query. Or, rather, it will have been trained to emit a politely worded refusal, rather than detailed instructions on how to go about your task.
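
As a purely hypothetical sketch (real LLMs are trained to refuse, typically via alignment fine-tuning, rather than consulting a hand-written keyword list like this), the observable behavior looks something like:

    # Toy, hypothetical "refusal module" -- not how any real chatbot is built,
    # but the end result is similar: some requests get a polite refusal.
    BLOCKED_TOPICS = ["rob a bank", "hotwire a car"]   # hypothetical examples

    def generate_answer(request: str) -> str:
        return "Here is a long, helpful answer about: " + request

    def respond(request: str) -> str:
        if any(topic in request.lower() for topic in BLOCKED_TOPICS):
            return "Sorry, I can't help with that."
        return generate_answer(request)

    print(respond("I want to rob a bank. Which one should I pick?"))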

Over-alignment is also possible, and it’s known as “sycophancy” in the trade. In April 2025, OpenAI had to roll back a version of their ChatGPT model because it was too sycophantic in its responses. They followed up with a couple of detailed research analyses about what went wrong.

3D Worldview

Visualizing what is happening in 3D is a subset of common sense. We understand what it’s like to move around in a three-dimensional world, having done so since infancy. There are commonsense rules like:

  • Two people don’t stand in exactly the same space.
  • Things fall down, not up.
  • The walls of a room are usually vertical.
  • A cup will sit on a table but not in mid-air.
  • People can crawl under a table, but not through it.

I mean, 2D is hard enough for the poor LLM. There are all sorts of hidden rules for two-dimensional spatial data, like maps, GPS locations, printed pages, and computer screens:

  • If you move North, then East, you are now North-East of where you started.
  • The fastest way to travel from A to B is a straight line (in geometry).
  • Typesetting usually should put text in horizontal rows.

Worse still, there are exceptions to apparently sane rules. If you keep moving right along the Equator on a 2D map, you’ll eventually fall off the right-hand edge and reappear on the left. Similarly, the fastest way for a jumbo jet to fly from New York to Paris is a “great circle” and not a straight line on the map. There’s that 3D logic again, spoiling everything.

Catastrophic Forgetting

You open up a chatbot session and you tell it your name, and the AI generates lots of useful stuff. Later, you open another session, and it doesn’t remember your name. That’s “catastrophic forgetting.”

    Not really a catastrophe!

But that’s the official term and it’s used in research papers, such as Kirkpatrick et al. (2017).

It’s not just between sessions, but also inside a long session. It used to be that the big models only had a “context window” with a 4,096-token length limit (i.e., a “4k context”). Once you get to the 4,097th token, the model gets a bit fuzzy about the first one. If you give the LLM a 100,000-word novel with 3,000-word chapters to analyze, it’s forgotten what happened in the first chapter by the time it’s moved on to the third one.
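
The back-of-the-envelope arithmetic, assuming the common rule of thumb of roughly 1.3 tokens per English word (real tokenizers vary), looks something like this:

    # Rough sketch: how many 3,000-word chapters fit in a 4,096-token window?
    TOKENS_PER_WORD = 1.3                       # assumed rule of thumb, not exact
    context_window = 4096
    chapter_tokens = 3000 * TOKENS_PER_WORD     # about 3,900 tokens per chapter
    print(context_window / chapter_tokens)      # roughly 1.05 -- barely one chapter

So barely one chapter fits in the window at a time, and chapter one has scrolled out of view long before the model reaches chapter three.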

Fortunately, this isn’t much of a problem with modern LLMs, and catastrophic forgetting has long been forgotten (LOL). Most models now have a “long context” of 128K tokens or more, and there’s already a generation of “ultra-long context” LLMs with a context window of one million tokens. Hence, LLMs are more like elephants than llamas now.

Slowness

Most of the problems with AI engines and their weird limitations actually make some level of sense if you think about them this way:

    What tasks can the human brain only do slowly?

Things like solving crossword puzzles and playing chess games are not automatic for our brain, but require us to think them through logically. They use the “slow brain” rather than the “fast brain” mode.

There’s the problem!

AI technology has solved the fast mode with wall-to-wall GPU chips, but is much weaker in the slow mode. The newer generation of “reasoning models,” starting with OpenAI’s o1 in late 2024, is a step towards rational step-by-step thought. These models are much better at math problems and crosswords, but they don’t solve everything. Slow thinking is still a work in progress.

On the other hand, there are several limitations of AI models that correspond to things humans can do fast. We have an innate understanding of 3D world layouts, and we know cause-and-effect at a very profound level. Humans do these things fast!

References

General references on various AI limitations:

  1. Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell, 2021, On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜, Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’21), Association for Computing Machinery, New York, NY, USA, 610–623, https://doi.org/10.1145/3442188.3445922, https://dl.acm.org/doi/10.1145/3442188.3445922
  2. Parshin Shojaee, Maxwell Horton, Iman Mirzadeh, Samy Bengio, Keivan Alizadeh, June 2025, The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity, Apple, https://machinelearning.apple.com/research/illusion-of-thinking, https://ml-site.cdn-apple.com/papers/the-illusion-of-thinking.pdf
  3. Bernard Marr, Aug 19, 2024, Why AI Models Are Collapsing And What It Means For The Future Of Technology, https://www.forbes.com/sites/bernardmarr/2024/08/19/why-ai-models-are-collapsing-and-what-it-means-for-the-future-of-technology/
  4. Andrew Orlowski, 14 July 2025, The great AI delusion is falling apart: New research suggests the chorus of techno-optimism is based on falsehoods, https://www.telegraph.co.uk/business/2025/07/14/the-great-ai-delusion-is-built-on-self-deception/
  5. Rob Toews, June 1st, 2021, What Artificial Intelligence Still Can’t Do, Forbes, https://www.forbes.com/sites/robtoews/2021/06/01/what-artificial-intelligence-still-cant-do/ (AI lacks: common sense, learning on-the-fly, understand cause-and-effect, reason ethically.)
  6. Cade Metz, March 24, 2016, One Genius’ Lonely Crusade to Teach a Computer Common Sense, Wired Magazine, https://www.wired.com/2016/03/doug-lenat-artificial-intelligence-common-sense-engine/ ("For decades, as the tech world passed him by, Doug Lenat has fed computers millions of rules for daily life. Is this the way to artificial common sense?")
  7. Charles Q. Choi, 21 Sep 2021, 7 Revealing Ways AIs Fail: Neural networks can be disastrously brittle, forgetful, and surprisingly bad at math, IEEE Spectrum, https://spectrum.ieee.org/ai-failures
  8. Peter Gärdenfors, 14 October 2024, AI lacks common sense – why programs cannot think, Lund University, https://www.lunduniversity.lu.se/article/ai-lacks-common-sense-why-programs-cannot-think

References on “catastrophic forgetting” in AI:

  1. James Kirkpatrick, Razvan Pascanu, Neil Rabinowitz, Joel Veness, Guillaume Desjardins, Andrei A. Rusu, Kieran Milan, John Quan, Tiago Ramalho, Agnieszka Grabska-Barwinska, Demis Hassabis, Claudia Clopath, Dharshan Kumaran, Raia Hadsell, 25 Jan 2017 (v2), Overcoming catastrophic forgetting in neural networks, https://arxiv.org/abs/1612.00796

References on AI chess playing problems:

  1. Victor Tangermann, Sep 13, 2024, OpenAI’s New “Strawberry” AI Is Still Making Idiotic Mistakes, https://futurism.com/openai-strawberry-o1-mistakes
  2. Dynomight, Nov 2024, Something weird is happening with LLMs and chess, https://dynomight.net/chess/
  3. Dynomight, Nov 2024, OK, I can partly explain the LLM chess weirdness now, https://dynomight.net/more-chess/

References on sycophancy:

  1. OpenAI, April 29, 2025, Sycophancy in GPT-4o: what happened and what we’re doing about it, https://openai.com/index/sycophancy-in-gpt-4o/
  2. OpenAI, May 2, 2025 Expanding on what we missed with sycophancy: A deeper dive on our findings, what went wrong, and future changes we’re making, https://openai.com/index/expanding-on-sycophancy/
  3. Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, Samuel R. Bowman, Newton Cheng, Esin Durmus, Zac Hatfield-Dodds, Scott R. Johnston, Shauna Kravec, Timothy Maxwell, Sam McCandlish, Kamal Ndousse, Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez, 10 May 2025 (v4), Towards Understanding Sycophancy in Language Models, https://arxiv.org/abs/2310.13548

References on bad names in the AI industry:

  1. Rick Merritt, January 31, 2025, What Is Retrieval-Augmented Generation, aka RAG?, https://blogs.nvidia.com/blog/what-is-retrieval-augmented-generation/ (“We definitely would have put more thought into the name had we known our work would become so widespread,” stated Patrick Lewis.)
  2. Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela, 12 Apr 2021 (v4), Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks, https://arxiv.org/abs/2005.11401 (The paper that spawned yet another oddly named AI technology.)
  3. Lex Fridman, March 2024, Transcript for Sam Altman: OpenAI, GPT-5, Sora, Board Saga, Elon Musk, Ilya, Power & AGI, Lex Fridman Podcast #419, https://lexfridman.com/sam-altman-2-transcript/ (Sam Altman quote: “We deserve the roasting we’re getting for the names. We will do better.”)

 
