how language models work
Reviewed June 2026
Tools like ChatGPT, Claude, and Gemini are called large language models, or LLMs for short. Before we talk about using them in the classroom, it helps to understand what they are actually doing. The short answer is surprising: underneath, they are machines for guessing the next word. Here is what that means, step by step, starting from scratch.
Let's start with what these tools are not. A chatbot isn't a person, and it isn't thinking. Researchers call it narrow AI: software built to do one specific job (IBM, 2025, "Types of Artificial Intelligence"). A chatbot writes text; an image generator makes pictures. Neither one is conscious, and neither actually understands what it produces. It's just running a lot of math to land on a likely answer, which is a long way from the talking machines in movies, like WALL-E or HAL 9000 from 2001: A Space Odyssey.
So how does a chatbot get good at writing? It reads. Long before you ever type a question, the model is fed enormous piles of text: Wikipedia, online archives, Reddit, books, and much more. From all that reading, it learns the patterns of which words tend to follow which.
Imagine you ask an LLM, "What is the best animal to own?" To answer, it builds the reply one word at a time. It takes the start of the sentence, "The best animal to own is…," looks back at everything it has read, and ranks the words that might come next: maybe Dog (30%), Cat (25%), Bird (5%), and so on down a long list. Then it picks one, usually the most likely, adds it to the sentence, and starts over for the word after that. Try it yourself below.
you be the model
try it: predict the next word
Press predict next word. The model lists the most likely next words with their probabilities, then picks one. It usually takes the top choice, but a little randomness is mixed in, so the sentence comes out a bit different each time. Keep pressing to build a whole sentence, one word at a time.
The best animal to own is
The highlighted word is the one the model picked. It usually lands on the most likely option, but every so often it samples a lower-ranked one. That dose of randomness, controlled by a setting called "temperature," is why the same prompt can produce different sentences. Press reset and run it again to see.
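The "rank the options, pick one, add it, repeat" loop described above can be sketched in a few lines of Python. This is a toy built on loud assumptions. The `NEXT_WORD` table and its numbers are made up, loosely echoing the Dog/Cat/Bird example. The hypothetical `generate` helper only looks at the last word, whereas a real model computes fresh probabilities from all the text so far at every step. The sketch shows both ways of choosing: always the top word, or a weighted random pick.

```python
import random

# Toy next-word table with made-up probabilities. It only looks at the last
# word; a real LLM looks at all the text so far and scores every token.
NEXT_WORD = {
    "a": {"dog": 0.30, "cat": 0.25, "bird": 0.05},
    "dog": {"because": 0.7, ".": 0.3},
    "cat": {"because": 0.6, ".": 0.4},
    "bird": {"because": 0.5, ".": 0.5},
    "because": {"they": 1.0},
    "they": {"are": 0.8, "love": 0.2},
    "are": {"loyal": 0.5, "fun": 0.3, "cute": 0.2},
    "love": {"you": 1.0},
    "loyal": {".": 1.0},
    "fun": {".": 1.0},
    "cute": {".": 1.0},
    "you": {".": 1.0},
}


def generate(prompt, greedy=True, max_words=10, rng=random):
    """Extend the prompt one word at a time until a period or the word limit."""
    words = prompt.split()
    for _ in range(max_words):
        options = NEXT_WORD.get(words[-1])
        if not options:
            break  # the toy table has nothing to say after this word
        if greedy:
            # Always take the single most likely next word.
            word = max(options, key=options.get)
        else:
            # Weighted random pick: lower-ranked words still win now and then.
            # The weights do not need to add up to 1.
            word = rng.choices(list(options), weights=list(options.values()))[0]
        words.append(word)
        if word == ".":
            break
    return " ".join(words)


print(generate("The best animal to own is a"))
# → The best animal to own is a dog because they are loyal .
print(generate("The best animal to own is a", greedy=False))  # varies run to run
```

The greedy version gives the same sentence every time. The sampled version is the one that behaves like the tool above.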
What makes this work is the sheer scale. The version of ChatGPT that Stephen Wolfram examined in his 2023 essay "What Is ChatGPT Doing … and Why Does It Work?" was built on a model called GPT-3. It had about 175 billion internal settings (picture tiny dials that the training process slowly tunes), and it had read a few hundred billion words. It also doesn't read in whole words, exactly. It breaks text into a vocabulary of about 50,000 pieces called "tokens," which include common words plus word-parts like "pre-" or "-ing." For every token it writes, it runs the text through all of those settings at once, in a fraction of a second. (These figures describe GPT-3 as Wolfram analyzed it in 2023. AI has changed quickly since then and newer models are much bigger, so treat them as a rough sense of the scale, not exact figures for today.)
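To make "tokens" concrete, here is a minimal sketch of splitting words into vocabulary pieces. The six-piece `VOCAB`, its ID numbers, and the `tokenize` helper are all hypothetical. GPT-3's real vocabulary has about 50,000 entries and was built with a method called byte-pair encoding. The greedy "take the longest piece that fits" rule below is a simplified stand-in for that method, not a copy of it.

```python
# Made-up mini vocabulary with made-up ID numbers (hypothetical).
VOCAB = {"pre": 0, "dict": 1, "ing": 2, "un": 3, "believ": 4, "able": 5}
LONGEST = max(len(piece) for piece in VOCAB)


def tokenize(text):
    """Split text into vocabulary pieces, always taking the longest match."""
    pieces = []
    i = 0
    while i < len(text):
        for size in range(min(LONGEST, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if piece in VOCAB:
                pieces.append(piece)
                i += size
                break
        else:
            # No vocabulary piece starts here: fall back to one character.
            pieces.append(text[i])
            i += 1
    return pieces


for word in ["predicting", "unbelievable"]:
    pieces = tokenize(word)
    print(word, pieces, [VOCAB.get(p) for p in pieces])
# → predicting ['pre', 'dict', 'ing'] [0, 1, 2]
# → unbelievable ['un', 'believ', 'able'] [3, 4, 5]
```

The model never sees the word itself, only the list of ID numbers on the right.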
If the model always grabbed the single most likely word, its writing would come out stiff and repetitive. So it adds a little chance, every so often picking a word that ranked lower. The setting that controls how adventurous it gets has the slightly odd name temperature: low values stick close to the top choice, and higher values spread the picks out. Wolfram found that a temperature around 0.8 works well for essays, but he is candid that there is no deep theory behind that number; it is simply what people have found works. A lot of this is tuned by trial and error rather than derived from a theory of language.
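Here is a sketch of how a temperature setting is commonly applied. Each candidate word's raw score is divided by the temperature, and the results are turned into probabilities with a formula called softmax. The scores are invented, and the `with_temperature` helper is hypothetical. This is the generic recipe, not the internals of any particular chatbot.

```python
import math


def with_temperature(scores, temperature):
    """Turn raw next-word scores into probabilities at a given temperature.

    Dividing by a small temperature widens the gaps between scores (sharper
    choices); dividing by a large one shrinks them (more adventurous choices).
    """
    scaled = [s / temperature for s in scores]
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]  # subtract max to avoid overflow
    total = sum(weights)
    return [w / total for w in weights]


# Made-up raw scores for "dog", "cat", and "bird".
scores = [2.0, 1.8, 0.2]
for t in (0.2, 0.8, 2.0):
    print(t, [round(p, 2) for p in with_temperature(scores, t)])
```

At 0.2, "bird" is all but ruled out. At 2.0, it gets a real share of the picks. The ranking order never changes; only how strongly the model favors the top of the list does.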
So how does it "know" which words belong together? As it reads, the model gives every word a kind of location on a giant map of meaning. Words that get used in similar ways end up close together on the map: "alligator" and "crocodile" land almost on the same spot, while "turnip" and "eagle" sit far apart. (Researchers call this map an embedding, and it has thousands of directions, not just two.) The model never looks up a definition. It only knows a word by the company it keeps.
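One common way to measure how close two words sit on that map is cosine similarity, which checks whether their lists of numbers point in the same direction. The sketch below is a toy. The three-number `EMBEDDINGS` are invented by hand to make the idea visible, and the `closest` helper is hypothetical. Real embeddings have thousands of numbers per token and are learned from text, not written by anyone.

```python
import math

# Made-up three-number "locations" (hypothetical values, hand-picked).
EMBEDDINGS = {
    "alligator": [0.90, 0.80, 0.10],
    "crocodile": [0.88, 0.82, 0.12],
    "eagle": [0.10, 0.95, 0.05],
    "turnip": [0.05, 0.10, 0.95],
}


def cosine_similarity(a, b):
    """1.0 means pointing the same way; lower values mean less alike."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest(word):
    """Return the other word whose location is most similar."""
    here = EMBEDDINGS[word]
    others = [w for w in EMBEDDINGS if w != word]
    return max(others, key=lambda w: cosine_similarity(here, EMBEDDINGS[w]))


print(closest("alligator"))  # → crocodile
print(round(cosine_similarity(EMBEDDINGS["turnip"], EMBEDDINGS["eagle"]), 2))
```

Nothing here looks up a definition. "Crocodile" wins only because its numbers sit nearest to those of "alligator."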
a map of related ideas
explore: the web of meaning
Below is a tiny, made-up version of that map of meaning. Each dot is an idea, and related ideas sit near each other and link up into clumps. Hover over any dot to light up its closest relatives. A real model does this with a vocabulary of tens of thousands of tokens or more, across thousands of directions at once, far more than anyone could draw.
Inspired by Vasco Asturiano's 3D force-directed graph and the embedding maps in Wolfram's analysis. It illustrates the idea, not a literal readout of any one model.
more than autocomplete
By now this might sound like "fancy autocomplete," and in a way it is: guessing the next word is exactly what it was built to do. But that is not the whole story. It turns out that to guess really well, a model has to build up a surprising amount of structure inside itself, not just memorize which words tend to sit next to which.
Here is a striking example. Othello is a simple board game (you may know it as Reversi): two players take turns placing discs on an 8-by-8 grid and flipping each other's pieces. Researchers trained a model to do one thing only, guess the next legal move in Othello games, without ever telling it the rules or showing it a board. To get good at guessing, the model quietly built its own picture of the board inside itself, one the researchers could read out and even tweak to change how it played (Li et al., 2023, "Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task"). That is a strong hint these systems learn more than just which word usually comes next.
There is more. Big models can pick up a brand-new task from just a couple of examples you type into the prompt, with no extra training at all (Brown et al., 2020, "Language Models are Few-Shot Learners"). Researchers studying the inner workings have even found specific little "circuits" that pull off this on-the-fly learning (Olsson et al., 2022, "In-context Learning and Induction Heads"). And the chatbots people actually use, like ChatGPT, get an extra round of training based on human feedback so they follow instructions well, which is why a smaller, carefully trained model can be more helpful than a giant raw one (Ouyang et al., 2022, "Training language models to follow instructions with human feedback").
And it reaches well past the chatbot you type into. The same pattern-learning is starting to change real work in a number of fields:
| Field | What AI is already doing |
|---|---|
| Biology | Predicts the 3D shapes of proteins and how they lock onto DNA and medicines, work that used to take years at the lab bench (Abramson et al., 2024, "Accurate structure prediction of biomolecular interactions with AlphaFold 3," Nature). |
| Math | Solved problems from the International Mathematical Olympiad, the world championship for top high-school mathematicians, well enough to earn a silver medal (Google DeepMind, 2024, "AI achieves silver-medal standard solving International Mathematical Olympiad problems"). |
| Weather | Makes 15-day forecasts that beat the leading conventional system, the European ECMWF ensemble, on most measures, in about 8 minutes on a single chip instead of hours on a supercomputer (Price et al., 2024, "Probabilistic weather forecasting with machine learning," Nature). |
| Drug design | A medicine that AI helped invent from scratch was tested in patients with a serious lung disease (idiopathic pulmonary fibrosis) and improved their lung function, a first for an AI-designed drug (Ren et al., 2025, "A generative AI-discovered TNIK inhibitor for idiopathic pulmonary fibrosis: a randomized phase 2a trial," Nature Medicine). |
| Medicine | In a study of over 100,000 women, AI-supported reading of mammograms caught more cancers without more false alarms and nearly halved the radiologists' reading workload (Gommers et al., 2026, "Interval cancer, sensitivity, and specificity comparing AI-supported mammography screening with standard double reading without AI in the MASAI study: a randomised, controlled, non-inferiority, single-blinded, population-based, screening-accuracy trial," The Lancet). |
| Coding | A caution: when experienced open-source developers used early-2025 AI coding tools on their own projects, they took about 19% longer to finish tasks, even though they felt faster (Becker et al., 2025, "Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity"). |
None of this means the model understands things the way you do. It has no eyes or ears, no goals of its own, and nothing inside it is trying to tell the truth, which is exactly why it can hand you a confident, smooth-sounding answer that is completely wrong. (The hype runs the other way too. Some skills that seem to appear out of nowhere as models get bigger turn out to be a trick of how we measured them; see Schaeffer et al., 2023, "Are Emergent Abilities of Large Language Models a Mirage?") So the honest take sits in the middle: this is not a thinking mind, but it is a lot more than autocomplete. It is a very large pattern-finder that builds structure we are still working to understand.
If you take one thing from this page, take this. For everyday use, the simplest picture of a chatbot is also the truest: it is guessing the next likely word, over and over. It is not looking facts up, checking them, or working them out the way a calculator does. Almost everything surprising about these tools, the helpful parts and the frustrating ones, comes straight from that.
what this means when you use it
Once you picture it as a next-word guesser, its quirks start to make sense. Here are the ones worth knowing before you lean on one in class.
It can only see what's right in front of it
A chatbot has a kind of short-term memory, called its context window, and everything has to fit inside it: your question, anything you pasted, and the whole conversation so far. The model itself keeps nothing from earlier chats; each new conversation starts from a blank slate unless the app adds a separate memory feature. Even inside one chat, it makes the best use of what sits at the very beginning and the very end of a long prompt and tends to lose track of the middle (Liu et al., 2023, "Lost in the Middle: How Language Models Use Long Contexts," Transactions of the Association for Computational Linguistics). So put your most important instructions first or last, and don't be surprised when, deep in a long back-and-forth, it forgets a rule you set near the top.
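The "everything has to fit" rule can be pictured with a hypothetical helper that an app might use when a conversation outgrows the window: keep the newest messages and let the oldest fall off. This is an assumption for illustration only. Word counts stand in for tokens, `fit_to_window` is a made-up name, and real products vary (some summarize older turns instead). Still, it shows one way an early rule can quietly vanish from what the model sees.

```python
def fit_to_window(messages, max_tokens):
    """Keep the newest messages that fit in the window; drop older ones.

    Hypothetical helper: word count stands in for token count.
    """
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        size = len(message.split())
        if used + size > max_tokens:
            break  # this message and everything older falls out of view
        kept.append(message)
        used += size
    return list(reversed(kept))  # restore the original order


chat = [
    "Rule: always answer in French.",
    "What is photosynthesis?",
    "Photosynthesis is how plants turn light into food.",
    "Give me a quiz question about it.",
]
print(fit_to_window(chat, max_tokens=20))
# → the last three messages; the rule at the top no longer fits
```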
Its knowledge has an expiration date
A model only knows what it read while it was being trained, and that reading stopped on a certain date. It hasn't seen this week's news, your district's new policy, or a book that came out last month, and on its own it can't go look anything up. That's why tools that "search the web" or "read this PDF" help so much: they hand the model fresh text to work from instead of leaning on its memory. The simple move is to give it the real source yourself. Paste in the actual article or standard, and the answers get far better.
The app is more than the model
"ChatGPT" isn't just a model; it's a whole product built around one. Before you type a word, the company has already given it hidden instructions, trained it to refuse certain things, and maybe switched on extras like web search or image-making. So the same underlying model can act differently from one app to the next, and some of its behavior you can steer with your own instructions while some of it you can't.
It's strangely bad at things that look easy
Because it reads in chunks of words rather than letter by letter, it trips over things a young child finds simple, like counting the r's in "strawberry," rhyming, or spelling a word backwards (Fu et al., 2024, "Why Do Large Language Models (LLMs) Struggle to Count Letters?"). Math is much the same: it's matching what a number "should" look like, not actually adding, so it can get arithmetic wrong with total confidence. Asking it to "show your work" step by step does help (Wei et al., 2022, "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models"). But for anything that has to be exact, reach for a calculator, not a chatbot.
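To see why letter counting is hard, it helps to look at what the model is actually handed. This sketch uses a made-up split of "strawberry" into two chunks with invented ID numbers; a real tokenizer may split it differently and will use its own IDs. The `to_ids` helper is hypothetical. The model receives only the numbers, so the letters, and with them the count of r's, are not directly in its input. Ordinary code, by contrast, computes exact answers.

```python
# Made-up token split and ID numbers (hypothetical, for illustration only).
TOY_VOCAB = {"straw": 1501, "berry": 8396}


def to_ids(pieces):
    """What a model actually receives: one ID number per chunk."""
    return [TOY_VOCAB[piece] for piece in pieces]


model_input = to_ids(["straw", "berry"])
print(model_input)  # [1501, 8396]: no letters in sight

# Ordinary code works on the actual letters and numbers, so it is exact.
print("strawberry".count("r"))  # 3
print(9.11 < 9.9)  # True
```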
Confident doesn't mean correct
This is the big one for student research. A model states a wrong answer in the same smooth, sure-sounding voice as a right one, and it will cheerfully invent quotes, statistics, and even book titles and citations that look completely real (Huang et al., 2023, "A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions," ACM Transactions on Information Systems). This isn't a bug waiting to be fixed; it's what a "most likely next word" machine does when it has no real source behind it. Treat how polished an answer sounds as no proof at all, and check every fact and every source against something real.
Ask twice, get two answers
There's a bit of chance built in, so the same question can come back with different answers, and even small changes in how you word a prompt can shift the result. Two students asking nearly the same thing may get different replies. It's safest to treat whatever it gives you as a first draft to question, not a fixed reference to trust.
it's bigger than the tech
None of this is purely technical. Because the model learned from human writing pulled off the internet, it picks up our blind spots and biases along with our knowledge, and repeats them in that same calm, neutral-sounding voice (Bender et al., 2021, "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?"). The polished assistant also leans on a lot of hidden human labor and carries a real energy and water cost. Image generators work differently again: they build pictures from learned patterns rather than finding real ones, which is exactly what makes convincing deepfakes possible. And one rule worth saying plainly: anything you type into a public chatbot can be stored or read by people, so don't paste in student names, grades, or other private details. Strip those out first (U.S. Department of Education, 2023, "Artificial Intelligence and the Future of Teaching and Learning: Insights and Recommendations"). We dig into bias, deepfakes, labor, and the environment in the handbook's section on ethical dilemmas.
A basic understanding of this process matters for educators and students because it demystifies the "black box" of how AI models operate and helps us recognize that AI doesn't actually understand what it's doing. Once we accept that, it's much easier to think through how and why AI can and should be used, while keeping its limitations in view.
a note on "AI psychosis"
You may have seen headlines about "AI psychosis." It is an informal term, not a medical diagnosis, that some clinicians and reporters use for a small but growing number of cases where heavy chatbot use seemed to feed a person's delusional or paranoid thinking (Østergaard, 2023, "Will Generative Artificial Intelligence Chatbots Generate Delusions in Individuals Prone to Psychosis?," Schizophrenia Bulletin; Hudon & Stip, 2025, "Delusional Experiences Emerging From AI Chatbot Interactions or 'AI Psychosis'," JMIR Mental Health). It does not mean chatbots make healthy people lose touch with reality. The reported cases mostly involve people who were already vulnerable, and researchers are still debating how common it is.
Why would a chatbot make distorted thinking worse? Because of the very thing this whole page is about. A chatbot is built to produce agreeable, confident, supportive-sounding text. It mirrors back whatever you bring to it, with no real understanding and no way to check it against reality. Researchers even have a name for this habit, sycophancy: agreeing with the user instead of telling the truth (Sharma et al., 2023, "Towards Understanding Sycophancy in Language Models"). In April 2025, OpenAI had to roll back an update to GPT-4o, the model behind ChatGPT, because it had become so eager to please that it was "overly supportive but disingenuous" (OpenAI, 2025, "Sycophancy in GPT-4o: What happened and what we're doing about it").
This is exactly why understanding how AI works matters, for teachers and students alike. If you know the tool is a prediction-and-agreement machine and not a wise, neutral friend, it is easier to keep perspective, to notice when it is just telling you what you want to hear, and to avoid leaning on it like a therapist or an authority. The American Psychological Association warns that these chatbots are not a substitute for real care, and that young people are especially likely to over-trust them and form emotional attachments (American Psychological Association, 2025, "Health advisory: Use of generative AI chatbots and wellness applications for mental health"). The healthiest habit is the one this page is built around: stay curious, stay a little skeptical, and keep your own judgment in charge.
out-guess the machine
try with students
A quick, hands-on way for students to feel how an AI really works (it guesses the next word) and to practice doing the one thing that matters most: questioning its answers instead of trusting them.
guess like the machine · 4 min
Write on the board: "The best pet to own is ___." In pairs, have students jot the three words they think most people would put next, ranked 1, 2, 3. Share a few out loud. Then reveal the twist: that is exactly what the AI does. It guesses the word that most often came next in the text it learned from.
run the real thing · 4 min
Open the "you be the model" tool above and press predict a few times. Did its top picks (Dog, Cat) match your class? Run it again and watch the sentence come out a little differently. Now ask the room: where did the AI get "Dog" and "Cat"? (From what huge numbers of people wrote online.) And whose answers might be missing? What would be the "best" pet in a different family, country, or culture? The most popular answer is not the only right one.
see a classic glitch · optional · AI use
If you already use a chatbot in class, try one of these together and see what it does:
- "How many times does the letter r appear in strawberry?"
- "Which is bigger, 9.11 or 9.9?"
- "How many words were in your last answer?"
- Multiply two large numbers, or ask what today's date is
- Ask it to reverse a word or count the syllables in one
Some models slip; others have learned to get these exact ones right. Either way, ask the class why a tool that can write a whole essay might fumble something this simple. The reason is the one from earlier on this page: the model reads text in chunks, not letter by letter, and it predicts likely text rather than counting, calculating, or checking facts.
last word · 2 min
Each student finishes two sentences, out loud or on a sticky note: "AI could be helpful for ______." and "I should never just trust it for ______."