back to the handbook

how language models work

Reviewed June 2026

Tools like ChatGPT, Claude, and Gemini are called large language models, or LLMs for short. Before we talk about using them in the classroom, it helps to understand what they are actually doing. The short answer is surprising: underneath, they are machines for guessing the next word. Here is what that means, step by step, starting from scratch.

Let's start with what these tools are not. A chatbot isn't a person, and it isn't thinking. Researchers call it narrow AI: software built to do one specific job (IBM, 2025, "Types of Artificial Intelligence"). A chatbot writes text; an image generator makes pictures. Neither one is conscious, and neither actually understands what it produces. Each is just running a lot of math to land on a likely answer, which is a long way from the talking machines of the movies, like WALL-E or HAL 9000.

So how does a chatbot get good at writing? It reads. Long before you ever type a question, the model is fed enormous piles of text: Wikipedia, online archives, Reddit, books, and much more. From all that reading, it learns the patterns of which words tend to follow which.
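To make "learning which words tend to follow which" concrete, here is a minimal Python sketch. It is a toy under stated assumptions, not how real LLMs are trained: the three-sentence `corpus` is made up for illustration, and a real model learns far richer patterns with a neural network instead of a simple tally.

```python
from collections import Counter, defaultdict

# A tiny made-up "reading list" (hypothetical; real models read billions of words).
corpus = [
    "the dog chased the ball",
    "the dog ate the food",
    "the cat chased the dog",
]

# For every word, tally which words come right after it.
follows = defaultdict(Counter)
for sentence in corpus:
    words = sentence.split()
    for current, nxt in zip(words, words[1:]):
        follows[current][nxt] += 1


def next_word_probabilities(word):
    """Turn the raw tallies for `word` into probabilities that sum to 1."""
    counts = follows[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.most_common()}


print(next_word_probabilities("the"))
```

In this toy, "dog" follows "the" in three of six cases, so it gets the highest probability. The more text the tally sees, the closer its guesses get to how people actually write.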

Imagine you ask an LLM, "What is the best animal to own?" To answer, it builds the reply one word at a time. It takes the start of the sentence, "The best animal to own is…," draws on the patterns it learned from everything it has read, and ranks the words that might come next: maybe Dog (30%), Cat (25%), Bird (5%), and so on down a long list. Then it picks one, usually the most likely, adds it to the sentence, and starts over for the word after that. Try it yourself below.

you be the model

try it: predict the next word

Press predict next word. The model lists the most likely next words with their probabilities, then picks one, usually the top choice, but with a little randomness mixed in, so the sentence comes out a bit different each time. Keep pressing to build a whole sentence, one word at a time.

The best animal to own is

The model's next-word probabilities will appear here.

The highlighted word is the one the model picked. It usually lands on the most likely option, but every so often it samples a lower-ranked one. That dose of randomness, controlled by a setting called "temperature," is why the same prompt can produce different sentences. Press reset and run it again to see.

[Figure: probability of each candidate next word (vertical axis, 0.3% to 30%) plotted against its rank (horizontal axis, 1 to 100). Labeled points: Dog, Cat, Bird, Ferret.]
What that ranking looks like. Each dot is one possible next word for "The best animal to own is…," placed by how likely the model thinks it is: higher up means more likely, and further right means further down the ranking. A couple of words are strong favorites (Dog and Cat), and then the odds trail off into a long tail of options it could pick but probably won't. That shape, a few favorites and a long tail, is what real models show (Wolfram, 2023, "What Is ChatGPT Doing … and Why Does It Work?").
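The rank-pick-repeat loop can be sketched in a few lines of Python. The table below is hypothetical: only the Dog, Cat, and Bird figures come from the example, the long tail of other animals is left out (so the numbers act as relative weights), and every later step is invented so the loop has something to continue with. A real model recomputes probabilities over tens of thousands of tokens at every step.

```python
import random

# Hypothetical table: for the last word written, how likely is each next word?
# Dog/Cat/Bird reuse the figures from the example; all other rows are invented.
NEXT_WORD = {
    "is": {"Dog": 0.30, "Cat": 0.25, "Bird": 0.05},
    "Dog": {"because": 0.7, ".": 0.3},
    "Cat": {"because": 0.6, ".": 0.4},
    "Bird": {"because": 0.5, ".": 0.5},
    "because": {"they": 1.0},
    "they": {"are": 1.0},
    "are": {"loyal": 0.5, "quiet": 0.3, "fun": 0.2},
    "loyal": {".": 1.0},
    "quiet": {".": 1.0},
    "fun": {".": 1.0},
}


def pick_next(last_word, rng):
    """Pick one next word at random, favoring the more likely candidates."""
    candidates = NEXT_WORD[last_word]
    words = list(candidates)
    weights = list(candidates.values())
    return rng.choices(words, weights=weights, k=1)[0]


def write_sentence(start, rng, max_words=10):
    """Add one word at a time until the sentence ends or hits max_words."""
    words = start.split()
    for _ in range(max_words):
        nxt = pick_next(words[-1], rng)
        if nxt == ".":
            break
        words.append(nxt)
    return " ".join(words) + "."


rng = random.Random()  # unseeded, so each run can come out differently
print(write_sentence("The best animal to own is", rng))
```

Run it a few times: Dog turns up most often, but Cat and Bird appear too, which is the same behavior the widget shows.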

What makes this work is the sheer scale. Wolfram's figures describe GPT-3, the model family behind the version of ChatGPT he studied. It had about 175 billion internal settings (picture tiny dials the training process slowly tunes), and it had read a few hundred billion words. It also does not read in whole words, exactly. It breaks text into pieces called "tokens," drawn from a list of about 50,000: common words plus word-parts like "pre-" or "-ing." For every token it writes, it weighs all of this at once, in a fraction of a second. (GPT-3 itself dates from 2020; Wolfram wrote about it in 2023. AI has changed quickly since then and newer models are often bigger, so take these numbers as a rough sense of the scale, not exact figures for today.)
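For a rough feel of what breaking text into tokens means, here is a small Python sketch. It is a simplification under stated assumptions: the seven-entry `VOCAB` is hypothetical, and the longest-match rule used here is simpler than the method real tokenizers use to build and apply their vocabularies.

```python
# Hypothetical mini-vocabulary; GPT-3's real one has about 50,000 entries.
VOCAB = {"pre", "view", "ing", "read", "the", "dog", "s"}


def tokenize(word, vocab=VOCAB):
    """Split a word into the longest known pieces, left to right.

    Falls back to single letters when no known piece matches.
    """
    pieces = []
    i = 0
    while i < len(word):
        # Try the longest possible piece first, then shorter ones.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:
            pieces.append(word[i])  # unknown letter becomes its own piece
            i += 1
    return pieces


for word in ["previewing", "reading", "dogs", "cat"]:
    print(word, "->", tokenize(word))
```

Familiar words split into a few large pieces, while a word the vocabulary has never seen falls apart into single letters.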

If the model always grabbed the single most likely word, its writing would come out stiff and repetitive. So it adds a little chance, every so often picking a word that ranked lower. There is a setting for how adventurous it gets, with the slightly odd name temperature. Wolfram found that a temperature around 0.8 works well for essays, but he is honest that there is no deep theory behind that number; it is simply what people have found works. A lot of this was settled by trial and error, not by any theory of language, and not because the model understands what it writes.
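Here is a minimal Python sketch of what the temperature setting does to a list of next-word probabilities. It assumes only the three named candidates from the earlier example (Dog, Cat, Bird), renormalized so they sum to 1; the function name `apply_temperature` is made up for this page. Raising each probability to the power 1/temperature and renormalizing gives the same result as the usual approach of dividing a model's raw scores by the temperature.

```python
def apply_temperature(probs, temperature):
    """Rescale probabilities: low temperature sharpens, high temperature flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {word: p ** (1.0 / temperature) for word, p in probs.items()}
    total = sum(scaled.values())
    return {word: value / total for word, value in scaled.items()}


# Figures from the example above; the long tail of other words is left out.
probs = {"Dog": 0.30, "Cat": 0.25, "Bird": 0.05}

for temperature in (0.2, 0.8, 2.0):
    adjusted = apply_temperature(probs, temperature)
    summary = ", ".join(f"{word} {share:.0%}" for word, share in adjusted.items())
    print(f"temperature {temperature}: {summary}")
```

At a low temperature the favorite wins almost every time; at a high one the underdogs get a real chance, and the writing gets more surprising (and eventually less sensible).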

So how does it "know" which words belong together? As it reads, the model gives every word a kind of location on a giant map of meaning. Words that get used in similar ways end up close together on the map: "alligator" and "crocodile" land almost on the same spot, while "turnip" and "eagle" sit far apart. (Researchers call this map an embedding, and it has thousands of directions, not just two.) The model never looks up a definition. It only knows a word by the company it keeps.
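One common way to measure how close two words sit on such a map is cosine similarity, sketched below in Python. The three-number "locations" are made up by hand purely for illustration; real embeddings have thousands of numbers per word and are learned from text.

```python
import math

# Made-up 3-number locations (hypothetical; not taken from any real model).
EMBEDDINGS = {
    "alligator": [0.90, 0.80, 0.10],
    "crocodile": [0.88, 0.82, 0.12],
    "turnip": [0.10, 0.05, 0.95],
    "eagle": [0.70, -0.60, 0.05],
}


def cosine_similarity(a, b):
    """Return a score near 1 for vectors pointing the same way, near 0 for unrelated ones."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


for first, second in [("alligator", "crocodile"), ("turnip", "eagle")]:
    score = cosine_similarity(EMBEDDINGS[first], EMBEDDINGS[second])
    print(f"{first} vs {second}: {score:.2f}")
```

With these invented numbers, the alligator and crocodile score close to 1, while the turnip and eagle score close to 0.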

a map of related ideas

explore: the web of meaning

Below is a tiny, made-up version of that map of meaning. Each dot is an idea, and related ideas sit near each other and link up into clumps. Hover over any dot to light up its closest relatives. A real model does this with millions of words across thousands of directions at once, far more than anyone could draw.

Inspired by Vasco Asturiano's 3D force-directed graph and the embedding maps in Wolfram's analysis. It illustrates the idea; it is not a literal readout of any one model.

more than autocomplete

By now this might sound like "fancy autocomplete," and in a way it is: guessing the next word is exactly what it was built to do. But that is not the whole story. It turns out that to guess really well, a model has to build up a surprising amount of structure inside itself, not just memorize which words tend to sit next to which.

Here is a striking example. Othello is a simple board game (you may know it as Reversi): two players take turns placing discs on an 8-by-8 grid and flipping each other's pieces. Researchers trained a model to do one thing only, guess the next legal move in Othello games, without ever telling it the rules or showing it a board. To get good at guessing, the model quietly built its own picture of the board inside itself, one the researchers could read out and even tweak to change how it played (Li et al., 2023, "Emergent World Representations: Exploring a Sequence Model Trained on a Synthetic Task"). That is a strong hint these systems learn more than just which word usually comes next.

There is more. Big models can pick up a brand-new task from just a couple of examples you type into the prompt, with no extra training at all (Brown et al., 2020, "Language Models are Few-Shot Learners"). Researchers studying the inner workings have even found specific little "circuits" that pull off this on-the-fly learning (Olsson et al., 2022, "In-context Learning and Induction Heads"). And the chatbots people actually use, like ChatGPT, get an extra round of training based on human feedback so they follow instructions well, which is why a smaller, carefully trained model can be more helpful than a giant raw one (Ouyang et al., 2022, "Training language models to follow instructions with human feedback").

And it reaches well past the chatbot you type into. The same pattern-learning is starting to change real work in a number of fields:

Field: What AI is already doing
Biology: Predicts the 3D shapes of proteins and how they lock onto DNA and medicines, work that used to take years at the lab bench (Abramson et al., 2024, "Accurate structure prediction of biomolecular interactions with AlphaFold 3," Nature).
Math: Solved problems from the International Math Olympiad, the world championship for top high-school mathematicians, well enough to earn a silver medal (Google DeepMind, 2024, "AI achieves silver-medal standard solving International Mathematical Olympiad problems").
Weather: Makes 15-day forecasts that beat the world's best system, in about 8 minutes instead of hours on a supercomputer (Price et al., 2024, "Probabilistic weather forecasting with machine learning," Nature).
Drug design: A medicine that AI helped invent from scratch was tested in patients with a serious lung disease and improved their breathing, a first for an AI-designed drug (Ren et al., 2025, "A generative AI-discovered TNIK inhibitor for idiopathic pulmonary fibrosis: a randomized phase 2a trial," Nature Medicine).
Medicine: In a study of over 100,000 women, AI helping read mammograms caught more cancers without more false alarms, and nearly halved the radiologists' reading workload (Gommers et al., 2026, "Interval cancer, sensitivity, and specificity comparing AI-supported mammography screening with standard double reading without AI in the MASAI study: a randomised, controlled, non-inferiority, single-blinded, population-based, screening-accuracy trial," The Lancet).
Coding: A caution: when skilled programmers used AI coding tools, they actually worked about 19% slower, even though they felt faster (Becker et al., 2025, "Measuring the Impact of Early-2025 AI on Experienced Open-Source Developer Productivity").

None of this means the model understands things the way you do. It has no eyes or ears, no goals of its own, and nothing inside it is trying to tell the truth, which is exactly why it can hand you a confident, smooth-sounding answer that is completely wrong. (The hype runs the other way too. Some skills that look like they appear out of nowhere as models get bigger turn out to be a trick of how we measured them; see Schaeffer et al., 2023, "Are Emergent Abilities of Large Language Models a Mirage?") So the honest take sits in the middle: this is not a thinking mind, but it is a lot more than autocomplete. It is a very large pattern-finder that builds structure we are still working to understand.

If you take one thing from this page, take this. For everyday use, the simplest picture of a chatbot is also the truest: it is guessing the next likely word, over and over. It is not looking facts up, checking them, or working them out the way a calculator does. Almost everything surprising about these tools, the helpful parts and the frustrating ones, comes straight from that.

what this means when you use it

Once you picture it as a next-word guesser, its quirks start to make sense. Here are the ones worth knowing before you lean on one in class.

It can only see what's right in front of it

A chatbot has a kind of short-term memory, and everything has to fit inside it: your question, anything you pasted, and the whole conversation so far. On its own, the model keeps nothing from earlier chats; unless the app adds a memory feature, each new conversation starts from a blank slate. Even inside one chat, it pays the most attention to the very beginning and the very end of a long prompt and tends to lose the middle (Liu et al., 2023, "Lost in the Middle: How Language Models Use Long Contexts," Transactions of the Association for Computational Linguistics). So put your most important instructions first or last, and don't be surprised when, deep in a long back-and-forth, it forgets a rule you set near the top.
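The "everything has to fit" limit can be sketched in Python. This is an illustration under loud assumptions: word counts stand in for tokens, the 12-word budget is invented (real limits are far larger), and dropping the oldest messages first is just one simple strategy that an app might use. It shows one plain way a rule set near the top of a chat can vanish.

```python
def fit_to_context(messages, max_words):
    """Keep the most recent messages whose combined length fits the budget."""
    kept = []
    used = 0
    for message in reversed(messages):  # walk from newest to oldest
        length = len(message.split())
        if used + length > max_words:
            break  # this message and everything older is dropped
        kept.append(message)
        used += length
    return list(reversed(kept))  # restore oldest-to-newest order


# Hypothetical chat; the first message sets a rule for the whole conversation.
chat = [
    "Rule: always answer in Spanish.",
    "What is photosynthesis?",
    "Explain it for fifth graders.",
    "Now give three quiz questions.",
]

print(fit_to_context(chat, max_words=12))
```

With this tiny budget only the last two messages survive, so the model never sees the Spanish rule when it writes its next reply.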

Its knowledge has an expiration date

A model only knows what it read while it was being trained, and that reading stopped on a certain date. It hasn't seen this week's news, your district's new policy, or a book that came out last month, and on its own it can't go look anything up. That's why tools that "search the web" or "read this PDF" help so much: they hand the model fresh text to work from instead of leaning on its memory. The simple move is to give it the real source yourself. Paste in the actual article or standard, and the answers get far better.

The app is more than the model

"ChatGPT" isn't just a model; it's a whole product built around one. Before you type a word, the company has already given it hidden instructions, trained it to refuse certain things, and maybe switched on extras like web search or image-making. So the same underlying model can act differently from one app to the next, and some of its behavior you can steer with your own instructions while some of it you can't.

It's strangely bad at things that look easy

Because it reads text in chunks (tokens) rather than letter by letter, it trips over things a young child finds simple, like counting the r's in "strawberry," rhyming, or spelling a word backwards (Fu et al., 2024, "Why Do Large Language Models (LLMs) Struggle to Count Letters?"). Math is much the same: it's matching what a number "should" look like, not actually adding, so it can get arithmetic wrong with total confidence. Asking it to "show your work" step by step does help (Wei et al., 2022, "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models"). But for anything that has to be exact, reach for a calculator, not a chatbot.
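A short Python sketch shows the contrast between what ordinary code sees and what a model roughly sees. The three-way split of "strawberry" and the ID numbers are hypothetical; real tokenizers split words differently and use their own numbering.

```python
word = "strawberry"

# What ordinary code sees: individual letters, so counting is exact.
print(word.count("r"))

# What a model roughly sees: a few chunks, each replaced by an ID number.
# This split and these IDs are hypothetical, chosen only for illustration.
pieces = ["str", "aw", "berry"]
token_ids = {"str": 101, "aw": 102, "berry": 103}
model_view = [token_ids[piece] for piece in pieces]
print(model_view)
```

Nothing in the list of ID numbers says how many r's are hiding inside each chunk, so the model has to recall that from patterns in its training text instead of counting.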

Confident doesn't mean correct

This is the big one for student research. A model states a wrong answer in the same smooth, sure-sounding voice as a right one, and it will cheerfully invent quotes, statistics, and even fake book titles and citations that look completely real (Huang et al., 2023, "A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions," ACM Transactions on Information Systems). This isn't a bug waiting to be fixed; it's what a "most likely next word" machine does when it has no real source behind it. Treat how polished an answer sounds as no proof at all, and check every fact and every source against something real.

Ask twice, get two answers

There's a bit of chance built in, so the same question can come back with different answers, and even small changes in how you word a prompt can shift the result. Two students asking nearly the same thing may get different replies. It's safest to treat whatever it gives you as a first draft to question, not a fixed reference to trust.

it's bigger than the tech

None of this is purely technical. Because the model learned from human writing pulled off the internet, it picks up our blind spots and biases along with our knowledge, and repeats them in that same calm, neutral-sounding voice (Bender et al., 2021, "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?"). The polished assistant also leans on a lot of hidden human labor and carries a real energy and water cost. Image generators work differently again: they build pictures from learned patterns rather than finding real ones, which is exactly what makes convincing deepfakes possible. And one rule worth saying plainly: anything you type into a public chatbot can be stored or read by people, so don't paste in student names, grades, or other private details. Strip those out first (U.S. Dept. of Education, Office of Educational Technology, 2023, "Artificial Intelligence and the Future of Teaching and Learning: Insights and Recommendations"). We dig into bias, deepfakes, labor, and the environment over in the handbook's ethical dilemmas.

A basic understanding of this process matters for educators and students because it demystifies the "black box" of how AI models operate and helps us recognize that AI doesn't understand what it produces the way a person would. With that in mind, it is much easier to decide how and why AI can and should be used, and where its limits are.

a note on "AI psychosis"

You may have seen headlines about "AI psychosis." It is an informal term, not a medical diagnosis, that some clinicians and reporters use for a small but growing number of cases where heavy chatbot use seemed to feed a person's delusional or paranoid thinking (Østergaard, 2023, "Will Generative Artificial Intelligence Chatbots Generate Delusions in Individuals Prone to Psychosis?", Schizophrenia Bulletin; Hudon & Stip, 2025, "Delusional Experiences Emerging From AI Chatbot Interactions or 'AI Psychosis'," JMIR Mental Health). It does not mean chatbots make healthy people lose touch with reality. The reported cases mostly involve people who were already vulnerable, and researchers are still debating how common it is.

Why would a chatbot make distorted thinking worse? Because of the very thing this whole page is about. A chatbot is built to produce agreeable, confident, supportive-sounding text. It mirrors back whatever you bring to it, with no real understanding and no way to check it against reality. Researchers even have a name for this habit: sycophancy, meaning agreeing with the user instead of telling the truth (Sharma et al., 2023, "Towards Understanding Sycophancy in Language Models"). In April 2025, OpenAI had to roll back a version of ChatGPT that had become so eager to please it was "overly supportive but disingenuous" (OpenAI, 2025, "Sycophancy in GPT-4o: What happened and what we're doing about it").

This is exactly why understanding how AI works matters, for teachers and students alike. If you know the tool is a prediction-and-agreement machine and not a wise, neutral friend, it is easier to keep perspective, to notice when it is just telling you what you want to hear, and to not lean on it like a therapist or an authority. The American Psychological Association warns that these chatbots are not a substitute for real care, and that young people are especially likely to over-trust them and form emotional attachments (American Psychological Association, 2025, "Health advisory: Use of generative AI chatbots and wellness applications for mental health"). The healthiest habit is the one this page is built around: stay curious, stay a little skeptical, and keep your own judgment in charge.

out-guess the machine

try with students

A quick, hands-on way for students to feel how an AI really works (it guesses the next word) and to practice doing the one thing that matters most: questioning its answers instead of trusting them.

  1. guess like the machine (4 min)

    Write on the board: "The best pet to own is ___." In pairs, have students jot the three words they think most people would put next, ranked 1, 2, 3. Share a few out loud. Then drop the twist: that is exactly what the AI does. It guesses the word most people would say next.

  2. run the real thing (4 min)

    Open the "you be the model" tool above and press predict a few times. Did its top picks (Dog, Cat) match your class? Run it again and watch the sentence come out a little differently. Now ask the room: where did the AI get "Dog" and "Cat"? (From what huge numbers of people wrote online.) And whose answers might be missing? What would be the "best" pet in a different family, country, or culture? The most popular answer is not the only right one.

  3. see a classic glitch (optional · AI use)

    If you already use a chatbot in class, try one of these together and see what it does:

    • "How many times does the letter r appear in strawberry?"
    • "Which is bigger, 9.11 or 9.9?"
    • "How many words were in your last answer?"
    • Multiply two large numbers, or ask what today's date is
    • Ask it to reverse a word or count the syllables in one

    Some models slip; others have learned to get these exact ones right. Either way, ask the class why a tool that can write a whole essay might fumble something this simple. The reason was covered earlier on this page: the model reads text in chunks, not letter by letter, and it predicts likely text rather than reading letters, counting, or checking facts.

  4. last word (2 min)

    Each student finishes two sentences, out loud or on a sticky note: "AI could be helpful for ______." and "I should never just trust it for ______."
