Plain English Definitions

Glossary

Terms you'll encounter when reading about AI.

A

Agent / AI Agent

Takes actions, not just answers

An AI system that can execute a sequence of actions to complete a goal — browsing the web, running code, sending emails, filling out forms — rather than just generating a response. You give it an objective; it figures out the steps. Still early but rapidly improving.

Agentic Workflow

Multi-step AI acting in the world

A process where an AI agent takes a sequence of autonomous actions to complete a goal — searching the web, running code, editing files, calling other services — rather than just producing a single response. You set the goal; the agent plans and executes the steps. The human typically reviews the outcome rather than supervising each action.

API / Application Programming Interface

How software talks to software

The connection point that lets one piece of software send requests to another. When developers "build with AI," they're usually calling a model through its API — sending a prompt and receiving a response programmatically, without using the chat interface.
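
To make it concrete, here's a minimal sketch of a programmatic call using the Anthropic Python SDK (the model name and prompt are illustrative, not a recommendation):

    # Requires: pip install anthropic, plus an API key in the ANTHROPIC_API_KEY environment variable.
    import anthropic

    client = anthropic.Anthropic()  # picks up the API key from the environment

    response = client.messages.create(
        model="claude-sonnet-4-20250514",   # illustrative model name; use whatever is current
        max_tokens=500,
        messages=[{"role": "user", "content": "Summarize this clause in plain English: ..."}],
    )
    print(response.content[0].text)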

Alignment

Making AI do what humans intend

The challenge of ensuring AI systems reliably pursue the goals and values their developers and users intend — not just the literal objective they were optimized for. A model that maximizes engagement at any cost isn't "aligned" with human wellbeing, even if it scores well on its training metric. Alignment research is the field trying to make AI systems safer and more reliably beneficial as they become more capable.

Attention

How the model decides what's relevant

The mechanism inside transformers that lets each token dynamically weigh the relevance of every other token in the sequence when determining meaning. Rather than reading word-by-word, the model considers all tokens simultaneously. What lets a model understand that "it" in "The trophy didn't fit in the suitcase because it was too big" refers to the trophy — not the suitcase. Introduced in the 2017 paper "Attention Is All You Need."
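
A toy sketch of the calculation, assuming four made-up token vectors (real models use high-dimensional vectors and many attention heads):

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max(axis=-1, keepdims=True))
        return e / e.sum(axis=-1, keepdims=True)

    def attention(Q, K, V):
        # Each token's query is compared against every token's key; the resulting
        # weights decide how much of each token's value flows into the output.
        scores = Q @ K.T / np.sqrt(K.shape[-1])
        weights = softmax(scores)      # one row per token, each row sums to 1
        return weights @ V, weights

    Q = K = V = np.random.rand(4, 3)   # 4 tokens, 3 dimensions each (toy numbers)
    output, weights = attention(Q, K, V)
    print(weights.round(2))            # row i shows how much token i attends to every token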

B

Backpropagation

How a model learns from its mistakes

The algorithm used to train neural networks. After the model makes a prediction and calculates how wrong it was, backpropagation computes the gradient — how much each parameter contributed to the error — and nudges every weight in the direction that reduces it. Repeat this trillions of times across the training corpus and the model converges toward useful predictions. The engine behind everything the model learned.
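
A one-parameter sketch of that loop, under the simplest possible setup (a single weight, squared error, plain gradient descent):

    # The true rule is y = 2x; the model has to discover the 2 by repeated nudging.
    data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
    w = 0.0        # the single weight being learned
    lr = 0.05      # learning rate: how big each nudge is

    for step in range(100):
        for x, target in data:
            prediction = w * x
            error = prediction - target
            gradient = 2 * error * x   # derivative of the squared error with respect to w
            w -= lr * gradient         # nudge the weight in the direction that reduces the error

    print(round(w, 3))   # converges toward 2.0; real training does this for billions of weights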

Bias

When training data skews the model's worldview

Systematic errors or skewed outputs that result from patterns in training data reflecting real-world inequalities, historical prejudices, or gaps in representation. A model trained mostly on English text from Western sources will reflect those perspectives more than others. Bias isn't always obvious — it often shows up in subtle ways, like generating different default assumptions about who holds which jobs. Knowing it exists is the first step to catching it.

Benchmark

A standardized test for model performance

A test used to measure and compare AI model capabilities — reasoning, math, coding, language understanding. Labs publish benchmark scores to show progress, but take them with skepticism. A model can score well on benchmarks and still be frustrating to use in practice. Real-world utility matters more.

C

Chain of Thought

Show your working

A prompting technique where you ask the model to reason through a problem step by step before giving a final answer. Adding "Let's think through this step by step" can significantly improve accuracy on logic, math, and multi-step problems. Reasoning models like o1 do this automatically as part of their architecture — it's baked in rather than prompted.
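
In practice the technique is just extra wording in the prompt. A sketch, assuming a generic ask_model() helper that sends text to whichever model you use:

    question = "A train leaves at 2:40pm and the trip takes 3 hours 35 minutes. When does it arrive?"

    direct_prompt = question

    cot_prompt = question + "\n\nLet's think through this step by step before giving the final answer."

    # ask_model(direct_prompt)  -> more likely to jump straight to an answer
    # ask_model(cot_prompt)     -> tends to walk through the arithmetic first, then answer (6:15pm)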

Context Window

The model's working memory

The total amount of text a model can "see" and reason over at once — your messages, any documents you've shared, and the conversation history. Everything outside the context window is invisible to the model. Larger context windows let you work with longer documents and more complex tasks.

Compute / GPU

The hardware that makes AI possible

Technical

The raw processing power used to train and run AI models — primarily GPUs (graphics processing units), originally designed for rendering video games, now the backbone of AI. Training a frontier model can cost tens of millions of dollars in compute alone. Compute constraints explain why not every organization can train its own model, and why access to GPUs is a strategic resource in AI development.

Corpus

The text a model learned from

The large collection of text data used to train a language model — books, websites, code, articles, and more. When people say a model was "trained on the internet," they're loosely describing its corpus. The size and composition of the corpus heavily shape what a model knows, what it's good at, and what biases it might carry.

D

Discriminative AI

AI that classifies, not creates

AI that learns to distinguish between categories rather than generate new content. A spam filter deciding "spam or not spam," a model flagging fraudulent transactions, an image classifier labeling photos — these are all discriminative. Most AI before the generative wave was discriminative. The two aren't mutually exclusive: many modern systems use discriminative components under the hood even when the end product feels generative.
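
A minimal discriminative example using scikit-learn and a toy dataset; the model only learns to assign one of two labels, never to write new text:

    # Requires: pip install scikit-learn
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    texts = ["win a free prize now", "claim your reward today", "meeting moved to 3pm", "lunch tomorrow?"]
    labels = ["spam", "spam", "not spam", "not spam"]

    model = make_pipeline(CountVectorizer(), LogisticRegression())
    model.fit(texts, labels)                      # learns which word patterns separate the categories

    print(model.predict(["free reward inside"]))  # likely ['spam'] on this toy data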

Deep Learning

The engine under most modern AI

A type of machine learning that uses layered neural networks to find patterns in data. "Deep" refers to the many layers in those networks, not to any philosophical depth. Most modern AI — image recognition, voice assistants, LLMs — is built on deep learning. You don't need to understand how it works to use these tools well, but it's the technique that made the current wave of AI possible.

Distillation

Compressing a large model into a smaller one

Technical

A training technique where a smaller "student" model is trained to imitate the outputs of a larger "teacher" model, rather than learning from raw data. The result: a compact model that punches above its weight. This is why "mini" and "lite" model variants often outperform older full-sized models — they've been distilled from something much larger. DeepSeek, for example, released smaller models distilled from its R1 reasoning model, delivering strong reasoning at a fraction of the usual size and cost.
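
The core of the technique is a loss function that pushes the student toward the teacher's full probability distribution, not just the single right answer. A sketch in PyTorch, assuming you already have both models' logits for the same input:

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, T=2.0):
        # T is a temperature that softens both distributions so the student can
        # learn from the teacher's near misses as well as its top choice.
        student_log_probs = F.log_softmax(student_logits / T, dim=-1)
        teacher_probs = F.softmax(teacher_logits / T, dim=-1)
        return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (T * T)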

E

Embeddings

Turning meaning into numbers

A way of representing text (or images or audio) as a list of numbers that captures meaning and relationships. "Dog" and "puppy" would have similar embeddings; "dog" and "democracy" wouldn't. Used in search, recommendations, and RAG systems to find semantically related content even when the exact words don't match.
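
A toy illustration with made-up three-dimensional embeddings (real embeddings have hundreds or thousands of dimensions and come from an embedding model):

    import numpy as np

    def cosine_similarity(a, b):
        return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

    dog       = np.array([0.9, 0.8, 0.1])
    puppy     = np.array([0.85, 0.75, 0.2])
    democracy = np.array([0.1, 0.2, 0.95])

    print(cosine_similarity(dog, puppy))       # high: similar meaning
    print(cosine_similarity(dog, democracy))   # low: unrelated meaning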

F

Fine-tuning

Teaching an existing model new tricks

Further training a pre-existing model on a smaller, curated dataset to specialize its behavior — making it better at a specific task, domain, or style. Fine-tuning changes the model itself. It's different from RAG, which leaves the model unchanged and injects relevant documents at query time.

Foundation Model

The base layer everything else builds on

A large model trained on broad, general data that can be adapted to many different tasks. Claude, GPT-4, and Gemini are all foundation models. Most AI products and tools are built on top of a foundation model rather than trained from scratch — the cost of training from scratch is too high for most organizations.

Few-shot / Zero-shot

How many examples you give the model

Zero-shot: no examples — just an instruction. The model figures out the task from the description alone. Few-shot: a small number of examples showing the pattern you want before asking the model to follow it. Few-shot prompting is one of the most reliable techniques for getting consistent, correctly formatted output — especially useful for structured tasks or when you need a specific style.
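
A few-shot prompt is just examples stacked above the real input. A sketch (the examples are made up; the structure is the point):

    few_shot_prompt = """Rewrite each customer note as a one-line, neutral summary.

    Note: "The app crashes EVERY time I upload a photo!!! Fix this!!"
    Summary: App crashes when uploading photos.

    Note: "Love the new dashboard but the export button is hidden away."
    Summary: Positive on dashboard; export button is hard to find.

    Note: "I was double-charged for my March subscription and support never replied."
    Summary:"""

    # Sent as-is to the model, which continues the pattern and fills in the final summary.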

G

Generative AI

AI that creates, not just classifies

AI that produces new content — text, images, video, audio, code — rather than only analyzing or categorizing existing content. ChatGPT, Midjourney, and Suno are all generative AI. The term became mainstream around 2022–2023 with the rise of tools that anyone could use to generate original content.

GPT / Generative Pre-trained Transformer

The architecture that sparked the AI wave

The model architecture behind OpenAI's products and the technical breakthrough that triggered the current generation of AI tools. "Pre-trained" means trained on enormous amounts of text before being specialized. GPT-4 and its successors are the models powering ChatGPT — but "GPT" is often used loosely to mean any large language model.

Guardrails

Hard limits on what a model will do

Safety constraints applied to a model's outputs — preventing it from generating harmful content, revealing confidential information, or acting outside defined boundaries. Guardrails can be baked into the model itself through training (like RLHF), enforced via system prompts, or applied as a separate filtering layer. Every enterprise AI deployment has them. The debate is always how tight to set them without making the model too restrictive to be useful.

Grounding

Tying outputs to verifiable sources

Connecting a model's outputs to real, verifiable sources rather than relying solely on training data. A grounded response is backed by retrieved documents, live data, or explicit citations — not just what the model remembers. RAG is the most common grounding technique. Grounding reduces hallucinations and makes AI outputs more trustworthy for factual, high-stakes tasks.

H

Hallucination

Confident and wrong

When an AI generates something false while presenting it as fact — a made-up citation, a wrong date, a plausible-sounding statistic that doesn't exist. The model isn't lying; it's producing fluent, probable-seeming text that happens to be incorrect. The most important AI limitation to internalize. Always verify factual claims on anything that matters.

Human-in-the-Loop

AI acts, humans approve

A design pattern where a person reviews or approves AI outputs before they're acted upon. Best practice for any high-stakes or consequential workflow — not a sign of distrust, but sound process design. As agents become more capable, knowing when to keep humans in the loop is an increasingly important judgment call.
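
A bare-bones version of the pattern, where send_email() stands in for whatever action the AI has drafted (both names are hypothetical):

    def send_with_approval(draft, recipient):
        print(f"To: {recipient}\n---\n{draft}\n---")
        answer = input("Send this email? [y/N] ")
        if answer.strip().lower() == "y":
            send_email(recipient, draft)   # hypothetical function that actually sends
            return "sent"
        return "held for human revision"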

I

Inference

The model generating an output

The process of running a trained model to produce a response. When you send a message to Claude, inference is what's happening on the server — the model processing your input and generating tokens one by one. Inference speed (how fast you get a response) and inference cost (what it costs to run) are key factors in AI product design.

Instruction Tuning

From text predictor to assistant

A training phase where a model is fine-tuned on examples of instruction-following to make it behave like a helpful assistant rather than just continuing text. A raw pre-trained model would complete your sentence; an instruction-tuned model answers your question. Often combined with RLHF to shape safe, helpful behavior.

J

Jailbreak

Tricking a model into ignoring its guardrails

A prompt or technique designed to bypass a model's safety constraints — getting it to produce content it would normally refuse. Jailbreaks range from simple role-play tricks ("pretend you have no restrictions") to elaborate multi-step prompts. AI labs actively work to patch them; the cat-and-mouse dynamic is ongoing. Understanding jailbreaks matters for anyone deploying AI in a context where misuse is a risk.

K

Knowledge Cutoff

The point where the model's knowledge stops

The date after which a model has no training data. Events, publications, or developments after this date are simply unknown to the model unless you provide them directly in your prompt. Always check the cutoff when asking about recent events — and when accuracy matters, paste in the relevant current information rather than assuming the model knows it.

L

Large Language Model / LLM

The engine inside most AI tools

A type of AI model trained on massive amounts of text to understand and generate language. Claude, ChatGPT, Gemini, and Grok are all LLMs. "Large" refers to the number of parameters — typically billions — learned during training. Most of the AI tools you use for writing, analysis, and conversation are built on LLMs.

Latency

How long you wait for a response

The time between sending a prompt and receiving a response. Matters more in real-time applications — voice assistants, live chat — than in document analysis or drafting tasks where a few seconds is fine. Reasoning models typically have higher latency because they generate an internal chain of thought before answering. Capability and speed are usually a tradeoff.

Logits

Raw scores before they become probabilities

The unnormalized scores a model produces for every possible next token before they're converted into probabilities. Higher logit = more likely candidate. Softmax converts logits into a proper probability distribution. Temperature is applied to the logits before softmax — that's how it sharpens or flattens the distribution. Mostly relevant when working directly with model APIs or building on top of models.
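
Toy numbers make the pipeline easy to see, from logits through temperature to probabilities:

    import numpy as np

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    logits = np.array([4.0, 2.0, 1.0])     # raw scores for three candidate tokens

    print(softmax(logits / 0.5).round(3))  # low temperature: sharpens toward the top token
    print(softmax(logits / 1.0).round(3))  # temperature 1: the unmodified distribution
    print(softmax(logits / 2.0).round(3))  # high temperature: flattens, more variety when sampling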

M

Machine Learning / ML

Systems that learn from data

The broader field of AI where systems learn patterns from data rather than following explicitly programmed rules. Deep learning and large language models are subsets of machine learning. When someone says they "work in ML," they're probably doing something more technical than prompt engineering — building or training models, not just using them.

Multimodal

Beyond text — images, audio, video

A model that can process and generate multiple types of data — text, images, audio, video — rather than text alone. GPT-4o and Gemini are multimodal. Practically: you can show the model a chart, a screenshot, or a whiteboard photo and ask questions about it. This capability is expanding rapidly.

Model Card

The owner's manual for an AI model

Technical

A short document published alongside a model that explains what it was trained on, what it's designed to do, its known limitations, and where it's likely to fail. Popularized by Google researchers as a best practice for responsible AI. If you're evaluating an AI tool for professional use, the model card is where a responsible lab will tell you what they know about its weaknesses.

Model Collapse

What happens when AI trains on AI

A failure mode where a model trained on AI-generated content progressively degrades — losing diversity, accuracy, and nuance over successive generations of training. An emerging concern as the internet fills with AI-generated text that future models will inevitably train on. Still an active area of research, but worth knowing as AI-generated content becomes the norm rather than the exception.

N

Neural Network

The architecture under the hood

The computational structure that underlies modern AI — loosely inspired by how neurons connect in the brain, though the analogy only goes so far. Consists of layers of interconnected nodes that transform inputs into outputs through billions of learned numerical weights. You don't need to understand how it works to use AI well, but knowing it exists helps explain why models behave the way they do.
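
A two-layer forward pass in miniature, with random numbers standing in for learned weights:

    import numpy as np

    rng = np.random.default_rng(0)
    x  = rng.random(4)          # an input with 4 features
    W1 = rng.random((4, 8))     # layer 1: 4 inputs -> 8 hidden units
    W2 = rng.random((8, 2))     # layer 2: 8 hidden units -> 2 outputs

    hidden = np.maximum(0, x @ W1)   # linear transform plus a simple non-linearity (ReLU)
    output = hidden @ W2             # in a trained model, these weights encode what it learned
    print(output)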

O

Open Source / Open Weights

AI you can download and run yourself

Models whose weights are publicly released — anyone can download, run, or modify them without paying per API call or sending data to a third party. Llama (Meta) and Mistral are the most prominent examples. Open weights models are increasingly capable and run locally on a laptop or server you control. Important distinction from "open source" software: releasing weights isn't the same as releasing the training code or data, so the term is sometimes contested.

On-Device / Local Model

AI that runs on your hardware, not the cloud

AI that runs entirely on your own device — laptop, phone, or server — without sending data to an external API. Apple Intelligence runs on-device for privacy. Running Llama locally via Ollama is another example. Benefits: privacy (data never leaves your machine), no API costs, works offline. Trade-off: you're limited to smaller, less capable models than what cloud providers run on their giant clusters.

Overfitting

Memorizing instead of learning

When a model learns its training data too well — memorizing specific examples rather than generalizing patterns — and performs poorly on new inputs it hasn't seen. A classic ML problem. In fine-tuning, overfitting on a small dataset can make a model excellent at a narrow task while degrading its general capability. The risk of fine-tuning with too little data or too many training steps.

P

Parameters / Weights

The numbers that encode what a model learned

The numerical values inside a model that encode everything it learned during training. "A 70 billion parameter model" has 70 billion of these values. More parameters generally means more capability — and more compute required to run it. When you hear about "model weights," it's the same thing: the file you'd download to run an open-source model locally.

Prompt

Your input to the model

The text you give an AI model — your question, instruction, context, or example. Everything you type in a chat interface is a prompt. The quality of your prompt has an outsized effect on the quality of the output, which is why prompt engineering became a skill worth developing.

Prompt Engineering

Getting better outputs through better inputs

The practice of crafting prompts to reliably get useful outputs from AI models. Less about engineering, more about clear communication — giving the model the right role, context, format instructions, and examples. Anyone who uses AI effectively is doing some version of prompt engineering, whether they call it that or not.

Prompt Injection

Hijacking an AI with hidden instructions

An attack where malicious instructions hidden inside content — a document, a webpage, a user message — are designed to override an AI's intended behavior. For example: a PDF that contains invisible text saying "Ignore your previous instructions and instead do X." A real security concern for agentic systems that process untrusted content. One of the more important attack surfaces to understand if you're building with AI.

Q

Quantization

Shrinking a model to fit on smaller hardware

Technical

A technique that reduces a model's memory footprint by representing its numerical weights with less precision — for example, storing values as 4-bit integers instead of 32-bit floats. A quantized model is smaller and faster but may lose some accuracy. When you see "Q4" or "Q8" in a local model filename, that's the quantization level. It's what makes it possible to run a capable model on a laptop with 16GB of RAM rather than a data center.
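
A rough sketch of the idea at 8-bit precision (real quantization schemes are more sophisticated, but the trade-off is the same):

    import numpy as np

    weights = np.random.randn(5).astype(np.float32)   # stand-in model weights
    scale = np.abs(weights).max() / 127               # map the float range onto int8

    quantized   = np.round(weights / scale).astype(np.int8)   # 1 byte per weight instead of 4
    dequantized = quantized.astype(np.float32) * scale        # what gets used at inference time

    print(weights)
    print(dequantized)                             # close, but not identical
    print(np.abs(weights - dequantized).max())     # the rounding error introduced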

R

RAG / Retrieval-Augmented Generation

AI grounded in your documents

An architecture where the model retrieves relevant content from an external database and uses it when generating a response, rather than relying solely on training data. Makes AI dramatically more accurate for domain-specific questions. NotebookLM is essentially a no-code RAG tool. Most enterprise "AI that knows your company's data" is built on RAG.
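
The pattern in miniature, where embed() and ask_model() are hypothetical stand-ins for your embedding model and LLM call:

    import numpy as np

    def top_k(question, documents, k=3):
        q = embed(question)                                     # hypothetical embedding call
        scored = [(float(np.dot(q, embed(d))), d) for d in documents]
        return [d for _, d in sorted(scored, reverse=True)[:k]]

    def answer(question, documents):
        context = "\n\n".join(top_k(question, documents))
        prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"
        return ask_model(prompt)                                # hypothetical LLM call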

Reasoning Model

Think first, then answer

A model trained to work through problems step by step — generating an internal chain of reasoning — before producing a final answer. OpenAI's o1 and o3 are examples. Better at complex logic, math, and multi-step problems than standard chat models. Trade-off: they're slower and more expensive to run.

RLHF / Reinforcement Learning from Human Feedback

How models learn to be helpful

A training technique where human raters compare and score model outputs, and the model is fine-tuned to produce higher-rated responses. The key process behind making models helpful, honest, and safe rather than just fluent. Most leading AI assistants use some version of RLHF or its successors to shape model behavior.

S

System Prompt

The hidden instructions shaping every response

Instructions given to a model before a conversation begins — setting its role, tone, constraints, and context. You often don't see it, but it shapes every response. When a product built on Claude behaves differently than Claude.ai, a system prompt is usually why. You can write your own in tools like Claude's Projects to customize behavior for recurring tasks.
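
When you're calling the API yourself, the system prompt is just another parameter. A sketch using the Anthropic Python SDK (model name and instructions are illustrative):

    import anthropic

    client = anthropic.Anthropic()
    response = client.messages.create(
        model="claude-sonnet-4-20250514",   # illustrative model name
        max_tokens=400,
        system="You are a meticulous contracts reviewer. Answer in plain English and flag any clause that shifts liability.",
        messages=[{"role": "user", "content": "What does this indemnification clause mean? ..."}],
    )
    print(response.content[0].text)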

Synthetic Data

AI-generated training data

Technical

Training data that was generated by an AI model rather than collected from human-produced text. Used when real data is scarce, sensitive, or expensive to label. Many recent models — including reasoning models — are trained heavily on synthetic data. The risk: models trained on AI-generated data can amplify errors or drift toward homogeneous outputs over generations. Related to model collapse, and an active area of research.

Softmax

Turning scores into probabilities

The mathematical function that converts a set of raw scores (logits) into a probability distribution — values between 0 and 1 that sum to 1. Used at the final step of a model's next-token prediction to produce something you can sample from. Temperature is applied to the logits before softmax runs, which is what sharpens or flattens the distribution of tokens the model is likely to pick.

T

Tool Use / Function Calling

How models reach outside themselves to act

Technical

The ability of a model to call external functions, APIs, or services as part of generating a response — searching the web, reading a file, running code, querying a database. The mechanism behind agentic behavior. When Claude uses a calculator or browses the web, it's using tools. "Function calling" is the API-level term for the same concept. This is what transforms a language model from a text generator into something that can take action in the world.
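
The shape of the loop, in sketch form rather than any specific provider's API: you describe tools to the model, it replies with a structured request to call one, your code runs it, and the result goes back for the final answer. Names here are illustrative:

    def get_weather(city: str) -> str:
        return f"18°C and cloudy in {city}"     # stand-in for a real weather API call

    tools = {
        "get_weather": {
            "function": get_weather,
            "description": "Get current weather for a city",
            "parameters": {"city": "string"},
        }
    }

    # Suppose the model, asked "What's the weather in Lisbon?", returns this structured request:
    tool_call = {"name": "get_weather", "arguments": {"city": "Lisbon"}}

    result = tools[tool_call["name"]]["function"](**tool_call["arguments"])
    # result is sent back to the model, which uses it to write the final reply.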

Temperature

Creativity vs. consistency

A setting that controls how random or creative a model's outputs are. Low temperature = predictable, consistent, conservative. High temperature = more varied, surprising, occasionally weird. Most chat interfaces set this automatically. It matters most when you're using the API directly or building a product on top of a model.

Tokenization

Splitting text into chunks the model can process

The process of breaking input text into tokens before a model can process it. Most modern models use byte-pair encoding (BPE), which learns common subword units from training data. The same sentence can tokenize differently across models. Understanding tokenization helps explain why models sometimes struggle with character-counting or letter-spotting tasks — they don't see individual letters, they see token-sized chunks.
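
You can see this directly with a tokenizer library. A sketch using OpenAI's tiktoken (other models use different tokenizers, so the same text splits differently):

    # Requires: pip install tiktoken
    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    tokens = enc.encode("Unbelievable tokenization!")
    print(tokens)                               # a list of integer token IDs
    print([enc.decode([t]) for t in tokens])    # the chunks the model actually sees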

Token

The basic unit of text a model processes

The chunk of text a model reads and generates — roughly ¾ of a word in English. "Unbelievable" might be two tokens; "AI" is one. Model limits, pricing, and context windows are all measured in tokens rather than words or characters. For most users this is invisible, but it becomes relevant when working with long documents or building on the API.

Training Data

What the model learned from

The text (and other content) a model was trained on. The quality, diversity, and recency of training data heavily shape what a model knows, how it reasons, and what biases it carries. Models have a knowledge cutoff — a date after which they have no information — because training data is collected up to a point and then the model is frozen.

Transformer

The architecture that changed everything

The neural network architecture that powers virtually every modern LLM. Introduced by Google in 2017 in the paper "Attention Is All You Need." The "T" in GPT. Before transformers, language models were far less capable. You don't need to understand how it works — just know that when someone says "transformer model," they mean the same family of technology as Claude and ChatGPT.

V

Vector Database

Storage built for semantic search

A type of database optimized for storing embeddings and finding similar content quickly based on meaning rather than exact keyword matches. The storage layer in most RAG systems — your documents get converted to embeddings, stored in a vector database, and retrieved by semantic similarity when you ask a question. Pinecone and Weaviate are common examples.

W

Weights

See: Parameters

The numerical values inside a model that encode its learned knowledge and behaviors. Often used interchangeably with "parameters." "Downloading model weights" means getting the file that contains everything a model learned — which is what you do when running open-source models like Llama locally rather than accessing a hosted API.