TL;DR: An embedding turns a piece of text into a long list of numbers, a point in space. Text that means similar things lands in nearby spots, and the AI decides “do these two mean the same thing” by checking whether the two points lie in the same direction from the origin, scored by the cosine of the angle between them (cosine similarity). That’s how search finds a page sharing zero words with your query. It’s a genuinely useful trick, and as the famous “king − man + woman = queen” example shows, a bit more of a magic show than the demos let on.
Search your notes for “how to make my laptop quieter” and a decent search engine hands you a page titled “reducing fan noise on a notebook.” Not one word in common. No laptop, no quieter. Yet it’s exactly the page you wanted.
Keyword matching can’t do that. It needs the words to overlap. So what’s doing the matching?
The answer is embeddings. It’s simpler than it sounds, and the most famous demo of it is half a con. We’ll get to that.
Turn every sentence into an arrow
The move underneath is this: take a piece of text and turn it into a list of numbers. Not one number, but a long list. OpenAI’s text-embedding-3-small model gives you 1,536 numbers per input; text-embedding-3-large, 3,072.
A list of numbers is just coordinates. Two numbers put a point on a page (x, y). Three put it in a room. 1,536 put it in a space you can’t picture, but the math doesn’t care that you can’t. Every sentence you embed becomes one pin stuck somewhere in that space.
Here’s the whole trick in one line: the model places the pins so that texts with similar meanings land near each other. “Reduce fan noise on a notebook” gets a pin right next to “make my laptop quieter” even though they share no words, because during training the model saw phrases like these turn up in the same kinds of contexts. (That “same contexts” idea is old; it’s the engine behind the early word2vec models too.)
This is a different job from tokenizing. Tokens are how the text gets chopped into chunks to read. Embeddings are what you get after: each chunk (or whole sentence, or whole document) handed a location on the meaning-map.
Measuring meaning is measuring an angle
So you’ve got two pins. How do you ask “do these mean roughly the same thing”? You check whether they point the same way from the center.
Picture an arrow from the origin to each pin. Point nearly the same direction, the texts mean nearly the same thing. At right angles, unrelated. Opposite ways, opposed. The number that captures this is the cosine of the angle between them: cosine similarity. Same direction is 1, perpendicular is 0, and it bottoms out at −1.
Why the angle instead of the plain distance between the pins? Because direction survives length. A three-line note and a long article about the same thing should still count as similar, and comparing direction holds up where comparing distance wobbles. Conveniently, OpenAI’s embedding models hand back vectors already scaled to length 1, so the cosine is just the dot product: multiply the two lists pairwise, add them up, done.
A 2D stand-in, to get the feel:
import numpy as np

def cosine(a, b):
    # Cosine of the angle: dot product divided by the product of the two lengths.
    a, b = np.array(a), np.array(b)
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

laptop = [0.9, 0.1]   # pretend this is "make my laptop quieter"
fan = [0.8, 0.2]      # "reduce fan noise on a notebook"
banana = [0.1, 0.95]  # "banana bread recipe"

cosine(laptop, fan)     # ~0.99 -> basically the same direction
cosine(laptop, banana)  # ~0.21 -> off at a wide angle
Real vectors have 1,536 dimensions, not 2, and nobody labels what each one means. But the operation is exactly this, only wider.
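If you want to poke at the three reference cases, the "direction survives length" point, and the unit-length shortcut, here is a rough sketch in plain Python (standard library only, so nothing needs installing). All the vectors are invented for illustration, and the helper names `dot`, `norm`, `cosine`, `distance`, and `normalize` are my own, not from any library.

```python
import math

def dot(a, b):
    """Multiply the two lists pairwise and add the products up."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Length of the arrow from the origin to the pin."""
    return math.sqrt(dot(a, a))

def cosine(a, b):
    """Cosine of the angle between the two arrows."""
    return dot(a, b) / (norm(a) * norm(b))

def distance(a, b):
    """Plain straight-line distance between the two pins."""
    return norm([x - y for x, y in zip(a, b)])

def normalize(a):
    """Rescale a vector to length 1 without changing its direction."""
    n = norm(a)
    return [x / n for x in a]

# The three reference cases (made-up vectors).
same = cosine([1.0, 2.0], [2.0, 4.0])    # same direction: 1, up to rounding
perp = cosine([1.0, 0.0], [0.0, 3.0])    # right angle: 0
opp = cosine([1.0, 2.0], [-1.0, -2.0])   # opposite: -1, up to rounding

# Direction survives length: same direction, ten times longer.
note = [0.9, 0.1]
article = [9.0, 1.0]
angle_score = cosine(note, article)      # still 1, up to rounding
gap = distance(note, article)            # large, even though the topic is identical

# Once both vectors have length 1, the dot product alone is the cosine.
a, b = normalize([0.9, 0.1]), normalize([0.8, 0.2])
shortcut = dot(a, b)
full = cosine(a, b)
```

The last two values agree up to floating-point rounding, which is why pre-normalized vectors let you skip the division entirely.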
One aside worth keeping: you can often chop the tail off these vectors (keep the first few hundred of the 1,536 numbers) and still get most of the meaning. OpenAI’s newer models are trained for that on purpose, using a technique called Matryoshka representation learning, after the nesting dolls. If you do the chopping yourself, rescale what’s left back to length 1 before comparing.
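The mechanics of chopping the tail are short enough to sketch. This only shows the mechanics: the 8-number vectors below are hand-made stand-ins with most of their weight deliberately packed into the first few slots, which is my imitation of what Matryoshka-style training aims for, not output from a real model. The names `truncate` and `cosine_unit` are hypothetical helpers. With the real models, as far as I know, the embeddings API takes a `dimensions` parameter that does the shortening for you.

```python
import math

def normalize(v):
    """Rescale a vector to length 1."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def truncate(v, k):
    """Keep the first k numbers, then rescale the remainder to length 1."""
    return normalize(v[:k])

def cosine_unit(a, b):
    """Cosine for vectors that already have length 1: just the dot product."""
    return sum(x * y for x, y in zip(a, b))

# Made-up 8-number vectors standing in for 1,536-number ones.
laptop = normalize([0.62, 0.41, 0.33, 0.20, 0.05, 0.04, 0.02, 0.01])
fan = normalize([0.58, 0.45, 0.30, 0.22, 0.03, 0.06, 0.01, 0.02])

full_score = cosine_unit(laptop, fan)
short_score = cosine_unit(truncate(laptop, 4), truncate(fan, 4))
```

On these toy numbers the half-length score stays very close to the full-length one. How much you lose on real data depends on the model and on how far you cut, so measure it on your own texts before trusting a shorter size.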
The famous trick is mostly a magic show
You’ve probably seen the line that makes embeddings sound magical: take the vector for “king”, subtract “man”, add “woman”, land on “queen.” Word math. Meaning as arithmetic.
It’s real, but it’s dressed up. Here’s the part the demos skip. When you compute king − man + woman and ask for the nearest word, the standard code throws out the three words you put in. Leave them in, and the nearest vector to king − man + woman is usually king itself.
Go back through the classic word2vec examples and a lot of the crowd-pleasers only land with that quiet exclusion in place. So the honest version: the vector arithmetic nudges you into roughly the right neighborhood, and then a filter that hides the obvious answer takes credit for the punchline. There’s a real regularity in the space (there’s even a tidy mathematical reason the offset works at all), just not the clean “meaning = algebra” the slide implies.
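You can watch the filter do its work on a toy vocabulary. Fair warning: the five 3-number vectors below are invented by hand and arranged so that "man" and "woman" sit close together, which is the situation that makes the effect appear. They are not real word2vec values, and `analogy` is a hypothetical helper of mine that mimics the usual nearest-word lookup with and without the exclusion.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    len_a = math.sqrt(sum(x * x for x in a))
    len_b = math.sqrt(sum(y * y for y in b))
    return dot / (len_a * len_b)

# Hand-made toy vectors, NOT taken from a trained model.
vectors = {
    "king":   [0.9, 0.6, 0.2],
    "queen":  [0.9, 0.2, 0.6],
    "man":    [0.1, 0.5, 0.4],
    "woman":  [0.1, 0.4, 0.5],
    "banana": [0.0, 0.1, 0.9],
}

def analogy(a, b, c, exclude_inputs):
    """Return the word nearest to a - b + c, optionally skipping a, b and c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    skipped = {a, b, c} if exclude_inputs else set()
    candidates = [word for word in vectors if word not in skipped]
    return max(candidates, key=lambda word: cosine(target, vectors[word]))

unfiltered = analogy("king", "man", "woman", exclude_inputs=False)
filtered = analogy("king", "man", "woman", exclude_inputs=True)
```

With these numbers the unfiltered lookup returns "king" and the filtered one returns "queen". The arithmetic barely moves the arrow away from where it started; the exclusion is what promotes the runner-up.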
I raise this not to be a killjoy but because it’s the tell for how to read all of this. Embeddings capture statistical similarity: what turns up near what. That’s a good deal less than the word “understands” implies, and it’s easy to forget when a system leans on the map and calls the result understanding.
Where you actually meet this
Most of the time embeddings work out of sight:
- Semantic search: the laptop-and-fan case. You search by meaning, not by keyword.
- RAG: when a chatbot “looks something up” in your documents, it usually embeds your question, finds the nearest chunks on the map, and pastes them into the context window before answering.
- Dedup, clustering, recommendations: “find me more like this.”
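The search and RAG cases share one small core: score every stored chunk against the query by cosine, keep the best few, and hand them over. A minimal sketch, assuming the vectors already exist; the 2-number vectors are made up to match the earlier toy example, `top_k` is a hypothetical helper, and the prompt wording is purely illustrative. A real system would get its vectors from an embedding model and usually keep them in a vector index rather than a dict.

```python
import math

def cosine(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    len_a = math.sqrt(sum(x * x for x in a))
    len_b = math.sqrt(sum(y * y for y in b))
    return dot / (len_a * len_b)

# Pretend these came from an embedding model (made-up numbers).
chunks = {
    "Reducing fan noise on a notebook": [0.8, 0.2],
    "Banana bread recipe": [0.1, 0.95],
    "Cleaning dust out of laptop vents": [0.85, 0.3],
}

def top_k(query_vec, store, k=2):
    """Return the k chunk texts whose vectors point closest to the query's direction."""
    ranked = sorted(store, key=lambda text: cosine(query_vec, store[text]), reverse=True)
    return ranked[:k]

question = "how do I make my laptop quieter?"
query_vec = [0.9, 0.1]  # pretend embedding of the question

hits = top_k(query_vec, chunks, k=2)

# The RAG step: paste the nearest chunks into the context ahead of the question.
prompt = (
    "Answer using these notes:\n"
    + "\n".join(f"- {text}" for text in hits)
    + f"\n\nQuestion: {question}"
)
```

Note that nothing here checks whether a retrieved chunk actually answers the question. Nearest on the map is the only test applied.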
And the same wiring carries the same warning. Because the map is built from how words co-occur in human writing, it inherits human patterns, including the ugly ones. Two sentences can sit close because they truly mean the same thing, or just because they lean on the same stereotype. The geometry can’t tell those two apart, and you can’t tune that out; it comes in with the method, which only ever knew what-sits-near-what.
So “does the AI understand that these two mean the same?” comes down to something almost embarrassingly literal: it turned both into arrows and checked the angle. That single move, run at a scale you can’t picture, is most of what “semantic” anything does today: search, retrieval, “more like this.” Geometry doing an impression of comprehension, good enough most of the time that it’s easy to forget which one it is. The place it trips is exactly where those two come apart.



