TL;DR: A list comprehension like [n*n for n in range(5)] does the same thing as a small for-loop. It just writes the result first and the source second, which is the opposite of the order you’d write the loop in. If something trips you up, it’s probably that reversal, not the concept. Translate it back into a for-loop and most of the mystery tends to go away.
Seeing [x for x in data if x > 0] for the first time and pausing for a second seems like a pretty normal reaction. It doesn’t look like the statements you’ve been writing. No colon, no indentation, and the for has wandered into the middle. Plenty of tutorials just say “this is a list comprehension, it’s very Pythonic” and move on, but I’m not sure that line actually helps anyone read the thing.
So instead of memorising the syntax, it might be easier to start from the for-loop you probably already know.
The same thing, two ways
Say you want a list of squares from 0 to 4. With a for-loop it looks roughly like this:
squares = []
for n in range(5):
    squares.append(n * n)
## [0, 1, 4, 9, 16]
Three lines. Make an empty list, run the loop, append one at a time. Nothing fancy, and it runs.
The comprehension version:
squares = [n * n for n in range(5)]
## [0, 1, 4, 9, 16]
One line, same result. That’s not just me saying so. There’s a small test at the bottom that feeds both versions into assertEqual, and they come out equal. So I’d say it’s safe to treat the comprehension as shorthand for that loop, because that’s more or less what it is, squeezed onto one line.
How to read one: translate it back
The thing that matters here, I think, is reading order. A comprehension looks like this:
[ expression   for var in source   ]
  n * n        for n   in range(5)
Three blocks:
- for n in range(5): same as the start of a normal loop, “pull items out of range(5), call each one n”
- n * n: what each round produces, basically the thing inside append()
- the outer [ ]: collect it all into a list
My guess is that people get stuck because the eye expects “for first, then do something”, but a comprehension is the other way round: result first, then where it came from. Reading it back-to-front can help: glance at the for ... in ... in the middle to see the source, then look back at the front block. After a few of these it stops feeling weird, at least it did for me.
If you’ve read the earlier Python Chunks post, it quietly used a comprehension to slice a list ([input_list[i:i+n] for i in range(0, len(input_list), n)]) without really explaining it. That line might read a little easier now.
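To make that translation concrete, here is the chunking line next to the loop it stands for. This is only a sketch: the function names chunks_loop and chunks_comp and the sample data are mine, not taken from the earlier post.

```python
def chunks_loop(input_list, n):
    # Loop form: source first, then what each round produces.
    out = []
    for i in range(0, len(input_list), n):
        out.append(input_list[i:i + n])
    return out


def chunks_comp(input_list, n):
    # Comprehension form: the same pieces, with the result written first.
    return [input_list[i:i + n] for i in range(0, len(input_list), n)]


letters = list("abcdefg")  # made-up sample data
print(chunks_loop(letters, 3))  # [['a', 'b', 'c'], ['d', 'e', 'f'], ['g']]
print(chunks_comp(letters, 3))  # same list
```

The source is range(0, len(input_list), n), the variable is i, and the expression is the slice input_list[i:i + n]. Same three blocks as before, just with a busier expression.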
Filtering: put the if at the end
You can tack an if onto the end as a filter. Say you only want even numbers:
evens = [n for n in range(10) if n % 2 == 0]
## [0, 2, 4, 6, 8]
Translate it back and it’s roughly:
evens = []
for n in range(10):
    if n % 2 == 0:
        evens.append(n)
The trailing if acts like a gate: produce a value if the condition holds, skip it otherwise. So the result comes out shorter than the source. This is the use I reach for most often, pulling the few items that match out of a pile.
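The gate and the expression are independent, so the items that get through can also be transformed on the way out. A small sketch with a made-up words list, purely for illustration:

```python
words = ["fig", "apple", "kiwi", "banana", "plum"]  # made-up sample data

# Keep only words longer than four letters, and upper-case the survivors.
long_upper = [w.upper() for w in words if len(w) > 4]

# The same thing as a loop: the if guards the append.
long_upper_loop = []
for w in words:
    if len(w) > 4:
        long_upper_loop.append(w.upper())

print(long_upper)       # ['APPLE', 'BANANA']
print(long_upper_loop)  # ['APPLE', 'BANANA']
```

Reading order is still the same: find the for in the middle, check the if at the end to see what survives, then look at the front to see what each survivor turns into.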
The part people mix up: trailing if vs leading if/else
This is the bit I find easiest to confuse, so it’s worth pulling apart. The if above sits at the end and asks “keep this one or not”. But the moment you write if/else, it jumps to the front and means something different:
labels = ["fizz" if n % 2 == 0 else "buzz" for n in range(6)]
## ['fizz', 'buzz', 'fizz', 'buzz', 'fizz', 'buzz']
Six results, none dropped. That’s because "fizz" if ... else "buzz" is a conditional (ternary) expression. It is the “expression” block, and it always returns one value, just a different one depending on the condition. So it’s not filtering; it’s more like “produce one for every item, they just look different”.
A rough way to keep them apart:
- [x for x in xs if cond]: if at the end, filters, result may be shorter
- [a if cond else b for x in xs]: if/else at the front, produces every item, same length
Mixing these two up seems fairly common, and I still pause sometimes to work out which one I’m looking at. When I genuinely can’t tell, translating it back to a loop usually settles it.
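Putting the two forms side by side, and then in the same comprehension, may help them stick. The list xs and the labels below are invented for the example.

```python
xs = [-2, -1, 0, 1, 2, 3]  # made-up sample data

# Trailing if: a filter. Negative numbers are dropped.
kept = [x for x in xs if x >= 0]

# Leading if/else: part of the expression. Every item yields a value.
signs = ["neg" if x < 0 else "non-neg" for x in xs]

# Both at once: the trailing if filters, the leading if/else picks the output.
parity = ["even" if x % 2 == 0 else "odd" for x in xs if x >= 0]

print(len(xs), len(kept), len(signs), len(parity))  # 6 4 6 4
print(parity)  # ['even', 'odd', 'even', 'odd']
```

In the combined version the two ifs don't interfere: the one at the end decides whether an item gets in at all, and the one at the front only decides what it looks like once it's in.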
Two side notes: the loop variable doesn’t leak, and… it’s not really faster
Here’s something that maybe doesn’t get noticed much: the loop variable inside a comprehension doesn’t survive afterwards. Compare with a plain for-loop:
for m in range(3):
    pass
print(m) ## 2 — m is still around, in the outer scope
_ = [n * n for n in range(3)]
print(n) ## NameError: name 'n' is not defined
A normal loop leaves m lingering in the current scope, while the comprehension’s n exists only inside the comprehension and is gone once it finishes. That has held for list comprehensions throughout Python 3 (in Python 2 the variable did leak). One fewer variable you might accidentally reuse, a small upside, though honestly not something you’d usually think about.
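If you'd rather check this without the script stopping on the NameError, a variant like the following should work. The names k and comp_var_leaked are mine; the last part adds one related observation, that a comprehension doesn't overwrite an outer variable with the same name.

```python
# A plain for-loop leaves its variable behind.
for m in range(3):
    pass
print(m)  # 2

# A comprehension's variable is local to the comprehension.
_ = [k * k for k in range(3)]
try:
    k
    comp_var_leaked = True
except NameError:
    comp_var_leaked = False
print(comp_var_leaked)  # False

# It also leaves an outer variable of the same name untouched.
n = "outer"
_ = [n * n for n in range(3)]
print(n)  # outer
```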
On performance, I want to flag one thing, because it seems to get passed around a lot. Older articles like to say comprehensions are “twice as fast”, but that figure is probably quite a few years old. On Python 3.14.3, timing it with timeit (range(1000), 20,000 runs), a comprehension against an append loop comes out roughly like:
comprehension: 0.52s
append loop : 0.58s
loop / comp : about 1.1x
Only around ten percent, much smaller than the legend. My guess is that recent CPython’s adaptive specializing interpreter (introduced in 3.11) optimises the append loop too. So rather than reaching for a comprehension “because it’s faster”, I’d lean on “because it reads more cleanly”. That gap probably won’t show up in real code anyway. I haven’t profiled this across many machines, so take the exact number as one data point rather than a universal constant.
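For anyone who wants to repeat the measurement, something along these lines should be close. It's a sketch rather than the exact script behind the numbers above: I'm assuming the loop body squares each item, and the function names are my own.

```python
import timeit


def by_loop(n=1000):
    out = []
    for i in range(n):
        out.append(i * i)
    return out


def by_comprehension(n=1000):
    return [i * i for i in range(n)]


def measure(runs=20_000):
    # Total seconds for `runs` calls of each version.
    loop_s = timeit.timeit(by_loop, number=runs)
    comp_s = timeit.timeit(by_comprehension, number=runs)
    return loop_s, comp_s


# A shorter run than the 20,000 used in the post, to keep the sketch quick.
loop_s, comp_s = measure(runs=2_000)
print(f"comprehension: {comp_s:.3f}s")
print(f"append loop  : {loop_s:.3f}s")
print(f"loop / comp  : about {loop_s / comp_s:.2f}x")
```

The ratio will move around between runs and machines, so it's worth running it a few times before reading anything into the second decimal place.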
Not just lists: dicts and sets too
Swap the outer brackets and the same ordering carries over to dictionaries and sets. Dict comprehensions were proposed in PEP 274; both dict and set comprehensions arrived in Python 3.0 and were backported to 2.7. Slightly different origins on paper, though they feel consistent to write.
word = "mississippi"
## set comprehension — dedupes along the way
unique = {ch for ch in word}
## {'m', 'i', 's', 'p'} (a set has no fixed order, so yours may print differently)
## dict comprehension — key: value
counts = {ch: word.count(ch) for ch in set(word)}
## {'m': 1, 'i': 4, 's': 4, 'p': 2} (key order follows set(word), so it may vary)
One value inside { } gives you a set; a key: value pair gives you a dict. Reading them works the same as a list, nothing new to pick up. The dict form is the one I use most, usually to zip two lists into a lookup table.
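The "zip two lists into a lookup table" case might look like this. The item names and prices are made up for the example.

```python
items = ["tea", "coffee", "juice"]   # made-up sample data
prices = [30, 45, 60]                # made-up sample data

# Pair the lists up with zip(), then build key: value entries.
price_of = {item: price for item, price in zip(items, prices)}
print(price_of["coffee"])  # 45

# The trailing if works here too, same position, same meaning.
pricey = {item: price for item, price in zip(items, prices) if price > 40}
print(pricey)  # {'coffee': 45, 'juice': 60}
```

For the plain pairing with no transformation, dict(zip(items, prices)) gives the same result; the comprehension earns its keep once you want to filter or change the keys or values along the way.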
A bit further: flattening nests, and the walrus
Two that come up fairly often but are easy to write badly.
Flattening a 2-D list. Multiple for clauses read left to right, in the same order as nested loops:
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
flat = [x for row in matrix for x in row]
## [1, 2, 3, 4, 5, 6, 7, 8, 9]
Read it as for row in matrix (outer), for x in row (inner), then x (produce). The written order matches outer-to-inner just like nested loops, so if the single line throws you, unpacking it in your head helps. If it stays confusing, I’d just write the plain loop — no need to force one line.
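Unpacked, the flattening line corresponds to the nested loop below. The last line adds a trailing if, which I've included only to show where it goes when there are two for clauses.

```python
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]

flat_loop = []
for row in matrix:            # first for clause = outer loop
    for x in row:             # second for clause = inner loop
        flat_loop.append(x)   # the expression at the front

flat_comp = [x for row in matrix for x in row]
print(flat_loop == flat_comp)  # True

# A filter still goes at the very end, after the innermost for.
flat_evens = [x for row in matrix for x in row if x % 2 == 0]
print(flat_evens)  # [2, 4, 6, 8]
```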
The walrus operator := came in with PEP 572, Python 3.8 onward. When you want to filter and reuse the value you computed during filtering, it lets you compute it once:
data = [" 10 ", "x", " 20", "", "30 "]
def parse(s):
    s = s.strip()
    return int(s) if s.isdigit() else None
cleaned = [v for s in data if (v := parse(s)) is not None]
## [10, 20, 30]
(v := parse(s)) stores the result in v and lets the trailing if test it, so parse() doesn’t run twice. Handy, though to be honest this is also about the point where readability starts sliding, so whether to use it is a judgement call — I tend to hesitate a little.
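To see the "doesn't run twice" part for yourself, you can count the calls. The call_log list below is a hypothetical helper I added to parse() just for counting; it isn't part of the original function.

```python
data = [" 10 ", "x", " 20", "", "30 "]
call_log = []  # hypothetical helper, only here to count calls


def parse(s):
    call_log.append(s)
    s = s.strip()
    return int(s) if s.isdigit() else None


# Without the walrus: parse() runs in the filter, then again for each kept item.
twice = [parse(s) for s in data if parse(s) is not None]
calls_without = len(call_log)

call_log.clear()

# With the walrus: one call per item, and the result is reused through v.
once = [v for s in data if (v := parse(s)) is not None]
calls_with = len(call_log)

print(twice == once, calls_without, calls_with)  # True 8 5
```

One thing to be aware of: unlike the loop variable s, a name assigned with := does end up in the enclosing scope, so v is still around afterwards holding the last parsed value.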
When a comprehension probably isn’t the move
More comprehensions isn’t better, and forcing one can make things harder than a loop. A few cases where I’d lean back toward a plain for-loop:
- Nesting past two levels, or several ifs stacked in: too much on one line, and you (or future you) might not parse it later. If it’s unreadable, the comprehension has kind of lost the point.
- Side effects each round: writing files, print, firing a request. Comprehensions are really meant for building a new collection; writing [do(x) for x in xs] just for the side effect also builds a list you didn’t want, which is a bit wasteful.
- Logic that needs intermediate variables or try/except: those don’t fit inside a comprehension, and cramming them in usually looks worse.
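As an illustration of the try/except case, here is the kind of loop I'd leave as a loop. The raw list and the choice of float() are invented for the example.

```python
raw = ["3", "4.5", "oops", "", "12"]  # made-up sample data

# Plain loop: there is room for try/except and an intermediate variable.
numbers = []
skipped = []
for item in raw:
    try:
        value = float(item)
    except ValueError:
        skipped.append(item)  # remember what could not be converted
        continue
    numbers.append(value)

print(numbers)  # [3.0, 4.5, 12.0]
print(skipped)  # ['oops', '']
```

A try/except is a statement, so it can't sit inside a comprehension directly. You could hide it in a helper function and call that from a comprehension, but at that point the loop is arguably the more honest version.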
A rough test: could the person next to you read this line at a glance? If not, splitting it out is probably the kinder choice. The Zen of Python line “Readability counts” feels, to me, worth a bit more than the speed here.
Wrapping up
A list comprehension is maybe less mysterious than it looks — mostly it’s shorthand for a for-loop, with the difference being reading order: result first, source second. Find the source in the middle for ... in ..., check the front block for what each item becomes, and watch whether the if sits at the end (filter) or the front as if/else (produce every item). Once those click, dicts, sets, nesting, and the walrus are pretty much the same ordering extended — not really separate things to learn.
If you want to wander through other small Python pieces, these are nearby: Python lambda (the anonymous function comprehensions often appear with), Python Iterable (what can actually go in the “source” slot), Python f-string (another way to shorten code), and the practical Python Chunks for slicing data.
There’s also a Traditional Chinese version of this post if that reads more comfortably.
All examples were run on Python 3.14.3; the performance figures come from timeit and will vary by machine, so treat them as directional rather than exact. Primary sources: PEP 202 — List Comprehensions, Python tutorial 5.1.3.
Appendix: the test file I mentioned
Back when I said the loop and the comprehension give the same result, this is what checked it. It’s short. On Python 3.14.3, python -m unittest runs green.
import unittest
def by_loop():
    out = []
    for n in range(5):
        out.append(n * n)
    return out

def by_comprehension():
    return [n * n for n in range(5)]

class TestSame(unittest.TestCase):
    def test_two_ways_match(self):
        # same thing, two ways: results should match
        self.assertEqual(by_loop(), by_comprehension())
        self.assertEqual(by_comprehension(), [0, 1, 4, 9, 16])

    def test_tail_if_filters(self):
        # trailing if filters; evens survive
        self.assertEqual([n for n in range(10) if n % 2 == 0], [0, 2, 4, 6, 8])

    def test_if_else_keeps_length(self):
        # leading if/else produces every item; length unchanged
        labels = ["fizz" if n % 2 == 0 else "buzz" for n in range(6)]
        self.assertEqual(len(labels), 6)

if __name__ == "__main__":
    unittest.main()
Nothing clever — it just pins down the three claims from earlier (“two ways are equivalent”, “trailing if shortens”, “leading if/else keeps the length”) with assertEqual. If a future Python release breaks one of them, this is what would flag it first.