If you’ve been working with Python for a while, you know that iterators are at the heart of writing memory-efficient and elegant code. Instead of building massive lists in memory, we can process items one at a time. The itertools module is the standard library’s powerhouse for taking this concept to the next level. It’s a collection of fast, optimized tools for creating and manipulating iterators.
In my experience, mastering itertools is a key step in moving from a beginner to an intermediate Python programmer. It helps you write cleaner, more expressive, and often faster code. In this guide, I’ll walk you through some of the most useful functions in the module, with real-world examples showing how they can simplify your looping logic.
Infinite Iterators
These functions create iterators that can, in theory, go on forever. You’ll always need to provide some logic to break out of the loop.
count(start, step)
This is the simplest infinite iterator. It produces consecutive numbers, starting from start (default 0) and incrementing by step (default 1). I find it perfect for tasks where I need a counter but don’t know when to stop, like paginating through API results.
In this example from youtube-dl, count() is used to generate page numbers for a Google search. The loop continues fetching pages until it either finds the desired number of results or detects there are no more pages left.
Python
# File: youtube_dl/extractor/googlesearch.py
def _get_n_results(self, query, n):
    entries = []
    # Loop infinitely through page numbers
    for pagenum in itertools.count():
        webpage = self._download_webpage(
            # ... query parameters using pagenum ...
        )
        # ... logic to parse entries ...
        # Break condition: enough results, or no "next page" link on the page
        if (len(entries) >= n) or not re.search(r'id="pnnext"', webpage):
            # res is built in the parts elided from this excerpt
            return res
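To try the same pattern without a network call, here is a minimal sketch. The fetch_page function and its FAKE_PAGES data are hypothetical stand-ins I made up for a real API; only itertools.count() comes from the standard library.

```python
import itertools

# Hypothetical data source: three "pages" of results, then nothing.
FAKE_PAGES = [["a", "b"], ["c", "d"], ["e"]]


def fetch_page(pagenum):
    """Hypothetical stand-in for an API call; returns [] past the last page."""
    if pagenum < len(FAKE_PAGES):
        return FAKE_PAGES[pagenum]
    return []


def collect_results(n):
    """Gather up to n results, walking pages 0, 1, 2, ... until done."""
    results = []
    for pagenum in itertools.count():  # 0, 1, 2, ... with no upper bound
        page = fetch_page(pagenum)
        results.extend(page)
        # Stop when we have enough results or the source returned nothing.
        if len(results) >= n or not page:
            return results[:n]


print(collect_results(3))    # ['a', 'b', 'c']
print(collect_results(100))  # ['a', 'b', 'c', 'd', 'e']
```

The exit condition lives inside the loop body, which is what keeps an unbounded counter safe to iterate over.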
cycle(iterable)
The cycle() function endlessly repeats the elements of an iterable. If you give it ['A', 'B'], it will produce A, B, A, B, A, B, and so on. A classic use case I’ve seen is for applying alternating styles.
The Twisted networking framework uses cycle() in its web module to assign alternating “odd” and “even” CSS classes to table rows, which is a neat and highly readable trick.
Python
# File: src/twisted/web/static.py
def _buildTableContent(self, elements):
    tableContent = []
    # Create an iterator that yields "odd", "even", "odd", ...
    rowClasses = itertools.cycle(["odd", "even"])
    for element, rowClass in zip(elements, rowClasses):
        element["class"] = rowClass
        tableContent.append(self.linePattern % element)
    return tableContent
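The same pairing trick works outside of HTML. As a small sketch with made-up task and worker names, cycle() can hand out work round-robin, and zip() ends the loop as soon as the finite list runs out:

```python
import itertools

# Illustrative data; the names are placeholders.
tasks = ["build", "test", "lint", "deploy", "notify"]
workers = itertools.cycle(["alice", "bob"])  # alice, bob, alice, bob, ...

# zip() stops when `tasks` is exhausted, so the infinite cycle is safe here.
assignments = list(zip(tasks, workers))
print(assignments)
# [('build', 'alice'), ('test', 'bob'), ('lint', 'alice'),
#  ('deploy', 'bob'), ('notify', 'alice')]
```

One thing to keep in mind is that cycle() stores a copy of every element it has seen so it can replay them, so it is best suited to short inputs.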
repeat(object[, times])
As the name suggests, repeat() produces the same value over and over again, either infinitely or for a specified number of times. I find this especially handy for providing a constant value in operations where a sequence is expected.
For instance, label-studio uses repeat() to generate SQL query placeholders. If it needs to build an IN clause for 5 primary keys, it can generate the required "%s, %s, %s, %s, %s" string elegantly.
Python
# File: label_studio/core/bulk_update_utils.py
import itertools

n_pks = 5
# pks becomes "%s, %s, %s, %s, %s"
pks = ", ".join(itertools.repeat("%s", n_pks))
in_clause = f'"pk_column" in ({pks})'
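To illustrate the “constant value where a sequence is expected” case, here is a short sketch of my own that feeds repeat() into map() as a fixed second argument:

```python
import itertools

# Square each number by pairing it with a constant exponent of 2.
# map() stops when range(5) runs out, so the infinite repeat() is safe.
squares = list(map(pow, range(5), itertools.repeat(2)))
print(squares)  # [0, 1, 4, 9, 16]

# With a `times` argument, repeat() is finite on its own.
placeholders = list(itertools.repeat("%s", 3))
print(placeholders)  # ['%s', '%s', '%s']
```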
Combinatoric Iterators
These functions are fantastic for generating all possible combinations or permutations of items in a sequence, which is common in everything from data science to testing.
product(*iterables)
This function computes the Cartesian product, which is equivalent to nested for-loops. It’s useful for generating all possible combinations from multiple input lists.
Apache Airflow uses product() to create all possible filename variations by combining a list of base filenames with a list of possible suffixes (.asc, .sha512).
Python
# File: airflow/dev/check_files.py
import itertools

def expand_name_variations(files):
    # e.g., combines ['file-a'] and ['', '.asc'] into ['file-a', 'file-a.asc']
    return sorted(
        base + suffix
        for base, suffix in itertools.product(files, ["", ".asc", ".sha512"])
    )
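To make the “equivalent to nested for-loops” claim concrete, here is a small sketch with illustrative data that builds the same pairs both ways and compares them. It also shows the repeat keyword argument, which takes the product of an iterable with itself.

```python
import itertools

sizes = ["S", "M"]
colors = ["red", "blue"]

# The nested-loop version: the rightmost loop varies fastest.
nested = [(size, color) for size in sizes for color in colors]

# The product() version yields the same tuples in the same order.
via_product = list(itertools.product(sizes, colors))
print(via_product)
# [('S', 'red'), ('S', 'blue'), ('M', 'red'), ('M', 'blue')]
print(nested == via_product)  # True

# repeat=2 is shorthand for product([0, 1], [0, 1]).
bits = list(itertools.product([0, 1], repeat=2))
print(bits)  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```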
permutations(iterable, r)
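This function returns all possible orderings of the elements of an iterable. The optional r sets the length of each ordering and defaults to the full length of the input. For an input of (0, 1, 2), it would produce (0, 1, 2), (0, 2, 1), (1, 0, 2), and so on.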
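A quick sketch shows the full output for that input, plus what happens when a shorter length is requested through the second argument:

```python
import itertools

# All orderings of the full input (r defaults to its length).
full = list(itertools.permutations((0, 1, 2)))
print(full)
# [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]

# Orderings of length 2 drawn from three letters.
pairs = list(itertools.permutations("ABC", 2))
print(pairs)
# [('A', 'B'), ('A', 'C'), ('B', 'A'), ('B', 'C'), ('C', 'A'), ('C', 'B')]
```

The count grows factorially with the input size, so it pays to consume the iterator lazily rather than calling list() on anything large.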
combinations(iterable, r)
This function returns unique subsets of length r from an iterable. Unlike permutations, the order of elements in a combination doesn’t matter, so (0, 1) and (1, 0) would not both be included.
The salt automation tool uses combinations() to find invalid pairs of configuration options. It generates all 2-item combinations of scheduling keys and checks if any of those invalid pairs exist in the user’s configuration.
Python
# File: salt/utils/schedule.py
import itertools

scheduling_elements = ("when", "cron", "once")
# Creates sets: {'when', 'cron'}, {'when', 'once'}, {'cron', 'once'}
invalid_sched_combos = [
    set(i) for i in itertools.combinations(scheduling_elements, 2)
]
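The excerpt only builds the pairs, so here is a rough sketch of how the checking step could look. The find_conflict helper and the sample configurations are my own assumptions for illustration, not salt’s actual code.

```python
import itertools

scheduling_elements = ("when", "cron", "once")

# combinations() yields each pair once, in input order.
pairs = list(itertools.combinations(scheduling_elements, 2))
print(pairs)  # [('when', 'cron'), ('when', 'once'), ('cron', 'once')]

invalid_sched_combos = [set(pair) for pair in pairs]


def find_conflict(config):
    """Hypothetical helper: return the first invalid pair found in config."""
    keys = set(config)
    for combo in invalid_sched_combos:
        if combo <= keys:  # both keys of the pair are present
            return combo
    return None


# Made-up configurations for demonstration.
bad = {"function": "test.ping", "when": "Monday 5:00pm", "cron": "0 17 * * 1"}
good = {"function": "test.ping", "cron": "0 17 * * 1"}

print(sorted(find_conflict(bad)))  # ['cron', 'when']
print(find_conflict(good))         # None
```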
More Handy Tools for Iterables
These functions operate on one or more iterables. Unlike the infinite iterators above, each one stops once its input runs out.
- chain(*iterables): This is one of my favorites. It treats multiple iterables as a single, continuous sequence. It’s a memory-efficient way to loop over several lists or generators one after another.
- groupby(iterable, key): Groups consecutive items from an iterable that have the same key. Before using it, you almost always need to sort your data by the same key. It’s incredibly powerful for aggregating data.
- islice(iterable, start, stop, step): Provides a way to slice any iterable, including generators, which don’t support normal slicing (my_gen[5:10]). Apache Spark uses this to process large iterators in manageable batches.
- zip_longest(*iterables, fillvalue): Works like the built-in zip(), but it continues until the longest iterable is exhausted, plugging any holes with the fillvalue.
- starmap(function, iterable): Similar to map(), but instead of passing single items to the function, it unpacks tuples from the iterable as arguments. So, starmap(pow, [(2, 5), (3, 2)]) is equivalent to pow(2, 5) and then pow(3, 2).
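Here is a compact sketch that exercises each of the five tools in turn. The sample data is made up for illustration; the calls themselves are all standard itertools functions.

```python
import itertools
from operator import itemgetter

# chain: walk several iterables as if they were one sequence.
chained = list(itertools.chain([1, 2], (3, 4), "ab"))
print(chained)  # [1, 2, 3, 4, 'a', 'b']

# groupby: sort by the key first, because only consecutive items are grouped.
orders = [("fruit", "apple"), ("veg", "kale"), ("fruit", "pear")]
orders.sort(key=itemgetter(0))
grouped = {
    category: [name for _, name in group]
    for category, group in itertools.groupby(orders, key=itemgetter(0))
}
print(grouped)  # {'fruit': ['apple', 'pear'], 'veg': ['kale']}

# islice: slice an (infinite) generator, which my_gen[5:10] cannot do.
square_gen = (n * n for n in itertools.count())
sliced = list(itertools.islice(square_gen, 5, 10))
print(sliced)  # [25, 36, 49, 64, 81]

# zip_longest: keep going until the longest input ends, padding the gaps.
padded = list(itertools.zip_longest("abc", [1, 2], fillvalue="-"))
print(padded)  # [('a', 1), ('b', 2), ('c', '-')]

# starmap: unpack each tuple into the function's arguments.
powers = list(itertools.starmap(pow, [(2, 5), (3, 2)]))
print(powers)  # [32, 9]
```

If you comment out the sort() call, groupby() produces two separate “fruit” groups, and the dict comprehension silently keeps only the last one. That is the most common surprise with this function.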
Conclusion
The itertools module is a treasure trove of high-performance, memory-efficient tools for working with iterators. By getting comfortable with functions like count, cycle, product, and chain, you can write code that is not only more performant but also more expressive and Pythonic. The next time you find yourself writing complex loops or list comprehensions, take a moment to see if a tool in itertools can do the job more elegantly.