Skip to main content
  1. Posts/

How to sort a list of tuple or list in Python -- lambda or itemgetter?

··311 words·2 mins·
Python Sorting

In Python, when we want to sort a list of tuples or lists, we may want to sort it based on certain element in each sub-list, for example, sort the list based on the first element in each sub-list.

There are two different ways to achieve this, one is using lambda and the other is using itemgetter.

# using lambda
my_list.sort(key=lambda x:x[0])

# using itemgetter
from operator import itemgetter
my_list.sort(key=itemgetter(0))

According to my benchmark on a list of 1000 tuples, using itemgetter is almost twice as quick as the plain lambda method. The following is my code:

In [1]: a = list(range(1000))

In [2]: b = list(range(1000))

In [3]: import random

In [4]: random.shuffle(a)

In [5]: random.shuffle(b)

In [6]: c = list(zip(a, b))

In [7]: %timeit c.sort(key=lambda x: x[1])
81.4 µs ± 433 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

In [8]: random.shuffle(c)

In [9]: from operator import itemgetter

In [10]: %timeit c.sort(key=itemgetter(1))
47 µs ± 202 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

I have also tested the performance (run time in $µs$) of this two method for various list size. See the table below for a comparison:

+-----------+--------+------------+
| List size | lambda | itemgetter |
+-----------+--------+------------+
| 100       | 8.19   | 5.09       |
+-----------+--------+------------+
| 1000      | 81.4   | 47         |
+-----------+--------+------------+
| 10000     | 855    | 498        |
+-----------+--------+------------+
| 100000    | 14600  | 10100      |
+-----------+--------+------------+
| 1000000   | 172000 | 131000     |
+-----------+--------+------------+

(The code producing the above image can be found here)

From the above image, we can see that itemgetter is consistently faster than lambda regardless of the list size. Combined with its conciseness to select multiple elements from a list, itemgetter is clearly the winner to use in sort method.

Note: this post is based on my answer on StackOverflow.

Related

Run the Job Immediately after Starting Scheduler in Python APScheduler
·322 words·2 mins
Python APScheduler
Retry for Google Cloud Client
·197 words·1 min
Python GCP
Make Python logging Work in GCP
·570 words·3 mins
Python Logging GCP