Lexicographical ordering in Python PREMIUM

Series: Data Structures

Trey Hunner

5 min. read • Watch as video • Python 3.10—3.14 • Dec. 4, 2024

Watch as video

04:46

Show captions

Autoplay

Auto-expand

Let's talk about lexicographical ordering in Python.

Also see the lexicographical definition in Python Terminology.

String ordering

Is the string "apple" greater than the string "animal"?

>>> "apple" > "animal"

When I ask this question to a roomful of Python developers, I often see folks mouth L-M-N-O-P.

The word "animal" comes before "apple" in alphabetical ordering. So "apple" is greater than "animal":

>>> "apple" > "animal"
True

Strings in Python are ordered alphabetically. Well, sort of.

Uppercase "Apple" is less than lowercase "apple":

>>> "Apple" < "apple"
True

And we can also order characters that are not in the alphabet.

For example, the dash character (-) is less than the underscore character (_):

>>> '-' < '_'
True

Each character in a Python string has a number associated with it. These numbers are the Unicode code points for these characters:

>>> [ord(c) for c in "apple"]
[97, 112, 112, 108, 101]
>>> [ord(c) for c in "animal"]
[97, 110, 105, 109, 97, 108]

Python essentially uses alphabetical ordering.

But the alphabet it uses consists of every possible character, and the order of those characters comes from their Unicode code points.

I don't recommend memorizing the Unicode code points of each character.

Instead, I would lowercase your strings as you order them. And keep in mind that every character counts when you're ordering strings, even non-alphabetical ones.

Lexicographical ordering

Lexicographical ordering is a fancy word that means ordering things in a way that's similar to alphabetical ordering.

When ordering strings lexicographically, we compare the first character in each string, and then the second character, and so on. As long as the characters in the same position are equal, we need to go to the next pair. But once we've found different characters in the same position, we have an answer to which string is lower than the other.

Here's Python code that describes how lexicographical ordering works (you would never implement this yourself because Python does it for us):

def lexicographically_less_than(s1, s2):
    for a, b in zip(s1, s2):
        if a == b:
            # They're the same
            continue
        else:
            # They're different
            return (a < b)
    else:
        # One is a prefix of the other
        return len(s1) < len(s2)

The string that's lower is the one that has the lowest character of the first pair of differing characters.

Lexicographical ordering with tuples

Strings in Python are ordered lexicographically. But it's not just strings that are ordered this way.

Every sequence in Python is ordered lexicographically.

Here we have two three-item tuples, each containing numbers:

>>> x = (45, 1, 13)
>>> y = (45, 7, 4)

Which of these tuples is less than the other one?

Well, remember that just like strings, tuples are sequences, and all sequences are ordered lexicographically.

So to compare these tuples, we start by comparing the first item in each tuple: The first item in each is 45.

>>> x[0], y[0]
(45, 45)

Since the items are the same, we have to move to the second item. These items are different. 1 and 7 are not the same, which means we should have an answer to our comparison.

>>> x[1], y[1]
(1, 7)

Since 1 is less than 7, the first tuple is less than the second tuple:

>>> x[1] < y[1]
True
>>> x < y
True

Lexicographical ordering with different lengths

But what happens if the tuples have a different length? Well, the same thing happens with tuples as happens with strings.

If we compare two strings with a different length, Python still compares the first letter first, the second letter second, and so on:

>>> "truth" < "tradition"
False

But what if one string is a prefix of the other one?

>>> "cat" < "catastrophe"

We would run out of characters to compare in that situation!

When one string is a prefix of the other, the shorter string is less than the longer one:

>>> "cat" < "catastrophe"
True

The same thing applies to tuples.

If we compare tuples of different lengths, Python still compares index-by-index:

>>> x = (17, 9)
>>> y = (17, 4, 13)
>>> x < y
False

And if one tuple is a prefix of the other tuple, the shorter one is less than the longer one:

>>> x = (17, 4)
>>> y = (17, 4, 13)
>>> x < y
True

Lexicographical ordering works with all sequences

Strings and tuples are ordered lexicographically, but so are all other sequences.

Lists can be compared the same way:

>>> ["truth", "dare"] < ["truth", "democracy"]
True

And so can byte strings:

>>> b"\x01\x06" < b"\x01\x08"
True

All sequences are ordered lexicographically in Python.

Use cases of lexicographical ordering

What's the purpose of lexicographical ordering? Why is this implemented in Python?

Well, with strings, the purpose is pretty apparent. Alphabetical ordering can be convenient!

But with other types of sequences, lexicographical ordering can seem a bit unusual. But it can be helpful.

For example, if we have a list of tuples that represent a number and a string, and we wanted to order these tuples based on the number, we can simply sort the tuples:

>>> weights = [
...     (0.071, 'bees'),
...     (0.332, 'chickens'),
...     (0.023, 'crabs'),
...     (0.056, 'salmon'),
... ]
>>> sorted(weights)
[(0.023, 'crabs'), (0.056, 'salmon'), (0.071, 'bees'), (0.332, 'chickens')]

Since the number comes before the string, the numbers are compared first. The strings would only be compared if any of the numbers were equal.

Python's sorted function relies on the less than (>) operator, so any object that can be ordered can also be sorted.

Lexicographical ordering is especially handy for ordering things hierarchically.

For example, if we needed to order alumni by their graduation year, their last name, and then their first name, we could make three-item tuples in that order, and then sort those tuples:

>>> students = [
...     (1978, "Little", "Nancy"),
...     (1978, "Gonzalez", "Maria"),
...     (1977, "Anders", "Paul"),
...     (1979, "Brady", "Doris"),
... ]
>>> sorted(students)
[(1977, 'Anders', 'Paul'), (1978, 'Gonzalez', 'Maria'), (1978, 'Little', 'Nancy'), (1979, 'Brady', 'Doris')]

Python does the heavy-lifting for us here.

It orders first by the year, but then compares the last name, but only if the years are equal. And only if the last names were equal would it compare the first names.

Use lexicographical ordering for hierarchical comparisons

If you need to order items based on multiple values at the same time, you probably could rely on lexicographical ordering.

All sequences in Python use lexicographical ordering when they're sorted, when they're compared with the less than operator, or when they're compared with the greater than operator.

Now it's your turn! 🚀

We don't learn by reading or watching. We learn by doing. That means writing Python code.

Practice this topic by working on these related Python exercises.

get_earliest: Return the earliest MM/DD/YYYY date string given sort_by_column.py: Program to sort a CSV file by specific columns update_reviews: Update one CSV file based on another CSV file sort_dict: Sort a dictionary by its keys FuzzyString: String-like class supporting case-insensitive comparisons