Sign in to your Python Morsels account to save your screencast settings.
Don't have an account yet? Sign up here.
Let's talk about lexicographical ordering in Python.
lexicographicaldefinition in Python Terminology.
Is the string "apple" greater than the string "animal"?
>>> "apple" > "animal"
When I ask this question to a roomful of Python developers, I often see folks mouth L-M-N-O-P.
The word "animal" comes before "apple" in alphabetical ordering.
So "apple" is greater than "animal":
>>> "apple" > "animal"
True
Strings in Python are ordered alphabetically. Well, sort of.
Uppercase "Apple" is less than lowercase "apple":
>>> "Apple" < "apple"
True
And we can also order characters that are not in the alphabet.
For example, the dash character (-) is less than the underscore character (_):
>>> '-' < '_'
True
Each character in a Python string has a number associated with it. These numbers are the Unicode code points for these characters:
>>> [ord(c) for c in "apple"]
[97, 112, 112, 108, 101]
>>> [ord(c) for c in "animal"]
[97, 110, 105, 109, 97, 108]
Python essentially uses alphabetical ordering.
But the alphabet it uses consists of every possible character, and the order of those characters comes from their Unicode code points.
I don't recommend memorizing the Unicode code points of each character.
Instead, I would lowercase your strings as you order them. And keep in mind that every character counts when you're ordering strings, even non-alphabetical ones.
Lexicographical ordering is a fancy word that means ordering things in a way that's similar to alphabetical ordering.
When ordering strings lexicographically, we compare the first character in each string, and then the second character, and so on. As long as the characters in the same position are equal, we need to go to the next pair. But once we've found different characters in the same position, we have an answer to which string is lower than the other.
Here's Python code that describes how lexicographical ordering works (you would never implement this yourself because Python does it for us):
def lexicographically_less_than(s1, s2):
for a, b in zip(s1, s2):
if a == b:
# They're the same
continue
else:
# They're different
return (a < b)
else:
# One is a prefix of the other
return len(s1) < len(s2)
The string that's lower is the one that has the lowest character of the first pair of differing characters.
Strings in Python are ordered lexicographically. But it's not just strings that are ordered this way.
Every sequence in Python is ordered lexicographically.
Here we have two three-item tuples, each containing numbers:
>>> x = (45, 1, 13)
>>> y = (45, 7, 4)
Which of these tuples is less than the other one?
Well, remember that just like strings, tuples are sequences, and all sequences are ordered lexicographically.
So to compare these tuples, we start by comparing the first item in each tuple:
The first item in each is 45.
>>> x[0], y[0]
(45, 45)
Since the items are the same, we have to move to the second item.
These items are different.
1 and 7 are not the same, which means we should have an answer to our comparison.
>>> x[1], y[1]
(1, 7)
Since 1 is less than 7, the first tuple is less than the second tuple:
>>> x[1] < y[1]
True
>>> x < y
True
But what happens if the tuples have a different length? Well, the same thing happens with tuples as happens with strings.
If we compare two strings with a different length, Python still compares the first letter first, the second letter second, and so on:
>>> "truth" < "tradition"
False
But what if one string is a prefix of the other one?
>>> "cat" < "catastrophe"
We would run out of characters to compare in that situation!
When one string is a prefix of the other, the shorter string is less than the longer one:
>>> "cat" < "catastrophe"
True
The same thing applies to tuples.
If we compare tuples of different lengths, Python still compares index-by-index:
>>> x = (17, 9)
>>> y = (17, 4, 13)
>>> x < y
False
And if one tuple is a prefix of the other tuple, the shorter one is less than the longer one:
>>> x = (17, 4)
>>> y = (17, 4, 13)
>>> x < y
True
Strings and tuples are ordered lexicographically, but so are all other sequences.
Lists can be compared the same way:
>>> ["truth", "dare"] < ["truth", "democracy"]
True
And so can byte strings:
>>> b"\x01\x06" < b"\x01\x08"
True
All sequences are ordered lexicographically in Python.
What's the purpose of lexicographical ordering? Why is this implemented in Python?
Well, with strings, the purpose is pretty apparent. Alphabetical ordering can be convenient!
But with other types of sequences, lexicographical ordering can seem a bit unusual. But it can be helpful.
For example, if we have a list of tuples that represent a number and a string, and we wanted to order these tuples based on the number, we can simply sort the tuples:
>>> weights = [
... (0.071, 'bees'),
... (0.332, 'chickens'),
... (0.023, 'crabs'),
... (0.056, 'salmon'),
... ]
>>> sorted(weights)
[(0.023, 'crabs'), (0.056, 'salmon'), (0.071, 'bees'), (0.332, 'chickens')]
Since the number comes before the string, the numbers are compared first. The strings would only be compared if any of the numbers were equal.
Python's sorted function relies on the less than (>) operator, so any object that can be ordered can also be sorted.
Lexicographical ordering is especially handy for ordering things hierarchically.
For example, if we needed to order alumni by their graduation year, their last name, and then their first name, we could make three-item tuples in that order, and then sort those tuples:
>>> students = [
... (1978, "Little", "Nancy"),
... (1978, "Gonzalez", "Maria"),
... (1977, "Anders", "Paul"),
... (1979, "Brady", "Doris"),
... ]
>>> sorted(students)
[(1977, 'Anders', 'Paul'), (1978, 'Gonzalez', 'Maria'), (1978, 'Little', 'Nancy'), (1979, 'Brady', 'Doris')]
Python does the heavy-lifting for us here.
It orders first by the year, but then compares the last name, but only if the years are equal. And only if the last names were equal would it compare the first names.
If you need to order items based on multiple values at the same time, you probably could rely on lexicographical ordering.
All sequences in Python use lexicographical ordering when they're sorted, when they're compared with the less than operator, or when they're compared with the greater than operator.
We don't learn by reading or watching. We learn by doing. That means writing Python code.
Practice this topic by working on these related Python exercises.
Sign in to your Python Morsels account to track your progress.
Don't have an account yet? Sign up here.