Close
Register
Close Window

Data Structures and Algorithms

Chapter 1 Introduction

Show Source |    | About   «  1.5. Comparables, Comparators and Key-Value Pairs   ::   Contents   ::   2.1. Chapter Introduction: Arrays  »

1.6. Comparables and Comparators: An Example

Note: This example treats comparables and comparators, as they work in Java. Python code is also included, but since Python thinks of comparisons in a different way, the code is not always transferrable.

1.6.1. Comparables and comparators in Java

In Java there are two main ways of comparing objects. The one that most people find easier to understand is comparable. But there is an alternative, comparator, which is very useful.

As a running example we use a class for persons:

class Person implements Comparable<Person> {
    private String givenName;
    private String familyName;
    private int birthYear;

    public String givenName()  { return givenName  ; }
    public String familyName() { return familyName ; }
    public int    birthYear()  { return birthYear  ; }

    public String toString() {
        return givenName() + " " + familyName() + " (" + birthYear() + ")";
    }
}
class Person(Comparable):
    def __init__(self, given, family, birth):
        self.givenName  = given
        self.familyName = family
        self.birthYear  = birth

    def __str__(self):
        return f"{self.givenName} {self.familyName} ({self.birthYear})"

Also, let’s create an array of persons that we can use later:

static ArrayList<Person> getPeople() {
    ArrayList<Person> people = new ArrayList<>();
    people.add(new Person("Unsuk", "Chin", 1961));
    people.add(new Person("Anna", "Thorvaldsdóttir", 1977));
    people.add(new Person("Andrea", "Tarrodi", 1981));
    people.add(new Person("Diana", "Čemerytė", 1974));
    people.add(new Person("Elfrida", "Andrée", 1841));
    people.add(new Person("Guy", "d’Hardelot", 1858));
    people.add(new Person("Nadia", "Boulanger", 1887));
    people.add(new Person("Lili", "Boulanger", 1893));
    return people;
}
def getPeople():
    return [Person("Unsuk", "Chin", 1961),
            Person("Anna", "Thorvaldsdóttir", 1977),
            Person("Andrea", "Tarrodi", 1981),
            Person("Diana", "Čemerytė", 1974),
            Person("Elfrida", "Andrée", 1841),
            Person("Guy", "d’Hardelot", 1858),
            Person("Nadia", "Boulanger", 1887),
            Person("Lili", "Boulanger", 1893),
            ]

Now we can print the list:

ArrayList<Person> people = getPeople();
for (Person p : people) System.out.println(p);
people = getPeople()
for p in people: print(p)

Which results in:

Unsuk Chin (1961)
Anna Thorvaldsdóttir (1977)
Andrea Tarrodi (1981)
Diana Čemerytė (1974)
Elfrida Andrée (1841)
Guy d’Hardelot (1857)
Nadia Boulanger (1887)
Lili Boulanger (1893)

1.6.2. Comparables

With the Comparable interface we can define the “natural ordering” for a class (Java 8 Comparable). Recall the interface for comparables from the course API:

// Note: This is the same as the java.lang.Comparable interface
interface Comparable<E> {
    int compareTo(E other);  // Returns <0  if  this < other,
                             //      or  0  if  this == other,
                             //      or >0  if  this > other.
}
# Note: by implementing these methods,
# one can use the standard Python comparison operators (==, !=, <, <=, >, >=).
class Comparable:
    def __eq__(self, other): """Test if self == other."""
    def __ne__(self, other): """Test if self != other."""
    def __lt__(self, other): """Test if self < other."""
    def __le__(self, other): """Test if self <= other."""
    def __gt__(self, other): """Test if self > other."""
    def __ge__(self, other): """Test if self >= other."""

Let’s say that we want the natural ordering to be to compare persons by their family name. Then we let Person implement the Comparable interface by overriding .compareTo(…):

class Person implements Comparable<Person> {
    // ...as above...
    public int compareTo(Person other) {
        return this.familyName().compareTo(other.familyName());
    }
}
class Person(Comparable):
    # ...as above...
    def __eq__(self, other): return self.familyName == other.familyName
    def __ne__(self, other): return self.familyName != other.familyName
    def __lt__(self, other): return self.familyName <  other.familyName
    def __le__(self, other): return self.familyName <= other.familyName
    def __gt__(self, other): return self.familyName >  other.familyName
    def __ge__(self, other): return self.familyName >= other.familyName

Now we can sort our people array according to family name:

people = getPeople(); // reset the people list
Collections.sort(people);
for (Person p : people) System.out.println(p);
people = getPeople()  # reset the people list
people.sort()
for p in people: print(p)

Resulting in:

Elfrida Andrée (1841)
Nadia Boulanger (1887)
Lili Boulanger (1893)
Unsuk Chin (1961)
Andrea Tarrodi (1981)
Anna Thorvaldsdóttir (1977)
Guy d’Hardelot (1857)
Diana Čemerytė (1974)

Two things to note, which we address later:

  1. Guy d’Hardelot and Diana Čemerytė come last – this is because .compareTo(…) gives a case-sensitive ordering and doesn’t care ignore diacritics.

  2. Nadia Boulanger comes before Lili, even though L comes before N in the alphabet.

1.6.3. Comparators, the old way

What if we sometimes want to sort the list according to some other ordering, e.g., birth year or given name? Enter comparators, and here is the interface (Java 8 Comparator):

// Note: This is a subset of the java.util.Comparator interface
interface Comparator<E> {
    int compare(E one, E other);  // Returns <0  if  one < other,
                                  //      or  0  if  one == other,
                                  //      or >0  if  one > other.
}
# Python 3 has nothing similar to the Java comparator interface.
#
# Instead Python 3 advocates to use a *key extractor*, which is a function
# that converts an element to another element which then can be compared
# using the natural ordering.
#
# The `functools` library has a function `cmp_to_key` which can
# convert a comparator-like function to a key extractor.

To use this we have to implement a separate class for each ordering we want to use. Here’s one for comparing by birth year:

class BirthYearComparator implements Comparator<Person> {
    public int compare(Person one, Person other) {
        return Integer.compare(one.birthYear(), other.birthYear());
    }
}
# Note: Python doesn't have comparators like Java does.
# The most similar is to define a comparator-like function:
def birthYearComparator(one, other):
    return (-1 if one.birthYear < other.birthYear else
             1 if one.birthYear > other.birthYear else 0)

Notes:

  1. Don’t compare numbers by using subtraction! This might lead to overflow and rounding errors. Instead use the static .compare(…) methods that are built into the number classes (Integer, Double, etc).

  2. Since numbers are not objects, you cannot use one.birthYear.compareTo(…). You can do new Integer(one.birthYear).compareTo(…), or you can use Integer.compare(…) as above.

And here’s the class for comparing by given name:

class GivenNameComparator implements Comparator<Person> {
    public int compare(Person one, Person other) {
        return one.givenName().compareTo(other.givenName());
    }
}
def givenNameComparator(one, other):
    return (-1 if one.givenName < other.givenName else
             1 if one.givenName > other.givenName else 0)

To use them you have to first create an object, i.e., instantiate the comparator:

Comparator<Person> byBirthYear = new BirthYearComparator();
people = getPeople(); // reset the people list
Collections.sort(people, byBirthYear);
for (Person p : people) System.out.println(p);
byBirthYear = functools.cmp_to_key(birthYearComparator)
people = getPeople()  # reset the people list
people.sort(key=byBirthYear)
for p in people: print(p)

Result:

Elfrida Andrée (1841)
Guy d’Hardelot (1857)
Nadia Boulanger (1887)
Lili Boulanger (1893)
Unsuk Chin (1961)
Diana Čemerytė (1974)
Anna Thorvaldsdóttir (1977)
Andrea Tarrodi (1981)

And similar for given names:

Comparator<Person> byGivenName = new GivenNameComparator();
people = getPeople(); // reset the people list
Collections.sort(people, byGivenName);
for (Person p : people) System.out.println(p);
byGivenName = functools.cmp_to_key(givenNameComparator)
people = getPeople()  # reset the people list
people.sort(key=byGivenName)
for p in people: print(p)

Result:

Andrea Tarrodi (1981)
Anna Thorvaldsdóttir (1977)
Diana Čemerytė (1974)
Elfrida Andrée (1841)
Guy d’Hardelot (1857)
Lili Boulanger (1893)
Nadia Boulanger (1887)
Unsuk Chin (1961)

1.6.4. Comparators, the new functional interface in Java 8

Since Java 8, there’s a functional interface which can be used to build comparators (and many other things). So we don’t have to write the class definitions, and instead write similarly to we would do in Python or Haskell:

byBirthYear = (one, other) -> Integer.compare(one.birthYear(), other.birthYear());
byGivenName = (one, other) -> one.givenName().compareTo(other.givenName());
byBirthYear = lambda person: person.birthYear
byGivenName = lambda person: person.givenName

Yay! That’s a lot nicer than the clumsy class definition (class BirthYearComparator implements Comparator<Person>, etc).

1.6.5. Comparing fields using key extractors

In many cases (including our example case), we only want to compare some fields in a class. Then we can use key extractors to simplify even more:

byBirthYear = Comparator.comparingInt(Person::birthYear);
byGivenName = Comparator.comparing(Person::givenName);
byBirthYear = operator.attrgetter('birthYear')
byGivenName = operator.attrgetter('givenName')
  • Note: We use .comparingInt(…) when defining byBirthYear. It’s not strictly necessary (i.e., we can use .comparing(…)), but it makes things slightly more efficient.

1.6.6. Comparing several fields

Remember the natural ordering? The problem with only comparing the family name is that if two persons have the same name they keep their internal order. So, Nadia Boulanger comes before Lili Boulanger even though L precedes N in the alphabet.

What we want is to be able to compare several fields. The old and not-so-good solution is to use clumsy if-then-elses, like this:

class FullNameComparator implements Comparator<Person> {
    public int compare(Person one, Person other) {
        int cmp = one.familyName().compareTo(other.familyName());
        if (cmp != 0) return cmp;
        cmp = one.givenName().compareTo(other.givenName());
        if (cmp != 0) return cmp;
        return 0;
    }
}
def fullNameComparator(one, other):
    return (-1 if one.familyName < other.familyName else
             1 if one.familyName > other.familyName else
            -1 if one.givenName  < other.givenName  else
             1 if one.givenName  > other.givenName  else 0)

After this we can instantiate a specific comparator:

Comparator<Person> byFullName = new FullNameComparator();
byFullName = functools.cmp_to_key(fullNameComparator)

If we have many fields this gets quite cumbersome (and error-prone). But using the functional interface, and the magic .thenComparing(…) method, it’s really easy:

byFullName = Comparator.comparing(Person::familyName)
    .thenComparing(Person::givenName);
# In Python we can simply create a tuple, and it will sort the way we want
byFullName = lambda person: (person.familyName, person.givenName)

And here it is in action:

people = getPeople(); // reset the people list
Collections.sort(people, byFullName);
for (Person p : people) System.out.println(p);
people = getPeople()  # reset the people list
people.sort(key=byFullName)
for p in people: print(p)

Result:

Elfrida Andrée (1841)
Lili Boulanger (1893)
Nadia Boulanger (1887)
Unsuk Chin (1961)
Andrea Tarrodi (1981)
Anna Thorvaldsdóttir (1977)
Guy d’Hardelot (1857)
Diana Čemerytė (1974)

As you can see, Lili now comes before Nadia. But there’s still the problem with Guy and Diana coming last in the list.

1.6.7. Case-insensitive and language-specific comparisons

The Java String class has a method .compareToIgnoringCase(…) which is what it sounds like.

But you shouldn’t use it if you’re serious about handling text correctly. This is because strings are no longer ASCII, but Unicode. And Unicode is a beast of its own – it knows how to write hundreds of different alphabets with diacritics and other special characters. (Unicode even knows about bidirectional text (left-to-right vs right-to-left), but we won’t discuss that here).

Now, correct string sorting depends on your locale. E.g., in Swedish we put Å, Ä, Ö at the end of the alphabet, while Á, Ô, È are mixed together with A, O, E, respectively. Also, it’s common to mix V and W in Swedish dictionaries. German on the other hand mixes Ä, Ö with A, O. And it sorts ß together with S.

So, here’s how to define a correct comparator for Swedish, which ignores case differences and orders according to Swedish locale:

Collator swedishLocale = Collator.getInstance(new Locale("sv", "SE"));
swedishLocale.setStrength(Collator.PRIMARY);
Comparator<Person> bySwedishLocale = 
    Comparator.comparing(Person::familyName, swedishLocale)
    .thenComparing(Person::givenName, swedishLocale);
import locale
locale.setlocale(locale.LC_COLLATE, 'sv_SE')
bySwedishLocale = lambda person: (locale.strxfrm(person.familyName.casefold()),
                                      locale.strxfrm(person.givenName.casefold()))
# Note: There's a bug in Python's Swedish locale, so Č comes after all other letters

And in action:

people = getPeople(); // reset the people list
Collections.sort(people, bySwedishLocale);
for (Person p : people) System.out.println(p);
people = getPeople()  # reset the people list
people.sort(key=bySwedishLocale)
for p in people: print(p)
# Note: Because of a bug in Python's Swedish locale, Diana Čemerytė is still printed last

Result:

Elfrida Andrée (1841)
Lili Boulanger (1893)
Nadia Boulanger (1887)
Diana Čemerytė (1974)
Unsuk Chin (1961)
Guy d’Hardelot (1857)
Andrea Tarrodi (1981)
Anna Thorvaldsdóttir (1977)

Finally Diana Čemerytė and Guy d’Hardelot find their right places in the list!

1.6.8. …and what about the names?

The names are taken from here: https://female-composers.forts.se/

1.6.9. Running the program

Here is the full source code. Just compile and run it without any arguments:

import java.util.*;
import java.text.Collator;

class Person implements Comparable<Person> {
    private String givenName;
    private String familyName;
    private int birthYear;

    // Constructor
    Person(String given, String family, int birth) {
        givenName  = given;
        familyName = family;
        birthYear  = birth;
    }

    // Getters
    public String givenName()  { return givenName  ; }
    public String familyName() { return familyName ; }
    public int    birthYear()  { return birthYear  ; }

    // Pretty-printing
    public String toString() {
        return givenName() + " " + familyName() + " (" + birthYear() + ")";
    }

    // Natural ordering
    // ...as above...
    public int compareTo(Person other) {
        return this.familyName().compareTo(other.familyName());
    }
}


class BirthYearComparator implements Comparator<Person> {
    public int compare(Person one, Person other) {
        return Integer.compare(one.birthYear(), other.birthYear());
    }
}


class GivenNameComparator implements Comparator<Person> {
    public int compare(Person one, Person other) {
        return one.givenName().compareTo(other.givenName());
    }
}

class FullNameComparator implements Comparator<Person> {
    public int compare(Person one, Person other) {
        int cmp = one.familyName().compareTo(other.familyName());
        if (cmp != 0) return cmp;
        cmp = one.givenName().compareTo(other.givenName());
        if (cmp != 0) return cmp;
        return 0;
    }
}

public class ComparatorExample {
static ArrayList<Person> getPeople() {
    ArrayList<Person> people = new ArrayList<>();
    people.add(new Person("Unsuk", "Chin", 1961));
    people.add(new Person("Anna", "Thorvaldsdóttir", 1977));
    people.add(new Person("Andrea", "Tarrodi", 1981));
    people.add(new Person("Diana", "Čemerytė", 1974));
    people.add(new Person("Elfrida", "Andrée", 1841));
    people.add(new Person("Guy", "d’Hardelot", 1858));
    people.add(new Person("Nadia", "Boulanger", 1887));
    people.add(new Person("Lili", "Boulanger", 1893));
    return people;
}
    
public static void main(String[] args) {
System.out.println("\n### No order");
ArrayList<Person> people = getPeople();
for (Person p : people) System.out.println(p);

System.out.println("\n### Natural ordering");
people = getPeople(); // reset the people list
Collections.sort(people);
for (Person p : people) System.out.println(p);

System.out.println("\n### Ordered by birth year (pre-Java-8 solution)");
Comparator<Person> byBirthYear = new BirthYearComparator();
people = getPeople(); // reset the people list
Collections.sort(people, byBirthYear);
for (Person p : people) System.out.println(p);

System.out.println("\n### Ordered by birth year (functional solution)");
byBirthYear = (one, other) -> Integer.compare(one.birthYear(), other.birthYear());
people = getPeople(); // reset the people list
Collections.sort(people, byBirthYear);
for (Person p : people) System.out.println(p);

System.out.println("\n### Ordered by birth year (using a key extractor)");
byBirthYear = Comparator.comparingInt(Person::birthYear);
people = getPeople(); // reset the people list
Collections.sort(people, byBirthYear);
for (Person p : people) System.out.println(p);

System.out.println("\n### Ordered by given name (pre-Java-8 solution)");
Comparator<Person> byGivenName = new GivenNameComparator();
people = getPeople(); // reset the people list
Collections.sort(people, byGivenName);
for (Person p : people) System.out.println(p);

System.out.println("\n### Ordered by given name (functional solution)");
byGivenName = (one, other) -> one.givenName().compareTo(other.givenName());
people = getPeople(); // reset the people list
Collections.sort(people, byGivenName);
for (Person p : people) System.out.println(p);

System.out.println("\n### Ordered by given name (using a key extractor)");
byGivenName = Comparator.comparing(Person::givenName);
people = getPeople(); // reset the people list
Collections.sort(people, byGivenName);
for (Person p : people) System.out.println(p);

System.out.println("\n### Ordered by full name: family name + given name (pre-Java-8 solution)");
Comparator<Person> byFullName = new FullNameComparator();
people = getPeople(); // reset the people list
Collections.sort(people, byFullName);
for (Person p : people) System.out.println(p);

System.out.println("\n### Ordered by full name: family name + given name (using key extractors and thenComparing)");
byFullName = Comparator.comparing(Person::familyName)
    .thenComparing(Person::givenName);
people = getPeople(); // reset the people list
Collections.sort(people, byFullName);
for (Person p : people) System.out.println(p);

System.out.println("\n### Ordered by Swedish locale, case-insensitive");
Collator swedishLocale = Collator.getInstance(new Locale("sv", "SE"));
swedishLocale.setStrength(Collator.PRIMARY);
Comparator<Person> bySwedishLocale = 
    Comparator.comparing(Person::familyName, swedishLocale)
    .thenComparing(Person::givenName, swedishLocale);
people = getPeople(); // reset the people list
Collections.sort(people, bySwedishLocale);
for (Person p : people) System.out.println(p);

System.out.println("\n### Ordered by Swedish locale, given name first");
Comparator<Person> bySwedishGivenName = 
    Comparator.comparing(Person::givenName, swedishLocale)
    .thenComparing(Person::familyName, swedishLocale);
people = getPeople(); // reset the people list
Collections.sort(people, bySwedishGivenName);
for (Person p : people) System.out.println(p);
}
}
from BaseAPI import Comparable
import functools 
import operator

class Person(Comparable):
    def __init__(self, given, family, birth):
        self.givenName  = given
        self.familyName = family
        self.birthYear  = birth

    def __str__(self):
        return f"{self.givenName} {self.familyName} ({self.birthYear})"

    # ...as above...
    def __eq__(self, other): return self.familyName == other.familyName
    def __ne__(self, other): return self.familyName != other.familyName
    def __lt__(self, other): return self.familyName <  other.familyName
    def __le__(self, other): return self.familyName <= other.familyName
    def __gt__(self, other): return self.familyName >  other.familyName
    def __ge__(self, other): return self.familyName >= other.familyName


# Note: Python doesn't have comparators like Java does.
# The most similar is to define a comparator-like function:
def birthYearComparator(one, other):
    return (-1 if one.birthYear < other.birthYear else
             1 if one.birthYear > other.birthYear else 0)


def givenNameComparator(one, other):
    return (-1 if one.givenName < other.givenName else
             1 if one.givenName > other.givenName else 0)

def fullNameComparator(one, other):
    return (-1 if one.familyName < other.familyName else
             1 if one.familyName > other.familyName else
            -1 if one.givenName  < other.givenName  else
             1 if one.givenName  > other.givenName  else 0)

def getPeople():
    return [Person("Unsuk", "Chin", 1961),
            Person("Anna", "Thorvaldsdóttir", 1977),
            Person("Andrea", "Tarrodi", 1981),
            Person("Diana", "Čemerytė", 1974),
            Person("Elfrida", "Andrée", 1841),
            Person("Guy", "d’Hardelot", 1858),
            Person("Nadia", "Boulanger", 1887),
            Person("Lili", "Boulanger", 1893),
            ]

print("\n### No order");
people = getPeople()
for p in people: print(p)

print("\n### Natural ordering")
people = getPeople()  # reset the people list
people.sort()
for p in people: print(p)

print("\n### Ordered by birth year (pre-Java-8 solution)")
byBirthYear = functools.cmp_to_key(birthYearComparator)
people = getPeople()  # reset the people list
people.sort(key=byBirthYear)
for p in people: print(p)

print("\n### Ordered by birth year (functional solution)")
byBirthYear = lambda person: person.birthYear
people = getPeople()  # reset the people list
people.sort(key=byBirthYear)
for p in people: print(p)

print("\n### Ordered by birth year (using a key extractor)")
byBirthYear = operator.attrgetter('birthYear')
people = getPeople()  # reset the people list
people.sort(key=byBirthYear)
for p in people: print(p)

print("\n### Ordered by given name (pre-Java-8 solution)")
byGivenName = functools.cmp_to_key(givenNameComparator)
people = getPeople()  # reset the people list
people.sort(key=byGivenName)
for p in people: print(p)

print("\n### Ordered by given name (functional solution)")
byGivenName = lambda person: person.givenName
people = getPeople()  # reset the people list
people.sort(key=byGivenName)
for p in people: print(p)

print("\n### Ordered by given name (using a key extractor)")
byGivenName = operator.attrgetter('givenName')
people = getPeople()  # reset the people list
people.sort(key=byGivenName)
for p in people: print(p)

print("\n### Ordered by full name: family name + given name (pre-Java-8 solution)")
byFullName = functools.cmp_to_key(fullNameComparator)
people = getPeople()  # reset the people list
people.sort(key=byFullName)
for p in people: print(p)

print("\n### Ordered by full name: family name + given name (functional solution and tuples)")
# In Python we can simply create a tuple, and it will sort the way we want
byFullName = lambda person: (person.familyName, person.givenName)
people = getPeople()  # reset the people list
people.sort(key=byFullName)
for p in people: print(p)

print("\n### Ordered by Swedish locale, case-insensitive")
print("# Note: There's a bug in Python's Swedish locale, so Č comes after all other letters")
import locale
locale.setlocale(locale.LC_COLLATE, 'sv_SE')
bySwedishLocale = lambda person: (locale.strxfrm(person.familyName.casefold()),
                                      locale.strxfrm(person.givenName.casefold()))
# Note: There's a bug in Python's Swedish locale, so Č comes after all other letters
people = getPeople()  # reset the people list
people.sort(key=bySwedishLocale)
for p in people: print(p)
# Note: Because of a bug in Python's Swedish locale, Diana Čemerytė is still printed last

print("\n### Ordered by Swedish locale, given name first")
bySwedishLocale = lambda person: (locale.strxfrm(person.givenName.casefold()),
                                  locale.strxfrm(person.familyName.casefold()))
people = getPeople()  # reset the people list
people.sort(key=bySwedishLocale)
for p in people: print(p)

   «  1.5. Comparables, Comparators and Key-Value Pairs   ::   Contents   ::   2.1. Chapter Introduction: Arrays  »

nsf
Close Window