DSABook – General lists

6.10 General lists

We need some notation to show the contents of a list, so we will use the same angle bracket notation that is normally used to represent sequences. To be consistent with standard array indexing, the first position on the list is denoted as 0. Thus, if there are $n$ elements in the list, they are given positions 0 through $n-1$ as $\langle\ a_0,\ a_1,\ ...,\ a_{n-1}\ \rangle$ . The subscript indicates an element’s position within the list. Using this notation, the empty list would appear as $\langle\ \rangle$ .

What basic operations do we want our lists to support? Our common intuition about lists tells us that a list should be able to grow and shrink in size as we insert and remove elements. We should be able to insert and remove elements from anywhere in the list. We should be able to gain access to any element’s value, either to read it or to change it. Finally, we should be able to know the size of the list, and to iterate through the elements in the list – i.e., the list should be a Collection.

6.10.1 ADT for general lists

Now we can define the ADT for a list object in terms of a set of operations on that object. We will use an interface to formally define the list ADT. List defines the member functions that any list implementation inheriting from it must support, along with their parameters and return types.

True to the notion of an ADT, an interface does not specify how operations are implemented. Two complete implementations are presented later (array-based lists and linked lists), both of which use the same list ADT to define their operations. But they are considerably different in approaches and in their space/time tradeoffs.

The code below presents our list ADT. The comments given with each member function describe what it is intended to do. However, an explanation of the basic design should help make this clearer. There are four main operations we want to support:

interface List extends Collection:
    add(i, x)  // Adds (inserts) x at position i; where 0 <= i <= size, increasing the size.
    get(i)     // Returns the element at position i; where 0 <= i < size.
    set(i, x)  // Sets the value at position i to x; where 0 <= i < size.
    remove(i)  // Removes the element at position i; where 0 <= i < size, decreasing the size.

Apart from these four, we also want to know the number of elements, to be able to loop through the list elements in order. So we make the List interface be a Collection too.

TODO

The List member functions allow you to build a list with elements in any desired order, and to access any desired position in the list.

The list class declaration presented here is just one of many possible interpretations for lists. Our list interface provides most of the operations that one naturally expects to perform on lists and serves to illustrate the issues relevant to implementing the list data structure. As an example of using the list ADT, here is a function to return true if there is an occurrence of a given element in the list, and false otherwise. The find method needs no knowledge about the specific list implementation, just the list ADT.

// Return true if k is in list L, false otherwise.
function find(L, k):
    for each n in L:
        if k == n:
            return true  // Found k.
    return false         // k not found.

There are two standard approaches to implementing lists, the array-based list, and the linked list.

6.10.2 Implementing using linked lists

We can use the same structure as for stacks when implementing general linked lists:

datatype LinkedList implements List:
    head: Node = null   // Pointer to list header
    size: Int = 0       // Size of list

TODO

Adding and removing nodes

However, if we want to add or remove nodes, there is a problem with using a pointer to the current node.

TODO

So, using a current pointer, it is possible to add and remove nodes, using some complicated coding. But this does not work for the very last node! There are several possible ways to deal with this problem. One is to always have an empty node (a “dummy node”) at the very end of the list, but this will increase memory usage.

Another simple solution is to have a pointer to the node before the current node. This is the solution we will adopt.

Adding a node

TODO

Here are some special cases for linked list insertion: Inserting at the beginning of a list, and appending at the end.

Here’s the code for addition.

datatype LinkedList:
    ...
    add(i, x):
        // precondition: 0 <= i <= size
        if i == 0:
            head = new Node(x, head)
        else:
            prev = head
            repeat i-1 times:
                prev = prev.next
            prev.next = new Node(x, prev.next)
        size = size + 1

Here’s an exercise for adding a value to a linked list.

Removing a node

TODO

Here’s the code for deletion:

datatype LinkedList:
    ...
    remove(self, i):
        // precondition: 0 <= i < size
        if i == 0:
            removed = head
            head = removed.next
        else:
            prev = head
            repeat i-1 times:
                prev = prev.next
            removed = prev.next
            prev.next = removed.next
        removed.next = null   // For garbage collection
        size = size - 1
        return removed.elem

And here’s an exercise.

Complexity analysis

Locating a certain position $i$ in the list requires $i$ steps. The worst case is if we want to go to the last node, so the time complexity for above all operations is $O(n)$ .

This is much worse than the array-based list (Section 6.10.3), where these operations are $O(1)$ . So are linked lists totally useless? No! But they don’t work well with our current List interface.

To make linked lists useful, we need an enhanced iterator interface, where we can move forwards and backwards in the list, and add/remove nodes through this enhanced iterator. In the standard Java API, this kind of iterator is called a ListIterator, which is part of Java’s standard LinkedList.

6.10.3 Implementing general lists using arrays

First we give a static implementation for array-based lists, named ArrayList. This inherits from the List ADT (Section 6.10.1), and must therefore implement all of the member functions of List.

Unlike normal arrays, lists can change in size: we can add elements to and remove from them. How can this be implemented? Well, what we don’t want to do is to create a completely new array every time elements are added or removed. So instead we will use an underlying array which is larger than we need.

Because of that will need two internal variables: the underlying array, and a size counter telling how much of the array is actually used. When we create a new array-list we have to decide the capacity, the largest possible size. Then the underlying array is initialised, and the size counter is set to 0 because there are no elements yet.

TODO

datatype ArrayList implements List:
    internalArray = new Array(capacity)  // Internal array containing the list elements
    size = 0                             // Size of list

Note: in Python you cannot create an array with a certain capacity. You can simulate it by creating a list with a number of empty elements: [None] * capacity, but this is not a real fixed-size array as explained just above.

Getting and setting values

Random access to any element in the list is quick and easy.

TODO

As you can see below, there are no loops in the methods get and set, which means that both require $O(1)$ time.

datatype ArrayList:
    ...
    get(i):
        // precondition: 0 <= i < size
        return internalArray[i]

    set(i, x):
        // precondition: 0 <= i < size
        internalArray[i] = x

Adding elements

Because the array-based list implementation is defined to store list elements in contiguous cells of the array, the add and remove methods must maintain this property.

Appending elements at the tail of an array-based list is super-fast.

TODO

However, adding an element at the head of the list requires shifting all existing elements in the array by one position toward the tail.

TODO

Therefore, if we want to add an element at position $i$ , then $n - i - 1$ elements must shift toward the tail to leave room for the new element. In the worst case, adding elements requires moving all $n$ elements, which is $O(n)$ .

datatype ArrayList:
    ...
    add(i, x):
        // precondition: 0 <= i <= size < internalArray.size
        size = size + 1
        for k in size-1, size-2 .. i+1:
            internalArray[k] = internalArray[k-1]
        internalArray[i] = x

Practice exercise

TODO

Removing elements

Removing an element from the head of the list is similar to adding in the sense that all remaining elements must shift. But now we have to shift toward the head to fill in the gap, instead of toward the tail. If we want to remove the element at position $i$ , then $n - i - 1$ elements must shift toward the head, as shown in the following slideshow.

TODO

In the worst case, insertion or removal each requires moving all $n$ elements, which is $O(n)$ .

datatype ArrayList:
    ...
    remove(i):
        // precondition: 0 <= i < size
        x = internalArray[i]
        for k in i+1 .. size-1:
            internalArray[k-1] = internalArray[k]
        size = size - 1
        internalArray[size] = null  // For garbage collection
        return x

Practice exercise

TODO

6.10.4 When to use linked lists?

According to the calculations above, linked lists are worse than array-based lists, because all operations are slow (linear time). So why even bother using linked lists?

First there are limited versions of lists that can be implemented efficiently using linked lists, we will look at [stacks] and [queues] later.

Second, our list API is not the best for linked lists. If we instead could have a pointer to the “current” list node, and have methods for moving forward and backward in the list, several of the operations can be constant time. In the Java standard API this is called a ListIterator, which is part of Java’s standard LinkedList.

But these advanced list iterators are not part of this course, and in fact there are not many algorithms where list iterators are particularly useful.

6.10.5 How are lists implemented in the standard libraries?

All serious languages have dynamic list implementations. Here are how they are implemented in Java and Python:

In Java, java.util.ArrayList implements dynamic arrays, meaning that the internal array grows automatically when necessary. The growth factor is 50%, so that if the array has size 1024, it will grow with another 512 elements. [Source: ArrayList.java] However, the ArrayList will never shrink automatically, but instead it’s up to the programmer to decide when to shrink it.
Java’s java.util.LinkedList implements doubly-linked list (Section 6.9), so that the iterator can move forward and backward through the list. [Source: LinkedList.java]
Python’s standard lists are dynamic. In fact, Python doesn’t even support fixed-length lists, so our code in this chapter is a bit of a hack. Python lists both grow and shrink the lists automatically, and the growth factor is 1/8 (12.5%), meaning that if the array has size 1024, it will grow with another 128 elements. It shrinks the array by 1/8 whenever less than half of the array is occupied. [Source: listobject.c]