6.10 General lists
We need some notation to show the contents of a list, so we will use the same angle bracket notation that is normally used to represent sequences. To be consistent with standard array indexing, the first position on the list is denoted as 0. Thus, if there are elements in the list, they are given positions 0 through as . The subscript indicates an element’s position within the list. Using this notation, the empty list would appear as .
What basic operations do we want our lists to support? Our common intuition about lists tells us that a list should be able to grow and shrink in size as we insert and remove elements. We should be able to insert and remove elements from anywhere in the list. We should be able to gain access to any element’s value, either to read it or to change it. Finally, we should be able to know the size of the list, and to iterate through the elements in the list – i.e., the list should be a Collection.
6.10.1 ADT for general lists
Now we can define the ADT for a list object in terms of a set of
operations on that object. We will use an interface to formally define
the list ADT. List
defines the member functions that any
list implementation inheriting from it must support, along with their
parameters and return types.
True to the notion of an ADT, an interface does not specify how operations are implemented. Two complete implementations are presented later (array-based lists and linked lists), both of which use the same list ADT to define their operations. But they are considerably different in approaches and in their space/time tradeoffs.
The code below presents our list ADT. The comments given with each member function describe what it is intended to do. However, an explanation of the basic design should help make this clearer. There are four main operations we want to support:
interface List extends Collection:
add(i, x) // Adds (inserts) x at position i; where 0 <= i <= size, increasing the size.
get(i) // Returns the element at position i; where 0 <= i < size.
set(i, x) // Sets the value at position i to x; where 0 <= i < size.
remove(i) // Removes the element at position i; where 0 <= i < size, decreasing the size.
Apart from these four, we also want to know the number of elements, to be able to loop through the list elements in order. So we make the List interface be a Collection too.
TODO
The List
member functions allow you to build a list with
elements in any desired order, and to access any desired position in the
list.
The list class declaration presented here is just one of many
possible interpretations for lists. Our list interface provides most of
the operations that one naturally expects to perform on lists and serves
to illustrate the issues relevant to implementing the list data
structure. As an example of using the list ADT, here is a function to
return true
if there is an occurrence of a given element in
the list, and false
otherwise. The find
method
needs no knowledge about the specific list implementation, just the list
ADT.
// Return true if k is in list L, false otherwise.
function find(L, k):
for each n in L:
if k == n:
return true // Found k.
return false // k not found.
There are two standard approaches to implementing lists, the array-based list, and the linked list.
6.10.2 Implementing using linked lists
We can use the same structure as for stacks when implementing general linked lists:
datatype LinkedList implements List:
= null // Pointer to list header
head: Node size: Int = 0 // Size of list
TODO
Adding and removing nodes
However, if we want to add or remove nodes, there is a problem with
using a pointer to the current
node.
TODO
So, using a current
pointer, it is possible to add and
remove nodes, using some complicated coding. But this does not work for
the very last node! There are several possible ways to deal with this
problem. One is to always have an empty node (a “dummy node”) at the
very end of the list, but this will increase memory usage.
Another simple solution is to have a pointer to the node before the current node. This is the solution we will adopt.
Adding a node
TODO
Here are some special cases for linked list insertion: Inserting at
the beginning of a list, and appending at the end.
Here’s the code for addition.
datatype LinkedList:
...add(i, x):
// precondition: 0 <= i <= size
if i == 0:
= new Node(x, head)
head else:
= head
prev repeat i-1 times:
= prev.next
prev next = new Node(x, prev.next)
prev.size = size + 1
Here’s an exercise for adding a value to a linked list.
Removing a node
TODO
Here’s the code for deletion:
datatype LinkedList:
...remove(self, i):
// precondition: 0 <= i < size
if i == 0:
= head
removed = removed.next
head else:
= head
prev repeat i-1 times:
= prev.next
prev = prev.next
removed next = removed.next
prev.next = null // For garbage collection
removed.size = size - 1
return removed.elem
And here’s an exercise.
Complexity analysis
Locating a certain position in the list requires steps. The worst case is if we want to go to the last node, so the time complexity for above all operations is .
This is much worse than the array-based list (Section 6.10.3), where these operations are . So are linked lists totally useless? No! But they don’t work well with our current List interface.
To make linked lists useful, we need an enhanced iterator interface, where we can move forwards and backwards in the list, and add/remove nodes through this enhanced iterator. In the standard Java API, this kind of iterator is called a ListIterator, which is part of Java’s standard LinkedList.
6.10.3 Implementing general lists using arrays
First we give a static implementation for array-based lists, named
ArrayList. This inherits from the List ADT (Section 6.10.1),
and must therefore implement all of the member functions of
List
.
Unlike normal arrays, lists can change in size: we can add elements to and remove from them. How can this be implemented? Well, what we don’t want to do is to create a completely new array every time elements are added or removed. So instead we will use an underlying array which is larger than we need.
Because of that will need two internal variables: the underlying array, and a size counter telling how much of the array is actually used. When we create a new array-list we have to decide the capacity, the largest possible size. Then the underlying array is initialised, and the size counter is set to 0 because there are no elements yet.
TODO
datatype ArrayList implements List:
= new Array(capacity) // Internal array containing the list elements
internalArray size = 0 // Size of list
Note: in Python you cannot create an array with a certain
capacity. You can simulate it by creating a list with a number of empty
elements: [None] * capacity
, but this is not a real
fixed-size array as explained just above.
Getting and setting values
Random access to any element in the list is quick and easy.
TODO
As you can see below, there are no loops in the methods
get
and set
, which means that both require
time.
datatype ArrayList:
...get(i):
// precondition: 0 <= i < size
return internalArray[i]
set(i, x):
// precondition: 0 <= i < size
= x internalArray[i]
Adding elements
Because the array-based list implementation is defined to store list
elements in contiguous cells of the array, the add
and
remove
methods must maintain this property.
Appending elements at the tail of an array-based list is super-fast.
TODO
However, adding an element at the head of the list requires shifting all existing elements in the array by one position toward the tail.
TODO
Therefore, if we want to add an element at position , then elements must shift toward the tail to leave room for the new element. In the worst case, adding elements requires moving all elements, which is .
datatype ArrayList:
...add(i, x):
// precondition: 0 <= i <= size < internalArray.size
size = size + 1
for k in size-1, size-2 .. i+1:
= internalArray[k-1]
internalArray[k] = x internalArray[i]
Practice exercise
TODO
Removing elements
Removing an element from the head of the list is similar to adding in the sense that all remaining elements must shift. But now we have to shift toward the head to fill in the gap, instead of toward the tail. If we want to remove the element at position , then elements must shift toward the head, as shown in the following slideshow.
TODO
In the worst case, insertion or removal each requires moving all elements, which is .
datatype ArrayList:
...remove(i):
// precondition: 0 <= i < size
= internalArray[i]
x for k in i+1 .. size-1:
-1] = internalArray[k]
internalArray[ksize = size - 1
size] = null // For garbage collection
internalArray[return x
Practice exercise
TODO
6.10.4 When to use linked lists?
According to the calculations above, linked lists are worse than array-based lists, because all operations are slow (linear time). So why even bother using linked lists?
First there are limited versions of lists that can be implemented efficiently using linked lists, we will look at [stacks] and [queues] later.
Second, our list API is not the best for linked lists. If we instead could have a pointer to the “current” list node, and have methods for moving forward and backward in the list, several of the operations can be constant time. In the Java standard API this is called a ListIterator, which is part of Java’s standard LinkedList.
But these advanced list iterators are not part of this course, and in fact there are not many algorithms where list iterators are particularly useful.
6.10.5 How are lists implemented in the standard libraries?
All serious languages have dynamic list implementations. Here are how they are implemented in Java and Python:
- In Java, java.util.ArrayList implements dynamic arrays, meaning that the internal array grows automatically when necessary. The growth factor is 50%, so that if the array has size 1024, it will grow with another 512 elements. [Source: ArrayList.java] However, the ArrayList will never shrink automatically, but instead it’s up to the programmer to decide when to shrink it.
- Java’s java.util.LinkedList implements doubly-linked list (Section 6.9), so that the iterator can move forward and backward through the list. [Source: LinkedList.java]
- Python’s standard lists are dynamic. In fact, Python doesn’t even support fixed-length lists, so our code in this chapter is a bit of a hack. Python lists both grow and shrink the lists automatically, and the growth factor is 1/8 (12.5%), meaning that if the array has size 1024, it will grow with another 128 elements. It shrinks the array by 1/8 whenever less than half of the array is occupied. [Source: listobject.c]