Abstract

An essential text on the analysis, development, and implementation of fundamental data structures and algorithms.

Lessons

1.3 Bags, Queues, and Stacks (120)

Understanding basic ADTs is essential since they are the building blocks upon which more complex ADTs and efficient algorithms are built.

Collections and Generics

A collection is a data type whose values are sets of objects.

Generics (type parameters) allow a single data-structure implementation to be reused with many different item types. Java doesn’t support generic array creation, so array-based implementations must create an Object array and cast it.
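
A minimal sketch of the standard workaround, creating an Object array and casting it (the class and identifier names here are illustrative):

    public class FixedCapacityStack<Item> {
        private Item[] a;   // stack entries
        private int n;      // number of items

        @SuppressWarnings("unchecked")
        public FixedCapacityStack(int cap) {
            // new Item[cap] is illegal in Java, so create an
            // Object[] and cast (an unavoidable unchecked cast).
            a = (Item[]) new Object[cap];
        }

        public boolean isEmpty() { return n == 0; }
        public void push(Item item) { a[n++] = item; }
        public Item pop() { return a[--n]; }
    }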

Autoboxing is Java’s automatic conversion between primitive types and their corresponding wrapper (reference) types.
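
For example, autoboxing lets primitive values flow into and out of a generic collection automatically (assuming a generic Stack like the sketch above):

    Stack<Integer> stack = new Stack<Integer>();
    stack.push(17);          // autoboxing: int -> Integer
    int i = stack.pop();     // unboxing: Integer -> int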

Linked Lists

A linked list is a recursive data structure that is either empty (null) or a reference to a node having a generic item and a reference to a linked list.

Linked lists enable implementations of bags, queues, and stacks in which each operation takes constant time and space use is always proportional to the collection size, guarantees that fixed-size or resizing arrays cannot both provide.

Loitering is holding a reference to an object that is no longer needed. Java’s GC reclaims memory only once all references to an object are removed, so array-based implementations should overwrite unused entries (i.e., set them to null) after removal; in linked-list implementations, removed nodes become unreachable automatically.

Linked lists are useful for representing data in collection ADTs (abstract data types). Linked lists represent sequences of items. Arrays can do this too, but linked lists tend to be faster at inserting into or removing from a sequence, since no shifting of elements is required.

The standard solution to enable arbitrary insertion or deletion is to maintain a doubly linked list, a link in each direction.
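
A sketch of the node type such a list is built from (assuming it is nested in a generic enclosing class; the names are illustrative):

    private class DoubleNode {
        Item item;        // the generic item
        DoubleNode prev;  // link to the previous node
        DoubleNode next;  // link to the next node
    }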

Bags, Stacks, and Queues

Bag, Queue, and Stack differ in the policy that determines which object is removed or examined next. That choice can significantly affect the efficiency of algorithms built on them.

Bag: Removing items is not supported. Clients can test whether it is empty and get the number of items it contains. The order of iteration is unspecified.

Queue: With a FIFO (first-in, first-out) queue, items are processed in the order they were added to the queue.

Stack: A LIFO (last-in, first-out) or pushdown stack processes items in the reverse of the order in which they were added. A fixed-capacity stack stores its items in an array in insertion order. Push and pop should take time independent of the stack size, as in the sketch below.
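
A minimal linked-list stack sketch showing constant-time push and pop (both operate at the front of the list):

    public class LinkedStack<Item> {
        private Node first;   // top of stack (most recently added)
        private int n;        // number of items

        private class Node {
            Item item;
            Node next;
        }

        public boolean isEmpty() { return first == null; }
        public int size()        { return n; }

        public void push(Item item) {    // add item to the front
            Node oldFirst = first;
            first = new Node();
            first.item = item;
            first.next = oldFirst;
            n++;
        }

        public Item pop() {              // remove item from the front
            Item item = first.item;
            first = first.next;          // old first node becomes garbage
            n--;
            return item;
        }
    }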

2.1 Elementary Sorts (244)

Rules

Java’s Comparable interface makes it easy to compare and sort items that have a natural order. The compareTo() method (returning a negative integer, zero, or a positive integer) must implement a total order: antisymmetric, transitive, and total.
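
A sketch of a Comparable implementation for a simple date type (the field names and layout are illustrative):

    public class Date implements Comparable<Date> {
        private final int year, month, day;

        public Date(int y, int m, int d) { year = y; month = m; day = d; }

        // Returns negative, zero, or positive, defining a total order.
        public int compareTo(Date that) {
            if (this.year  != that.year)  return this.year  - that.year;
            if (this.month != that.month) return this.month - that.month;
            return this.day - that.day;
        }
    }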

Consider four things when evaluating sorting algorithms: the number of compares, the number of exchanges, the number of array accesses, and the extra memory used.

Selection Sort (248)

~N^2/2 compares and N (linear) exchanges. Works by repeatedly selecting the smallest remaining item and exchanging it into place. Uses two nested loops: the inner loop compares each remaining entry against the smallest item found so far.
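
A minimal selection sort sketch (swapping inline rather than using a helper):

    public static void selectionSort(Comparable[] a) {
        int n = a.length;
        for (int i = 0; i < n; i++) {
            int min = i;                 // index of smallest remaining entry
            for (int j = i + 1; j < n; j++)
                if (a[j].compareTo(a[min]) < 0) min = j;
            Comparable t = a[i]; a[i] = a[min]; a[min] = t;  // one exchange per pass
        }
    }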

Insertion Sort (250)

~N^2/4 compares and ~N^2/4 exchanges on average. The worst case is ~N^2/2 compares and ~N^2/2 exchanges; the best case is N-1 compares and 0 exchanges. Insertion sort works well for partially sorted arrays, where the number of inversions is less than a constant multiple of the array size; its running time is linear on such inputs. It can be sped up by shortening the inner loop to move larger entries one position to the right rather than doing full exchanges.
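
A minimal insertion sort sketch using full exchanges (the half-exchange refinement would shift larger entries right instead of swapping):

    public static void insertionSort(Comparable[] a) {
        int n = a.length;
        for (int i = 1; i < n; i++) {
            // Insert a[i] among a[0..i-1], exchanging with each larger entry.
            for (int j = i; j > 0 && a[j].compareTo(a[j - 1]) < 0; j--) {
                Comparable t = a[j]; a[j] = a[j - 1]; a[j - 1] = t;
            }
        }
    }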

Comparing Algorithms (254)

Compare algorithms by implementing and debugging them, analyzing their basic properties, formulating a hypothesis about performance, and running experiments to validate it.

Insertion and selection sort are both roughly quadratic, though insertion sort is about twice as fast. The more trials, the more accurate the running-time estimate. Run experiments to validate the hypothesis on the data at hand. Applications with significant numbers of equal keys require a more careful analysis.

Shellsort (258)

Shellsort is an extension of insertion sort that gains speed by exchanging array entries that are far apart, producing partially ordered (h-sorted) arrays that a final insertion-sort pass can finish efficiently. The increment sequence chosen has a significant impact on performance, but no provably best sequence has been found. Shellsort performs well even on arrays that are in arbitrary order.

Its worst-case performance is subquadratic: with the 3x+1 increment sequence it uses O(N^(3/2)) compares, and the exact growth depends on the increment sequence.
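
A sketch using the common 3x+1 increment sequence (one of many possible sequences):

    public static void shellSort(Comparable[] a) {
        int n = a.length;
        int h = 1;
        while (h < n / 3) h = 3 * h + 1;    // 1, 4, 13, 40, 121, ...
        while (h >= 1) {
            // h-sort the array: insertion sort with stride h.
            for (int i = h; i < n; i++)
                for (int j = i; j >= h && a[j].compareTo(a[j - h]) < 0; j -= h) {
                    Comparable t = a[j]; a[j] = a[j - h]; a[j - h] = t;
                }
            h /= 3;                         // move to the next smaller increment
        }
    }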

2.2 Mergesort (270)

Abstract In Place Merge

The abstract in-place merge presents an in-place API but requires an auxiliary array: given an array whose two halves are each already sorted, it copies them to the auxiliary array and merges back into the original array, producing one sorted whole.
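
A sketch of the merge, assuming a[lo..mid] and a[mid+1..hi] are each sorted and aux has the same length as a:

    public static void merge(Comparable[] a, Comparable[] aux,
                             int lo, int mid, int hi) {
        for (int k = lo; k <= hi; k++) aux[k] = a[k];              // copy to aux
        int i = lo, j = mid + 1;
        for (int k = lo; k <= hi; k++) {
            if      (i > mid)                      a[k] = aux[j++]; // left exhausted
            else if (j > hi)                       a[k] = aux[i++]; // right exhausted
            else if (aux[j].compareTo(aux[i]) < 0) a[k] = aux[j++]; // take smaller
            else                                   a[k] = aux[i++];
        }
    }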

Top-down Mergesort

Mergesort is a divide-and-conquer algorithm built on the abstract in-place merge: if it sorts the two halves of an array, then merging them sorts the whole array (proof by induction). It uses between 1/2 N lg N and N lg N compares and at most 6N lg N array accesses to sort an array of length N. A few improvements can be made, though when addressing a new problem you should first use the most straightforward implementation and refine it only if a bottleneck is discovered.

To improve it, use insertion sort for small subarrays (fewer than about 15 items), since insertion sort is faster on small inputs. You can also skip the merge when the halves are already in order (the largest item in the first half is no greater than the smallest item in the second), as in the sketch below.
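
A top-down sketch reusing the merge() above and including the in-order test:

    public static void sort(Comparable[] a) {
        Comparable[] aux = new Comparable[a.length];
        sort(a, aux, 0, a.length - 1);
    }

    private static void sort(Comparable[] a, Comparable[] aux, int lo, int hi) {
        if (hi <= lo) return;
        int mid = lo + (hi - lo) / 2;
        sort(a, aux, lo, mid);          // sort left half
        sort(a, aux, mid + 1, hi);      // sort right half
        if (a[mid].compareTo(a[mid + 1]) <= 0) return;  // halves already in order
        merge(a, aux, lo, mid, hi);
    }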

Bottom-up Mergesort

Bottom-up mergesort also uses between 1/2 N lg N and N lg N compares and at most 6N lg N array accesses to sort an array of length N. When the array length is a power of two, top-down and bottom-up mergesort perform exactly the same number of compares and array accesses. Bottom-up is the method of choice for sorting linked lists, since it rearranges the links and therefore sorts the list in place. Whenever you consider using one of these algorithms, it is worth considering the other.
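
A bottom-up sketch, again reusing merge(); it makes passes over the array merging subarrays of size 1, 2, 4, and so on:

    public static void bottomUpSort(Comparable[] a) {
        int n = a.length;
        Comparable[] aux = new Comparable[n];
        for (int len = 1; len < n; len *= 2)             // subarray size
            for (int lo = 0; lo < n - len; lo += 2 * len)
                merge(a, aux, lo, lo + len - 1,
                      Math.min(lo + 2 * len - 1, n - 1));
    }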

The Complexity of Sorting (279)

Computational complexity: no compare-based sorting algorithm can guarantee fewer than lg(N!) ~ N lg N compares. Proof sketch: a compare tree for sorting N distinct keys has at least N! leaves (one per permutation of the input), and a binary tree of height h has at most 2^h leaves, so N! <= number of leaves <= 2^h, giving h >= lg(N!).

~N lg N compares in the worst case is also the upper bound, since that is what mergesort provides; mergesort is therefore an asymptotically optimal compare-based sorting algorithm. Good upper bounds let developers make performance guarantees, whereas good lower bounds protect us from striving after performance that isn’t attainable.

2.3 Quicksort (288)

The Basic Algorithm

Quicksort is a divide-and-conquer algorithm that sorts in place and runs in time proportional to N lg N on typical inputs. It partitions the target array into two parts such that when each part is sorted independently, the whole array is sorted; induction on the recursive calls shows the method constitutes a proper sort. A partitioning item sets the dividing line between the two parts: after partitioning, it is in its final position, with no larger entry to its left and no smaller entry to its right. The array is shuffled first (a randomized algorithm) to make consistently poor performance unlikely.
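
A basic sketch of the algorithm (shuffling via the standard library; the class and helper names are illustrative):

    import java.util.Arrays;
    import java.util.Collections;

    public class Quick {
        public static void sort(Comparable[] a) {
            Collections.shuffle(Arrays.asList(a));  // randomize to avoid bad cases
            sort(a, 0, a.length - 1);
        }

        private static void sort(Comparable[] a, int lo, int hi) {
            if (hi <= lo) return;
            int j = partition(a, lo, hi);  // a[j] is now in its final position
            sort(a, lo, j - 1);
            sort(a, j + 1, hi);
        }

        private static int partition(Comparable[] a, int lo, int hi) {
            Comparable v = a[lo];          // partitioning item
            int i = lo, j = hi + 1;
            while (true) {
                while (a[++i].compareTo(v) < 0) if (i == hi) break; // scan right
                while (v.compareTo(a[--j]) < 0) if (j == lo) break; // scan left
                if (i >= j) break;
                Comparable t = a[i]; a[i] = a[j]; a[j] = t;
            }
            Comparable t = a[lo]; a[lo] = a[j]; a[j] = t;  // put v into position
            return j;
        }
    }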

Performance Characteristics

Quicksort’s performance depends on how evenly partitioning divides the array, which in turn depends on the partitioning key. The ideal case is when every partition splits the array exactly in half; on average, the split falls in the middle. Shuffling the array first helps avoid consistently unbalanced partitions. Average case: ~2N ln N compares. Worst case: ~N^2/2 compares.

The running time is within a constant factor of 1.39 N lg N. Quicksort is typically faster than mergesort because it does much less data movement.

Algorithmic Improvements (295)

Quicksort was developed in 1960 by C. A. R. Hoare. Attempts to improve it often have unexpected side effects, but certain improvements are well established. One is using insertion sort for small subarrays; another, median-of-three partitioning, chooses the partitioning item as the median of a small sample, giving better and more consistent partitions. Both appear in the sketch below.
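
A sketch combining both improvements; CUTOFF and the helper names here are illustrative, and partition() is as in the basic sketch above:

    private static final int CUTOFF = 15;

    private static void sort(Comparable[] a, int lo, int hi) {
        if (hi <= lo + CUTOFF - 1) { insertionSort(a, lo, hi); return; }
        int m = medianOf3(a, lo, lo + (hi - lo) / 2, hi);
        Comparable t = a[lo]; a[lo] = a[m]; a[m] = t;  // median becomes the pivot
        int j = partition(a, lo, hi);
        sort(a, lo, j - 1);
        sort(a, j + 1, hi);
    }

    // Insertion sort on the subarray a[lo..hi].
    private static void insertionSort(Comparable[] a, int lo, int hi) {
        for (int i = lo + 1; i <= hi; i++)
            for (int j = i; j > lo && a[j].compareTo(a[j - 1]) < 0; j--) {
                Comparable t = a[j]; a[j] = a[j - 1]; a[j - 1] = t;
            }
    }

    // Index of the median of a[i], a[j], a[k].
    private static int medianOf3(Comparable[] a, int i, int j, int k) {
        if (a[i].compareTo(a[j]) < 0) {
            if (a[j].compareTo(a[k]) < 0) return j;
            return (a[i].compareTo(a[k]) < 0) ? k : i;
        } else {
            if (a[i].compareTo(a[k]) < 0) return i;
            return (a[j].compareTo(a[k]) < 0) ? k : j;
        }
    }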

Entropy-optimal sorting (3-way quicksort) offers significant improvements for arrays with large numbers of duplicate keys, which makes it an ideal choice for a library/package sort function, since such keys are common in real-world applications. Dijkstra’s solution to the Dutch national flag problem gives the classic 3-way partitioning scheme. With the 3-way partitioning refinements developed by J. Bentley and D. McIlroy, quicksort is asymptotically faster than mergesort when duplicate keys are present. The worst case occurs when the keys are all distinct; for arrays with large numbers of duplicate keys, 3-way quicksort reduces the running time from linearithmic to linear.
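
A sketch of Dijkstra’s 3-way partitioning: after the loop, a[lo..lt-1] < v, a[lt..gt] == v, and a[gt+1..hi] > v, so only the outer parts need recursive sorting:

    private static void sort3way(Comparable[] a, int lo, int hi) {
        if (hi <= lo) return;
        int lt = lo, i = lo + 1, gt = hi;
        Comparable v = a[lo];                 // partitioning item
        while (i <= gt) {
            int cmp = a[i].compareTo(v);
            if (cmp < 0) {                    // a[i] < v: swap into the left part
                Comparable t = a[lt]; a[lt++] = a[i]; a[i++] = t;
            } else if (cmp > 0) {             // a[i] > v: swap into the right part
                Comparable t = a[i]; a[i] = a[gt]; a[gt--] = t;
            } else i++;                       // a[i] == v: leave it in place
        }
        sort3way(a, lo, lt - 1);
        sort3way(a, gt + 1, hi);
    }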

The Shannon entropy H of the key distribution gives both lower and upper bounds on the number of compares: no compare-based sorting algorithm can use fewer than NH - N compares, and 3-way quicksort uses ~(2 ln 2) NH compares. When the keys are all distinct, H = lg N, recovering the N lg N bounds.