Radix sort
|
Radix sort is a fast stable sorting algorithm which can be used to sort items that are identified by unique keys. Every key is a string or number, and radix sort sorts these keys in some particular lexicographic-like order.
The algorithm operates in O(n · k) time, where n is the number of items, and k is the average key length. This algorithm was originally used to sort punched cards in several passes. A computer algorithm was invented for radix sort in 1954 at MIT by Harold H. Seward. In many newer applications requiring super speeds and massive memory, the computer radix sort is an improvement on (slower) comparison sorts.
Radix sort has resurfaced as an alternative to other high performance sorting algorithms (like heapsort and mergesort) which require O(n · lg n) comparisons, where n is the number of items to be sorted. These other algorithms are able to sort with respect to more complicated orderings than a lexicographic one, but this is of little importance in many practical applications.
A radix sort algorithm works as follows:
- take the least significant digit (or group of bits, both being examples of radices) of each key.
- sort the list of elements based on that digit, but keep the order of elements with the same digit (this is the definition of a stable sort).
- repeat the sort with each more significant digit.
The sort in step 2 is usually done using bucket sort, which works since there are only finitely many digits.
If the size of the possible key space is proportional to the number of items, then each key will be lg n symbols long, and radix sort uses O(n · lg n) time in this case. In practice, O(n) time is only observed when the number of items is much greater than the size of the key space.
If the keys used are short integers, it is practical to complete the sorting with only two passes, and comparisons can be done with only a few bit operations that operate in constant time. In this case, and even when sorting 32-bit floating point numbers, radix sort can in practice be significantly faster than any other sorting algorithm.
The greatest disadvantages of radix sort are that it usually cannot be made to run in place, so O(n) additional memory space is needed, and that it requires one pass over the input list for each symbol in the key, so it is very slow for potentially-long keys. Multiple histogramming techniques, where all the required histograms are constructed in a single pass, can reduce the latter problem.
Radix sort typically uses the following sorting order: short keys come before longer keys, and keys of the same length are sorted lexicographically. This coincides with the normal order of numbers, if the numbers are represented as digit strings.
Contents |
An example
Sort the list:
170, 45, 75, 90, 2, 24, 802, 66
- sorting by least significant digit (1s place) gives:
170, 90, 2, 802, 24, 45, 75, 66 - sorting by next digit (10s place) gives:
2, 802, 24, 45, 66, 170, 75, 90 - sorting by most significant digit (100s place) gives:
2, 24, 45, 66, 75, 90, 170, 802
It is important to realise that each of the above steps requires just a single pass over the data, since each item can be placed in it's correct bucket without having to be compared with other items.
Iterative version using queues
A simple version of the Radix sort can be acheived easily using queues as buckets. The following process is repeated for a number of times equal to the greatest number of digits in any integer in the integers to be sorted, with k being the number of passes through the loop already completed:
- the integers are enqueued into an array of ten separate queues based on their kth digit
So, using the numbers from the previous example, the queues for the 1st pass would be:
0: 170, 90; 1: none; 2: 2, 802; 3: none; 4: 24; 5: 45, 75; 6: 66; 7 - 9: empty - the queues are dequeued back into an array of integers, in increasing order
Using the same numbers, the array will look like this after the 1st pass:
170, 90, 2, 802, 24, 45, 75, 66
For the second pass:
Queues:
0: 2, 802; 1: none; 2: 24; 3: none; 4: 45; 5: none; 6: 66; 7: 170, 75; 8: none; 9: 90
Array:
2, 802, 24, 45, 66, 170, 75, 90 (note that at this point only the three-digit numbers are out of order)
For the third:
Queues:
0: 2, 24, 45, 66, 75, 90; 1: 170; 2 - 7: none; 8: 802; 9: none
Array:
2, 24, 45, 66, 75, 90, 170, 802 (sorted)
While this may not be the most efficient Radix sort algorithm, it is relatively simple, and still quite efficient. Note that in some languages, such as Java, integers must be placed in wrapper objects to be placed in a queue, and then unrapped when dequeued
Recursion
A recursively subdividing radix sort algorithm works as follows:
- take the most significant digit of each key.
- sort the list of elements based on that digit, grouping elements with the same digit into one bucket.
- recursively sort each bucket, starting with the next most significant digit.
- concatenate the buckets together in order
This recursive method can be interpreted as a generalization of quicksort from strings with two possible symbols to strings with any number of possible symbols.
External links
- Demonstration and comparison (http://www.csse.monash.edu.au/~lloyd/tildeAlgDS/Sort/Radix/) of Radix sort with Bubble sort, Merge sort and Quicksort implemented in JavaScript
- Radix sort animated (http://ciips.ee.uwa.edu.au/~morris/Year2/PLDS210/radixsort.html)
- Article (http://www.codercorner.com/RadixSortRevisited.htm) about Radix sorting IEEE floating-point numbers with implementation
- Faster Floating Point Sorting and Multiple Histogramming (http://www.stereopsis.com/radix.html) with implementationde:Radixsort
fr:Tri par base lt:Skaitmeninis rūšiavimas nl:Radix sort ja:基数ソート pt:Radix sort zh:基数排序