The last three months have been really hasty but I’m fortunate enough to make past the first three months at Alibaba Cloud. Before the lunar year 2018 concludes I would like to discuss a bit about a powerful algorithm recently used in my work - the Quickselect
algorithm.
Randomization at its best
Given an unsorted, numeric array, Quickselect
finds the k-th largest or smallest element in the array. This algorithm has an average time complexity at O(n)
to find, let’s say, the median of an array. Using an exhaustive approach, it takes at least O(nlogn)
. Quickselect
is, however, a randomized approach and the most efficient median-finder available. The worst case time complexity is O(n^2)
though.
Description
Following are the steps of Quickselect
to find the k-th largest or smallest element:
- (Randomly) choose an element from the array as
pivot
. - Partition the array into 2 parts: left subarray contains elements smaller than or equal to the
pivot
; right subarray contains elements larger than thepivot
. - Form a new array:
[left subarray + pivot + right subarray]
- Compare
k
withlen(left subarray)
:k == len + 1
: pivot is the target.k <= len
: repeat step 1-3 on left subarrayk > len + 1
: repeat step 1-3 on right subarray withk = k - len
.
Sample code
import random
def quickselect(a, k):
"""
:param a: unsorted array
:param k: k-th element to find
"""
(left, pivot, right) = partition(a)
if len(left) == k - 1:
result = pivot
elif len(left) >= k:
result = quickselect(left, k)
else:
result = quickselect(right, k - 1 -len(left))
return result
def partition(a):
"""
:param a: array or subarray to partition
"""
s = len(a)
# base cases
if s==1:
return([], a[0], [])
if s==2:
if a[0] < a[1]:
return([], a[0], a[1])
else:
return([], a[1], a[0])
# choose pivot
p = random.randint(0, len(a) - 1)
pivot = a[p]
left = []
right = []
for i in range(len(a)):
if not i == p:
if a[i] > pivot:
right.append(a[i])
else:
left.append(a[i])
return (left, pivot, right)
Improvements
As mentioned above, the performance of quicksort
is good in average but sensitive to the chosen pivot. the worst case time complexity of Quickselect
is O(n^2)
. It leaves room for caveat in a robust system. However, with some techniques, the worst case time complexity can be improved to O(n)
. Refer to median of medians, Introselect for more information.