Bucket Sort


Bucket Sort is a non-comparison-based sorting algorithm that works by distributing the elements of the input array into a number of buckets. Each bucket is then sorted individually, either using another sorting algorithm or recursively applying Bucket Sort. After sorting the individual buckets, the elements are concatenated to form the final sorted array.

This algorithm is efficient when the input is uniformly distributed over a range, as it can provide linear time complexity in specific cases. However, its performance depends heavily on the distribution of the input data.


What is Bucket Sort?

Bucket Sort is an algorithm that works by dividing the input array into several "buckets". Each bucket represents a range of values, and the idea is that if the elements are uniformly distributed, each bucket will contain roughly the same number of elements. After distributing the elements, each bucket is sorted independently, and the sorted elements are combined into a final sorted array.

This technique is effective when the data is uniformly distributed over a range. It is often used for sorting floating-point numbers or values within a fixed range, such as grades or percentages.

Key Properties of Bucket Sort:

  • Non-comparison-based: Bucket Sort avoids comparing elements directly. Instead, it relies on distributing elements into buckets.
  • Stable Sort: If the sorting algorithm used within the buckets is stable, then Bucket Sort will also be stable.
  • Efficient for Uniform Distributions: It is most effective when the elements are uniformly distributed over a range.
  • Works for Floating Point Numbers: Bucket Sort is commonly used to sort floating-point numbers in a defined range.

How Does Bucket Sort Work?

The Bucket Sort algorithm works by dividing the input data into several buckets, sorting each bucket, and then concatenating the sorted buckets to get the final sorted array. Here's how the algorithm works step-by-step:

  1. Create Buckets: Create a number of empty buckets, based on the number of elements or the range of the input data.
  2. Distribute Elements into Buckets: Map each element of the input array into the appropriate bucket based on its value.
  3. Sort Individual Buckets: Sort each bucket individually using another sorting algorithm (like Insertion Sort, Merge Sort, etc.). Since each bucket contains a small number of elements, sorting them will be efficient.
  4. Concatenate the Sorted Buckets: Once all buckets are sorted, concatenate the sorted elements from each bucket to form the final sorted array.

Steps of Bucket Sort:

  1. Determine the Range of Data: Find the minimum and maximum values in the input array to determine the bucket ranges.
  2. Create Buckets: Based on the range and the number of elements, create a set of buckets.
  3. Distribute Elements into Buckets: Place each element into its corresponding bucket according to the value.
  4. Sort the Buckets: Sort the individual buckets using a sorting algorithm (e.g., Insertion Sort).
  5. Concatenate the Results: Concatenate the sorted buckets into a single sorted array.

Bucket Sort: Python Code Implementation

Here is an implementation of Bucket Sort in Python, which sorts floating-point numbers in the range [0, 1):

def insertion_sort(bucket):
    # Simple Insertion Sort to sort elements within a bucket
    for i in range(1, len(bucket)):
        key = bucket[i]
        j = i - 1
        while j >= 0 and key < bucket[j]:
            bucket[j + 1] = bucket[j]
            j -= 1
        bucket[j + 1] = key

def bucket_sort(arr):
    # Create empty buckets
    n = len(arr)
    if n <= 1:
        return arr

    # Create an empty bucket list
    buckets = [[] for _ in range(n)]

    # Find the range of the input data
    min_value, max_value = min(arr), max(arr)

    # Distribute the elements into the buckets
    for num in arr:
        index = int((num - min_value) / (max_value - min_value + 1) * (n - 1))
        buckets[index].append(num)

    # Sort each bucket and concatenate the results
    sorted_arr = []
    for bucket in buckets:
        insertion_sort(bucket)  # Sorting each bucket using Insertion Sort
        sorted_arr.extend(bucket)

    return sorted_arr

# Example usage:
arr = [0.42, 0.32, 0.23, 0.89, 0.12, 0.57, 0.71, 0.98]
print("Unsorted Array:", arr)
sorted_arr = bucket_sort(arr)
print("Sorted Array:", sorted_arr)

Output:

Unsorted Array: [0.42, 0.32, 0.23, 0.89, 0.12, 0.57, 0.71, 0.98]
Sorted Array: [0.12, 0.23, 0.32, 0.42, 0.57, 0.71, 0.89, 0.98]

Explanation of Code:

  • Insertion Sort: We use Insertion Sort to sort the elements within each bucket. Since each bucket usually contains a small number of elements, Insertion Sort is efficient for this.
  • Bucket Creation and Distribution: We divide the input data into buckets based on the value of each element. We calculate the index for each element based on its relative value and place it into the corresponding bucket.
  • Concatenating the Results: After sorting the buckets, we concatenate the sorted elements from all buckets to form the final sorted array.

Time and Space Complexity of Bucket Sort

Time Complexity:

  • Best Case: O(n+n2), where n is the number of elements. If all elements are uniformly distributed across buckets, and if we use an efficient sorting algorithm like Merge Sort within each bucket, the time complexity is linear.
  • Average Case: O(n+n2), where n is the number of elements. The complexity depends on how evenly the elements are distributed among the buckets.
  • Worst Case: O(n2), if all elements fall into a single bucket and we end up sorting the entire array using an inefficient sorting algorithm. This is unlikely with a good distribution function.
    • If a good distribution function is used (elements are uniformly distributed), Bucket Sort performs well in most cases.

Space Complexity:

  • Space Complexity: O(n), because we need space to store the buckets. The extra space required is proportional to the number of elements in the array.

Advantages of Bucket Sort

  1. Efficient for Uniform Distributions: When elements are uniformly distributed across a range, Bucket Sort can achieve linear time complexity.
  2. Scalable for Large Datasets: It can handle large datasets efficiently if the data is evenly distributed across buckets.
  3. Non-comparison-based Sorting: Since Bucket Sort is a non-comparison-based algorithm, it can outperform comparison-based algorithms like QuickSort and MergeSort in certain cases.
  4. Can Be Combined with Other Algorithms: Bucket Sort can be combined with other sorting algorithms, like Merge Sort or Insertion Sort, within individual buckets to further improve efficiency.

Disadvantages of Bucket Sort

  1. Limited to Uniform Distributions: Bucket Sort is most efficient when the input data is uniformly distributed. If the data is highly skewed or not uniformly distributed, the algorithm may degrade to O(n^2).
  2. Requires Extra Space: It requires additional space for the buckets, which could be a drawback if memory is constrained.
  3. Not Suitable for All Data Types: It is mainly used for sorting numbers or strings with a fixed range. It is not suitable for complex data types or arbitrary comparisons.

When to Use Bucket Sort?

  1. Uniformly Distributed Data: When the data is uniformly distributed within a known range, Bucket Sort can be highly efficient.
  2. Floating Point Numbers: Bucket Sort is often used to sort floating-point numbers or values in a specific range, such as grades or percentages.
  3. For Large Datasets: If the data is large and uniformly distributed, Bucket Sort can handle the input more efficiently than comparison-based algorithms.