Sorting Algorithms in Python
This tutorial has a related video course created by the Real Python team. Watch it together with the written tutorial to deepen your understanding: Introduction to Sorting Algorithms in Python.
Sorting is a basic building block that many other algorithms are built upon. It’s related to several exciting ideas that you’ll see throughout your programming career. Understanding how sorting algorithms in Python work behind the scenes is a fundamental step toward implementing correct and efficient algorithms that solve real-world problems. In this tutorial, you’ll learn how several classic sorting algorithms work, how to implement them in Python, and how to compare their runtime both empirically and with Big O notation.
By the end of this tutorial, you’ll understand sorting algorithms from both a theoretical and a practical standpoint. More importantly, you’ll have a deeper understanding of different algorithm design techniques that you can apply to other areas of your work. Let’s get started!

The Importance of Sorting Algorithms in Python

Sorting is one of the most thoroughly studied algorithms in computer science. There are dozens of different sorting implementations and applications that you can use to make your code more efficient and effective. You can use sorting to solve a wide range of problems, such as speeding up searches, finding duplicates, and analyzing the distribution of items in a dataset.
From commercial applications to academic research and everywhere in between, there are countless ways you can use sorting to save yourself time and effort.

Python’s Built-In Sorting Algorithm

The Python language, like many other high-level programming languages, offers the ability to sort data out of the box using the built-in sorted() function.
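The interactive session that originally accompanied this passage isn’t included in this extract; the following snippet is a stand-in showing the built-in behavior:

```python
array = [8, 2, 6, 4, 5]

# sorted() returns a new sorted list and leaves the input untouched
print(sorted(array))  # [2, 4, 5, 6, 8]

# list.sort() sorts the list in place and returns None
array.sort()
print(array)  # [2, 4, 5, 6, 8]
```

Both of these rely on Timsort under the hood, the hybrid algorithm covered at the end of this tutorial.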
You can use sorted() to sort any iterable in ascending order without writing any sorting logic yourself.

The Significance of Time Complexity

This tutorial covers two different ways to measure the runtime of sorting algorithms: timing how many seconds each implementation takes to run, and analyzing how the number of steps grows with the size of the input using Big O notation.
Timing Your Code

When comparing two sorting algorithms in Python, it’s always informative to look at how long each one takes to run. The specific time each algorithm takes will be partly determined by your hardware, but you can still use the proportional time between executions to help you decide which implementation is more time efficient. In this section, you’ll focus on a practical way to measure the actual time it takes to run your sorting algorithms using the timeit module. Here’s a function you can use to time your algorithms:
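The original listing isn’t preserved in this extract, so here’s a sketch of such a timing helper built on timeit.repeat. The name run_sorting_algorithm and its exact parameters are assumptions; adapt them to match your own script:

```python
from random import randint
from timeit import repeat

def run_sorting_algorithm(algorithm, array):
    # Import the algorithm from __main__ unless it's the built-in sorted()
    setup_code = f"from __main__ import {algorithm}" if algorithm != "sorted" else ""

    # Build the statement to time: a single call on the supplied array
    stmt = f"{algorithm}({array})"

    # Execute the statement several times and collect each measurement
    times = repeat(setup=setup_code, stmt=stmt, repeat=3, number=10)

    # Report the minimum, which best approximates the algorithm's true
    # speed by discounting interference from other running processes
    print(f"Algorithm: {algorithm}. Minimum execution time: {min(times)}")
    return min(times)

# Time the built-in sorted() on an array of random integers
array = [randint(0, 1000) for _ in range(1000)]
run_sorting_algorithm("sorted", array)
```

Passing the algorithm by name lets the same helper time any function defined in your script.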
In this example, the timing function receives the name of the algorithm and the input array that needs to be sorted. It executes the call several times and reports the minimum time observed, since the minimum is less affected by other processes competing for your machine’s resources. To try it out, you can generate an array of random integers and pass it in along with the name of the algorithm you want to measure. If you save the code in a script, then you can run it from the terminal and see the measured execution time in the output.
Remember that the time in seconds of every experiment depends in part on the hardware you use, so you’ll likely see slightly different results when running the code.

Measuring Efficiency With Big O Notation

The specific time an algorithm takes to run isn’t enough information to get the full picture of its time complexity. To solve this problem, you can use Big O (pronounced “big oh”) notation. Big O is often used to compare different implementations and decide which one is the most efficient, skipping unnecessary details and focusing on what’s most important in the runtime of an algorithm.

The time in seconds required to run different algorithms can be influenced by several unrelated factors, including processor speed or available memory. Big O, on the other hand, provides a platform to express runtime complexity in hardware-agnostic terms. With Big O, you express complexity in terms of how quickly your algorithm’s runtime grows relative to the size of the input, especially as the input grows arbitrarily large.

Assuming that n is the size of the input to an algorithm, the Big O notation represents the relationship between n and the number of steps the algorithm takes to find a solution. Big O uses a capital letter “O” followed by this relationship inside parentheses. For example, O(n) represents algorithms that execute a number of steps proportional to the size of their input.

Although this tutorial isn’t going to dive very deep into the details of Big O notation, here are five examples of the runtime complexity of different algorithms:

- O(1): constant, with a runtime that’s independent of the size of the input
- O(n): linear, with a runtime that grows in direct proportion to the size of the input
- O(n²): quadratic, with a runtime proportional to the square of the size of the input
- O(2ⁿ): exponential, with a runtime that doubles with each addition to the input
- O(n log₂n): linearithmic, with a runtime that combines a linear and a logarithmic factor
This tutorial covers the Big O runtime complexity of each of the sorting algorithms discussed. It also includes a brief explanation of how to determine the runtime in each particular case. This will give you a better understanding of how to start using Big O to classify other algorithms.

The Bubble Sort Algorithm in Python

Bubble sort is one of the most straightforward sorting algorithms. Its name comes from the way the algorithm works: with every new pass, the largest element in the list “bubbles up” toward its correct position. Bubble sort consists of making multiple passes through a list, comparing elements one by one, and swapping adjacent items that are out of order.

Implementing Bubble Sort in Python

Here’s an implementation of a bubble sort algorithm in Python:
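The article’s original listing isn’t preserved here; the following is a reconstruction of a bubble sort in the same spirit, including an already_sorted flag so the function can exit early once a full pass makes no swaps:

```python
def bubble_sort(array):
    n = len(array)

    for i in range(n):
        # Assume the array is sorted until a swap proves otherwise
        already_sorted = True

        # Look at each adjacent pair in the unsorted portion.
        # The last i elements are already in their final positions.
        for j in range(n - i - 1):
            if array[j] > array[j + 1]:
                # Swap neighbors that are out of order
                array[j], array[j + 1] = array[j + 1], array[j]
                already_sorted = False

        # If no swaps happened during this pass, the array is sorted
        if already_sorted:
            break

    return array
```

For example, bubble_sort([8, 2, 6, 4, 5]) returns [2, 4, 5, 6, 8].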
Since this implementation sorts the array in ascending order, each step “bubbles” the largest element to the end of the array. This means that each iteration takes fewer steps than the previous iteration because a continuously larger portion of the array is sorted.

The two nested loops determine the way the algorithm runs through the list. Notice how the inner loop shrinks on every pass, since the largest elements have already settled at the end of the array. As the loops progress, the algorithm compares each element with its adjacent value and swaps them if they’re in the incorrect order. This ensures a sorted list at the end of the function.

To properly analyze how the algorithm works, trace it on a short unsorted list such as [8, 2, 6, 4, 5]. On the first pass, each comparison either leaves a pair in place or swaps it, and the largest value ends up in the last position. Each subsequent pass repeats the process on the shrinking unsorted portion until no swaps remain.
Measuring Bubble Sort’s Big O Runtime Complexity

Your implementation of bubble sort consists of two nested for loops. In the worst case, the algorithm performs n − 1 comparisons on the first pass, n − 2 on the second, and so on, adding up to (n² − n) / 2 comparisons in total.
You learned earlier that Big O focuses on how the runtime grows in comparison to the size of the input. That means that, in order to turn the above equation into the Big O complexity of the algorithm, you need to remove the constants because they don’t change with the input size. Doing so simplifies the notation to n² − n. Since n² grows much faster than n, this last term can be dropped as well, leaving bubble sort with an average- and worst-case complexity of O(n²).

In cases where the algorithm receives an array that’s already sorted—and assuming the implementation includes the already_sorted flag to break out of the loop after a pass with no swaps—the algorithm finishes after a single traversal of the list. O(n), then, is the best-case runtime complexity of bubble sort. But keep in mind that best cases are an exception, and you should focus on the average case when comparing different algorithms.

Timing Your Bubble Sort Implementation

Using your timing function from before, you can measure how long this bubble sort implementation takes by passing it an array of random numbers along with the algorithm’s name. You can now run the script to get the execution time of bubble_sort.
Bubble sort will take much longer than sorted() did on the same array. The exact number of seconds depends on your machine, but the gap is dramatic.

Analyzing the Strengths and Weaknesses of Bubble Sort

The main advantage of the bubble sort algorithm is its simplicity. It is straightforward to both implement and understand. This is probably the main reason why most computer science courses introduce the topic of sorting using bubble sort.

As you saw before, the disadvantage of bubble sort is that it is slow, with a runtime complexity of O(n²). Unfortunately, this rules it out as a practical candidate for sorting large arrays.

The Insertion Sort Algorithm in Python

Like bubble sort, the insertion sort algorithm is straightforward to implement and understand. But unlike bubble sort, it builds the sorted list one element at a time by comparing each item with the rest of the list and inserting it into its correct position. This “insertion” procedure gives the algorithm its name.

An excellent analogy to explain insertion sort is the way you would sort a deck of cards. Imagine that you’re holding a group of cards in your hands, and you want to arrange them in order. You’d start by comparing a single card step by step with the rest of the cards until you find its correct position. At that point, you’d insert the card in the correct location and start over with a new card, repeating until all the cards in your hand were sorted.

Implementing Insertion Sort in Python

The insertion sort algorithm works exactly like the example with the deck of cards. Here’s the implementation in Python:
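Since the original listing was lost in extraction, here’s a reconstruction of an insertion sort along the lines the text describes:

```python
def insertion_sort(array):
    # Start from the second element; a one-element prefix is already sorted
    for i in range(1, len(array)):
        key_item = array[i]

        # Walk leftward through the sorted portion of the array
        j = i - 1

        # Shift every element larger than key_item one position
        # to the right to open up a gap
        while j >= 0 and array[j] > key_item:
            array[j + 1] = array[j]
            j -= 1

        # Drop key_item into the gap, its correct position
        array[j + 1] = key_item

    return array
```

For example, insertion_sort([8, 2, 6, 4, 5]) returns [2, 4, 5, 6, 8].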
Unlike bubble sort, this implementation of insertion sort constructs the sorted list by pushing smaller items to the left. Let’s break down how it works: the outer loop picks each element beyond the first, the inner loop shifts every larger element in the sorted portion one position to the right, and the picked element is then inserted into the gap that opens up.
To visualize the different iterations of the algorithm, imagine sorting a small array such as [8, 2, 6, 4, 5]: at each step, the portion of the array to the left of the current element is already sorted, and the current element walks leftward until it finds its place.
Measuring Insertion Sort’s Big O Runtime Complexity

Similar to your bubble sort implementation, the insertion sort algorithm has a couple of nested loops that go over the list. The inner loop is pretty efficient because it only goes through the list until it finds the correct position of an element. That said, the algorithm still has an O(n²) runtime complexity on the average case.

The worst case happens when the supplied array is sorted in reverse order. In this case, the inner loop has to execute every comparison to put every element in its correct position. This still gives you an O(n²) runtime complexity.

The best case happens when the supplied array is already sorted. Here, the inner loop is never executed, resulting in an O(n) runtime complexity, just like the best case of bubble sort.

Although bubble sort and insertion sort have the same Big O runtime complexity, in practice, insertion sort is considerably more efficient than bubble sort. If you look at the implementation of both algorithms, then you can see how insertion sort has to make fewer comparisons to sort the list.

Timing Your Insertion Sort Implementation

To prove the assertion that insertion sort is more efficient than bubble sort, you can time the insertion sort algorithm and compare it with the results of bubble sort. To do this, you just need to replace the call to bubble_sort() with a call to insertion_sort() in your timing script.
You can then execute the script as before and compare the new execution time with the previous results.
Notice how the insertion sort implementation takes noticeably less time than bubble sort to sort the same array.

Analyzing the Strengths and Weaknesses of Insertion Sort

Just like bubble sort, the insertion sort algorithm is very uncomplicated to implement. Even though insertion sort is an O(n²) algorithm, it’s also much more efficient in practice than other quadratic implementations such as bubble sort.

There are more powerful algorithms, including merge sort and Quicksort, but these implementations are recursive and usually fail to beat insertion sort when working on small lists. Some Quicksort implementations even use insertion sort internally if the list is small enough to provide a faster overall implementation. Timsort also uses insertion sort internally to sort small portions of the input array.

That said, insertion sort is not practical for large arrays, opening the door to algorithms that can scale in more efficient ways.

The Merge Sort Algorithm in Python

Merge sort is a very efficient sorting algorithm. It’s based on the divide-and-conquer approach, a powerful algorithmic technique used to solve complex problems.

To properly understand divide and conquer, you should first understand the concept of recursion. Recursion involves breaking a problem down into smaller subproblems until they’re small enough to manage. In programming, recursion is usually expressed by a function calling itself.

Divide-and-conquer algorithms typically follow the same structure: break the original problem into smaller subproblems, solve each subproblem recursively, and combine the solutions of all the subproblems into a single overall solution.
In the case of merge sort, the divide-and-conquer approach divides the set of input values into two equal-sized parts, sorts each half recursively, and finally merges these two sorted parts into a single sorted list.

Implementing Merge Sort in Python

The implementation of the merge sort algorithm needs two different pieces: a function that recursively splits the input in half, and a function that merges both halves, producing a sorted array.
Here’s the code to merge two different arrays:
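The original listing isn’t preserved in this extract; below is a sketch of such a merge() helper for two already-sorted lists:

```python
def merge(left, right):
    # If one list is empty, the other is already the merged result
    if len(left) == 0:
        return right
    if len(right) == 0:
        return left

    result = []
    index_left = index_right = 0

    # Walk through both lists until every element has been consumed
    while len(result) < len(left) + len(right):
        # Move the smaller of the two front elements into the result
        if left[index_left] <= right[index_right]:
            result.append(left[index_left])
            index_left += 1
        else:
            result.append(right[index_right])
            index_right += 1

        # Once one list is exhausted, append the rest of the other
        if index_right == len(right):
            result += left[index_left:]
            break
        if index_left == len(left):
            result += right[index_right:]
            break

    return result
```

For example, merge([1, 3, 5], [2, 4]) returns [1, 2, 3, 4, 5].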
With the above function in place, the only missing piece is a function that recursively splits the input array in half and uses merge() to combine the sorted halves back together:
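Here’s a sketch of that recursive function. A compact merge() is repeated inside the snippet so it runs on its own; in your script you would use the merge() helper defined above instead:

```python
def merge(left, right):
    # Two-pointer merge of two sorted lists (a compact version of the
    # helper shown earlier, repeated so this snippet is self-contained)
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    # At most one of these tail slices is non-empty
    return result + left[i:] + right[j:]

def merge_sort(array):
    # Base case: arrays of fewer than two elements are already sorted
    if len(array) < 2:
        return array

    midpoint = len(array) // 2

    # Recursively sort each half, then merge the two sorted halves
    return merge(
        left=merge_sort(array[:midpoint]),
        right=merge_sort(array[midpoint:]),
    )
```

For example, merge_sort([8, 2, 6, 4, 5]) returns [2, 4, 5, 6, 8].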
Here’s a quick summary of the code: the function first checks whether the array contains fewer than two elements, in which case it’s already sorted. Otherwise, it computes the midpoint of the array, recursively sorts each half, and merges the two sorted halves into the final result.
Notice how this function calls itself recursively, halving the array each time. Each level of recursion deals with an ever-shrinking array until fewer than two elements remain, meaning there’s nothing left to sort. At this point, the recursion starts to unwind, and merge() combines the sorted subarrays back together.

To picture the steps that merge sort takes on a small array, imagine the array being halved at each recursion level until only single elements remain, and then each pair of subarrays being merged back together in sorted order, level by level, until the full sorted array emerges.
Measuring Merge Sort’s Big O Complexity

To analyze the complexity of merge sort, you can look at its two steps separately. First, merge() runs in linear time relative to the size of its input, since it looks at each element of both halves exactly once. Second, the recursive splitting halves the array at every level, so there are log₂n levels of recursion, each of which performs a linear amount of merging work. Multiplying both parts gives merge sort a complexity of O(n log₂n).
Interestingly, O(n log₂n) is the best possible worst-case runtime that can be achieved by a comparison-based sorting algorithm.

Timing Your Merge Sort Implementation

To compare the speed of merge sort with the previous two implementations, you can use the same mechanism as before and replace the name of the algorithm in your timing script with merge_sort.
You can execute the script to get the execution time of merge_sort.
Compared to bubble sort and insertion sort, the merge sort implementation is extremely fast, sorting the ten-thousand-element array in less than a second!

Analyzing the Strengths and Weaknesses of Merge Sort

Thanks to its runtime complexity of O(n log₂n), merge sort is a very efficient algorithm that scales well as the size of the input array grows. It’s also straightforward to parallelize because it breaks the input array into chunks that can be distributed and processed in parallel if necessary.

That said, for small lists, the time cost of the recursion allows algorithms such as bubble sort and insertion sort to be faster. For example, running an experiment with a list of ten elements makes the recursive overhead plain to see.
Both bubble sort and insertion sort beat merge sort when sorting a ten-element list. Another drawback of merge sort is that it creates copies of the array when calling itself recursively. It also creates a new list inside merge() to hold the sorted result of both input halves. Due to this limitation, you may not want to use merge sort to sort large lists in memory-constrained hardware.

The Quicksort Algorithm in Python

Just like merge sort, the Quicksort algorithm applies the divide-and-conquer principle to divide the input array into two lists, the first with small items and the second with large items. The algorithm then sorts both lists recursively until the resultant list is completely sorted.

Dividing the input list is referred to as partitioning the list. Quicksort first selects a pivot element and then partitions the remaining elements around it, putting every item that’s smaller than the pivot into a low list and every item that’s larger into a high list. Putting every element from the low list to the left of the pivot and every element from the high list to the right positions the pivot precisely where it needs to be in the final sorted list.

Implementing Quicksort in Python

Here’s a fairly compact implementation of Quicksort:
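The original listing isn’t included in this extract; here’s a reconstruction of a compact Quicksort that partitions around a randomly chosen pivot:

```python
from random import randint

def quicksort(array):
    # Base case: arrays with fewer than two elements are already sorted
    if len(array) < 2:
        return array

    low, same, high = [], [], []

    # Select the pivot element at random
    pivot = array[randint(0, len(array) - 1)]

    for item in array:
        # Elements smaller than the pivot go to the low list,
        # equal elements to same, and larger elements to high
        if item < pivot:
            low.append(item)
        elif item == pivot:
            same.append(item)
        else:
            high.append(item)

    # The final result combines the sorted low list with the same
    # list (pivot and its duplicates) and the sorted high list
    return quicksort(low) + same + quicksort(high)
```

For example, quicksort([8, 2, 6, 4, 5, 5]) returns [2, 4, 5, 5, 6, 8].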
Here’s a summary of the code: arrays of fewer than two elements are returned immediately, since they’re already sorted. Otherwise, the algorithm picks a pivot at random and splits the remaining items into low, same, and high lists depending on how each element compares to the pivot. It then recursively sorts the low and high lists and concatenates the three pieces into the final result.

At each partitioning step, the pivot and any duplicates of it land in their final position, and the algorithm recurses into the low and high lists until every sublist is trivially sorted.
Selecting the pivot Element

Why does the implementation above select the pivot element randomly? Because of how the Quicksort algorithm works, the number of recursion levels depends on where the pivot ends up in each partition. In the best-case scenario, the pivot lands near the median of the list, splitting it into two halves of similar size and keeping the recursion depth close to log₂n.

On the other hand, if the algorithm consistently picks either the smallest or largest element of the array as the pivot, then the partitions are as lopsided as possible, the recursion depth grows to n, and the runtime degrades to O(n²). As you can see, Quicksort’s efficiency often depends on the pivot selection. Selecting the pivot at random makes this worst-case behavior extremely unlikely in practice. Another option for selecting the pivot is the median-of-three strategy, which uses the median of the first, middle, and last elements of the array.

Measuring Quicksort’s Big O Complexity

With Quicksort, the input list is partitioned in linear time, O(n), and this process repeats recursively an average of log₂n times. This leads to a final complexity of O(n log₂n).

That said, remember the discussion about how the selection of the pivot affects the runtime of the algorithm. Theoretically, if the algorithm focuses first on finding the median value and then uses it as the pivot, then the worst-case complexity would come down to O(n log₂n), because the median can be found in linear time and guarantees balanced partitions. By using the median value as the pivot, you end up with a final runtime of O(n) + O(n log₂n), which simplifies to O(n log₂n). In practice, though, the overhead of finding the median usually outweighs the benefit, so random selection remains the common choice.

Timing Your Quicksort Implementation

By now, you’re familiar with the process for timing the runtime of the algorithm. Just change the name of the algorithm in your timing script to quicksort.
You can execute the script as you have before. Not only does Quicksort finish in less than one second, but it’s also much faster than merge sort on the same input.

Analyzing the Strengths and Weaknesses of Quicksort

True to its name, Quicksort is very fast. Although its worst-case scenario is theoretically O(n²), in practice, a good implementation of Quicksort beats most other sorting implementations. Also, just like merge sort, Quicksort is straightforward to parallelize.

One of Quicksort’s main disadvantages is the lack of a guarantee that it will achieve the average runtime complexity. Although worst-case scenarios are rare, certain applications can’t afford to risk poor performance, so they opt for algorithms that stay within O(n log₂n) regardless of the input.

Just like merge sort, Quicksort also trades off memory space for speed. This may become a limitation for sorting larger lists. A quick experiment sorting a list of ten elements tells a familiar story.
The results show that Quicksort also pays the price of recursion when the list is sufficiently small, taking longer to complete than both insertion sort and bubble sort.

The Timsort Algorithm in Python

The Timsort algorithm is considered a hybrid sorting algorithm because it employs a best-of-both-worlds combination of insertion sort and merge sort. Timsort is near and dear to the Python community because it was created by Tim Peters in 2002 to be used as the standard sorting algorithm of the Python language.

The main characteristic of Timsort is that it takes advantage of already-sorted elements that exist in most real-world datasets. These are called natural runs. The algorithm then iterates over the list, collecting the elements into runs and merging them into a single sorted list.

Implementing Timsort in Python

In this section, you’ll create a barebones Python implementation that illustrates all the pieces of the Timsort algorithm. If you’re interested, you can also check out the original C implementation of Timsort. The first step in implementing Timsort is modifying the implementation of insertion_sort() so that it can sort just a portion of the array in place:
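The original modified listing isn’t preserved here; this sketch shows one way to extend insertion sort with left and right boundaries:

```python
def insertion_sort(array, left=0, right=None):
    # By default, sort the entire array
    if right is None:
        right = len(array) - 1

    # Sort the slice array[left:right + 1] in place
    for i in range(left + 1, right + 1):
        key_item = array[i]
        j = i - 1

        # Shift larger elements within the slice one position right
        while j >= left and array[j] > key_item:
            array[j + 1] = array[j]
            j -= 1

        # Insert key_item into its correct position within the slice
        array[j + 1] = key_item

    return array
```

Called without the extra arguments, it behaves exactly like the earlier insertion sort.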
This modified implementation adds a couple of parameters, left and right, that indicate which slice of the array should be sorted. This allows Timsort to run insertion sort on small portions of the array in place, without copying them. Now take a look at the implementation of Timsort:
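The original Timsort listing isn’t preserved in this extract. Here’s a reconstruction along the lines the text describes; the insertion_sort() and merge() helpers are repeated so the snippet runs on its own, and min_run = 32 is an assumed value:

```python
def insertion_sort(array, left=0, right=None):
    # Sorts array[left:right + 1] in place (the modified helper
    # described above, repeated so this snippet is self-contained)
    if right is None:
        right = len(array) - 1
    for i in range(left + 1, right + 1):
        key_item = array[i]
        j = i - 1
        while j >= left and array[j] > key_item:
            array[j + 1] = array[j]
            j -= 1
        array[j + 1] = key_item
    return array

def merge(left, right):
    # Two-pointer merge of two sorted lists
    result = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            result.append(left[i])
            i += 1
        else:
            result.append(right[j])
            j += 1
    return result + left[i:] + right[j:]

def timsort(array):
    min_run = 32  # assumed minimum run size; a small power of two
    n = len(array)

    # Start by slicing the array into runs of size min_run and
    # sorting each of them in place with insertion sort
    for i in range(0, n, min_run):
        insertion_sort(array, i, min(i + min_run - 1, n - 1))

    # Now merge adjacent sorted runs, doubling the run size each pass
    size = min_run
    while size < n:
        for start in range(0, n, size * 2):
            # Compute the boundaries of the two runs to merge
            midpoint = start + size - 1
            end = min(start + size * 2 - 1, n - 1)
            merged = merge(
                left=array[start:midpoint + 1],
                right=array[midpoint + 1:end + 1],
            )
            # Write the merged result back into the array
            array[start:start + len(merged)] = merged
        size *= 2

    return array
```

For arrays no longer than min_run, the whole process collapses into a single insertion sort.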
Although the implementation is a bit more complex than the previous algorithms, you can summarize it quickly in the following way: the algorithm first slices the input into small runs and sorts each run with insertion sort, and then it repeatedly merges adjacent sorted runs with merge(), doubling the size of the merged runs on each pass until the whole array is sorted.
Notice how, unlike merge sort, Timsort merges subarrays that were previously sorted. Doing so decreases the total number of comparisons required to produce a sorted list. This advantage over merge sort will become apparent when running experiments using different arrays.

Finally, the implementation defines min_run, the minimum size of the initial runs. On the one hand, the runs should be small enough for insertion sort to be effective; on the other hand, a power of two makes the later merges line up evenly. Combining both conditions offers several sensible options for min_run, with small powers of two such as 32 being a common choice.

Measuring Timsort’s Big O Complexity

On average, the complexity of Timsort is O(n log₂n), just like merge sort and Quicksort. The logarithmic part comes from doubling the size of the run to perform each linear merge operation.

However, Timsort performs exceptionally well on already-sorted or close-to-sorted lists, leading to a best-case scenario of O(n). In this case, Timsort clearly beats merge sort and matches the best-case scenario for Quicksort. But the worst case for Timsort is also O(n log₂n), which is far better than Quicksort’s worst case of O(n²).

Timing Your Timsort Implementation

You can use your timing script once more, this time passing in the name of your timsort() function.
Now execute the script to get the execution time of timsort.
The exact number will vary with your hardware, but this simplified timsort lands in the same range as merge sort and Quicksort on random input.

Now try to sort an already-sorted list using these four algorithms and see what happens. You can modify your script so that the input array is built already sorted instead of being filled with random numbers.
If you execute the script now, then all the algorithms will run and output their corresponding execution times.
This time, Timsort comes in at a whopping thirty-seven percent faster than merge sort and five percent faster than Quicksort, flexing its ability to take advantage of the already-sorted runs.

Notice how Timsort benefits from two algorithms that are much slower when used by themselves. The genius of Timsort is in combining these algorithms and playing to their strengths to achieve impressive results.

Analyzing the Strengths and Weaknesses of Timsort

The main disadvantage of Timsort is its complexity. Despite implementing a very simplified version of the original algorithm, it still requires much more code because it relies on both insertion_sort() and merge().

One of Timsort’s advantages is its ability to predictably perform in O(n log₂n) regardless of the structure of the input array. Contrast that with Quicksort, which can degrade down to O(n²). Timsort is also very fast for small arrays because the algorithm turns into a single insertion sort.

For real-world usage, in which it’s common to sort arrays that already have some preexisting order, Timsort is a great option. Its adaptability makes it an excellent choice for sorting arrays of any length.

Conclusion

Sorting is an essential tool in any Pythonista’s toolkit. With knowledge of the different sorting algorithms in Python and how to maximize their potential, you’re ready to implement faster, more efficient apps and programs!

In this tutorial, you learned how bubble sort, insertion sort, merge sort, Quicksort, and Timsort work, how to implement them in Python, and how to compare their performance using both timing experiments and Big O analysis.
You also learned about different techniques such as recursion, divide and conquer, and randomization. These are fundamental building blocks for implementing a long list of other algorithms, and they’ll come up again and again as you keep researching.

Take the code presented in this tutorial, create new experiments, and explore these algorithms further. Better yet, try implementing other sorting algorithms in Python. The list is vast, but selection sort, heapsort, and tree sort are three excellent options to start with.