Sort a DataFrame

Sometimes, you may need to arrange the rows of a DataFrame based on specific criteria. To do this, we can use the sort_values() function.

This tutorial uses classic Iris dataset, which can be downloaded here Iris dataset.

import pandas as pddf = pd.read_csv('Iris.csv')

Before sorting, the data is ordered by its Id column ascendingly.

df.head()

Output:


Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
0 1 5.1 3.5 1.4 0.2 Iris-setosa
1 2 4.9 3.0 1.4 0.2 Iris-setosa
2 3 4.7 3.2 1.3 0.2 Iris-setosa
3 4 4.6 3.1 1.5 0.2 Iris-setosa
4 5 5.0 3.6 1.4 0.2 Iris-setosa

1. Sort a DataFrame by a Single Column

To sort a DataFrame by a specific column, such as SepalLengthCm, we can use the sort_values() function. Simply provide the column name to the by parameter like this: df.sort_values(by='SepalLengthCm'). By default, this function arranges the data in ascending order.

df.sort_values(by=['SepalLengthCm']).head(10)

Output:


Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
13 14 4.3 3.0 1.1 0.1 Iris-setosa
42 43 4.4 3.2 1.3 0.2 Iris-setosa
38 39 4.4 3.0 1.3 0.2 Iris-setosa
8 9 4.4 2.9 1.4 0.2 Iris-setosa
41 42 4.5 2.3 1.3 0.3 Iris-setosa
22 23 4.6 3.6 1.0 0.2 Iris-setosa
3 4 4.6 3.1 1.5 0.2 Iris-setosa
6 7 4.6 3.4 1.4 0.3 Iris-setosa
47 48 4.6 3.2 1.4 0.2 Iris-setosa
2 3 4.7 3.2 1.3 0.2 Iris-setosa

In the returned DataFrame displayed above, you might notice that the Id column is no longer in ascending order. This is because the DataFrame has been sorted based on the SepalLengthCm column.

To sort the data by the SepalLengthCm column descendingly, we can set the ascending parameter of the sort_values() function to False, as shown in the example below:

df.sort_values(by=['SepalLengthCm'], ascending=False).head(10)

Output:


Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
131 132 7.9 3.8 6.4 2.0 Iris-virginica
135 136 7.7 3.0 6.1 2.3 Iris-virginica
122 123 7.7 2.8 6.7 2.0 Iris-virginica
117 118 7.7 3.8 6.7 2.2 Iris-virginica
118 119 7.7 2.6 6.9 2.3 Iris-virginica
105 106 7.6 3.0 6.6 2.1 Iris-virginica
130 131 7.4 2.8 6.1 1.9 Iris-virginica
107 108 7.3 2.9 6.3 1.8 Iris-virginica
125 126 7.2 3.2 6.0 1.8 Iris-virginica
109 110 7.2 3.6 6.1 2.5 Iris-virginica

The data has now been aranged in descending order based on the SepalLengthCm column.

2. Sort a DataFrame by Multiple Columns

Occasionally, a situation arises where a tie occurs while sorting data based on a single column. In such cases, we can enhance our sorting strategy by including an additional column within the by argument. This way, if there's a tie in the primary column, the data will be sorted by the secondary column.

Let's see the following example where we use the sort_values() function to sort the data based on both the SepalLengthCm and SepalWidthCm columns.

df.sort_values(by=['SepalLengthCm', 'SepalWidthCm']).head(10)

Output:


Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
13 14 4.3 3.0 1.1 0.1 Iris-setosa
8 9 4.4 2.9 1.4 0.2 Iris-setosa
38 39 4.4 3.0 1.3 0.2 Iris-setosa
42 43 4.4 3.2 1.3 0.2 Iris-setosa
41 42 4.5 2.3 1.3 0.3 Iris-setosa
3 4 4.6 3.1 1.5 0.2 Iris-setosa
47 48 4.6 3.2 1.4 0.2 Iris-setosa
6 7 4.6 3.4 1.4 0.3 Iris-setosa
22 23 4.6 3.6 1.0 0.2 Iris-setosa
2 3 4.7 3.2 1.3 0.2 Iris-setosa

The data has now been primarily sorted based on the SepalLengthCm column. In the event of a tie in the SepalLengthCm column, it is further sorted by the secondary column SepalWidthCm. For instance, for rows with SepalLengthCm value of 4.4, like Id of 9, 39, and 43, they have been further ordered based on their SepalWidthCm values to resolve the tie.

3. Sort a DataFrame in Ascending and Descending Order

If you want to sort a DataFrame in ascending order based on the SepalLengthCm column and in descending order based on the SepalWidthCm column, you can achieve this by providing a list of boolean values to specify the sorting order for each column using the ascending parameter within the sort_values function. Here's how:

df.sort_values(by=['SepalLengthCm', 'SepalWidthCm'], ascending=[True, False]).head(10)

Output:


Id SepalLengthCm SepalWidthCm PetalLengthCm PetalWidthCm Species
13 14 4.3 3.0 1.1 0.1 Iris-setosa
42 43 4.4 3.2 1.3 0.2 Iris-setosa
38 39 4.4 3.0 1.3 0.2 Iris-setosa
8 9 4.4 2.9 1.4 0.2 Iris-setosa
41 42 4.5 2.3 1.3 0.3 Iris-setosa
22 23 4.6 3.6 1.0 0.2 Iris-setosa
6 7 4.6 3.4 1.4 0.3 Iris-setosa
47 48 4.6 3.2 1.4 0.2 Iris-setosa
3 4 4.6 3.1 1.5 0.2 Iris-setosa
2 3 4.7 3.2 1.3 0.2 Iris-setosa

Excellent! In the next tutorial, we'll delve into another crucial aspect of Data Wrangling: Replace Values.

Amazing eBook to learn ggplot2 FAST & EASY

book cover for sliding your way to ggplot2 mastery

	Id	SepalLengthCm	SepalWidthCm	PetalLengthCm	PetalWidthCm	Species
0	1	5.1	3.5	1.4	0.2	Iris-setosa
1	2	4.9	3.0	1.4	0.2	Iris-setosa
2	3	4.7	3.2	1.3	0.2	Iris-setosa
3	4	4.6	3.1	1.5	0.2	Iris-setosa
4	5	5.0	3.6	1.4	0.2	Iris-setosa

	Id	SepalLengthCm	SepalWidthCm	PetalLengthCm	PetalWidthCm	Species
131	132	7.9	3.8	6.4	2.0	Iris-virginica
135	136	7.7	3.0	6.1	2.3	Iris-virginica
122	123	7.7	2.8	6.7	2.0	Iris-virginica
117	118	7.7	3.8	6.7	2.2	Iris-virginica
118	119	7.7	2.6	6.9	2.3	Iris-virginica
105	106	7.6	3.0	6.6	2.1	Iris-virginica
130	131	7.4	2.8	6.1	1.9	Iris-virginica
107	108	7.3	2.9	6.3	1.8	Iris-virginica
125	126	7.2	3.2	6.0	1.8	Iris-virginica
109	110	7.2	3.6	6.1	2.5	Iris-virginica