
sort()
The sort() method arranges the rows of a DataFrame in ascending or descending order based on the values of one or more columns.
By default, sort() sorts in ascending order, but you can request descending order for any column.
Create Spark Session and sample DataFrame
from pyspark.sql import SparkSession
# Initialize Spark Session
spark = SparkSession.builder.appName("sortExample").getOrCreate()
# Sample data (Weight 20 appears twice)
data = [(1, 20, 15), (2, 22, 16), (3, 20, 14), (4, 23, 17), (5, 18, 15)]
# Column names
columns = ["ID", "Weight", "Length"]
df = spark.createDataFrame(data, columns)
df.show()
Output:
+---+------+------+
| ID|Weight|Length|
+---+------+------+
|  1|    20|    15|
|  2|    22|    16|
|  3|    20|    14|
|  4|    23|    17|
|  5|    18|    15|
+---+------+------+
Example: Use sort() to sort data by one column in ascending order
df.sort("Weight") sorts the df DataFrame in ascending order based on the Weight column.
# Sort by Weight (ascending is the default)
df_sorted = df.sort("Weight")
df_sorted.show()
Output:
+---+------+------+
| ID|Weight|Length|
+---+------+------+
|  5|    18|    15|
|  3|    20|    14|
|  1|    20|    15|
|  2|    22|    16|
|  4|    23|    17|
+---+------+------+
Example: Use sort() to sort data by one column in descending order
df.sort(col("Weight").desc()) sorts the df DataFrame in descending order based on the Weight column.
from pyspark.sql.functions import col
# Sort by Weight in descending order
df_sorted_desc = df.sort(col("Weight").desc())
df_sorted_desc.show()
Output:
+---+------+------+
| ID|Weight|Length|
+---+------+------+
|  4|    23|    17|
|  2|    22|    16|
|  1|    20|    15|
|  3|    20|    14|
|  5|    18|    15|
+---+------+------+
Example: Use sort() to sort data by multiple columns
df.sort(col("Weight").desc(), col("Length").asc()) sorts the df DataFrame first by the Weight column in descending order and then, for rows with the same Weight, by the Length column in ascending order.
# Sort by Weight descending, then by Length ascending for rows with equal Weight
df_sorted_both = df.sort(col("Weight").desc(), col("Length").asc())
df_sorted_both.show()
Output:
+---+------+------+
| ID|Weight|Length|
+---+------+------+
|  4|    23|    17|
|  2|    22|    16|
|  3|    20|    14|
|  1|    20|    15|
|  5|    18|    15|
+---+------+------+
# Stop the Spark Session
spark.stop()