
max
The max() function in Apache Spark is an aggregation function that computes the maximum value of a column in a DataFrame.
Usage
- max() can be applied directly to a DataFrame to find the maximum value in a specific column.
- When combined with groupBy(), it returns the maximum value for each group in a column.
Create Spark Session and sample DataFrame
from pyspark.sql import SparkSession
from pyspark.sql.functions import max

# Initialize Spark Session
spark = SparkSession.builder.appName("maxExample").getOrCreate()

# Sample DataFrame
data = [
    ("group A", 45), ("group A", 30), ("group A", 55),
    ("group B", 10), ("group B", 20), ("group B", 60),
]
columns = ["Group", "Variable"]
df = spark.createDataFrame(data, columns)
df.show()
Output:
+-------+--------+
| Group|Variable|
+-------+--------+
|group A| 45|
|group A| 30|
|group A| 55|
|group B| 10|
|group B| 20|
|group B| 60|
+-------+--------+
Example: Use max() to return the max value of a column
- max("Variable"): returns the max value of the Variable column.
- alias("Maximum Value"): renames the returned column to Maximum Value.
max_df = df.select(max("Variable").alias("Maximum Value"))
max_df.show()
Output:
+-------------+
|Maximum Value|
+-------------+
| 60|
+-------------+
Example: Use max() with groupBy() to return the max value of each group
- groupBy("Group"): groups the data by the Group column.
- max("Variable").alias("Maximum Value"): returns the max value of each group and renames the column to Maximum Value.
grouped_data = df.groupBy("Group").agg(max("Variable").alias("Maximum Value"))
grouped_data.show()
Output:
+-------+-------------+
| Group|Maximum Value|
+-------+-------------+
|group A| 55|
|group B| 60|
+-------+-------------+
# Stop the Spark Session
spark.stop()