
min()
The min() function in Apache Spark is an aggregation function that computes the minimum value of a column in a DataFrame.
Usage
- The min() function can be applied directly to a DataFrame to find the minimum value in a specific column.
- When used with groupBy(), it returns the minimum value of the column for each group.
Create Spark Session and sample DataFrame
from pyspark.sql import SparkSession
from pyspark.sql.functions import min

# Initialize Spark Session
spark = SparkSession.builder.appName("minExample").getOrCreate()

# Sample DataFrame
data = [
    ("group A", 45),
    ("group A", 30),
    ("group A", 55),
    ("group B", 10),
    ("group B", 20),
    ("group B", 60),
]
columns = ["Group", "Variable"]
df = spark.createDataFrame(data, columns)
df.show()
Output:
+-------+--------+
| Group|Variable|
+-------+--------+
|group A| 45|
|group A| 30|
|group A| 55|
|group B| 10|
|group B| 20|
|group B| 60|
+-------+--------+
Example: Use min() to compute the minimum value of a column
- min("Variable"): computes the minimum value of the Variable column.
- .alias("Minimum Value"): renames the resulting column to Minimum Value.
df.select(min("Variable").alias("Minimum Value")).show()
Output:
+-------------+
|Minimum Value|
+-------------+
| 10|
+-------------+
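
Note: like other aggregate functions in Spark, min() ignores null values, so a null in the column does not affect the result, and the minimum of an all-null column is null. A minimal sketch illustrating this (the data_nulls and df_nulls names are introduced here purely for illustration):
# min() skips the null row, so the result is 30 rather than null
data_nulls = [("group A", None), ("group A", 30), ("group B", 45)]
df_nulls = spark.createDataFrame(data_nulls, ["Group", "Variable"])
df_nulls.select(min("Variable").alias("Minimum Value")).show()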
Example: Use min() with groupBy() to compute the minimum value of each group
- groupBy("Group"): groups the data by the Group column.
- .agg(min("Variable").alias("Minimum Value")): computes the minimum value for each group and renames the resulting column to Minimum Value.
grouped_data = df.groupBy("Group").agg(min("Variable").alias("Minimum Value"))
grouped_data.show()
Output:
+-------+-------------+
| Group|Minimum Value|
+-------+-------------+
|group A| 30|
|group B| 10|
+-------+-------------+
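
agg() can also combine min() with other aggregates so each group is summarized in a single pass. A minimal sketch along those lines (the max and avg imports and the summary name are illustrative additions, not part of the example above):
from pyspark.sql.functions import avg, max

# Compute several aggregates per group in one agg() call
summary = df.groupBy("Group").agg(
    min("Variable").alias("Minimum Value"),
    max("Variable").alias("Maximum Value"),
    avg("Variable").alias("Average Value"),
)
summary.show()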
# Stop the Spark Session
spark.stop()