`fillna()`

`DataFrame.na.fill()` and its alias `DataFrame.fillna()` replace null values in an Apache Spark DataFrame with a specified value. In float and double columns, NaN values are replaced as well.

Usage

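`na.fill()` returns a new DataFrame in which null values are replaced by the given value; the original DataFrame is left unchanged.
By default it fills every column whose type matches the value. To target specific columns, pass a list of column names, or a dict that maps column names to fill values.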

Create Spark Session and sample DataFrame

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Initialize Spark Session
spark = SparkSession.builder.appName("fillnaExample").getOrCreate()

# Sample DataFrame with Null Values
data = [("James", None), ("Anna", 28), (None, 34), ("Robert", None)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data, columns)
df.show()

Output:
+------+----+
|  Name| Age|
+------+----+
| James|NULL|
|  Anna|  28|
|  NULL|  34|
|Robert|NULL|
+------+----+

Example: Use `na.fill()` to Replace Null Values

Replace Null values in the Name column with the word "Unknown"

df.na.fill("Unknown", ["Name"]).show()

Output:
+-------+----+
|   Name| Age|
+-------+----+
|  James|NULL|
|   Anna|  28|
|Unknown|  34|
| Robert|NULL|
+-------+----+

Replace Null values in the Age column with the integer 0

filled_df_age = df.na.fill({"Age": 0})
filled_df_age.show()

Output:
+------+---+
|  Name|Age|
+------+---+
| James|  0|
|  Anna| 28|
|  NULL| 34|
|Robert|  0|
+------+---+

Replace Null values in both columns with different values

df.na.fill({"Name": "Unknown", "Age": 0}).show()

Output:
+-------+---+
|   Name|Age|
+-------+---+
|  James|  0|
|   Anna| 28|
|Unknown| 34|
| Robert|  0|
+-------+---+

# Stop the Spark Session
spark.stop()
