
fillna()
The na.fill() method, also available as fillna(), replaces null or NaN values in an Apache Spark DataFrame with a specified value.
Usage
- na.fill() can fill null values across the whole DataFrame with a specified value. The value is applied only to columns whose data type is compatible with it.
- It can also target specific columns for filling null values.
Create Spark Session and sample DataFrame
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Initialize Spark Session
spark = SparkSession.builder.appName("fillnaExample").getOrCreate()

# Sample DataFrame with Null Values
data = [("James", None), ("Anna", 28), (None, 34), ("Robert", None)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data, columns)
df.show()
Output:
+------+----+
|  Name| Age|
+------+----+
| James|NULL|
|  Anna|  28|
|  NULL|  34|
|Robert|NULL|
+------+----+
Example: Use na.fill() to Replace Null Values
Replace null values in the Name column with the string "Unknown"
df.na.fill("Unknown", ["Name"]).show()
Output:
+-------+----+
|   Name| Age|
+-------+----+
|  James|NULL|
|   Anna|  28|
|Unknown|  34|
| Robert|NULL|
+-------+----+
Replace null values in the Age column with the integer 0
filled_df_age = df.na.fill({"Age": 0})
filled_df_age.show()
Output:
+------+---+
|  Name|Age|
+------+---+
| James|  0|
|  Anna| 28|
|  NULL| 34|
|Robert|  0|
+------+---+
Replace null values in both columns, using a different value for each
df.na.fill({"Name": "Unknown", "Age": 0}).show()
Output:
+-------+---+
|   Name|Age|
+-------+---+
|  James|  0|
|   Anna| 28|
|Unknown| 34|
| Robert|  0|
+-------+---+
# Stop the Spark Session
spark.stop()