`replace()`

The replace() method returns a new DataFrame in which specific values have been substituted with others. It can target one or more columns, which makes it useful for data cleaning and transformation.

Usage

replace(search_value, replacement_value, column_name) takes three arguments (named to_replace, value, and subset in the PySpark API):

search_value: The value you want to find and replace.
replacement_value: The value that takes the place of the search value in the specified column.
column_name (optional): The name of the column, or a list of column names, in which to perform the replacement. If omitted, the entire DataFrame is searched for the search value, and every match is replaced with the replacement value.
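To make the role of the optional column argument concrete, here is a minimal pure-Python sketch of the semantics described above. It is an illustration only, not how Spark implements replace(): the helper `replace_values` and the row whose Name is "California" are hypothetical, and rows are modeled as plain dictionaries.

```python
def replace_values(rows, search_value, replacement_value, column_names=None):
    """Return new rows with search_value swapped for replacement_value.

    Hypothetical helper that mimics the described behavior: when
    column_names is None, every column is searched.
    """
    result = []
    for row in rows:
        new_row = {}
        for column, value in row.items():
            in_scope = column_names is None or column in column_names
            if in_scope and value == search_value:
                new_row[column] = replacement_value
            else:
                new_row[column] = value
        result.append(new_row)
    return result


rows = [
    {"Name": "James", "State": "New York"},
    {"Name": "Anna", "State": "California"},
    # Made-up row: the search value also appears in the Name column.
    {"Name": "California", "State": "California"},
]

# Restricted to the State column: the Name column is left untouched.
only_state = replace_values(rows, "California", "CA", ["State"])

# No column list: every column is searched.
everywhere = replace_values(rows, "California", "CA")

print(only_state[2])
print(everywhere[2])
```

The input rows are never modified, which mirrors the fact that replace() produces a new DataFrame rather than changing the existing one.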

Create Spark Session and sample DataFrame

from pyspark.sql import SparkSession

# Initialize Spark Session
spark = SparkSession.builder.appName("replaceExample").getOrCreate()

# Sample DataFrame
data = [("James", "New York"), ("Anna", "California"), ("Robert", "California")]
columns = ["Name", "State"]
df = spark.createDataFrame(data, columns)
df.show()

Output:
+------+----------+
|  Name|     State|
+------+----------+
| James|  New York|
|  Anna|California|
|Robert|California|
+------+----------+

Example: Use `replace()` to search and replace a value

Calling replace("California", "CA", ["State"]) replaces every occurrence of the value "California" with "CA" in the State column. The match is on the whole cell value, not on substrings.

replaced_df = df.replace("California", "CA", ["State"])
replaced_df.show()

Output:
+------+--------+
|  Name|   State|
+------+--------+
| James|New York|
|  Anna|      CA|
|Robert|      CA|
+------+--------+

# Stop the Spark Session
spark.stop()
