
withColumnRenamed()
The withColumnRenamed() function allows you to rename a column in a DataFrame. This is especially helpful for improving the readability of your data, adhering to specific naming standards, or preparing for data integration tasks.
withColumnRenamed() takes two arguments: the existing column name and the new column name. It returns a new DataFrame with the specified column renamed.
Create Spark Session and sample DataFrame
from pyspark.sql import SparkSession
# Initialize Spark Session
spark = SparkSession.builder.appName("selectExample").getOrCreate()
# Create a Spark DataFrame
data = [("James", "Smith", "USA", 1), ("Anna", "Rose", "UK", 2), ("Robert", "Williams", "USA", 3)]
columns = ["Firstname", "Lastname", "Country", "ID"]
df = spark.createDataFrame(data, columns)
df.show()
Output:
+---------+--------+-------+---+
|Firstname|Lastname|Country| ID|
+---------+--------+-------+---+
|    James|   Smith|    USA|  1|
|     Anna|    Rose|     UK|  2|
|   Robert|Williams|    USA|  3|
+---------+--------+-------+---+
Example: Use withColumnRenamed() to rename a column
df.withColumnRenamed("ID", "Index"): This call returns a new DataFrame in which the ID column is named Index. The original DataFrame df is left unchanged.
df_renamed = df.withColumnRenamed("ID", "Index")
df_renamed.show()
Output:
+---------+--------+-------+-----+
|Firstname|Lastname|Country|Index|
+---------+--------+-------+-----+
|    James|   Smith|    USA|    1|
|     Anna|    Rose|     UK|    2|
|   Robert|Williams|    USA|    3|
+---------+--------+-------+-----+
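One detail worth knowing: when the existing column name is not found in the schema, withColumnRenamed() does not raise an error; it returns the DataFrame unchanged. A typo in the old name can therefore go unnoticed. The following is a minimal sketch of a guard against that. The helper name rename_strict is hypothetical (it is not part of PySpark), and it uses an exact, case-sensitive comparison against df.columns, which is an assumption of this sketch and stricter than Spark's default case-insensitive column resolution.

```python
def rename_strict(df, existing, new):
    """Rename a column, raising instead of silently doing nothing.

    Hypothetical helper, not part of PySpark. It relies only on
    `df.columns` and `df.withColumnRenamed`, both of which a PySpark
    DataFrame provides.
    """
    # Exact, case-sensitive match (an assumption of this sketch).
    if existing not in df.columns:
        raise ValueError(
            f"Column {existing!r} not found; available columns: {df.columns}"
        )
    # Delegate the actual rename to PySpark; a new DataFrame is returned.
    return df.withColumnRenamed(existing, new)


# Usage with the DataFrame created above:
# df_renamed = rename_strict(df, "ID", "Index")
```

With this guard, a misspelled source name such as "Id" fails immediately instead of producing a DataFrame that still has the old column name.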
Example: Use withColumnRenamed() to rename multiple columns
By chaining several withColumnRenamed() calls, we can rename multiple columns in a single expression.
# Parentheses let the chained calls span several lines
df_renamed = (
    df.withColumnRenamed("ID", "Index")
    .withColumnRenamed("Firstname", "firstName")
    .withColumnRenamed("Lastname", "lastName")
)
df_renamed.show()
Output:
+---------+--------+-------+-----+
|firstName|lastName|Country|Index|
+---------+--------+-------+-----+
|    James|   Smith|    USA|    1|
|     Anna|    Rose|     UK|    2|
|   Robert|Williams|    USA|    3|
+---------+--------+-------+-----+
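When the list of renames grows, writing one chained call per column becomes repetitive. One possible alternative, sketched below, is to keep the old-to-new names in a dictionary and fold withColumnRenamed() over it. The helper name rename_columns is hypothetical and not part of PySpark; it only assumes the object passed in has a withColumnRenamed() method.

```python
from functools import reduce


def rename_columns(df, mapping):
    """Apply withColumnRenamed once for each old -> new pair in `mapping`.

    Hypothetical helper, not part of PySpark. Equivalent to chaining
    one withColumnRenamed() call per dictionary entry, in order.
    """
    return reduce(
        lambda acc, pair: acc.withColumnRenamed(pair[0], pair[1]),
        mapping.items(),
        df,  # start from the original DataFrame
    )


# Usage with the DataFrame created above:
# df_renamed = rename_columns(
#     df, {"ID": "Index", "Firstname": "firstName", "Lastname": "lastName"}
# )
```

PySpark 3.4 and later also provide DataFrame.withColumnsRenamed(), which accepts such a dictionary directly; the sketch above is mainly useful on older versions.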
# Stop the Spark Session
spark.stop()