
drop()
The drop() function removes one or more columns from a DataFrame. It returns a new DataFrame without the specified columns and leaves the original DataFrame unchanged.
Create Spark Session and sample DataFrame
from pyspark.sql import SparkSession

# Initialize Spark Session
spark = SparkSession.builder.appName("dropExample").getOrCreate()
# Create a Spark DataFrame
data = [("James", "Smith", "USA", 1), ("Anna", "Rose", "UK", 2), ("Robert", "Williams", "USA", 3)]
columns = ["Firstname", "Lastname", "Country", "ID"]
df = spark.createDataFrame(data, columns)
df.show()
Output:
+---------+--------+-------+---+
|Firstname|Lastname|Country| ID|
+---------+--------+-------+---+
|    James|   Smith|    USA|  1|
|     Anna|    Rose|     UK|  2|
|   Robert|Williams|    USA|  3|
+---------+--------+-------+---+
Example: Drop a single column from the DataFrame
df.drop("Country") drops the Country column from the DataFrame df.
df.drop("Country").show()
Output:
+---------+--------+---+
|Firstname|Lastname| ID|
+---------+--------+---+
|    James|   Smith|  1|
|     Anna|    Rose|  2|
|   Robert|Williams|  3|
+---------+--------+---+
Example: Drop multiple columns from the DataFrame
df.drop("Country", "ID") drops both the Country and ID columns from df.
df.drop("Country", "ID").show()
Output:
+---------+--------+
|Firstname|Lastname|
+---------+--------+
|    James|   Smith|
|     Anna|    Rose|
|   Robert|Williams|
+---------+--------+
# Stop the Spark Session
spark.stop()