`drop()`

The drop() function removes one or more columns from a DataFrame. It returns a new DataFrame without the specified columns; the original DataFrame is left unchanged.

Create Spark Session and sample DataFrame

from pyspark.sql import SparkSession
# Initialize Spark Session
spark = SparkSession.builder.appName("selectExample").getOrCreate()
# Create a Spark DataFrame
data = [("James", "Smith", "USA", 1),
        ("Anna", "Rose", "UK", 2),
        ("Robert", "Williams", "USA", 3)]
columns = ["Firstname", "Lastname", "Country", "ID"]
df = spark.createDataFrame(data, columns)
df.show()

Output:
+---------+--------+-------+---+
|Firstname|Lastname|Country| ID|
+---------+--------+-------+---+
|    James|   Smith|    USA|  1|
|     Anna|    Rose|     UK|  2|
|   Robert|Williams|    USA|  3|
+---------+--------+-------+---+

Example: Drop a single column from the DataFrame

df.drop("Country"): Drops the Country column from the DataFrame df.

df.drop("Country").show()

Output:
+---------+--------+---+
|Firstname|Lastname| ID|
+---------+--------+---+
|    James|   Smith|  1|
|     Anna|    Rose|  2|
|   Robert|Williams|  3|
+---------+--------+---+

Example: Drop multiple columns from the DataFrame

df.drop("Country", "ID"): Drops both the 'Country' and 'ID' columns from df.

df.drop("Country", "ID").show()

Output:
+---------+--------+
|Firstname|Lastname|
+---------+--------+
|    James|   Smith|
|     Anna|    Rose|
|   Robert|Williams|
+---------+--------+

# Stop the Spark Session
spark.stop()
