`explode()`

The explode() function in pyspark.sql.functions turns each element of an array column, or each key-value pair of a map column, into its own row. This is particularly useful for flattening nested data structures in DataFrames.

Usage

explode() is applied to an array or map column.
In the case of an array, each element becomes a new row.
For a map, each key-value pair is turned into a new row.
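To make the row-per-element behaviour concrete, here is a minimal plain-Python model of it. This is only a sketch of the semantics, not how Spark implements explode(): the helper name explode_rows is hypothetical, and the output field names col, key and value are chosen to mirror the default column names Spark uses when no alias is given. It also reflects that explode() produces no rows for a null or empty array or map.

```python
from typing import Any, Dict, Iterable, Iterator


def explode_rows(rows: Iterable[Dict[str, Any]], column: str) -> Iterator[Dict[str, Any]]:
    """Yield one output row per list element or per dict key-value pair."""
    for row in rows:
        value = row[column]
        # Every other field is copied onto each generated row.
        rest = {name: v for name, v in row.items() if name != column}
        if not value:
            # None or empty collection: the input row yields no output rows.
            continue
        if isinstance(value, dict):
            for key, val in value.items():
                yield {**rest, "key": key, "value": val}
        else:
            for element in value:
                yield {**rest, "col": element}


sample = [
    {"Id": 1, "Languages": ["Java", "Python"]},
    {"Id": 2, "Languages": []},
]
print(list(explode_rows(sample, "Languages")))
```

The row with Id 2 disappears from the result because its list is empty.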

Create Spark Session

from pyspark.sql import SparkSession
from pyspark.sql.functions import explode

# Initialize Spark Session
spark = SparkSession.builder.appName("explodeExample").getOrCreate()

Example: Use `explode()` with Array columns

Create a sample DataFrame with an Array column

array_data = [(1, ["Java", "Python", "C++"]),
              (2, ["Spark", "Java", "C++"]),
              (3, ["Python", "Scala"])]
array_columns = ["Id", "Languages"]
array_df = spark.createDataFrame(array_data, array_columns)
array_df.show()

Output:
+---+-------------------+
| Id|          Languages|
+---+-------------------+
|  1|[Java, Python, C++]|
|  2| [Spark, Java, C++]|
|  3|    [Python, Scala]|
+---+-------------------+

Use `explode()` on the Languages column

exploded_df = array_df.select(array_df.Id, explode(array_df.Languages).alias("Language"))
exploded_df.show()

Output:
+---+--------+
| Id|Language|
+---+--------+
|  1|    Java|
|  1|  Python|
|  1|     C++|
|  2|   Spark|
|  2|    Java|
|  2|     C++|
|  3|  Python|
|  3|   Scala|
+---+--------+

explode(array_df.Languages): this transforms each element in the Languages Array column into a separate row.
The Id column is retained for each exploded row, and the new Language column contains the individual elements from the arrays.

Example: Use `explode()` with Map columns

Create a sample DataFrame with a Map column

map_data = [(1, {"Java": "JVM", "Python": "CPython"}),
            (2, {"C++": "GCC", "Java": "OpenJDK"}),
            (3, {"Python": "PyPy", "Scala": "JVM"})]
map_columns = ["Id", "LanguageMap"]
map_df = spark.createDataFrame(map_data, map_columns)
map_df.show(truncate=False)

Output:
+---+--------------------------------+
|Id |LanguageMap                     |
+---+--------------------------------+
|1  |{Java -> JVM, Python -> CPython}|
|2  |{Java -> OpenJDK, C++ -> GCC}   |
|3  |{Scala -> JVM, Python -> PyPy}  |
+---+--------------------------------+

Use `explode()` on the LanguageMap column

exploded_map_df = map_df.select(map_df.Id, explode(map_df.LanguageMap).alias("Language", "Platform"))
exploded_map_df.show()

Output:
+---+--------+--------+
| Id|Language|Platform|
+---+--------+--------+
|  1|    Java|     JVM|
|  1|  Python| CPython|
|  2|    Java| OpenJDK|
|  2|     C++|     GCC|
|  3|   Scala|     JVM|
|  3|  Python|    PyPy|
+---+--------+--------+

explode(map_df.LanguageMap): this transforms each key-value pair in the LanguageMap column into separate rows.
The resulting DataFrame has three columns: Id, Language (the key), and Platform (the value).

# Stop the Spark Session
spark.stop()
