
element_at()
The element_at() function fetches an element from an array column by its index, or a value from a map column by its key.
Usage
- In the case of an array, element_at() takes the array column and an integer index as arguments and returns the element at that index. Array indexing starts at 1.
- For a map, it takes the map column and the key, returning the value associated with that key.
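The 1-based indexing rule for arrays is easy to get wrong when coming from Python lists, so here is a minimal plain-Python sketch of it. This is only a model for illustration, not Spark's implementation: the helper name element_at_array is hypothetical, and the handling of index 0, negative indexes, and out-of-range indexes reflects the documented Spark SQL behaviour as I understand it with ANSI mode disabled, so treat those branches as assumptions.

```python
from typing import Any, Optional, Sequence


def element_at_array(values: Sequence[Any], index: int) -> Optional[Any]:
    """Plain-Python model of 1-based array lookup (hypothetical helper)."""
    if index == 0:
        # Assumption: index 0 is rejected because positions start at 1.
        raise ValueError("index 0 is invalid; positions start at 1")
    if abs(index) > len(values):
        # Assumption: with ANSI mode off, an out-of-range index yields NULL.
        return None
    # Positive indexes shift down by one; negative ones count from the end.
    return values[index - 1] if index > 0 else values[index]


languages = ["Java", "Python", "C++"]
print(element_at_array(languages, 1))   # Java
print(element_at_array(languages, 2))   # Python
print(element_at_array(languages, -1))  # C++
print(element_at_array(languages, 4))   # None
```

Note that index 2 returns "Python", where ordinary Python list indexing would return "C++".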
Create Spark Session
from pyspark.sql import SparkSession
from pyspark.sql.functions import element_at

# Initialize Spark Session
spark = SparkSession.builder.appName("elementAtExample").getOrCreate()
Example: Use element_at() with Array Columns
Create sample DataFrame with Array Columns
array_data = [(["Java", "Python", "C++"],), (["Spark", "Java", "C++"],), (["Python", "Scala"],)]
array_columns = ["Languages"]
array_df = spark.createDataFrame(array_data, array_columns)
array_df.show()
Output:
+-------------------+
|          Languages|
+-------------------+
|[Java, Python, C++]|
| [Spark, Java, C++]|
|    [Python, Scala]|
+-------------------+
element_at_array_df = array_df.withColumn("SecondLanguage", element_at(array_df.Languages, 2))
element_at_array_df.show(truncate=False)
Output:
+-------------------+--------------+
|Languages          |SecondLanguage|
+-------------------+--------------+
|[Java, Python, C++]|Python        |
|[Spark, Java, C++] |Java          |
|[Python, Scala]    |Scala         |
+-------------------+--------------+
element_at(array_df.Languages, 2): this retrieves the second element from each array in the Languages column.
Example: Use element_at() with Map Columns
Create sample DataFrame with Map Columns
map_data = [({"Java": "JVM", "Python": "CPython"},), ({"C++": "GCC", "Java": "OpenJDK"},)]
map_columns = ["LanguageMap"]
map_df = spark.createDataFrame(map_data, map_columns)
map_df.show(truncate=False)
Output:
+--------------------------------+
|LanguageMap                     |
+--------------------------------+
|{Java -> JVM, Python -> CPython}|
|{Java -> OpenJDK, C++ -> GCC}   |
+--------------------------------+
element_at_map_df = map_df.withColumn("JavaPlatform", element_at(map_df.LanguageMap, "Java"))
element_at_map_df.show(truncate=False)
Output:
+--------------------------------+------------+
|LanguageMap                     |JavaPlatform|
+--------------------------------+------------+
|{Java -> JVM, Python -> CPython}|JVM         |
|{Java -> OpenJDK, C++ -> GCC}   |OpenJDK     |
+--------------------------------+------------+
element_at(map_df.LanguageMap, "Java"): this retrieves the value associated with the key "Java" from each map in the LanguageMap column.
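Both sample maps contain the key "Java", so the example above never shows what happens when a key is absent. The plain-Python sketch below models the lookup on the same sample data. It is not Spark's implementation: the helper name element_at_map is hypothetical, and returning None for a missing key is an assumption based on the documented Spark SQL behaviour (a NULL result) as I understand it for current versions.

```python
from typing import Dict, Optional


def element_at_map(mapping: Dict[str, str], key: str) -> Optional[str]:
    """Plain-Python model of map lookup by key (hypothetical helper)."""
    # Assumption: a missing key yields NULL (None here) rather than an error.
    return mapping.get(key)


rows = [
    {"Java": "JVM", "Python": "CPython"},
    {"C++": "GCC", "Java": "OpenJDK"},
]
java_platforms = [element_at_map(row, "Java") for row in rows]
python_platforms = [element_at_map(row, "Python") for row in rows]
print(java_platforms)    # ['JVM', 'OpenJDK']
print(python_platforms)  # ['CPython', None]
```

Under this assumption, a lookup for "Python" would leave the second row empty because its map has no such key.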
# Stop the Spark Session
spark.stop()