
array_contains()
The array_contains() function determines whether an array column in a DataFrame contains a specific value. It returns a Boolean column indicating, for each row, whether the element is present in that row's array.
Usage
- array_contains() takes two arguments: the array column and the value to check for.
- It is commonly used in filtering operations or when analyzing the composition of array data.
Create Spark Session and sample DataFrame
from pyspark.sql import SparkSession
from pyspark.sql.functions import array_contains, col

# Initialize Spark Session
spark = SparkSession.builder.appName("arrayContainsExample").getOrCreate()

# Sample DataFrame with Array Column
data = [(["Java", "Python", "C++"],), (["Spark", "Java", "C++"],), (["Python", "Scala"],)]
columns = ["Languages"]
df = spark.createDataFrame(data, columns)
df.show()
Output:
+-------------------+
|          Languages|
+-------------------+
|[Java, Python, C++]|
| [Spark, Java, C++]|
|    [Python, Scala]|
+-------------------+
Example: Use array_contains() to return a Boolean column
- array_contains(col("Languages"), "Python"): takes the Languages column and returns a Boolean column indicating whether each row's array contains the value "Python".
- alias("containsPython"): renames the resulting Boolean column to "containsPython".
- df.select(): selects the Languages column and the Boolean column returned by array_contains().
df.select(col("Languages"), array_contains(col("Languages"), "Python").alias("containsPython")).show()
Output:
+-------------------+--------------+
|          Languages|containsPython|
+-------------------+--------------+
|[Java, Python, C++]|          true|
| [Spark, Java, C++]|         false|
|    [Python, Scala]|          true|
+-------------------+--------------+
Example: Use array_contains() to keep only rows that contain the value "Python"
- df.filter(): keeps only the rows whose Languages array contains "Python", based on the Boolean column returned by array_contains().
df.filter(array_contains(col("Languages"), "Python")).show(truncate=False)
Output:
+-------------------+
|Languages          |
+-------------------+
|[Java, Python, C++]|
|[Python, Scala]    |
+-------------------+
# Stop the Spark Session
spark.stop()