
cast()
The cast() function is a type conversion function that changes the data type of a DataFrame column to a specified type, for example from string to integer or from float to double.
Create Spark Session and sample DataFrame
from pyspark.sql import SparkSession
from pyspark.sql.functions import col
# Initialize Spark Session
spark = SparkSession.builder.appName("castExample").getOrCreate()
# Sample DataFrame with String Type
data = [("1",), ("2",), ("3",)]
columns = ["String_Value"]
df = spark.createDataFrame(data, columns)
df.show()
Output:
+------------+
|String_Value|
+------------+
|           1|
|           2|
|           3|
+------------+
Example: Cast a String type column to an Integer type column
cast_df = df.withColumn("Integer_Value", col("String_Value").cast("integer"))
cast_df.show()
Output:
+------------+-------------+
|String_Value|Integer_Value|
+------------+-------------+
|           1|            1|
|           2|            2|
|           3|            3|
+------------+-------------+
col("String_Value"): the col() function references the String_Value column.
cast("integer"): the cast() function converts each string value in the String_Value column to an integer.
withColumn("Integer_Value", ...): this function returns a new DataFrame that adds a column named "Integer_Value" to the columns of df.
Example: Cast a String type column to a Date type column
Create a new sample DataFrame
data = [("2023-01-01",), ("2023-02-01",), ("2023-03-01",)]
columns = ["String_Value"]
df = spark.createDataFrame(data, columns)
df.show()
Output:
+------------+
|String_Value|
+------------+
|  2023-01-01|
|  2023-02-01|
|  2023-03-01|
+------------+
Cast "String_Value" column to a Date type column
df.withColumn("DateValue", col("String_Value").cast("date")).show()
Output:
+------------+----------+
|String_Value| DateValue|
+------------+----------+
|  2023-01-01|2023-01-01|
|  2023-02-01|2023-02-01|
|  2023-03-01|2023-03-01|
+------------+----------+
col("String_Value"): the col() function references the String_Value column.
cast("date"): the cast() function converts each string value in the String_Value column to a date.
withColumn("DateValue", ...): this function returns a new DataFrame that adds a column named "DateValue" to the columns of df.
# Stop the Spark Session
spark.stop()