
to_date()
The to_date() function converts a string column to a date column, using either the default date format or a format you specify.
Create Spark Session and sample DataFrame
from pyspark.sql import SparkSession
from pyspark.sql.functions import to_date

# Initialize Spark Session
spark = SparkSession.builder.appName("toDateExample").getOrCreate()

# Sample DataFrame with Date Strings
data = [("2021-01-01",), ("2021-06-24",), ("07/11/2021",)]
columns = ["DateString"]
df = spark.createDataFrame(data, columns)
df.show()
Output:
+----------+
|DateString|
+----------+
|2021-01-01|
|2021-06-24|
|07/11/2021|
+----------+
Example: Use to_date() to convert a string column to a Date column
- The DataFrame df contains a column DateString with date information in string format.
- to_date(df.DateString, "yyyy-MM-dd") converts the DateString column to a Date type using the specified format yyyy-MM-dd. The value in the third row, "07/11/2021", doesn't match that format, so it becomes null in the new column.
- The converted dates are stored in a new column Date.
date_df = df.withColumn("Date", to_date(df.DateString, "yyyy-MM-dd"))
date_df.show()
Output:
+----------+----------+
|DateString| Date|
+----------+----------+
|2021-01-01|2021-01-01|
|2021-06-24|2021-06-24|
|07/11/2021| NULL|
+----------+----------+
# Stop the Spark Session
spark.stop()