
current_date()
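The current_date() function returns the current date according to the system on which Spark is running. The date is determined once, at the start of query evaluation, so every call within the same query returns the same value.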
Usage
- current_date() does not take any arguments and returns the current system date.
- It can be used to add a date column to a DataFrame with the current date.
Create Spark Session and sample DataFrame
from pyspark.sql import SparkSession
from pyspark.sql.functions import current_date

# Initialize Spark Session
spark = SparkSession.builder.appName("currentDateExample").getOrCreate()

# Sample DataFrame
data = [("James",), ("Anna",), ("Robert",)]
columns = ["Name"]
df = spark.createDataFrame(data, columns)
df.show()
Output:
+------+
|  Name|
+------+
| James|
|  Anna|
|Robert|
+------+
Example: Add Current Date to DataFrame
- The DataFrame df initially contains only a Name column.
- withColumn("Current Date", current_date()): This line of code adds a new column, Current Date, to the DataFrame, containing the current date on each row.
- The function current_date() automatically retrieves the current system date.
df_with_current_date = df.withColumn("Current Date", current_date())
df_with_current_date.show()
Output:
+------+------------+
|  Name|Current Date|
+------+------------+
| James|  2023-11-28|
|  Anna|  2023-11-28|
|Robert|  2023-11-28|
+------+------------+
# Stop the Spark Session
spark.stop()