
rand()
The rand() function generates a column of random numbers. Each row in the DataFrame receives an independently drawn value from a uniform distribution on [0.0, 1.0), so 0 can occur but 1 cannot.
Usage
- rand() can be called without any arguments to generate random numbers; it also accepts an optional integer seed.
- It is often used in scenarios requiring random sampling, testing, or data anonymization.
Create Spark Session and sample DataFrame
from pyspark.sql import SparkSession
from pyspark.sql.functions import rand

# Initialize Spark Session
spark = SparkSession.builder.appName("randExample").getOrCreate()

# Sample DataFrame
data = [("James",), ("Anna",), ("Robert",)]
columns = ["Name"]
df = spark.createDataFrame(data, columns)
df.show()
Output:
+------+
|  Name|
+------+
| James|
|  Anna|
|Robert|
+------+
Example: Use rand() to Add Random Numbers
random_df = df.withColumn("Random_Number", rand())
random_df.show()
Output (the values will differ on each run, since no seed is set):
+------+-------------------+
|  Name|      Random_Number|
+------+-------------------+
| James| 0.7539710459762303|
|  Anna|  0.964547021232646|
|Robert|0.36741199389216905|
+------+-------------------+
# Stop the Spark Session
spark.stop()