`stddev_samp()`

The stddev_samp() function calculates the sample standard deviation of a given numeric column in a DataFrame. This statistical measure is used to quantify the amount of variation or spread in a set of data values.

Usage

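stddev_samp() uses the sample formula: the sum of squared deviations from the mean is divided by n - 1 rather than n. This is the appropriate estimate when the rows are a sample drawn from a larger population, as in inferential statistics.
It is an aggregate function, so it returns a single value for the column (or one value per group) that summarizes how widely the data is spread around its mean.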
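To make the sample formula concrete, here is a minimal plain-Python sketch of the same calculation, without Spark. The function name sample_stddev and the input values are hypothetical and chosen only for illustration. The result is cross-checked against statistics.stdev from the standard library, which also uses the n - 1 divisor.

```python
import math
import statistics


def sample_stddev(values):
    """Sample standard deviation: divide the squared deviations by n - 1."""
    n = len(values)
    if n < 2:
        # Design choice of this sketch: refuse inputs with fewer than two values.
        raise ValueError("need at least two values for a sample standard deviation")
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - 1))


# Hypothetical values, chosen only for illustration.
values = [2, 4, 4, 4, 5, 5, 7, 9]

manual = sample_stddev(values)
reference = statistics.stdev(values)  # standard-library sample standard deviation
print(manual, reference)
```

The two printed numbers should agree up to floating-point rounding, since both divide by n - 1.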

Create Spark Session and sample DataFrame

from pyspark.sql import SparkSession
from pyspark.sql.functions import stddev_samp

# Initialize Spark Session
spark = SparkSession.builder.appName("stddevSampExample").getOrCreate()

# Sample DataFrame
data = [("James", 23), ("Anna", 30), ("Robert", 34), ("Maria", 29)]
columns = ["Name", "Age"]
df = spark.createDataFrame(data, columns)
df.show()

Output:
+------+---+
|  Name|Age|
+------+---+
| James| 23|
|  Anna| 30|
|Robert| 34|
| Maria| 29|
+------+---+

Example: Use `stddev_samp()` to compute standard deviation of values in a column

stddev_samp("Age"): computes the sample standard deviation of the values in the Age column.
alias("Sample StdDev Age"): renames the resulting column to Sample StdDev Age.

stddev_age_df = df.select(stddev_samp("Age").alias("Sample StdDev Age"))
stddev_age_df.show()

Output:
+-----------------+
|Sample StdDev Age|
+-----------------+
|4.546060565661952|
+-----------------+
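The value in the output above can be checked by hand without Spark. The following is a small sketch using only Python's standard library. It recomputes the sample standard deviation of the four ages and, for contrast, the population standard deviation (which divides by n and corresponds to what Spark exposes as stddev_pop). The variable names are illustrative.

```python
import math
import statistics

ages = [23, 30, 34, 29]  # the Age column from the sample DataFrame

# Mean is 29, squared deviations are 36 + 1 + 25 + 0 = 62.
sample_sd = statistics.stdev(ages)       # sqrt(62 / 3), divides by n - 1
population_sd = statistics.pstdev(ages)  # sqrt(62 / 4), divides by n

print(sample_sd)      # approximately 4.546, matching the Spark output
print(population_sd)  # approximately 3.937
```

The sample figure is the larger of the two because dividing by n - 1 instead of n inflates the variance slightly; the gap shrinks as the number of rows grows.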

# Stop the Spark Session
spark.stop()
