
count()
The count() function returns the total number of rows in a DataFrame. Combined with groupBy(), it counts the number of rows in each group instead.
Create Spark Session and sample DataFrame
from pyspark.sql import SparkSession
# Initialize Spark Session
spark = SparkSession.builder.appName("countExample").getOrCreate()

# Sample DataFrame
data = [("James", "Sales"), ("Ana", "Sales"), ("Robert", "IT"), ("Maria", "IT")]
columns = ["Employee Name", "Department"]
df = spark.createDataFrame(data, columns)
df.show()
Output:
+-------------+----------+
|Employee Name|Department|
+-------------+----------+
| James| Sales|
| Ana| Sales|
| Robert| IT|
| Maria| IT|
+-------------+----------+
Example: Use count() to count the total number of rows in the DataFrame
total_count = df.count()
print("Total Row Count:", total_count)
Output:
Total Row Count: 4
Example: Use count() with groupBy() to count the number of rows in each group
groupBy("Department"): groups the DataFrame by the Department column.
count(): counts the number of rows in each department.
# Counting Rows by Group
grouped_count = df.groupBy("Department").count()
grouped_count.show()
Output:
+----------+-----+
|Department|count|
+----------+-----+
| Sales| 2|
| IT| 2|
+----------+-----+
# Stop the Spark Session
spark.stop()