
show()
show() is a helpful method for printing a Spark DataFrame as a table in the console. This is especially useful for quickly inspecting data.
In Databricks, there is another function, display(), that can be used to render DataFrame content. Note that display() is a Databricks-specific function, while show() is part of the standard Spark DataFrame API.
Usage of show()
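- Syntax:
DataFrame.show(n=20, truncate=True, vertical=False)
- n: Number of rows to display (default is 20).
- truncate: Whether to truncate long strings (default is True, which shortens strings longer than 20 characters; an integer sets a custom maximum length).
- vertical: Whether to print each row vertically, one line per column value (default is False).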
Create Spark Session and sample DataFrame
from pyspark.sql import SparkSession
# Initialize Spark Session
spark = SparkSession.builder.appName("showExample").getOrCreate()
# Sample DataFrame
data = [("James", 34, 'james12345@gmail.com'), ("Anna", 28, 'ann123455@gmail.com'), ("Robert", 45, 'robert23141243@gmail.com')]
columns = ["Name", "Age", "Email"]
df = spark.createDataFrame(data, columns)
Example: truncate = True vs truncate = False
Compare how the third person's email is displayed in the two outputs: it is longer than 20 characters, so it is cut off by default.
# truncate defaults to True
df.show()
# set truncate to False
df.show(truncate=False)
Output:
+------+---+--------------------+
|  Name|Age|               Email|
+------+---+--------------------+
| James| 34|james12345@gmail.com|
|  Anna| 28| ann123455@gmail.com|
|Robert| 45|robert23141243@gm...|
+------+---+--------------------+

+------+---+------------------------+
|Name  |Age|Email                   |
+------+---+------------------------+
|James |34 |james12345@gmail.com    |
|Anna  |28 |ann123455@gmail.com     |
|Robert|45 |robert23141243@gmail.com|
+------+---+------------------------+
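The cut-off in the first output follows a simple rule: a string longer than the limit keeps its first (limit - 3) characters and gets "..." appended. Below is a rough pure-Python sketch of that rule; truncate_cell is a hypothetical helper written for illustration, not part of the Spark API, and it only approximates what show() does to each string cell.

```python
def truncate_cell(value: str, truncate: int = 20) -> str:
    """Approximate how show() shortens one string cell (illustrative helper only)."""
    # A limit of 0 or less means no truncation, like truncate=False.
    if truncate <= 0 or len(value) <= truncate:
        return value
    # Very small limits leave no room for the "..." suffix.
    if truncate < 4:
        return value[:truncate]
    return value[: truncate - 3] + "..."


# The 24-character email is shortened; the 20-character one fits as-is.
print(truncate_cell("robert23141243@gmail.com"))  # robert23141243@gm...
print(truncate_cell("james12345@gmail.com"))      # james12345@gmail.com
```

This is why only Robert's email is shortened in the default output: it is the only value longer than 20 characters.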
# Stop the Spark Session
spark.stop()