
Save PySpark DataFrame to Files
This tutorial explains how to save a PySpark DataFrame to various file types using the DataFrameWriter returned by `df.write`.
We'll cover the following topics:
Functions for Writing to Different File Types

| File Type | Function |
|---|---|
| CSV | `df.write.csv(path, options)` |
| JSON | `df.write.json(path, options)` |
| Parquet | `df.write.parquet(path)` |
| ORC | `df.write.orc(path)` |
| Text | `df.write.text(path)` |
Different Modes of File Writing
The `mode()` method specifies how data is written to the target location. There are four commonly used modes when writing to a file: `overwrite`, `append`, `ignore`, and `errorIfExists`.
| Mode | Usage |
|---|---|
| overwrite | Overwrites existing data |
| append | Appends new data to existing data |
| ignore | Does not write the new data if the target location already exists |
| errorIfExists | Raises an error if the target location already exists |
Create Spark Session and Sample DataFrame
```python
from pyspark.sql import SparkSession

# Initialize Spark Session
spark = SparkSession.builder.appName("app").getOrCreate()

# Create a sample DataFrame
data = [("James", "Sales", 3000), ("Michael", "Sales", 4600)]
columns = ["Employee Name", "Department", "Salary"]
df = spark.createDataFrame(data, columns)
df.show()
```
Output:

```
+-------------+----------+------+
|Employee Name|Department|Salary|
+-------------+----------+------+
|        James|     Sales|  3000|
|      Michael|     Sales|  4600|
+-------------+----------+------+
```
Save DataFrame to CSV
```python
df.write.csv(path="path/to/save/csv_file", header=True, mode="append")
```
Alternatively, we can set the options outside the `csv()` function using `mode()` and `option()`:

```python
df.write.mode("append").option("header", "true").csv(path="path/to/save/csv_file")
```
Save DataFrame to Parquet
```python
df.write.parquet(path="path/to/save/file", compression="snappy", mode="overwrite")
```

Similarly, you can also write it as:

```python
df.write.mode("overwrite").option("compression", "snappy").parquet(path="path/to/save/file")
```