
ltrim(), rtrim() and trim()
The ltrim() and rtrim() functions remove leading (left-side) and trailing (right-side) whitespace, respectively, from each string in a column. The trim() function removes whitespace from both sides.
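Before bringing Spark in, it may help to see the three behaviors on a single value. The following is a minimal plain-Python sketch, offered as an analogy only: it uses Python's built-in str.lstrip, str.rstrip and str.strip restricted to the space character, and the helper names ltrim_value, rtrim_value and trim_value are hypothetical, not part of PySpark.

```python
def ltrim_value(s: str) -> str:
    """Drop leading spaces only (analogous to ltrim applied to one value)."""
    return s.lstrip(" ")


def rtrim_value(s: str) -> str:
    """Drop trailing spaces only (analogous to rtrim applied to one value)."""
    return s.rstrip(" ")


def trim_value(s: str) -> str:
    """Drop spaces on both sides (analogous to trim applied to one value)."""
    return s.strip(" ")


# Assumed sample value with two spaces on each side.
sample = "  Anna  "

# repr() keeps the quotes, so the remaining spaces stay visible.
print(repr(ltrim_value(sample)))
print(repr(rtrim_value(sample)))
print(repr(trim_value(sample)))
```

The Spark functions apply the same idea to every row of a column instead of to one Python string.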
Create Spark Session and sample DataFrame
from pyspark.sql import SparkSession
from pyspark.sql.functions import ltrim, rtrim, trim

# Initialize Spark Session
spark = SparkSession.builder.appName("ltrimExample").getOrCreate()

# Sample DataFrame with a string column (values carry leading and/or trailing spaces)
data = [("  James",), ("  Anna  ",), ("Robert  ",)]
columns = ["Name"]
df = spark.createDataFrame(data, columns)
df.show()
Output:
+--------+
|    Name|
+--------+
|   James|
|  Anna  |
|Robert  |
+--------+
Example: Use ltrim() to Remove Leading Whitespace
- ltrim(df.Name): Removes any leading whitespace from the Name column in the DataFrame df.
- The cleaned, trimmed string values are stored in a new column leftTrim.
df.withColumn("leftTrim", ltrim(df.Name)).show(truncate=False)
Output:
+--------+--------+
|Name    |leftTrim|
+--------+--------+
|  James |James   |
|  Anna  |Anna    |
|Robert  |Robert  |
+--------+--------+
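One caveat when reading output like the table above: with truncate=False, show() pads each cell on the right with spaces, so trailing spaces that belong to a value cannot be told apart from padding. Below is a minimal plain-Python sketch of one way to make the difference visible, by wrapping each value in brackets and reporting its length. The helper mark and the sample list are illustrative assumptions, and str.lstrip(" ") stands in for ltrim here.

```python
def mark(value: str) -> str:
    """Wrap a value in brackets and append its length so edge spaces are visible."""
    return f"[{value}] len={len(value)}"


# Illustrative values with leading and/or trailing spaces (assumed sample data).
names = ["  James", "  Anna  ", "Robert  "]

for name in names:
    left_trimmed = name.lstrip(" ")  # plain-Python stand-in for ltrim
    print(f"{mark(name)} -> {mark(left_trimmed)}")
```

In Spark itself, a similar check could be done by comparing pyspark.sql.functions.length of the column before and after trimming.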
Example: Use rtrim() to Remove Trailing Whitespace
- rtrim(df.Name): Removes any trailing whitespace from the Name column in the DataFrame df.
- The cleaned, trimmed string values are stored in a new column rightTrim.
df.withColumn("rightTrim", rtrim(df.Name)).show(truncate=False)
Output:
+--------+---------+
|Name    |rightTrim|
+--------+---------+
|  James |  James  |
|  Anna  |  Anna   |
|Robert  |Robert   |
+--------+---------+
Example: Use trim() to Remove Whitespace from Both Sides
- trim(df.Name): Removes any whitespace on both sides of each value in the Name column in the DataFrame df.
- The cleaned, trimmed string values are stored in a new column trim.
df.withColumn("trim", trim(df.Name)).show(truncate=False)
Output:
+--------+------+
|Name    |trim  |
+--------+------+
|  James |James |
|  Anna  |Anna  |
|Robert  |Robert|
+--------+------+
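Since trim() strips both ends, it should give the same result as applying a left trim and a right trim one after the other, in either order. The check below is a plain-Python sketch of that property using str.strip, str.lstrip and str.rstrip on assumed sample values. It does not call Spark.

```python
# Assumed sample values; the space character is the only thing stripped here.
names = ["  James", "  Anna  ", "Robert  "]

both_sides = [n.strip(" ") for n in names]
left_then_right = [n.lstrip(" ").rstrip(" ") for n in names]
right_then_left = [n.rstrip(" ").lstrip(" ") for n in names]

print(both_sides)
# True when all three approaches agree on every value.
print(both_sides == left_then_right == right_then_left)
```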
# Stop the Spark Session
spark.stop()