Replace Values with Different Values
When you need to replace values within a DataFrame, the replace() function is a key tool. It lets you swap specific values for different ones, which helps with tasks like data cleaning and standardization. In this tutorial, we'll explore how to use replace() effectively for value replacement in your data.
Usage
replace(to_replace, value):
- to_replace: This parameter represents the current value in the dataset that you wish to substitute.
- value: This parameter signifies the replacement value you want to use.
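As a minimal sketch of these two parameters, the snippet below builds a small, made-up DataFrame (the column names and values are illustrative and not taken from the Iris file) and replaces one value. Note that replace() returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

# Hypothetical toy data, used only to illustrate the two parameters
toy = pd.DataFrame({"fruit": ["apple", "pear", "apple"], "count": [3, 5, 2]})

# Swap every "apple" for "APPLE"; the result is a new DataFrame
toy_after = toy.replace(to_replace="apple", value="APPLE")

print(toy_after["fruit"].tolist())  # values after replacement
print(toy["fruit"].tolist())        # the original DataFrame is unchanged
```

Because the original is untouched, the result has to be assigned to a variable (as the examples in this tutorial do) for the change to be kept.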
This tutorial uses the classic Iris dataset, which can be downloaded here: Iris dataset.
import pandas as pd
df = pd.read_csv('Iris.csv')
1. Replace One Value with the replace() Function
We'll replace "Iris-setosa" with "IrisSetosa" in the Species column, and assign the result to a new variable df_after.
df_after = df.replace(to_replace='Iris-setosa', value='IrisSetosa')
df_after.head(5)
Output:
| | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | IrisSetosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | IrisSetosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | IrisSetosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | IrisSetosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | IrisSetosa |
As shown above, every "Iris-setosa" value in the Species column has been changed to "IrisSetosa".
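Called on the whole DataFrame, replace() looks for the value in every column, not only Species. If the replacement should be limited to one column, one option is a nested dictionary keyed by column name. The sketch below uses a tiny stand-in DataFrame rather than the real Iris file; the Note column is hypothetical and exists only to show the difference.

```python
import pandas as pd

# Small stand-in for the Iris data; "Note" is a hypothetical extra column
df_small = pd.DataFrame({
    "Species": ["Iris-setosa", "Iris-versicolor"],
    "Note": ["Iris-setosa", "checked"],
})

# Nested dict {column: {old_value: new_value}} limits the change to Species
scoped = df_small.replace({"Species": {"Iris-setosa": "IrisSetosa"}})

print(scoped["Species"].tolist())  # Species is updated
print(scoped["Note"].tolist())     # Note keeps its original values
```

With the Iris data this makes no visible difference, since the species names appear only in the Species column, but it can matter in datasets where the same value occurs in several columns.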
2. Replace Multiple Values with the replace() Function
In this example, we'll replace Iris-setosa with IrisSetosa and Iris-virginica with IrisVirginica.
- to_replace: In our case, this parameter receives a list containing Iris-setosa and Iris-virginica.
- value: This parameter receives a list containing IrisSetosa and IrisVirginica.
The two lists are matched by position, so they must have the same length.
df_after = df.replace(to_replace=['Iris-setosa', 'Iris-virginica'], value=['IrisSetosa', 'IrisVirginica'])
df_after.head(3)
Output:
| | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | IrisSetosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | IrisSetosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | IrisSetosa |
df_after.tail(3)
Output:
| | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 147 | 148 | 6.5 | 3.0 | 5.2 | 2.0 | IrisVirginica |
| 148 | 149 | 6.2 | 3.4 | 5.4 | 2.3 | IrisVirginica |
| 149 | 150 | 5.9 | 3.0 | 5.1 | 1.8 | IrisVirginica |
Iris-setosa and Iris-virginica have now been changed to IrisSetosa and IrisVirginica, respectively.
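Two parallel lists can drift out of sync as the number of replacements grows. As an alternative sketch, replace() also accepts a single dictionary that maps each old value to its new value. The small DataFrame below is a stand-in for the Iris data, built inline so the snippet runs on its own.

```python
import pandas as pd

# Stand-in for the Species column of the Iris data
species = pd.DataFrame(
    {"Species": ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]}
)

# Each key is an old value, each dict value is its replacement
mapping = {"Iris-setosa": "IrisSetosa", "Iris-virginica": "IrisVirginica"}
renamed = species.replace(mapping)

print(renamed["Species"].tolist())  # Iris-versicolor is not in the mapping, so it stays as-is
```

Keeping each old value next to its replacement tends to be easier to read and review than two separate lists.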
Great! Next we'll explore how to rename columns.