Transform Data with Custom Functions
In this lesson, we will explore the use of custom functions to perform data transformations on columns.
In Pandas, the apply() and applymap() functions let you run your own functions on the values of a single column or of multiple columns. This makes them flexible tools for manipulating data.
This tutorial uses the classic Iris dataset, which can be downloaded here: Iris dataset.
import pandas as pd
df = pd.read_csv('Iris.csv')
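If the CSV file is not at hand, a tiny stand-in DataFrame with the same column names is enough to follow along. The sketch below is only an illustration: the name sample_df is hypothetical, the numeric values are the first five rows shown later in this lesson, and the mixed-case spelling of the species label is an assumption about the raw file.

```python
import pandas as pd

# Hypothetical stand-in for Iris.csv, built from the first five rows shown in this lesson.
sample_df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "SepalLengthCm": [5.1, 4.9, 4.7, 4.6, 5.0],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.1, 3.6],
    "PetalLengthCm": [1.4, 1.4, 1.3, 1.5, 1.4],
    "PetalWidthCm": [0.2, 0.2, 0.2, 0.2, 0.2],
    # Assumed spelling of the label before any transformation.
    "Species": ["Iris-setosa"] * 5,
})

# Check the column types before transforming anything.
print(sample_df.dtypes)
```

The four measurement columns are floats, which is why the first example converts them to integers.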
1. Use a Custom Function to Convert Float Values to Integers via the apply() Function
Let's explore an example of converting the float values in the SepalLengthCm column to integers using the apply() function.
First, we'll create a custom function called toInteger(). This function converts a numerical value into an integer using the built-in int() function, which truncates the decimal part rather than rounding.
def toInteger(x):
    return int(x)
Next, we'll call the apply() function and pass our custom function as an argument. Note that Series.apply() calls the function on each element, whereas DataFrame.apply() passes whole rows or columns to it. Because we want an element-wise conversion of a single column, we select the Series with df['SepalLengthCm'] and then call apply() on it.
df['SepalLengthCm'].apply(func=toInteger).head(5)
Output:
0 5
1 4
2 4
3 4
4 5
Name: SepalLengthCm, dtype: int64
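The returned values of the SepalLengthCm column have been converted to integers. Note that apply() returns a new Series, so df itself stays unchanged unless the result is assigned back.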
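To keep the converted values, the result of apply() has to be stored somewhere. The following is a minimal sketch under stated assumptions: demo is a small hypothetical DataFrame rather than the full Iris data, and SepalLengthInt is a made-up column name used only for illustration.

```python
import pandas as pd

def toInteger(x):
    # int() truncates toward zero, so 4.9 becomes 4.
    return int(x)

# Hypothetical sample values, not the full Iris dataset.
demo = pd.DataFrame({"SepalLengthCm": [5.1, 4.9, 4.7]})

# apply() returns a new Series; the original column still holds floats.
converted = demo["SepalLengthCm"].apply(toInteger)

# Assigning the result to a (hypothetical) new column persists it.
demo["SepalLengthInt"] = converted

print(demo)
```

Writing to a separate column keeps the original measurements available; assigning to demo['SepalLengthCm'] instead would overwrite them.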
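To apply the toInteger function to every element of multiple columns, we can use the applymap() method, which is specifically designed for DataFrames. (In pandas 2.1 and later, applymap() is deprecated in favor of DataFrame.map(), which works the same way.)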
df[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']].applymap(toInteger).head(3)
Output:
| | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm |
|---|---|---|---|---|
| 0 | 5 | 3 | 1 | 0 |
| 1 | 4 | 3 | 1 | 0 |
| 2 | 4 | 3 | 1 | 0 |
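The difference between element-wise and column-wise application on a DataFrame can be sketched as follows. This is an illustration only: demo is a small hypothetical DataFrame, and the check for DataFrame.map is there because that method exists only in newer pandas releases, while older ones offer applymap() alone.

```python
import pandas as pd

def toInteger(x):
    return int(x)

# Hypothetical sample values, not the full Iris dataset.
demo = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9],
    "SepalWidthCm": [3.5, 3.0],
})

# Element-wise: the function receives one value at a time.
if hasattr(pd.DataFrame, "map"):
    elementwise = demo.map(toInteger)       # newer pandas
else:
    elementwise = demo.applymap(toInteger)  # older pandas

# Column-wise: DataFrame.apply() passes each column to the function as a Series.
column_max = demo.apply(lambda col: col.max())

print(elementwise)
print(column_max)
```

Passing toInteger to demo.apply() directly would fail, because int() cannot convert a whole Series of several values at once.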
2. Use a Custom Function to Convert Text to Uppercase via the apply() Function
In this example, we will demonstrate how to convert the text in the Species column to uppercase letters.
We'll begin by creating a custom function called upperCase. This function takes a string value as input and returns it in uppercase.
def upperCase(x):
    return x.upper()
Next, we use the apply() function to transform the text in the Species column into uppercase.
df['Species'] = df['Species'].apply(upperCase)
df.head(5)
Output:
| | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | IRIS-SETOSA |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | IRIS-SETOSA |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | IRIS-SETOSA |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | IRIS-SETOSA |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | IRIS-SETOSA |
As shown above, all text in the Species column has been converted to uppercase letters.
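A custom function passed to apply() is called on every value, so it has to cope with whatever the column contains. If a text column held missing values, calling .upper() on them would raise an error. The sketch below shows one possible guard; upperCaseSafe and the sample Series are hypothetical and not part of the lesson's dataset.

```python
import pandas as pd

def upperCaseSafe(x):
    # Hypothetical variant of upperCase: non-string values (such as missing
    # entries) are returned unchanged instead of raising an AttributeError.
    return x.upper() if isinstance(x, str) else x

# Hypothetical sample with one missing entry.
demo = pd.Series(["Iris-setosa", None, "Iris-virginica"], name="Species")

result = demo.apply(upperCaseSafe)
print(result)
```

The string entries come back in uppercase, while the missing entry stays missing.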
Excellent! Once we've mastered the apply() function, performing customized transformations on columns becomes straightforward. In the next tutorial, we'll learn how to merge DataFrames. Stay tuned!