Delete Columns and Rows
Excluding specific columns and rows is a common requirement in data wrangling. This may involve removing columns that lack relevance, columns with too much missing data, or rows containing outliers. In this tutorial, we'll explore the drop() function for removing unwanted columns or rows from a DataFrame.
Syntax
drop(labels=None, axis=0, index=None, columns=None, inplace=False):
- labels: The column labels or row index labels to remove, given as a single value or a list of values. The axis parameter determines whether they refer to rows or columns.
- axis: Specifies whether to drop from rows (axis=0, the default) or from columns (axis=1).
- index: The row index labels to remove. Using this parameter is equivalent to using labels with axis=0. See the example below.
- columns: The column labels to remove. Using this parameter is equivalent to using labels with axis=1. See the example below.
- inplace: A boolean value. When set to False (the default), drop() returns a modified copy of the data. When set to True, it removes the columns or rows in place and returns None.
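The difference between the two inplace settings can be sketched as follows. This is a minimal illustration, assuming a small hypothetical DataFrame (df_demo, with illustrative values) rather than the full dataset used in the rest of the tutorial.

```python
import pandas as pd

# Hypothetical three-row stand-in for the Iris data (values are illustrative only).
df_demo = pd.DataFrame({
    "Id": [1, 2, 3],
    "SepalLengthCm": [5.1, 4.9, 4.7],
    "Species": ["Iris-setosa"] * 3,
})

# inplace=False (the default): a new DataFrame is returned and df_demo is untouched.
trimmed = df_demo.drop(columns="SepalLengthCm")
print(list(trimmed.columns))   # ['Id', 'Species']
print(list(df_demo.columns))   # ['Id', 'SepalLengthCm', 'Species']

# inplace=True: df_demo itself is modified and the call returns None.
result = df_demo.drop(columns="SepalLengthCm", inplace=True)
print(result)                  # None
print(list(df_demo.columns))   # ['Id', 'Species']
```

Because the in-place call returns None, assigning its result back to the original variable would discard the data, so it is usually safer to pick one style and use it consistently.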
This tutorial uses the classic Iris dataset, which can be downloaded here: Iris dataset.
import pandas as pd
df = pd.read_csv('Iris.csv')
1. Drop a Single Column from a DataFrame
First, let's use the labels and axis arguments to drop the SepalLengthCm column from the data.
df.drop(labels='SepalLengthCm', axis=1).head(3)
Output:
|  | Id | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|
| 0 | 1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 3.2 | 1.3 | 0.2 | Iris-setosa |
The function returns a dataframe without the dropped SepalLengthCm column.
We can also use the columns argument to drop the same column. Using the columns argument is equivalent to using the labels and axis=1 arguments together.
df.drop(columns='SepalLengthCm').head(3)
Output:
|  | Id | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|
| 0 | 1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 3.2 | 1.3 | 0.2 | Iris-setosa |
2. Drop Multiple Columns
To drop multiple columns, pass a list of the column labels you want to remove.
Use the labels and axis arguments
We'll drop both the SepalLengthCm and SepalWidthCm columns.
df.drop(labels=['SepalLengthCm', 'SepalWidthCm'], axis=1).head(3)
Output:
|  | Id | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|
| 0 | 1 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 1.3 | 0.2 | Iris-setosa |
Use the columns argument
Using the columns parameter has the same effect as using the labels and axis=1 parameters.
df.drop(columns=['SepalLengthCm', 'SepalWidthCm']).head(3)
Output:
|  | Id | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|
| 0 | 1 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 1.3 | 0.2 | Iris-setosa |
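To check that the two spellings really produce the same result, the outputs can be compared directly. The sketch below assumes a small hypothetical DataFrame (sample) that borrows a few Iris column names; DataFrame.equals is used here only as a convenient comparison and is not part of the tutorial's own workflow.

```python
import pandas as pd

# Hypothetical stand-in with a few Iris-style columns (values are illustrative only).
sample = pd.DataFrame({
    "Id": [1, 2, 3],
    "SepalLengthCm": [5.1, 4.9, 4.7],
    "SepalWidthCm": [3.5, 3.0, 3.2],
    "PetalLengthCm": [1.4, 1.4, 1.3],
})

to_drop = ["SepalLengthCm", "SepalWidthCm"]

# Form 1: labels together with axis=1.
via_labels = sample.drop(labels=to_drop, axis=1)

# Form 2: the columns argument on its own.
via_columns = sample.drop(columns=to_drop)

# Both calls return a DataFrame with the same columns and values.
same = via_labels.equals(via_columns)
print(same)                       # True
print(list(via_columns.columns))  # ['Id', 'PetalLengthCm']
```

Which form to use is largely a matter of readability; the columns argument states the intent without requiring the reader to remember what axis=1 means.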
3. Drop a Single Row from a DataFrame
Let's see some examples of how to drop rows.
Use the labels and axis arguments
We're going to drop the first row of the data.
df.drop(labels=0, axis=0).head(3)
Output:
|  | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
By default, a Pandas DataFrame has an integer index starting from 0. Thus, to remove the first row, we pass its index label (0) to the labels argument, along with axis=0.
After dropping the first row, the row index of the resulting dataframe starts from 1.
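One consequence worth illustrating is that drop() matches index labels rather than row positions. The sketch below assumes a small hypothetical DataFrame (rows); the reset_index(drop=True) call is an optional extra step, not something the tutorial requires, for cases where a fresh 0-based index is wanted afterwards.

```python
import pandas as pd

# Hypothetical four-row stand-in (values are illustrative only).
rows = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "SepalLengthCm": [5.1, 4.9, 4.7, 4.6],
})

# Drop the row whose index label is 0.
without_first = rows.drop(labels=0, axis=0)
print(list(without_first.index))  # [1, 2, 3]

# Label 0 no longer exists, so dropping it again raises a KeyError,
# even though the DataFrame still has a "first" row by position.
try:
    without_first.drop(labels=0, axis=0)
    second_drop_failed = False
except KeyError:
    second_drop_failed = True
print(second_drop_failed)         # True

# Optional: renumber the remaining rows from 0 again.
renumbered = without_first.reset_index(drop=True)
print(list(renumbered.index))     # [0, 1, 2]
```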
Use the index argument
To achieve the same outcome of removing the first row, you can alternatively use the index=0 argument.
df.drop(index=0).head(3)
Output:
|  | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
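As with columns, several rows can be removed at once by passing a list of index labels. The following is a minimal sketch, assuming a small hypothetical DataFrame (flowers) with the default integer index.

```python
import pandas as pd

# Hypothetical four-row stand-in (values are illustrative only).
flowers = pd.DataFrame({
    "Id": [1, 2, 3, 4],
    "PetalLengthCm": [1.4, 1.4, 1.3, 1.5],
})

# Drop the rows labelled 0 and 2 using the index argument.
via_index = flowers.drop(index=[0, 2])

# The same rows dropped with labels and axis=0.
via_labels = flowers.drop(labels=[0, 2], axis=0)

print(list(via_index["Id"]))          # [2, 4]
print(via_index.equals(via_labels))   # True
```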
Great job! In this tutorial, we've learned how to drop columns and rows from a DataFrame. In the next tutorial, we'll look at how to find and remove duplicates.