Find Unique Values in Columns

The unique() method of a pandas Series returns the distinct values in a column, in order of first appearance. It works with both categorical and numerical columns.

This tutorial uses the classic Iris dataset, which can be downloaded here: Iris dataset.

import pandas as pd

# Load the Iris data (assumes Iris.csv is in the working directory)
df = pd.read_csv('Iris.csv')

1. Find Unique Values in a Categorical Column

In our dataset, the Species variable is categorical. We can use the unique() method to identify the species present in our data.

df['Species'].unique()

Output:
array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object)

The result shows that there are three distinct species: Iris-setosa, Iris-versicolor, and Iris-virginica.
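Two details of unique() are easy to miss: the values come back in order of first appearance (not sorted), and missing values (NaN) are kept as one of the "unique" entries. The sketch below illustrates both on a small, hypothetical toy column rather than the real Iris file.

```python
import numpy as np
import pandas as pd

# Hypothetical toy column: values repeat and one entry is missing (NaN)
species = pd.Series(
    ["Iris-virginica", "Iris-setosa", np.nan, "Iris-virginica", "Iris-setosa"],
    name="Species",
)

# unique() keeps the order of first appearance and includes NaN
uniq = species.unique()
print(uniq)

# For a sorted list without missing values, drop NaN first and then sort
print(sorted(species.dropna().unique()))  # ['Iris-setosa', 'Iris-virginica']
```

Dropping NaN before sorting also avoids comparing strings with a float NaN, which sorted() cannot do.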

2. Find Unique Values in a Numerical Column

We can also apply unique() to numerical columns, although for continuous measurements the result is often a long list that is less informative. Let's see what happens when we apply it to the SepalLengthCm column.

df['SepalLengthCm'].unique()

Output:

array([5.1, 4.9, 4.7, 4.6, 5. , 5.4, 4.4, 4.8, 4.3, 5.8, 5.7, 5.2, 5.5, 4.5, 5.3, 7. , 6.4, 6.9, 6.5, 6.3, 6.6, 5.9, 6. , 6.1, 5.6, 6.7, 6.2, 6.8, 7.1, 7.6, 7.3, 7.2, 7.7, 7.4, 7.9])

As the output shows, the SepalLengthCm column contains many distinct values. To get the number of unique values directly, we can use the nunique() method:

df['SepalLengthCm'].nunique()

Output:

35

The result tells us that the SepalLengthCm column has 35 unique values.
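Note that nunique() skips missing values by default (dropna=True), whereas unique() keeps NaN, so len(unique()) and nunique() can differ by one. The minimal sketch below shows this on a small, hypothetical DataFrame that reuses the Iris column names, along with DataFrame.nunique() for counting every column at once and np.sort() for making a long numeric list easier to read.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for two Iris columns
df = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 5.1, np.nan, 7.0],
    "Species": ["Iris-setosa", "Iris-setosa", "Iris-setosa",
                "Iris-versicolor", "Iris-versicolor"],
})

col = df["SepalLengthCm"]

# nunique() ignores NaN by default ...
n_default = col.nunique()                # 3 (5.1, 4.9, 7.0)
# ... but can count NaN as its own value, matching len(unique())
n_with_nan = col.nunique(dropna=False)   # 4
len_unique = len(col.unique())           # 4

# DataFrame.nunique() reports the distinct count for every column
per_column = df.nunique()
print(per_column)

# Sorting makes a long list of numeric unique values easier to scan
print(np.sort(col.dropna().unique()))    # [4.9 5.1 7. ]
```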

3. Find Unique Values and Their Counts

Alternatively, the value_counts() method returns not only each unique value but also the number of times it appears in the data:

df['Species'].value_counts()

Output:

Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50
Name: Species, dtype: int64

The result above shows that there are exactly 50 records for each species. This method is handy for quickly checking whether the classes are balanced before training a machine learning model.
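For a balance check, relative frequencies are often easier to read than raw counts. value_counts(normalize=True) returns the share of each value instead of its count. The sketch below uses a deliberately imbalanced, hypothetical label column; the min/max class ratio at the end is just one simple heuristic assumed for illustration, not a built-in pandas feature.

```python
import pandas as pd

# Hypothetical, deliberately imbalanced label column
labels = pd.Series(["setosa"] * 6 + ["versicolor"] * 3 + ["virginica"] * 1,
                   name="Species")

counts = labels.value_counts()                # absolute counts, largest first
shares = labels.value_counts(normalize=True)  # relative frequencies summing to 1

print(counts)
print(shares)

# Assumed heuristic: ratio of the rarest to the most common class (1.0 = balanced)
balance_ratio = counts.min() / counts.max()
print(f"min/max class ratio: {balance_ratio:.2f}")  # min/max class ratio: 0.17
```

On the full Iris data, where every species has 50 records, this ratio would be 1.0.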
