Find Unique Values in Columns
The pandas unique() method returns the distinct values in a column. It works with both categorical and numerical columns.
This tutorial uses the classic Iris dataset, which can be downloaded here: Iris dataset.
import pandas as pd
df = pd.read_csv('Iris.csv')
1. Find Unique Values in a Categorical Column
In our dataset, the Species variable is categorical. We can use the unique() method to identify the species present in our data:
df['Species'].unique()
Output:
array(['Iris-setosa', 'Iris-versicolor', 'Iris-virginica'], dtype=object)
The result shows that there are 3 distinct species: Iris-setosa, Iris-versicolor, and Iris-virginica.
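One detail worth knowing is that unique() returns values in order of first appearance rather than sorted order. The sketch below illustrates this with a small hypothetical Series standing in for the Species column (the toy data is an assumption, not the real Iris file):

```python
import pandas as pd

# Hypothetical toy data standing in for df['Species']
species = pd.Series(
    ["Iris-virginica", "Iris-setosa", "Iris-virginica",
     "Iris-versicolor", "Iris-setosa"],
    name="Species",
)

# unique() keeps the order in which values first appear
unique_species = species.unique()

# Sort explicitly if a stable, alphabetical order is needed
sorted_species = sorted(unique_species)

print(unique_species)
print(sorted_species)
```

In the real Iris file the rows happen to be grouped by species, so the order of appearance and the alphabetical order coincide.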
2. Find Unique Values in a Numerical Column
We can also use the unique() method on numerical columns, although it is often less informative there because continuous measurements tend to have many distinct values. Let's see what it returns for the SepalLengthCm column:
df['SepalLengthCm'].unique()
Output:
array([5.1, 4.9, 4.7, 4.6, 5. , 5.4, 4.4, 4.8, 4.3, 5.8, 5.7, 5.2, 5.5,
       4.5, 5.3, 7. , 6.4, 6.9, 6.5, 6.3, 6.6, 5.9, 6. , 6.1, 5.6, 6.7,
       6.2, 6.8, 7.1, 7.6, 7.3, 7.2, 7.7, 7.4, 7.9])
As shown, the SepalLengthCm column contains many distinct values. To get the count of these unique values directly, we can use the nunique() method:
df['SepalLengthCm'].nunique()
Output:
35
The result tells us that the SepalLengthCm column has 35 unique values.
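By default nunique() ignores missing values, and it can also be called on a whole DataFrame to count distinct values per column. Here is a minimal sketch using a small hypothetical DataFrame (the toy values, including the NaN, are assumptions for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing measurement
toy = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 5.1, np.nan, 4.7],
    "Species": ["Iris-setosa", "Iris-setosa", "Iris-setosa",
                "Iris-versicolor", "Iris-versicolor"],
})

# NaN is excluded from the count by default
n_default = toy["SepalLengthCm"].nunique()

# dropna=False counts NaN as its own distinct value
n_with_nan = toy["SepalLengthCm"].nunique(dropna=False)

# Calling nunique() on the DataFrame returns one count per column
per_column = toy.nunique()

print(n_default, n_with_nan)
print(per_column)
```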
3. Find Unique Values and Their Counts
Alternatively, the value_counts() method displays not only all unique values but also the number of times each value appears in the data:
df['Species'].value_counts()
Output:
Iris-setosa        50
Iris-versicolor    50
Iris-virginica     50
Name: Species, dtype: int64
The result shows exactly 50 records for each species. This method is handy for quickly checking whether the classes in a dataset are balanced before training a machine learning model.
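For a balance check, proportions are often easier to read than raw counts. The sketch below uses value_counts(normalize=True) on a deliberately imbalanced, hypothetical label column; the toy labels and the max/min ratio are illustrative choices, not part of the Iris data:

```python
import pandas as pd

# Hypothetical, deliberately imbalanced labels
labels = pd.Series(
    ["Iris-setosa"] * 3 + ["Iris-versicolor"] * 2 + ["Iris-virginica"] * 1,
    name="Species",
)

# Absolute counts per class, most frequent first
counts = labels.value_counts()

# Relative frequencies that sum to 1
shares = labels.value_counts(normalize=True)

# One simple imbalance indicator: largest class size over smallest
imbalance_ratio = counts.max() / counts.min()

print(shares)
print(imbalance_ratio)
```

A ratio close to 1 suggests balanced classes, as in the Iris dataset, where every species has 50 records.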