Compute Numerical Statistics: Min, Max, Sum, Mean and More
When working with numerical data, it is a common practice to calculate statistics that provide aggregated insights for business reports. In this tutorial, we will guide you through the process of calculating various numerical statistics for a numeric column, including the minimum, maximum, sum, mean, count, and additional measures.
This tutorial uses the classic Iris dataset, which can be downloaded here: Iris dataset.
import pandas as pd
df = pd.read_csv('Iris.csv')
1. Find Minimum Value via min()
Find min of one numerical column
df['SepalLengthCm'].min()
Output:
4.3
Find min of multiple numerical columns
df[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']].min()
Output:
SepalLengthCm 4.3
SepalWidthCm 2.0
PetalLengthCm 1.0
PetalWidthCm 0.1
dtype: float64
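Typing every column name works for four columns but gets tedious on wider tables. As a minimal sketch, the numeric columns can be selected by dtype instead. The tiny `sample` frame below is a made-up stand-in for Iris.csv, so its values are illustrative only.

```python
import pandas as pd

# Hypothetical stand-in for Iris.csv; the values are invented for illustration.
sample = pd.DataFrame({
    "Id": [1, 2, 3],
    "SepalLengthCm": [5.1, 4.9, 6.3],
    "SepalWidthCm": [3.5, 3.0, 3.3],
    "Species": ["Iris-setosa", "Iris-setosa", "Iris-virginica"],
})

# Keep only numeric columns, then drop Id: it is numeric but not a measurement.
numeric = sample.select_dtypes(include="number").drop(columns="Id")

# Column-wise minimum over whatever numeric columns remain.
col_min = numeric.min()
print(col_min)
```

The same `numeric` frame can be reused with the other methods in this tutorial, such as max(), mean() and sum().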
2. Find Maximum Value via max()
Find max of one numerical column
df['SepalLengthCm'].max()
Output:
7.9
Find max of multiple numerical columns
df[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']].max()
Output:
SepalLengthCm 7.9
SepalWidthCm 4.4
PetalLengthCm 6.9
PetalWidthCm 2.5
dtype: float64
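When both extremes are needed, one option is to request them in a single call with agg(). The sketch below assumes a small made-up frame rather than the real Iris data, and also derives the range (max minus min) from the result.

```python
import pandas as pd

# Hypothetical sample; the values are invented for illustration.
sample = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 6.3],
    "PetalLengthCm": [1.4, 1.5, 6.0],
})

# One row per statistic ('min', 'max'), one column per measurement.
extremes = sample.agg(["min", "max"])

# Range of each column, computed from the two rows above.
value_range = extremes.loc["max"] - extremes.loc["min"]

print(extremes)
print(value_range)
```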
Calculate Mean Value via mean()
Calculate mean of one numerical column
df['SepalLengthCm'].mean()
Output:
5.843333333333335
Calculate mean values of multiple numerical columns
df[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']].mean()
Output:
SepalLengthCm 5.843333
SepalWidthCm 3.054000
PetalLengthCm 3.758667
PetalWidthCm 1.198667
dtype: float64
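The Iris columns used here have no missing values, but real business data often does. As a hedged sketch on a made-up Series, this shows how mean() treats NaN by default and how the skipna parameter changes that.

```python
import numpy as np
import pandas as pd

# Hypothetical column with one missing value; not taken from Iris.csv.
lengths = pd.Series([1.0, 2.0, np.nan, 3.0], name="SepalLengthCm")

# Default behaviour: NaN is skipped, so the mean is (1 + 2 + 3) / 3.
mean_default = lengths.mean()

# With skipna=False, a single NaN makes the result NaN.
mean_strict = lengths.mean(skipna=False)

print(mean_default, mean_strict)
```

Using skipna=False can be a cheap way to notice missing data before a number ends up in a report.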
3. Calculate Sum via sum()
Calculate sum of one numerical column
df['SepalLengthCm'].sum()
Output:
876.5
Calculate sum of multiple numerical columns
df[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']].sum()
Output:
SepalLengthCm 876.5
SepalWidthCm 458.1
PetalLengthCm 563.8
PetalWidthCm 179.8
dtype: float64
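One detail worth knowing about sum(): on a column that contains only missing values, it returns 0 by default rather than NaN. The sketch below uses a made-up all-NaN Series and the min_count parameter to ask for NaN instead.

```python
import numpy as np
import pandas as pd

# Hypothetical column in which every value is missing.
all_missing = pd.Series([np.nan, np.nan], name="PetalWidthCm")

# Default: missing values are skipped and the sum of nothing is 0.0.
default_sum = all_missing.sum()

# min_count=1 requires at least one valid value, otherwise the result is NaN.
strict_sum = all_missing.sum(min_count=1)

print(default_sum, strict_sum)
```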
4. Calculate Counts via count()
Calculate total number of records of the data
df.count() returns the number of non-missing records in each column of the dataset.
df.count()
Output:
Id 150
SepalLengthCm 150
SepalWidthCm 150
PetalLengthCm 150
PetalWidthCm 150
Species 150
dtype: int64
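Because count() ignores missing values, it can differ from the number of rows. The following sketch, on a made-up frame with gaps, contrasts count() with len() and checks that the mean equals sum divided by count. The figures reported above agree with this: 876.5 / 150 ≈ 5.8433.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value per column; not the Iris data.
sample = pd.DataFrame({
    "SepalLengthCm": [5.0, np.nan, 7.0],
    "Species": ["Iris-setosa", "Iris-setosa", None],
})

# Non-missing values per column.
per_column = sample.count()

# Total number of rows, regardless of missing values.
n_rows = len(sample)

# The mean is the sum of valid values divided by the count of valid values.
col = sample["SepalLengthCm"]
manual_mean = col.sum() / col.count()

print(per_column)
print(n_rows, manual_mean, col.mean())
```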
5. More Statistical Metrics
In addition to the statistics above, pandas offers other metrics: median (median()), mode (mode()), variance (var()), standard deviation (std()), kurtosis (kurt()) and skewness (skew()).
Quick examples
Calculate Median of Multiple Columns via median()
df[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']].median()
Output:
SepalLengthCm 5.80
SepalWidthCm 3.00
PetalLengthCm 4.35
PetalWidthCm 1.30
dtype: float64
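The median is the 50th percentile, so it can also be obtained from quantile(), which returns other percentiles as well. This is a small sketch on made-up numbers, not on the Iris columns.

```python
import pandas as pd

# Hypothetical values with one large outlier.
values = pd.Series([1.0, 3.0, 2.0, 10.0])

# median() and quantile(0.5) agree (both interpolate between 2.0 and 3.0).
median_value = values.median()
half_quantile = values.quantile(0.5)

# Several percentiles at once; the result is indexed by the requested fractions.
quartiles = values.quantile([0.25, 0.5, 0.75])

print(median_value, half_quantile)
print(quartiles)
```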
Calculate Variance of Multiple Columns via var()
df[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']].var()
Output:
SepalLengthCm 0.685694
SepalWidthCm 0.188004
PetalLengthCm 3.113179
PetalWidthCm 0.582414
dtype: float64
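Note that var() in pandas computes the sample variance by default (dividing by n - 1, i.e. ddof=1), whereas NumPy's np.var defaults to the population variance (ddof=0). The sketch below illustrates the difference on a made-up Series.

```python
import numpy as np
import pandas as pd

# Hypothetical values; squared deviations from the mean (3.0) sum to 10.
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# pandas default: sample variance, 10 / (5 - 1).
sample_var = values.var()

# Population variance, 10 / 5, requested explicitly with ddof=0.
population_var = values.var(ddof=0)

# NumPy's default matches the population variance.
numpy_var = np.var(values.to_numpy())

print(sample_var, population_var, numpy_var)
```

This matters mostly for small samples; with 150 rows the two versions differ by well under one percent.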
Calculate Standard Deviation of Multiple Columns via std()
df[['SepalLengthCm', 'SepalWidthCm', 'PetalLengthCm', 'PetalWidthCm']].std()
Output:
SepalLengthCm 0.828066
SepalWidthCm 0.433594
PetalLengthCm 1.764420
PetalWidthCm 0.763161
dtype: float64
Excellent! Now that we've explored the methods for calculating statistics, we'll shift our focus towards learning methods of finding unique values.