Get Ready for the Data Wrangling Tutorial
To prepare for the lessons that follow, we need to set up our environment and get the dataset ready.
1. Download sample dataset
We'll use the classic Iris machine learning dataset to explore data wrangling in Python. Click here to download the dataset.
2. Install Python
To follow this tutorial, make sure Python is installed in your environment. If it isn't, you can follow this guide: How to Install Python.
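To confirm which Python interpreter you are running, a quick check with the standard `sys` module works in both a script and a notebook. This is a minimal sketch: `meets_minimum` is a hypothetical helper name, and the `(3, 9)` threshold is only an illustrative assumption. Check the pandas installation docs for the Python versions your pandas release actually supports.

```python
import sys

def meets_minimum(minimum: tuple) -> bool:
    """Return True if the running interpreter is at least `minimum`, e.g. (3, 9)."""
    # sys.version_info compares element-wise like a tuple
    return sys.version_info[:len(minimum)] >= minimum

print(sys.version)  # full version string of the current interpreter

# (3, 9) is an illustrative threshold, not a requirement stated in this tutorial
print("Python is recent enough:", meets_minimum((3, 9)))
```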
3. Install pandas library
To use the pandas library, install it in your Python environment by running the following command in a terminal (in a Jupyter Notebook cell, the equivalent is %pip install pandas, which installs the package into the notebook's kernel):
pip install pandas
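To verify the installation from Python itself, the standard-library `importlib.metadata` module can report the installed version of a package. A minimal sketch, where `installed_version` is a hypothetical helper name. It checks the environment the code runs in, which may differ from the one `pip` installed into if you have several Pythons on your machine.

```python
from importlib.metadata import version, PackageNotFoundError

def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None

pandas_version = installed_version("pandas")
if pandas_version is None:
    print("pandas is not installed in this environment; run: pip install pandas")
else:
    print(f"pandas {pandas_version} is installed")
```

If this reports that pandas is missing right after a successful `pip install`, the usual cause is that `pip` and your script or notebook are using different Python environments.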
4. Import pandas library
In your IDE or Jupyter Notebook, create a new Python script (or notebook) and add the following line to import the pandas package under its conventional alias pd:
import pandas as pd
5. Load data as a pandas DataFrame
To start exploring the Iris data, run the following code snippet. Replace Iris.csv with the path to where you saved the downloaded Iris dataset:
df = pd.read_csv('Iris.csv')
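A relative path such as 'Iris.csv' is resolved against the current working directory, so a common stumbling block is a file that exists but is not where Python is looking. The sketch below wraps `pd.read_csv` in a hypothetical helper, `load_iris_csv`, that uses `pathlib` to expand `~` and fail early with a readable message. It also prints the working directory so you can see where relative paths point.

```python
from pathlib import Path

import pandas as pd

def load_iris_csv(path) -> pd.DataFrame:
    """Read the Iris CSV at `path`, failing early with a clear message if the file is missing."""
    csv_path = Path(path).expanduser()  # expands "~" so paths like "~/Downloads/Iris.csv" work
    if not csv_path.is_file():
        raise FileNotFoundError(
            f"Iris dataset not found at {csv_path.resolve()}; check where you saved the download"
        )
    return pd.read_csv(csv_path)

# Relative paths are resolved against this directory, usually the folder
# the script or notebook was started from
print("Current working directory:", Path.cwd())
```

With the helper defined, `df = load_iris_csv("Iris.csv")` behaves like the one-line `pd.read_csv` call when the file is found, and otherwise reports the absolute path it actually tried.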
6. Check if the dataset loaded successfully
Once the Iris data is loaded into a pandas DataFrame, confirm it by calling df.head(), which returns the first 5 rows of the dataset, as shown below:
df.head()
Output:
|   | Id | SepalLengthCm | SepalWidthCm | PetalLengthCm | PetalWidthCm | Species |
|---|---|---|---|---|---|---|
| 0 | 1 | 5.1 | 3.5 | 1.4 | 0.2 | Iris-setosa |
| 1 | 2 | 4.9 | 3.0 | 1.4 | 0.2 | Iris-setosa |
| 2 | 3 | 4.7 | 3.2 | 1.3 | 0.2 | Iris-setosa |
| 3 | 4 | 4.6 | 3.1 | 1.5 | 0.2 | Iris-setosa |
| 4 | 5 | 5.0 | 3.6 | 1.4 | 0.2 | Iris-setosa |
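Beyond df.head(), a few standard DataFrame attributes and methods give a quick sanity check on what was loaded. To keep this sketch runnable without the CSV, it rebuilds only the five rows shown in the output above. With your real file, use the df from pd.read_csv instead. Assuming your download is the commonly distributed Iris.csv with an Id column, you would expect df.shape to be (150, 6), with 50 rows per species.

```python
import pandas as pd

# Only the five rows from the df.head() output above, rebuilt inline for illustration
df = pd.DataFrame({
    "Id": [1, 2, 3, 4, 5],
    "SepalLengthCm": [5.1, 4.9, 4.7, 4.6, 5.0],
    "SepalWidthCm": [3.5, 3.0, 3.2, 3.1, 3.6],
    "PetalLengthCm": [1.4, 1.4, 1.3, 1.5, 1.4],
    "PetalWidthCm": [0.2, 0.2, 0.2, 0.2, 0.2],
    "Species": ["Iris-setosa"] * 5,
})

print(df.shape)                      # (rows, columns)
print(df.dtypes)                     # column types pandas inferred
print(df.head(3))                    # head(n) returns the first n rows; n defaults to 5
print(df.isna().sum())               # number of missing values per column
print(df["Species"].value_counts())  # number of rows per species
```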
Great job! 🚀 Your environment is set up. Continue reading to dive deeper into data exploration!