Concatenate DataFrames
Concatenating DataFrames is an alternative way to combine DataFrames. Unlike joining (merging), it does not require the two DataFrames to share any common columns: pd.concat() simply stacks them along rows or places them side by side along columns.
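A minimal sketch of that difference, using two tiny hand-built DataFrames. The Id and SepalLengthCm values are the first two Iris rows; the Species column and its values are hypothetical, added only so that the two frames share no column names. pd.merge() has no key to match on and raises an error, while pd.concat() still stacks the frames.

```python
import pandas as pd

# Toy frames: first two Iris rows plus a hypothetical 'Species' column with no overlap
left = pd.DataFrame({'Id': [1, 2], 'SepalLengthCm': [5.1, 4.9]})
right = pd.DataFrame({'Species': ['setosa', 'setosa']})

# merge() needs at least one shared key column, so it refuses these frames
try:
    pd.merge(left, right)
    merge_failed = False
except pd.errors.MergeError as err:
    merge_failed = True
    print('merge failed:', err)

# concat() just stacks the frames; the result has the union of both frames' columns,
# and cells with no source value are filled with NaN
stacked = pd.concat([left, right], axis=0, ignore_index=True)
print(stacked)
print(stacked.shape)  # (4, 3)
```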
This tutorial uses the classic Iris dataset, which can be downloaded here: Iris dataset.
```python
import pandas as pd

df = pd.read_csv('Iris.csv')
```
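We will split the Iris data into two DataFrames. The first DataFrame, named df1, will consist of the first 75 rows, while the remaining 75 rows will be assigned to df2. In both DataFrames, we will keep only the first two columns, Id and SepalLengthCm. Note that df.loc slices by label and includes the end label, so df.loc[:74] returns the rows labelled 0 through 74.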
```python
df1 = df.loc[:74, ['Id', 'SepalLengthCm']].reset_index(drop=True)
df1
```
Output:
| | Id | SepalLengthCm |
---|---|---|
0 | 1 | 5.1 |
1 | 2 | 4.9 |
2 | 3 | 4.7 |
3 | 4 | 4.6 |
4 | 5 | 5.0 |
... | ... | ... |
70 | 71 | 5.9 |
71 | 72 | 6.1 |
72 | 73 | 6.3 |
73 | 74 | 6.1 |
74 | 75 | 6.4 |
75 rows × 2 columns
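To double-check the split sizes without the CSV file, here is a small sketch on a hypothetical stand-in for the Iris data (only an Id column with 150 rows and the default 0 to 149 index). It contrasts label-based df.loc, which includes the end label, with position-based df.iloc, which excludes the end position.

```python
import pandas as pd

# Hypothetical stand-in for the Iris data: Ids 1..150 with the default 0..149 index
df = pd.DataFrame({'Id': range(1, 151)})

first_loc = df.loc[:74]    # label-based: end label 74 is included -> 75 rows
first_iloc = df.iloc[:75]  # position-based: end position 75 is excluded -> 75 rows
rest_loc = df.loc[75:]     # labels 75..149 -> 75 rows

print(len(first_loc), len(first_iloc), len(rest_loc))  # 75 75 75
print(first_loc.equals(first_iloc))                     # True
```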
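We apply the reset_index(drop=True) method to df2 to discard the index labels (75 to 149) it inherits from df, so that its index runs from 0 to 74, just like df1. Because pd.concat() aligns rows by index label when combining DataFrames horizontally, this keeps the rows of df1 and df2 correctly paired in the examples that follow.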
```python
df2 = df.loc[75:, ['Id', 'SepalLengthCm']].reset_index(drop=True)
df2
```
Output:
| | Id | SepalLengthCm |
---|---|---|
0 | 76 | 6.6 |
1 | 77 | 6.8 |
2 | 78 | 6.7 |
3 | 79 | 6.0 |
4 | 80 | 5.7 |
... | ... | ... |
70 | 146 | 6.7 |
71 | 147 | 6.3 |
72 | 148 | 6.5 |
73 | 149 | 6.2 |
74 | 150 | 5.9 |
75 rows × 2 columns
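To see why resetting the index of df2 matters, here is a sketch on a hypothetical stand-in for the Iris data (Id column only). Without the reset, df2 keeps labels 75 to 149, none of which match df1's labels 0 to 74, so a horizontal pd.concat() produces 150 half-empty rows instead of 75 paired rows.

```python
import pandas as pd

# Hypothetical stand-in for the Iris data (only the Id column, 150 rows)
df = pd.DataFrame({'Id': range(1, 151)})
df1 = df.loc[:74]  # labels 0..74

# Without reset_index: df2 keeps labels 75..149, so no row label matches df1
df2_raw = df.loc[75:]
misaligned = pd.concat([df1, df2_raw], axis=1)
print(misaligned.shape)  # (150, 2): every row has a NaN in one of the two columns

# With reset_index(drop=True): both halves are labelled 0..74 and pair up row by row
df2 = df.loc[75:].reset_index(drop=True)
aligned = pd.concat([df1, df2], axis=1)
print(aligned.shape)             # (75, 2)
print(aligned.iloc[0].tolist())  # [1, 76]
```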
1. Combine the Two DataFrames Vertically (along rows)
To stack df1 and df2 vertically, use the pd.concat() function with the parameter axis=0 (the default). The result contains all 150 rows. Note that the original index labels are kept, so the labels 0 to 74 appear twice, as the output below shows.
```python
df_vertical = pd.concat([df1, df2], axis=0)
df_vertical
```
Output:
| | Id | SepalLengthCm |
---|---|---|
0 | 1 | 5.1 |
1 | 2 | 4.9 |
2 | 3 | 4.7 |
3 | 4 | 4.6 |
4 | 5 | 5.0 |
... | ... | ... |
70 | 146 | 6.7 |
71 | 147 | 6.3 |
72 | 148 | 6.5 |
73 | 149 | 6.2 |
74 | 150 | 5.9 |
150 rows × 2 columns
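A short sketch of the duplicate-label effect and one way around it, using hypothetical stand-ins for df1 and df2 (Id column only, each with a 0 to 74 index). Passing ignore_index=True to pd.concat() discards the original labels and numbers the result 0 to 149.

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df2, each with a 0..74 index
df1 = pd.DataFrame({'Id': range(1, 76)})
df2 = pd.DataFrame({'Id': range(76, 151)})

stacked = pd.concat([df1, df2], axis=0)
print(len(stacked))             # 150
print(stacked.index.is_unique)  # False: labels 0..74 appear twice
print(len(stacked.loc[0]))      # 2 rows share the label 0

# axis=0 is the default, so omitting it gives the same result
print(pd.concat([df1, df2]).equals(stacked))  # True

# ignore_index=True builds a fresh 0..149 index instead
renumbered = pd.concat([df1, df2], axis=0, ignore_index=True)
print(renumbered.index.is_unique)  # True
print(renumbered.index[-1])        # 149
```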
2. Concatenate the Two DataFrames Horizontally (along columns)
To combine df1 and df2 horizontally, use the pd.concat() function with the parameter axis=1.
```python
df_horizontal = pd.concat([df1, df2], axis=1)
df_horizontal
```
Output:
| | Id | SepalLengthCm | Id | SepalLengthCm |
---|---|---|---|---|
0 | 1 | 5.1 | 76 | 6.6 |
1 | 2 | 4.9 | 77 | 6.8 |
2 | 3 | 4.7 | 78 | 6.7 |
3 | 4 | 4.6 | 79 | 6.0 |
4 | 5 | 5.0 | 80 | 5.7 |
... | ... | ... | ... | ... |
70 | 71 | 5.9 | 146 | 6.7 |
71 | 72 | 6.1 | 147 | 6.3 |
72 | 73 | 6.3 | 148 | 6.5 |
73 | 74 | 6.1 | 149 | 6.2 |
74 | 75 | 6.4 | 150 | 5.9 |
75 rows × 4 columns
This operation produces a new DataFrame with 75 rows and 4 columns, because df1 and df2 are placed side by side. Note that the column names Id and SepalLengthCm now each appear twice.
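Duplicate column names make selection ambiguous: selecting 'Id' returns both Id columns. The sketch below uses hypothetical stand-ins for df1 and df2 built from the first two rows of each table, and shows two common fixes: the keys parameter of pd.concat(), which adds an outer column level (the labels 'first_half' and 'second_half' are arbitrary), or DataFrame.add_suffix() to rename columns before concatenating.

```python
import pandas as pd

# Hypothetical stand-ins: the first two rows of df1 and of df2
df1 = pd.DataFrame({'Id': [1, 2], 'SepalLengthCm': [5.1, 4.9]})
df2 = pd.DataFrame({'Id': [76, 77], 'SepalLengthCm': [6.6, 6.8]})

side_by_side = pd.concat([df1, df2], axis=1)
print(side_by_side.columns.tolist())  # ['Id', 'SepalLengthCm', 'Id', 'SepalLengthCm']
print(side_by_side['Id'].shape)       # (2, 2): both Id columns come back

# keys= adds an outer column level, so each half can be selected unambiguously
labelled = pd.concat([df1, df2], axis=1, keys=['first_half', 'second_half'])
print(labelled['second_half']['Id'].tolist())  # [76, 77]

# Alternatively, suffix the column names before concatenating
renamed = pd.concat([df1.add_suffix('_1'), df2.add_suffix('_2')], axis=1)
print(renamed.columns.tolist())  # ['Id_1', 'SepalLengthCm_1', 'Id_2', 'SepalLengthCm_2']
```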
Excellent job! We have covered the techniques for concatenating DataFrames, which concludes our Data Wrangling series. Keep in mind that this is just a fraction of what Python offers for data analysis and data science, so please explore the other tutorials on this site for further insights. We hope you have found value in your learning journey with us. Reward yourself with some snacks to celebrate your dedicated efforts! 🍦🍫🍦