Lesson 10: Merging & Joining Datasets

1. Why Merge Data?

Real ML projects combine multiple sources (users + purchases + logs).

pd.merge is the pandas equivalent of a SQL JOIN: it combines two DataFrames by matching rows on one or more key columns.

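Types (set with the how argument):

inner: keep only keys found in both tables (the default)
left: keep every row of the left table
right: keep every row of the right table
outer: keep every key from either table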
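To see how the join type changes the result, here is a minimal sketch that runs the same merge with each how value and counts the rows returned. The users and orders tables are toy data invented for this illustration.

```python
import pandas as pd

# Hypothetical toy tables, invented for illustration only.
users = pd.DataFrame({"user_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
orders = pd.DataFrame({"user_id": [2, 3, 3, 4], "amount": [10.0, 5.0, 7.5, 20.0]})

# Run the same merge with each join type and record how many rows come back.
row_counts = {
    how: len(pd.merge(users, orders, on="user_id", how=how))
    for how in ["inner", "left", "right", "outer"]
}
print(row_counts)  # {'inner': 3, 'left': 4, 'right': 4, 'outer': 5}
```

User 3 has two orders, so that user appears twice in every result. User 1 (no orders) survives only in the left and outer merges, and order 4 (no matching user) survives only in the right and outer merges.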

Exercise 1

What does pd.merge(df1, df2, on="id", how="inner") do?

2. Merge Examples

import pandas as pd

# Inner merge on user_id (keep only rows whose user_id appears in both tables)
pd.merge(users, orders, on="user_id", how="inner")

# Left merge (keep all users; users without orders get NaN in the order columns)
pd.merge(users, orders, on="user_id", how="left")

# Key columns with different names in each table (both key columns are kept in the result)
pd.merge(df1, df2, left_on="id", right_on="user_id")

# Concat vertically (same columns); ignore_index=True renumbers the rows from 0
pd.concat([df1, df2], ignore_index=True)
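The last two snippets can be tried on small tables. The sketch below uses toy data invented for this illustration (df1, df2, batch_a and batch_b are hypothetical). It shows that left_on/right_on keeps both key columns, and that ignore_index=True gives the stacked result a fresh 0..n-1 index.

```python
import pandas as pd

# Hypothetical toy tables, invented for illustration only.
df1 = pd.DataFrame({"id": [1, 2], "name": ["Ana", "Ben"]})
df2 = pd.DataFrame({"user_id": [1, 2], "amount": [10.0, 5.0]})

# With left_on/right_on, both key columns survive the merge.
merged = pd.merge(df1, df2, left_on="id", right_on="user_id")
print(list(merged.columns))  # ['id', 'name', 'user_id', 'amount']

# Drop the redundant key once the merge is done.
merged = merged.drop(columns="user_id")

# Vertical concat: two batches that share the same columns.
batch_a = pd.DataFrame({"id": [1, 2], "name": ["Ana", "Ben"]})
batch_b = pd.DataFrame({"id": [3], "name": ["Cy"]})
stacked = pd.concat([batch_a, batch_b], ignore_index=True)
print(list(stacked.index))  # [0, 1, 2]
```

Without ignore_index=True, the stacked frame would keep each batch's original row labels, so the index would contain the label 0 twice.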

In ML, merges are most often used to assemble a single feature table from several source tables.
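One possible way to build such a feature table is sketched below. It is an illustration, not a prescribed recipe. The tables, the feature names n_orders and total_spent, and the choice to fill missing values with 0 are all assumptions made for this example.

```python
import pandas as pd

# Hypothetical toy tables, invented for illustration only.
users = pd.DataFrame({"user_id": [1, 2, 3], "age": [34, 27, 45]})
orders = pd.DataFrame({"user_id": [2, 3, 3], "amount": [10.0, 5.0, 7.5]})

# Aggregate orders to one row per user first, so the merge cannot duplicate users.
order_feats = orders.groupby("user_id", as_index=False).agg(
    n_orders=("amount", "count"),
    total_spent=("amount", "sum"),
)

# Left merge keeps every user; validate raises MergeError if keys are duplicated.
features = pd.merge(
    users, order_feats, on="user_id", how="left", validate="one_to_one"
)

# Users with no orders get NaN; filling with 0 is an assumption that fits counts and sums.
features = features.fillna({"n_orders": 0, "total_spent": 0.0})
print(features)
```

Aggregating before merging keeps the result at one row per user, which is usually what a model's training table needs. The validate argument turns an accidental many-to-many join into an immediate error instead of silently inflated rows.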

Exercise 2

Fill in the how argument so that every row of df_customers is kept, even when it has no match in df_orders:
pd.merge(df_customers, df_orders, on="customer_id", how="____")

Exercise 3

Which are valid merge strategies?