Lesson 6: Handling Missing Values – Detection & Strategies

1. Why Missing Values Matter

Missing data (shown as NaN in pandas) is common in real datasets: sensors fail and survey respondents skip questions.

Most ML models cannot handle NaN inputs, so missing values usually need to be handled before training.
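
A minimal sketch of why NaN causes trouble, using plain NumPy arithmetic rather than a specific model. The array values are made up for illustration. A single NaN propagates through ordinary numeric operations, which is the kind of arithmetic many models perform internally.

```python
import numpy as np

# Hypothetical feature values with one missing reading
values = np.array([1.0, 2.0, np.nan, 4.0])

plain_mean = np.mean(values)      # the NaN propagates, so the result is NaN
safe_mean = np.nanmean(values)    # ignores the NaN: (1 + 2 + 4) / 3

print(plain_mean, safe_mean)
```

Note that pandas methods such as Series.mean() skip NaN by default, so the problem often stays hidden until the data reaches code that does not.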

Common causes: data entry errors, non-response, sensor failure.

2. Detecting Missing Values

df.isnull().sum()          # count NaN per column
df.isna().sum()             # same as isnull()
df.notnull().sum()          # non-missing count
df[df["age"].isnull()]      # rows with missing age

Visual check (missing cells show up in a contrasting color):
import seaborn as sns
sns.heatmap(df.isnull())
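
The detection calls above can be tried on a small frame. This is a sketch under stated assumptions: the DataFrame, its "city" column, and the variable names are hypothetical and only serve to make the calls runnable.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; values and the "city" column are illustrative only
df = pd.DataFrame({
    "age": [25, np.nan, 31, np.nan],
    "city": ["Oslo", "Lima", None, "Pune"],
})

missing_per_column = df.isnull().sum()      # missing count for each column
present_per_column = df.notnull().sum()     # non-missing count for each column
rows_missing_age = df[df["age"].isnull()]   # only the rows where age is missing
missing_share = df.isnull().mean()          # fraction of missing values per column

print(missing_per_column)
print(rows_missing_age)
```

The fraction per column (missing_share) is an addition to the calls listed above. It can be easier to read than raw counts when columns have many rows.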

Exercise 1

What does df.isnull().sum() return?

3. Strategies to Handle Missing Data

Rule: never fill with 0 by default. A zero is a real value, so it can distort statistics such as the column mean.
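
A rough sketch of how the choice of strategy changes the data, assuming a tiny made-up "age" column. It compares filling with 0, filling with the median of the observed values, and dropping incomplete rows. These are common options, not a complete list, and which one fits depends on the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data for illustration
df = pd.DataFrame({"age": [20.0, 30.0, np.nan, 40.0, np.nan]})

# Series.mean() skips NaN by default: (20 + 30 + 40) / 3
observed_mean = df["age"].mean()

# Filling with 0 treats "unknown" as "age 0" and pulls the mean down
zero_filled = df["age"].fillna(0)

# Filling with the median of the observed values keeps the center in place
median_filled = df["age"].fillna(df["age"].median())

# Dropping keeps only the rows where age is known
dropped = df.dropna(subset=["age"])

print(observed_mean, zero_filled.mean(), median_filled.mean(), len(dropped))
```

In this toy column the zero fill moves the mean from 30 to 18, while the median fill leaves it at 30 and dropping removes two of the five rows.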

Exercise 2

Fill in the blank to replace missing ages with the column mean:
df["age"] = df["age"].fillna(df["age"].____())

Exercise 3

When might you drop rows with missing values?