Missing data (NaN) is common in real datasets — sensors fail, surveys skip questions.
Many ML models (e.g. most scikit-learn estimators) raise an error on NaN → handle missing values before training.
Common causes: data entry errors, non-response, sensor failure.
df.isnull().sum() # count NaN per column
df.isna().sum() # same as isnull()
df.notnull().sum() # non-missing count
df[df["age"].isnull()] # rows with missing age
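A small runnable sketch of the detection calls above. The DataFrame and its columns ("age", "income", "city") are hypothetical, made up only for illustration. It also uses `df.isnull().mean() * 100`, which gives the share of missing values per column, in percent.

```python
import numpy as np
import pandas as pd

# Hypothetical toy dataset with a few gaps (np.nan in numeric columns, None in a text column)
df = pd.DataFrame({
    "age": [25, np.nan, 31, np.nan, 42],
    "income": [50_000, 62_000, np.nan, 48_000, 55_000],
    "city": ["Oslo", "Bergen", None, "Oslo", "Bergen"],
})

missing_count = df.isnull().sum()          # NaN count per column
missing_pct = df.isnull().mean() * 100     # share of missing values per column, in percent
rows_missing_age = df[df["age"].isnull()]  # rows where age is missing

print(missing_count)
print(missing_pct.round(1))
print(rows_missing_age)
```

Note that `None` in an object column is also reported as missing by `isnull()`.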
Visual: import seaborn as sns; sns.heatmap(df.isnull())
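df.dropna() # drop rows with any NaN (axis=1 drops columns instead)
df.fillna(value) # value e.g. 0, df.mean(numeric_only=True), df.median(numeric_only=True)
df.ffill() # forward fill: copy the last valid value down (older df.fillna(method="ffill") is deprecated since pandas 2.1)
df.interpolate() # linear interpolation between neighbouring values by default
Rule: never just fill with 0 without thinking; it can distort the data (shifts means and variances, and 0 may be a real value).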
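A minimal sketch comparing the strategies above on one hypothetical "temp" column (the data is invented for illustration). The last lines show why the zero-fill rule matters: zeros get pulled into the average.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings with two gaps
df = pd.DataFrame({"temp": [20.0, np.nan, 22.0, np.nan, 26.0]})

dropped = df.dropna()                           # keeps only rows 0, 2, 4
mean_filled = df.fillna(df["temp"].mean())      # gaps -> mean of 20, 22, 26 (about 22.67)
median_filled = df.fillna(df["temp"].median())  # gaps -> 22.0
forward_filled = df.ffill()                     # gaps -> previous valid value (20.0, 22.0)
interpolated = df.interpolate()                 # linear: gaps -> 21.0, 24.0

# Blind zero-filling: the column mean drops from about 22.67 to 13.6
zero_filled = df.fillna(0)
print(df["temp"].mean())           # mean of observed values only (NaN is skipped)
print(zero_filled["temp"].mean())  # zeros are counted as real readings
```

Which strategy fits depends on the data: `ffill()` and `interpolate()` assume the rows are ordered (e.g. time series), while mean/median filling ignores order.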