Lesson 15: Mini-Project – Explore & Visualize a Real Dataset

1. Project Goal

Explore the Titanic dataset, a classic machine-learning dataset whose usual task is predicting which passengers survived.

Tasks:

Load & inspect data
Clean missing values
Visualize survival by age, class, gender
Find correlations
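
Before deciding how to clean, it helps to see how much is missing in each column. The helper below is a minimal sketch. missing_report is a name chosen here for illustration, not part of pandas, and it relies only on DataFrame.isna().

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: missing-value count and percentage per column."""
    counts = df.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    # Keep only columns that actually have gaps, worst first
    return report[report["missing"] > 0].sort_values("missing", ascending=False)
```

On the Seaborn copy, missing_report(sns.load_dataset("titanic")) should put deck and age at the top. A column that is mostly empty is usually dropped, while one with a smaller gap, like age, can be filled.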

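Dataset: titanic.csv from Kaggle, or Seaborn's example copy, which sns.load_dataset("titanic") downloads and caches on first use. The code below uses the Seaborn copy.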
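
If you use a Kaggle CSV instead, its headers are typically capitalized (Survived, Pclass, SibSp, ...), and lowercasing them lines most of them up with the names used in this lesson. A minimal sketch, assuming the file has the Kaggle competition's headers; load_kaggle_titanic and the "titanic.csv" path are hypothetical:

```python
import io

import pandas as pd


def load_kaggle_titanic(source) -> pd.DataFrame:
    """Hypothetical helper: read a Kaggle-style Titanic CSV with lowercase headers.

    `source` can be a file path (for example "titanic.csv") or a file-like object.
    """
    df = pd.read_csv(source)
    # Survived -> survived, Pclass -> pclass, SibSp -> sibsp, and so on
    df.columns = df.columns.str.lower()
    return df


# Usage with a made-up sample (header plus one row) instead of the real file
sample = io.StringIO(
    "PassengerId,Survived,Pclass,Sex,Age,SibSp,Parch,Fare,Embarked\n"
    "1,0,3,male,22,1,0,7.25,S\n"
)
print(load_kaggle_titanic(sample).columns.tolist())
# ['passengerid', 'survived', 'pclass', 'sex', 'age', 'sibsp', 'parch', 'fare', 'embarked']
```

Seaborn-only columns such as deck, embark_town and alive are absent from the Kaggle file, so the drop() call in the guide would raise a KeyError there. Drop cabin instead, or pass errors="ignore" to drop().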

2. Step-by-Step Guide

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

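# Load: Seaborn downloads the dataset on first use and caches it locally
df = sns.load_dataset("titanic")

# Inspect
print(df.head())
df.info()  # info() prints its own summary and returns None, so it isn't wrapped in print()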

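# Clean
# Fill missing ages with the median, which is less sensitive to outliers than the mean.
# Caveat: every filled row gets the same age, so the age histogram below shows a spike there.
df["age"] = df["age"].fillna(df["age"].median())
# Drop deck (mostly missing), embark_town (repeats embarked) and alive (a yes/no copy of survived)
df = df.drop(columns=["deck", "embark_town", "alive"])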

# Visualize
sns.countplot(data=df, x="survived", hue="sex")
plt.title("Survival by Gender")
plt.show()

sns.histplot(data=df, x="age", hue="survived", multiple="stack")
plt.title("Age Distribution by Survival")
plt.show()

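# numeric_only=True skips text columns such as sex; recent pandas versions raise an error without it
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm")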
plt.title("Correlation Matrix")
plt.show()
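
The task list asks for survival by class, which the plots above do not cover. Because survived is coded 0/1, its mean within a group is that group's survival rate, so a pivot table gives the numbers directly. A minimal sketch; survival_by_class_and_sex is a hypothetical name, and the function assumes the pclass, sex and survived columns from the Seaborn dataset:

```python
import pandas as pd


def survival_by_class_and_sex(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: survival rate with pclass as rows and sex as columns."""
    # survived is 0/1, so the mean of each group is the fraction who survived
    return df.pivot_table(index="pclass", columns="sex", values="survived", aggfunc="mean")
```

Call it as survival_by_class_and_sex(df) after the cleaning step. For a chart, sns.barplot(data=df, x="pclass", y="survived", hue="sex") plots the same group means with error bars.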
