Lesson 15: Mini-Project – Explore & Visualize a Real Dataset

1. Project Goal

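Explore the Titanic dataset, a classic machine-learning dataset, to find out which passenger attributes are associated with survival. This project covers exploration and visualization only; it does not train a predictive model.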

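Tasks: load the data, inspect it, clean the missing values, and visualize survival by gender, age, and correlated features.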

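Dataset: titanic.csv (download it from Kaggle, or load the copy that ships with Seaborn via sns.load_dataset("titanic")). The guide below uses the Seaborn version.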
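
If you work from the Kaggle CSV instead, note that its column names are capitalized (Survived, Pclass, Age, ...) and that it lacks some columns found only in the Seaborn copy, such as deck, embark_town, and alive. The sketch below shows one way to line the two up. The helper name load_kaggle_style and the two sample rows are made up for illustration; replace the in-memory sample with the path to your own download.

```python
import io

import pandas as pd

# Hypothetical two-row sample in the Kaggle column layout (values are made up).
KAGGLE_STYLE_CSV = """PassengerId,Survived,Pclass,Sex,Age,Fare,Cabin,Embarked
1,0,3,male,30,8.0,,S
2,1,1,female,40,70.0,C1,C
"""


def load_kaggle_style(source):
    """Read a Kaggle-style CSV and lowercase the column names."""
    df = pd.read_csv(source)
    df.columns = df.columns.str.lower()
    return df


sample = load_kaggle_style(io.StringIO(KAGGLE_STYLE_CSV))

# errors="ignore" skips columns that exist only in the Seaborn copy,
# so the same cleaning line works for both versions.
sample = sample.drop(columns=["deck", "embark_town", "alive"], errors="ignore")
print(sample.columns.tolist())
```

After this step, column references such as df["survived"] and df["age"] work the same way for either source.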

2. Step-by-Step Guide

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

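# Load the copy provided by Seaborn (fetched on first use, then cached)
df = sns.load_dataset("titanic")
print(df.head())
df.info()  # prints its own summary and returns None, so no print() is needed

# Clean: fill missing ages with the median, then drop the sparse "deck" column
# and two columns that duplicate others ("embark_town" ~ "embarked", "alive" ~ "survived")
df["age"] = df["age"].fillna(df["age"].median())
df = df.drop(columns=["deck", "embark_town", "alive"])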

# Visualize
sns.countplot(data=df, x="survived", hue="sex")
plt.title("Survival by Gender")
plt.show()

sns.histplot(data=df, x="age", hue="survived", multiple="stack")
plt.title("Age Distribution by Survival")
plt.show()

# numeric_only=True (pandas 1.5+) leaves text and categorical columns out of the correlation
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm")
plt.title("Correlation Matrix")
plt.show()
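
The cleaning step above fills age and drops deck without showing why. A quick count of missing values per column makes that choice visible. The sketch below is one way to do it; missing_summary is a hypothetical helper, and the toy frame holds made-up values only so the example runs without a download. On the real data you would call missing_summary(df) right after loading.

```python
import numpy as np
import pandas as pd


def missing_summary(df):
    """Return the count and percentage of missing values per column, worst first."""
    counts = df.isna().sum()
    summary = pd.DataFrame(
        {"missing": counts, "percent": (counts / len(df) * 100).round(1)}
    )
    # Keep only the columns that actually have gaps.
    return summary[summary["missing"] > 0].sort_values("missing", ascending=False)


# Made-up toy data, not real Titanic values.
toy = pd.DataFrame(
    {
        "age": [22.0, np.nan, 35.0, np.nan],
        "deck": [np.nan, "C", np.nan, np.nan],
        "fare": [7.25, 71.3, 8.05, 53.1],
    }
)

print(missing_summary(toy))
```

A column that is mostly empty is usually a candidate for dropping, while a column with a moderate share of gaps can often be filled, for example with the median.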

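Insights: women survived at a much higher rate than men, children appear to have been given priority, and survival rates differ by passenger class, with first class faring best.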
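
Plots suggest these patterns, but it helps to back them with numbers. Because survived is coded 0/1, the mean of that column within a group is the group's survival rate. The sketch below assumes the Seaborn column names (survived, sex, pclass, age); survival_rate_by is a hypothetical helper, the cutoff of 16 years for "child" is an arbitrary assumption, and the toy frame contains made-up values rather than real passengers.

```python
import pandas as pd

# Made-up toy data with the same column names as the Seaborn dataset.
toy = pd.DataFrame(
    {
        "survived": [1, 0, 1, 1, 0, 0, 1, 0],
        "sex": ["female", "male", "female", "female", "male", "male", "male", "female"],
        "pclass": [1, 3, 2, 1, 3, 2, 1, 3],
        "age": [29, 40, 8, 35, 22, 54, 5, 30],
    }
)


def survival_rate_by(df, column):
    """Mean of the 0/1 'survived' column per group, i.e. the survival rate."""
    return df.groupby(column)["survived"].mean()


by_sex = survival_rate_by(toy, "sex")
by_class = survival_rate_by(toy, "pclass")

# Assumption: passengers under 16 are counted as children.
toy["age_group"] = toy["age"].lt(16).map({True: "child", False: "adult"})
by_age_group = survival_rate_by(toy, "age_group")

print(by_sex, by_class, by_age_group, sep="\n\n")
```

Running the same three calls on the cleaned df gives the actual rates behind each insight.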

Exercise 1

In the Titanic dataset, what does df["survived"].value_counts() show?
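
If you are unsure how value_counts() behaves, try it on a small series first. The sketch below uses a made-up column of port codes, not the survived column, so the exercise stays open.

```python
import pandas as pd

# Made-up values for illustration only.
ports = pd.Series(["S", "C", "S", "Q", "S", "C"], name="embarked")

counts = ports.value_counts()                # occurrences per distinct value, most frequent first
shares = ports.value_counts(normalize=True)  # same, expressed as fractions of the total

print(counts)
print(shares)
```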

Exercise 2

Fill in the missing column name to plot survival by passenger class:
sns.countplot(data=df, x="____", hue="survived")

Exercise 3

What insights can you get from this project?