Explore the Titanic dataset, a classic ML dataset whose standard task is predicting passenger survival.
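Tasks: load and inspect the data, clean missing values, and visualize how survival relates to sex, age, and the other features.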
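Dataset: titanic.csv (download from Kaggle, or load Seaborn's copy with sns.load_dataset as the code below does; the Kaggle file uses different column names, e.g. Survived and Age)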
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
df = sns.load_dataset("titanic")
print(df.head())
df.info()  # info() prints its own summary and returns None, so no print() is needed
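# Clean: fill missing ages with the median age
df["age"] = df["age"].fillna(df["age"].median())
# Drop "deck" (mostly missing) and two columns that duplicate "embarked" and "survived"
df = df.drop(columns=["deck", "embark_town", "alive"])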
# Visualize
sns.countplot(data=df, x="survived", hue="sex")
plt.title("Survival by Gender")
plt.show()
sns.histplot(data=df, x="age", hue="survived", multiple="stack")
plt.title("Age Distribution by Survival")
plt.show()
sns.heatmap(df.corr(numeric_only=True), annot=True, cmap="coolwarm")
plt.title("Correlation Matrix")
plt.show()
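The cleaning step above only handles age and three dropped columns, so it can be worth checking what is still missing afterwards (in Seaborn's copy, embarked may still have a couple of gaps). The sketch below is one possible way to do that. The helper names missing_report and fill_with_mode are hypothetical, not part of pandas or Seaborn, and the toy rows are made up for illustration.

```python
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """List columns that still contain missing values, with counts and fractions.

    Hypothetical helper for this exercise.
    """
    counts = df.isna().sum()
    report = pd.DataFrame({"missing": counts, "fraction": counts / len(df)})
    # Keep only columns with at least one missing value, largest first.
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


def fill_with_mode(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of df with gaps in `column` filled by its most frequent value.

    Hypothetical helper; assumes the column has at least one non-missing value.
    """
    out = df.copy()
    out[column] = out[column].fillna(out[column].mode().iloc[0])
    return out


# Toy rows for illustration only; these are not real passenger records.
toy = pd.DataFrame({
    "age": [22.0, None, 30.0],
    "embarked": ["S", None, "S"],
    "fare": [7.25, 71.30, 8.05],
})
print(missing_report(toy))
print(fill_with_mode(toy, "embarked"))
```

On the real data, calling missing_report(df) right after the drop shows whether any column still needs attention before plotting or modelling.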
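Insights: women survived at a much higher rate than men, children appear to have been prioritized, and passenger class is strongly associated with survival (first class fared best).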
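These insights can be checked numerically as well as visually. Because survived is coded 0/1, its mean within a group is that group's survival rate. The sketch below assumes Seaborn's column names (survived, sex, age, pclass). The helpers survival_rate_by and add_age_group, the age_group column, and the cutoff of 16 years are illustrative assumptions, and the toy rows are made up.

```python
import pandas as pd


def survival_rate_by(df: pd.DataFrame, column: str) -> pd.Series:
    """Survival rate (mean of the 0/1 `survived` column) for each group in `column`."""
    # observed=True skips unused categories when `column` is categorical.
    return df.groupby(column, observed=True)["survived"].mean()


def add_age_group(df: pd.DataFrame, child_cutoff: int = 16) -> pd.DataFrame:
    """Return a copy with a hypothetical `age_group` column: "child" or "adult".

    The cutoff is an assumption. Missing ages compare as False and so land in
    "adult", as do ages that were imputed with an adult median.
    """
    out = df.copy()
    out["age_group"] = (out["age"] < child_cutoff).map({True: "child", False: "adult"})
    return out


# Toy rows for illustration only; these are not real passenger records.
toy = pd.DataFrame({
    "survived": [1, 0, 1, 0],
    "sex": ["female", "male", "female", "male"],
    "age": [8, 40, 30, 12],
    "pclass": [1, 3, 2, 3],
})
print(survival_rate_by(toy, "sex"))
print(survival_rate_by(add_age_group(toy), "age_group"))
```

With the cleaned frame from the code above, survival_rate_by(df, "sex") and survival_rate_by(df, "pclass") give the rates behind the first and third insights, and survival_rate_by(add_age_group(df), "age_group") gives a rough check on the second.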