Lesson 15: Mini ML Project – Iris Classification (Scikit-learn Intro)

1. Project Overview

We'll build a simple classifier that predicts the species of an Iris flower using Scikit-learn. This is your first real ML model!

Dataset: Iris (150 samples; 4 features: sepal length, sepal width, petal length, petal width; 3 classes: setosa, versicolor, virginica)
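
To check the dataset description yourself, here is a minimal sketch that loads Iris and prints its shape, feature names, and class names. The variable names (n_samples, feature_names, class_names) are chosen for illustration only.

```python
from sklearn.datasets import load_iris

iris = load_iris()

# iris.data is a NumPy array of shape (n_samples, n_features)
n_samples, n_features = iris.data.shape

# Column names for the four measurements and labels for the three species
feature_names = list(iris.feature_names)
class_names = list(iris.target_names)

print(n_samples, n_features)
print(feature_names)
print(class_names)
```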

Goal: Train model → predict species → evaluate accuracy.

2. Step-by-Step Code

# 1. Import libraries
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# 2. Load data
iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = iris.target

# 3. Split data
X = df.drop("species", axis=1)
y = df["species"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 4. Train model
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# 5. Predict & evaluate
predictions = model.predict(X_test)
accuracy = accuracy_score(y_test, predictions)
print(f"Accuracy: {accuracy:.2f}")
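
The predictions above are integer codes (0, 1, 2), not species names. As a rough sketch, the codes can be mapped back to names with iris.target_names. To keep the example short and self-contained, it trains on the full dataset and on plain NumPy arrays; that is an assumption of this sketch, not the lesson's workflow. The measurements in new_flower are an illustrative example. If your model was trained on a DataFrame, as in the steps above, it is probably cleaner to pass new samples as a DataFrame with the same column names.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()

# Trained on all 150 samples only to keep this sketch short
model = DecisionTreeClassifier(random_state=0)
model.fit(iris.data, iris.target)

# Illustrative measurements in cm:
# sepal length, sepal width, petal length, petal width
new_flower = [[5.1, 3.5, 1.4, 0.2]]

# predict() returns an array of integer class codes, one per input row
predicted_code = model.predict(new_flower)[0]

# target_names maps each code to its species name
predicted_name = iris.target_names[predicted_code]

print(predicted_code, predicted_name)
```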

Typical accuracy: ~0.93–1.00 on the test set; the exact value depends on the train/test split.
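
The accuracy you see depends on which samples land in the test set. One way to get a feel for this, sketched below, is to repeat the split with several different random_state values and compare the scores. The seed range 0 to 9 and the name scores are arbitrary choices for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

scores = []
for seed in range(10):  # ten different splits; the range is arbitrary
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # Fix the tree's own seed so only the split changes between runs
    model = DecisionTreeClassifier(random_state=0)
    model.fit(X_train, y_train)
    scores.append(accuracy_score(y_test, model.predict(X_test)))

mean_score = sum(scores) / len(scores)
print(f"min={min(scores):.2f} max={max(scores):.2f} mean={mean_score:.2f}")
```

With only 30 test samples per split, a single misclassified flower moves the accuracy by about 0.03, so some spread between splits is expected.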

Exercise 1

What does train_test_split do?

Exercise 2

In the code, X = df.drop("species", axis=1) means (fill in the blanks):
X contains all columns ______ "species"
axis=1 means drop ______

Exercise 3

Why do we split data into train/test?