Lesson 5: Exploring Data – head, tail, info, describe

1. Quick Look Methods

Always start with these quick-look methods when you get a new dataset: df.head(), df.tail(), and df.info().
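As a minimal sketch of the quick-look methods, the snippet below builds a small, made-up DataFrame (the rows are hypothetical; only the age and salary column names come from this lesson) and calls head(), tail(), and info() on it. Capturing info() in a buffer is optional and is done here only so the text can be inspected.

```python
import io

import pandas as pd

# Hypothetical sample data; the column names mirror the lesson's age/salary example.
df = pd.DataFrame({
    "age": [22, 28, 32, 37, 45, 30],
    "salary": [40000, 55000, 65000, 75000, 100000, 60000],
})

first_rows = df.head(3)  # first 3 rows (the default is 5)
last_rows = df.tail(2)   # last 2 rows (the default is 5)
print(first_rows)
print(last_rows)

# info() prints to stdout by default; buf= captures the report as text instead.
buffer = io.StringIO()
df.info(buf=buffer)
info_text = buffer.getvalue()
print(info_text)
```

Calling df.head() or df.tail() with no argument returns five rows, which is usually enough to check that the columns loaded as expected.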

Exercise 1

What does df.info() show?

2. Statistical Summary – describe()

df.describe()
# Output:
#        age       salary
# count   100.0     100.0
# mean     32.5   65000.0
# std       5.2   12000.0
# min      22.0   40000.0
# 25%      28.0   55000.0
# 50%      32.0   65000.0
# 75%      37.0   75000.0
# max      45.0  100000.0

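describe() helps spot outliers (a max far above the mean or the 75% value) and skewness (a mean noticeably different from the 50% value, which is the median).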
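One way to turn these checks into code is sketched below. The salary values are invented, with one deliberately extreme entry, and the 1.5 × IQR cutoff is a common rule of thumb chosen for illustration, not something describe() applies itself.

```python
import pandas as pd

# Hypothetical salaries with one deliberately extreme value.
df = pd.DataFrame(
    {"salary": [40000, 52000, 55000, 58000, 60000, 61000, 63000, 250000]}
)

summary = df["salary"].describe()
mean = summary["mean"]
median = summary["50%"]

# A mean well above the median suggests the data is right-skewed.
right_skewed = mean > median

# Rule of thumb (an assumption, not a pandas feature):
# flag values more than 1.5 * IQR above the 75% value.
iqr = summary["75%"] - summary["25%"]
upper_fence = summary["75%"] + 1.5 * iqr
outliers = df.loc[df["salary"] > upper_fence, "salary"]

print(f"mean={mean:.0f}, median={median:.0f}, right_skewed={right_skewed}")
print(outliers.tolist())
```

Here the single large salary pulls the mean far above the median, and it is the only value past the upper fence.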

Exercise 2

For the age column, df.describe() shows mean = 32.5 and std = 5.2.
Assuming the ages are roughly normally distributed, what is the approximate range that covers 68% of the values (mean ± 1 std)?
Answer: ____ to ____
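The same mean ± 1 std calculation can be sketched for the salary column, using the mean and std from the describe() output in section 2, which leaves the age question open. The variable names are illustrative, and the 68% figure only holds if the values are roughly normally distributed.

```python
# Values taken from the describe() output for the salary column.
mean_salary = 65000.0
std_salary = 12000.0

# About 68% of roughly normal data lies within one standard deviation of the mean.
low = mean_salary - std_salary
high = mean_salary + std_salary

print(f"{low:.0f} to {high:.0f}")  # 53000 to 77000
```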

Exercise 3

What can you learn from df.describe()?