Lesson 5: Exploring Data – head, tail, info, describe

1. Quick Look Methods

df.head(n=5): first n rows (default 5)
df.tail(n=5): last n rows (default 5)
df.sample(n=5): n random rows (returns 1 row if n is omitted)
df.info(): column dtypes, non-null counts, memory usage
df.describe(): summary stats for numeric columns (count, mean, std, min/max, quartiles)

Always start here when you get a new dataset.
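A minimal sketch of that first look, assuming pandas is installed. The DataFrame and its column names (name, age, salary) are made up for illustration and do not come from any real dataset.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only.
df = pd.DataFrame({
    "name": ["Ana", "Ben", "Cy", "Dee", "Eli", "Fay", "Gus", "Hal"],
    "age": [25, 31, 22, 45, 38, 29, None, 33],  # one missing value on purpose
    "salary": [48000, 62000, 40000, 100000, 75000, 58000, 66000, 70000],
})

first_rows = df.head(3)                       # first 3 rows
last_rows = df.tail(3)                        # last 3 rows
random_rows = df.sample(n=3, random_state=0)  # 3 random rows; random_state makes the draw repeatable

print(first_rows)
print(last_rows)
print(random_rows)

df.info()                # prints dtypes, non-null counts, and memory usage (returns None)
summary = df.describe()  # numeric columns only by default, so "name" is left out
print(summary)
```

In the info() output, the age column should report one fewer non-null value than the other columns, which is usually the quickest way to notice missing data.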

2. Statistical Summary – describe()

df.describe()
# Output:
#        age       salary
# count   100.0     100.0
# mean     32.5   65000.0
# std       5.2   12000.0
# min      22.0   40000.0
# 25%      28.0   55000.0
# 50%      32.0   65000.0
# 75%      37.0   75000.0
# max      45.0  100000.0

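Helps spot outliers (a max far above the mean or the 75% value), skewness (a mean well away from the 50% value, which is the median), and missing data (a count lower than the number of rows).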
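As a rough sketch of how those checks could be done in code: the salary values below are invented, with one deliberately extreme entry, and the 1.5 * IQR cutoff is a common rule of thumb assumed here rather than something this lesson prescribes.

```python
import pandas as pd

# Hypothetical salaries, invented for illustration; the last value is deliberately extreme.
df = pd.DataFrame({
    "salary": [40000, 52000, 55000, 58000, 60000,
               61000, 65000, 70000, 75000, 400000]
})

stats = df["salary"].describe()

# Skew hint: a mean well above the median (the "50%" row) suggests a long right tail.
mean_minus_median = stats["mean"] - stats["50%"]

# Outlier hint using the 1.5 * IQR rule of thumb (an assumption, not part of the lesson).
iqr = stats["75%"] - stats["25%"]
upper_fence = stats["75%"] + 1.5 * iqr
outliers = df[df["salary"] > upper_fence]

print(stats)
print("mean - median:", mean_minus_median)
print(outliers)
```

Here the single extreme salary pulls the mean well above the median, and it is the only row that lands above the upper fence.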
