Demographic statistics popularized by Hans Rosling’s TED talks.
library(gapminder)
gapminder
head(gapminder, n = 6)
## Systematic view
str(gapminder)
Hey, I’m a professional! I want to see the data in a systematic way, such as finding out how many rows and columns it has, what the variables are called, and how each one is stored.
gapminder
nrow(gapminder)
ncol(gapminder)
names(gapminder)
str(gapminder)
Q: Tell me something about the population variable in the dataset: how many countries’ populations we have, what the average is, which country has the largest and smallest population, and so on. By the way, what type is pop stored as?
head(gapminder$year, n = 10)
mean(gapminder$year, na.rm = TRUE)
median(gapminder$year)
min(gapminder$year)
max(gapminder$year)
length(gapminder$year)
summary(gapminder$year)
class(gapminder$gdpPercap)
typeof(gapminder$gdpPercap)
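The calls above use year and gdpPercap as stand-ins; the same pattern answers the question for pop. A minimal base-R sketch, assuming only that the gapminder package is installed:

```r
library(gapminder)

# How many distinct countries do we have population figures for?
length(unique(gapminder$country))

# Centre and range of the population variable
mean(gapminder$pop, na.rm = TRUE)
median(gapminder$pop)
range(gapminder$pop)

# Which rows hold the smallest and the largest population?
gapminder[which.min(gapminder$pop), c("country", "year", "pop")]
gapminder[which.max(gapminder$pop), c("country", "year", "pop")]

# How is pop stored?
class(gapminder$pop)
typeof(gapminder$pop)
```

Using which.min() and which.max() keeps the country and year next to the value, which min() and max() alone do not.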
Welcome to the Tidyverse

A widely used toolkit for data manipulation
Installation:
## install.packages("tidyverse")
library("tidyverse")
We focus on dplyr today.
dplyr
Its functions each do one thing, but they do it well.


The pipe, %>%, makes code more readable.
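As a rough illustration of the readability gain from %>%, the sketch below writes the same three steps twice: once as nested calls and once as a pipeline. It borrows filter() and arrange(), which are covered later in this section, and assumes dplyr and gapminder are installed; the object names nested and piped are made up for the example.

```r
library(dplyr)
library(gapminder)

# Nested calls have to be read from the inside out
nested <- head(arrange(filter(gapminder, year == 2007), desc(pop)), n = 3)

# The same steps with the pipe read top to bottom
piped <- gapminder %>%
  filter(year == 2007) %>%
  arrange(desc(pop)) %>%
  head(n = 3)

# Both versions return the same rows
identical(nested, piped)
```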
Shortcut for %>% in RStudio: Ctrl + Shift + M (Cmd + Shift + M on macOS)
You still remember str(), right?
str(gapminder)
glimpse(gapminder)
Q: Which countries have the largest populations? And the smallest?


gapminder
gapminder %>%
  arrange(pop)
arrange(gapminder, desc(pop))
Q: How many observations do we have in each continent? Do we have the same number of observations for each country within a continent?
gapminder %>%
  count(continent)
# gapminder %>%
#   add_count(continent)
gapminder %>%
  count(continent, country)
What does count() give?
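One way to see what count() returns is to rebuild it by hand. The sketch below assumes dplyr and gapminder are installed and compares count(continent) with a group_by() plus summarise(n = n()) pipeline; by_count and by_hand are illustrative names.

```r
library(dplyr)
library(gapminder)

# count() returns one row per group plus a column n with the group size
by_count <- gapminder %>%
  count(continent)

# Roughly the same result spelled out by hand
by_hand <- gapminder %>%
  group_by(continent) %>%
  summarise(n = n())

by_count
identical(by_count$n, by_hand$n)
```

The commented-out add_count() differs in that it keeps every original row and appends n as a new column instead of collapsing the data.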
Q: What were the average GDP per capita and the median life expectancy?


gapminder %>%
  summarise(mean_gdp = mean(gdpPercap), median_life = median(lifeExp))
Q: What were the average GDP per capita and the median life expectancy in each continent?


gapminder %>%
  group_by(continent) %>%
  summarise(mean_gdp = mean(gdpPercap), median_life = median(lifeExp))
Q: Which countries had the largest population in 2007?


gapminder %>%
  arrange(desc(pop))
gapminder %>%
  filter(year == 2007) %>%
  arrange(desc(pop))
What about the country with the largest population in the decade ending in 2007? (Tip: use %in% in the condition.)
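A possible answer to the %in% exercise, sketched under the assumption that "the decade ending in 2007" means the years 1998 to 2007. Because gapminder records one observation every five years, that range matches only 2002 and 2007. The name recent is made up for the example.

```r
library(dplyr)
library(gapminder)

# Keep the rows whose year falls in the assumed decade, largest population first
recent <- gapminder %>%
  filter(year %in% 1998:2007) %>%
  arrange(desc(pop))

recent
```

Writing year %in% c(2002, 2007) would select the same rows; the range form just avoids having to know which years were surveyed.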
Q: What if I want only some of the variables?


gapminder %>%
  select(country, year, pop)
gapminder %>%
  select(-continent)
gapminder %>%
  select(starts_with("co"))
Q: What’s the life expectancy of the country that had the largest population in 2007—showing the country name, population, and life expectancy together, please?


gapminder
gapminder %>%
  filter(year == 2007) %>%
  arrange(desc(pop)) %>%
  select(country, pop, lifeExp)
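The pipeline above sorts every 2007 row and leaves the answer in the first line of the output. If only that single row is wanted, slice_max() is one alternative; this sketch assumes dplyr 1.0.0 or later, and top_2007 is an illustrative name.

```r
library(dplyr)
library(gapminder)

# Keep only the 2007 row with the largest population
top_2007 <- gapminder %>%
  filter(year == 2007) %>%
  slice_max(pop, n = 1) %>%
  select(country, pop, lifeExp)

top_2007
```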
Q: What’s the total GDP of each country?


gapminder %>%
  mutate(gdp = pop * gdpPercap) %>%
  select(country, pop, gdpPercap, gdp)
Q: How do we round all the numeric variables to whole numbers?
gapminder %>%
  mutate_if(is.double, round, digits = 0)
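Recent dplyr releases mark mutate_if() as superseded in favour of across(). A sketch of the same rounding step in that style, assuming dplyr 1.0.0 or later; rounded is an illustrative name.

```r
library(dplyr)
library(gapminder)

# Round every double column; integer columns such as year and pop are left alone
rounded <- gapminder %>%
  mutate(across(where(is.double), ~ round(.x, digits = 0)))

rounded
```

Only lifeExp and gdpPercap are stored as doubles in gapminder, so those are the two columns that change.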
When you run gapminder %>% ..., you are NOT adding to or changing gapminder itself. To keep the changes, assign the result to an object.
gapminderNew <- gapminder %>% ...
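A small check of this point, assuming dplyr and gapminder are installed. The object name gapminder_gdp and the gdp column are chosen for the example only.

```r
library(dplyr)
library(gapminder)

# The result of the pipeline is stored in a new object
gapminder_gdp <- gapminder %>%
  mutate(gdp = pop * gdpPercap)

# The original data frame has no gdp column; the new object does
"gdp" %in% names(gapminder)
"gdp" %in% names(gapminder_gdp)
```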
Use dplyr functions wisely and in combination:
arrange, count, summarise
filter, select, mutate
group_by and mutate_if
Q: How do I fill the missing values in x and combine y and z into one variable?
df_toy %>%
  mutate(x = coalesce(x, 0L),
         yz = coalesce(y, z))
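The source does not show df_toy itself, so the snippet above cannot be run as written. The sketch below invents a small stand-in with the same column names to show what coalesce() does; the values are hypothetical, and df_filled is an illustrative name.

```r
library(dplyr)

# Hypothetical toy data; the original df_toy is not shown in the source
df_toy <- tibble(
  x = c(1L, NA, 3L),
  y = c("a", NA, NA),
  z = c(NA, "b", "c")
)

df_filled <- df_toy %>%
  mutate(x = coalesce(x, 0L),   # replace missing x with 0
         yz = coalesce(y, z))   # take y where present, otherwise z

df_filled
```

coalesce() returns, position by position, the first non-missing value among its arguments, which is why it works both for filling with a constant and for merging two partly missing columns.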