---
title: "Welcome to taylor"
description: |
Learn about the package and what it can do!
vignette: >
%\VignetteIndexEntry{Introduction to taylor}
%\VignetteEngine{knitr::rmarkdown}
%\VignetteEncoding{UTF-8}
---
```{r, child="children/chunk-options.txt"}
```
taylor is an R package for accessing and exploring data related to Taylor Swift's discography.
It provides built in data sets containing information on the audio characteristics and lyrics of Taylor's songs.
Additionally, taylor offers some helper functions for creating Taylor Swift-themed data visualizations.
This document introduces you to taylor's functionality, and shows you how to use them to learn about Taylor Swift's music.
```{r setup}
library(taylor)
```
## I trace the evidence
The main data set is `taylor_all_songs`.
This data set contains audio features from Spotify and lyrics from Genius for each of Taylor Swift's songs.
```{r}
taylor_all_songs
```
The audio features include the danceability, energy, and valence of each track, which are described in the [documentation for the Spotify API](https://developer.spotify.com/documentation/web-api/reference/get-audio-features).
The data set also includes meta data for each track such as the key, tempo, time signature, and duration.
Finally, the lyrics for each track are included in a nested list column.
The lyrics can be accessed by using `tidyr::unnest()`, or by using `purrr::map()` to apply a function to each set of lyrics.
For a detailed description of accessing lyrics, see `vignette("lyrics")`.
A related data set is `taylor_album_songs`.
This data set contains all of the same information as `taylor_all_songs`, but is filtered to only include tracks that are on official studio albums.
This means that standalone singles (e.g., "Only The Young") and features (e.g., Big Red Machine's "Renegade") are not included.
We also exclude albums Taylor doesn't own, but for which a *Taylor's Version* has been released.
For example, *1989* is excluded in favor of *1989 (Taylor's Version)*, but *Taylor Swift* (debut) is included because a *Taylor's Version* of that album has not been released.
taylor also include a small data set called `taylor_albums`.
This data set includes the release date for each album, as well as critic and user ratings from [Metacritic](https://www.metacritic.com/person/taylor-swift/).
```{r}
taylor_albums
```
Finally, there is a data set dedicated to The Eras Tour, specifically the surprise songs that Taylor plays at each show.
The data set, `eras_tour_surprise`, contains the date and location of each show, the color dress Taylor wore during the acoustic set, and the song that was performed on each instrument (piano and guitar).
The data set also includes information on any additional songs that were performed as mashups and guests that Taylor brought out for a performance.
```{r}
eras_tour_surprise
```
## Just another picture to burn
Often as we explore data, we want to create data visualizations.
Naturally, if we're exploring data for Taylor Swift, we need Taylor Swift-themed visualizations.
taylor includes several color palettes and helper functions for `{ggplot2}` to facilitate these visualizations.
First, there are color palettes inspired by each album stored in `album_palettes`.
For example, we can look at a color palette based on the cover art for the *Lover* album.
```{r}
album_palettes$lover
```
There is also a color palette that contains one color for each album, which is useful when comparing albums to each other.
For a complete description of color palette functionality in taylor, see `vignette("palettes")`.
```{r}
album_compare
```
taylor also includes several functions for using the built-in palettes for color and fill scales with ggplot2.
As an example, `scale_color_albums()` to map the `album_compare` palette to geometries that have color mapped to the album name.
In the following plot, we display the cumulative number of surprise songs played from each album and use `scale_color_albums()` to highlight each album within its respective facet.
Plot code
```{r eras-plot, eval = FALSE}
library(dplyr)
library(tidyr)
library(ggplot2)
leg_labels <- unique(eras_tour_surprise$leg)
leg_labels <- gsub("South America", "South\nAmerica", leg_labels)
surprise_song_count <- eras_tour_surprise %>%
nest(dat = -c(leg, date, city, night)) %>%
arrange(date) %>%
mutate(leg = factor(leg, levels = unique(eras_tour_surprise$leg),
labels = leg_labels)) %>%
mutate(show_number = seq_len(n()), .after = night) %>%
unnest(dat) %>%
left_join(distinct(taylor_album_songs, track_name, album_name),
join_by(song == track_name),
relationship = "many-to-one") %>%
count(leg, date, city, night, show_number, album_name) %>%
complete(nesting(leg, date, city, night, show_number), album_name) %>%
mutate(n = replace_na(n, 0)) %>%
arrange(album_name, date, night) %>%
mutate(surprise_count = cumsum(n), .by = album_name) %>%
left_join(select(taylor_albums, album_name, album_release),
by = "album_name") %>%
mutate(surprise_count = case_when(
album_name == "THE TORTURED POETS DEPARTMENT" &
date < album_release ~ NA_integer_,
.default = surprise_count
)) %>%
add_row(leg = factor("Europe"), album_name = "THE TORTURED POETS DEPARTMENT",
show_number = 83.5, surprise_count = 0L) %>%
mutate(album_name = replace_na(album_name, "Other"),
album_group = album_name,
album_name = factor(album_name, c(album_levels, "Other"),
labels = c(gsub("POETS DEPARTMENT",
"POETS\nDEPARTMENT",
album_levels), "Other")))
ggplot(surprise_song_count) +
facet_wrap(~ album_name, ncol = 3) +
geom_line(data = ~select(.x, -album_name),
aes(x = show_number, y = surprise_count, group = album_group),
color = "grey80", na.rm = TRUE) +
geom_line(aes(x = show_number, y = surprise_count, color = album_group),
show.legend = FALSE, linewidth = 2, na.rm = TRUE) +
scale_color_albums(na.value = "grey80") +
scale_x_continuous(breaks = c(1, seq(20, 500, 20))) +
labs(x = "Show", y = "Songs Played") +
theme_minimal() +
theme(strip.text.x = element_text(hjust = 0, size = 10),
axis.title = element_text(size = 9))
```
Plot code
```{r eras-1989, eval = FALSE}
library(patchwork)
missing_firsts <- tibble(date = as.Date(c("2023-11-01",
"2024-02-01",
"2024-05-01",
"2024-10-01")))
day_ones <- surprise_song_count %>%
slice_min(date, by = c(leg, album_name)) %>%
select(leg, date, album_name) %>%
mutate(date = date - 1)
surprise_dat <- surprise_song_count %>%
bind_rows(missing_firsts) %>%
arrange(date) %>%
fill(leg, .direction = "up") %>%
bind_rows(day_ones) %>%
arrange(album_name, date) %>%
group_by(album_name) %>%
fill(surprise_count, .direction = "down")
tour1 <- surprise_dat %>%
filter(leg %in% c("North America (Leg 1)", "South\nAmerica")) %>%
ggplot() +
facet_grid(cols = vars(leg), scales = "free_x", space = "free_x") +
geom_line(aes(x = date, y = surprise_count, group = album_name),
color = "grey80", na.rm = TRUE) +
geom_line(data = ~filter(.x, album_name == "1989 (Taylor's Version)"),
aes(x = date, y = surprise_count, color = album_name),
show.legend = FALSE, linewidth = 2, na.rm = TRUE) +
scale_color_albums() +
scale_x_date(breaks = "month", date_labels = "%b\n%Y", expand = c(.02, .02)) +
expand_limits(y = c(0, 37)) +
labs(x = NULL, y = "Songs Played") +
theme_minimal() +
theme(strip.text.x = element_text(hjust = 0, size = 10),
axis.title = element_text(size = 9))
tour2 <- surprise_dat %>%
filter(!leg %in% c("North America (Leg 1)", "South\nAmerica")) %>%
ggplot() +
facet_grid(cols = vars(leg), scales = "free_x", space = "free_x") +
geom_line(aes(x = date, y = surprise_count, group = album_name),
color = "grey80", na.rm = TRUE) +
geom_line(data = ~filter(.x, album_name == "1989 (Taylor's Version)"),
aes(x = date, y = surprise_count, color = album_name),
show.legend = FALSE, linewidth = 2, na.rm = TRUE) +
scale_color_albums() +
scale_x_date(breaks = "month", date_labels = "%b\n%Y", expand = c(.02, .02)) +
expand_limits(y = c(0, 37)) +
labs(x = NULL, y = "Songs Played") +
theme_minimal() +
theme(strip.text.x = element_text(hjust = 0, size = 10),
axis.title = element_text(size = 9))
tour1 / tour2 + plot_layout(axes = "collect")
```