## packages
require(tidyverse)
require(readxl)
1 The data!
All data paths are relative to the root of the GitHub repository
1.1 The Lore dataset
The album_info_metadata.xlsx file includes the fan lore: sentiment, message, keywords, muse, color meaning, notes, secret messages, and color mentions with their meanings (positive, neutral or negative). The sentiments were chosen from a list of feelings compiled by the Hoffman Institute Foundation (May/2015 review). It soon became clear that a single sentiment was not enough to differentiate between songs, so the message and keywords fields were added to help single out each song. For example, while Tim McGraw and Back to December both share an overall nostalgic feeling, the first carries a falling in love message, while the latter is about longing. Likewise, the Tim McGraw keywords are romantic, first love, country music, while the Back to December keywords are breakup, regretful, heartbreak.
allSongsMetadata <- readxl::read_excel("raw_data/album_info_metadata.xlsx")[,1:29]
allSongsMetadata
# A tibble: 240 × 29
album_name ep album_release track_number track_name artist featuring
<chr> <lgl> <dttm> <dbl> <chr> <chr> <chr>
1 Red FALSE 2021-11-12 00:00:00 6 "22" Taylo… <NA>
2 1989 FALSE 2023-10-27 00:00:00 17 "\"Slut!\… Taylo… <NA>
3 reputation FALSE 2017-11-10 00:00:00 1 "...Ready… Taylo… <NA>
4 Taylor Sw… FALSE 2006-10-24 00:00:00 14 "A Perfec… Taylo… <NA>
5 Taylor Sw… FALSE 2006-10-24 00:00:00 4 "A Place … Taylo… <NA>
6 Lover FALSE 2019-08-23 00:00:00 15 "Afterglo… Taylo… <NA>
7 Red FALSE 2021-11-12 00:00:00 5 "All Too … Taylo… <NA>
8 Red FALSE 2021-11-12 00:00:00 30 "All Too … Taylo… <NA>
9 1989 FALSE 2023-10-27 00:00:00 5 "All You … Taylo… <NA>
10 Midnights FALSE 2022-10-21 00:00:00 3 "Anti-Her… Taylo… <NA>
# ℹ 230 more rows
# ℹ 22 more variables: bonus_track <lgl>, promotional_release <dttm>,
# single_release <dttm>, track_release <dttm>, danceability <dbl>,
# energy <dbl>, key <dbl>, loudness <dbl>, mode <dbl>, speechiness <dbl>,
# acousticness <dbl>, instrumentalness <dbl>, liveness <dbl>, valence <dbl>,
# tempo <dbl>, time_signature <dbl>, duration_ms <dbl>, explicit <lgl>,
# key_name <chr>, mode_name <chr>, key_mode <chr>, lyrics <chr>
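The point about a single sentiment not being enough to tell songs apart can be illustrated with a minimal sketch. It uses only the two songs discussed above; the column names `sentiment` and `message` are assumptions for illustration and may not match the spreadsheet's actual column names.

```r
library(dplyr)

# Toy rows built from the two songs discussed in the text.
# The column names sentiment/message are hypothetical.
lore <- tibble(
  track_name = c("Tim McGraw", "Back to December"),
  sentiment  = c("nostalgic", "nostalgic"),
  message    = c("falling in love", "longing")
)

# Keyed on sentiment alone, both songs collapse into one group...
n_by_sentiment <- nrow(distinct(lore, sentiment))

# ...while sentiment plus message keeps them apart.
n_by_both <- nrow(distinct(lore, sentiment, message))
```

Comparing the two counts on the full dataset would show how much each extra field helps in singling out a song.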
1.2 Surprise Songs Data Set
Parallel to the development of the metadata (aka lore) database, a few details about each surprise song performance were noted down in the surprise_songs.xlsx data set. Each concert has a few rows, one per surprise song (Taylor performed two surprise songs in each concert, the first on the guitar and the second on the piano). Besides the two regular surprise songs, she occasionally mashed up songs in this acoustic set; although the first mashup happened on her second night in Ohio (July 1st, 2023), mashups became more common from the second night of her Melbourne concerts (February 17th, 2024). Three columns account for this: Mashups, with the options none, one or two; Mashup, with the name of the first song mashed up with Song title; and Mashup2, with a third song that was mashed up with Song title and Mashup (in the case of Mashups = Two).
Besides the song titles and mashups, the surprise songs data set includes the city, state, country, stadium and date of each performance. The name of the dress she was wearing is in the DressName column, its color in descriptive terms in Colour1, its HEX code in ColourHex1, and its RGB code in ColourRGB1. As some dresses, like Flamingo pink and Sunset orange, are an ombre of two colors, the second color's name and codes are in Colour2, ColourHex2 and ColourRGB2. Lastly, other details are also included: who she was dating at the time, in the Relationship column; which leg of the tour (First legs, European, Final leg); which night in that city; which instrument she played while singing the song; special guests in the audience; and notes with overall remarks, for example that on July 9th she sang Last Kiss as one of the surprise songs, and that date is mentioned in the song.
## reading in data
surpriseSongsDressColours <- readxl::read_excel("raw_data/surprise_songs.xlsx", sheet = "List")
surpriseSongsDressColours$Date <- as.Date(surpriseSongsDressColours$Date)
surpriseSongsDressColours
# A tibble: 443 × 26
`Song title` Mashups Mashup Mashup2 Guest City State Country Stadium
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 mirrorball None <NA> <NA> <NA> Glen… Ariz… US State …
2 Tim McGraw None <NA> <NA> <NA> Glen… Ariz… US State …
3 State Of Grace None <NA> <NA> <NA> Glen… Ariz… US State …
4 this is me trying None <NA> <NA> <NA> Glen… Ariz… US State …
5 Our Song None <NA> <NA> <NA> Las … Neva… US Allegi…
6 Snow On The Beach None <NA> <NA> <NA> Las … Neva… US Allegi…
7 cowboy like me None <NA> <NA> Marc… Las … Neva… US Allegi…
8 White Horse None <NA> <NA> <NA> Las … Neva… US Allegi…
9 Ours None <NA> <NA> <NA> Arli… Texas US AT&T
10 Sad Beautiful Tragic None <NA> <NA> <NA> Arli… Texas US AT&T
# ℹ 433 more rows
# ℹ 17 more variables: Date <date>, DressName <chr>, Legs <chr>,
# Relationship <chr>, Start <dttm>, End <dttm>, Colour1 <chr>,
# ColourHex1 <chr>, ColourRGB1 <chr>, Colour2 <chr>, ColourHex2 <chr>,
# ColourRGB2 <chr>, `Night #` <dbl>, Order <dbl>, Instrument <chr>,
# `Special Annoucement` <chr>, Notes <chr>
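Because a single performance can span up to three columns (Song title, Mashup and Mashup2), counting how often a song was played may be easier in long format. The following is a minimal sketch on invented toy rows, assuming the column layout shown above; the helper column `performance` and the names `songsLong` and `songsPerPerformance` are introduced here for illustration only.

```r
library(dplyr)
library(tidyr)

# Toy rows mimicking the column layout of the surprise songs data;
# the song names are placeholders, not real set lists.
toy <- tibble(
  `Song title` = c("Song A", "Song B", "Song C"),
  Mashups      = c("None", "One", "Two"),
  Mashup       = c(NA, "Song D", "Song E"),
  Mashup2      = c(NA, NA, "Song F")
)

# One row per song actually played; empty mashup slots are dropped.
songsLong <- toy %>%
  mutate(performance = row_number()) %>%
  pivot_longer(
    cols = c(`Song title`, Mashup, Mashup2),
    names_to = "slot",
    values_to = "song",
    values_drop_na = TRUE
  )

# Number of songs in each performance: 1, 2 and 3 for the toy rows
songsPerPerformance <- count(songsLong, performance, name = "nSongs")
```

The `nSongs` column could also be compared against the Mashups column as a consistency check, assuming None, One and Two correspond to one, two and three songs.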
1.2.1 An overview of surprise song dresses across the whole tour
## Only the first row of each concert is needed, as the
## same outfit was worn for all surprise songs
## in any one concert
oneRowPerConcert <- surpriseSongsDressColours %>%
  group_by(Date) %>%
  arrange(Date, Order) %>%
  slice(1) %>%
  ungroup()
oneRowPerConcert
# A tibble: 147 × 26
`Song title` Mashups Mashup Mashup2 Guest City State Country Stadium
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 mirrorball None <NA> <NA> <NA> Glen… Ariz… US State …
2 this is me trying None <NA> <NA> <NA> Glen… Ariz… US State …
3 Our Song None <NA> <NA> <NA> Las … Neva… US Allegi…
4 cowboy like me None <NA> <NA> Marc… Las … Neva… US Allegi…
5 Sad Beautiful Tragic None <NA> <NA> <NA> Arli… Texas US AT&T
6 Death By A Thousand… None <NA> <NA> <NA> Arli… Texas US AT&T
7 Speak Now None <NA> <NA> <NA> Tampa Flor… US Raymon…
8 The Great War None <NA> <NA> Aaro… Tampa Flor… US Raymon…
9 mad woman None <NA> <NA> <NA> Tampa Flor… US Raymon…
10 Wonderland None <NA> <NA> <NA> Hous… Texas US NRG
# ℹ 137 more rows
# ℹ 17 more variables: Date <date>, DressName <chr>, Legs <chr>,
# Relationship <chr>, Start <dttm>, End <dttm>, Colour1 <chr>,
# ColourHex1 <chr>, ColourRGB1 <chr>, Colour2 <chr>, ColourHex2 <chr>,
# ColourRGB2 <chr>, `Night #` <dbl>, Order <dbl>, Instrument <chr>,
# `Special Annoucement` <chr>, Notes <chr>
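Keeping one row per date relies on the dress being the same for every surprise song on that date. A quick way to check that assumption might look like the sketch below, shown on invented toy data; the names `dressesPerConcert` and `violations` are introduced here for illustration.

```r
library(dplyr)

# Toy data: two concerts with two surprise songs each. Values are invented.
toy <- tibble(
  Date      = as.Date(c("2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02")),
  Order     = c(1, 2, 1, 2),
  DressName = c("Dress A", "Dress A", "Dress B", "Dress B")
)

# Count distinct dresses per concert date
dressesPerConcert <- toy %>%
  group_by(Date) %>%
  summarise(nDresses = n_distinct(DressName), .groups = "drop")

# Any date listed here would break the one-dress-per-concert assumption
violations <- filter(dressesPerConcert, nDresses > 1)
```

If `violations` came back non-empty on the real data, taking only the first row per date would silently drop a dress.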
1.3 Simple colour sentiment dataset
The album_info_metadata_neutral.xlsx dataset…
allSongsMetadata <- "raw_data/album_info_metadata_neutral.xlsx"
allSongsMetadata <- readxl::read_excel(allSongsMetadata, sheet = "metadata")
source("code/colour_palletts.r")
rawColorData <- data.frame(
  colour = trimws(unlist(strsplit(allSongsMetadata$colour_MK, ";"))),
  meaning = trimws(unlist(strsplit(allSongsMetadata$colour_meaningMK, ";")))
) %>% filter(!is.na(colour) & !is.na(meaning))
colorSentimentScores <- rawColorData %>%
  mutate(
    meaning = trimws(meaning),
    score = case_when(
      tolower(meaning) == "positive" ~ 1,
      tolower(meaning) == "neutral" ~ 0.5,
      tolower(meaning) == "negative" ~ 0,
      TRUE ~ NA_real_
    )
  )
## Calculate average sentiment for each individual color
individualColorSentiments <- colorSentimentScores %>%
  group_by(colour) %>%
  summarise(
    avgSentiment = mean(score, na.rm = TRUE),
    mentionCount = n()
  ) %>%
  ungroup()
individualColorSentiments$colourGroup <- colorGroups[individualColorSentiments$colour]
individualColorSentiments$colourHexColour <- sapply(individualColorSentiments$colour, \(x) colorPaletteColours[[x]])
individualColorSentiments$colourGroupColour <- sapply(individualColorSentiments$colourGroup, \(x) colorPaletteGroups[[x]])
individualColorSentiments
# A tibble: 69 × 6
colour avgSentiment mentionCount colourGroup colourHexColour
<chr> <dbl> <int> <chr> <chr>
1 amber 0.5 1 yellows #FFBF00
2 aquamarine 1 1 blues #7FFFD4
3 aurora borealis green 1 2 greens #78E08F
4 black 0.5 9 blacks #000000
5 black and white 0.25 4 black and wh… #C0C0C0
6 blackout 1 1 blacks #1A1A1A
7 bleached 0.5 1 whites #F5F5DC
8 blood monlit 1 1 reds #8A0303
9 blood-soaked 0 2 reds #8B0000
10 blue 0.409 22 blues #0000FF
# ℹ 59 more rows
# ℹ 1 more variable: colourGroupColour <chr>
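Pairing colours with meanings by splitting two semicolon-separated columns and unlisting them only works if every song lists the same number of colours and meanings. Otherwise the pairs can shift out of alignment. A minimal sketch of a per-row check is given below on invented toy strings; the real spreadsheet contents are not reproduced here, and `mismatched` is a name introduced for illustration.

```r
# Toy stand-ins for the colour_MK and colour_meaningMK columns; values are invented.
toy <- data.frame(
  colour_MK        = c("blue; red", "gold", "black; white"),
  colour_meaningMK = c("negative; positive", "positive", "neutral"),
  stringsAsFactors = FALSE
)

# Number of ";"-separated entries in each column, per row
nColours  <- lengths(strsplit(toy$colour_MK, ";"))
nMeanings <- lengths(strsplit(toy$colour_meaningMK, ";"))

# Rows where the two lists disagree in length (the third toy row here)
mismatched <- which(nColours != nMeanings)
```

Running a check like this before building the colour/meaning data frame would flag rows that need fixing in the spreadsheet.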