1  The data!

Note

All data paths are relative to the root of the GitHub repository

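## packages
## library() stops with an error if a package is missing,
## whereas require() only warns and returns FALSE
library(tidyverse)
library(readxl)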

1.1 The Lore dataset

The album_info_metadata.xlsx file holds the fan lore for each song: sentiment, message, keywords, muse, color meaning, notes, secret messages, and color mentions together with their meanings (positive, neutral or negative). The sentiments were chosen from a list of feelings compiled by the Hoffman Institute Foundation (May 2015 review). It soon became clear that a single sentiment was not enough to tell songs apart, so a message and a set of keywords were added to help single out each song. For example, Tim McGraw and Back to December share an overall nostalgic feeling, but the first carries a falling-in-love message while the second is about longing. Likewise, the keywords for Tim McGraw are romantic, first love, country music, while those for Back to December are breakup, regretful, heartbreak.

allSongsMetadata <- readxl::read_excel("raw_data/album_info_metadata.xlsx")[,1:29]
allSongsMetadata 
# A tibble: 240 × 29
   album_name ep    album_release       track_number track_name artist featuring
   <chr>      <lgl> <dttm>                     <dbl> <chr>      <chr>  <chr>    
 1 Red        FALSE 2021-11-12 00:00:00            6 "22"       Taylo… <NA>     
 2 1989       FALSE 2023-10-27 00:00:00           17 "\"Slut!\… Taylo… <NA>     
 3 reputation FALSE 2017-11-10 00:00:00            1 "...Ready… Taylo… <NA>     
 4 Taylor Sw… FALSE 2006-10-24 00:00:00           14 "A Perfec… Taylo… <NA>     
 5 Taylor Sw… FALSE 2006-10-24 00:00:00            4 "A Place … Taylo… <NA>     
 6 Lover      FALSE 2019-08-23 00:00:00           15 "Afterglo… Taylo… <NA>     
 7 Red        FALSE 2021-11-12 00:00:00            5 "All Too … Taylo… <NA>     
 8 Red        FALSE 2021-11-12 00:00:00           30 "All Too … Taylo… <NA>     
 9 1989       FALSE 2023-10-27 00:00:00            5 "All You … Taylo… <NA>     
10 Midnights  FALSE 2022-10-21 00:00:00            3 "Anti-Her… Taylo… <NA>     
# ℹ 230 more rows
# ℹ 22 more variables: bonus_track <lgl>, promotional_release <dttm>,
#   single_release <dttm>, track_release <dttm>, danceability <dbl>,
#   energy <dbl>, key <dbl>, loudness <dbl>, mode <dbl>, speechiness <dbl>,
#   acousticness <dbl>, instrumentalness <dbl>, liveness <dbl>, valence <dbl>,
#   tempo <dbl>, time_signature <dbl>, duration_ms <dbl>, explicit <lgl>,
#   key_name <chr>, mode_name <chr>, key_mode <chr>, lyrics <chr>

1.2 Surprise Songs Data Set

In parallel with the development of the metadata (aka lore) database, a few details about each surprise song performance were recorded in the surprise_songs.xlsx data set. Each concert has a few rows, one per surprise song (Taylor performed two surprise songs at each concert, the first on guitar and the second on piano). Besides the two regular surprise songs, she occasionally mashed up songs in this acoustic set; although the first mashup happened on her second night in Ohio (July 1st, 2023), mashups became more common from the second night of her Melbourne concerts (February 17th, 2024). Three columns account for this: Mashups, with the options none, one or two; Mashup, with the name of the first song mashed up with Song title; and Mashup2, with a third song that was mashed up with Song title and Mashup (when Mashups = Two).
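As an illustration of how the three mashup columns fit together, the base R sketch below counts the songs performed in each slot and checks that the Mashups label agrees with the filled-in columns. The rows are invented placeholders, SongTitle stands in for the real `Song title` column, and the exact spelling of the labels ("None", "One", "Two") is an assumption based on the printed output and the text.

```r
# Toy rows mimicking the Mashups / Mashup / Mashup2 columns.
# Song names are placeholders, not real rows from surprise_songs.xlsx.
toyMashups <- data.frame(
    SongTitle = c("Song A", "Song B", "Song C"),
    Mashups   = c("None", "One", "Two"),   # label spelling is an assumption
    Mashup    = c(NA, "Song D", "Song E"),
    Mashup2   = c(NA, NA, "Song F"),
    stringsAsFactors = FALSE
)

# Songs performed in each slot: the main title plus any non-missing mashups
toyMashups$songsInSlot <- 1L +
    as.integer(!is.na(toyMashups$Mashup)) +
    as.integer(!is.na(toyMashups$Mashup2))

# Consistency check: the Mashups label should match the number of extra songs
expectedExtra <- c(None = 0L, One = 1L, Two = 2L)
toyMashups$labelConsistent <-
    unname(expectedExtra[toyMashups$Mashups]) == toyMashups$songsInSlot - 1L

toyMashups[, c("SongTitle", "Mashups", "songsInSlot", "labelConsistent")]
```

A check like this could flag rows where, say, Mashups is "Two" but Mashup2 was left empty.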

Besides the song titles and mashups, the surprise songs data set includes the city, state, country, stadium and date of each performance. The name of the dress she wore is in the DressName column; its colour is given in descriptive terms in Colour1, as a HEX code in ColourHex1, and as an RGB value in ColourRGB1. Because some dresses, such as Flamingo pink and Sunset orange, are an ombre of two colours, the name and codes of the second colour are in Colour2, ColourHex2 and ColourRGB2. Other details are also included: who she was dating at the time (the Relationship column), which leg of the tour (First legs, European, Final leg), which night in that city, which instrument she played while singing the song, special guests in the audience, and notes for overall remarks, for example that on July 9th she sang Last Kiss as one of the surprise songs, and that date is mentioned in the song.
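Since each dress colour is stored both as a HEX code and as an RGB value, the two encodings can be cross-checked. The sketch below uses base R's grDevices::col2rgb() and grDevices::rgb(); the HEX codes are hypothetical stand-ins rather than values from the spreadsheet, and the comma-separated "R,G,B" layout is an assumption about how ColourRGB1 might be written.

```r
# Hypothetical HEX codes standing in for ColourHex1 / ColourHex2 values
toyHex <- c("#FF69B4", "#FFA500")

# col2rgb() returns a 3 x n integer matrix with rows red, green, blue
rgbMatrix <- grDevices::col2rgb(toyHex)

# Collapse each column to an "R,G,B" string (the separator is an assumption)
toyRGB <- apply(rgbMatrix, 2, paste, collapse = ",")

# Round trip back to HEX to confirm the two encodings agree
roundTrip <- grDevices::rgb(rgbMatrix[1, ], rgbMatrix[2, ], rgbMatrix[3, ],
                            maxColorValue = 255)
stopifnot(identical(toupper(roundTrip), toupper(toyHex)))

toyRGB
```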

## reading in data
surpriseSongsDressColours <- readxl::read_excel("raw_data/surprise_songs.xlsx", sheet = "List")
surpriseSongsDressColours$Date <- as.Date(surpriseSongsDressColours$Date)
surpriseSongsDressColours
# A tibble: 443 × 26
   `Song title`         Mashups Mashup Mashup2 Guest City  State Country Stadium
   <chr>                <chr>   <chr>  <chr>   <chr> <chr> <chr> <chr>   <chr>  
 1 mirrorball           None    <NA>   <NA>    <NA>  Glen… Ariz… US      State …
 2 Tim McGraw           None    <NA>   <NA>    <NA>  Glen… Ariz… US      State …
 3 State Of Grace       None    <NA>   <NA>    <NA>  Glen… Ariz… US      State …
 4 this is me trying    None    <NA>   <NA>    <NA>  Glen… Ariz… US      State …
 5 Our Song             None    <NA>   <NA>    <NA>  Las … Neva… US      Allegi…
 6 Snow On The Beach    None    <NA>   <NA>    <NA>  Las … Neva… US      Allegi…
 7 cowboy like me       None    <NA>   <NA>    Marc… Las … Neva… US      Allegi…
 8 White Horse          None    <NA>   <NA>    <NA>  Las … Neva… US      Allegi…
 9 Ours                 None    <NA>   <NA>    <NA>  Arli… Texas US      AT&T   
10 Sad Beautiful Tragic None    <NA>   <NA>    <NA>  Arli… Texas US      AT&T   
# ℹ 433 more rows
# ℹ 17 more variables: Date <date>, DressName <chr>, Legs <chr>,
#   Relationship <chr>, Start <dttm>, End <dttm>, Colour1 <chr>,
#   ColourHex1 <chr>, ColourRGB1 <chr>, Colour2 <chr>, ColourHex2 <chr>,
#   ColourRGB2 <chr>, `Night #` <dbl>, Order <dbl>, Instrument <chr>,
#   `Special Annoucement` <chr>, Notes <chr>

1.2.1 An overview of surprise song dresses across the whole tour

## Only the first row of each concert is needed, as the
## same outfit was worn for all surprise songs
## at any one concert
oneRowPerConcert <- surpriseSongsDressColours %>%
    group_by(Date) %>%
    arrange(Date, Order) %>% 
    slice(1) %>%
    ungroup()
oneRowPerConcert
# A tibble: 147 × 26
   `Song title`         Mashups Mashup Mashup2 Guest City  State Country Stadium
   <chr>                <chr>   <chr>  <chr>   <chr> <chr> <chr> <chr>   <chr>  
 1 mirrorball           None    <NA>   <NA>    <NA>  Glen… Ariz… US      State …
 2 this is me trying    None    <NA>   <NA>    <NA>  Glen… Ariz… US      State …
 3 Our Song             None    <NA>   <NA>    <NA>  Las … Neva… US      Allegi…
 4 cowboy like me       None    <NA>   <NA>    Marc… Las … Neva… US      Allegi…
 5 Sad Beautiful Tragic None    <NA>   <NA>    <NA>  Arli… Texas US      AT&T   
 6 Death By A Thousand… None    <NA>   <NA>    <NA>  Arli… Texas US      AT&T   
 7 Speak Now            None    <NA>   <NA>    <NA>  Tampa Flor… US      Raymon…
 8 The Great War        None    <NA>   <NA>    Aaro… Tampa Flor… US      Raymon…
 9 mad woman            None    <NA>   <NA>    <NA>  Tampa Flor… US      Raymon…
10 Wonderland           None    <NA>   <NA>    <NA>  Hous… Texas US      NRG    
# ℹ 137 more rows
# ℹ 17 more variables: Date <date>, DressName <chr>, Legs <chr>,
#   Relationship <chr>, Start <dttm>, End <dttm>, Colour1 <chr>,
#   ColourHex1 <chr>, ColourRGB1 <chr>, Colour2 <chr>, ColourHex2 <chr>,
#   ColourRGB2 <chr>, `Night #` <dbl>, Order <dbl>, Instrument <chr>,
#   `Special Annoucement` <chr>, Notes <chr>
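Keeping one row per date relies on the dress being the same for every surprise song on that date. The base R sketch below, on invented placeholder rows, shows one way to verify that assumption and then tally how many concerts each dress appeared in. It mirrors the dplyr pipeline above rather than replacing it, and SongTitle again stands in for the real `Song title` column.

```r
# Toy data: two concerts with two surprise songs each (all values are placeholders)
toySongs <- data.frame(
    Date      = as.Date(c("2000-01-01", "2000-01-01", "2000-01-02", "2000-01-02")),
    Order     = c(2, 1, 1, 2),
    SongTitle = c("Song B", "Song A", "Song C", "Song D"),
    DressName = c("Dress X", "Dress X", "Dress Y", "Dress Y"),
    stringsAsFactors = FALSE
)

# Sort by date and performance order, then keep the first row per date
toySorted <- toySongs[order(toySongs$Date, toySongs$Order), ]
toyOnePerConcert <- toySorted[!duplicated(toySorted$Date), ]

# The simplification is only safe if each date has a single DressName
dressesPerDate <- tapply(toySongs$DressName, toySongs$Date,
                         function(x) length(unique(x)))
stopifnot(all(dressesPerDate == 1))

# Number of concerts in which each dress was worn
dressCounts <- table(toyOnePerConcert$DressName)
dressCounts
```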

1.3 Simple colour sentiment dataset

The album_info_metadata_neutral.xlsx dataset…

allSongsMetadata <- "raw_data/album_info_metadata_neutral.xlsx"
allSongsMetadata <- readxl::read_excel(allSongsMetadata, sheet = "metadata")
source("code/colour_palletts.r")
rawColorData <- data.frame(
    colour = trimws(unlist(strsplit(allSongsMetadata$colour_MK, ";"))),
    meaning = trimws(unlist(strsplit(allSongsMetadata$colour_meaningMK, ";")))
) %>% filter(!is.na(colour) & !is.na(meaning))

colorSentimentScores <- rawColorData %>%
    mutate(
        meaning = trimws(meaning),  
        score = case_when(
            tolower(meaning) == "positive" ~ 1,
            tolower(meaning) == "neutral" ~ 0.5,
            tolower(meaning) == "negative" ~ 0,
            TRUE ~ NA_real_
        )
    )

## Calculate average sentiment for each individual color
individualColorSentiments <- colorSentimentScores %>%
    group_by(colour) %>%
    summarise(
        avgSentiment = mean(score, na.rm = TRUE),
        mentionCount = n()
    ) %>%
    ungroup()

individualColorSentiments$colourGroup <- colorGroups[individualColorSentiments$colour]
individualColorSentiments$colourHexColour <- sapply(individualColorSentiments$colour, \(x) colorPaletteColours[[x]])
individualColorSentiments$colourGroupColour <- sapply(individualColorSentiments$colourGroup, \(x) colorPaletteGroups[[x]])

individualColorSentiments
# A tibble: 69 × 6
   colour                avgSentiment mentionCount colourGroup   colourHexColour
   <chr>                        <dbl>        <int> <chr>         <chr>          
 1 amber                        0.5              1 yellows       #FFBF00        
 2 aquamarine                   1                1 blues         #7FFFD4        
 3 aurora borealis green        1                2 greens        #78E08F        
 4 black                        0.5              9 blacks        #000000        
 5 black and white              0.25             4 black and wh… #C0C0C0        
 6 blackout                     1                1 blacks        #1A1A1A        
 7 bleached                     0.5              1 whites        #F5F5DC        
 8 blood monlit                 1                1 reds          #8A0303        
 9 blood-soaked                 0                2 reds          #8B0000        
10 blue                         0.409           22 blues         #0000FF        
# ℹ 59 more rows
# ℹ 1 more variable: colourGroupColour <chr>
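To make the scoring concrete, here is a small worked example in base R. The colour and meaning strings are invented, not taken from album_info_metadata_neutral.xlsx. It also adds a check that is not in the code above: pairing colours with meanings by position only works if every song lists as many meanings as colours, so the sketch compares the lengths of the two splits before building the data frame.

```r
# Toy stand-ins for the colour_MK and colour_meaningMK columns (values invented)
toyColours  <- c("blue; red", "blue", NA, "red; gold")
toyMeanings <- c("negative; positive", "neutral", NA, "negative; positive")

# Pairing by position is only valid if each song has equally many
# colours and meanings
splitColours  <- strsplit(toyColours, ";")
splitMeanings <- strsplit(toyMeanings, ";")
stopifnot(identical(lengths(splitColours), lengths(splitMeanings)))

toyPairs <- data.frame(
    colour  = trimws(unlist(splitColours)),
    meaning = trimws(unlist(splitMeanings)),
    stringsAsFactors = FALSE
)
toyPairs <- toyPairs[!is.na(toyPairs$colour) & !is.na(toyPairs$meaning), ]

# Same mapping as above: positive = 1, neutral = 0.5, negative = 0;
# any other label becomes NA
scoreLookup <- c(positive = 1, neutral = 0.5, negative = 0)
toyPairs$score <- unname(scoreLookup[tolower(toyPairs$meaning)])

# Mean score per colour, e.g. blue = (0 + 0.5) / 2
toyAvg <- tapply(toyPairs$score, toyPairs$colour, mean, na.rm = TRUE)
toyAvg
```

Under this scheme an average near 0.5 can mean either consistently neutral mentions or an even mix of positive and negative ones, so mentionCount is worth reading alongside avgSentiment.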