2016: A Year In Lyrics

Apr 4, 2018

The debate over lyrics vs melody has been going on since the first musician put words to a song. Or maybe it was a poet who put music to their verse. Ever since, some have argued that lyrics matter most while others have defended the merits of melody. I have always found myself on the lyrics side of the line. I analyzed the lyrics of the songs I heard in 2016 to see what the lyrical data revealed about me as a listener.

Decoding Songwriting with Data

I worked with my friend, Paul, who is a songwriter, to understand the qualitative side of songwriting. By understanding the songwriting perspective, I could extract more relevant stories from the data. We spent a few weeks analyzing a sample set of lyrics from 312 different songs. The result of this analysis became a panel at SXSW in 2017 titled Decoding Songwriting with Data. (written summary)

Data — the Long, Hard, Painful Way

Next I collected the lyrics of all 2,549 songs. During the initial analysis, it took about 30 seconds per song to collect the lyrics and prepare them for processing and analysis. At that rate it would take a little over 21 hours to collect all the lyrics. It ended up taking much, much longer.
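As a back-of-the-envelope check on that estimate, the arithmetic can be sketched in a few lines. The song count and the 30 seconds per song come from the text above; the function name is only illustrative.

```python
# Figures from the text: 2,549 songs at roughly 30 seconds each.
SONG_COUNT = 2549
SECONDS_PER_SONG = 30


def estimated_hours(song_count, seconds_per_song):
    """Convert a per-song collection time into total hours."""
    return song_count * seconds_per_song / 3600


print(round(estimated_hours(SONG_COUNT, SECONDS_PER_SONG), 1))  # prints 21.2
```

That puts the best case at a bit over 21 hours of uninterrupted work, before any cleanup or retries.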

Manually collecting and cleaning the lyrics

The lyric collection process became very zen-like. My fingers learned to move over only the keys needed to keep things moving. My eyes narrowed in on only the things they needed to see. Each command was precise, with no wasted effort. Each night, when I sat down on the couch to pull another set of lyrics, I became automation incarnate. In these moments of computational zen, my mind would drift away from the monotonous task. I began to see things in the data that I might otherwise never have found. One such pattern was the number of geographic locations appearing in the data. Another was different versions of the same songs, with slightly different lyrics. Both became rich areas for investigation and visualization.

Comparing the lyrics in 4 different versions of “Atlantic City”

The example above shows the difference in lyrics between 4 different versions of the song “Atlantic City”, written and originally performed by Bruce Springsteen. Each vertical line represents a word in the song, starting from the first word on the left to the last word on the right. The height of each line is mapped to the number of times that word appears in the song. That detail makes the verses and choruses more apparent, since the choruses repeat words, raising the count of those words. The 4 versions are very consistent through the first two verses and choruses. Then they start to deviate as the song moves into the bridge and the final choruses. What stands out is how different each version becomes due to the repetition of specific phrases.
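The mapping behind that chart, one line per word with its height set by how often the word appears in the song, can be sketched roughly as follows. This is a minimal illustration and not the tooling actually used for the report; the function name, the tokenizing rule, and the placeholder lyric are all assumptions.

```python
import re
from collections import Counter


def word_heights(lyrics):
    """Pair each word, in song order, with its total count in the song.

    Tokenizing on letters and apostrophes is an assumption made here,
    not a detail taken from the original analysis.
    """
    words = re.findall(r"[a-z'’]+", lyrics.lower())
    counts = Counter(words)
    # One (word, height) pair per word position, left to right.
    return [(word, counts[word]) for word in words]


# Placeholder text, not a real lyric.
print(word_heights("la la hey la"))
# prints [('la', 3), ('la', 3), ('hey', 1), ('la', 3)]
```

Running this on each version of a song and plotting the heights side by side would show where the word sequences stay aligned and where repeated phrases push them apart.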

The step by step process of collecting and processing song lyrics to turn them into usable data

Though the process was arduous, it was fruitful. I was able to extract 93.3% of the lyrics I needed. From there, I ran them through a handful of different web tools and APIs in order to collect the rest of the data. In all, there were 91 columns of data and 2,549 rows leading to nearly 232,000 data points.

Sample Set vs Entire Data Set

A few things stood out from the SXSW analysis that I wanted to carry over to my personal research. The first was pronouns, since songwriters set themselves apart by the pronouns they use. The second was lyrical density, which shows whether a song is beat dominant or syllable dominant.

Pronouns

In the SXSW research I discovered, unsurprisingly, that “I” was the most used pronoun collectively. What did surprise me was that the pronoun an individual artist used most frequently was not always “I”. For half of the artists it was “you”. That same pattern held true in the larger data set. In fact, the order of most to least common pronouns was almost identical in the two data sets. The point of pronouns is to let the listener into the song by letting them decide who the pronouns represent. Each artist has a unique way of employing pronouns to do that.

In the chart of pronouns, the size of each rectangle relates to the number of times each pronoun appears. The subdivisions in the rectangles represent contracted versions of each pronoun such as “I’ve” or “she’d”. The dominance of “I” and “you” is clear. What surprised me was the small number of plural pronouns, “we” and “us”. That shows how interpersonal the lyrics of the songs I listen to are. They are written from the perspective of one person relating to another, “me to you” or “you and I”.
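A tally like the one behind that chart, with contracted forms counted separately from plain ones, could be sketched along these lines. The pronoun list, the function name, and the sample line are assumptions for illustration; the report's actual counts came from other tools.

```python
import re
from collections import defaultdict

# Assumed pronoun list for illustration; the report's exact list is not given.
PRONOUNS = {"i", "me", "you", "he", "she", "it", "we", "us", "they", "them"}


def count_pronouns(lyrics):
    """Count plain and contracted uses of each pronoun ("I" vs "I've")."""
    totals = defaultdict(lambda: {"plain": 0, "contracted": 0})
    # A token is a run of letters with an optional apostrophe suffix.
    for token in re.findall(r"[a-z]+(?:['’][a-z]+)?", lyrics.lower()):
        base, _, suffix = token.replace("’", "'").partition("'")
        if base in PRONOUNS:
            totals[base]["contracted" if suffix else "plain"] += 1
    return {pronoun: dict(kinds) for pronoun, kinds in totals.items()}


# Made-up sample line, not a real lyric.
print(count_pronouns("I've seen you and you know I'd go"))
# prints {'i': {'plain': 0, 'contracted': 2}, 'you': {'plain': 2, 'contracted': 0}}
```

Summing the two counts per pronoun would give the rectangle sizes, and the contracted share would give the subdivisions.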

Lyrical Density

Most of the music I heard in 2016 was beat dominant, which means the word density was low. Genre typically mapped to word density, with hip hop being the most lyrically dense. While that is no revelation, what was surprising was the range of density among hip hop artists. While most artists’ songs stayed within a narrow range of word density, many hip hop artists stretched across a much wider range. A Tribe Called Quest had the widest range of all artists. Other prolific musicians like Bob Dylan and Beyoncé had a large variety of word density in their songs as well.

Each vertical line on the chart above represents an artist. Each dot along the line is a song by that artist that I heard in 2016. The horizontal line represents equilibrium between syllables per minute (SPM) and beats per minute (BPM). 88.3% of the songs fall below that line, meaning that they have a higher BPM than SPM. The blue lines represent the artists who made my 10 favorite albums of the year.
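The comparison between SPM and BPM can be illustrated with a rough sketch. The syllable counter below is a crude vowel-group heuristic chosen only for this example, and the function names, sample text, duration, and tempo are all hypothetical; the report's own numbers came from web tools and APIs that are not specified here.

```python
import re


def estimate_syllables(word):
    """Crude estimate: one syllable per run of vowels, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def syllables_per_minute(lyrics, duration_seconds):
    """Total estimated syllables divided by the song length in minutes."""
    words = re.findall(r"[a-zA-Z]+(?:'[a-zA-Z]+)?", lyrics)
    total = sum(estimate_syllables(word) for word in words)
    return total / (duration_seconds / 60)


def is_beat_dominant(lyrics, duration_seconds, bpm):
    """A song falls below the equilibrium line when SPM is lower than BPM."""
    return syllables_per_minute(lyrics, duration_seconds) < bpm


# Made-up values: a three-word "song" lasting 30 seconds at 120 BPM.
print(syllables_per_minute("hello world again", 30))   # prints 10.0
print(is_beat_dominant("hello world again", 30, 120))  # prints True
```

A real analysis would want a pronunciation dictionary or a dedicated syllable library, since vowel-group counting miscounts many English words.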

Crafting The Print

Finding the stories in the data is a labor-intensive process. Translating those findings into a printed report takes a similar amount of rigor and refinement. From the layout to the colors to the typography, each aspect of the print gets iterated on until it’s right.

Time lapse of each iteration of the print file

On this particular print I went with a full color background with white and contrasting color type and graphics. The refinement of the colors in each iteration goes from dark shades at first to more pale tones before arriving at a saturated palette.

What’s Next

In charting the geographical references, I saw a lack of diversity in the places referenced. This is a symptom of a lack of diversity in the music I listen to. In 2018, I’m focused on broadening my music discovery and finding new sources that expose me to a wider range of music.

There is a mountain of opportunity to dig deeper into lyrics, especially in the catalogs of prolific artists like The Beatles, David Bowie, A Tribe Called Quest, and Kendrick Lamar.

But first, I have another report coming later this year, focused on all of my music recommendations in 2017. I gave a preview of that work at SXSW.

Find Eric’s data + music work on his website: www.ericboam.com. Purchase copies of this printed report here.