
Hacktivists with the group Anna’s Archive — a search engine for shadow libraries, which are unauthorized collections of digital content — say they’ve found a way to download virtually the entirety of Spotify for preservation.
In a blog post detailing their work, the archivists say they’ve archived the audio of some 86 million songs so far, representing 99.6 percent of total listens on the streaming service. The metadata they scraped, however, covers nearly the entire Spotify library, which is some 300 terabytes in size and spans 256 million tracks. There are 15.43 million artists represented, and 58.6 million albums.
According to the blog post, it constitutes the “largest publicly available music metadata database” to date, and is the first step toward building a “preservation archive” for music.
Compared to books or articles, popular music is already pretty well archived, a fact the activists acknowledge. However, they say current preservation collections are too focused on the most popular commercial songs — the ones already widely available, compared to, say, experimental art music — and too focused on the highest quality file formats.
Along with the files, the hacktivists sorted the song metadata and analyzed it in their blog post. The result is a fascinating bird’s-eye view of the Spotify catalogue that was previously unavailable to the public.
For example, the data on song popularity leads to the absolutely wild revelation that the top three songs on Spotify have a higher number of streams than the bottom “20-100 million songs combined,” the hacktivists write.
Interestingly, this leads to the question of how much of the Spotify library is made up of AI-generated slop, an issue that human artists say is crowding them out of the platform. As the team writes, “we expect this number [of listens] to be higher if you filter to only human-created songs.”
Had they opted to capture the 0.04 percent of songs that have fewer than 1,000 listens, they say, the total dataset would exceed 700 terabytes of data, albeit “for minimal benefit,” as it would have been difficult to sort the AI junk from tunes produced by humans.
There are also some interesting findings on genres, for example that electronic dance artists make up nearly a quarter of all musicians on the platform. After that is rock, followed by world/traditional, Latin, rap, pop, and classical, in that order.
Spotify’s internal data features also analyze each song by key, leading to the conclusion that the most common key is C Major — making up 9.3 percent of songs on the platform — while the least common is D# or Eb minor, making up 1.3 percent.
While the Spotify scrape represents a significant step for music preservation, it covers only a portion of the world’s music. Still, as the hacktivists write, “it’s a great start.”