1 Characteristics of the Music Recommender Domain

  • Duration of consumption: music tracks are short, so the time a user takes to form an opinion about them is also short -> music items are more disposable

  • Catalog size: there are far more music tracks than movies or TV series, so it is important for commercial MRSs to be scalable.

  • Different representations and abstraction levels:

    • Music is not only audio; it can also take the form of music videos or digital score sheets

    • It can also be consumed at different levels of granularity: artist, album, or song

    • Non-standard recommendation task: radio stations or concert venues

  • Repeated consumption: repeated recommendations are tolerated, or even appreciated, more in music than for movies or books

  • Sequential consumption: songs are usually consumed in sequence

    • Sequence-aware recommendation problems: automatic playlist continuation or next-track recommendation

    • There are unique constraints and modelling assumptions related to serial consumption

    • Evaluation criteria are also different

  • Passive consumption: music is often played in the background, so implicit feedback based on a user’s reaction to a song might not reflect their true preferences.

  • Importance of content: MRS relies more on content-based approaches (such as content-based filtering) than the movie domain, which leans more heavily on collaborative filtering.

    • Hence, extracting semantic information from music is crucial: the audio signal, artist or song name, album cover, lyrics, album reviews, score sheet

    • Then leverage the similarities between items and user profiles to make recommendations

    • Explicit rating data is relatively rare (even Anthony Fantano can’t rate all the music out there)

2 Types of MRS Use Cases

There are three main use cases: basic music recommendation, lean-in exploration, and lean-back listening.

2.1 Basic Music Recommendation

Basic music recommendation aims to provide recommendations that support users in browsing a music collection, similar to recommender systems in other domains. There are two main functionalities: 1) item-to-item recommendations and 2) personalized recommendation lists on the platform’s landing page.

Item-to-item recommendations

  • Provide relevant artists, albums, and tracks when the user browses item pages

  • Relies on similarity inferred from users’ consumption patterns; these usually show up as “Fans also like” or “People who played that also played this” (see the sketch below)
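
A minimal sketch of how such item-to-item similarity could be computed from consumption patterns, here via cosine similarity over a toy user-item play-count matrix (the data and function names are illustrative, not from the chapter):

```python
import numpy as np

def item_item_similarity(plays: np.ndarray) -> np.ndarray:
    """Cosine similarity between items from a (num_users, num_items) play-count matrix."""
    # Normalize each item's column of play counts to unit length, then take dot products.
    norms = np.linalg.norm(plays, axis=0, keepdims=True)
    normalized = plays / np.maximum(norms, 1e-12)
    return normalized.T @ normalized  # shape: (num_items, num_items)

def fans_also_like(plays: np.ndarray, item_id: int, k: int = 5) -> list[int]:
    """Return the k items most similar to item_id, excluding the item itself."""
    sims = item_item_similarity(plays)[item_id]
    sims[item_id] = -np.inf
    return [int(i) for i in np.argsort(-sims)[:k]]

# Toy example: 4 users x 6 tracks of play counts.
plays = np.array([
    [5, 0, 3, 0, 1, 0],
    [4, 0, 2, 0, 0, 1],
    [0, 6, 0, 4, 0, 0],
    [0, 3, 0, 5, 2, 0],
], dtype=float)
print(fans_also_like(plays, item_id=0, k=2))  # tracks most often played by the same users
```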

Personalized recommendation lists on the platform’s landing page

  • Engages a user in a session without their active navigation of content

  • Based on the user’s previous behavior on the platform

  • How these lists are presented is also an active area of UI/UX design research in industry

Interaction & Feedback Data

The goal is to predict explicit user ratings and reactions (likes, favorites, skips) for items in the system’s music catalog.

  • Music streaming services gather implicit user feedback such as listening events, play counts, total time listened per item, and track skips

    • These event metrics can be normalized and thresholded to define relevance labels (see the sketch after this list)

  • Implicit feedback might not accurately reflect users’ preferences, but it is often the only information available
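
As an illustration of such thresholding, here is a minimal sketch that maps hypothetical play-count and completion signals to binary relevance labels; the field names and threshold values are assumptions, not prescribed by the chapter:

```python
from dataclasses import dataclass

@dataclass
class ListeningEvent:
    user_id: str
    track_id: str
    play_count: int          # how many times the user played the track
    completion_ratio: float  # average fraction of the track listened to, in [0, 1]

def to_binary_relevance(event: ListeningEvent,
                        min_plays: int = 2,
                        min_completion: float = 0.8) -> int:
    """Map implicit feedback to a 0/1 relevance label via simple thresholds."""
    # Treat a track as "relevant" if it was replayed and mostly listened through;
    # a one-off or skip-heavy play gets label 0.
    return int(event.play_count >= min_plays and event.completion_ratio >= min_completion)

print(to_binary_relevance(ListeningEvent("u1", "t42", play_count=3, completion_ratio=0.95)))  # 1
print(to_binary_relevance(ListeningEvent("u1", "t7", play_count=1, completion_ratio=0.30)))   # 0
```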

Evaluation Metrics and Competitions

Evaluation is usually done in an offline setting, using user behavior data as the ground truth. Commonly used evaluation metrics include:

  • Measuring the error in relevance predictions: RMSE

  • Assessing the quality of generated ranked lists: precision at k, recall at k, mean average precision (MAP) at k, average percentile rank, or normalized discounted cumulative gain (NDCG); a small sketch of two of these follows below
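
For concreteness, a small sketch of two of the ranking metrics (precision at k and a binary-relevance NDCG at k), written from their standard definitions:

```python
import math

def precision_at_k(recommended: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    return sum(item in relevant for item in top_k) / k

def ndcg_at_k(recommended: list[str], relevant: set[str], k: int) -> float:
    """Binary-relevance NDCG: discounted gain of hits, normalized by the ideal ranking."""
    dcg = sum(1.0 / math.log2(rank + 2)  # rank 0 contributes 1/log2(2) = 1
              for rank, item in enumerate(recommended[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0

recs = ["t1", "t9", "t3", "t7", "t2"]   # ranked list produced by a recommender
held_out = {"t3", "t2", "t8"}           # tracks the user actually consumed in the test period
print(precision_at_k(recs, held_out, k=5))  # 0.4
print(ndcg_at_k(recs, held_out, k=5))
```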

Several competitions have been organized in this area.

  • KDD Cup 2011: purely collaborative filtering task. Data is anonymized and no metadata is available.

  • The Million Song Dataset Challenge on Kaggle, which works with a wide variety of data sources such as web crawls, audio analysis, and metadata.

2.2 Lean-In Exploration

As the name suggests, lean-in exploration emphasizes more active and engaged user interaction, for example creating a playlist.

Interfaces

  • Interfaces for lean-in exploration give users more control, for example through search functionality.

    • To support such search functions, songs need to be indexed with knowledge graphs (release date, relationships with other artists, record labels), community metadata, or playlist titles given by other users.

    • This enriches the search index and introduces information about context and usage purpose, allowing queries like “chill music with boy band for sleep” (see the sketch after this list).

  • Lean-in interfaces can also be used for other tasks, such as exploring the musical structure, beat structure, or chords of a track, but these are not widely deployed in commercial MRSs because they are not common needs for the average user.
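
A minimal sketch of how enriched metadata could back such queries, using a toy in-memory inverted index over tags; the tags and the score-by-match-count scheme are illustrative assumptions, not the chapter’s method:

```python
from collections import defaultdict

# Hypothetical enriched metadata: each track carries tags harvested from
# knowledge graphs, community annotations, and playlist titles it appears in.
tracks = {
    "t1": {"chill", "boy band", "sleep", "2000s"},
    "t2": {"workout", "edm", "energetic"},
    "t3": {"chill", "acoustic", "sleep"},
}

# Build an inverted index from tag -> set of tracks.
index: dict[str, set[str]] = defaultdict(set)
for track_id, tags in tracks.items():
    for tag in tags:
        index[tag].add(track_id)

def search(query_tags: list[str]) -> list[str]:
    """Rank tracks by how many of the query tags they match."""
    scores: dict[str, int] = defaultdict(int)
    for tag in query_tags:
        for track_id in index.get(tag, set()):
            scores[track_id] += 1
    return sorted(scores, key=scores.get, reverse=True)

# Query approximating "chill music with boy band for sleep".
print(search(["chill", "boy band", "sleep"]))  # ['t1', 't3']
```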

Evaluation Metrics

Because users are more actively involved in lean-in sessions, the information retrieval metrics mentioned for basic recommendation, such as precision and recall, can be meaningfully calculated.

Spotify organized the RecSys Challenge 2018 on automatic playlist continuation.

Benefit

  • The lean-in use case is great for exploring new items because the system can prioritize probing the user with potentially negatively perceived items over optimizing for positive feedback

    • This trades short-term exploitation of items known to please the user for the longer-term reward of a better-developed user profile for future recommendations

2.3 Lean-Back Listening

  • User interaction is minimized. For example: automatic playlist generation or streaming radio

  • Interface severely limits how the user can control the system

  • Relies much more on implicit feedback such as song completion, skipping, or session termination

Lean-Back Data and Evaluation

  • Lean-back recommenders are used in specific contexts like exercising or working.

    • While user behavior might not reflect preferences perfectly, it is common to assume that these behaviors provide weak signals that can be used for model development and evaluation.

    • But these interactions take place within a particular context, and we should be careful not to interpret them outside of that context

  • Interfaces are usually designed around playlists or radio stations, typically seeded with a genre or artist. Most of the research is done on modeling playlists

  • Playlists provide an attractive source of high-quality positive interactions:

    • Co-occurrence of songs in a playlist can be a strong positive indicator of similarity (see the sketch after this list)

    • Algorithms that predict which songs should be in a playlist, conditioned on context, user preference, or pre-selected songs, are useful for generating lean-back recommendations

    • NOTE! Evaluating a playlist generation method relies on comparisons to playlist authors, not playlist consumers, because playlist authors actively choose which songs to include, while playlist listeners do not.

  • Competition: the Spotify Sequential Skip Prediction Challenge (with a paper explaining the approach)
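
As a toy illustration of using playlist co-occurrence, the following sketch counts how often pairs of tracks appear in the same playlist and scores continuation candidates by total co-occurrence with the seed tracks (the playlists and the scoring scheme are made up for illustration):

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical user-authored playlists.
playlists = [
    ["t1", "t2", "t3"],
    ["t2", "t3", "t4"],
    ["t1", "t3", "t5"],
]

# Count how often each pair of tracks co-occurs in the same playlist.
co_counts: dict[tuple[str, str], int] = defaultdict(int)
for playlist in playlists:
    for a, b in combinations(set(playlist), 2):
        co_counts[tuple(sorted((a, b)))] += 1

def continuation_candidates(seed_tracks: list[str], k: int = 3) -> list[str]:
    """Score catalog tracks by total co-occurrence with the seed tracks."""
    scores: dict[str, int] = defaultdict(int)
    for (a, b), count in co_counts.items():
        if a in seed_tracks and b not in seed_tracks:
            scores[b] += count
        elif b in seed_tracks and a not in seed_tracks:
            scores[a] += count
    return sorted(scores, key=scores.get, reverse=True)[:k]

print(continuation_candidates(["t1", "t3"]))  # ['t2', 't5', 't4'] for this toy data
```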

2.4 Other MRS Use Cases

  • Music event recommendation

  • Playlist discovery and playlist recommendation (different from playlist continuation)

  • Recommending background music for video (pretty sure it’s being done by TikTok)

    • Multi-modal analysis of audio and video
  • Music production: sound recommendation

    • Building more intelligent DAWs with sound, loop, and audio effect recommendation (requires audio analysis and domain knowledge in music composition)
  • Recommend digital score sheets

3 Types of MRS

3.1 Collaborative Filtering

  • CF approaches operate solely on data about user-item interactions (explicit ratings or implicit feedback) and are largely domain-agnostic (a minimal matrix-factorization sketch follows this list)

  • Prone to biases: devising methods for debiasing is one of the current big challenges in MRS research

    • Data bias

      • Community bias: users of a certain MRS platform do not form a representative sample of the population at large

      • Popularity bias: occurs when certain items receive many more user interactions than others

      • Record label associations

    • Algorithmic bias

      • Effect of age and gender on recommendation performance of CF algorithms

      • Considerable personality bias in MRS algorithms
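
A minimal matrix-factorization sketch on a toy implicit-feedback matrix, using plain SGD and treating unobserved entries as zeros; this is a simplification for illustration (real implicit-feedback CF typically uses weighted or sampled negatives):

```python
import numpy as np

def factorize(interactions: np.ndarray, n_factors: int = 8,
              lr: float = 0.02, reg: float = 0.05, epochs: int = 50,
              seed: int = 0) -> tuple[np.ndarray, np.ndarray]:
    """SGD matrix factorization of a dense (users x items) interaction matrix."""
    rng = np.random.default_rng(seed)
    n_users, n_items = interactions.shape
    P = rng.normal(scale=0.1, size=(n_users, n_factors))  # user latent factors
    Q = rng.normal(scale=0.1, size=(n_items, n_factors))  # item latent factors
    for _ in range(epochs):
        for u in range(n_users):
            for i in range(n_items):
                err = interactions[u, i] - P[u] @ Q[i]
                # Gradient steps with L2 regularization.
                P[u] += lr * (err * Q[i] - reg * P[u])
                Q[i] += lr * (err * P[u] - reg * Q[i])
    return P, Q

# Toy binary interaction matrix (1 = the user listened to the track).
R = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 1, 1]], dtype=float)
P, Q = factorize(R)
scores = P @ Q.T              # predicted affinity for every user-item pair
print(np.round(scores, 2))    # unseen items with high scores are candidate recommendations
```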

3.2 Content-Based Filtering

  • Recommendations are made based on the similarity between the user profile and the item representations in a top-k fashion

    • The user profile is created from the content of the items the user has interacted with.

    • In other words, given the content-based profile of the target user, the items with the best-matching content representations are recommended (see the sketch at the end of this section)

    • The reverse can also be done to directly predict the preference of a user for an item

  • (!!) Content information representation is usually a feature vector that consists of:

    • Handcrafted features

      • Computational audio descriptors of rhythm, melody, harmony, timbre

      • Genre, style, or epoch (either extracted from user-generated tags or provided as editorial information)

    • Latent embeddings from deep learning tasks such as auto tagging

      • Usually computed from low-level audio signal representations such as spectrograms

      • Textual metadata such as words in artist biographies

      • High-level semantic descriptors retrieved from audio features can also be used

  • Example of deep-learning-based CBF: predicting whether a track fits a given user’s listening profile

    • Tracks -> feature vectors -> track vectors fed into a CNN that transforms both tracks and profiles into a latent factor space

    • Profiles are formed by averaging the network’s outputs over their constituent tracks

    • Tracks and user profiles are eventually represented in a single vector space
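
A minimal sketch of the basic CBF recipe described above: build a user profile by averaging the content vectors of the items the user interacted with, then recommend the items whose vectors are most cosine-similar to that profile (the feature names and toy data are illustrative assumptions):

```python
import numpy as np

def build_profile(item_vectors: np.ndarray, interacted_ids: list[int]) -> np.ndarray:
    """User profile = mean of the content vectors of the items the user interacted with."""
    return item_vectors[interacted_ids].mean(axis=0)

def recommend(item_vectors: np.ndarray, profile: np.ndarray,
              exclude: set[int], k: int = 3) -> list[int]:
    """Top-k items whose content vectors are most cosine-similar to the profile."""
    norms = np.linalg.norm(item_vectors, axis=1) * np.linalg.norm(profile)
    sims = item_vectors @ profile / np.maximum(norms, 1e-12)
    ranked = [int(i) for i in np.argsort(-sims) if int(i) not in exclude]
    return ranked[:k]

# Toy content vectors (e.g., tempo, energy, acousticness) for six tracks.
items = np.array([
    [0.90, 0.80, 0.10],
    [0.85, 0.75, 0.20],
    [0.20, 0.10, 0.90],
    [0.30, 0.20, 0.85],
    [0.50, 0.50, 0.50],
    [0.88, 0.70, 0.15],
])
listened = [0, 1]
profile = build_profile(items, listened)
print(recommend(items, profile, exclude=set(listened)))  # content-similar tracks first
```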

3.3 Hybrid Approaches

  • Criteria for categorizing a recommendation approach as hybrid:

    • Consideration of several, complementary data sources OR

      • Even when only a single recommendation technique is used (e.g., acoustic cues + textual info)
    • Combination of two or more recommendation techniques

      • Traditionally: integrate a CBF component with a CF component, usually in a fusion manner where the two models generate recommendations separately and an aggregation function merges them into the final recommendation list (see the sketch after this list)

      • Currently: deep learning approaches integrate audio content information and collaborative information into a single deep neural network architecture

        • They can also incorporate other types of textual metadata
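
A minimal sketch of the traditional fusion scheme: two models score items independently, the scores are normalized, and a weighted sum produces the final ranking (the weight alpha and the toy scores are assumptions for illustration):

```python
def normalize(scores: dict[str, float]) -> dict[str, float]:
    """Min-max normalize a score dictionary so the two models become comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {item: (s - lo) / span for item, s in scores.items()}

def fuse(cf_scores: dict[str, float], cbf_scores: dict[str, float],
         alpha: float = 0.6) -> list[str]:
    """Weighted-sum fusion: alpha weights the CF model, (1 - alpha) the CBF model."""
    cf_n, cbf_n = normalize(cf_scores), normalize(cbf_scores)
    items = set(cf_n) | set(cbf_n)
    fused = {i: alpha * cf_n.get(i, 0.0) + (1 - alpha) * cbf_n.get(i, 0.0) for i in items}
    return sorted(fused, key=fused.get, reverse=True)

cf = {"t1": 4.2, "t2": 3.9, "t3": 1.0}    # e.g. scores from a matrix-factorization model
cbf = {"t2": 0.7, "t3": 0.9, "t4": 0.8}   # e.g. audio-similarity scores
print(fuse(cf, cbf))  # merged ranking over the union of both candidate sets
```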

3.4 Context-Aware Approaches

  • There are many ways to define context and context-aware approaches; here we refer to:

    • Item-related context:

      • Ex: position of a track in a playlist or listening session
    • User-related context:

      • Ex: demographics, cultural background, activity, or mood of the music listener
    • Interaction-related context:

      • Ex: characteristics of the very listening event (location, time, etc.)
  • Strategies to integrate context information:

    • Contextual prefiltering

      • Only the portion of the data that fits the user context is chosen OR

      • Users or items can be duplicated and considered in different contexts

    • Contextual postfiltering

      • A recommendation model that disregards context information is first created

      • Then predictions made by this model are adjusted, conditioned on context to make the final recommendations

    • Extending latent factor models with contextual dimensions

      • Instead of matrix factorization (a matrix of user-item ratings), we do tensor factorization (a tensor of user-item-context ratings)
  • Recent advancements: deep neural network approaches to context-aware MRS

    • Concatenate the content- or interaction-based input vector to the network with a contextual feature vector

    • Context can also be integrated through a gating mechanism

      • Computing the element-wise product between context embeddings and the network’s hidden state (e.g., in a variational autoencoder architecture); see the sketch below
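
A minimal sketch of the two integration options described above, concatenation and gating, using random toy vectors and weights (the dimensions and the sigmoid gate are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

hidden_dim, context_dim = 8, 4
hidden = rng.normal(size=hidden_dim)    # hidden state of the recommender network
context = rng.normal(size=context_dim)  # e.g. an encoding of time of day or activity

# Option 1: concatenate the context vector with the hidden representation
# and let the next layer consume the joint vector.
W_joint = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim + context_dim))
joint = W_joint @ np.concatenate([hidden, context])

# Option 2: gating - project the context into the hidden space, squash to (0, 1),
# and multiply element-wise so the context modulates the hidden state.
W_gate = rng.normal(scale=0.1, size=(hidden_dim, context_dim))
gate = 1.0 / (1.0 + np.exp(-(W_gate @ context)))  # sigmoid
gated = gate * hidden

print(joint.shape, gated.shape)  # (8,) (8,)
```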

3.5 Sequential Recommendation

  • Example: next track recommendation or automatic playlist continuation

  • Most state-of-the-art algorithms employ variants of recurrent or convolutional neural networks or autoencoders

3.6 Psychology-Inspired Approaches

  • Combine aforementioned data-driven techniques with psychological constructs. Examples include:

    • psychological models of human memory

      • Adaptive Control of Thought-Rational (ACT-R): two important factors for remembering music are 1) frequency of exposure and 2) recency of exposure (see the sketch after this list)

      • the inverted-U model

    • personality traits

      • Ex: adapting the level of diversity in the recommendation list to the user’s personality traits and reranking the CF results accordingly
    • affective state (mood or emotion) of the user

      • rely on studies in music psychology and the correlation between perceived or induced emotion and musical properties of the music listened to

      • The affective state can be inferred either through NLP techniques applied to user microblogs or from wearable sensors
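
A minimal sketch of an ACT-R style base-level activation score, which rises with both frequency and recency of exposure; the decay value of 0.5 is a commonly cited default, and the exposure ages are made up:

```python
import math

def base_level_activation(exposure_ages_days: list[float], decay: float = 0.5) -> float:
    """ACT-R style base-level activation: frequent and recent exposures raise activation."""
    # Each past exposure contributes t^(-d), where t > 0 is how long ago it happened
    # and d is the decay parameter.
    return math.log(sum(t ** (-decay) for t in exposure_ages_days))

# A track heard often and recently scores higher than one heard long ago.
print(base_level_activation([0.5, 2, 3, 10]))  # many recent listens -> high activation
print(base_level_activation([120, 300]))       # only old listens -> low activation
```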

4 Challenges

4.1 Ensure and Measure the Quality of MRS

  • Similarity vs diversity

    • The balance between similarity and diversity can be personalized using a linear weighting of similarity and diversity metrics

    • Diversity measurement approach: compute the inverse of the average pairwise similarity between all items in the recommendation list (see the sketch at the end of this section)

    • Similarity and diversity models should be multi-faceted.

      • User may want to receive recommendations of songs with the same rhythm (similarity) but different artists or genres (diversity)
  • Novelty vs Familiarity

    • However, it is hard to tell whether the user has already listened to a song or has forgotten about it
  • Popularity, Hotness and Trendiness

    • The three terms are often used interchangeably, but popularity is usually considered time-independent whereas hotness and trendiness are sometimes considered time-dependent

    • Recommending popular songs can mitigate cold start and keep users up to date on trending music, so popularity is often used as a recommendation baseline

    • Mainstreamness is also leveraged in MRS, and users differ in their preference for it

  • Serendipity: has not received much attention recently; it means recommending songs the user would usually not listen to

  • Sequential coherence: common theme, story, or intention (soundtracks, workout music), common era, similar artist, genre, style, or orchestration.

    • A recent study found the following factors important for playlist coherence: tempo, energy, or loudness (deemed important by 64% of participants), diversity of artists (57%), the lyrics’ fit to the playlist’s topic (37%), track order (25%), transitions between tracks (24%), popularity (21%), and freshness (20%)
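
Returning to the diversity measure mentioned under “Similarity vs diversity”, here is a minimal sketch that scores a recommendation list as the inverse of the average pairwise cosine similarity of its item vectors (toy vectors; it assumes the average similarity is positive):

```python
import numpy as np
from itertools import combinations

def list_diversity(item_vectors: np.ndarray) -> float:
    """Diversity of a recommendation list: inverse of the average pairwise cosine similarity."""
    sims = []
    for a, b in combinations(range(len(item_vectors)), 2):
        va, vb = item_vectors[a], item_vectors[b]
        sims.append(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))
    return 1.0 / float(np.mean(sims))

# Two candidate lists described by toy content vectors.
similar_list = np.array([[0.90, 0.10], [0.85, 0.15], [0.95, 0.05]])
mixed_list = np.array([[0.90, 0.10], [0.10, 0.90], [0.50, 0.50]])
print(list_diversity(similar_list) < list_diversity(mixed_list))  # True: mixed list is more diverse
```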

:bulb: This post is my notes from reading the MRS chapter of the Recommender Systems Handbook