Transforming Recommendations at ASOS
ASOS is the destination for fashion-loving 20-somethings around the world, with a purpose to give its customers the confidence to be whoever they want to be. Through its app and mobile/desktop web experience, available in nine languages and in over 200 markets, ASOS customers can shop a curated edit of nearly 50,000 products, sourced from nearly 900 global and local third-party brands alongside a mix of fashion-led own-brand labels.
The AI team at ASOS is a cross-functional team of Machine Learning Engineers and Scientists, Data Scientists and Product Managers, that uses machine learning to improve the customer experience, improve retail efficiency, and drive growth.
Recommendations at ASOS
With a catalogue of nearly 50,000 products at any given time, and with hundreds of new products introduced every week, our personalisation system is critical to surfacing the right product, to the right customer, at the right time. We use machine learning to help our customers find their dream product by providing high-quality personalisation at multiple touch points along the customer journey (see figure below).
Our aim is to rank products optimally to increase customer engagement, which we measure through clicks and purchases. We do this by combining huge amounts of customer interaction data (e.g. purchases, saves for later and additions to bag) with deep learning to produce a personalised ranking of products, or recommendations, for each customer¹.
While our current recommender achieves good online results and scales efficiently, serving 5 billion requests a day across a variety of touch points², its performance can be improved by extracting more meaning from customer-product interactions.
In this post we explain how we are levelling up our fashion recommender by using cutting-edge transformer technology to better capture a customer’s style and infer the relative importance of interactions over time.
Recommendations with transformers
The transformer, the T in ChatGPT, is a model architecture that has revolutionised many areas of machine learning, from natural language processing (NLP) to computer vision. While transformers were originally designed for NLP tasks, recent research has extended their use to recommender systems. Transformer models enable us to better capture a customer’s style, using a mechanism called self-attention, and to infer the relative importance of a customer’s interactions over time, using positional awareness.
Self-attention enables a machine learning model to build context-aware representations of the inputs within a sequence. When processing each input, the model weighs the importance of every other input in the sequence with respect to it, building a richer understanding of each input and thus of the whole sequence. For instance, a pair of heels can be interpreted differently when grouped with casual clothes rather than more formal wear. This additional context allows the model to better interpret a product and capture the essence of a customer’s style.
Positional awareness allows the model to interpret the order of a customer’s past product interactions and to infer the relative importance of a product from its position in the sequence. For example, a model may give more importance to a product purchased yesterday than to a product viewed three months ago. This helps the model serve relevant recommendations as a customer’s context switches between browsing sessions and as their style evolves over time.
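To make the mechanism concrete, here is a minimal single-head sketch of scaled dot-product self-attention in NumPy. It is illustrative only and not ASOS's implementation: the product embeddings and the projection matrices `w_q`, `w_k` and `w_v` are random here, whereas a real model learns them during training. Row *i* of the attention weights says how much each product in the history contributes to the context-aware representation of product *i*.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max before exponentiating for numerical stability.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention.

    x: (seq_len, d_model) embeddings of the products in one customer's history.
    w_q, w_k, w_v: (d_model, d_k) projections (random stand-ins for learned weights).
    Returns the context-aware outputs (seq_len, d_k) and the weights (seq_len, seq_len).
    """
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])  # relevance of product j to product i
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
seq_len, d_model, d_k = 5, 8, 4
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))
out, attn = self_attention(x, w_q, w_k, w_v)
print(out.shape, attn.shape)  # (5, 4) (5, 5)
```

Each output row is a weighted average of the value vectors of the whole history, which is exactly how the surrounding products end up shaping a product's representation.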
Self-attention in NLP
Self-attention first became prominent in natural language processing³, where an input sequence is commonly a sentence. Transformers use self-attention to build a rich representation of each word in a sentence by considering the importance of the words around it. It is easy to see why context matters in the following two sentences:
Sentence 1: The lamp emitted dim light which illuminated the room.
Sentence 2: He found the suitcase was surprisingly light as he lifted it.
Both sentences contain the word light; however, because of context, it has two completely different meanings. Self-attention is able to capture each meaning distinctly.
This visualisation shows how self-attention weighs the other words in each sentence with respect to the word light; the shade of the lines and boxes indicates the weights. Intuitively, in the first sentence attention focuses on lamp, emitted and illuminated, enriching the representation of light to mean illumination. In the second sentence, weight is given to suitcase, surprisingly and lifted, so the encoding of light takes on the meaning of weight.
An example use of this rich representation of the word light could be a synonym generator. Given the first sentence as context, such a model could output gleam, luminance or illumination; given the second sentence, it could output easy, manageable or featherweight. These synonyms would not make sense if used in the other sentence.
Self-attention in recommendations
You must be thinking: “Enough about lights! What has this got to do with recommendations at ASOS?”
Instead of a sentence (a sequence of words), we now have a customer’s interaction history (a sequence of products), but the intuition behind self-attention remains the same.
Much like words in a sentence, clothing takes on a different meaning or style depending on the products around it. Let’s look at another example.
Customer 1: [Dare2b Waterproof Jacket, ASOS DESIGN sport socks, Salomon trainers, Columbia insulated trousers, Calvin Klein T-shirt]
Customer 2: [Selected Homme sweatshirt, Carhartt WIP jacket, Salomon trainers, ASOS DESIGN trunks, Dickies trousers]
Whilst both customer histories contain the Salomon trainers, the histories suggest different styles. Let’s look at how self-attention may weigh the other products in each history to build its representation of the Salomon trainers.
For customer 1 (left) we imagine a keen outdoors person who bought the Salomon trainers for their original hiking purpose. Self-attention gives weight to the waterproof jacket and insulated trousers, capturing a hiking style for the Salomons in this context. For customer 2 (right) we imagine someone styling Salomon trainers with streetwear, so self-attention focuses on the Carhartt jacket and Dickies trousers. As in the NLP example above, we now have a rich representation of our Salomons that adapts to the context of the sequence.
Suppose we now query our model for alternative shoe recommendations for both customers (much like our synonym generator from before). We might expect something like this:
Transformers extend this further with multi-head self-attention, where the attention computation is performed several times in parallel so that different heads can capture different kinds of relationship within the input sequence. Self-attention creates powerful representations of inputs and also brings advantages over earlier sequence architectures such as recurrent neural networks: the whole sequence can be processed in parallel, and long-range dependencies are easier to capture.
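The toy example below illustrates the Salomon case numerically. It is a simplified sketch under loud assumptions: each product gets a hypothetical two-dimensional embedding, [outdoor, streetwear], with invented values, and the raw embeddings are used directly as queries, keys and values instead of learned projections. The same trainers vector comes out of self-attention leaning outdoor for customer 1 and leaning streetwear for customer 2.

```python
import numpy as np

# Hypothetical 2-d "style" embeddings: [outdoor, streetwear]. Values are invented
# for illustration; a real model learns much higher-dimensional embeddings.
EMB = {
    "Dare2b Waterproof Jacket": [1.0, 0.0],
    "ASOS DESIGN sport socks": [0.5, 0.5],
    "Salomon trainers": [0.5, 0.5],
    "Columbia insulated trousers": [1.0, 0.0],
    "Calvin Klein T-shirt": [0.2, 0.4],
    "Selected Homme sweatshirt": [0.1, 0.8],
    "Carhartt WIP jacket": [0.0, 1.0],
    "ASOS DESIGN trunks": [0.3, 0.3],
    "Dickies trousers": [0.0, 1.0],
}

def contextual_vector(history, item):
    """Self-attention output for `item` given the whole `history`.

    Simplification: queries, keys and values are the raw embeddings (no learned projections).
    """
    x = np.array([EMB[p] for p in history])
    q = x[history.index(item)]
    scores = x @ q / np.sqrt(x.shape[1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ x  # weighted mix of the products in this history

customer_1 = ["Dare2b Waterproof Jacket", "ASOS DESIGN sport socks", "Salomon trainers",
              "Columbia insulated trousers", "Calvin Klein T-shirt"]
customer_2 = ["Selected Homme sweatshirt", "Carhartt WIP jacket", "Salomon trainers",
              "ASOS DESIGN trunks", "Dickies trousers"]

c1 = contextual_vector(customer_1, "Salomon trainers")
c2 = contextual_vector(customer_2, "Salomon trainers")
print("customer 1 [outdoor, street]:", np.round(c1, 2))
print("customer 2 [outdoor, street]:", np.round(c2, 2))
```

The trainers start from the same embedding in both histories; only the surrounding products differ, and that alone shifts the representation.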
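Multi-head self-attention can be sketched by splitting the model dimension into several heads, running attention independently in each, and recombining the results. This NumPy version again uses random stand-ins for the learned weights (`w_q`, `w_k`, `w_v` and the output projection `w_o`), and the sizes are arbitrary. Every position is computed with matrix products rather than step by step, so the whole sequence is processed in parallel.

```python
import numpy as np

def softmax(x, axis=-1):
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, w_q, w_k, w_v, w_o, n_heads):
    """x: (seq_len, d_model); every weight matrix: (d_model, d_model)."""
    seq_len, d_model = x.shape
    assert d_model % n_heads == 0, "d_model must be divisible by n_heads"
    d_head = d_model // n_heads

    def split(t):
        # (seq_len, d_model) -> (n_heads, seq_len, d_head)
        return t.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (n_heads, seq_len, seq_len)
    weights = softmax(scores, axis=-1)                    # one attention pattern per head
    heads = weights @ v                                   # (n_heads, seq_len, d_head)
    concat = heads.transpose(1, 0, 2).reshape(seq_len, d_model)
    return concat @ w_o, weights

rng = np.random.default_rng(1)
seq_len, d_model, n_heads = 5, 8, 2
x = rng.normal(size=(seq_len, d_model))
w_q, w_k, w_v, w_o = (rng.normal(size=(d_model, d_model)) / np.sqrt(d_model) for _ in range(4))
out, weights = multi_head_self_attention(x, w_q, w_k, w_v, w_o, n_heads)
print(out.shape, weights.shape)  # (5, 8) (2, 5, 5)
```

Each head produces its own attention pattern over the history, which is what lets different heads specialise in different relationships between products.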
Positional awareness in recommendations
“But what about the order of interactions that you mentioned earlier?”
Many traditional recommendation models, including our current model, consider all products equally, regardless of when a product was interacted with. By encoding positional information with the input, transformers are able to exploit the sequential nature of customers’ interactions to recognise longer term style shifting and shorter term context switching.
Longer-term style shifts can be specific to a customer or apply more generally to a product: a particular customer may move away from a certain style over time, or a particular product may come to be styled differently. For more immediate changes in browsing intent, the model can pick up distinct switches in the types of products within the sequence. Perhaps one week a customer is shopping for a holiday in Europe’s winter sun and the following week is preparing for England’s colder months (see figure below). In both cases the transformer uses the sequential positions of these product interactions to recommend products more appropriately.
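One standard way to encode position, from the original Transformer paper, is to add a fixed sinusoidal vector to each product embedding according to its place in the sequence. The post doesn't say which scheme the ASOS model uses (learned positional embeddings are a common alternative), so treat this as an illustrative sketch with random product embeddings. Once positions are added, the same products in a different order produce different inputs to the model.

```python
import numpy as np

def sinusoidal_positions(seq_len: int, d_model: int) -> np.ndarray:
    """Fixed sinusoidal position encodings, shape (seq_len, d_model)."""
    assert d_model % 2 == 0, "this sketch assumes an even d_model"
    pos = np.arange(seq_len)[:, None]               # (seq_len, 1)
    i = np.arange(0, d_model, 2)[None, :]           # even dimension indices
    angles = pos / np.power(10000.0, i / d_model)   # (seq_len, d_model // 2)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                    # even dims: sine
    pe[:, 1::2] = np.cos(angles)                    # odd dims: cosine
    return pe

rng = np.random.default_rng(0)
item_emb = rng.normal(size=(4, 8))    # 4 products in interaction order, oldest first
pe = sinusoidal_positions(4, 8)
x = item_emb + pe                     # position-aware input to the transformer
x_reordered = item_emb[::-1] + pe     # same products, reversed order
```

Without `pe`, reversing the history would only permute the rows of the input; with it, each product's input vector depends on where the interaction sits in the sequence.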
Building transformers at ASOS
Our recommendations team built a transformer recommendations system in-house with the Transformers4Rec library and support from our partners at NVIDIA. “Transformers4Rec is a flexible and efficient library for sequential and session-based recommendations and can work with PyTorch.”⁵
Transformers4Rec provides a framework that adapts transformer architectures, originally built for NLP tasks, to sequential recommendation. Using ASOS customers’ past interactions, we trained a recommendations system that applies the concepts of self-attention and positional awareness explained above within a transformer architecture.
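The post doesn't describe the training objective, but one common way to train a sequential recommender on interaction histories is next-item prediction: each prefix of a customer's history is used to predict the product that came next. The sketch below builds such (context, target) pairs. `next_item_examples` and its `max_len` truncation are hypothetical names for illustration, not part of the Transformers4Rec API, and the example history is assumed to be in chronological order.

```python
from typing import List, Tuple

def next_item_examples(history: List[str], max_len: int = 20) -> List[Tuple[List[str], str]]:
    """Turn one customer's chronological interactions into (context, next item) pairs.

    Only the most recent `max_len` interactions are kept as context.
    """
    examples = []
    for t in range(1, len(history)):
        context = history[max(0, t - max_len):t]  # everything before position t
        examples.append((context, history[t]))    # the product interacted with at t
    return examples

# Assumed to be ordered oldest to newest.
customer_1 = ["Dare2b Waterproof Jacket", "ASOS DESIGN sport socks", "Salomon trainers",
              "Columbia insulated trousers", "Calvin Klein T-shirt"]
for context, target in next_item_examples(customer_1):
    print(context, "->", target)
```

In a transformer these prefixes are usually not materialised one by one: the model processes the full sequence once with a causal attention mask, so each position can only attend to earlier interactions. Other objectives, such as predicting randomly masked items, are also used in practice.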
So, does this really transform recommendations?
We compared our new transformer recommendations model with our current online model, which is based on asymmetric matrix factorisation. As explained above, the two key differences between the models are self-attention and positional awareness. Our current model treats all products in a customer’s interaction history equally, regardless of their position in the history or their relative importance. Transformers take both factors into account, and this leads to their superior performance.
Offline evaluation shows that this new architecture has indeed transformed recommendations at ASOS: the transformer model increased our evaluation metric by over 20% compared with our baseline model!
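To see why the current model ignores order, consider a simplified asymmetric-matrix-factorisation-style representation in which a customer is the normalised sum of factor vectors for the products they interacted with. This form is an assumption for illustration (the exact ASOS production model isn't given here), with random factors and made-up product ids. Shuffling the history leaves every score unchanged, and each product contributes with equal weight.

```python
import numpy as np

rng = np.random.default_rng(42)
n_products, d = 100, 16
item_factors = rng.normal(size=(n_products, d))    # y_j: what interacting with product j says about a customer
target_factors = rng.normal(size=(n_products, d))  # q_i: factors used to score candidate product i

def amf_customer_vector(history):
    """Order-agnostic customer representation: |N(u)|^(-1/2) * sum of y_j over the history."""
    return item_factors[history].sum(axis=0) / np.sqrt(len(history))

def amf_scores(history):
    """Score every product for this customer; higher scores rank higher."""
    return target_factors @ amf_customer_vector(history)

history = [3, 17, 42, 8]   # hypothetical product ids, oldest first
shuffled = [42, 8, 3, 17]  # same products, different order
print(np.allclose(amf_scores(history), amf_scores(shuffled)))  # True
```

Because a sum does not depend on the order of its terms, a model of this shape cannot distinguish yesterday's purchase from a view three months ago; the positional encodings and attention weights of a transformer are what break this symmetry.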
Wow, pretty transformer-tive I’d say!
[1] Getting Personal At ASOS, Jacob Lang, https://medium.com/asos-techblog/getting-personal-at-asos-bc1599e0c2a9
[2] Implementing Model Serving At Scale, ASOS AI Engineering, https://www.nvidia.com/en-us/on-demand/session/gtcspring23-s52123/
[4] BertViz, Visualize Attention in NLP Models, Jesse Vig, https://github.com/jessevig/bertviz#attention-head-view
[5] Transformers4Rec, NVIDIA Merlin, https://github.com/NVIDIA-Merlin/Transformers4Rec
This article was written by Ed Harris — a Machine Learning Scientist at ASOS.com. In his spare time he enjoys running and watching sports. Thank you to the NVIDIA Merlin team for their partnership on this project. Thank you to all colleagues in the ASOS Recommendations team, with special thanks to those who worked on this project — Manon Deprette, Duncan Little, Jacob Lang and Mason Cusack. Further thanks to the aforementioned and Dawn Rollocks for reviewing this blog.