Transforming Recommendations at ASOS

Published in ASOS Tech Blog, Apr 24, 2024

ASOS is the destination for fashion-loving 20-somethings around the world, with a purpose to give its customers the confidence to be whoever they want to be. Through its app and mobile/desktop web experience, available in nine languages and in over 200 markets, ASOS customers can shop a curated edit of nearly 50,000 products, sourced from nearly 900 global and local third-party brands alongside a mix of fashion-led own-brand labels.

The AI team at ASOS is a cross-functional team of Machine Learning Engineers and Scientists, Data Scientists and Product Managers that uses machine learning to improve the customer experience, improve retail efficiency, and drive growth.

Recommendations at ASOS

With a catalogue of nearly 50,000 products at any given time — and with hundreds of new products being introduced every week — our personalisation system is critical to surfacing the right product, to the right customer, at the right time. We leverage machine learning to help our customers find their dream product by providing high-quality personalisation through multiple touch points during the customers’ journey (see figure below).

Just some of our recommendations touch points on site. From left to right: ranking on category pages, ranking on search pages, “You Might Also Like” & “People Also Bought” carousels on product pages, “A Little Something Extra” carousel on bag pages.

Our aim is to optimally rank products to increase customer engagement, which we measure through clicks and purchases. We do this by using huge amounts of customer interaction data (e.g. purchases, saved-for-later items and add-to-bag events) together with deep learning to produce a personalised ranking of products (our recommendations) for each customer¹.

While our current recommender achieves good online results and scales efficiently, serving 5 billion requests a day through a variety of different touch points², its performance can be improved by extracting more meaning from the customer-product interactions.

In this post we explain how we are levelling up our fashion recommender by using cutting edge transformer technology to better capture a customer’s style and infer the relative importance of interactions over time.

Recommendations with transformers

The transformer, the T in ChatGPT, is a model architecture that has revolutionised many areas of machine learning, from natural language processing (NLP) to computer vision. While transformers were designed to solve natural language tasks, recent research has extended their use to recommender systems. Transformer models enable us to better capture a customer’s style with a mechanism called self-attention, and to infer the relative importance of a customer’s interactions over time using positional awareness.

Self-attention enables a machine learning model to build context-aware representations of inputs within a sequence. By weighing the importance of every other input in the sequence relative to the one being processed, the model builds a richer understanding of each input and thus of the whole sequence. For instance, a pair of heels can be interpreted differently when grouped with a set of casual clothes vs more formal wear. This additional context allows the model to better interpret a product and capture the essence of a customer’s style.
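As a rough illustration of the idea (not the ASOS implementation), the sketch below applies scaled dot-product self-attention to tiny hand-made product embeddings. It assumes 2-dimensional vectors whose axes mean "casual-ness" and "formal-ness", and it skips the learned query, key and value projections that a real transformer would use; all names and numbers are hypothetical.

```python
import math

def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def self_attention(embeddings):
    """Scaled dot-product self-attention without learned projections.

    Returns (outputs, weights). Each output is a weighted average of all
    inputs, weighted by similarity to the input being processed.
    """
    dim = len(embeddings[0])
    outputs, all_weights = [], []
    for query in embeddings:
        scores = [dot(query, key) / math.sqrt(dim) for key in embeddings]
        weights = softmax(scores)
        mixed = [sum(w * vec[i] for w, vec in zip(weights, embeddings))
                 for i in range(dim)]
        outputs.append(mixed)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical 2-d embeddings: [casual-ness, formal-ness].
heels = [1.0, 1.0]
casual_basket = [heels, [2.0, 0.0], [2.0, 0.0]]  # heels, jeans, T-shirt
formal_basket = [heels, [0.0, 2.0], [0.0, 2.0]]  # heels, blazer, dress

# Index [0][0] picks the outputs, then the vector for the heels.
casual_heels = self_attention(casual_basket)[0][0]
formal_heels = self_attention(formal_basket)[0][0]
print(casual_heels)  # leans towards the casual axis
print(formal_heels)  # leans towards the formal axis
```

The same pair of heels ends up with two different representations, each pulled towards the products it appears alongside.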

Positional awareness allows the model to interpret the order of a customer’s past product interactions and decipher the relative importance of a product given its position in the sequence. For example, a model may give more importance to a product purchased yesterday than to a product viewed three months ago. This enables the model to better serve our customers relevant recommendations as their context switches between browsing sessions and as their style evolves over time.
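One common way to give a model this awareness is to add a position vector to each product embedding before attention is applied. The sketch below uses the sinusoidal encoding from the original transformer paper³; this post does not say which positional scheme the ASOS model uses, so treat the choice, the helper names and the embedding values as assumptions.

```python
import math

def positional_encoding(position, dim):
    """Sinusoidal position vector: sine on even indices, cosine on odd ones."""
    encoding = []
    for i in range(dim):
        angle = position / (10000 ** (2 * (i // 2) / dim))
        encoding.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
    return encoding

def add_position(item_embedding, position):
    """Combine what the product is with where it sits in the history."""
    pos = positional_encoding(position, len(item_embedding))
    return [e + p for e, p in zip(item_embedding, pos)]

# Hypothetical 4-d embedding for one product.
trainers = [0.2, -0.1, 0.4, 0.3]
oldest = add_position(trainers, 0)  # first interaction in the history
recent = add_position(trainers, 5)  # a later interaction

# The same product now looks different to the model depending on its position.
print(oldest != recent)
```

Because the position is baked into the input, later layers can learn to treat a recent purchase differently from an old view of the same product.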

Self-attention in NLP

Self-attention first became prominent in Natural Language Processing³. In NLP an input sequence is commonly a sentence. Transformers use self-attention to build a rich representation of each word in a sentence by considering the importance of the other words around it. It is simple to see why context is important within the two following sentences:

Sentence 1: The lamp emitted dim light which illuminated the room.
Sentence 2: He found the suitcase was surprisingly light as he lifted it.

Both sentences contain the word light; however, due to context, it has two completely different meanings. Self-attention is able to capture each meaning distinctly.

This visualisation shows how self-attention weighs different words in the two sentences with respect to the word light. The shade of the lines and boxes indicates these weights. Intuitively, the first sentence focuses on lamp, emitted and illuminated to enrich the representation of light to mean illumination, whilst in the second sentence weight is given to suitcase, surprisingly and lifted to build an encoding that means weight.

An example use of the rich representation of the word light could be a synonym generator. Given the context with the first sentence as input, a model could output gleam, luminance or illumination. Given the context of the second sentence as input the model could output easy, manageable or featherweight. These synonyms would not make sense if used in the alternate sentence.

Self-attention in recommendations

You must be thinking “enough about lights, what has this got to do with recommendations at ASOS?”

Instead of a sentence (a sequence of words) we now have a customer’s interaction history (a sequence of products), but the intuition behind self-attention remains the same.

Much like words in a sentence, clothing has a different meaning or style depending on the context of the product. Let’s look at another example.

Customer 1: [Dare2b Waterproof Jacket, ASOS DESIGN sport socks, Salomon trainers, Columbia insulated trousers, Calvin Klein T-shirt]
Customer 2: [Selected Homme sweatshirt, Carhartt WIP jacket, Salomon trainers, ASOS DESIGN trunks, Dickies trousers]

Whilst both customer histories contain the Salomon trainers, the histories suggest different styles. Let’s look at how self-attention may weigh the other products in each customer history to build its own representation of the Salomon trainers.

Example of how self-attention in fashion e-commerce may look. Subject product, Salomon trainers, are highlighted in the grey box. Customer 1 is shown on the left-hand side of the figure and Customer 2 on the right-hand side.

For customer 1 (left) we imagine a keen outdoors person who bought the Salomon trainers for their original hiking purpose. Self-attention gives weight to the waterproof jacket and insulated trousers to capture the hiking style of the Salomons in this context. For customer 2 (right) we imagine someone styling Salomon trainers with streetwear. Here self-attention focuses on the Carhartt jacket and Dickies trousers to capture this. As in the NLP example above, we now have a rich representation of our Salomons that is flexible depending on context within a sequence.

Suppose we now query our model to return alternative shoe recommendations for both customers (much like our synonym generator from before). We could perhaps expect something like this:

Example footwear recommendations based on the Salomon trainers representation given the customer context
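To make the querying step concrete, here is a minimal sketch that ranks candidate footwear by dot-product similarity to a context-dependent representation of the trainers. The scoring rule, the two "contextual" vectors and the candidate embeddings are all assumptions made for illustration; the post does not describe how the ASOS model scores candidates.

```python
def rank_candidates(context_vector, candidates):
    """Order candidate products by dot-product score, highest first."""
    def score(name):
        return sum(c * v for c, v in zip(context_vector, candidates[name]))
    return sorted(candidates, key=score, reverse=True)

# Hypothetical 2-d embeddings: [outdoor-ness, streetwear-ness].
candidates = {
    "hiking boots": [2.0, 0.0],
    "trail running shoes": [1.5, 0.5],
    "skate shoes": [0.0, 2.0],
    "retro runners": [0.5, 1.5],
}

# Assumed contextual representations of the same Salomon trainers.
trainers_for_customer_1 = [1.6, 0.4]  # pulled towards outdoor wear
trainers_for_customer_2 = [0.4, 1.6]  # pulled towards streetwear

print(rank_candidates(trainers_for_customer_1, candidates))
print(rank_candidates(trainers_for_customer_2, candidates))
```

With these toy numbers the first customer sees hiking boots ranked first and the second sees skate shoes ranked first, even though both queries start from the same product.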

Transformers extend this further with multi-head self-attention, where this processing is performed several times in parallel to capture different nuances in the relationships within the input sequence. Self-attention creates powerful representations of inputs and also brings advantages over previous architectures, such as parallelisable computation and better handling of long-term dependencies.
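The multi-head idea can be sketched as follows, under a strong simplification: each head attends over its own slice of the embedding, and the head outputs are concatenated. Real transformers give every head its own learned projections and apply a final output projection, both omitted here; the function names and embedding values are hypothetical.

```python
import math

def attend(vectors):
    """Single-head scaled dot-product self-attention (no learned weights)."""
    dim = len(vectors[0])
    outputs = []
    for query in vectors:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in vectors]
        peak = max(scores)
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        outputs.append([sum(w * v[i] for w, v in zip(weights, vectors))
                        for i in range(dim)])
    return outputs

def multi_head_attention(embeddings, num_heads):
    """Split each embedding into equal slices, attend per slice, concatenate."""
    dim = len(embeddings[0])
    if dim % num_heads != 0:
        raise ValueError("embedding size must be divisible by num_heads")
    head_dim = dim // num_heads
    head_outputs = []
    for h in range(num_heads):
        start = h * head_dim
        slices = [vec[start:start + head_dim] for vec in embeddings]
        head_outputs.append(attend(slices))
    # Stitch the heads back together, one output vector per input position.
    return [sum((head[pos] for head in head_outputs), [])
            for pos in range(len(embeddings))]

# Hypothetical 4-d embeddings for a three-product history.
history = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, -0.5], [1.0, 1.0, 0.0, 0.0]]
contextual = multi_head_attention(history, num_heads=2)
print(len(contextual), len(contextual[0]))
```

Each head computes its own attention weights, so one head is free to focus on, say, product type while another focuses on brand or colour.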

Positional awareness in recommendations

“But what about the order of interactions that you mentioned earlier?”

Many traditional recommendation models, including our current model, consider all products equally, regardless of when a product was interacted with. By encoding positional information with the input, transformers are able to exploit the sequential nature of customers’ interactions to recognise longer-term style shifts and shorter-term context switches.

Longer-term style shifting could be specific to a customer or apply more generally to a product. A particular customer may shift away from a certain style over time, or a particular product may come to be styled differently. On the other hand, for more immediate changes in customer browsing intent, the model can pick up distinct switches in product interactions within the sequence. Perhaps in one context a customer is shopping for a holiday in Europe’s winter sun and the following week is preparing for England’s colder months (see figure below). In both of these cases the transformer uses positional awareness of the sequence of product interactions to recommend products more appropriately to a customer.

Example of context switching in user interactions captured with positional awareness

Building transformers at ASOS

Our recommendations team built a transformer recommendations system in-house with the Transformers4Rec library and support from our partners at NVIDIA. “Transformers4Rec is a flexible and efficient library for sequential and session-based recommendations and can work with PyTorch.”⁵

Transformers4Rec provides a framework for exploiting the advantages of transformer architectures, originally built for NLP tasks, in sequential recommendations. Using ASOS customers’ past interactions, we were able to train a recommendations system that applies the concepts of self-attention and positional awareness explained above within a transformer architecture.

So, does this really transform recommendations?

We compare our new transformer recommendations model to our current online model, which is based on asymmetric matrix factorisation. The two key differences between the models, explained above, are self-attention and positional awareness. Our current model considers all products in a customer’s interaction history equally, regardless of their position in the history and of their relative importance. Transformers take both factors into account, which leads to their superior performance.
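The effect of ignoring order can be shown with a toy comparison. The sketch below is not either ASOS model: it contrasts a plain average of product embeddings (a stand-in for any order-agnostic customer profile) with a simple exponentially recency-weighted average (a much cruder stand-in for position awareness than a transformer). The decay value and embeddings are hypothetical.

```python
def order_agnostic_profile(history):
    """Plain average: every interaction counts equally, wherever it sits."""
    dim = len(history[0])
    return [sum(vec[i] for vec in history) / len(history) for i in range(dim)]

def recency_weighted_profile(history, decay=0.5):
    """Weighted average; each step back in time multiplies the weight by `decay`."""
    last = len(history) - 1
    weights = [decay ** (last - idx) for idx in range(len(history))]
    total = sum(weights)
    dim = len(history[0])
    return [sum(w * vec[i] for w, vec in zip(weights, history)) / total
            for i in range(dim)]

# Hypothetical 2-d embeddings: [summer-ness, winter-ness], oldest first.
summer, winter = [1.0, 0.0], [0.0, 1.0]
now_shopping_winter = [summer, summer, winter]
now_shopping_summer = [winter, summer, summer]

# The plain average cannot tell the two histories apart.
print(order_agnostic_profile(now_shopping_winter)
      == order_agnostic_profile(now_shopping_summer))

# The recency-weighted profiles differ and lean towards the latest interest.
print(recency_weighted_profile(now_shopping_winter))
print(recency_weighted_profile(now_shopping_summer))
```

Both histories contain the same products, so only a model that sees position can recognise that the first customer has just switched to winter shopping.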

Offline evaluation results show that we have indeed transformed recommendations at ASOS with this new architecture: the new transformer model improves our evaluation metric by over 20% compared with our baseline model!

Offline Evaluation Results of ASOS’ Recommendations Models

Wow, pretty transformer-tive I’d say!

[1] Getting Personal At ASOS, Jacob Lang, https://medium.com/asos-techblog/getting-personal-at-asos-bc1599e0c2a9

[2] Implementing Model Serving At Scale, ASOS AI Engineering, https://www.nvidia.com/en-us/on-demand/session/gtcspring23-s52123/

[3] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems (NIPS’17). Curran Associates Inc., Red Hook, NY, USA, 6000–6010.

[4] BertViz, Visualize Attention in NLP Models, Jesse Vig, https://github.com/jessevig/bertviz#attention-head-view

[5] Transformers4Rec, NVIDIA Merlin, https://github.com/NVIDIA-Merlin/Transformers4Rec

This article was written by Ed Harris — a Machine Learning Scientist at ASOS.com. In his spare time he enjoys running and watching sports. Thank you to the NVIDIA Merlin team for their partnership on this project. Thank you to all colleagues in the ASOS Recommendations team, with special thanks to those who worked on this project — Manon Deprette, Duncan Little, Jacob Lang and Mason Cusack. Further thanks to the aforementioned and Dawn Rollocks for reviewing this blog.