H&M Personalized Product Recommendations

Built a product recommendation system to recommend products based on previous transactions, as well as from customer and product meta data using the data provided by H&M.

Photo by Sei / Unsplash

  • The data contains 105,542 unique products, each described by 24 attributes.

  • The data contains information on 1,371,980 customers and 31,788,324 transactions from 2018 to 2020.

  • A custom, lightweight candidate retrieval method was created by combining items purchased together in the last week with the most popular items for each age group.

  • The candidates were ranked with a LightGBM model using features based on how frequently products were purchased and the percentage of customers who purchased them.

  • The final fine-tuned system, combining the custom candidate retrieval method with the LightGBM ranking model, achieved a MAP@12 score of 0.345 and an overall AUC of 0.76.

Data

The purchase history of customers over time, along with supporting metadata, has been provided. The goal is to predict which articles each customer will purchase in the 7-day period immediately after the training data ends.
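For local evaluation, this setup can be mirrored by holding out the last 7 days of transactions as the target window. The following is a minimal, stdlib-only sketch on made-up rows; the (t_dat, customer_id, article_id) layout follows the Kaggle transactions columns, and split_last_week is a hypothetical helper, not necessarily what the project's own scripts do.

```python
from datetime import date, timedelta

def split_last_week(rows, days=7):
    """Split (t_dat, customer_id, article_id) rows into history and a target window.

    The target window is the last `days` days of data. Returns the history rows
    and a dict customer_id -> set of articles bought in the target window.
    """
    last_day = max(t for t, _, _ in rows)
    cutoff = last_day - timedelta(days=days)  # last day of the history period
    history = [r for r in rows if r[0] <= cutoff]
    target = {}
    for t, customer, article in rows:
        if t > cutoff:
            target.setdefault(customer, set()).add(article)
    return history, target

# Made-up toy rows, not taken from the H&M data.
transactions = [
    (date(2020, 9, 10), "c1", 101),
    (date(2020, 9, 15), "c1", 102),
    (date(2020, 9, 16), "c2", 101),
    (date(2020, 9, 22), "c2", 103),
]
history, target = split_last_week(transactions)
```

Every feature and candidate is then built from history only, and target plays the role of the hidden 7-day period.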

Files provided:

  • articles.csv - detailed metadata for each article_id available for purchase
  • customers.csv - metadata for each customer_id in dataset
  • transactions.csv - the purchases of each customer on each date, along with additional information. Duplicate rows correspond to multiple purchases of the same item.

The dataset can be downloaded from Kaggle.

Analysis

The complete analysis can be viewed in this notebook.

Distribution of number of Transactions per day:

  • October 2019 recorded the highest number of transactions between 2018 and 2020.
  • There is a quarterly seasonal spike in transactions.
  • The number of transactions tends to be high every December.
Distribution of number of Transactions per day | Image by Author

Distribution of number of Transactions per day grouped by Sales Channel:

  • Sales Channel 1 has a consistent number of transactions per day, with rarely any large spikes.
  • Sales Channel 2 consistently outperforms Sales Channel 1 throughout 2018 to 2020.
  • The quarterly seasonal spike of transactions is caused by transactions through Sales Channel 2.
Distribution of number of Transactions per day grouped by Sales Channel | Image by Author

Distribution of number of unique Articles sold per day grouped by Sales Channel:

  • Sales Channel 1 sells a consistent number of unique articles per day, with rarely any spikes.
  • Sales Channel 2 consistently sells more unique products per day than Sales Channel 1 throughout 2018 to 2020.
Distribution of number of unique Articles sold per day grouped by Sales Channel | Image by Author
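Both per-day statistics plotted in this section (transactions and unique articles per sales channel) reduce to a simple group-by count. A stdlib-only sketch on made-up rows, assuming each row is (t_dat, sales_channel_id, article_id) as in the Kaggle transactions file; daily_stats_by_channel is a hypothetical helper:

```python
from collections import defaultdict

def daily_stats_by_channel(rows):
    """Return {(day, channel): (n_transactions, n_unique_articles)}."""
    counts = defaultdict(int)
    articles = defaultdict(set)
    for day, channel, article in rows:
        counts[(day, channel)] += 1  # duplicate rows are repeat purchases, so count each
        articles[(day, channel)].add(article)
    return {key: (counts[key], len(articles[key])) for key in counts}

# Made-up toy rows: (t_dat, sales_channel_id, article_id).
rows = [
    ("2019-10-01", 1, 101),
    ("2019-10-01", 2, 101),
    ("2019-10-01", 2, 102),
    ("2019-10-01", 2, 102),
    ("2019-10-02", 1, 103),
]
stats = daily_stats_by_channel(rows)
```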

The distributions of transactions and unique articles sold per day suggest that Sales Channel 1 customers are more consistent, conservative buyers.

Customers who use Sales Channel 2, on the other hand, appear more willing to try new products and tend to concentrate their purchases in specific months of the year.

Distribution of Customers across Age Group:

  • The most common customer age is 21.
  • A large proportion of customers are young adults aged 19 to 26.
  • There is also a significant customer base that is aged 46 to 56 years.
Distribution of Customers across Age Group | Image by Author

Distribution of Customers who have subscribed for Fashion News Alerts:

  • 67% of customers have not subscribed to Fashion News Alerts.
  • 32% of customers have subscribed to regular updates.
  • 1% of customers have subscribed to monthly updates.
Distribution of Customers who have subscribed for Fashion News Alerts | Image by Author

Product Groups with the highest number of Product Types:

The product group 'Accessories' has the highest number of product types followed by 'Shoes' and 'Upper Body Garments'.

Product Groups with the highest number of Product Types | Image by Author

Product Types with the highest number of unique articles:

The product type 'Trousers' has the highest number of unique articles closely followed by 'Dress'.

Product Types with the highest number of unique articles | Image by Author

Product Departments with the highest number of unique articles:

The department 'Jersey' has the highest number of unique articles, closely followed by 'Knitwear'.

Product Departments with the highest number of unique articles | Image by Author

Product Graphical Appearance Names with the highest number of unique articles:

The highest number of articles are of 'Solid' appearance followed by 'All over pattern'.

Product Graphical Appearance Names with the highest number of unique articles | Image by Author

Product Index with the highest number of unique articles:

The index named 'Ladieswear' has the highest number of unique articles closely followed by 'Divided'.

Product Index with the highest number of unique articles | Image by Author

Product Colour Group Names with the highest number of unique articles:

The highest number of articles are of 'Black' colour group followed by 'Dark Blue' and 'White'.

Product Colour Group Names with the highest number of unique articles | Image by Author

Experiments:

The intuition behind the custom retrieval strategy is explored in detail in this notebook.

Candidate Retrieval

A lightweight candidate retrieval method was created by combining the following retrieval strategies:

Recommend Items Purchased Together in the Last Week:

  • For each article sold in the past week, retrieved 5 articles that were purchased together with it.
  • Excluded any article that was not sold within the past week.
  • Filtered out pairs purchased by fewer than 2 customers.
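The three rules above can be sketched as a co-purchase count. This is a minimal illustration, not the project's code: it assumes "purchased together" means bought by the same customer within the week, and it ranks each article's partners by the number of distinct customers who bought both. pairs_last_week and its parameters are hypothetical names.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta
from itertools import permutations

def pairs_last_week(rows, end, top_k=5, min_customers=2, days=7):
    """Map article -> up to top_k partner articles co-bought by >= min_customers customers.

    rows: (t_dat, customer_id, article_id). Assumption: "purchased together" means
    bought by the same customer within the window (end - days, end].
    """
    start = end - timedelta(days=days)
    baskets = defaultdict(set)
    for t, customer, article in rows:
        if start < t <= end:  # articles not sold this week never enter a basket
            baskets[customer].add(article)
    pair_customers = Counter()
    for items in baskets.values():
        for a, b in permutations(items, 2):
            pair_customers[(a, b)] += 1  # one count per distinct customer
    paired = defaultdict(list)
    # Most-shared pairs first; ties broken by article id for determinism.
    for (a, b), n in sorted(pair_customers.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_customers and len(paired[a]) < top_k:
            paired[a].append(b)
    return dict(paired)

# Made-up toy rows.
example = [
    (date(2020, 9, 20), "c1", 1), (date(2020, 9, 20), "c1", 2),
    (date(2020, 9, 21), "c2", 1), (date(2020, 9, 21), "c2", 2),
    (date(2020, 9, 21), "c3", 1), (date(2020, 9, 21), "c3", 3),
    (date(2020, 9, 1), "c4", 1), (date(2020, 9, 1), "c4", 3),  # outside the week
]
pairs = pairs_last_week(example, end=date(2020, 9, 22))
```

Here the pair (1, 3) is dropped because only one customer bought it within the week; c4's purchase falls outside the window and is ignored.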

Recommend Items Purchased Together in the Last Few Weeks:

  • The number of previous weeks was treated as a hyperparameter and tuned.
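One way to tune the look-back window is a small grid search scored on a held-out target week. The sketch below assumes candidate recall (the share of target purchases covered by the candidates) as the criterion; the write-up does not say which criterion was used. retrieve is a hypothetical callback that builds candidates from the last n_weeks of history.

```python
def candidate_recall(candidates, target):
    """Fraction of target purchases covered by the candidate sets.

    candidates, target: dict customer_id -> set of article_ids.
    """
    hits = sum(len(target[c] & candidates.get(c, set())) for c in target)
    total = sum(len(items) for items in target.values())
    return hits / total if total else 0.0

def tune_n_weeks(retrieve, target, grid=(1, 2, 3, 4)):
    """Return (best_n_weeks, scores); ties go to the shortest window in `grid`.

    `retrieve(n_weeks)` is a hypothetical callback returning candidates built from
    the last n_weeks of history (e.g. a multi-week version of the pair retrieval).
    """
    scores = {n: candidate_recall(retrieve(n), target) for n in grid}
    best = max(scores, key=scores.get)  # first maximum in grid order
    return best, scores
```

Recall usually grows as the window widens because more candidates are retrieved, so in practice the number of candidates per customer would need to be capped, or the window tuned on the final MAP@12 instead.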

Recommend Most Popular Items Based on Age Group:

  • Retrieved the most popular items within each customer's age group.
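A minimal sketch of age-group popularity, assuming recent (customer_id, article_id) purchases and an age per customer from customers.csv. The age-group edges are hypothetical, loosely inspired by the 19 to 26 and 46 to 56 clusters noted in the analysis; the write-up does not state the bins it used.

```python
from bisect import bisect_right
from collections import Counter, defaultdict

# Hypothetical age-group edges: <19, 19-26, 27-45, 46-56, 57+.
AGE_EDGES = [19, 27, 46, 57]

def age_group(age):
    """Index of the age bucket the customer falls into."""
    return bisect_right(AGE_EDGES, age)

def popular_by_age_group(transactions, ages, top_k=12):
    """Top-k most purchased articles per age group.

    transactions: iterable of (customer_id, article_id) from a recent window.
    ages: dict customer_id -> age; customers with a missing age are skipped.
    """
    counts = defaultdict(Counter)
    for customer, article in transactions:
        age = ages.get(customer)
        if age is not None:
            counts[age_group(age)][article] += 1
    return {group: [a for a, _ in c.most_common(top_k)] for group, c in counts.items()}
```

A customer's popularity candidates are then looked up as popular[age_group(ages[customer_id])].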

This candidate retrieval method is hand-crafted and time-aware: because it relies on recent purchases, it incorporates current trend information.
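The strategies produce overlapping lists for each customer, so they have to be merged before ranking. A minimal sketch, assuming the lists are concatenated in priority order (for example pair-based candidates first, then age-group popularity) with duplicates dropped; the priority order and combine_candidates are illustrative assumptions, not the documented method.

```python
def combine_candidates(*sources, max_candidates=None):
    """Merge candidate lists in priority order, keeping each article's first occurrence.

    max_candidates=None keeps every unique candidate.
    """
    seen = set()
    merged = []
    for source in sources:
        for article in source:
            if article not in seen:
                seen.add(article)
                merged.append(article)
    return merged[:max_candidates]
```

Since the ranker reorders candidates later, the merge order mainly matters when max_candidates truncates the list.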

Candidate Ranking:

To rank the retrieved candidates, the following features were created:

Percentage of Customers the Pair was Based on: Calculated the proportion of customers who purchased the particular pair of items.

Recency of Article Purchase: Considered how recently the article was bought by customers.

Number of Times the Pair of Products was Purchased: Accounted for the frequency of purchases made for the given pair.
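The write-up does not give exact formulas for these features, so the sketch below uses assumed definitions: pair_count is the number of customers who bought both a and b, pct_customers divides that by the number of customers who bought a, and days_since_b is the gap between b's last sale and the cutoff date. pair_features is a hypothetical helper.

```python
from collections import Counter, defaultdict
from datetime import date

def pair_features(rows, cutoff):
    """Assumed features for each ordered pair (a, b) co-bought by the same customer.

    rows: iterable of (t_dat, customer_id, article_id) up to `cutoff`.
    Returns {(a, b): {"pair_count", "pct_customers", "days_since_b"}}.
    """
    baskets = defaultdict(set)
    last_sold = {}
    for t, customer, article in rows:
        baskets[customer].add(article)
        last_sold[article] = max(t, last_sold.get(article, t))  # most recent sale date
    buyers = Counter(a for items in baskets.values() for a in items)
    pairs = Counter(
        (a, b) for items in baskets.values() for a in items for b in items if a != b
    )
    return {
        (a, b): {
            "pair_count": n,
            "pct_customers": n / buyers[a],
            "days_since_b": (cutoff - last_sold[b]).days,
        }
        for (a, b), n in pairs.items()
    }

# Made-up toy rows.
rows = [
    (date(2020, 9, 20), "c1", 1), (date(2020, 9, 21), "c1", 2),
    (date(2020, 9, 18), "c2", 1), (date(2020, 9, 19), "c2", 2),
    (date(2020, 9, 22), "c3", 1),
]
feats = pair_features(rows, cutoff=date(2020, 9, 22))
```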

The candidates were ranked with a LightGBM ranking model whose hyperparameters, such as n_estimators and num_leaves, were tuned.
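A learning-to-rank model such as LightGBM's LGBMRanker needs the training rows grouped by query (here, one query per customer) and a group array giving each query's size. The sketch below assumes binary labels (1 if the candidate was bought in the target week, else 0); the hyperparameter values are LightGBM's defaults used as placeholders, not the tuned values from this project. Training requires lightgbm to be installed.

```python
from itertools import groupby

def ranking_groups(customer_ids):
    """Number of candidate rows per customer, in row order.

    Each customer is one ranking query, so rows must be sorted so that a
    customer's candidates are contiguous.
    """
    sizes = [len(list(rows)) for _, rows in groupby(customer_ids)]
    if len(sizes) != len(set(customer_ids)):
        raise ValueError("rows must be sorted/grouped by customer_id")
    return sizes

def train_ranker(X, y, customer_ids, n_estimators=100, num_leaves=31):
    """Fit a LambdaRank model; y is assumed to be 1 if bought in the target week, else 0."""
    from lightgbm import LGBMRanker  # lazy import so ranking_groups works without LightGBM

    ranker = LGBMRanker(objective="lambdarank", n_estimators=n_estimators, num_leaves=num_leaves)
    ranker.fit(X, y, group=ranking_groups(customer_ids))
    return ranker
```

At prediction time, ranker.predict scores every candidate row; sorting each customer's candidates by score and keeping the top 12 gives the submission.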

Results:

The fine-tuned recommendation system, combining the custom candidate retrieval method with the LightGBM ranking model, achieved a MAP@12 score of 0.345 and an overall AUC (Area Under the Curve) of 0.76.
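MAP@12 is the evaluation metric of the Kaggle H&M competition: for each customer, precision is accumulated at every rank (up to 12) where a relevant article appears, divided by min(number of actual purchases, 12), then averaged over customers. A minimal sketch of that definition; the function names are illustrative, and customers with no purchases in the target week are assumed to be excluded before averaging.

```python
def average_precision_at_k(actual, predicted, k=12):
    """AP@k: sum of precision at each rank holding a new relevant item, over min(|actual|, k)."""
    if not actual:
        return 0.0
    hits = 0
    score = 0.0
    seen = set()
    for i, item in enumerate(predicted[:k]):
        if item in actual and item not in seen:  # duplicate predictions earn nothing
            hits += 1
            score += hits / (i + 1)
        seen.add(item)
    return score / min(len(actual), k)

def map_at_k(actual_lists, predicted_lists, k=12):
    """Mean of AP@k over customers."""
    scores = [average_precision_at_k(a, p, k) for a, p in zip(actual_lists, predicted_lists)]
    return sum(scores) / len(scores)
```

For example, if a customer bought articles {1, 2} and the prediction is [1, 3, 2], the hits at ranks 1 and 3 give (1/1 + 2/3) / 2 = 5/6.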

Run Locally

The code to run the project can be found here: H&M Personalized Product Recommendations Github.

  1. Install the required libraries:
     pip install -r requirements.txt
  2. Generate the local CV:
     python hm-cv.py
  3. Fine-tune the models and generate predictions:
     python hm-custom-retrieval-pred.py

Feedback

If you have any feedback, please reach out to me.