
Discovering recommendation systems

27 July 2019 | Beyond Data, Big Data, Machine Learning


We all wonder how Amazon and Netflix became so powerful and successful. How does Netflix know our movie preferences? How did Amazon know that I am an unconditional Game of Thrones fan, and that I love The North Face and geography? Without spilling the beans: it is thanks to what we call recommendation systems! Today, Netflix knows a great deal about its users: what they watch, their favorite videos, and how often they watch movies and series. This huge database has very likely boosted its algorithm.

Amazon, for its part, has shaken up the whole market and attracted a very large share of Americans. In his book “The Four”, Scott Galloway mentions that 52% of Americans have an Amazon Prime account, while 49% own a gun.

First of all, a recommendation system is a business tool: it can boost a company’s revenue by up to 30%. Today, users do not want to be shown products they have already bought or that do not interest them. This is why recommendation systems aim to understand user behavior, which makes users’ lives easier and allows the site or application to gain their trust.

Today, many sectors use recommendation systems. They are present in online stores such as Amazon, in streaming services such as Netflix or Spotify, and in content-based advertising. These systems share the same principle: filtering in advance, among a large mass of objects (products, books, movies, etc.), those likely to interest the buyer.

There are several types of filtering for recommendation. In this article, we will discuss these types as well as their advantages and disadvantages.


Collaborative filtering recommends products based on the interests of a large group of users. This method tries to find a group of users with the same tastes and preferences as the target user, then uses the products this group likes as recommendations. It starts from the assumption that users with similar tastes have the same preferences.

Two approaches are used in this method:

  • Based on the user (user-based recommendation)
  • Based on the product (item-based recommendation)


In the user-based approach, we build a matrix A: [User x Product]


In this matrix, we find that User 1 and User 2 are correlated (3 out of 5 products have the same score). We can therefore recommend to one of these users the favorite products of the other. But how can we calculate the correlation, or similarity, between two users?

Similarity methods:

  • Cosine similarity

Two users u and v are strongly correlated when the cosine of the angle between their score vectors is close to 1.

  • Jaccard similarity

This method ignores the score values: it only compares which products each user has scored.
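The two similarity measures listed above can be sketched in a few lines of Python. This is only an illustration: the function names, the convention that 0 means "not scored", and the two score vectors are assumptions made for the example, not data from the article.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two score vectors (0 = not scored)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a user without any score is similar to nobody
    return dot / (norm_u * norm_v)

def jaccard_similarity(u, v):
    """Products scored by both users divided by products scored by either."""
    rated_u = {i for i, s in enumerate(u) if s > 0}
    rated_v = {i for i, s in enumerate(v) if s > 0}
    union = rated_u | rated_v
    if not union:
        return 0.0
    return len(rated_u & rated_v) / len(union)

# Hypothetical scores of two users for five products (0 = not scored).
user_1 = [5, 3, 0, 4, 0]
user_2 = [5, 3, 0, 1, 2]

print(cosine_similarity(user_1, user_2))
print(jaccard_similarity(user_1, user_2))
```

Note that the Jaccard measure treats the 4 and the 1 given to the fourth product as a match, since it only looks at whether a product was scored, whereas the cosine measure takes the values into account.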

The advantages:

  • No information about the products or their features is needed.

The disadvantages:

  • A new user starts without any recommendation, since his scores are all zero (the same goes for new products).
  • With a large number of products, the matrix becomes sparse, so it is harder to “match” correlated users (those who have the same preferences).
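To make the user-based approach more concrete, here is a minimal sketch of one possible way to implement it: find the user most similar to the target, then suggest the products that neighbour scored and the target has not. The matrix values, the function names and the choice of cosine similarity with a single neighbour are all assumptions made for this example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two score vectors (0 = not scored)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def recommend_user_based(scores, target, top_n=2):
    """Return the closest user and their best products unseen by the target."""
    others = [user for user in scores if user != target]
    neighbour = max(others, key=lambda user: cosine(scores[target], scores[user]))
    candidates = [
        (score, product)
        for product, score in enumerate(scores[neighbour])
        if score > 0 and scores[target][product] == 0
    ]
    candidates.sort(reverse=True)  # highest neighbour score first
    return neighbour, [product for _, product in candidates[:top_n]]

# Hypothetical matrix A: one row per user, one column per product.
A = {
    "user_1": [5, 3, 0, 4, 0],
    "user_2": [5, 3, 4, 4, 2],
    "user_3": [1, 0, 5, 0, 4],
}

neighbour, products = recommend_user_based(A, "user_1")
print(neighbour, products)
```

A brand-new user would have a row of zeros here, so every similarity would be 0 and the choice of neighbour would be arbitrary, which illustrates the cold-start disadvantage listed above.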


In the item-based approach, we build the same matrix A, but this time we look for correlations between products rather than between users. We then offer the user the products that are most strongly correlated with the ones he already likes.
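As a rough sketch of the item-based idea, the same kind of matrix can be read column by column: each product becomes a vector of user scores, and two products are considered related when these vectors are similar. The data, the function names and the use of cosine similarity are assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two score vectors (0 = not scored)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def most_related_product(scores, product):
    """Return the product whose score column is closest to `product`'s column."""
    columns = list(zip(*scores))  # one tuple of user scores per product
    best, best_similarity = None, -1.0
    for other, column in enumerate(columns):
        if other == product:
            continue
        similarity = cosine(columns[product], column)
        if similarity > best_similarity:
            best, best_similarity = other, similarity
    return best, best_similarity

# Hypothetical matrix A: rows are users, columns are products.
A = [
    [5, 3, 0, 4],
    [4, 0, 0, 5],
    [0, 2, 5, 0],
]

related, similarity = most_related_product(A, 0)
print(related, similarity)
```

In this toy matrix, a user who liked product 0 would be offered product 3, because the same users scored both of them highly.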


The advantages:

  • No information about the products or their features is needed.
  • Finding correlations among a limited number of products is easier than among a very large number of users.
  • Products are easier to correlate than users, and the correlations are more meaningful.


The disadvantages:

  • A new user starts without any recommendation, since his scores are all zero (the same goes for new products).
  • With a large number of products, the matrix becomes sparse, so it is harder to “match” correlated users (those who have the same preferences).


Content-based filtering relies on profiles: we build a profile for each user and for each product.

For example, we build the profile of a user A who prefers action and romance series (in the case of Netflix). Then we recommend products from the same categories that A prefers. (We no longer rely on the opinions of other users.)
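The profile idea can be sketched as follows, assuming each series is described by a small vector of genre features and the user profile is simply the average of the series the user liked. The genre list, the series names and both function names are hypothetical and only serve the illustration.

```python
def build_user_profile(liked_items, item_features):
    """Average the genre vectors of the items the user liked."""
    size = len(next(iter(item_features.values())))
    profile = [0.0] * size
    for item in liked_items:
        for genre, value in enumerate(item_features[item]):
            profile[genre] += value / len(liked_items)
    return profile

def recommend_content_based(profile, item_features, seen):
    """Rank unseen items by the dot product of their features with the profile."""
    scored = [
        (sum(p * f for p, f in zip(profile, features)), item)
        for item, features in item_features.items()
        if item not in seen
    ]
    return [item for _, item in sorted(scored, reverse=True)]

# Hypothetical genre vectors: [action, romance, documentary]
series = {
    "series_1": [1, 0, 0],
    "series_2": [0, 1, 0],
    "series_3": [1, 1, 0],
    "series_4": [0, 0, 1],
}

liked = ["series_1", "series_2"]  # user A likes one action and one romance series
profile = build_user_profile(liked, series)
ranking = recommend_content_based(profile, series, seen=set(liked))
print(profile, ranking)
```

No other user appears anywhere in this computation, so a new series can be recommended as soon as its genre vector is known.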


The advantages:

  • No need for data on other users.
  • Ability to recommend unknown or new products.
  • Ability to recommend products to users with unique or rare tastes.

The disadvantages:

  • Finding the right “features” is not always easy.
  • How can we create a profile for new users?



Now imagine that, for a user A, we want to estimate the score of a movie B, knowing that user A has never scored films similar to movie B. The approach is to start from the average score and adjust it according to how the movie and the user deviate from that average.

For example, the average score is 3.7 and movie B is scored 0.5 above the average. User A usually gives scores 0.2 below the average.

So the estimated score of movie B for user A is: 3.7 + 0.5 – 0.2 = 4
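The sum above can be written as a tiny function. This sketch assumes the estimate is the average score plus one offset for the movie and one for the user, with the user's offset entering as -0.2 to match the minus sign in the sum; the function and parameter names are made up for the example.

```python
def baseline_estimate(mean_score, item_offset, user_offset):
    """Average score, corrected by the movie's and the user's usual deviation."""
    return mean_score + item_offset + user_offset

# Values from the example: average 3.7, movie B at +0.5, user A at -0.2.
estimate = baseline_estimate(3.7, 0.5, -0.2)
print(round(estimate, 1))
```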


As seen before, the estimated score of movie B for user A is 4. Using collaborative filtering, we also know that user A gave a score of 1 (a bad score) to movie C, which is of the same kind as movie B. This lowers the estimate by 1 point.

Hence the final estimate of the score of movie B for user A is: 4 + (-1) = 3


In recent years, recommendation systems have come to play a particularly important role in online marketing. Thanks to them, e-commerce companies have been able to differentiate themselves from their competitors, make life easier for their current customers and reach potential ones.

Depending on each company’s strategy, several recommendation techniques are integrated to fit its needs. As we have seen in this article, these methods have different advantages and disadvantages, so no single one of them can solve every problem. In practice, companies combine several approaches to obtain better recommendations, which they evaluate against criteria defined by their own context and objectives.

While the principle of a recommendation system is fairly simple, its implementation is complicated. The challenges lie in aspects such as the collection and selection of relevant data, data size and quality, data sparsity, the construction of user profiles, and prediction for new users or new products.