Recommendation system

1. Concept

Recommender systems are a subclass of information filtering systems that seek to predict the “rating” or “preference” that a user would give to an item.

2. Approaches

2.1 Collaborative filtering

Collaborative filtering methods are based on collecting and analyzing a large amount of information on users’ behaviors, activities or preferences and predicting what users will like based on their similarity to other users.

Advantages
A distinct advantage of collaborative filtering is its broad applicability: the algorithms do not need to understand the content of a particular item, so they can recommend a wide range of products. Collaborative filtering also lets brands offer serendipitous discovery, presenting consumers with items they would not necessarily have sought out.

How to measure user similarity?
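Two techniques are commonly used: the Pearson correlation, which scores how closely two users' ratings agree, and the k-nearest neighbor (k-NN) approach, which selects the k users most similar to the target user under such a score.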
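
As a rough illustration of how Pearson correlation and k-NN can work together, here is a minimal sketch. The `ratings` dictionary, the user names, and the function names `pearson` and `nearest_neighbors` are made up for this example; returning 0.0 when two users share fewer than two items is an assumption, not a standard rule.

```python
import math

def pearson(ratings_a, ratings_b):
    """Pearson correlation over the items both users have rated."""
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return 0.0  # assumption: too little overlap counts as "no similarity"
    xs = [ratings_a[i] for i in common]
    ys = [ratings_b[i] for i in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a user who rates everything the same has no correlation
    return cov / math.sqrt(var_x * var_y)

def nearest_neighbors(target, ratings, k):
    """Return the k users most similar to `target` as (user, similarity) pairs."""
    sims = [(other, pearson(ratings[target], ratings[other]))
            for other in ratings if other != target]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:k]

# Hypothetical explicit ratings: user -> {item: rating}
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2, "C": 3, "D": 5},
    "carol": {"A": 1, "B": 5, "C": 2, "D": 1},
}
neighbors = nearest_neighbors("alice", ratings, k=1)
```

Items that the neighbors rated highly but the target user has not seen (item "D" in this toy data) would then be the candidates for recommendation.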

Data Collection
When building a model from a user’s behavior, a distinction is often made between explicit and implicit forms of data collection.
Explicit collection means openly asking the user for data; implicit collection means silently observing the user's behavior.
Examples of explicit data collection include the following:

  • Asking a user to rate an item on a sliding scale.
  • Asking a user to search.
  • Asking a user to rank a collection of items from favorite to least favorite.
  • Presenting two items to a user and asking him/her to choose the better one of them.
  • Asking a user to create a list of items that he/she likes.

Examples of implicit data collection include the following:

  • Observing the items that a user views in an online store.
  • Analyzing item/user viewing times.
  • Keeping a record of the items that a user purchases online.
  • Obtaining a list of items that a user has listened to or watched on his/her computer.
  • Analyzing the user’s social network and discovering similar likes and dislikes.

Problems
Collaborative filtering approaches often suffer from three problems: cold start, scalability, and sparsity.

  • Cold start: These systems often require a large amount of existing data on a user in order to make accurate recommendations.
  • Scalability: In many of the environments in which these systems make recommendations, there are millions of users and products. Thus, a large amount of computation power is often necessary to calculate recommendations.
  • Sparsity: The number of items sold on major e-commerce sites is extremely large. The most active users will only have rated a small subset of the overall database. Thus, even the most popular items have very few ratings.

A particular type of collaborative filtering algorithm uses matrix factorization, a low-rank matrix approximation technique.
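
To make the idea of a low-rank approximation more concrete, the following is a minimal sketch that factorizes a small rating matrix into user factors `P` and item factors `Q` with stochastic gradient descent over the observed ratings only. The function names, the hyperparameters (`rank`, `steps`, `lr`, `reg`), and the toy data are all assumptions chosen for illustration; production systems typically rely on tuned library implementations.

```python
import random

def factorize(ratings, n_users, n_items, rank=2, steps=5000, lr=0.01, reg=0.02, seed=0):
    """Approximate observed ratings r[u][i] by the dot product of P[u] and Q[i]."""
    rng = random.Random(seed)
    P = [[rng.uniform(0, 0.5) for _ in range(rank)] for _ in range(n_users)]
    Q = [[rng.uniform(0, 0.5) for _ in range(rank)] for _ in range(n_items)]
    for _ in range(steps):
        for u, i, r in ratings:
            pred = sum(P[u][f] * Q[i][f] for f in range(rank))
            err = r - pred
            for f in range(rank):
                p_uf, q_if = P[u][f], Q[i][f]
                # Gradient step on squared error with L2 regularization.
                P[u][f] += lr * (err * q_if - reg * p_uf)
                Q[i][f] += lr * (err * p_uf - reg * q_if)
    return P, Q

def predict(P, Q, u, i):
    """Predicted rating of user u for item i."""
    return sum(p * q for p, q in zip(P[u], Q[i]))

# Hypothetical sparse ratings as (user index, item index, rating) triples.
observed = [(0, 0, 5), (0, 1, 3), (1, 0, 4), (1, 2, 1), (2, 1, 1), (2, 2, 5)]
P, Q = factorize(observed, n_users=3, n_items=3)
rmse = (sum((r - predict(P, Q, u, i)) ** 2 for u, i, r in observed) / len(observed)) ** 0.5
```

Calling `predict(P, Q, u, i)` for a pair that is absent from `observed` gives the model's estimate for an unrated item, which is how the factorization fills in the sparse matrix.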

For more detail on collaborative filtering, see the article Collaborative Filtering.

2.2 Content-based filtering

Content-based filtering methods are based on a description of the item and a profile of the user’s preference. In other words, these algorithms try to recommend items that are similar to those that a user liked in the past (or is examining in the present).
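Its core idea is to use the metadata of the items or content being recommended to discover how items or content relate to one another, and then, based on the user's past preferences, recommend similar items to the user.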

How to extract item features
A widely used approach is the tf–idf representation (also called the vector space representation).
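
As an illustration of the tf–idf representation, here is a minimal sketch that turns short item descriptions into vectors. It assumes whitespace tokenization, non-empty descriptions, term frequency normalized by document length, and idf computed as log(N / df); other weighting variants exist. The item descriptions and the function name `tfidf_vectors` are made up for the example.

```python
import math
from collections import Counter

def tfidf_vectors(documents):
    """Return the sorted vocabulary and one tf-idf vector per document."""
    tokenized = [doc.lower().split() for doc in documents]
    vocab = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([
            (counts[term] / len(tokens)) * math.log(n_docs / df[term])
            for term in vocab
        ])
    return vocab, vectors

# Hypothetical item descriptions.
items = [
    "space adventure with robots",
    "romantic comedy in paris",
    "space war with aliens",
]
vocab, vectors = tfidf_vectors(items)
```

Terms that appear in few descriptions (such as "robots") receive a higher weight than terms shared across several descriptions (such as "space"), so rare terms contribute more when items are compared.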

To create a user profile, the system mostly focuses on two types of information: (1) a model of the user’s preferences and (2) a history of the user’s interaction with the recommender system.

Basically, these methods use an item profile (i.e., a set of discrete attributes and features) characterizing the item within the system. The system creates a content-based profile of each user as a weighted vector of item features. The weights denote the importance of each feature to the user and can be computed from individually rated content vectors using a variety of techniques. Simple approaches average the vectors of the rated items, while more sophisticated methods use machine learning techniques such as Bayesian classifiers, cluster analysis, decision trees, and artificial neural networks to estimate the probability that the user will like the item.
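
The simple averaging approach can be sketched as follows. This is only one possible reading: it assumes the profile is a rating-weighted average of item feature vectors and that candidates are ranked by cosine similarity to that profile. The feature order, the item names, and the function names are hypothetical.

```python
import math

def build_profile(rated_items):
    """Average the feature vectors of rated items, weighted by the user's rating."""
    total_weight = sum(rating for _, rating in rated_items)
    n_features = len(rated_items[0][0])
    return [
        sum(vec[f] * rating for vec, rating in rated_items) / total_weight
        for f in range(n_features)
    ]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical feature order: [sci-fi, romance, comedy]
rated = [([1.0, 0.0, 0.0], 5), ([0.0, 1.0, 1.0], 1)]
profile = build_profile(rated)

# Hypothetical unseen items described by the same features.
candidates = {"space opera": [1.0, 0.0, 0.2], "rom-com": [0.0, 1.0, 1.0]}
ranked = sorted(candidates, key=lambda name: cosine(profile, candidates[name]), reverse=True)
```

Because the user rated the sci-fi item much higher, the profile leans toward that feature and "space opera" ranks ahead of "rom-com".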

2.3 Hybrid recommender systems

A hybrid recommender system combines multiple techniques to achieve some synergy between them. The techniques that are commonly combined include:

  • Collaborative: The system generates recommendations using only information about rating profiles for different users. Collaborative systems locate peer users with a rating history similar to the current user and generate recommendations using this neighborhood.
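    In other words, first obtain the weighted feature vectors from the content-based step, then use CF to find users with similar vectors and recommend from them.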
  • Content-based: The system generates recommendations from two sources: the features associated with products and the ratings that a user has given them. Content-based recommenders treat recommendation as a user-specific classification problem and learn a classifier for the user’s likes and dislikes based on product features.
  • Demographic: A demographic recommender provides recommendations based on a demographic profile of the user. Recommended products can be produced for different demographic niches, by combining the ratings of users in those niches.
  • Knowledge-based: A knowledge-based recommender suggests products based on inferences about a user’s needs and preferences. This knowledge will sometimes contain explicit functional knowledge about how certain product features meet user needs.
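
One simple way to combine two of these techniques is a weighted blend of their scores. The sketch below is an assumption for illustration only, not a method described in the notes above: the function name `hybrid_scores`, the `cf_weight` parameter, and the score values are hypothetical, and items missing from one source are treated as scoring 0 there.

```python
def hybrid_scores(cf_scores, content_scores, cf_weight=0.5):
    """Blend two score dictionaries; items missing from one source score 0 there."""
    items = set(cf_scores) | set(content_scores)
    return {
        item: cf_weight * cf_scores.get(item, 0.0)
              + (1.0 - cf_weight) * content_scores.get(item, 0.0)
        for item in items
    }

# Hypothetical per-item scores from a collaborative and a content-based recommender.
cf = {"A": 0.9, "B": 0.2}
content = {"B": 0.8, "C": 0.6}
blended = hybrid_scores(cf, content, cf_weight=0.7)
best = max(blended, key=blended.get)
```

Lowering `cf_weight` shifts the ranking toward the content-based scores, which may help when a user has too little rating history for collaborative filtering.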

3. Recommender system evaluation

References

  1. https://en.wikipedia.org/wiki/Recommender_system
  2. https://www.dynamicyield.com/glossary/collaborative-filtering/
  3. https://nychent.gitbooks.io/movie-rating-with-collaborative-filtering/content/02prerequisites/Recommender.html