Collaborative Filtering

Collaborative filtering (CF) is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating).

1. Types

1.1 Memory-based

1.1.1 User-based Collaborative Filtering

  1. Look for users who share the same rating patterns with the active user (the user whom the prediction is for).
  2. Use the ratings from those like-minded users found in step 1 to calculate a prediction for the active user.
    A specific application of this is the user-based Nearest Neighbor algorithm.
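
The two steps above can be sketched in R as follows. This is only an illustration under stated assumptions: the ratings matrix is made-up toy data, the function names user_similarity and predict_user_based are hypothetical, and the choice of Pearson correlation with a mean-centred weighted average is one common way to realise user-based nearest neighbours, not something the steps above prescribe.

```r
# Toy ratings matrix (made-up data): rows are users, columns are items, NA = not rated.
R <- matrix(c(5, 3, 4, 4, NA,
              3, 1, 2, 3, 3,
              4, 3, 4, 3, 5,
              3, 3, 1, 5, 4,
              1, 5, 5, 2, 1),
            nrow = 5, byrow = TRUE,
            dimnames = list(paste0("u", 1:5), paste0("i", 1:5)))

# Step 1: Pearson correlation between the active user and every other user,
# computed over the items that both users have rated.
user_similarity <- function(R, active) {
  sapply(seq_len(nrow(R)), function(v) {
    if (v == active) return(NA)
    cor(R[active, ], R[v, ], use = "complete.obs")
  })
}

# Step 2: predict the active user's rating of `item` from the k most similar
# users who rated it, weighting their mean-centred ratings by similarity.
predict_user_based <- function(R, active, item, k = 2) {
  sim <- user_similarity(R, active)
  sim[is.na(R[, item])] <- NA                      # ignore users who did not rate the item
  nn <- head(order(sim, decreasing = TRUE, na.last = NA), k)
  means <- rowMeans(R, na.rm = TRUE)
  means[active] + sum(sim[nn] * (R[nn, item] - means[nn])) / sum(abs(sim[nn]))
}

# Predicted rating of item i5 for user u1, using the two nearest neighbours.
pred <- predict_user_based(R, active = 1, item = 5, k = 2)
```

Here the two nearest neighbours of u1 are u2 and u3, so the prediction is u1's own mean rating shifted by how far those two users rated i5 above or below their own means.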

Another form of collaborative filtering can be based on implicit observations of normal user behavior (as opposed to the artificial behavior imposed by a rating task).

Typical examples of the memory-based approach are neighbourhood-based CF and item-based/user-based top-N recommendations.

1.1.2 Item-based Collaborative Filtering

  1. Build an item-item matrix determining relationships between pairs of items.
  2. Infer the tastes of the current user by examining the matrix and matching that user’s data.

1.2 Model-based

1.3 Hybrid

2. Algorithm

2.1 Item-based Collaborative Filtering Algorithm

  1. Similarity Computation
  2. Prediction Generation

How to Compute Similarity

  • Cosine-based Similarity
    Drawback: the differences in rating scale between different users are not taken into account.
  • Correlation-based Similarity
  • Adjusted Cosine Similarity
    Offsets the drawback of cosine-based similarity by subtracting the corresponding user average from each co-rated pair.
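
Written out, the three measures differ only in what is subtracted from each rating before the cosine is taken. The notation is an assumption introduced here for illustration: U is the set of users who rated both items i and j, R_{u,i} is user u's rating of item i, \bar{R}_i is the average rating of item i, and \bar{R}_u is the average of all ratings given by user u.

```latex
\begin{align*}
\text{Cosine:} \quad
  & \operatorname{sim}(i,j) =
    \frac{\sum_{u \in U} R_{u,i}\, R_{u,j}}
         {\sqrt{\sum_{u \in U} R_{u,i}^{2}}\; \sqrt{\sum_{u \in U} R_{u,j}^{2}}} \\[6pt]
\text{Correlation:} \quad
  & \operatorname{sim}(i,j) =
    \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_i)(R_{u,j} - \bar{R}_j)}
         {\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_i)^{2}}\; \sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_j)^{2}}} \\[6pt]
\text{Adjusted cosine:} \quad
  & \operatorname{sim}(i,j) =
    \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_u)(R_{u,j} - \bar{R}_u)}
         {\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_u)^{2}}\; \sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_u)^{2}}}
\end{align*}
```

The correlation form centres each rating on the item average, while the adjusted cosine centres it on the user average, which is what removes the per-user difference in rating scale.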

How to Compute Prediction

  • Weighted Sum
  • Regression
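
As a sketch of the weighted sum, the prediction for user u on item i averages the user's ratings of similar items, weighted by similarity. The symbols are assumptions made for illustration: N is the set of items similar to i that user u has rated, s_{i,j} is the similarity between items i and j, and R_{u,j} is user u's rating of item j.

```latex
P_{u,i} = \frac{\sum_{j \in N} s_{i,j}\, R_{u,j}}{\sum_{j \in N} \lvert s_{i,j} \rvert}
```

For example, with two similar items of similarity 0.8 and 0.4 that the user rated 5 and 2, the prediction is (0.8 * 5 + 0.4 * 2) / (0.8 + 0.4) = 4.8 / 1.2 = 4. The regression variant, as it is usually described, keeps the same weighted sum but replaces each raw rating R_{u,j} with a value approximated by a linear regression model fitted between the two items' ratings.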

Item-item similarity is computed from co-rated items only: for each pair of items, only the ratings from users who rated both items are used.
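
A minimal sketch of that isolation step, and of how cosine and adjusted cosine similarity differ on the same pair, might look like this in R. The ratings are made-up toy data and the helper names co_rated, cosine_sim and adjusted_cosine_sim are hypothetical.

```r
# Toy ratings matrix (made-up data): rows are users, columns are items, NA = not rated.
R <- matrix(c( 5,  3, NA,
               4, NA,  4,
               1,  1,  5,
              NA,  2,  4),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("u", 1:4), c("A", "B", "C")))

# Keep only the users (rows) who rated both items of the pair.
co_rated <- function(R, i, j) {
  keep <- !is.na(R[, i]) & !is.na(R[, j])
  R[keep, c(i, j), drop = FALSE]
}

# Plain cosine similarity on the co-rated ratings.
cosine_sim <- function(R, i, j) {
  S <- co_rated(R, i, j)
  sum(S[, 1] * S[, 2]) / (sqrt(sum(S[, 1]^2)) * sqrt(sum(S[, 2]^2)))
}

# Adjusted cosine: subtract each user's mean rating before taking the cosine.
adjusted_cosine_sim <- function(R, i, j) {
  S <- co_rated(R, i, j)
  centred <- S - rowMeans(R, na.rm = TRUE)[rownames(S)]
  sum(centred[, 1] * centred[, 2]) /
    (sqrt(sum(centred[, 1]^2)) * sqrt(sum(centred[, 2]^2)))
}

S_AB   <- co_rated(R, "A", "B")            # only u1 and u3 rated both A and B
cos_AB <- cosine_sim(R, "A", "B")
adj_AB <- adjusted_cosine_sim(R, "A", "B")
```

On this data the plain cosine of A and B is close to 1 because both ratings are positive for both users, while the adjusted cosine is much lower once u1's generous scale and u3's harsh scale are subtracted out.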

Core Code

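## Adjusted Cosine Similarity
# ij: a pair of item (column) indices; R: user-by-item rating matrix, NA = not rated.
adj_cos_similarity <- function(ij, R) {
  # Keep only the users who rated both items.
  co_index <- rowSums(is.na(R[, ij, drop = FALSE])) == 0
  S <- R[co_index, ij, drop = FALSE]
  # Subtract each user's mean rating, taken over the items that user actually rated
  # (unrated items are skipped rather than counted as 0).
  user_mean <- rowMeans(R, na.rm = TRUE)[co_index]
  a <- S[, 1] - user_mean
  b <- S[, 2] - user_mean
  # NaN when there are no co-rated users or a centred vector is all zero.
  return(sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2))))
}
## Prediction Computation
# u_index: row of the active user; k: number of neighbour items; r: number of recommendations.
user_rating_prediction <- function(u_index, R, k, r) {
  predict_index <- which(is.na(R[u_index, ]))
  rated_index <- which(!is.na(R[u_index, ]))
  pred_computation <- function(pred_i) {
    # Pair the target item with every item the user has already rated.
    ij <- cbind(pred_i, rated_index)
    sim <- apply(ij, 1, adj_cos_similarity, R)
    # The k most similar rated items, skipping undefined similarities.
    sim_index <- head(order(sim, decreasing = TRUE, na.last = NA), k)
    a <- sum(abs(sim[sim_index]))
    b <- sum(sim[sim_index] * R[u_index, ij[sim_index, 2]])
    # Weighted sum: similarity-weighted ratings divided by the total absolute similarity.
    return(b / a)
  }
  # Items the user has already rated keep a prediction of 0.
  user_pred <- rep(0, ncol(R))
  user_pred[predict_index] <- vapply(predict_index, pred_computation, numeric(1))
  return(list(user_prediction = user_pred,
              movie_recommend = colnames(R)[order(user_pred, decreasing = TRUE)[1:r]]))
}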

Questions

  1. How can we tell whether a rating of 0 means the user did not rate the item or never bought it?

References

  1. https://en.wikipedia.org/wiki/Collaborative_filtering