Collaborative Filtering

Collaborative Filtering is a method of making automatic predictions (filtering) about the interests of a user by collecting preferences or taste information from many users (collaborating).

1. Types

1.1 Memory-based

1.1.1 User-based Collaborative Filtering

Step 1: Look for users who share the same rating patterns as the active user (the user for whom the prediction is made).

Step 2: Use the ratings from the like-minded users found in step 1 to calculate a prediction for the active user.

A specific application of this is the user-based Nearest Neighbor algorithm.
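
The two steps above can be sketched in R as follows. This is a minimal illustration and not a reference implementation: the toy matrix, the function names user_similarity and predict_user_based, the choice of Pearson correlation as the similarity measure, and the mean-centred weighted average are all assumptions made for the example.

```r
# Toy user x item rating matrix; NA means "not rated" (hypothetical data).
R <- matrix(c(5, 3, 4, 4, NA,
              3, 1, 2, 3, 3,
              4, 3, 4, 3, 5,
              3, 3, 1, 5, 4,
              1, 5, 5, 2, 1),
            nrow = 5, byrow = TRUE,
            dimnames = list(paste0("user", 1:5), paste0("item", 1:5)))

# Step 1: Pearson correlation between two users over the items both have rated.
user_similarity <- function(u, v, R) {
  co <- !is.na(R[u, ]) & !is.na(R[v, ])
  if (sum(co) < 2) return(NA_real_)
  cor(R[u, co], R[v, co])
}

# Step 2: predict the rating of `item` for `active` from the k most similar
# users who rated that item, using a mean-centred weighted average.
predict_user_based <- function(active, item, R, k = 2) {
  others <- setdiff(which(!is.na(R[, item])), active)
  sim <- vapply(others, user_similarity, numeric(1), v = active, R = R)
  keep <- head(order(sim, decreasing = TRUE, na.last = NA), k)
  nb <- others[keep]   # row indices of the nearest neighbours
  w <- sim[keep]       # their similarities to the active user
  user_mean <- rowMeans(R, na.rm = TRUE)
  user_mean[active] + sum(w * (R[nb, item] - user_mean[nb])) / sum(abs(w))
}

pred <- predict_user_based(active = 1, item = 5, R = R, k = 2)
```

With this toy data the two nearest neighbours of user1 are user2 and user3, and the predicted rating of item5 comes out at roughly 4.87.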

Another form of collaborative filtering can be based on implicit observations of normal user behavior (as opposed to the artificial behavior imposed by a rating task).

Typical examples of the memory-based approach are neighbourhood-based CF and item-based/user-based top-N recommendations.

1.1.2 Item-based Collaborative Filtering

Step 1: Build an item-item matrix determining relationships between pairs of items.

Step 2: Infer the tastes of the current user by examining the matrix and matching that user’s data.

1.2 Model-based

1.3 Hybrid

2. Algorithm

2.1 Item-based Collaborative Filtering Algorithm

Similarity Computation

Prediction Generation

How to Compute Similarity

Cosine-based Similarity
Drawback: the differences in rating scale between different users are not taken into account.
Correlation-based Similarity
Adjusted Cosine Similarity
Offsets the drawback of cosine-based similarity by subtracting the corresponding user average from each co-rated pair.
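
For reference, the three measures can be written out as below. The notation is an assumption, since these notes give only the names: R_{u,i} is the rating of user u for item i, U is the set of users who rated both items i and j, \bar{R}_i is the average rating of item i, and \bar{R}_u is the average rating of user u.

```latex
% Cosine-based similarity (over co-rated users)
\[
\mathrm{sim}(i,j) =
  \frac{\sum_{u \in U} R_{u,i}\, R_{u,j}}
       {\sqrt{\sum_{u \in U} R_{u,i}^{2}}\;\sqrt{\sum_{u \in U} R_{u,j}^{2}}}
\]

% Correlation-based similarity (Pearson, centred on the item averages)
\[
\mathrm{sim}(i,j) =
  \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_i)(R_{u,j} - \bar{R}_j)}
       {\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_i)^{2}}\;\sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_j)^{2}}}
\]

% Adjusted cosine similarity (centred on the user averages)
\[
\mathrm{sim}(i,j) =
  \frac{\sum_{u \in U} (R_{u,i} - \bar{R}_u)(R_{u,j} - \bar{R}_u)}
       {\sqrt{\sum_{u \in U} (R_{u,i} - \bar{R}_u)^{2}}\;\sqrt{\sum_{u \in U} (R_{u,j} - \bar{R}_u)^{2}}}
\]
```

The only difference between the last two is what gets subtracted: the item average in the correlation-based measure, the user average in the adjusted cosine measure.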

How to Compute Prediction

Weighted Sum
Regression
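
As a sketch of the two prediction schemes, under assumed notation: P_{u,i} is the predicted rating of user u for item i, N ranges over the items similar to i that user u has rated, and s_{i,N} is the similarity between items i and N. The regression form shown is one common formulation and is not taken from these notes.

```latex
% Weighted sum: average of the user's own ratings, weighted by item similarity
\[
P_{u,i} = \frac{\sum_{N} s_{i,N}\, R_{u,N}}{\sum_{N} \lvert s_{i,N} \rvert}
\]

% Regression: same formula, but each raw rating R_{u,N} is replaced by an
% approximation R'_{u,N} taken from a linear model fitted between the
% rating vectors of the target item i and the similar item N
\[
R'_{N} = \alpha R_{i} + \beta + \epsilon
\]
```

Dividing by the sum of absolute similarities keeps the weighted-sum prediction within the range of the user's ratings when all selected similarities are positive.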

Item-item similarity is computed by looking into co-rated items only.
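
A minimal R sketch of building an item-item matrix from co-rated cases only. The toy data and the helper name co_rated_cosine are hypothetical, and plain cosine similarity is used here purely for brevity; any of the measures listed above could be swapped in.

```r
# Toy user x item rating matrix; NA means "not rated" (hypothetical data).
R <- matrix(c(5, 3, 4,
              3, 1, NA,
              4, NA, 4,
              1, 5, 5),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("user", 1:4), paste0("item", 1:3)))

# Cosine similarity of items i and j, restricted to users who rated both.
co_rated_cosine <- function(i, j, R) {
  co <- !is.na(R[, i]) & !is.na(R[, j])
  if (!any(co)) return(NA_real_)
  a <- R[co, i]
  b <- R[co, j]
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Fill the symmetric item-item similarity matrix pair by pair.
n_items <- ncol(R)
item_sim <- matrix(1, n_items, n_items,
                   dimnames = list(colnames(R), colnames(R)))
for (p in seq_len(n_items - 1)) {
  for (q in (p + 1):n_items) {
    item_sim[p, q] <- item_sim[q, p] <- co_rated_cosine(p, q, R)
  }
}
```

For item1 and item2 only user1, user2 and user4 contribute, because user3 has not rated item2.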

Core Code

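## Adjusted Cosine Similarity
## ij: column indices of the two items; R: user x item rating matrix, NA = not rated
adj_cos_similarity<-function(ij,R){
  # keep only the users who rated both items (co-rated cases)
  co_index<-rowSums(is.na(R[,ij,drop=FALSE]))==0
  if(!any(co_index)) return(NA_real_)
  # subtract each user's average over the items that user actually rated
  user_mean<-rowMeans(R,na.rm=TRUE)[co_index]
  S<-sweep(R[co_index,ij,drop=FALSE],1,user_mean)
  a<-S[,1]
  b<-S[,2]
  return(sum(a*b)/(sqrt(sum(a^2))*sqrt(sum(b^2))))
}
## Prediction Computation (weighted sum over the k most similar rated items)
## u_index: row of the active user; k: neighbourhood size; r: number of recommendations
user_rating_prediction<-function(u_index, R, k, r){
  predict_index<-which(is.na(R[u_index,]))
  rated_index<-which(!is.na(R[u_index,]))
  pred_computation<-function(pred_i){
    # similarity between the target item and every item the user has rated
    sim<-vapply(rated_index,function(j) adj_cos_similarity(c(pred_i,j),R),numeric(1))
    # top-k neighbours; na.last=NA drops undefined (NA/NaN) similarities
    sim_index<-head(order(sim,decreasing=TRUE,na.last=NA),k)
    a<-sum(sim[sim_index]*R[u_index,rated_index[sim_index]])
    b<-sum(abs(sim[sim_index]))
    return(a/b)
  }
  # items the user has already rated keep a prediction of 0
  user_pred<-rep(0,ncol(R))
  user_pred[predict_index]<-vapply(predict_index,pred_computation,numeric(1))
  # recommend only among the items the user has not rated yet
  top<-head(predict_index[order(user_pred[predict_index],decreasing=TRUE)],r)
  return(list(user_prediction=user_pred,movie_recommend=colnames(R)[top]))
}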

Open Question

How can we tell whether a rating of 0 means the user has not rated the item or has never bought it?

References

https://en.wikipedia.org/wiki/Collaborative_filtering