
Reducing the amount of data behind interpretable classifiers
Introduction
Selecting a small number of “representative” samples from a large data set may be of greater interpretative value than generating some “optimal” linear combination of all the elements of a data set.
In this paper, we motivate a particular method for selecting prototypes in the classification setting.
Formulation as an optimization problem
The intuition
A good set of prototypes:
- Every training example should have a prototype of its own class in its neighborhood
- No point should have a prototype of a different class in its neighborhood
- There should be as few prototypes as possible
The notion of neighborhood:
For a given choice of Pl ⊆ X, we consider the set of ε-balls centered at each xj ∈ Pl. A good choice of Pl yields a set of balls that:
- Covers as many training points of class l as possible
- Covers as few training points of classes other than l as possible
- Is sparse
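The ε-ball bookkeeping above can be made concrete; a minimal sketch using Euclidean distance (the helper name `eps_balls` is hypothetical, not from the paper):

```python
import math

def eps_balls(points, prototypes, eps):
    """For each candidate prototype j, record which training points
    fall inside the eps-ball centered at it (Euclidean distance)."""
    covers = {}
    for j, c in enumerate(prototypes):
        covers[j] = {i for i, p in enumerate(points)
                     if math.dist(p, c) <= eps}
    return covers
```

The resulting `covers` map is exactly the input a set-cover formulation needs: each candidate prototype becomes a "set" of the points it covers.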
Set cover integer program
Set cover can be seen as a clustering problem in which we wish to find the smallest number of clusters such that every point is within ε of at least one cluster center.
From intuition to integer program.
We express the three neighborhood properties above as an integer program.
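The program can be sketched as follows. Here α_j^{(l)} = 1 means x_j serves as a prototype for class l, ξ_i and η_i are slack variables for the two coverage properties, and λ trades off sparsity; this is a hedged reconstruction in the paper's spirit, not a verbatim copy:

```latex
\begin{aligned}
\min_{\alpha,\,\xi,\,\eta}\quad
 & \sum_i \xi_i \;+\; \sum_i \eta_i \;+\; \lambda \sum_l \sum_j \alpha_j^{(l)} \\
\text{s.t.}\quad
 & \sum_{j\,:\,x_i \in B(x_j,\varepsilon)} \alpha_j^{(y_i)} \;\ge\; 1 - \xi_i
   && \text{(same-class ball covers } x_i \text{, else pay } \xi_i) \\
 & \sum_{j\,:\,x_i \in B(x_j,\varepsilon)} \sum_{l \ne y_i} \alpha_j^{(l)} \;\le\; \eta_i
   && \text{(wrong-class coverage of } x_i \text{ costs } \eta_i) \\
 & \alpha_j^{(l)} \in \{0,1\}, \qquad \xi_i,\,\eta_i \ge 0
\end{aligned}
```

With the slack terms removed and only the coverage constraint kept, this reduces to the classic set cover integer program.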
Solving the problem
We present two algorithms for approximately solving this problem, both based on standard approximation algorithms for set cover.
LP relaxation with randomized rounding.
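A minimal sketch of the rounding step, assuming a fractional solution `frac` has already been obtained from an LP solver. The per-round inclusion scheme is the standard one for set cover (include set j with probability frac[j] each round, repeat until everything is covered); the function name is illustrative:

```python
import random

def randomized_rounding(frac, covers, universe, seed=0):
    """Round a fractional set-cover solution to an integral one.

    frac[j]  : LP value for set j (assumed to come from an LP solver)
    covers[j]: the points that ball j covers
    Assumes the sets with frac[j] > 0 jointly cover `universe`,
    so the loop terminates with probability 1 (O(log n) rounds
    in expectation).
    """
    rng = random.Random(seed)
    chosen, covered = set(), set()
    while covered != universe:
        for j, p in frac.items():
            if j not in chosen and rng.random() < p:
                chosen.add(j)
                covered |= covers[j]
    return chosen
```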
A greedy approach.
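The greedy alternative, sketched here as plain greedy set cover (for brevity this ignores the wrong-class penalty that the full objective also tracks):

```python
def greedy_set_cover(covers, universe):
    """Classic greedy set cover: repeatedly add the set covering the
    most still-uncovered points (ln n approximation guarantee)."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        best = max(covers, key=lambda j: len(covers[j] & uncovered))
        if not covers[best] & uncovered:
            break  # remaining points cannot be covered by any set
        chosen.append(best)
        uncovered -= covers[best]
    return chosen
```

In the prototype setting, the marginal gain of a ball would also subtract the wrong-class points it captures, but the greedy skeleton is the same.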
Examples on simulated and real data
Mixture-of-Gaussians simulation
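A toy two-class mixture-of-Gaussians generator in the spirit of this experiment (the means, variances, and mixture weights below are illustrative choices, not the paper's setup):

```python
import random

def gaussian_mixture(n, seed=0):
    """Sample n labeled 2-D points; each class is an equal-weight
    mixture of two spherical Gaussians (illustrative parameters)."""
    rng = random.Random(seed)
    means = {0: [(-2.0, 0.0), (0.0, 2.0)],
             1: [(2.0, 0.0), (0.0, -2.0)]}
    X, y = [], []
    for _ in range(n):
        label = rng.randrange(2)
        mx, my = rng.choice(means[label])
        X.append((rng.gauss(mx, 1.0), rng.gauss(my, 1.0)))
        y.append(label)
    return X, y
```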

ZIP code digits data


References
- Bien, J. and Tibshirani, R. (2011). Prototype selection for interpretable classification. The Annals of Applied Statistics, 2403–2424.



