prototype_selection

Reducing the amount of data behind an interpretable classifier

Introduction

Selecting a small number of “representative” samples from a large data set may be of greater interpretative value than generating some “optimal” linear combination of all the elements of a data set.

In this paper, we motivate a particular method for selecting prototypes in the classification setting.

Formulation as an optimization problem

The intuition

A good set of prototypes satisfies three properties:

  1. Every training example has a prototype of its own class in its neighborhood
  2. No point has a prototype of a different class in its neighborhood
  3. There are as few prototypes as possible

The notion of neighborhood:

For a given choice of Pl ⊆ X, we consider the set of ε-balls centered at each xj ∈ Pl. A good prototype set Pl for class l:

  1. Covers as many training points of class l as possible
  2. Covers as few training points as possible of classes other than l
  3. Is sparse
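The coverage notion above is easy to make concrete. A minimal sketch (the data and ε below are hypothetical, chosen only for illustration) that tests which training points fall inside at least one ε-ball around a candidate prototype:

```python
import numpy as np

def covered(X, prototypes, eps):
    """Boolean mask: True where a training point lies within eps of
    at least one prototype (i.e., inside some eps-ball)."""
    # Pairwise Euclidean distances: d[i, j] = ||X[i] - prototypes[j]||
    d = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
    return (d <= eps).any(axis=1)

X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
P = np.array([[0.5, 0.0]])
print(covered(X, P, eps=1.0))  # -> [ True  True False]
```

Applied per class, the same mask gives both the "covers class l" count and the "covers other classes" penalty.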

Set cover integer program

Set cover can be seen as a clustering problem in which we wish to find the smallest number of clusters such that every point is within ε of at least one cluster center.
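To make the clustering view concrete, here is the classic greedy heuristic for set cover, where each "set" plays the role of one ε-ball (the universe and subsets below are made-up toy data):

```python
def greedy_set_cover(universe, subsets):
    """Classic greedy set cover: repeatedly pick the subset that covers
    the most still-uncovered elements (ln n approximation guarantee)."""
    uncovered = set(universe)
    chosen = []
    while uncovered:
        i, s = max(enumerate(subsets), key=lambda t: len(t[1] & uncovered))
        if not s & uncovered:
            break  # remaining elements cannot be covered by any subset
        chosen.append(i)
        uncovered -= s
    return chosen

# Each subset stands in for the eps-ball of one candidate center
balls = [{1, 2, 3}, {3, 4}, {4, 5, 6}]
print(greedy_set_cover({1, 2, 3, 4, 5, 6}, balls))  # -> [0, 2]
```

The prototype problem adds class labels and a penalty for covering the wrong class, but the covering skeleton is the same.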

From intuition to integer program.

Express the three properties above (the notion of neighborhood) as an integer program.
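A sketch of the resulting program, reconstructed from the three desiderata (the α, ξ, η, λ notation follows the paper, but the exact formulation there may differ in details):

```latex
\begin{aligned}
\min_{\alpha,\,\xi,\,\eta}\quad
  & \sum_{i} \xi_i \;+\; \sum_{i} \eta_i \;+\; \lambda \sum_{l}\sum_{j} \alpha_j^{(l)} \\
\text{s.t.}\quad
  & \xi_i \;\ge\; 1 - \sum_{j:\, x_i \in B(x_j,\varepsilon)} \alpha_j^{(y_i)}
    && \text{(cover each point with its own class)} \\
  & \eta_i \;\ge\; \sum_{l \ne y_i}\; \sum_{j:\, x_i \in B(x_j,\varepsilon)} \alpha_j^{(l)}
    && \text{(penalize wrong-class coverage)} \\
  & \alpha_j^{(l)} \in \{0,1\}, \qquad \xi_i,\ \eta_i \ge 0
\end{aligned}
```

Here α_j^{(l)} = 1 means x_j is selected as a prototype for class l, ξ_i is the slack paid when point i has no same-class prototype nearby, η_i counts wrong-class balls covering x_i, and λ trades off sparsity against coverage. Relaxing α_j^{(l)} ∈ [0,1] gives the LP used in the first algorithm.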

Solving the problem

We present two algorithms for approximately solving the problem, both based on standard approximation algorithms for set cover.

LP relaxation with randomized rounding.

A greedy approach.
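The greedy alternative adds prototypes one at a time, each step picking the candidate whose ε-ball most improves the objective. A minimal sketch under simplifying assumptions (Euclidean distance, every training point as a candidate, a made-up stopping cap; the paper's actual update rule may differ):

```python
import numpy as np

def greedy_prototypes(X, y, eps, max_protos=10):
    """Greedy sketch: repeatedly add the training point whose eps-ball
    newly covers the most same-class points minus wrong-class points;
    stop when no candidate has positive gain."""
    n = len(X)
    d = np.linalg.norm(X[:, None] - X[None, :], axis=2)
    in_ball = d <= eps            # in_ball[i, j]: x_i inside ball of x_j
    covered = np.zeros(n, dtype=bool)
    protos = []
    while len(protos) < max_protos:
        best_j, best_gain = None, 0
        for j in range(n):
            same = in_ball[:, j] & (y == y[j]) & ~covered
            wrong = in_ball[:, j] & (y != y[j])
            gain = same.sum() - wrong.sum()
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:
            break
        protos.append(best_j)
        covered |= in_ball[:, best_j] & (y == y[best_j])
    return protos

X = np.array([[0.0], [0.1], [5.0], [5.1]])
y = np.array([0, 0, 1, 1])
print(greedy_prototypes(X, y, eps=0.5))  # -> [0, 2]
```

On these two well-separated clusters, one prototype per class suffices, which is exactly the interpretable summary one hopes for.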

Examples on simulated and real data

Mixture of Gaussians simulation
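A sketch of how such simulated data can be generated (the blob centers, counts, and scale below are assumptions for illustration, not the paper's exact simulation settings):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_class(centers, n_per_center, scale=0.5):
    """Draw n_per_center 2-D Gaussian points around each center."""
    return np.vstack([c + scale * rng.standard_normal((n_per_center, 2))
                      for c in centers])

X0 = sample_class([(0, 0), (3, 3)], 50)   # class 0: two blobs
X1 = sample_class([(0, 3), (3, 0)], 50)   # class 1: two other blobs
X = np.vstack([X0, X1])
y = np.array([0] * len(X0) + [1] * len(X1))
print(X.shape, y.shape)  # (200, 2) (200,)
```

Because each class is itself a mixture, a single "mean" per class would be misleading, while a few prototypes per blob summarize the classes faithfully.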


ZIP code digits data


References

  • Bien, J. and Tibshirani, R. Prototype selection for interpretable classification. The Annals of Applied Statistics, 2011: 2403–2424.