General
Large Margin Nearest Neighbor Classifiction is a NIPS05 paper in which we show how to learn a Mahanalobis distance metric for k-nearest neighbor (kNN) classification by semidefinite programming. The metric is trained with the goal that the k-nearest neighbors always belong to the same class while examples from different classes are separated by a large margin. On seven data sets of varying size and difficulty, we find that metrics trained in this way lead to significant improvements in kNN classification---for example, achieving a test error rate of 1.3% on the MNIST handwritten digits. Our approach has many parallels to support vector machines, including a convex objective function based on the hinge loss, but does not require modifications for problems with large numbers of classes.
MATLAB CODE
NEW VERSION!!!! [ICML 2008] Now supports multiple metrics, validation data sets, metric trees, dimensionality reduction and more .... mLMNN.zip
\
(Old version: here)
install.m (to compile the mex files)
or
installWINDOWS.m (in case you are using Windows)
setpaths.m (To add the paths into your working directory)
demo.m (To see a running demo of LMNN on the bal data set)
The code is based on the very simple alternating projection algorithm. Please let me know about any problems you might encounter with the implementation.
Paper + Talk
The original paper is available here.
You can view the corresponding talk as [PDF] [Keynote (zipped)] or [Quicktime Animation].
This material is based upon work supported by the National Science Foundation under Grant No. 0238323.