This tutorial is an introduction to an instance-based learning method called the K-Nearest Neighbor (KNN) algorithm. KNN is a supervised learning method that has been used in many applications in data mining, statistical pattern recognition, image processing, and other fields. Successful applications include recognition of handwriting, satellite images, and EKG patterns. Instead of using sophisticated software or a programming language, I will use only the spreadsheet functions of Microsoft Excel, without any macros. You can download the spreadsheet companion of this tutorial for free.
First, you will learn how KNN is used for classification; then we will extend the same method to smoothing and prediction of time series data.
Topics of this tutorial:
What is the K-Nearest Neighbor (KNN) Algorithm?
How does the K-Nearest Neighbor (KNN) Algorithm work?
Numerical Example (hand computation)
KNN for Smoothing and Prediction
How do we use the spreadsheet for KNN?
Strengths and Weaknesses of the K-Nearest Neighbor Algorithm
Resources for the K-Nearest Neighbor Algorithm
From: http://people.revoledu.com/kardi/tutorial/KNN/index.html
================
Let us start with the K-nearest neighbor algorithm for classification. K-nearest neighbor is a supervised learning algorithm in which a new query instance is classified according to the majority category among its K nearest neighbors. The purpose of this algorithm is to classify a new object based on its attributes and the training samples. The classifier does not fit any model; it relies only on memory, that is, on the stored training samples. Given a query point, we find the K objects (training points) closest to it. The query is then assigned the class that wins a majority vote among these K objects; any ties can be broken at random. In other words, the KNN algorithm uses the classification of the neighborhood as the predicted value for the new query instance.
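Outside the spreadsheet, the procedure just described (find the K closest training points, take a majority vote, break ties at random) can be sketched in a few lines of Python. This is a minimal sketch for illustration, not the tutorial's Excel method; the function name `knn_classify`, the `(features, label)` input layout, and the use of Euclidean distance are assumptions.

```python
import math
import random
from collections import Counter


def knn_classify(training, query, k, rng=random):
    """Hypothetical helper: classify `query` by majority vote among its k nearest training points.

    training: list of (features, label) pairs, where features is a tuple of numbers.
    query: tuple of numbers with the same length as each features tuple.
    Assumes Euclidean distance as the closeness measure.
    """
    # Sort the training samples by Euclidean distance to the query point.
    # Samples at equal distance keep their input order (sorted() is stable).
    by_distance = sorted(training, key=lambda sample: math.dist(sample[0], query))
    # Count the class labels of the k closest samples.
    votes = Counter(label for _, label in by_distance[:k])
    top = max(votes.values())
    # Every label that reached the highest count is a candidate; pick one at random to break ties.
    winners = [label for label, count in votes.items() if count == top]
    return rng.choice(winners)
```

Call it as `knn_classify(samples, query, k)`, where `samples` is a list of `((x1, x2), label)` pairs. When several points lie at the same distance as the K-th neighbor, this sketch simply keeps whichever comes first in the input; other tie rules are possible.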
For example, suppose we have data from a questionnaire survey (asking people's opinions) and from objective testing with two attributes (acid durability and strength), and we want to classify whether a special paper tissue is good or not. Here are four training samples:
X1 = Acid Durability (seconds) | X2 = Strength (kg/square meter) | Classification
7 | 7 | Bad
7 | 4 | Bad
3 | 4 | Good
1 | 4 | Good
Now the factory produces a new paper tissue that passes the laboratory test with X1 = 3 and X2 = 7. Without another expensive survey, can we guess the classification of this new tissue? Fortunately, the k-nearest neighbor (KNN) algorithm can help with this type of prediction problem.
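The question above can be worked through with a short Python sketch that tabulates what a spreadsheet would hold: the squared distance from each training sample to the query, its rank, and whether it falls within the K nearest. Two assumptions are made here: the distance is Euclidean (squared, which gives the same ranking), and K = 3 is chosen purely for illustration, since K is a parameter the user picks.

```python
from collections import Counter

# Training samples from the table: ((X1 acid durability, X2 strength), class).
samples = [((7, 7), "Bad"), ((7, 4), "Bad"), ((3, 4), "Good"), ((1, 4), "Good")]
query = (3, 7)  # the new tissue: X1 = 3, X2 = 7
K = 3  # assumption: chosen for illustration only

# Squared Euclidean distance to the query; skipping the square root keeps the same ranking.
rows = []
for (x1, x2), label in samples:
    sq_dist = (x1 - query[0]) ** 2 + (x2 - query[1]) ** 2
    rows.append((sq_dist, label))

# Rank by distance (1 = closest) and mark the samples whose rank is at most K.
ranked = sorted(rows)
for rank, (sq_dist, label) in enumerate(ranked, start=1):
    included = "yes" if rank <= K else "no"
    print(f"rank {rank}: squared distance {sq_dist}, {label}, within K: {included}")

# Majority vote among the K nearest samples (K = 3 with two classes cannot tie).
votes = Counter(label for _, label in ranked[:K])
prediction = votes.most_common(1)[0][0]
print("Predicted class:", prediction)
```

Under these assumptions the squared distances are 9 (Good), 13 (Good), 16 (Bad), and 25 (Bad); the three nearest samples give two Good votes and one Bad vote, so the sketch prints `Predicted class: Good`. A different K or distance measure may give a different result (for example, K = 4 produces a 2-2 tie).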
------------------------------------------------