# kNN Algorithm for Beginners: Building an Iris Species Prediction Model

If you are a beginner, this is a step-by-step guide to the KNN (k-nearest neighbors) algorithm, in which we use KNN to predict the **species of an iris flower**. KNN is one of the simplest supervised ML algorithms. The image on the left shows an iris plant. The dataset covers 3 iris species, which are distinguished by the measurements of their sepals and petals.
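To make the idea concrete, here is a minimal from-scratch sketch of how k-nearest neighbors predicts a label: measure the distance from a new point to every training point, keep the k closest, and return the most common label among them. The function name `knn_predict` and the toy data are hypothetical and made up for illustration. This is not how scikit-learn implements it internally; the real model in this article uses scikit-learn's KNeighborsClassifier.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, new_point, k=3):
    """Hypothetical helper: predict a label for new_point by majority vote
    among its k nearest training points (Euclidean distance)."""
    # Distance from the new point to every training point, paired with its label
    distances = [
        (math.dist(point, new_point), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep only the k closest training points
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    # Return the most common label among those neighbors
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Made-up toy data: two features per point, two classes "A" and "B"
train_points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (1.1, 0.9), k=3))  # → A
print(knn_predict(train_points, train_labels, (4.9, 5.0), k=3))  # → B
```

There is no real "training" here: the model is just the stored points, and all the work happens at prediction time.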

Here we build a simple model that predicts which species a flower belongs to from its measurements, and I explain the code along the way to help you follow.

## STEP 1: LOADING THE NECESSARY LIBRARIES

- **Pandas**: You have probably heard a lot about it, but what actually is it? **The name pandas is derived from “panel data”**, an econometrics term for multidimensional structured datasets. In practice, pandas lets you view and modify data much like you would in an Excel sheet. It is built around a data structure called the DataFrame (modeled after the R data frame), and it can read data from many formats, such as CSV files and SQL databases. *(A great book for learning pandas is Python for Data Analysis by Wes McKinney.)*
- **NumPy**: used for **scientific computing** in Python. It provides multidimensional arrays and high-level mathematical functions such as linear algebra operations, the Fourier transform and pseudorandom number generators.
- **scikit-learn**: an open source project, so it is free to use and distribute. It contains many machine learning algorithms, along with extensive documentation for learning them. In scikit-learn, **the NumPy array is the fundamental data structure**.
- **Matplotlib**: as the name suggests, this is the primary scientific plotting library in Python. You can use it to make histograms, scatter plots and more.
- **SciPy**: a collection of functions for scientific computing in Python. For our purposes, an important part of SciPy is **scipy.sparse**, which provides sparse matrices (another data structure, apart from the NumPy array).
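A small sketch of the three data structures mentioned above: a NumPy array, the same values as a pandas DataFrame, and a SciPy sparse matrix. The values and column names are made up for illustration.

```python
import numpy as np
import pandas as pd
from scipy import sparse

# A 2-D NumPy array: the basic data structure scikit-learn works with
x = np.array([[1, 2, 3], [4, 5, 6]])
print("NumPy array:\n", x)

# The same values as a pandas DataFrame with named columns, like a spreadsheet
df = pd.DataFrame(x, columns=["a", "b", "c"])
print(df)

# A 4x4 identity matrix stored as a SciPy sparse matrix in CSR format,
# which keeps only the non-zero entries and their positions
eye = np.eye(4)
sparse_matrix = sparse.csr_matrix(eye)
print("SciPy sparse CSR matrix:\n", sparse_matrix)
```

The sparse version only stores the 4 ones on the diagonal, which is what makes it useful for data that is mostly zeros.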

Below is a picture where I load all the libraries and check their versions.
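A minimal sketch of loading the libraries and printing their versions. The author's exact code is not shown, so the aliases `pd`, `np` and `sp` are assumed, following the usual conventions.

```python
import sys

import matplotlib
import numpy as np
import pandas as pd
import scipy as sp
import sklearn

# Print the Python version and the version of each library
print("Python version:", sys.version)
print("pandas version:", pd.__version__)
print("NumPy version:", np.__version__)
print("SciPy version:", sp.__version__)
print("matplotlib version:", matplotlib.__version__)
print("scikit-learn version:", sklearn.__version__)
```

Checking versions up front helps when your output differs from a tutorial, since library defaults can change between releases.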

## STEP 2: MEET THE DATA

What happened here?

**The load_iris() function returns a Bunch object, which is very similar to a dictionary: it contains keys and values.** Because it behaves like a dictionary, I can print its keys to see what it contains.

*What is DESCR? It is a short description of the iris dataset.*
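A sketch of loading the dataset, listing its keys and reading the start of `DESCR`. The variable name `iris_dataset` is our choice; only the first 200 characters of the description are printed to keep the output short.

```python
from sklearn.datasets import load_iris

# load_iris() returns a Bunch object, which behaves like a dictionary
iris_dataset = load_iris()

# List the available keys, such as 'data', 'target', 'target_names' and 'DESCR'
print("Keys of iris_dataset:", list(iris_dataset.keys()))

# DESCR holds a text description of the dataset; print only its beginning
print(iris_dataset["DESCR"][:200])

# A Bunch also allows attribute access: iris_dataset.data is iris_dataset["data"]
print(iris_dataset.data is iris_dataset["data"])
```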

A sneak peek into our data (don’t be confused: 0, 1 and 2 are the labels of the iris species):

- 0 is for setosa
- 1 is for versicolor
- 2 is for virginica
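A sketch of that sneak peek: the shape of the measurement array, the feature names, the first rows, and how each integer label maps to a species name through `target_names`.

```python
import numpy as np
from sklearn.datasets import load_iris

iris_dataset = load_iris()

# The measurements: one row per flower, one column per feature
print("Shape of data:", iris_dataset["data"].shape)
print("Feature names:", iris_dataset["feature_names"])
print("First five rows:\n", iris_dataset["data"][:5])

# The labels are the integers 0, 1 and 2; target_names maps each one to a species
print("Target names:", iris_dataset["target_names"])
for label in np.unique(iris_dataset["target"]):
    print(label, "->", iris_dataset["target_names"][label])
```

The data has 150 rows (flowers) and 4 columns: sepal length, sepal width, petal length and petal width.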

## STEP 3: SPLIT INTO TRAINING AND TEST SETS

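The **train_test_split** function shuffles the dataset using a pseudorandom number generator before splitting it. (Why do we need this? The iris data is sorted by label, so without shuffling, the test set would contain flowers of only one species.)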

The *random_state* parameter is just the seed for the pseudorandom number generator, so you get the same split every time you run the code. The outputs are X_train, X_test, y_train and y_test (these are all NumPy arrays).

**By default, the training set contains 75% of the data and the test set contains 25%.**
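A minimal sketch of the split. The original call is not shown, so `random_state=0` is an assumption; any fixed integer gives a reproducible split. Leaving `test_size` unset uses scikit-learn's default 25% test share.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()

# Shuffle the rows with a fixed seed (random_state=0, assumed) and split them;
# by default 75% go to the training set and 25% to the test set
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0
)

print("X_train shape:", X_train.shape)  # → (112, 4)
print("y_train shape:", y_train.shape)  # → (112,)
print("X_test shape:", X_test.shape)    # → (38, 4)
print("y_test shape:", y_test.shape)    # → (38,)
```

With 150 flowers, the test set gets 38 rows (25% of 150, rounded up) and the training set gets the remaining 112.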

## INTERMEDIATE STEP: LET US TAKE A LOOK AT THE DATA

We use a scatter matrix, which plots each pair of features against each other. Our data looks like this:
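One way to draw such a scatter matrix is pandas' `scatter_matrix`, sketched below under a few assumptions: `random_state=0` for the split, the `viridis` colormap, and saving to a hypothetical file `iris_scatter_matrix.png` instead of showing the plot. Points are colored by their training label so the species can be told apart.

```python
import matplotlib

matplotlib.use("Agg")  # render without a display; drop this line in a Jupyter notebook
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0
)

# Wrap the training data in a DataFrame so the columns carry the feature names
iris_dataframe = pd.DataFrame(X_train, columns=iris_dataset.feature_names)

# Plot every pair of features against each other, coloring points by label;
# the diagonal shows a histogram of each single feature
axes = pd.plotting.scatter_matrix(
    iris_dataframe, c=y_train, figsize=(10, 10), marker="o",
    hist_kwds={"bins": 20}, s=60, alpha=0.8, cmap="viridis",
)
plt.savefig("iris_scatter_matrix.png")
```

If the three colors form mostly separate clusters in these plots, a distance-based method like kNN is likely to work well.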

## STEP 4: BUILDING THE KNN MODEL

There is a lot going on here, so let us go through it step by step:

- **knn** is an object that holds the algorithm used to build the model from the training set and to make predictions on new data points.
- **KNeighborsClassifier**: building a k-NN model consists only of storing the training set. As mentioned earlier, **scikit-learn contains many algorithms**, so we import KNeighborsClassifier from its sklearn.neighbors module.
- **n_neighbors** sets how many nearest neighbors the model looks at when it makes a prediction.
- **knn.fit** stores the training data and returns the knn object itself, which is why we see a string representation of our classifier.
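A sketch of this step. The article does not show the values used, so `n_neighbors=1` and `random_state=0` are assumptions here.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0
)

# Create the classifier; n_neighbors=1 (assumed) means each prediction
# copies the label of the single closest training point
knn = KNeighborsClassifier(n_neighbors=1)

# fit() stores the training set and returns the same knn object,
# which is why a notebook echoes the classifier's string representation
model = knn.fit(X_train, y_train)
print(model)
print(model is knn)
```

Because `fit()` returns the estimator itself, you can also write `knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)` in one line.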

## STEP 5: MAKING PREDICTIONS (PROBABLY THE LAST STEP. PROBABLY :))

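I gave my new data point the measurements 5, 2.9, 1 and 0.2 (sepal length, sepal width, petal length and petal width, in cm). You can use any values you like. The *.shape* attribute (it is not a method) tells us the number of rows and columns, which are 1 and 4 for the data point I just created. The new data must be a two-dimensional array with one row per flower, because that is the format scikit-learn expects.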
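A short sketch of creating that data point with NumPy and checking its shape, compared with a flat array that scikit-learn's `predict()` would reject as a single sample.

```python
import numpy as np

# One flower: sepal length, sepal width, petal length, petal width (in cm)
X_new = np.array([[5, 2.9, 1, 0.2]])
print("X_new.shape:", X_new.shape)  # → (1, 4): 1 row (flower), 4 columns (features)

# Without the outer brackets the array is 1-D, which predict()
# does not accept as a single sample
flat = np.array([5, 2.9, 1, 0.2])
print("flat.shape:", flat.shape)  # → (4,)
```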

To make a prediction, we call the predict() method of the knn object.

As you can see, it predicts **0, which is a label**, and that label tells us the flower **belongs to ‘setosa’**. Our model is made!
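A self-contained sketch of the prediction, again assuming `random_state=0` and `n_neighbors=1` since the article's exact values are not shown. Indexing `target_names` with the predicted label turns 0 back into the species name.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0
)
knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

# The new flower from the text, as a 2-D array with one row
X_new = np.array([[5, 2.9, 1, 0.2]])

# predict() returns an array of integer labels, one per row of X_new
prediction = knn.predict(X_new)
print("Prediction:", prediction)

# Map the label back to the species name
print("Predicted species:", iris_dataset["target_names"][prediction])
```

A petal length of 1 cm and width of 0.2 cm is far from the larger versicolor and virginica petals, so the closest training flower is a setosa.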

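The last step is to check how accurate our model is on the test set. You can do this in 2 ways:

- using the score() method of the knn object
- using np.mean() to compute the fraction of test predictions that match the true labels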
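Both ways from the list above can be sketched like this, again assuming `random_state=0` and `n_neighbors=1` because the article's exact values are not shown. The two numbers should agree, since a classifier's `score()` computes the mean accuracy.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris_dataset = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris_dataset["data"], iris_dataset["target"], random_state=0
)
knn = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)

# Way 1: score() predicts the test set and computes the accuracy in one call
score = knn.score(X_test, y_test)
print("Test set score (knn.score): {:.2f}".format(score))

# Way 2: compare predictions with the true labels and take the mean;
# True counts as 1 and False as 0, so the mean is the fraction correct
y_pred = knn.predict(X_test)
accuracy_mean = np.mean(y_pred == y_test)
print("Test set score (np.mean): {:.2f}".format(accuracy_mean))
```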

**As you can see, our model outputs 0.97, which means it predicts the correct species for 97% of the flowers in the test set. So it works pretty well.**

I hope this article was useful for you. If you have any questions, I am happy to help.

Keep Learning.