
Validation before Testing | Machine Learning



In this article, we are going to learn about validation before testing in machine learning.
Submitted by Raunak Goswami, on August 03, 2018

In my previous article, we discussed the need to train and test our model, and we wrote code to split the given data into training and test sets.

Before moving to the validation portion, let us see why a validation step is needed before testing on a given data set. When we are dealing with a huge amount of data, there is a chance that the data the model saw during learning produces a biased result. If we then rely on the test set alone to check the accuracy of our model, the following 2 cases can arise:

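  1. Underfitting: the model is too simple and does poorly even on the data it was trained on
  2. Overfitting: the model fits the training data too closely and does poorly on new data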
Over and Under fitting of the test data

Image source: https://docs.aws.amazon.com/machine-learning/latest/dg/images/mlconcepts_image5.png
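The two cases can be illustrated with a small experiment. The sketch below is not part of the original article: it assumes synthetic noisy data (a sine curve) and fits polynomials of degree 1, 4 and 9 with NumPy, then compares the error on the points used for fitting with the error on held-out points. A very low degree tends to give a high error on both sets (underfitting), while a very high degree tends to give a low fitting error but a worse held-out error (overfitting). The exact numbers depend on the random data.

```python
import numpy as np

# Synthetic data (an assumption for illustration): a noisy sine curve
rng = np.random.RandomState(0)
x = rng.uniform(-1, 1, size=40)
y = np.sin(3 * x) + rng.normal(0, 0.2, size=40)

# Shuffle the indices and hold out 10 of the 40 points
indices = rng.permutation(40)
held, fit = indices[:10], indices[10:]
x_fit, y_fit = x[fit], y[fit]
x_held, y_held = x[held], y[held]

def mse(coeffs, xs, ys):
    """Mean squared error of a polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, xs) - ys) ** 2))

errors = {}
for degree in (1, 4, 9):
    coeffs = np.polyfit(x_fit, y_fit, degree)
    errors[degree] = (mse(coeffs, x_fit, y_fit), mse(coeffs, x_held, y_held))
    print(f"degree {degree}: fit error {errors[degree][0]:.4f}, "
          f"held-out error {errors[degree][1]:.4f}")
```

The error on the fitting points can only go down as the degree increases, which is why it cannot be trusted on its own as a measure of model quality.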

So how do we deal with such a problem? The answer is pretty simple: we use a third data set, the validation set, to check the results obtained from the training set. We can then adjust hyperparameters such as the learning rate and batch size until we get a balanced result on the validation set, which in turn helps the model estimate the target values of the test set more accurately.
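As a rough sketch of that idea (not the author's code), the example below tunes a single hyperparameter on a validation set and touches the test set only once at the end. Everything here is an assumption made for illustration: the data is synthetic, the model is scikit-learn's `Ridge` regression, and its `alpha` parameter stands in for hyperparameters such as the learning rate mentioned above.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption): a noisy straight line
rng = np.random.RandomState(0)
x = rng.uniform(0, 10, size=(200, 1))
y = 3.0 * x[:, 0] + 5.0 + rng.normal(0, 1.0, size=200)

# First split off the test set, then carve a validation set out of the rest
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=0.25, random_state=0)
x_training, x_validate, y_training, y_validate = train_test_split(
    x_train, y_train, test_size=0.25, random_state=0)

# Try each candidate value and score it on the validation set only
candidate_alphas = [0.01, 1.0, 100.0, 10000.0]
validation_errors = {}
for alpha in candidate_alphas:
    model = Ridge(alpha=alpha).fit(x_training, y_training)
    validation_errors[alpha] = mean_squared_error(
        y_validate, model.predict(x_validate))

# Keep the value with the lowest validation error
best_alpha = min(validation_errors, key=validation_errors.get)

# The test set is used once, to report the final error
final_model = Ridge(alpha=best_alpha).fit(x_training, y_training)
test_error = mean_squared_error(y_test, final_model.predict(x_test))
print("best alpha:", best_alpha)
print("test error:", round(test_error, 4))
```

Because the test set plays no part in choosing `alpha`, the final test error remains an honest estimate of how the model behaves on unseen data.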

Over and Under fitting of the test data

Image source: https://rpubs.com/charlydethibault/348566

Here, you can see that the validation set is nothing but a subset of the training data set that we create. Do remember that when we create a partition from a dataset, the data is shuffled randomly first so that the order of the records does not bias the results.

So, let us write some simple code to create a validation data set in Python:
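The effect of shuffling can be seen on a toy array. The sketch below is an illustration under assumed data (the numbers 0 to 19), using scikit-learn's `train_test_split`, which shuffles by default. Passing `shuffle=False` keeps the original order, so the test set is simply the last quarter of the records.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (an assumption): 20 records in sorted order
records = np.arange(20)

# Default behaviour: records are shuffled before the split
train_shuffled, test_shuffled = train_test_split(
    records, test_size=0.25, random_state=0)

# shuffle=False: the split follows the original order of the records
train_ordered, test_ordered = train_test_split(
    records, test_size=0.25, shuffle=False)

print("shuffled test set:", test_shuffled)
print("ordered test set: ", test_ordered)
```

If the file happened to be sorted, for example by age or by gender, the unshuffled test set would contain only one kind of record, which is exactly the bias that shuffling removes.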

File: headbrain.CSV

Here is the code:

# -*- coding: utf-8 -*-
"""
Created on Wed Aug  1 22:18:11 2018

@author: Raunak Goswami
"""

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

#reading the data
"""here the directory of my code and the headbrain.csv 
file is the same; make sure both files are stored in the same folder
or directory""" 
data=pd.read_csv('headbrain.csv')

#this will show the first five records of the whole data
data.head()

#this will create a variable x which has the feature values i.e head size
x=data.iloc[:,2:3].values 
#this will create a variable y which has the target value i.e brain weight
y=data.iloc[:,3:4].values 


#splitting the data into training and test
"""
the following statement will split x and y into 2 parts:
1.training variables named x_train and y_train
2.test variables named x_test and y_test
The test set gets 1/4 of the total data, as we have set test_size
to 1/4, so the test:train ratio is 1:3
"""
#train_test_split lives in sklearn.model_selection;
#the old sklearn.cross_validation module was removed in scikit-learn 0.20
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=1/4,random_state=0)

#Here we split the training data further 
#into training and validating sets.
#observe that the size of the validating set is 
#1/4 of the training set and not of the whole dataset
x_training,x_validate,y_training,y_validate=train_test_split(x_train,y_train,test_size=1/4,random_state=0)

After running this Python code in the Spyder IDE provided by the Anaconda distribution, cross-check your variable explorer:

Variable explorer

In the image above, you can see that we have split the train variables into training variables and validate variables.

So, guys, that is it for today. I hope you liked this article. Have a great day ahead.
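If you are not working in Spyder, the same check can be made by printing the size of each part. The sketch below assumes a synthetic array of 240 rows as a stand-in for the real file, so the row counts will differ from those of headbrain.csv, but the two-step split is the same as in the code above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data (an assumption): 240 rows instead of the real CSV file
x = np.arange(240).reshape(-1, 1)
y = 2 * x + 1

# Same two-step split as in the article
x_train, x_test, y_train, y_test = train_test_split(
    x, y, test_size=1/4, random_state=0)
x_training, x_validate, y_training, y_validate = train_test_split(
    x_train, y_train, test_size=1/4, random_state=0)

# Report how many rows ended up in each part
total = len(x)
for name, part in [("training", x_training),
                   ("validate", x_validate),
                   ("test", x_test)]:
    print(f"{name}: {len(part)} rows ({len(part) / total:.2%} of all data)")

# training: 135 rows (56.25% of all data)
# validate: 45 rows (18.75% of all data)
# test: 60 rows (25.00% of all data)
```

Note that the validation set is 1/4 of the training portion, which works out to 3/16 of the whole data set, not 1/4 of it.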








© https://www.includehelp.com (2015-2018), Some rights reserved.