
Attribute Selection | Machine Learning

In this article, we will look at how to select the set of attributes that will form the basis of our machine learning model.
Submitted by Raunak Goswami, on September 02, 2018

Supervised machine learning models are built on two types of values:

  1. Feature columns
  2. Target values contained in a separate column

Let us understand what these two categories of values mean.

Feature columns:

Feature columns hold the independent values, in other words, the values that help in predicting the target values. While making a prediction, the model uses the set of feature values present in the feature columns to predict the required target value. There can be multiple feature columns for a single target value. However, not every column other than the target column has to be used as a feature column.

Target values:

These are the dependent values, in other words, the values that are predicted by our machine learning model. While training our model, the data set we use contains the feature columns along with the target values. When it comes to testing our model, the data set will generally not contain the target values, although at times the test data sets do include them so that the predictions can be cross-checked.
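The split between feature columns and target values can be sketched in plain Java. This is only an illustration under assumptions: the class name FeatureTargetSplit, its two helper methods and the numbers in main are made up for this example and do not come from the headbrain.arff file.

```java
import java.util.Arrays;

class FeatureTargetSplit {

    // Returns every column except the target column, row by row.
    static double[][] features(double[][] rows, int targetIndex) {
        double[][] out = new double[rows.length][];
        for (int r = 0; r < rows.length; r++) {
            double[] row = rows[r];
            double[] featureRow = new double[row.length - 1];
            int k = 0;
            for (int c = 0; c < row.length; c++) {
                if (c != targetIndex) {
                    featureRow[k++] = row[c];
                }
            }
            out[r] = featureRow;
        }
        return out;
    }

    // Returns only the target column, one value per row.
    static double[] target(double[][] rows, int targetIndex) {
        double[] out = new double[rows.length];
        for (int r = 0; r < rows.length; r++) {
            out[r] = rows[r][targetIndex];
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up rows in the form {head size, brain weight}; the last column is the target.
        double[][] rows = { {4000, 1400}, {3500, 1250}, {3800, 1330} };
        System.out.println("Features: " + Arrays.deepToString(features(rows, 1)));
        System.out.println("Target:   " + Arrays.toString(target(rows, 1)));
    }
}
```

During training the model would see both parts, while at prediction time it would be given only the feature part.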

Let us understand this with the help of an example:

Consider this data set: headbrain.arff

Sample data set 1

Suppose we want to predict the brain weight of a person, but on analyzing the data we find that there is no relation between the age or gender of a person and the brain weight. In this case, age and gender cannot be used as feature columns. So the question arises: how do we remove these attributes?

One solution is to remove these attributes and create a new .arff file that contains only the target column, i.e. the brain weight, and one attribute column, i.e. the head size.
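Before dropping a column, it can help to check how strongly it is related to the target. The article does not say how the data was analyzed, so the sketch below is just one common option, assumed here for illustration: the Pearson correlation coefficient, which only measures linear relation and needs numeric columns. The class name CorrelationCheck and all the numbers in main are made up and are not taken from headbrain.arff.

```java
class CorrelationCheck {

    // Pearson correlation between two numeric columns of equal length.
    // The result lies between -1 and 1; values near 0 suggest no linear relation.
    // If one column is constant, the result is NaN because its variance is zero.
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("Columns must have the same length (at least 2)");
        }
        int n = x.length;
        double meanX = 0;
        double meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double cov = 0;
        double varX = 0;
        double varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // Made-up values, used only to show how the check is called.
        double[] headSize    = {3400, 3600, 3800, 4000, 4200};
        double[] ageRange    = {1, 2, 1, 2, 1};
        double[] brainWeight = {1200, 1260, 1310, 1380, 1420};

        System.out.println("head size vs brain weight: " + pearson(headSize, brainWeight));
        System.out.println("age range vs brain weight: " + pearson(ageRange, brainWeight));
    }
}
```

A column whose coefficient stays close to zero would be a candidate for removal, although a low linear correlation alone does not prove that the column is useless.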

To remove the non-essential attributes, we will use the Remove filter, i.e. the class weka.filters.unsupervised.attribute.Remove.

Let us now write the Java code for this in the Eclipse IDE:


import weka.core.Instances;
import weka.core.converters.ArffSaver;
import java.io.File;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class attrib {
	public static void main(String[] args) throws Exception {
		// load the dataset
		DataSource source = new DataSource("headbrain.arff");
		// read the data into an Instances object
		Instances dataset = source.getDataSet();

		// "-R 1" tells the filter to remove the attribute at position 1 (positions start at 1)
		String[] opts = new String[]{ "-R", "1" };
		// create a Remove object to remove the attributes that are not required
		Remove remove = new Remove();
		// set the filter options
		remove.setOptions(opts);
		// tell the filter the structure of the incoming data
		remove.setInputFormat(dataset);
		// apply the filter to remove the first column, which is the gender
		Instances newData = Filter.useFilter(dataset, remove);
		// the age range is now the first column, so applying the same filter again removes it
		remove.setInputFormat(newData);
		Instances newData1 = Filter.useFilter(newData, remove);

		// save the reduced dataset to a new .arff file
		ArffSaver saver = new ArffSaver();
		saver.setInstances(newData1);
		saver.setFile(new File("C:\\Users\\Logan\\Desktop\\ML\\headbraina.arff"));
		saver.writeBatch();
		System.out.println("The final dataset after removing the non-essential attributes is as follows:");
		System.out.println(newData1);
	}
}
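The listing applies the same "-R 1" filter twice, and it may not be obvious why the second pass removes the age range rather than nothing. The reason is that positions are counted from 1 and shift to the left after each removal. The plain Java sketch below imitates this on a list of column names. It does not use Weka, and the class name IndexShiftDemo and the column names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class IndexShiftDemo {

    // Returns a copy of the column list without the column at the given 1-based position.
    static List<String> removeColumn(List<String> columns, int oneBasedIndex) {
        List<String> copy = new ArrayList<>(columns);
        copy.remove(oneBasedIndex - 1); // convert the 1-based position to a 0-based list index
        return copy;
    }

    public static void main(String[] args) {
        // Hypothetical column names in the order described in the article.
        List<String> columns = Arrays.asList("gender", "ageRange", "headSize", "brainWeight");

        List<String> afterFirstPass = removeColumn(columns, 1);
        List<String> afterSecondPass = removeColumn(afterFirstPass, 1);

        System.out.println("Original:    " + columns);
        System.out.println("First pass:  " + afterFirstPass);
        System.out.println("Second pass: " + afterSecondPass);
    }
}
```

After the first pass the age range sits at position 1, which is why removing position 1 again leaves only the head size and the brain weight.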

Output

attribute selection example output

In this way, we can remove the non-essential attributes using Java, which makes our data less complex and the data set easier to analyze. That is all for today. Hope you liked it, and have a great day ahead.







© https://www.includehelp.com some rights reserved.