Attribute Selection (with Java) in Machine Learning

Machine Learning | Attribute Selection: In this tutorial, we will learn how to select the set of attributes that will form the basis of our machine learning model. By Raunak Goswami. Last updated: April 17, 2023

Overview

Supervised machine learning models are built on two types of values:

  1. Feature columns
  2. Target values contained in a separate column

Let us understand what these two categories of values mean.

What are Feature Columns?

Feature columns hold the independent values, in other words, the values that help in predicting the target values. When making a prediction, the model uses the set of values present in the feature columns to predict the required target value. There can be multiple feature columns for a single target value. However, not every column other than the target column has to be used as a feature column.

What are Target Values?

Target values are the dependent values, in other words, the values that are predicted by our machine learning model. The data set used for training contains the feature columns along with the target values. The data set used for testing will generally not contain the target values, although at times test data sets do include them so that the predictions can be cross-checked.
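The split between feature columns and target values can be sketched in plain Java. This is only an illustration: the class and method names are hypothetical, the rows are a few illustrative values, and it assumes the target is stored in the last column of every row.

```java
import java.util.Arrays;

public class FeatureTargetSplit {

    /** Returns every column except the last one (the feature columns). */
    static double[][] features(double[][] rows) {
        double[][] x = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            x[i] = Arrays.copyOfRange(rows[i], 0, rows[i].length - 1);
        }
        return x;
    }

    /** Returns the last column (the target values). */
    static double[] targets(double[][] rows) {
        double[] y = new double[rows.length];
        for (int i = 0; i < rows.length; i++) {
            y[i] = rows[i][rows[i].length - 1];
        }
        return y;
    }

    public static void main(String[] args) {
        // Assumption: the first three columns are features, the last one is the target.
        double[][] rows = {
            {1, 1, 4512, 1530},
            {1, 1, 3738, 1297},
            {2, 2, 3069, 1022}
        };
        System.out.println("Features: " + Arrays.deepToString(features(rows)));
        System.out.println("Targets:  " + Arrays.toString(targets(rows)));
    }
}
```

During training both parts are given to the model; during testing only the features are needed, and the targets, if present, serve to check the predictions.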

Attribute Selection Example

Let us understand this with the help of an example:

Consider this data set, stored in the file "headbrain.arff":

@relation headbrain

@attribute Gender numeric
@attribute 'Age Range' numeric
@attribute 'Head Size(cm^3)' numeric
@attribute 'Brain Weight(grams)' numeric

@data
1,1,4512,1530
1,1,3738,1297
1,1,4261,1335
1,1,3777,1282
1,1,4177,1590
1,1,3585,1300
1,1,3785,1400
1,1,3559,1255
1,1,3613,1355
1,1,3982,1375
1,1,3443,1340
1,1,3993,1380
1,1,3640,1355
1,1,4208,1522
1,1,3832,1208
1,1,3876,1405
1,1,3497,1358
1,1,3466,1292
1,1,3095,1340
1,1,4424,1400
1,1,3878,1357
1,1,4046,1287
1,1,3804,1275
1,1,3710,1270
1,1,4747,1635
1,1,4423,1505
1,1,4036,1490
1,1,4022,1485
1,1,3454,1310
1,1,4175,1420
1,1,3787,1318
1,1,3796,1432
1,1,4103,1364
1,1,4161,1405
1,1,4158,1432
1,1,3814,1207
1,1,3527,1375
1,1,3748,1350
1,1,3334,1236
1,1,3492,1250
1,1,3962,1350
1,1,3505,1320
1,1,4315,1525
1,1,3804,1570
1,1,3863,1340
1,1,4034,1422
1,1,4308,1506
1,1,3165,1215
1,1,3641,1311
1,1,3644,1300
1,1,3891,1224
1,1,3793,1350
1,1,4270,1335
1,1,4063,1390
1,1,4012,1400
1,1,3458,1225
1,1,3890,1310
1,2,4166,1560
1,2,3935,1330
1,2,3669,1222
1,2,3866,1415
1,2,3393,1175
1,2,4442,1330
1,2,4253,1485
1,2,3727,1470
1,2,3329,1135
1,2,3415,1310
1,2,3372,1154
1,2,4430,1510
1,2,4381,1415
1,2,4008,1468
1,2,3858,1390
1,2,4121,1380
1,2,4057,1432
1,2,3824,1240
1,2,3394,1195
1,2,3558,1225
1,2,3362,1188
1,2,3930,1252
1,2,3835,1315
1,2,3830,1245
1,2,3856,1430
1,2,3249,1279
1,2,3577,1245
1,2,3933,1309
1,2,3850,1412
1,2,3309,1120
1,2,3406,1220
1,2,3506,1280
1,2,3907,1440
1,2,4160,1370
1,2,3318,1192
1,2,3662,1230
1,2,3899,1346
1,2,3700,1290
1,2,3779,1165
1,2,3473,1240
1,2,3490,1132
1,2,3654,1242
1,2,3478,1270
1,2,3495,1218
1,2,3834,1430
1,2,3876,1588
1,2,3661,1320
1,2,3618,1290
1,2,3648,1260
1,2,4032,1425
1,2,3399,1226
1,2,3916,1360
1,2,4430,1620
1,2,3695,1310
1,2,3524,1250
1,2,3571,1295
1,2,3594,1290
1,2,3383,1290
1,2,3499,1275
1,2,3589,1250
1,2,3900,1270
1,2,4114,1362
1,2,3937,1300
1,2,3399,1173
1,2,4200,1256
1,2,4488,1440
1,2,3614,1180
1,2,4051,1306
1,2,3782,1350
1,2,3391,1125
1,2,3124,1165
1,2,4053,1312
1,2,3582,1300
1,2,3666,1270
1,2,3532,1335
1,2,4046,1450
1,2,3667,1310
2,1,2857,1027
2,1,3436,1235
2,1,3791,1260
2,1,3302,1165
2,1,3104,1080
2,1,3171,1127
2,1,3572,1270
2,1,3530,1252
2,1,3175,1200
2,1,3438,1290
2,1,3903,1334
2,1,3899,1380
2,1,3401,1140
2,1,3267,1243
2,1,3451,1340
2,1,3090,1168
2,1,3413,1322
2,1,3323,1249
2,1,3680,1321
2,1,3439,1192
2,1,3853,1373
2,1,3156,1170
2,1,3279,1265
2,1,3707,1235
2,1,4006,1302
2,1,3269,1241
2,1,3071,1078
2,1,3779,1520
2,1,3548,1460
2,1,3292,1075
2,1,3497,1280
2,1,3082,1180
2,1,3248,1250
2,1,3358,1190
2,1,3803,1374
2,1,3566,1306
2,1,3145,1202
2,1,3503,1240
2,1,3571,1316
2,1,3724,1280
2,1,3615,1350
2,1,3203,1180
2,1,3609,1210
2,1,3561,1127
2,1,3979,1324
2,1,3533,1210
2,1,3689,1290
2,1,3158,1100
2,1,4005,1280
2,1,3181,1175
2,1,3479,1160
2,1,3642,1205
2,1,3632,1163
2,2,3069,1022
2,2,3394,1243
2,2,3703,1350
2,2,3165,1237
2,2,3354,1204
2,2,3000,1090
2,2,3687,1355
2,2,3556,1250
2,2,2773,1076
2,2,3058,1120
2,2,3344,1220
2,2,3493,1240
2,2,3297,1220
2,2,3360,1095
2,2,3228,1235
2,2,3277,1105
2,2,3851,1405
2,2,3067,1150
2,2,3692,1305
2,2,3402,1220
2,2,3995,1296
2,2,3318,1175
2,2,2720,955
2,2,2937,1070
2,2,3580,1320
2,2,2939,1060
2,2,2989,1130
2,2,3586,1250
2,2,3156,1225
2,2,3246,1180
2,2,3170,1178
2,2,3268,1142
2,2,3389,1130
2,2,3381,1185
2,2,2864,1012
2,2,3740,1280
2,2,3479,1103
2,2,3647,1408
2,2,3716,1300
2,2,3284,1246
2,2,4204,1380
2,2,3735,1350
2,2,3218,1060
2,2,3685,1350
2,2,3704,1220
2,2,3214,1110
2,2,3394,1215
2,2,3233,1104
2,2,3352,1170
2,2,3391,1120

Sample data set 1

Question

Suppose we want to predict the brain weight of a person, but on analyzing the data we find that age range and gender have no relation to brain weight. In that case, age range and gender cannot be used as feature columns. The question then is: how do we remove these attributes?
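The article does not say how the relation between the columns was analyzed. One common check, shown here only as a sketch and not as the author's method, is the Pearson correlation coefficient between a candidate feature column and the target column. The class and method names are hypothetical, and the sample values are the head size and brain weight of the first five data rows. On real data you would pass the full columns.

```java
public class CorrelationCheck {

    /**
     * Pearson correlation coefficient between two equally long columns.
     * The result is NaN if either column is constant (zero variance).
     */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("columns must have the same length (at least 2)");
        }
        int n = x.length;
        double meanX = 0, meanY = 0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;

        double cov = 0, varX = 0, varY = 0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            cov += dx * dy;
            varX += dx * dx;
            varY += dy * dy;
        }
        return cov / Math.sqrt(varX * varY);
    }

    public static void main(String[] args) {
        // First five rows of the data set: head size and brain weight.
        double[] headSize = {4512, 3738, 4261, 3777, 4177};
        double[] brainWeight = {1530, 1297, 1335, 1282, 1590};
        System.out.println("Correlation: " + pearson(headSize, brainWeight));
    }
}
```

A coefficient close to 0 suggests little linear relation between a column and the target, while values close to 1 or -1 suggest a strong one. It only measures linear relations, so it is a first check rather than a final verdict.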

Solution

One solution is to remove these attributes and create a new .arff file that contains only the target column (brain weight) and one feature column (head size).
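To picture what the reduced file should contain, the following sketch builds the ARFF text for a two-column data set using only the Java standard library. The class and method names are hypothetical. It assumes all attributes are numeric and that no attribute name contains a single quote, since every name is wrapped in single quotes.

```java
public class ReducedArff {

    /** Builds ARFF text for a relation whose attributes are all numeric. */
    static String toArff(String relation, String[] attributeNames, int[][] rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String name : attributeNames) {
            // Quoting is required for names that contain spaces.
            sb.append("@attribute '").append(name).append("' numeric\n");
        }
        sb.append("\n@data\n");
        for (int[] row : rows) {
            for (int c = 0; c < row.length; c++) {
                if (c > 0) {
                    sb.append(',');
                }
                sb.append(row[c]);
            }
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String[] names = {"Head Size(cm^3)", "Brain Weight(grams)"};
        int[][] rows = {{4512, 1530}, {3738, 1297}};
        System.out.print(toArff("headbrain", names, rows));
    }
}
```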

To remove the non-essential attributes, we will use the Remove filter, i.e. the class weka.filters.unsupervised.attribute.Remove. Its -R option takes the 1-based indices of the attributes to remove.
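What removing attributes by index means for the data can be sketched in plain Java, without Weka. This is not how the Remove filter is implemented. The class and method names are hypothetical, and the sketch only mirrors the idea of dropping columns by their 1-based position.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class ColumnRemover {

    /** Returns a copy of the rows without the columns at the given 1-based positions. */
    static double[][] removeColumns(double[][] rows, int... oneBasedIndices) {
        Set<Integer> drop = new HashSet<>();
        for (int idx : oneBasedIndices) {
            drop.add(idx - 1); // convert to 0-based array positions
        }
        double[][] result = new double[rows.length][];
        for (int r = 0; r < rows.length; r++) {
            double[] kept = new double[rows[r].length];
            int k = 0;
            for (int c = 0; c < rows[r].length; c++) {
                if (!drop.contains(c)) {
                    kept[k++] = rows[r][c];
                }
            }
            result[r] = Arrays.copyOf(kept, k);
        }
        return result;
    }

    public static void main(String[] args) {
        double[][] rows = {
            {1, 1, 4512, 1530},
            {2, 2, 3069, 1022}
        };
        // Drop column 1 (gender) and column 2 (age range).
        System.out.println(Arrays.deepToString(removeColumns(rows, 1, 2)));
    }
}
```

Removing columns 1 and 2 in one step gives the same rows as removing column 1 twice in a row, because after the first removal the old second column becomes the new first column.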

Let us now write the Java code that removes these attributes with Weka, using the Eclipse IDE:

Java Code

import weka.core.Instances;
import weka.core.converters.ArffSaver;
import java.io.File;
import weka.core.converters.ConverterUtils.DataSource;
import weka.filters.Filter;
import weka.filters.unsupervised.attribute.Remove;

public class attrib {
  public static void main(String[] args) throws Exception {
    // load the dataset
    DataSource source = new DataSource("headbrain.arff");
    // read the data into an Instances object
    Instances dataset = source.getDataSet();

    // "-R 1-2" selects attributes 1 and 2 (gender and age range); indices are 1-based
    String[] opts = new String[] {
      "-R",
      "1-2"
    };
    // create a Remove filter to drop the attributes that are not required
    Remove remove = new Remove();
    // set the filter options (this must be done before setInputFormat)
    remove.setOptions(opts);
    // tell the filter the structure of the incoming data
    remove.setInputFormat(dataset);
    // apply the filter; only head size and brain weight remain
    Instances newData = Filter.useFilter(dataset, remove);

    // save the reduced dataset to a new ARFF file
    ArffSaver saver = new ArffSaver();
    saver.setInstances(newData);
    saver.setFile(new File("headbraina.arff"));
    saver.writeBatch();
    System.out.println("The final dataset after removing non essential attributes is as follows");
    System.out.println(newData);
  }
}

Output

[Image: attribute selection example output]

In this way, we can remove the non-essential attributes using Java, which makes the data set less complex and easier to analyze.

Copyright © 2024 www.includehelp.com. All rights reserved.