Factors in R Language

In this tutorial, we will learn about factors in the R language: the characteristics of a factor, how to create and modify a factor, and how to generate factor levels, with examples.
Submitted by Bhavya Sri Khandrika, on December 06, 2020

A factor is a data structure used for fields that take only a predefined, finite set of values, that is, categorical data. Factors are variables that can take a limited number of distinct values. In R, factors are data objects used to categorize data and store it as levels.

Factors can store both integers and strings. They are useful for columns that contain a limited number of unique values, and R sorts the levels in alphabetical order by default. Examples are "Man" and "Woman", or TRUE and FALSE. Factors are mainly used in data analysis and statistical modeling.
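
As a minimal sketch of the description above, the following shows how R keeps a factor: the levels are the sorted unique values, and each element is stored as an integer index into those levels. The sample vector gender is made up for illustration.

```r
# Hypothetical sample data, used only for illustration
gender <- c("Woman", "Man", "Woman", "Man", "Man")
gender_factor <- factor(gender)

# The levels are the sorted unique values of the input
print(levels(gender_factor))

# Internally, each element is an integer index into the levels
print(as.integer(gender_factor))
```

Here "Man" sorts before "Woman", so "Man" is level 1 and "Woman" is level 2.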

Characteristics of a factor in R language

A factor is described by the following arguments of the factor() function:

  • x: The input vector that is to be converted into a factor.
  • labels: An optional character vector of labels for the levels.
  • levels: An optional vector of the unique values that x can take, which become the levels of the factor.
  • ordered: A logical flag that determines whether the levels are treated as ordered.
  • exclude: A vector of values to be excluded when the levels are formed.
  • nmax: An upper bound on the number of levels.
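
The arguments listed above can be sketched as follows. The vector sizes and its labels are hypothetical and serve only to show how levels, labels, ordered, and exclude affect the result.

```r
# Hypothetical sample data, used only for illustration
sizes <- c("M", "S", "L", "M", "XL", "S")

# levels fixes the order, labels renames the levels, ordered makes them comparable
size_factor <- factor(
  sizes,
  levels  = c("S", "M", "L", "XL"),
  labels  = c("Small", "Medium", "Large", "Extra large"),
  ordered = TRUE
)
print(size_factor)

# Ordered factors support comparisons against a level
print(size_factor < "Large")

# exclude removes a value from the levels; matching elements become NA
no_xl <- factor(sizes, exclude = "XL")
print(levels(no_xl))
print(no_xl)
```

Without the levels argument, the levels would be sorted alphabetically ("L", "M", "S", "XL"), which is rarely the order wanted for sizes.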

Example:

# Create an input vector.
data <- c("E","W","E","N","N","E","W","W","W","E","N")
print(data)

# Check whether the input is a factor.
print(is.factor(data))

# Apply the factor() function to convert the vector into a factor.
factor_data <- factor(data)
print(factor_data)

# Check whether the result is a factor.
print(is.factor(factor_data))

Output:

 [1] "E" "W" "E" "N" "N" "E" "W" "W" "W" "E" "N"
[1] FALSE
 [1] E W E N N E W W W E N
Levels: E N W
[1] TRUE

Creation of a factor in R Language

Creating a factor in R is fairly simple. The process has two steps:

  1. Create a vector to use as the input.
  2. Convert the vector into a factor.

R provides the factor() function to convert a vector into a factor.

Syntax:

factor_data <- factor(vector)

Let us look at an example to understand how the factor() function is used:

Example 1:

# Create a vector to use as the input.
data <- c("hot", "cold", "wet", "cold", "hot", "dry", "cold", "hot", "dry", "wet", "dry")

print(data)

print(is.factor(data))

# Apply the factor() function.
factor_data <- factor(data)

print(factor_data)

print(is.factor(factor_data))

Output:

 [1] "hot"  "cold" "wet"  "cold" "hot"  "dry"  "cold" "hot"  "dry"  "wet" 
[11] "dry" 
[1] FALSE
 [1] hot  cold wet  cold hot  dry  cold hot  dry  wet  dry 
Levels: cold dry hot wet
[1] TRUE

Example 2:

# Create a vector to use as the input.
data <- c("bat", "hat", "bat", "cat", "cat", "bat", "hat", "hat", "hat", "bat", "hat")

print(data)

print(is.factor(data))

# Apply the factor() function.
factordata <- factor(data)

print(factordata)

print(is.factor(factordata))

Output:

 [1] "bat" "hat" "bat" "cat" "cat" "bat" "hat" "hat" "hat" "bat" "hat"
[1] FALSE
 [1] bat hat bat cat cat bat hat hat hat bat hat
Levels: bat cat hat
[1] TRUE
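
Once a factor has been created, its categories can be inspected. The sketch below reuses the vector from Example 2 and calls the base R functions levels(), nlevels(), and table(). It is an illustrative addition, not part of the original examples.

```r
data <- c("bat", "hat", "bat", "cat", "cat", "bat", "hat", "hat", "hat", "bat", "hat")
factordata <- factor(data)

# The distinct categories, in alphabetical order
print(levels(factordata))

# The number of categories
print(nlevels(factordata))

# How often each category occurs
print(table(factordata))
```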

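Factors in a Data Frame

When a data frame is created with a column of text data, R can treat that column as categorical data and store it as a factor. Before R 4.0.0 this conversion happened by default. Since R 4.0.0, data.frame() keeps text columns as character vectors unless stringsAsFactors = TRUE is passed.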

Example 1:

# Create three vectors of equal length.
height <- c(134, 154, 164, 133, 163, 145, 125)
weight <- c(49, 48, 67, 54, 68, 51, 41)
gender <- c("female", "female", "male", "male", "female", "male", "female")

# Create the data frame. stringsAsFactors = TRUE converts the text column
# into a factor (since R 4.0.0 the default is FALSE).
input_data <- data.frame(height, weight, gender, stringsAsFactors = TRUE)
print(input_data)

# Check whether the gender column is a factor.
print(is.factor(input_data$gender))

# Print the gender column to see the levels.
print(input_data$gender)

Output:

  height weight gender
1    134     49 female
2    154     48 female
3    164     67   male
4    133     54   male
5    163     68 female
6    145     51   male
7    125     41 female
[1] TRUE
[1] female female male   male   female male   female
Levels: female male

Example 2:

# Create three vectors of equal length.
marks <- c(133, 155, 163, 134, 143, 155, 145)
rollno <- c(47, 46, 65, 58, 68, 53, 49)
student <- c("classA", "classB", "classC", "classA", "classB", "classA", "classC")

# Create the data frame. stringsAsFactors = TRUE converts the text column
# into a factor (since R 4.0.0 the default is FALSE).
inputdata <- data.frame(marks, rollno, student, stringsAsFactors = TRUE)
print(inputdata)

# Check whether the student column is a factor.
print(is.factor(inputdata$student))

# Print the student column to see the levels.
print(inputdata$student)

Output:

  marks rollno student
1   133     47  classA
2   155     46  classB
3   163     65  classC
4   134     58  classA
5   143     68  classB
6   155     53  classA
7   145     49  classC
[1] TRUE
[1] classA classB classC classA classB classA classC
Levels: classA classB classC
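
The following sketch contrasts the ways a text column can end up as a factor. It assumes R 4.0.0 or later, where data.frame() leaves text columns as character by default. The shortened vectors are illustrative only.

```r
# Shortened illustrative data
marks <- c(133, 155, 163, 134)
student <- c("classA", "classB", "classC", "classA")

# Default in R 4.0.0 and later: the text column stays a character vector
df_default <- data.frame(marks, student)
print(is.factor(df_default$student))

# Option 1: convert every text column when the data frame is created
df_factor <- data.frame(marks, student, stringsAsFactors = TRUE)
print(is.factor(df_factor$student))

# Option 2: convert a single column afterwards with factor()
df_default$student <- factor(df_default$student)
print(levels(df_default$student))
```

Converting a single column with factor() is often preferable when only some text columns are categorical.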

Modifying a Factor in R Language

Just as with data frames, R lets us modify a factor. We can change the value of an element simply by assigning to it again. However, we cannot assign a value outside the factor's predefined levels: if the level does not exist, R stores NA and issues a warning. To use a new value, we must first add it as a level and then assign it to the factor.

Let us look at an example to understand how a factor is modified:

Example 1:

# Create a vector to use as the input.
data <- c("ball", "dad", "cat", "dad", "ball")

# Apply the factor() function.
factordata <- factor(data)

# Print all elements of the factor.
print(factordata)

# Change the 4th element to "cat", which is an existing level.
factordata[4] <- "cat"
print(factordata)

# Print the 4th element.
factordata[4]

# A value outside the existing levels cannot be assigned directly,
# so the factor is unchanged at this point.
print(factordata)

# Add "mom" as a new level.
levels(factordata) <- c(levels(factordata), "mom")

# Now the new value can be assigned.
factordata[4] <- "mom"
print(factordata)

Output:

[1] ball dad  cat  dad  ball
Levels: ball cat dad
[1] ball dad  cat  cat  ball
Levels: ball cat dad
[1] cat
Levels: ball cat dad
[1] ball dad  cat  cat  ball
Levels: ball cat dad
[1] ball dad  cat  mom  ball
Levels: ball cat dad mom

Example 2:

# Create a vector to use as the input.
data <- c("hot", "bot", "pot", "bot", "hot")

# Apply the factor() function.
factordata <- factor(data)

# Print all elements of the factor.
print(factordata)

# Change the 4th element to "pot", which is an existing level.
factordata[4] <- "pot"
print(factordata)

# Print the 4th element.
factordata[4]

# A value outside the existing levels cannot be assigned directly,
# so the factor is unchanged at this point.
print(factordata)

# Add "map" as a new level.
levels(factordata) <- c(levels(factordata), "map")

# Now the new value can be assigned.
factordata[4] <- "map"
print(factordata)

Output:

[1] hot bot pot bot hot
Levels: bot hot pot
[1] hot bot pot pot hot
Levels: bot hot pot
[1] pot
Levels: bot hot pot
[1] hot bot pot pot hot
Levels: bot hot pot
[1] hot bot pot map hot
Levels: bot hot pot map
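
The examples above never actually assign a value that is missing from the levels. As a minimal sketch of that case, the code below assigns "map" without adding the level first. R is expected to issue an "invalid factor level" warning and store NA in that position.

```r
factordata <- factor(c("hot", "bot", "pot", "bot", "hot"))

# "map" is not an existing level, so R warns and stores NA instead
factordata[4] <- "map"
print(factordata)

# The 4th element is now missing, and the levels are unchanged
print(is.na(factordata[4]))
print(levels(factordata))
```

This is why the level has to be added with levels() before the new value is assigned.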

Generating Factor Levels in R

We can generate factor levels by using the gl() function in R. This function takes three parameters: n, k, and labels. Here, n and k are integers that specify how many levels we want and how many times each level is repeated.

Syntax:

gl(n, k, labels)

Parameters:

  1. n (int): specifies the number of levels.
  2. k (int): specifies the number of replications.
  3. labels (character): a vector of labels for the resulting factor levels.

Example 1:

gen_factor <- gl(3, 5, labels = c("cat", "net", "pen"))
gen_factor

Output:

 [1] cat cat cat cat cat net net net net net pen pen pen pen pen
Levels: cat net pen
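
gl() also accepts a length argument, which is not covered above. The sketch below shows how it can be combined with k to cycle through the levels instead of repeating each one in a block. The labels are reused from Example 1.

```r
# Three levels, each repeated twice in a row
blocks <- gl(3, 2, labels = c("cat", "net", "pen"))
print(blocks)

# With k = 1 and a longer length, the levels are cycled instead
cycled <- gl(3, 1, length = 6, labels = c("cat", "net", "pen"))
print(cycled)
```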


Copyright © 2024 www.includehelp.com. All rights reserved.