Data Frames in the R Programming Language

This tutorial covers data frames in R: their characteristics, how to create one, how to extract data from it, and how to modify it.
Submitted by Bhavya Sri Khandrika, on December 07, 2020

Data Frames

A data frame is a two-dimensional, table-like structure. Each column holds one variable, and each row holds one observation, that is, one value for every column. Unlike a matrix, whose elements must all share a single type, the columns of a data frame may have different types. Internally, R stores a data frame as a special kind of list in which every component is a vector of the same length. This makes the data frame the standard way to store tabular data in R.
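The list-of-equal-length-vectors description can be checked directly. The following is a minimal sketch; the data frame df and its column names are made up for illustration.

```r
# A small data frame: three columns, each a vector of length 3.
# The names df, id, name and score are hypothetical.
df <- data.frame(
  id    = 1:3,
  name  = c("a", "b", "c"),
  score = c(9.5, 7.25, 8),
  stringsAsFactors = FALSE
)

print(is.list(df))         # a data frame is stored as a list of columns
print(sapply(df, length))  # every column has the same length
print(sapply(df, class))   # the columns may have different types
print(dim(df))             # number of rows, number of columns
```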

Characteristics of Data Frames

Data frames have the following characteristics:

  • The names of the columns should not be empty.
  • The row names must be unique.
  • The data stored in a data frame can be of numeric, factor, or character type.
  • Every column must contain the same number of data items.
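Two of these rules can be seen in action with a short sketch. The variable names below are hypothetical, and tryCatch() is used only so that the script keeps running after each expected error. Note that data.frame() recycles a shorter column when its length divides the longer one evenly, so the sketch deliberately uses lengths 3 and 2.

```r
# Columns of unequal length (3 and 2) are rejected by data.frame().
unequal <- tryCatch(
  data.frame(x = 1:3, y = 1:2),
  error = function(e) "error"
)
print(unequal)

# Duplicate row names are rejected as well.
dup_rows <- tryCatch(
  data.frame(x = 1:2, row.names = c("a", "a")),
  error = function(e) "error"
)
print(dup_rows)
```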

How to create a Data frame?

The data.frame() function is used to create a data frame in R. Each argument supplies one column, and the columns may be of different types, such as integer, numeric, or character. The argument stringsAsFactors = FALSE keeps character columns as character vectors instead of converting them to factors; this is the default from R 4.0.0 onward.

Example 1:

# Data frame for student data
student.data <- data.frame(
    roll_no. = 1011:1017,
    student_name = c("Shiva", "Arpita", "Rishitha", "Gunjan", "Suman", "Ramya", "Divya"),
    percentage = c(75.7, 90.03, 67, 54.98, 87.2, 89.99, 92.04),
    stringsAsFactors = FALSE
)

# Printing the data frame
print(student.data)

Output:

  roll_no. student_name percentage
1     1011        Shiva      75.70
2     1012       Arpita      90.03
3     1013     Rishitha      67.00
4     1014       Gunjan      54.98
5     1015        Suman      87.20
6     1016        Ramya      89.99
7     1017        Divya      92.04

Example 2:

Here, we take the data from a survey of the animals in a zoo. Our task is to create a data frame that records an ID for each animal, its name, its age, and the date it entered the zoo. For this, we combine an integer vector, a character vector, a numeric vector, and a vector of dates.

# Creating the data frame for the survey data
# using the data.frame() function
ani.data <- data.frame(
    animal_id = 1:5,
    animal_name = c("zebra", "elephant", "giraffe", "tiger", "ostrich"),
    age = c(5, 4, 6, 8, 7),
    entered_date = as.Date(c("2007-05-03", "2010-08-02", "2008-11-25", "2014-03-07", "2006-02-16")),
    stringsAsFactors = FALSE
)

# Printing the data frame created above
print(ani.data)

Output:

  animal_id animal_name age entered_date
1         1       zebra   5   2007-05-03
2         2    elephant   4   2010-08-02
3         3     giraffe   6   2008-11-25
4         4       tiger   8   2014-03-07
5         5     ostrich   7   2006-02-16

Extracting the Data From the Data Frame

Once the data is stored in a data frame, the next step is usually to pull out the parts of it that are needed for further manipulation.

There are three ways in which the data can be extracted:

  1. Extracting columns by using the column names.
  2. Extracting rows by using the row names or row numbers.
  3. Extracting particular rows together with particular columns.

The following examples illustrate these kinds of extraction.

Extracting particular columns from the data frame

# Creating the data frame using the data.frame() function
emp.data <- data.frame(
    employee_id = 1:7,
    employee_name = c("Shiva", "Arpita", "Rishitha", "Gunjan", "Suman", "Ramya", "Divya"),
    sal = c(683.6, 817.2, 671.9, 925.6, 783.65, 782.67, 927.54),
    starting_date = as.Date(c("2012-04-06", "2013-08-20", "2014-06-11", "2014-09-25",
        "2015-02-27", "2013-02-19", "2012-05-12")),
    stringsAsFactors = FALSE
)

# Extracting the employee_id and sal columns into a new data frame
final <- data.frame(emp.data$employee_id, emp.data$sal)
print(final)

Output:

  emp.data.employee_id emp.data.sal
1                    1       683.60
2                    2       817.20
3                    3       671.90
4                    4       925.60
5                    5       783.65
6                    6       782.67
7                    7       927.54
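Building a new data frame from emp.data$... expressions changes the column names, as the output above shows. As an alternative sketch, columns can be selected by name with square brackets, which keeps the original names. The shortened data frame emp below is a stand-in for illustration only.

```r
# A shortened, hypothetical version of the employee data.
emp <- data.frame(
  employee_id = 1:3,
  employee_name = c("Shiva", "Arpita", "Rishitha"),
  sal = c(683.6, 817.2, 671.9),
  stringsAsFactors = FALSE
)

# Select columns by name; the original column names are preserved.
by_name <- emp[, c("employee_id", "sal")]
print(names(by_name))

# A single column extracted with $ comes back as a plain vector.
sal_vec <- emp$sal
print(sal_vec)
```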

Extracting particular rows and columns from the data frame

# Creating the data frame using the data.frame() function
emp.data <- data.frame(
    employee_id = 1:7,
    employee_name = c("Shiva", "Arpita", "Rishitha", "Gunjan", "Suman", "Ramya", "Divya"),
    sal = c(683.6, 817.2, 671.9, 925.6, 783.65, 782.67, 927.54),
    starting_date = as.Date(c("2012-04-06", "2013-08-20", "2014-06-11", "2014-09-25",
        "2015-02-27", "2013-02-19", "2012-05-12")),
    stringsAsFactors = FALSE
)

# Extracting the third row
final <- emp.data[3, ]
print(final)

# Extracting rows 2 to 5
final <- emp.data[2:5, ]
print(final)

# Extracting the 2nd and 5th rows of the 3rd and 4th columns
final <- emp.data[c(2, 5), c(3, 4)]
print(final)

Output:

  employee_id employee_name   sal starting_date
3           3      Rishitha 671.9    2014-06-11

  employee_id employee_name    sal starting_date
2           2        Arpita 817.20    2013-08-20
3           3      Rishitha 671.90    2014-06-11
4           4        Gunjan 925.60    2014-09-25
5           5         Suman 783.65    2015-02-27

     sal starting_date
2 817.20    2013-08-20
5 783.65    2015-02-27
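The examples above select rows by number. Rows can also be selected by name when the data frame has meaningful row names. The sketch below assumes hypothetical row labels e1 to e3, which are not part of the original data.

```r
# Hypothetical employee data with explicit row names.
emp <- data.frame(
  employee_name = c("Shiva", "Arpita", "Rishitha"),
  sal = c(683.6, 817.2, 671.9),
  row.names = c("e1", "e2", "e3"),
  stringsAsFactors = FALSE
)

# Extract one whole row by its row name.
by_row <- emp["e2", ]
print(by_row)

# Extract a single cell by row name and column name.
cell <- emp["e3", "sal"]
print(cell)
```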

Modifications on the Data Frames

R allows programmers to modify data frames. As with matrices, the data items in a data frame can be changed by reassigning them. Rows and columns can also be added to, or deleted from, an existing data frame:

  1. Columns can be added to an existing data frame using the cbind() function, which binds a new column vector to the data frame.
  2. Rows can be added to the data frame using the rbind() function.
  3. An existing column can be deleted by assigning NULL to it; existing rows can be deleted by using negative row indices.

Example:

Let's work through the rbind() and cbind() functions with an example:

# Creating the data frame using the data.frame() function
emp.data <- data.frame(
    employee_id = 1:7,
    employee_name = c("Shivam", "Arya", "Rishi", "Arvind", "Arjun", "Ram", "Dheeraj"),
    favcolor = c("pink", "yellow", "green", "blue", "orange", "purple", "red"),
    starting_date = as.Date(c("2003-04-06", "2007-08-20", "2004-06-11", "2012-09-25",
        "2011-02-27", "2014-02-19", "2010-05-12")),
    stringsAsFactors = FALSE
)

# Adding a row to the data frame
# (rbind() returns a new data frame; emp.data itself is not changed)
x <- list(8, "Vardhan", "black", "2013-02-06")
rbind(emp.data, x)

# Adding a column to the data frame
# (cbind() also returns a new data frame)
y <- c("Hyderabad", "Lucknow", "paris", "Dhargha", "Meerut", "Bangalore", "Chennai")
cbind(emp.data, Address = y)

Output:

  employee_id employee_name favcolor starting_date
1           1        Shivam     pink    2003-04-06
2           2          Arya   yellow    2007-08-20
3           3         Rishi    green    2004-06-11
4           4        Arvind     blue    2012-09-25
5           5         Arjun   orange    2011-02-27
6           6           Ram   purple    2014-02-19
7           7       Dheeraj      red    2010-05-12
8           8       Vardhan    black    2013-02-06

  employee_id employee_name favcolor starting_date   Address
1           1        Shivam     pink    2003-04-06 Hyderabad
2           2          Arya   yellow    2007-08-20   Lucknow
3           3         Rishi    green    2004-06-11     paris
4           4        Arvind     blue    2012-09-25   Dhargha
5           5         Arjun   orange    2011-02-27    Meerut
6           6           Ram   purple    2014-02-19 Bangalore
7           7       Dheeraj      red    2010-05-12   Chennai
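The second table above has only seven rows because the result of rbind() was printed but never stored. A minimal sketch of keeping the result follows; the names emp, new_row, emp_more and city are hypothetical.

```r
# Hypothetical two-row data frame.
emp <- data.frame(
  employee_id = 1:2,
  employee_name = c("Shivam", "Arya"),
  stringsAsFactors = FALSE
)

# rbind() returns a new data frame; assign the result to keep the new row.
new_row <- data.frame(employee_id = 3L, employee_name = "Rishi",
                      stringsAsFactors = FALSE)
emp_more <- rbind(emp, new_row)
print(nrow(emp))       # the original is unchanged
print(nrow(emp_more))  # the new data frame has one more row

# A column can also be added by assigning to a new name with $.
emp_more$city <- c("Hyderabad", "Lucknow", "Meerut")
print(emp_more)
```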

Binding two data frames using rbind()

Consider the below code:

# Creating the first data frame
emp.data <- data.frame(
    employee_id = 1:5,
    employee_name = c("Shubham", "Arpita", "Nishka", "Gunjan", "Sumit"),
    sal = c(623.3, 515.2, 611.0, 729.0, 843.25),
    starting_date = as.Date(c("2012-01-01", "2013-09-23", "2014-11-15", "2014-05-11",
        "2015-03-27")),
    stringsAsFactors = FALSE
)
print(emp.data)

# Creating the second data frame with the same columns
emp.newdata <- data.frame(
    employee_id = 1:7,
    employee_name = c("Shiva", "Aryan", "Rishin", "Arvinda", "Arnav", "Ramesh", "Dhana"),
    sal = c(683.6, 817.2, 671.9, 925.6, 783.65, 782.67, 927.54),
    starting_date = as.Date(c("2012-04-06", "2013-08-20", "2014-06-11", "2014-09-25",
        "2015-02-27", "2013-02-19", "2012-05-12")),
    stringsAsFactors = FALSE
)
print(emp.newdata)

# Binding the two data frames by rows
emp.finaldata <- rbind(emp.data, emp.newdata)
print(emp.finaldata)

Output:

  employee_id employee_name    sal starting_date
1           1       Shubham 623.30    2012-01-01
2           2        Arpita 515.20    2013-09-23
3           3        Nishka 611.00    2014-11-15
4           4        Gunjan 729.00    2014-05-11
5           5         Sumit 843.25    2015-03-27

  employee_id employee_name    sal starting_date
1           1         Shiva 683.60    2012-04-06
2           2         Aryan 817.20    2013-08-20
3           3        Rishin 671.90    2014-06-11
4           4       Arvinda 925.60    2014-09-25
5           5         Arnav 783.65    2015-02-27
6           6        Ramesh 782.67    2013-02-19
7           7         Dhana 927.54    2012-05-12

   employee_id employee_name    sal starting_date
1            1       Shubham 623.30    2012-01-01
2            2        Arpita 515.20    2013-09-23
3            3        Nishka 611.00    2014-11-15
4            4        Gunjan 729.00    2014-05-11
5            5         Sumit 843.25    2015-03-27
6            1         Shiva 683.60    2012-04-06
7            2         Aryan 817.20    2013-08-20
8            3        Rishin 671.90    2014-06-11
9            4       Arvinda 925.60    2014-09-25
10           5         Arnav 783.65    2015-02-27
11           6        Ramesh 782.67    2013-02-19
12           7         Dhana 927.54    2012-05-12
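Binding by rows works here because both data frames have the same column names. The sketch below, with made-up data frames a and b, shows what happens when the names differ and one way to align them before binding.

```r
# Two hypothetical data frames whose second column is named differently.
a <- data.frame(id = 1:2, sal = c(10, 20))
b <- data.frame(id = 3:4, salary = c(30, 40))

# rbind() stops with an error when the column names do not match.
res <- tryCatch(rbind(a, b), error = function(e) "error")
print(res)

# Renaming the columns of b to match a makes the bind succeed.
names(b) <- names(a)
combined <- rbind(a, b)
print(combined)
```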

The summary() Function

In certain cases, the programmer needs a statistical summary of the data at hand, along with an indication of the nature of each column.

The summary() function serves this purpose. It takes the data frame as its argument and returns a summary of each column: for numeric columns, the minimum, quartiles, median, mean, and maximum; for character columns, the length, class, and mode.
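A minimal sketch of summary() on a small data frame follows. The data frame emp is a shortened, hypothetical version of the employee data used earlier.

```r
# Hypothetical employee data with one character and one numeric column.
emp <- data.frame(
  employee_name = c("Shiva", "Arpita", "Rishitha", "Gunjan"),
  sal = c(683.6, 817.2, 671.9, 925.6),
  stringsAsFactors = FALSE
)

# Summary of every column of the data frame.
s <- summary(emp)
print(s)

# Summary of a single numeric column.
sal_summary <- summary(emp$sal)
print(sal_summary)
```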

Deleting Rows and Columns From the Data Frame

The earlier example appended rows and columns to an existing data frame. The following example shows how to delete rows and columns.

Take a look at the code below:

# Creating the data frame using the data.frame() function
emp.data <- data.frame(
    employee_id = 1:7,
    employee_name = c("Shivam", "Arya", "Rishi", "Arvind", "Arjun", "Ram", "Dheeraj"),
    favcolor = c("pink", "yellow", "green", "blue", "orange", "purple", "red"),
    starting_date = as.Date(c("2003-04-06", "2007-08-20", "2004-06-11", "2012-09-25",
        "2011-02-27", "2014-02-19", "2010-05-12")),
    stringsAsFactors = FALSE
)
print(emp.data)

# Deleting a row from the data frame:
# a negative index removes the third row
emp.data <- emp.data[-3, ]
print(emp.data)

# Deleting a column from the data frame:
# assigning NULL removes the employee_id column
emp.data$employee_id <- NULL
print(emp.data)

Output:

  employee_id employee_name favcolor starting_date
1           1        Shivam     pink    2003-04-06
2           2          Arya   yellow    2007-08-20
3           3         Rishi    green    2004-06-11
4           4        Arvind     blue    2012-09-25
5           5         Arjun   orange    2011-02-27
6           6           Ram   purple    2014-02-19
7           7       Dheeraj      red    2010-05-12

  employee_id employee_name favcolor starting_date
1           1        Shivam     pink    2003-04-06
2           2          Arya   yellow    2007-08-20
4           4        Arvind     blue    2012-09-25
5           5         Arjun   orange    2011-02-27
6           6           Ram   purple    2014-02-19
7           7       Dheeraj      red    2010-05-12

  employee_name favcolor starting_date
1        Shivam     pink    2003-04-06
2          Arya   yellow    2007-08-20
4        Arvind     blue    2012-09-25
5         Arjun   orange    2011-02-27
6           Ram   purple    2014-02-19
7       Dheeraj      red    2010-05-12
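As the output shows, deleting a row leaves a gap in the row names (1, 2, 4, ...). The sketch below, on a hypothetical data frame emp, removes several rows and columns at once and then renumbers the rows.

```r
# Hypothetical four-row data frame.
emp <- data.frame(
  employee_id = 1:4,
  employee_name = c("Shivam", "Arya", "Rishi", "Arvind"),
  favcolor = c("pink", "yellow", "green", "blue"),
  stringsAsFactors = FALSE
)

# Drop rows 1 and 3 with a negative index vector.
emp_rows <- emp[-c(1, 3), ]
print(rownames(emp_rows))   # the remaining rows keep their old names

# Renumber the rows from 1 after deletion.
rownames(emp_rows) <- NULL
print(emp_rows)

# Drop several columns by name; drop = FALSE keeps the result a data frame
# even when only one column remains.
emp_cols <- emp[, !(names(emp) %in% c("employee_id", "favcolor")), drop = FALSE]
print(emp_cols)
```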

