Five Basic Ways of Importing Data in R

The first task in any data analysis is getting the data into R. R offers several ways to import datasets; let’s explore five simple yet efficient ones.
Suppose our data is saved as a .csv file named “Employee Survey Data”. The file is saved in C -> Users -> … -> Documents.

1. Using read.table() from base R

empdata1 <- read.table("C:/Users/Dell/Documents/Employee Survey Data.csv", header = TRUE, sep = ",")

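header = TRUE (logical) indicates that the first row of the file contains the column names. The default is header = FALSE; if the argument is omitted, R sets it to TRUE only when the first row has one fewer field than the remaining rows.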

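sep = "," specifies that the values are separated by commas. Without this argument, read.table() splits fields on whitespace (the default, sep = ""), so a comma-separated file is not divided into its columns; a line that contains no spaces lands in a single column.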
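To see the effect of these two arguments, here is a minimal sketch that writes a tiny made-up file to a temporary location and reads it both ways. The column names and values are invented for illustration and are not taken from the employee survey file.

```r
# Write a small comma-separated file to a temporary location (made-up data).
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,dept,score",
             "1,Sales,4",
             "2,HR,5"), tmp)

# With header = TRUE and sep = ",": the first row becomes the column names
# and every line is split at the commas.
with_sep <- read.table(tmp, header = TRUE, sep = ",")
str(with_sep)

# With the defaults: fields are split on whitespace, and these lines
# contain none, so each whole line ends up in a single column.
no_sep <- read.table(tmp)
str(no_sep)
```

Comparing the two str() outputs shows why both arguments matter for a .csv file.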

2. Using read.csv() from base R

empdata2 <- read.csv("C:/Users/Dell/Documents/Employee Survey Data.csv")

Here we haven’t supplied header= or sep= because read.csv() is designed specifically for .csv (comma-separated values) files: header = TRUE and sep = "," are its defaults.
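As a quick check of this relationship, the sketch below reads the same small made-up file with both functions and compares the results. Note that read.csv() also changes a few other defaults (quote, fill and comment.char), which make no difference for a simple file like this one.

```r
# A tiny made-up .csv file in a temporary location.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,dept,score",
             "1,Sales,4",
             "2,HR,5"), tmp)

# read.table() with the arguments spelled out ...
via_table <- read.table(tmp, header = TRUE, sep = ",")

# ... and read.csv() relying on its defaults.
via_csv <- read.csv(tmp)

# Compare the two data frames.
all.equal(via_table, via_csv)
```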

3. Using read_csv() from “readr”

Importing large datasets with the standard base R functions can be time consuming. The “readr” package reads tabular data, such as .csv files, faster.

# Install (needed only once) and load the package
install.packages("readr")
library(readr)

empdata3 <- read_csv("C:/Users/Dell/Documents/Employee Survey Data.csv")

By default, read_csv() also assumes that the first row is the header (col_names = TRUE) and that the separator is a comma. The imported object is a tibble: its class includes "tbl_df", "tbl" and "data.frame". read_csv() never converts character variables into factors automatically, whereas base R’s read.csv() did so by default before R 4.0.0. In interactive sessions, readr functions display a progress bar for long-running reads, which helps when the data is very big. readr also recognises dates in standard formats and can parse other formats when you specify them.
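The sketch below illustrates these points on a small made-up file (the columns id, dept and joined are invented for illustration). It passes an explicit column specification through col_types, including a date format for a day/month/year column; treat it as one possible way to control parsing rather than a required step.

```r
library(readr)

# A tiny made-up file with a date column written as day/month/year.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,dept,joined",
             "1,Sales,01/04/2019",
             "2,HR,15/11/2020"), tmp)

# Spell out the column types; col_date() is told how the dates are written.
emp <- read_csv(
  tmp,
  col_types = cols(
    id     = col_integer(),
    dept   = col_character(),
    joined = col_date(format = "%d/%m/%Y")
  )
)

class(emp)          # a tibble, which is also a data.frame
sapply(emp, class)  # dept stays character; joined is a Date
```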

4. Using fread() from “data.table”

fread() is typically faster than the functions discussed above and is one of the most popular ways of importing large datasets.

# Install (needed only once) and load the package
install.packages("data.table")
library(data.table)

empdata4 <- fread("C:/Users/Dell/Documents/Employee Survey Data.csv")

Objects imported via fread() are stored as a "data.table", which also inherits from "data.frame". The data.table format is considered an enhanced version of data.frame, as it can handle more sophisticated data manipulation tasks. Like read_csv(), fread() doesn’t automatically convert characters into factors.

When choosing between read_csv() and fread(), keep in mind that the resulting objects differ: read_csv() returns a tibble and fread() returns a data.table, and these behave differently from each other and from a plain data.frame in some data manipulation operations.
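One such difference shows up in ordinary subsetting. The sketch below uses a small made-up file (invented columns id, dept and score) and compares a data.table from fread() with a plain data.frame from read.csv(); it is meant only as an illustration of the kind of behaviour to watch for.

```r
library(data.table)

# A tiny made-up file in a temporary location.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,dept,score",
             "1,Sales,4",
             "2,HR,5",
             "3,Sales,3"), tmp)

dt <- fread(tmp)     # data.table
df <- read.csv(tmp)  # plain data.frame

class(dt)

# The same-looking subscript gives different kinds of result:
df[, "score"]  # the data.frame drops to a plain vector
dt[, "score"]  # the data.table keeps a one-column data.table

# data.table also evaluates the second subscript as an expression on the
# columns, optionally by group, which a plain data.frame does not do.
dt[, mean(score), by = dept]
```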

The methods above deal with .csv data files. If your data is saved as an Excel file, you need not convert it to .csv before importing. The "readxl" package provides functions to read Excel worksheets in both .xls and .xlsx formats.

5. Using read_excel() from “readxl”

# Install (needed only once) and load the package
install.packages("readxl")
library(readxl)

empdata5 <- read_excel("C:/Users/Dell/Documents/Employee Survey Data.xlsx")

Use the sheet= argument to choose which worksheet to import, either by name or by position. If it is not specified, read_excel() imports the first sheet of the Excel file.
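To try this without an Excel file of your own, the sketch below uses the example workbook that ships with readxl (located with readxl_example()) instead of the employee survey file. It lists the sheet names first and then picks the second sheet once by name and once by position.

```r
library(readxl)

# Path to an example workbook bundled with the readxl package.
path <- readxl_example("datasets.xlsx")

# List the worksheets available in the file.
sheets <- excel_sheets(path)
sheets

# No sheet= argument: the first worksheet is imported.
first_sheet <- read_excel(path)

# Select the second worksheet by its name and by its position.
by_name     <- read_excel(path, sheet = sheets[2])
by_position <- read_excel(path, sheet = 2)

# Both calls import the same worksheet.
identical(by_name, by_position)
```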

(Do check our Analytics Lab - Archives to see how to import multiple Excel files!)