Fast Reading of Data From TXT|CSV Files into R: readr package


There are many solutions for importing txt|csv files into R. In our previous articles, we described some best practices for preparing your data, as well as the R base functions (read.delim() and read.csv()) for importing txt|csv files into R.


In this article, we’ll describe the readr package, developed by Hadley Wickham. The readr package provides a fast and friendly solution for reading a delimited file into R.

Compared to R base functions, readr functions:

  1. are much faster (about 10x),
  2. show a helpful progress bar if loading is going to take a while, and
  3. all work in exactly the same way.



Preliminary tasks

  1. Launch RStudio as described here: Running RStudio and setting up your working directory

  2. Prepare your data as described here: Best practices for preparing your data

Installing and loading readr

# Installing
install.packages("readr")
# Loading
library("readr")

The readr package contains functions for reading i) delimited files, ii) lines and iii) the whole file.

Functions for reading delimited files: txt|csv

The function read_delim() [in the readr package] is a general function for importing a data table into R. Depending on the format of your file, you can also use:


  • read_csv(): to read comma (“,”) separated values
  • read_csv2(): to read semicolon (“;”) separated values
  • read_tsv(): to read tab (“\t”) separated values


The simplified formats of these functions are as follows:

# General function
read_delim(file, delim, col_names = TRUE)
# Read comma (",") separated values
read_csv(file, col_names = TRUE)
# Read semicolon (";") separated values
# (this is common in European countries)
read_csv2(file, col_names = TRUE)
    
# Read tab separated values
read_tsv(file, col_names = TRUE)

  • file: file path, connection or raw vector. Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with “http://”, “https://”, “ftp://”, or “ftps://” will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.
  • delim: the character that separates the values in the data file.
  • col_names: Either TRUE, FALSE or a character vector specifying column names. If TRUE, the first row of the input will be used as the column names.
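The three possible values of col_names can be compared on a small file. The following is a minimal sketch, assuming a hypothetical two-line file without a header row, written to a temporary location purely for illustration:

```r
library(readr)

# Hypothetical demo file without a header row (assumption for illustration)
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,a", "2,b"), tmp)

# col_names = FALSE: the first row is treated as data,
# and the columns are named X1, X2, ...
no_header <- read_csv(tmp, col_names = FALSE)
names(no_header)

# col_names as a character vector: supply your own column names
named <- read_csv(tmp, col_names = c("id", "label"))
names(named)
```

With the default col_names = TRUE, the first row ("1,a") would have been consumed as the header, which is why files without a header need one of the two other forms.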


read_csv() and read_tsv() are special cases of the general function read_delim(). They’re useful for reading the most common types of flat file data: comma separated values and tab separated values, respectively.

The above mentioned functions return an object of class tbl_df, which is a data frame with a nicer printing method, useful when working with large data sets.
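To make the relationship between read_csv() and read_delim() concrete, here is a small sketch. The file and its column names (name, score) are made up for the example:

```r
library(readr)

# Hypothetical demo file (assumption for illustration)
tmp <- tempfile(fileext = ".csv")
writeLines(c("name,score", "ann,10", "bob,7"), tmp)

# read_csv() is read_delim() with the delimiter fixed to ","
via_csv   <- read_csv(tmp)
via_delim <- read_delim(tmp, delim = ",")

# Both calls yield the same columns
identical(via_csv$name, via_delim$name)
identical(via_csv$score, via_delim$score)

# The result is a tbl_df, which is still a regular data frame
inherits(via_csv, "tbl_df")
is.data.frame(via_csv)
```

Because the result is still a data frame, it can be passed to functions that expect one.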

Reading a file

Reading a local file

  • To import a local .txt or .csv file, the syntax would be:
# Read a txt file, named "mtcars.txt"
my_data <- read_tsv("mtcars.txt")
# Read a csv file, named "mtcars.csv"
my_data <- read_csv("mtcars.csv")

The above R code assumes that the file “mtcars.txt” or “mtcars.csv” is in your current working directory. To find out your current working directory, type getwd() in the R console.

  • It’s also possible to choose a file interactively using the function file.choose(), which I recommend if you’re a beginner in R programming:
# Read a txt file
my_data <- read_tsv(file.choose())
# Read a csv file
my_data <- read_csv(file.choose())

If you use the R code above in RStudio, you will be asked to choose a file.

  • If your field separator is for example “|”, it’s possible to use the general function read_delim(), which reads in files with a user supplied delimiter:
# The delimiter is passed through the delim argument
my_data <- read_delim(file.choose(), delim = "|")
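For a non-interactive version of the same idea, the sketch below writes a small pipe-delimited file to a temporary location and reads it back. The file contents and column names (id, city) are hypothetical; note that read_delim() takes the separator through its delim argument:

```r
library(readr)

# Hypothetical pipe-delimited demo file (assumption for illustration)
tmp <- tempfile(fileext = ".txt")
writeLines(c("id|city", "1|Paris", "2|Lyon"), tmp)

# Read the file using "|" as the field separator
my_data <- read_delim(tmp, delim = "|")
my_data
```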

Reading a file from internet

It’s possible to use the functions read_delim(), read_csv() and read_tsv() to import files from the web.

my_data <- read_tsv("http://www.sthda.com/upload/boxplot_format.txt")
head(my_data)
   Nom variable Group
1 IND1       10     A
2 IND2        7     A
3 IND3       20     A
4 IND4       14     A
5 IND5       14     A
6 IND6       12     A

In the case of parsing problems

If there are parsing problems, a warning tells you how many, and you can retrieve the details with the function problems().

my_data <- read_csv(file.choose())
problems(my_data)
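To see problems() in action without choosing a file, the sketch below deliberately creates a parsing failure. The file is hypothetical, and the column x is forced to integer with the col_types argument (described in the next section) so that the value "oops" cannot be parsed:

```r
library(readr)

# Hypothetical demo file with one value that is not an integer
tmp <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,a", "oops,b", "3,c"), tmp)

# Force x to integer; the warning is suppressed here only to keep
# the example output short
my_data <- suppressWarnings(
  read_csv(tmp, col_types = cols(x = col_integer(), y = col_character()))
)

# One row per parsing failure, with its location and the offending value
issues <- problems(my_data)
issues

# The unparseable value is stored as NA
my_data$x
```

The exact columns of the table returned by problems() may differ between readr versions, so treat the printed layout as indicative.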

Specify column types

There are different types of data: numeric, character, logical, …

readr tries to guess the type of data contained in each column automatically. You might see a lot of warnings when readr has guessed a column type incorrectly. To fix these problems, you can use the additional argument col_types to specify the data type of each column.

The following column types are available:


  • col_integer(): to specify integer (alias = “i”)
  • col_double(): to specify double (alias = “d”).
  • col_logical(): to specify logical variable (alias = “l”)
  • col_character(): leaves strings as is. Don’t convert it to a factor (alias = “c”).
  • col_factor(): to specify a factor (or grouping) variable (alias = “f”)
  • col_skip(): to ignore a column (alias = “-” or “_”)
  • col_date() (alias = “D”), col_datetime() (alias = “T”) and col_time() (“t”) to specify dates, date times, and times.


An example is as follows (column x is an integer (“i”) and column treatment is a character (“c”)):

read_csv("my_file.csv", col_types = cols(
  x = "i", # integer column
  treatment = "c" # character column
))
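The aliases can also be combined into a single compact string, with one letter per column in file order. The following sketch, based on a hypothetical three-column file, compares the compact form with the explicit cols() form:

```r
library(readr)

# Hypothetical demo file (assumption for illustration)
tmp <- tempfile(fileext = ".csv")
writeLines(c("x,treatment,notes", "1,A,first", "2,B,second"), tmp)

# Compact form: "i" = integer, "c" = character, "_" = skip the column
compact <- read_csv(tmp, col_types = "ic_")

# Equivalent explicit form using the col_*() functions
explicit <- read_csv(tmp, col_types = cols(
  x = col_integer(),
  treatment = col_character(),
  notes = col_skip()
))

# Check the resulting column types
sapply(compact, class)
```

The compact string is shorter, while the explicit form is easier to read when a file has many columns or when only some columns need to be specified.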

Reading lines from a file

Function: read_lines().

  1. Simplified format:
read_lines(file, skip = 0, n_max = -1L)

  • file: file path
  • skip: Number of lines to skip before reading data
  • n_max: Number of lines to read. If n_max is -1, all lines in the file will be read.


The function read_lines() returns a character vector with one element for each line.

  2. Example of usage
# Demo file
my_file <- system.file("extdata/mtcars.csv", package = "readr")
# Read lines
my_data <- read_lines(my_file)
head(my_data)
[1] "\"mpg\",\"cyl\",\"disp\",\"hp\",\"drat\",\"wt\",\"qsec\",\"vs\",\"am\",\"gear\",\"carb\""
[2] "21,6,160,110,3.9,2.62,16.46,0,1,4,4"                                                     
[3] "21,6,160,110,3.9,2.875,17.02,0,1,4,4"                                                    
[4] "22.8,4,108,93,3.85,2.32,18.61,1,1,4,1"                                                   
[5] "21.4,6,258,110,3.08,3.215,19.44,1,0,3,1"                                                 
[6] "18.7,8,360,175,3.15,3.44,17.02,0,0,3,2"                                                  
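The skip and n_max arguments are handy for peeking at part of a large file. Here is a minimal sketch using a hypothetical temporary file with one header line and four data lines:

```r
library(readr)

# Hypothetical demo file (assumption for illustration)
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,value", "1,10", "2,20", "3,30", "4,40"), tmp)

# Skip the header line, then read only the next two lines
some_lines <- read_lines(tmp, skip = 1, n_max = 2)
some_lines

# Each element is a raw line of text; split it to get the fields
strsplit(some_lines[1], ",")[[1]]
```

Unlike read_csv(), read_lines() does no parsing: each line comes back as a plain string, so any splitting into fields is up to you.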

Read whole file

  1. Simplified format
read_file(file)
  2. Example of usage
# Demo file
my_file <- system.file("extdata/mtcars.csv", package = "readr")
# Read whole file
read_file(my_file)
[1] "\"mpg\",\"cyl\",\"disp\",\"hp\",\"drat\",\"wt\",\"qsec\",\"vs\",\"am\",\"gear\",\"carb\"\n21,6,160,110,3.9,2.62,16.46,0,1,4,4\n21,6,160,110,3.9,2.875,17.02,0,1,4,4\n22.8,4,108,93,3.85,2.32,18.61,1,1,4,1\n21.4,6,258,110,3.08,3.215,19.44,1,0,3,1\n18.7,8,360,175,3.15,3.44,17.02,0,0,3,2\n18.1,6,225,105,2.76,3.46,20.22,1,0,3,1\n14.3,8,360,245,3.21,3.57,15.84,0,0,3,4\n24.4,4,146.7,62,3.69,3.19,20,1,0,4,2\n22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2\n19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4\n17.8,6,167.6,123,3.92,3.44,18.9,1,0,4,4\n16.4,8,275.8,180,3.07,4.07,17.4,0,0,3,3\n17.3,8,275.8,180,3.07,3.73,17.6,0,0,3,3\n15.2,8,275.8,180,3.07,3.78,18,0,0,3,3\n10.4,8,472,205,2.93,5.25,17.98,0,0,3,4\n10.4,8,460,215,3,5.424,17.82,0,0,3,4\n14.7,8,440,230,3.23,5.345,17.42,0,0,3,4\n32.4,4,78.7,66,4.08,2.2,19.47,1,1,4,1\n30.4,4,75.7,52,4.93,1.615,18.52,1,1,4,2\n33.9,4,71.1,65,4.22,1.835,19.9,1,1,4,1\n21.5,4,120.1,97,3.7,2.465,20.01,1,0,3,1\n15.5,8,318,150,2.76,3.52,16.87,0,0,3,2\n15.2,8,304,150,3.15,3.435,17.3,0,0,3,2\n13.3,8,350,245,3.73,3.84,15.41,0,0,3,4\n19.2,8,400,175,3.08,3.845,17.05,0,0,3,2\n27.3,4,79,66,4.08,1.935,18.9,1,1,4,1\n26,4,120.3,91,4.43,2.14,16.7,0,1,5,2\n30.4,4,95.1,113,3.77,1.513,16.9,1,1,5,2\n15.8,8,351,264,4.22,3.17,14.5,0,1,5,4\n19.7,6,145,175,3.62,2.77,15.5,0,1,5,6\n15,8,301,335,3.54,3.57,14.6,0,1,5,8\n21.4,4,121,109,4.11,2.78,18.6,1,1,4,2\n"
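As the output above shows, read_file() returns the whole file as a single string, with the lines separated by “\n”. The sketch below illustrates this on a hypothetical two-line file created with write_file() (also from readr), and shows how the result relates to read_lines():

```r
library(readr)

# Hypothetical demo file (assumption for illustration)
tmp <- tempfile(fileext = ".txt")
write_file("first line\nsecond line\n", tmp)

# The whole file comes back as a character vector of length 1
whole <- read_file(tmp)
length(whole)

# Splitting on the newline character recovers the individual lines
lines <- strsplit(whole, "\n", fixed = TRUE)[[1]]
identical(lines, read_lines(tmp))
```

read_file() is convenient when the text will be processed as one unit, for example with regular expressions that span several lines.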

Summary


  • Import a local .txt file: read_tsv(file.choose())

  • Import a local .csv file: read_csv(file.choose())

  • Import a file from the internet: read_tsv(url) for a tab-separated txt file or read_csv(url) for a csv file


Infos

This analysis has been performed using R (ver. 3.2.3).








