Fast Reading of Data From TXT|CSV Files into R: readr package


There are many solutions for importing txt|csv files into R. In our previous articles, we described some best practices for preparing your data as well as the R base functions (read.delim() and read.csv()) for importing txt|csv files into R.


In this article, we’ll describe the readr package, developed by Hadley Wickham. The readr package provides a fast and friendly solution for reading a delimited file into R.

Compared to R base functions, readr functions:

  1. are much faster (about 10x),
  2. show a helpful progress bar if loading is going to take a while, and
  3. all work in exactly the same way.



Preliminary tasks

  1. Launch RStudio as described here: Running RStudio and setting up your working directory

  2. Prepare your data as described here: Best practices for preparing your data

Installing and loading readr

# Installing
install.packages("readr")
# Loading
library("readr")
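
If the same script is run repeatedly, a small guard avoids reinstalling the package each time. This is a minimal sketch using the base R function requireNamespace(); it is one common pattern, not something readr itself requires.

```r
# Install readr only if it is not already available
if (!requireNamespace("readr", quietly = TRUE)) {
  install.packages("readr")
}
# Load the package
library(readr)
```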

The readr package contains functions for reading i) delimited files, ii) lines and iii) the whole file.

Functions for reading delimited files: txt|csv

The function read_delim() [in readr package] is a general function to import a data table into R. Depending on the format of your file, you can also use:


  • read_csv(): to read comma (“,”) separated values
  • read_csv2(): to read semicolon (“;”) separated values
  • read_tsv(): to read tab (“\t”) separated values


The simplified formats of these functions are as follows:

# General function
read_delim(file, delim, col_names = TRUE)
# Read comma (",") separated values
read_csv(file, col_names = TRUE)
# Read semicolon (";") separated values
# (this is common in European countries)
read_csv2(file, col_names = TRUE)
    
# Read tab separated values
read_tsv(file, col_names = TRUE)

  • file: file path, connection or raw vector. Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with “http://”, “https://”, “ftp://”, or “ftps://” will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.
  • delim: the character that separates the values in the data file.
  • col_names: Either TRUE, FALSE or a character vector specifying column names. If TRUE, the first row of the input will be used as the column names.
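
The effect of col_names can be sketched on a tiny file written to a temporary location. The file contents and the column names (“id”, “score”) are made up for this illustration.

```r
library(readr)

# Hypothetical demo file without a header row
tmp <- tempfile(fileext = ".csv")
writeLines(c("1,10", "2,20", "3,30"), tmp)

# col_names = FALSE: column names are generated automatically (X1, X2, ...)
no_header <- read_csv(tmp, col_names = FALSE)
names(no_header)

# col_names as a character vector: supply your own column names
named <- read_csv(tmp, col_names = c("id", "score"))
names(named)
```

With the default col_names = TRUE, the first line (“1,10”) would have been consumed as the header, which is why a header-less file needs one of the two alternatives above.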


read_csv() and read_tsv() are special cases of the general function read_delim(). They’re useful for reading the most common types of flat file data: comma separated values and tab separated values, respectively.

The above-mentioned functions return an object of class tbl_df, which is a data frame with a nicer printing method, useful when working with large data sets.
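
To check what kind of object comes back, something like the following can be used. The demo file is the mtcars.csv example shipped with the readr package.

```r
library(readr)

# Demo file shipped with the readr package
my_file <- system.file("extdata/mtcars.csv", package = "readr")
my_data <- read_csv(my_file)

# The result has class tbl_df and is also a regular data frame
inherits(my_data, "tbl_df")
is.data.frame(my_data)

# Printing shows only the first rows, together with the column types
print(my_data)
```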

Reading a file

Reading a local file

  • To import a local .txt or .csv file, the syntax would be:
# Read a txt file, named "mtcars.txt"
my_data <- read_tsv("mtcars.txt")
# Read a csv file, named "mtcars.csv"
my_data <- read_csv("mtcars.csv")

The above R code assumes that the file “mtcars.txt” or “mtcars.csv” is in your current working directory. To find out your current working directory, type the function getwd() in the R console.
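
Before reading, it can help to confirm where R is looking and whether the file is really there. This is a minimal sketch with base R functions; the file name “mtcars.csv” is the one assumed in the example above, and the variable name my_path is only illustrative.

```r
# Show the current working directory
getwd()

# Build the full path to the file and check that it exists before reading
my_path <- file.path(getwd(), "mtcars.csv")
if (file.exists(my_path)) {
  my_data <- readr::read_csv(my_path)
} else {
  message("File not found: ", my_path)
}
```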

  • It’s also possible to choose a file interactively using the function file.choose(), which I recommend if you’re a beginner in R programming:
# Read a txt file
my_data <- read_tsv(file.choose())
# Read a csv file
my_data <- read_csv(file.choose())

If you use the R code above in RStudio, you will be asked to choose a file.

  • If your field separator is, for example, “|”, it’s possible to use the general function read_delim(), which reads in files with a user-supplied delimiter (argument delim):
my_data <- read_delim(file.choose(), delim = "|")
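
A non-interactive version of the same idea, as a sketch: the pipe-delimited file below is written to a temporary location, and its contents are made up for the illustration.

```r
library(readr)

# Hypothetical pipe-delimited demo file
tmp <- tempfile(fileext = ".txt")
writeLines(c("name|value", "IND1|10", "IND2|7"), tmp)

# Tell read_delim() which character separates the fields
my_data <- read_delim(tmp, delim = "|")
my_data
```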

Reading a file from the internet

It’s possible to use the functions read_delim(), read_csv() and read_tsv() to import files from the web.

my_data <- read_tsv("http://www.sthda.com/upload/boxplot_format.txt")
head(my_data)
   Nom variable Group
1 IND1       10     A
2 IND2        7     A
3 IND3       20     A
4 IND4       14     A
5 IND5       14     A
6 IND6       12     A

In the case of parsing problems

If there are parsing problems, a warning tells you how many, and you can retrieve the details with the function problems().

my_data <- read_csv(file.choose())
problems(my_data)
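
To see what problems() reports, a parsing failure can be provoked on purpose. In this sketch the demo file is made up, and column x is forced to integer so that the value “oops” cannot be parsed.

```r
library(readr)

# Hypothetical demo file: the second value of x is not an integer
tmp <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,a", "oops,b", "3,c"), tmp)

# Force x to be read as integer so that "oops" fails to parse
# (readr warns about the failure; the warning is silenced here)
my_data <- suppressWarnings(
  read_csv(tmp, col_types = cols(x = col_integer(), y = col_character()))
)

# One row per parsing failure, with the row, column, expected and actual value
problems(my_data)

# The value that could not be parsed is stored as NA
my_data$x
```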

Specify column types

There are different types of data: numeric, character, logical, …

readr tries to guess automatically the type of data contained in each column. You might see a lot of warnings when readr has guessed a column type incorrectly. To fix these problems, you can use the additional argument col_types to specify the data type of each column.

The following column types are available:


  • col_integer(): to specify integer (alias = “i”)
  • col_double(): to specify double (alias = “d”).
  • col_logical(): to specify logical variable (alias = “l”)
  • col_character(): leaves strings as they are, without converting them to factors (alias = “c”).
  • col_factor(): to specify a factor (or grouping) variable (alias = “f”)
  • col_skip(): to ignore a column (alias = “-” or “_”)
  • col_date() (alias = “D”), col_datetime() (alias = “T”) and col_time() (alias = “t”): to specify dates, date times, and times.


An example is as follows (column x is an integer (“i”) and column treatment is a character (“c”)):

read_csv("my_file.csv", col_types = cols(
  x = "i", # integer column
  treatment = "c" # character column
))
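
The aliases can also be combined into one compact string, with one letter per column in column order. The sketch below compares that form with the col_*() functions on a made-up file; the column names day and note are hypothetical.

```r
library(readr)

# Hypothetical demo file
tmp <- tempfile(fileext = ".csv")
writeLines(c("x,treatment,day,note",
             "1,control,2016-01-15,abc",
             "2,drug,2016-02-20,def"), tmp)

# Compact string: one alias per column, in column order
# i = integer, c = character, D = date, - = skip
d1 <- read_csv(tmp, col_types = "icD-")

# Equivalent specification with the col_*() functions
d2 <- read_csv(tmp, col_types = cols(
  x = col_integer(),
  treatment = col_character(),
  day = col_date(format = "%Y-%m-%d"),
  note = col_skip()
))

# Check the resulting column types
sapply(d1, class)
```

The compact string is shorter to type, while the named form keeps working if the column order in the file changes.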

Reading lines from a file

Function: read_lines().

  1. Simplified format:
read_lines(file, skip = 0, n_max = -1L)

  • file: file path
  • skip: number of lines to skip before reading data
  • n_max: number of lines to read. If n_max is -1, all lines in the file will be read.


The function read_lines() returns a character vector with one element for each line.

  2. Example of usage:
# Demo file
my_file <- system.file("extdata/mtcars.csv", package = "readr")
# Read lines
my_data <- read_lines(my_file)
head(my_data)
[1] "\"mpg\",\"cyl\",\"disp\",\"hp\",\"drat\",\"wt\",\"qsec\",\"vs\",\"am\",\"gear\",\"carb\""
[2] "21,6,160,110,3.9,2.62,16.46,0,1,4,4"                                                     
[3] "21,6,160,110,3.9,2.875,17.02,0,1,4,4"                                                    
[4] "22.8,4,108,93,3.85,2.32,18.61,1,1,4,1"                                                   
[5] "21.4,6,258,110,3.08,3.215,19.44,1,0,3,1"                                                 
[6] "18.7,8,360,175,3.15,3.44,17.02,0,0,3,2"                                                  
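
The skip and n_max arguments can be combined to read only part of a file. A short sketch on the same demo file; the variable name first_rows is only illustrative.

```r
library(readr)

# Demo file shipped with the readr package
my_file <- system.file("extdata/mtcars.csv", package = "readr")

# Skip the header line, then read only the next 3 lines
first_rows <- read_lines(my_file, skip = 1, n_max = 3)
first_rows
length(first_rows)
```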

Reading the whole file

  1. Simplified format
read_file(file)
  2. Example of usage:
# Demo file
my_file <- system.file("extdata/mtcars.csv", package = "readr")
# Read whole file
read_file(my_file)
[1] "\"mpg\",\"cyl\",\"disp\",\"hp\",\"drat\",\"wt\",\"qsec\",\"vs\",\"am\",\"gear\",\"carb\"\n21,6,160,110,3.9,2.62,16.46,0,1,4,4\n21,6,160,110,3.9,2.875,17.02,0,1,4,4\n22.8,4,108,93,3.85,2.32,18.61,1,1,4,1\n21.4,6,258,110,3.08,3.215,19.44,1,0,3,1\n18.7,8,360,175,3.15,3.44,17.02,0,0,3,2\n18.1,6,225,105,2.76,3.46,20.22,1,0,3,1\n14.3,8,360,245,3.21,3.57,15.84,0,0,3,4\n24.4,4,146.7,62,3.69,3.19,20,1,0,4,2\n22.8,4,140.8,95,3.92,3.15,22.9,1,0,4,2\n19.2,6,167.6,123,3.92,3.44,18.3,1,0,4,4\n17.8,6,167.6,123,3.92,3.44,18.9,1,0,4,4\n16.4,8,275.8,180,3.07,4.07,17.4,0,0,3,3\n17.3,8,275.8,180,3.07,3.73,17.6,0,0,3,3\n15.2,8,275.8,180,3.07,3.78,18,0,0,3,3\n10.4,8,472,205,2.93,5.25,17.98,0,0,3,4\n10.4,8,460,215,3,5.424,17.82,0,0,3,4\n14.7,8,440,230,3.23,5.345,17.42,0,0,3,4\n32.4,4,78.7,66,4.08,2.2,19.47,1,1,4,1\n30.4,4,75.7,52,4.93,1.615,18.52,1,1,4,2\n33.9,4,71.1,65,4.22,1.835,19.9,1,1,4,1\n21.5,4,120.1,97,3.7,2.465,20.01,1,0,3,1\n15.5,8,318,150,2.76,3.52,16.87,0,0,3,2\n15.2,8,304,150,3.15,3.435,17.3,0,0,3,2\n13.3,8,350,245,3.73,3.84,15.41,0,0,3,4\n19.2,8,400,175,3.08,3.845,17.05,0,0,3,2\n27.3,4,79,66,4.08,1.935,18.9,1,1,4,1\n26,4,120.3,91,4.43,2.14,16.7,0,1,5,2\n30.4,4,95.1,113,3.77,1.513,16.9,1,1,5,2\n15.8,8,351,264,4.22,3.17,14.5,0,1,5,4\n19.7,6,145,175,3.62,2.77,15.5,0,1,5,6\n15,8,301,335,3.54,3.57,14.6,0,1,5,8\n21.4,4,121,109,4.11,2.78,18.6,1,1,4,2\n"
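
The difference between read_file() and read_lines() is easiest to see from the length of what they return. A small sketch on the same demo file; the variable names whole and by_line are only illustrative.

```r
library(readr)

# Demo file shipped with the readr package
my_file <- system.file("extdata/mtcars.csv", package = "readr")

# read_file(): the whole file as a single string
whole <- read_file(my_file)
length(whole)

# read_lines(): one element per line of the file
by_line <- read_lines(my_file)
length(by_line)
```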

Summary


  • Import a local .txt file: read_tsv(file.choose())

  • Import a local .csv file: read_csv(file.choose())

  • Import a file from the internet: read_tsv(url) or read_delim(url, delim) for a txt file, read_csv(url) for a csv file


Infos

This analysis has been performed using R (ver. 3.2.3).

