Fast Reading of Data From TXT|CSV Files into R: readr package
There are many ways to import txt|csv files into R. In our previous articles, we described some best practices for preparing your data, as well as the R base functions (read.delim() and read.csv()) for importing txt|csv files into R.
In this article, we’ll describe the readr package, developed by Hadley Wickham. The readr package provides a fast and friendly way to read delimited files into R.
Compared to R base functions, readr functions:
- are much faster (around 10x),
- display a helpful progress bar if loading is going to take a while, and
- share a consistent interface: all functions work the same way.
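To check the speed difference on your own machine, the sketch below times read.csv() against read_csv() on the same file. The data are randomly generated and written to a temporary file purely for this comparison (they are not data from this article); the timings you get will depend on your hardware and readr version, so no specific numbers are implied.

```r
library(readr)

# Hypothetical benchmark data: 200,000 rows of random numbers
set.seed(42)
n <- 2e5
bench_data <- data.frame(id = seq_len(n), x = runif(n), y = rnorm(n))
path <- tempfile(fileext = ".csv")
write.csv(bench_data, path, row.names = FALSE)

# Time base R and readr on the same file (elapsed seconds)
t_base  <- system.time(base_data  <- read.csv(path))
t_readr <- system.time(readr_data <- read_csv(path))
t_base["elapsed"]
t_readr["elapsed"]

# Both readers recover the same number of rows and columns
dim(base_data)
dim(readr_data)
```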
Launch RStudio as described here: Running RStudio and setting up your working directory
Prepare your data as described here: Best practices for preparing your data
Installing and loading readr
# Installing
install.packages("readr")
# Loading
library("readr")
The readr package contains functions for reading i) delimited files, ii) lines and iii) the whole file.
Functions for reading delimited files: txt|csv
The function read_delim() [in the readr package] is a general function for importing a data table into R. Depending on the format of your file, you can also use:
- read_csv(): to read comma (“,”) separated values
- read_csv2(): to read semicolon (“;”) separated values
- read_tsv(): to read tab (“\t”) separated values
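As a quick illustration of read_csv2(), here is a small hypothetical semicolon-separated file, written to a temporary location. In this format the comma usually serves as the decimal mark, and read_csv2() parses it that way by default:

```r
library(readr)

# Hypothetical semicolon-separated data using a decimal comma
path <- tempfile(fileext = ".csv")
writeLines(c("id;value", "1;3,5", "2;4,25"), path)

# read_csv2() splits on ";" and treats "," as the decimal mark
my_data <- read_csv2(path)
my_data$value
```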
The simplified formats of these functions are as follows:
# General function
read_delim(file, delim, col_names = TRUE)
# Read comma (",") separated values
read_csv(file, col_names = TRUE)
# Read semicolon (";") separated values
# (this is common in European countries)
read_csv2(file, col_names = TRUE)
# Read tab separated values
read_tsv(file, col_names = TRUE)
- file: file path, connection or raw vector. Files ending in .gz, .bz2, .xz, or .zip will be automatically uncompressed. Files starting with “http://”, “https://”, “ftp://”, or “ftps://” will be automatically downloaded. Remote gz files can also be automatically downloaded and decompressed.
- delim: the character that separates the values in the data file.
- col_names: Either TRUE, FALSE or a character vector specifying column names. If TRUE, the first row of the input will be used as the column names.
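The automatic decompression described for the file argument can be tried locally. This sketch assumes a small hypothetical data set, compressed with base R's gzfile() connection; read_csv() is then given only the path ending in .gz:

```r
library(readr)

# Write a small hypothetical csv file through a gzip connection
path <- tempfile(fileext = ".csv.gz")
con <- gzfile(path, "w")
writeLines(c("name,score", "a,1", "b,2"), con)
close(con)

# read_csv() recognises the .gz extension and decompresses the file itself
my_data <- read_csv(path)
my_data
```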
read_csv() and read_tsv() are special cases of the general function read_delim(). They’re useful for reading the most common types of flat file data: comma-separated values and tab-separated values, respectively.
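One way to see that read_csv() is a special case of read_delim() is to read the same file both ways and compare the results. The sketch below writes the built-in mtcars data set to a temporary csv file (the file name is generated by tempfile(), not taken from this article):

```r
library(readr)

# Write the built-in mtcars data set to a temporary csv file
path <- tempfile(fileext = ".csv")
write.csv(mtcars, path, row.names = FALSE)

# read_csv() behaves like read_delim() with delim = ","
via_csv   <- read_csv(path)
via_delim <- read_delim(path, delim = ",")

# Same column names and identical values in every column
identical(names(via_csv), names(via_delim))
all(mapply(identical, via_csv, via_delim))
```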
The functions above return an object of class tbl_df (a tibble), which is a data frame with a nicer printing method, useful when working with large data sets.
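To confirm what kind of object you get back, you can inspect its class. This sketch uses the demo file mtcars.csv that ships with readr; with recent readr versions you may also see an extra “spec_tbl_df” class, which still behaves as a data frame:

```r
library(readr)

# Demo csv file shipped with the readr package
my_file <- system.file("extdata/mtcars.csv", package = "readr")
my_data <- read_csv(my_file)

# The result is a data frame that also carries the "tbl_df" class
class(my_data)
is.data.frame(my_data)

# Printing shows only the first rows, plus the column types
my_data
```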
Reading a file
Reading a local file
- To import a local .txt or .csv file, the syntax would be:
# Read a txt file, named "mtcars.txt"
my_data <- read_tsv("mtcars.txt")
# Read a csv file, named "mtcars.csv"
my_data <- read_csv("mtcars.csv")
The R code above assumes that the file “mtcars.txt” or “mtcars.csv” is in your current working directory. To find out your current working directory, type getwd() in the R console.
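If you don't have mtcars.txt and mtcars.csv at hand, the following sketch creates them from R's built-in mtcars data with readr's write_tsv() and write_csv(), inside a temporary folder that it makes the working directory. The folder is hypothetical and only exists for the demo; the original working directory is restored at the end:

```r
library(readr)

# Create a temporary folder and make it the working directory
demo_dir <- tempfile("readr_demo_")
dir.create(demo_dir)
old_wd <- setwd(demo_dir)

# Demo files built from the built-in mtcars data set
write_tsv(mtcars, "mtcars.txt")
write_csv(mtcars, "mtcars.csv")

getwd()                     # the folder readr will look in
file.exists("mtcars.txt")   # TRUE

# Bare file names are resolved relative to getwd()
my_txt <- read_tsv("mtcars.txt")
my_csv <- read_csv("mtcars.csv")

setwd(old_wd)               # restore the previous working directory
```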
- It’s also possible to choose a file interactively using the function file.choose(), which I recommend if you’re a beginner in R programming:
# Read a txt file
my_data <- read_tsv(file.choose())
# Read a csv file
my_data <- read_csv(file.choose())
If you use the R code above in RStudio, you will be asked to choose a file.
- If your field separator is, for example, “|”, you can use the general function read_delim(), which reads files with a user-supplied delimiter:
my_data <- read_delim(file.choose(), delim = "|")
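Here is a self-contained version of the pipe-separated case, using a small hypothetical file written to a temporary location instead of file.choose():

```r
library(readr)

# Hypothetical pipe-separated data
path <- tempfile(fileext = ".txt")
writeLines(c("name|age|city", "Ana|31|Lyon", "Ben|45|Oslo"), path)

# read_delim() takes the separator through its `delim` argument
my_data <- read_delim(path, delim = "|")
my_data
```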
Reading a file from internet
It’s possible to use the functions read_delim(), read_csv() and read_tsv() to import files from the web.
my_data <- read_tsv("http://www.sthda.com/upload/boxplot_format.txt")
head(my_data)
   Nom variable Group
1 IND1       10     A
2 IND2        7     A
3 IND3       20     A
4 IND4       14     A
5 IND5       14     A
6 IND6       12     A
In the case of parsing problems
If there are parsing problems, a warning tells you how many, and you can retrieve the details with the function problems().
my_data <- read_csv(file.choose())
problems(my_data)
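To see what problems() reports, you can provoke a parsing failure on purpose. In this sketch (hypothetical data), the column x is declared as integer but contains the value “abc”, so readr emits a warning and records the failure:

```r
library(readr)

# Hypothetical data: "abc" cannot be parsed as an integer
path <- tempfile(fileext = ".csv")
writeLines(c("x,y", "1,a", "abc,b", "3,c"), path)

# Forcing x to be an integer makes readr record a parsing problem
my_data <- read_csv(path, col_types = cols(x = col_integer(), y = col_character()))

problems(my_data)  # which row/column failed, what was expected and found
my_data$x          # the unparsable value becomes NA
```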
Specify column types
There are different types of data: numeric, character, logical, …
readr automatically tries to guess the type of data contained in each column. If readr guesses a column type incorrectly, you might see a lot of warnings. To fix these problems, you can use the col_types argument to specify the data type of each column.
The following column types are available:
- col_integer(): to specify integer (alias = “i”)
- col_double(): to specify double (alias = “d”)
- col_logical(): to specify logical variable (alias = “l”)
- col_character(): leaves strings as is; doesn't convert them to factors (alias = “c”)
- col_factor(): to specify a factor (or grouping) variable (alias = “f”)
- col_skip(): to ignore a column (alias = “-” or “_”)
- col_date() (alias = “D”), col_datetime() (alias = “T”) and col_time() (alias = “t”): to specify dates, date-times and times
An example is as follows (column x is an integer (“i”) column and column treatment is a character (“c”) column):
read_csv("my_file.csv", col_types = cols(
  x = "i",         # integer column
  treatment = "c"  # character column
))
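The same idea can be written with the full col_*() functions, or with a compact string containing one alias letter per column. This sketch assumes a hypothetical file with an extra note column that we want to drop with col_skip():

```r
library(readr)

# Hypothetical data with an integer, a character and an unwanted column
path <- tempfile(fileext = ".csv")
writeLines(c("x,treatment,note", "1,ctrl,ok", "2,drug,redo"), path)

# Long form: one col_*() specification per column
d1 <- read_csv(path, col_types = cols(
  x = col_integer(),
  treatment = col_character(),
  note = col_skip()            # drop this column
))

# Compact form: one alias letter per column, in order ("-" skips)
d2 <- read_csv(path, col_types = "ic-")

str(d1)
```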
Reading lines from a file
- Simplified format:
read_lines(file, skip = 0, n_max = -1L)
- file: file path
- skip: Number of lines to skip before reading data
- n_max: Number of lines to read. If n_max is -1, all lines in the file will be read.
The function read_lines() returns a character vector with one element for each line.
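For example, the skip and n_max arguments can be combined to read a slice of a file. This sketch uses the demo file mtcars.csv shipped with readr, skipping the header line and reading only the next three lines:

```r
library(readr)

# Demo file shipped with the readr package
my_file <- system.file("extdata/mtcars.csv", package = "readr")

# Skip the header line, then read only the next 3 lines
first_rows <- read_lines(my_file, skip = 1, n_max = 3)
first_rows
length(first_rows)
```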
- Example of usage
# Demo file
my_file <- system.file("extdata/mtcars.csv", package = "readr")
# Read lines
my_data <- read_lines(my_file)
head(my_data)

[1] "\"mpg\",\"cyl\",\"disp\",\"hp\",\"drat\",\"wt\",\"qsec\",\"vs\",\"am\",\"gear\",\"carb\""
[2] "21,6,160,110,3.9,2.62,16.46,0,1,4,4"
[3] "21,6,160,110,3.9,2.875,17.02,0,1,4,4"
[4] "22.8,4,108,93,3.85,2.32,18.61,1,1,4,1"
[5] "21.4,6,258,110,3.08,3.215,19.44,1,0,3,1"
[6] "18.7,8,360,175,3.15,3.44,17.02,0,0,3,2"
Read whole file
- Simplified format:
read_file(file)
- Example of usage
# Demo file
my_file <- system.file("extdata/mtcars.csv", package = "readr")
# Read whole file
read_file(my_file)
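Because read_file() returns the whole file as a single character string, you can post-process it with ordinary string functions. This sketch splits the string back into lines with base R's strsplit(), assuming the file uses "\n" line endings:

```r
library(readr)

# Demo file shipped with the readr package
my_file <- system.file("extdata/mtcars.csv", package = "readr")

# The entire file comes back as one character string
txt <- read_file(my_file)
length(txt)   # 1

# Split on newlines to recover the individual lines
lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]
head(lines, 2)
```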
Summary
- Import a local .txt file: read_tsv(file.choose())
- Import a local .csv file: read_csv(file.choose())
- Import a file from the internet: read_tsv(url) if it's a tab-separated txt file, or read_csv(url) if it's a csv file
This analysis has been performed using R (ver. 3.2.3).