Easy Guides

Computing and Adding new Variables to a Data Frame in R

Sun, 01 May 2016 10:43:30 +0200

Preliminary tasks
Install and load dplyr package
dplyr::mutate(): Add new variables by preserving existing ones
dplyr::transmute(): Make new variables by dropping existing ones
Use mutate() and transmute() programmatically inside a function:
transform(): R base function to compute and add new variables
Summary
Related articles
Infos

Previously, we described the essentials of R programming and provided quick start guides for importing data into R as well as converting your data into a tibble data format, which is the modern, conventional way to work with your data. We also described crucial steps to reshape your data with R for easier analyses.

Here, you’ll learn how to compute and add new variables to a data frame in R. This can be done easily using the functions mutate() and transmute() in the dplyr R package.

mutate(): Computes and adds new variable(s). Preserves existing variables. It’s similar to the R base function transform().
transmute(): Computes new variable(s). Drops existing variables.

Figure adapted from RStudio data wrangling cheatsheet

Preliminary tasks

Launch RStudio as described here: Running RStudio and setting up your working directory
Prepare your data as described here: Best practices for preparing your data and save it in an external .txt tab or .csv files
Import your data into R as described here: Fast reading of data from txt|csv files into R: readr package.

Here, we’ll use the R built-in iris data set, which we start by converting to a tibble data frame (tbl_df). Tibble is a modern rethinking of data frame providing a nicer printing method. This is useful when working with large data sets.

# Create my_data
my_data <- iris[, -5]

# Convert to a tibble
library("tibble")
my_data <- as_tibble(my_data)

# Print
my_data

Source: local data frame [150 x 4]

   Sepal.Length Sepal.Width Petal.Length Petal.Width
          (dbl)       (dbl)        (dbl)       (dbl)
1           5.1         3.5          1.4         0.2
2           4.9         3.0          1.4         0.2
3           4.7         3.2          1.3         0.2
4           4.6         3.1          1.5         0.2
5           5.0         3.6          1.4         0.2
6           5.4         3.9          1.7         0.4
7           4.6         3.4          1.4         0.3
8           5.0         3.4          1.5         0.2
9           4.4         2.9          1.4         0.2
10          4.9         3.1          1.5         0.1
..          ...         ...          ...         ...

Install and load dplyr package

Install dplyr:

install.packages("dplyr")

Load dplyr:

library("dplyr")

dplyr::mutate(): Add new variables by preserving existing ones

Add a new column (sepal_by_petal_l) while preserving the existing ones:

mutate(my_data,
       sepal_by_petal_l = Sepal.Length/Petal.Length
       )

Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width sepal_by_petal_l
          (dbl)       (dbl)        (dbl)       (dbl)            (dbl)
1           5.1         3.5          1.4         0.2         3.642857
2           4.9         3.0          1.4         0.2         3.500000
3           4.7         3.2          1.3         0.2         3.615385
4           4.6         3.1          1.5         0.2         3.066667
5           5.0         3.6          1.4         0.2         3.571429
6           5.4         3.9          1.7         0.4         3.176471
7           4.6         3.4          1.4         0.3         3.285714
8           5.0         3.4          1.5         0.2         3.333333
9           4.4         2.9          1.4         0.2         3.142857
10          4.9         3.1          1.5         0.1         3.266667
..          ...         ...          ...         ...              ...

dplyr::transmute(): Make new variables by dropping existing ones

Compute new columns (sepal_by_petal_*) and drop the existing ones:

transmute(my_data, 
            sepal_by_petal_l = Sepal.Length/Petal.Length,
            sepal_by_petal_w = Sepal.Width/Petal.Width
            )

Source: local data frame [150 x 2]

   sepal_by_petal_l sepal_by_petal_w
              (dbl)            (dbl)
1          3.642857         17.50000
2          3.500000         15.00000
3          3.615385         16.00000
4          3.066667         15.50000
5          3.571429         18.00000
6          3.176471          9.75000
7          3.285714         11.33333
8          3.333333         17.00000
9          3.142857         14.50000
10         3.266667         31.00000
..              ...              ...

Use mutate() and transmute() programmatically inside a function:

mutate() and transmute() are best-suited for interactive use. The functions mutate_() and transmute_() should be used for calling from a function. In this case the input must be “quoted”.

There are three ways to quote inputs that dplyr understands:

With a formula, ~Sepal.Length.
With quote(), quote(Sepal.Length).
As a string: “Sepal.Length”.

# Use formula
mutate_(my_data, 
            sepal_by_petal_l = ~Sepal.Length/Petal.Length,
            sepal_by_petal_w = ~Sepal.Width/Petal.Width
            )

# Or use quote
transmute_(my_data, 
            sepal_by_petal_l = quote(Sepal.Length/Petal.Length),
            sepal_by_petal_w = quote(Sepal.Width/Petal.Width)
            )

# or, this
transmute_(my_data, 
            sepal_by_petal_l = "Sepal.Length/Petal.Length",
            sepal_by_petal_w = "Sepal.Width/Petal.Width"
            )
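
In dplyr 0.7.0 and later, the underscored verbs are deprecated in favour of tidy evaluation. The sketch below shows one possible way to write the same computation inside a function with a current dplyr. The helper add_ratio() and its argument names are hypothetical, and it assumes the column names are passed as strings.

```r
library(dplyr)

# Hypothetical helper: adds a column holding the ratio of two columns
# whose names are given as strings
add_ratio <- function(data, numerator, denominator, new_name) {
  # .data[[...]] looks a column up by its string name;
  # !!new_name := ... lets the name of the new column come from a variable
  mutate(data, !!new_name := .data[[numerator]] / .data[[denominator]])
}

res <- add_ratio(iris, "Sepal.Length", "Petal.Length", "sepal_by_petal_l")
head(res$sepal_by_petal_l, 3)
```

Passing names as strings keeps the function easy to call from loops or from other functions, at the cost of not accepting arbitrary expressions.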

transform(): R base function to compute and add new variables

dplyr::mutate() works similarly to the R base function transform(), except that in mutate() you can refer to variables you’ve just created. This is not possible in transform().

my_data2 <- transform(my_data, neg_sepal_length = -Sepal.Length)
head(my_data2)

  Sepal.Length Sepal.Width Petal.Length Petal.Width neg_sepal_length
1          5.1         3.5          1.4         0.2             -5.1
2          4.9         3.0          1.4         0.2             -4.9
3          4.7         3.2          1.3         0.2             -4.7
4          4.6         3.1          1.5         0.2             -4.6
5          5.0         3.6          1.4         0.2             -5.0
6          5.4         3.9          1.7         0.4             -5.4
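
The difference between the two functions can be seen on a small example. This is a minimal sketch on a toy data frame; the column names doubled and doubled_plus_one are illustrative, and it assumes no object called doubled exists in the calling environment.

```r
library(dplyr)

df <- data.frame(x = c(1, 2, 3))

# mutate() evaluates its arguments in order, so doubled_plus_one
# can use doubled, which is created in the same call
res <- mutate(df, doubled = x * 2, doubled_plus_one = doubled + 1)
res$doubled_plus_one

# transform() evaluates every expression against the original columns,
# so the same call fails because doubled does not exist yet
msg <- tryCatch(
  transform(df, doubled = x * 2, doubled_plus_one = doubled + 1),
  error = function(e) conditionMessage(e)
)
msg
```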

Summary

dplyr::mutate(iris, sepal = 2*Sepal.Length): Computes and appends new variable(s).
dplyr::transmute(iris, sepal = 2*Sepal.Length): Makes new variable(s) and drops existing ones.
transform(iris, sepal = 2*Sepal.Length): R base function similar to mutate().

Infos

This analysis has been performed using R (ver. 3.2.4).

Identifying and Removing Duplicate Data in R

Thu, 14 Apr 2016 22:53:19 +0200

Preliminary tasks
R base functions
- Find and drop duplicate elements: duplicated()
- Extract unique elements: unique()
Remove duplicate rows using dplyr
Summary
Related articles
Infos

Here, you’ll learn how to remove duplicate data using R base functions (duplicated() and unique()) as well as the function distinct() [in the dplyr package].

Preliminary tasks

Launch RStudio as described here: Running RStudio and setting up your working directory
Prepare your data as described here: Best practices for preparing your data and save it in an external .txt tab or .csv files
Import your data into R as described here: Fast reading of data from txt|csv files into R: readr package.

# Create my_data
my_data <- iris

# Convert to a tibble
library("tibble")
my_data <- as_tibble(my_data)

# Print
my_data

Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          (dbl)       (dbl)        (dbl)       (dbl)  (fctr)
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...

R base functions

In this section, we’ll describe the function unique() [for extracting unique elements] and the function duplicated() [for identifying duplicated elements].

Find and drop duplicate elements: duplicated()

The function duplicated() returns a logical vector where TRUE specifies which elements of a vector or data frame are duplicates.

Given the following vector:

x <- c(1, 1, 4, 5, 4, 6)

To find the position of duplicate elements in x, use this:

duplicated(x)

[1] FALSE  TRUE FALSE FALSE  TRUE FALSE

Extract duplicate elements:

x[duplicated(x)]

[1] 1 4

If you want to remove duplicated elements, use !duplicated(), where ! is a logical negation:

x[!duplicated(x)]

[1] 1 4 5 6

In the same way, you can remove duplicate rows from a data frame based on the values of a column, as follows:

# Remove duplicates based on the Sepal.Width column
my_data[!duplicated(my_data$Sepal.Width), ]

Source: local data frame [23 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          (dbl)       (dbl)        (dbl)       (dbl)  (fctr)
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           4.4         2.9          1.4         0.2  setosa
9           5.4         3.7          1.5         0.2  setosa
10          5.8         4.0          1.2         0.2  setosa
..          ...         ...          ...         ...     ...

! is a logical negation. !duplicated() means that we don’t want duplicate rows.
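
duplicated() also accepts a data frame, in which case whole rows are compared. The sketch below uses this to remove duplicates based on a combination of columns with base R only. It works on the built-in iris data, and the variable key_cols is an illustrative name.

```r
# Columns that define a "duplicate" (illustrative choice)
key_cols <- c("Sepal.Length", "Petal.Width")

# duplicated() on a data frame compares the rows of that data frame
is_dup <- duplicated(iris[, key_cols])

# Keep the first row of each Sepal.Length / Petal.Width combination
dedup <- iris[!is_dup, ]
nrow(dedup)
```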

Extract unique elements: unique()

Given the following vector:

x <- c(1, 1, 4, 5, 4, 6)

You can extract unique elements as follows:

unique(x)

[1] 1 4 5 6

It’s also possible to apply unique() to a data frame to remove duplicated rows, as follows:

unique(my_data)

Source: local data frame [149 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          (dbl)       (dbl)        (dbl)       (dbl)  (fctr)
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...
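
The result has 149 rows because iris contains one fully duplicated row. As a small sketch, duplicated() can be used to locate it; the variable names below are illustrative.

```r
# Number of rows that repeat an earlier row
n_dup <- sum(duplicated(iris))
n_dup
# → 1

# Position of the repeated row
dup_idx <- which(duplicated(iris))
dup_idx
# → 143

# fromLast = TRUE scans from the end, so the earlier copy is flagged instead
first_idx <- which(duplicated(iris, fromLast = TRUE))
first_idx
# → 102
```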

Remove duplicate rows using dplyr

The function distinct() in the dplyr package can be used to keep only unique/distinct rows from a data frame. If there are duplicate rows, only the first row is preserved. It’s an efficient version of the R base function unique().

The dplyr package can be installed and loaded as follows:

# Install
install.packages("dplyr")

# Load
library("dplyr")

Remove duplicate rows based on all columns:

distinct(my_data)

Source: local data frame [149 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          (dbl)       (dbl)        (dbl)       (dbl)  (fctr)
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...

Remove duplicate rows based on certain columns (variables):

# Remove duplicated rows based on Sepal.Length
# (.keep_all = TRUE keeps the other columns; current dplyr drops them otherwise)
distinct(my_data, Sepal.Length, .keep_all = TRUE)

Source: local data frame [35 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          (dbl)       (dbl)        (dbl)       (dbl)  (fctr)
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.4         2.9          1.4         0.2  setosa
8           4.8         3.4          1.6         0.2  setosa
9           4.3         3.0          1.1         0.1  setosa
10          5.8         4.0          1.2         0.2  setosa
..          ...         ...          ...         ...     ...

# Remove duplicated rows based on
# Sepal.Length and Petal.Width
distinct(my_data, Sepal.Length, Petal.Width, .keep_all = TRUE)

Source: local data frame [110 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          (dbl)       (dbl)        (dbl)       (dbl)  (fctr)
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           4.4         2.9          1.4         0.2  setosa
9           4.9         3.1          1.5         0.1  setosa
10          5.4         3.7          1.5         0.2  setosa
..          ...         ...          ...         ...     ...

distinct() is best-suited for interactive use. The function distinct_() should be used for calling from a function. In this case the input must be “quoted”.

distinct_(my_data,  "Sepal.Length", "Petal.Width")

Summary

Remove duplicate rows based on one or more column values: dplyr::distinct(my_data, Sepal.Length, .keep_all = TRUE)
R base function to extract unique elements from vectors and data frames: unique(my_data)
R base function to determine duplicate elements: duplicated(my_data)

Infos

This analysis has been performed using R (ver. 3.2.3).

Subsetting Data Frame Columns in R

Thu, 14 Apr 2016 22:48:39 +0200

Preliminary tasks
Install and load dplyr package
Selecting columns by position
Select columns by names
Drop columns
Use select() programmatically inside an R function
Summary
Related articles
Infos

Previously, we described the essentials of R programming and provided quick start guides for importing data into R as well as converting your data into a tibble data format, which is the best and modern way to work with your data. We next described crucial steps to reshape your data with R for easier analyses. Additionally, we provided quick start guides for subsetting data frame rows based on some logical criteria.

Here, you’ll learn how to subset data frame columns (i.e., variables) by names using the function select() [in the dplyr package].

Preliminary tasks

Launch RStudio as described here: Running RStudio and setting up your working directory
Prepare your data as described here: Best practices for preparing your data and save it in an external .txt tab or .csv files
Import your data into R as described here: Fast reading of data from txt|csv files into R: readr package.

# Create my_data
my_data <- iris

# Convert to a tibble
library("tibble")
my_data <- as_tibble(my_data)

# Print
my_data

Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          (dbl)       (dbl)        (dbl)       (dbl)  (fctr)
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...

Install and load dplyr package

Install dplyr:

install.packages("dplyr")

Load dplyr:

library("dplyr")

Selecting columns by position

Select columns 1 to 2:

my_data[, 1:2]

Select columns 1 and 3 but not 2:

my_data[, c(1, 3)]

Select columns by names

Select columns by names: Sepal.Length and Petal.Length

select(my_data, Sepal.Length, Petal.Length)

Source: local data frame [150 x 2]

   Sepal.Length Petal.Length
          (dbl)        (dbl)
1           5.1          1.4
2           4.9          1.4
3           4.7          1.3
4           4.6          1.5
5           5.0          1.4
6           5.4          1.7
7           4.6          1.4
8           5.0          1.5
9           4.4          1.4
10          4.9          1.5
..          ...          ...

Select all columns from Sepal.Length to Petal.Length

select(my_data, Sepal.Length:Petal.Length)

Source: local data frame [150 x 3]

   Sepal.Length Sepal.Width Petal.Length
          (dbl)       (dbl)        (dbl)
1           5.1         3.5          1.4
2           4.9         3.0          1.4
3           4.7         3.2          1.3
4           4.6         3.1          1.5
5           5.0         3.6          1.4
6           5.4         3.9          1.7
7           4.6         3.4          1.4
8           5.0         3.4          1.5
9           4.4         2.9          1.4
10          4.9         3.1          1.5
..          ...         ...          ...

There are several special functions that can be used inside select(): starts_with(), ends_with(), contains(), matches(), one_of(), etc.

# Select columns whose names start with "Petal"
select(my_data, starts_with("Petal"))

# Select columns whose names end with "Width"
select(my_data, ends_with("Width"))

# Select columns whose names contain "etal"
select(my_data, contains("etal"))

# Select columns whose names match a regular expression
select(my_data, matches(".t."))

# Select columns whose names are given in a character vector
select(my_data, one_of(c("Sepal.Length", "Petal.Length")))
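
To see which column names these helpers pick up, the matching can be approximated on the names of iris with base R. This is only a sketch of the matching logic, not of how dplyr implements it; the variable names are illustrative. Note that the dplyr helpers ignore case by default, whereas startsWith() and endsWith() are case-sensitive.

```r
cols <- names(iris)

# Names starting with "Petal" (compare with starts_with("Petal"))
petal_cols <- cols[startsWith(cols, "Petal")]
petal_cols

# Names ending with "Width" (compare with ends_with("Width"))
width_cols <- cols[endsWith(cols, "Width")]
width_cols

# Names matching the regular expression ".t.", i.e. a "t" with at least
# one character on each side (compare with matches(".t."))
regex_hits <- grep(".t.", cols, value = TRUE, ignore.case = TRUE)
regex_hits
```

On iris, the regular expression ".t." matches the four measurement columns but not Species, which contains no "t".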

Drop columns

Note that, to remove a column from a data frame, prepend a minus sign (-) to its name.

Dropping Sepal.Length and Petal.Length:

select(my_data, -Sepal.Length, -Petal.Length)

Dropping columns from Sepal.Length to Petal.Length:

select(my_data, -(Sepal.Length:Petal.Length))

Source: local data frame [150 x 2]

   Petal.Width Species
         (dbl)  (fctr)
1          0.2  setosa
2          0.2  setosa
3          0.2  setosa
4          0.2  setosa
5          0.2  setosa
6          0.4  setosa
7          0.3  setosa
8          0.2  setosa
9          0.2  setosa
10         0.1  setosa
..         ...     ...

Dropping columns whose name starts with “Petal”:

select(my_data, -starts_with("Petal"))

Source: local data frame [150 x 3]

   Sepal.Length Sepal.Width Species
          (dbl)       (dbl)  (fctr)
1           5.1         3.5  setosa
2           4.9         3.0  setosa
3           4.7         3.2  setosa
4           4.6         3.1  setosa
5           5.0         3.6  setosa
6           5.4         3.9  setosa
7           4.6         3.4  setosa
8           5.0         3.4  setosa
9           4.4         2.9  setosa
10          4.9         3.1  setosa
..          ...         ...     ...

Note that, if you want to drop columns by position, the syntax is as follows:

# Drop column 1
my_data[, -1]

# Drop columns 1 to 3
my_data[, -(1:3)]

# Drop columns 1 and 3 but not 2
my_data[, -c(1, 3)]
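
Negative indexing with [ works only with positions, not with column names. A minimal base R sketch for dropping columns by name, assuming the names are held in a character vector (drop_cols is an illustrative name):

```r
drop_cols <- c("Sepal.Length", "Petal.Length")

# Keep every column whose name is not listed in drop_cols
kept <- iris[, setdiff(names(iris), drop_cols)]
names(kept)
```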

Use select() programmatically inside an R function

dplyr uses non-standard evaluation (NSE), which is great for interactive use and saves you typing. Behind the scenes, NSE is powered by the lazyeval package.

select() is best-suited for interactive use. The function select_() should be used for calling from a function. In this case the input must be “quoted”.

There are three ways to quote inputs that dplyr understands:

With a formula, ~Sepal.Length.
With quote(), quote(Sepal.Length).
As a string: “Sepal.Length”.

For example, you can select the column Sepal.Length by typing the following R code:

select_(my_data, ~Sepal.Length)

Or, by using this:

select_(my_data, "Sepal.Length")

It’s also possible to use functions inside select_(). The R package lazyeval is required. It can be installed as follows:

install.packages("lazyeval")

Use lazyeval package to interpret functions inside select_():

# Select column names that match ".t."
select_(my_data, lazyeval::interp(~matches(x), x = ".t."))

# Select column names that start with "Petal"
select_(my_data, lazyeval::interp(~starts_with(x), x = "Petal"))

# Dropping columns: Sepal.Length and Sepal.Width
select_(my_data, quote(-Sepal.Length), quote(-Sepal.Width))

# Or pass the quoted expressions as a list via .dots
# (here dropping Petal.Length and Petal.Width)
select_(my_data, .dots = list(quote(-Petal.Length), quote(-Petal.Width)))
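
With recent dplyr versions (1.0.0 and later), column names stored in a character vector can be passed to select() through the all_of() helper, without the underscored verbs. The sketch below assumes such a version is installed; the helper pick_columns() and its arguments are hypothetical.

```r
library(dplyr)

# Hypothetical helper: keep (or drop) the columns named in a character vector
pick_columns <- function(data, cols, drop = FALSE) {
  if (drop) {
    select(data, -all_of(cols))
  } else {
    select(data, all_of(cols))
  }
}

kept <- pick_columns(iris, c("Sepal.Length", "Petal.Length"))
dropped <- pick_columns(iris, c("Sepal.Length", "Sepal.Width"), drop = TRUE)

names(kept)
names(dropped)
```

all_of() raises an error if one of the requested columns is missing, which is usually the safer behaviour inside a function.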

Summary

Select columns by position: my_data[, 1:2]
Select columns by name: dplyr::select(my_data, Sepal.Length, Petal.Length)
Drop columns: dplyr::select(my_data, -Sepal.Length, -Petal.Length)
Helper functions: starts_with(), ends_with(), contains(), matches(), one_of()
- dplyr::select(my_data, starts_with("Petal"))
- dplyr::select(my_data, ends_with("Length"))

Infos

This analysis has been performed using R (ver. 3.2.3).

Subsetting Data Frame Rows in R

Thu, 14 Apr 2016 22:36:43 +0200

Preliminary tasks
Install and load dplyr package
Extracting rows by position: dplyr::slice()
Extracting rows by criteria: dplyr::filter()
Extracting rows by criteria with R base functions: subset()
Select random rows from a table
Select top n rows ordered by a variable
Summary
Related articles
Infos

Previously, we described the essentials of R programming and provided quick start guides for importing data into R as well as converting your data into a tibble data format, which is the best and modern way to work with your data. We also described crucial steps to reshape your data with R for easier analyses.

Here, you’ll learn how to subset (or filter) rows of a data frame based on certain criteria. This can be done easily using R functions provided by the dplyr package. It’s also possible to use the R base function subset().

Among the functions available in dplyr package, there are:

filter(iris, Sepal.Length > 7): Extract rows based on logical criteria
distinct(iris): Remove duplicated rows
sample_n(iris, 10, replace = FALSE): Select n random rows from a table
sample_frac(iris, 0.5, replace = FALSE): Select a random fraction of rows
slice(iris, 3:8): Select rows by position
top_n(iris, 10, Sepal.Length): Select the top n rows ranked by a variable (by groups if grouped data)

We’ll start by describing how to subset rows based on some criteria, with the dplyr::filter() function as well as the R base function subset(). Next, we’ll show you how to select rows randomly using the sample_n() and sample_frac() functions. Finally, we’ll describe how to select the top n elements in each group, ordered by a given variable.

Preliminary tasks

Launch RStudio as described here: Running RStudio and setting up your working directory
Prepare your data as described here: Best practices for preparing your data and save it in an external .txt tab or .csv files
Import your data into R as described here: Fast reading of data from txt|csv files into R: readr package.

# Create my_data
my_data <- iris

# Convert to a tibble
library("tibble")
my_data <- as_tibble(my_data)

# Print
my_data

Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          (dbl)       (dbl)        (dbl)       (dbl)  (fctr)
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...

Install and load dplyr package

Install dplyr:

install.packages("dplyr")

Load dplyr:

library("dplyr")

Extracting rows by position: dplyr::slice()

Select rows 1 to 6:

my_data[1:6, ]

Alternatively, you can use the function slice() [in dplyr]:

slice(my_data, 1:6)

Extracting rows by criteria: dplyr::filter()

The function filter() is used to filter rows that meet some logical criteria.

Logical comparisons

Before continuing, we introduce the notion of logical comparisons and operators, which are important to know for filtering data.

The “logical” comparison operators available in R are:

Logical comparisons
- <: for less than
- >: for greater than
- <=: for less than or equal to
- >=: for greater than or equal to
- ==: for equal to each other
- !=: not equal to each other
- %in%: group membership. For example, “value %in% c(2, 3)” means that value can take the value 2 or 3.
- is.na(): is NA
- !is.na(): is not NA.
Logical operators
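- |: means or. For example, value == 2 | value == 3 means that value equals 2 or 3; value %in% c(2, 3) is an equivalent shortcut. Note that value == 2|3 is not equivalent: it is parsed as (value == 2) | 3, which is always TRUE.
- &: means and. For example, sex == "female" & age > 25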

The most frequent mistake made by beginners in R is to use = instead of == when testing for equality. Remember that, when you are testing for equality, you should always use == (not =).
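
Another pitfall when combining == with | is writing the comparison only once. The short base R sketch below, on an illustrative vector, shows how R evaluates the different forms.

```r
value <- c(1, 2, 3, 4)

# Parsed as (value == 2) | 3; the number 3 is coerced to TRUE,
# so every element comes out TRUE
wrong <- value == 2 | 3
wrong

# Each side of | needs its own comparison
right <- value == 2 | value == 3
right

# %in% gives the same result more compactly
also_right <- value %in% c(2, 3)
also_right
```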

Extracting rows based on logical criteria

One-column based criteria: Extract rows where Sepal.Length > 7:

filter(my_data, Sepal.Length > 7)

Source: local data frame [12 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
          (dbl)       (dbl)        (dbl)       (dbl)    (fctr)
1           7.1         3.0          5.9         2.1 virginica
2           7.6         3.0          6.6         2.1 virginica
3           7.3         2.9          6.3         1.8 virginica
4           7.2         3.6          6.1         2.5 virginica
5           7.7         3.8          6.7         2.2 virginica
6           7.7         2.6          6.9         2.3 virginica
7           7.7         2.8          6.7         2.0 virginica
8           7.2         3.2          6.0         1.8 virginica
9           7.2         3.0          5.8         1.6 virginica
10          7.4         2.8          6.1         1.9 virginica
11          7.9         3.8          6.4         2.0 virginica
12          7.7         3.0          6.1         2.3 virginica

Multiple-column based criteria: Extract rows where Sepal.Length > 6.7 and Sepal.Width ≤ 3:

filter(my_data, Sepal.Length > 6.7, Sepal.Width <= 3)

Source: local data frame [10 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
          (dbl)       (dbl)        (dbl)       (dbl)     (fctr)
1           6.8         2.8          4.8         1.4 versicolor
2           7.1         3.0          5.9         2.1  virginica
3           7.6         3.0          6.6         2.1  virginica
4           7.3         2.9          6.3         1.8  virginica
5           6.8         3.0          5.5         2.1  virginica
6           7.7         2.6          6.9         2.3  virginica
7           7.7         2.8          6.7         2.0  virginica
8           7.2         3.0          5.8         1.6  virginica
9           7.4         2.8          6.1         1.9  virginica
10          7.7         3.0          6.1         2.3  virginica

Test for equality (==): Extract rows where Sepal.Length > 6.7 and Species == "versicolor":

filter(my_data, Sepal.Length > 6.7, Species == "versicolor")

Source: local data frame [3 x 5]

  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
         (dbl)       (dbl)        (dbl)       (dbl)     (fctr)
1          7.0         3.2          4.7         1.4 versicolor
2          6.9         3.1          4.9         1.5 versicolor
3          6.8         2.8          4.8         1.4 versicolor

Using OR operator (|): Extract rows where Sepal.Length > 6.7 and (Species == "versicolor" or Species == "virginica"):

Use this:

filter(my_data, Sepal.Length > 6.7, 
       Species == "versicolor" | Species == "virginica" )

Or, equivalently, use this shortcut (%in% operator):

filter(my_data, Sepal.Length > 6.7, 
      Species %in% c("versicolor", "virginica" ))

Source: local data frame [20 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
          (dbl)       (dbl)        (dbl)       (dbl)     (fctr)
1           7.0         3.2          4.7         1.4 versicolor
2           6.9         3.1          4.9         1.5 versicolor
3           6.8         2.8          4.8         1.4 versicolor
4           7.1         3.0          5.9         2.1  virginica
5           7.6         3.0          6.6         2.1  virginica
6           7.3         2.9          6.3         1.8  virginica
7           7.2         3.6          6.1         2.5  virginica
8           6.8         3.0          5.5         2.1  virginica
9           7.7         3.8          6.7         2.2  virginica
10          7.7         2.6          6.9         2.3  virginica
11          6.9         3.2          5.7         2.3  virginica
12          7.7         2.8          6.7         2.0  virginica
13          7.2         3.2          6.0         1.8  virginica
14          7.2         3.0          5.8         1.6  virginica
15          7.4         2.8          6.1         1.9  virginica
16          7.9         3.8          6.4         2.0  virginica
17          7.7         3.0          6.1         2.3  virginica
18          6.9         3.1          5.4         2.1  virginica
19          6.9         3.1          5.1         2.3  virginica
20          6.8         3.2          5.9         2.3  virginica

Note that filter() works similarly to the R base function subset(), which is described in the next sections.

Removing missing values

As described in the chapter named R programming basics, it’s possible to use the function is.na(x) to check whether data contain missing values. It takes a vector x as an input and returns a logical vector in which the value TRUE specifies that the corresponding element in x is NA.

Create a tbl with missing values using tibble() [in the tibble package, also available through dplyr; formerly data_frame()]. In R, NA (Not Available) is used to represent missing values:

# Create a data frame with missing data
friends_data <- tibble(
  name = c("Nicolas", "Thierry", "Bernard", "Jerome"),
  age = c(27, 25, 29, 26),
  height = c(180, NA, NA, 169),
  married = c("yes", "yes", "no", "no")
)
# Print
friends_data

Source: local data frame [4 x 4]

     name   age height married
    (chr) (dbl)  (dbl)   (chr)
1 Nicolas    27    180     yes
2 Thierry    25     NA     yes
3 Bernard    29     NA      no
4  Jerome    26    169      no

Extract rows where height is NA:

filter(friends_data, is.na(height))

Source: local data frame [2 x 4]

     name   age height married
    (chr) (dbl)  (dbl)   (chr)
1 Thierry    25     NA     yes
2 Bernard    29     NA      no

Exclude (drop) rows where height is NA:

filter(friends_data, !is.na(height))

Source: local data frame [2 x 4]

     name   age height married
    (chr) (dbl)  (dbl)   (chr)
1 Nicolas    27    180     yes
2  Jerome    26    169      no

In the R code above, !is.na() means that “we don’t want” NAs.

Using filter() programmatically inside an R function

filter() is best suited for interactive use. When calling it from inside a function, use filter_() instead; in that case, the input must be “quoted”.

There are three ways to quote inputs that dplyr understands:

With a formula, ~Sepal.Length.
With quote(), quote(Sepal.Length).
As a string: “Sepal.Length”.

# Extract rows where Sepal.Length > 7
filter_(my_data, "Sepal.Length > 7")

# Extract rows where Sepal.Length > 7 and Sepal.Width <= 3
filter_(my_data, "Sepal.Length > 7 & Sepal.Width <= 3")

# Extract rows where Sepal.Length > 6.7 and
# (Species == "versicolor" or Species == "virginica")
filter_(my_data, quote(Sepal.Length > 6.7 & 
      Species %in% c("versicolor", "virginica" )))
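In dplyr 0.7.0 and later, the underscore variants such as filter_() are deprecated in favor of tidy evaluation. As a hedged sketch of that newer style, a column name passed as a string can be looked up with the .data pronoun. The helper name filter_above() and its arguments are hypothetical, and the built-in iris data is used so the example is self-contained.

```r
library("dplyr")

# Hypothetical helper: keep rows where the column named `col` exceeds `threshold`
filter_above <- function(data, col, threshold) {
  # .data[[col]] looks up the column whose name is stored in the string `col`
  filter(data, .data[[col]] > threshold)
}

res <- filter_above(iris, "Sepal.Length", 7)
nrow(res)
```

This keeps the calling code free of quoting functions: the caller passes an ordinary string and a number.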

Extracting rows by criteria with R base functions: subset()

Extract rows where Sepal.Length > 7 and Sepal.Width ≤ 3:

You can use this:

my_data[my_data$Sepal.Length > 7 & my_data$Sepal.Width <= 3, ]

Or use the R base function subset():

subset(my_data, Sepal.Length > 7 & Sepal.Width <= 3)

Extract rows where Sepal.Length > 6.7 and (Species = “versicolor” or Species = “virginica”):

# Combine both conditions with &; a third argument would be read as `select` (columns)
subset(my_data, Sepal.Length > 6.7 &
      Species %in% c("versicolor", "virginica"))

subset() also works with vectors, as follows.

my_vec <- 1:10
subset(my_vec, my_vec > 5 & my_vec < 8)

[1] 6 7
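Besides the filtering condition, subset() accepts a third argument, select, which chooses the columns to keep. The sketch below, run on the built-in iris data, uses both: all row conditions go in the second argument and are joined with &, while select lists the columns.

```r
# Rows: Sepal.Length > 7 and Sepal.Width <= 3
# Columns: only Sepal.Length, Sepal.Width and Species
res <- subset(iris,
              Sepal.Length > 7 & Sepal.Width <= 3,
              select = c(Sepal.Length, Sepal.Width, Species))
head(res)
```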

Note that R base functions require more typing than dplyr::filter(), so we recommend the dplyr solutions.

Select random rows from a table

It’s possible to select either n random rows with the function sample_n() or a random fraction of rows with sample_frac().

We first use the function set.seed() to initialize the random number generator. This is important so that users can reproduce the analysis.

set.seed(1234)
# Extract 5 random rows without replacement
sample_n(my_data, 5, replace = FALSE)

Source: local data frame [5 x 5]

  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
         (dbl)       (dbl)        (dbl)       (dbl)     (fctr)
1          5.1         3.5          1.4         0.3     setosa
2          5.8         2.6          4.0         1.2 versicolor
3          5.5         2.6          4.4         1.2 versicolor
4          6.1         3.0          4.6         1.4 versicolor
5          7.2         3.2          6.0         1.8  virginica

# Extract 5% of rows, randomly without replacement
sample_frac(my_data, 0.05, replace = FALSE)

Source: local data frame [8 x 5]

  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
         (dbl)       (dbl)        (dbl)       (dbl)     (fctr)
1          5.7         2.9          4.2         1.3 versicolor
2          4.9         3.0          1.4         0.2     setosa
3          4.9         3.1          1.5         0.2     setosa
4          6.2         2.9          4.3         1.3 versicolor
5          6.6         3.0          4.4         1.4 versicolor
6          6.3         3.3          6.0         2.5  virginica
7          6.0         2.9          4.5         1.5 versicolor
8          5.0         3.5          1.3         0.3     setosa

Note that it’s also possible to use the R base function sample(), but it requires more typing.

set.seed(1234)
my_data[sample(1:nrow(my_data), 5, replace = FALSE), , drop = FALSE]
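To see why set.seed() matters, the following sketch (base R only, using the built-in iris data) draws the same sample twice with the same seed, then once more without resetting it. The seed value is arbitrary.

```r
# Same seed, same row indices
set.seed(1234)
idx1 <- sample(seq_len(nrow(iris)), 5)
set.seed(1234)
idx2 <- sample(seq_len(nrow(iris)), 5)
identical(idx1, idx2)  # TRUE: the draw is reproducible

# Without resetting the seed, the next draw continues the random stream
idx3 <- sample(seq_len(nrow(iris)), 5)
```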

Select top n rows ordered by a variable

As mentioned above, the function top_n() can be used to select the top n entries in each group.

The format is as follows:

top_n(x, n, wt)

x: Data table
n: Number of rows to return. If x is grouped, this is the number of rows per group. May include more than n if there are ties.
wt (optional): The variable to use for ordering. If not specified, defaults to the last variable in the data table.

Select the top 5 rows ordered by Sepal.Length

top_n(my_data, 5, Sepal.Length)

Source: local data frame [5 x 5]

  Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
         (dbl)       (dbl)        (dbl)       (dbl)    (fctr)
1          7.7         3.8          6.7         2.2 virginica
2          7.7         2.6          6.9         2.3 virginica
3          7.7         2.8          6.7         2.0 virginica
4          7.9         3.8          6.4         2.0 virginica
5          7.7         3.0          6.1         2.3 virginica
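The remark about ties can be checked on a tiny example. In this sketch the data frame df is made up for illustration: its largest value appears three times, so asking for the top 2 rows returns all three tied rows.

```r
library("dplyr")

# Hypothetical data: the largest value of x (3) appears three times
df <- data.frame(id = 1:5, x = c(1, 2, 3, 3, 3))

# Ask for the top 2 rows by x; the three tied rows are all returned
res <- top_n(df, 2, x)
nrow(res)
```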

Group by the column Species and select the top 5 of each group ordered by Sepal.Length:

my_data %>% 
  group_by(Species) %>%
  top_n(5, Sepal.Length)

Note that the dplyr package lets you use the forward-pipe operator (%>%) to combine multiple operations. For example, x %>% f is equivalent to f(x). The output of each operation is passed to the next operation.
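As a small sketch of this equivalence, the same three operations are written below first as nested calls and then as a pipeline. The built-in iris data is used, and the choice of operations is only illustrative.

```r
library("dplyr")

# Nested calls: read from the inside out
nested <- head(arrange(filter(iris, Species == "setosa"),
                       desc(Sepal.Length)), 3)

# Same operations as a pipeline: read from top to bottom
piped <- iris %>%
  filter(Species == "setosa") %>%
  arrange(desc(Sepal.Length)) %>%
  head(3)

all.equal(nested, piped)
```

Both forms compute the same result; the pipeline simply lists the steps in the order they are applied.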

Summary

Filter rows by logical criteria: dplyr::filter(iris, Sepal.Length > 7)
Select n random rows: dplyr::sample_n(iris, 10)
Select a random fraction of rows: dplyr::sample_frac(iris, 0.1)
Select top n rows by values: dplyr::top_n(iris, 10, Sepal.Length)

Infos

This analysis has been performed using R (ver. 3.2.3).

Renaming Data Frame Columns in R

Thu, 14 Apr 2016 22:25:10 +0200

Preliminary tasks
Install and load dplyr package for renaming columns
Renaming columns with dplyr::rename()
Renaming columns with dplyr::select()
Renaming columns with R base functions
Summary
Related articles
Infos

Previously, we described the essentials of R programming and provided quick start guides for importing data into R as well as converting your data into a tibble data format, which is the best and most modern way to work with your data. We also described crucial steps to reshape your data with R for easier analyses.

Here, you’ll learn how to rename the columns of a data frame in R. This can be done easily using the function rename() in dplyr. It’s also possible to use R base functions, but they require more typing.

Preliminary tasks

Launch RStudio as described here: Running RStudio and setting up your working directory
Prepare your data as described here: Best practices for preparing your data and save it in an external .txt tab or .csv files
Import your data into R as described here: Fast reading of data from txt|csv files into R: readr package.

# Create my_data
my_data <- iris

# Convert to a tibble
library("tibble")
my_data <- as_data_frame(my_data)

# Print
my_data

Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          (dbl)       (dbl)        (dbl)       (dbl)  (fctr)
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...

Install and load dplyr package for renaming columns

Install dplyr

install.packages("dplyr")

Load dplyr:

library("dplyr")

Renaming columns with dplyr::rename()

Rename the column Sepal.Length to sepal_length and Sepal.Width to sepal_width:

rename(my_data, sepal_length = Sepal.Length,
       sepal_width = Sepal.Width)

Source: local data frame [150 x 5]

   sepal_length sepal_width Petal.Length Petal.Width Species
          (dbl)       (dbl)        (dbl)       (dbl)  (fctr)
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...

Renaming columns with dplyr::select()

select() can also be used to rename variables, as follows.

select(my_data, sepal_length = Sepal.Length,
       sepal_width = Sepal.Width)

Source: local data frame [150 x 2]

   sepal_length sepal_width
          (dbl)       (dbl)
1           5.1         3.5
2           4.9         3.0
3           4.7         3.2
4           4.6         3.1
5           5.0         3.6
6           5.4         3.9
7           4.6         3.4
8           5.0         3.4
9           4.4         2.9
10          4.9         3.1
..          ...         ...

Note that select() keeps only the variables you mention. To keep all variables, use the function rename() instead.
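If you prefer select() and still want to keep every column, one option is the dplyr helper everything(), which matches all remaining columns. This is only a sketch on the built-in iris data; columns appear in the order in which they are listed.

```r
library("dplyr")

# Rename two columns and keep all the others via everything()
res <- select(iris, sepal_length = Sepal.Length,
              sepal_width = Sepal.Width, everything())
colnames(res)
```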

Renaming columns with R base functions

To rename the column Sepal.Length to sepal_length, the procedure is as follows:

Get column names using the function names() or colnames()
Change column names where name = Sepal.Length

# get column names
colnames(my_data)

[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"

# Rename the columns named "Sepal.Length" and "Sepal.Width"
names(my_data)[names(my_data) == "Sepal.Length"] <- "sepal_length"
names(my_data)[names(my_data) == "Sepal.Width"] <- "sepal_width"
my_data

Source: local data frame [150 x 5]

   sepal_length sepal_width Petal.Length Petal.Width Species
          (dbl)       (dbl)        (dbl)       (dbl)  (fctr)
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...

It’s also possible to rename columns by their index in the names vector, as follows.

names(my_data)[1] <- "sepal_length"
names(my_data)[2] <- "sepal_width"
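When several columns need new names, the base R approach can be written once with a lookup vector instead of one assignment per column. The sketch below uses the built-in iris data; the object names df and new_names are hypothetical.

```r
# Work on a copy of the built-in iris data
df <- iris

# Hypothetical lookup: names = old column names, values = new column names
new_names <- c(Sepal.Length = "sepal_length", Sepal.Width = "sepal_width")

# Positions of the old names among the current column names
idx <- match(names(new_names), names(df))

# Replace the matched names in one step
names(df)[idx] <- unname(new_names)
names(df)
```

Matching by name rather than by index keeps the code valid if the column order changes.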

Summary

To rename the columns of a data frame, use the function rename() [in dplyr package].

Infos

This analysis has been performed using R (ver. 3.2.3).

Reordering Data Frame Rows in R

Thu, 14 Apr 2016 21:42:21 +0200

Preliminary tasks
Install and load dplyr package
Reorder rows with dplyr::arrange()
Reorder rows with R base function order()
Summary
Related articles
Infos

Previously, we described the essentials of R programming and provided quick start guides for importing data into R as well as converting your data into a tibble data format, which is the best and most modern way to work with your data. We also described crucial steps to reshape your data with R for easier analyses.

Here, you’ll learn how to reorder (i.e., sort) the rows of your data table by the values of one or more columns (i.e., variables). This can be done using either the R base function order() or the modern function arrange() [in dplyr package]. We recommend dplyr::arrange() because it requires less typing.

Preliminary tasks

Launch RStudio as described here: Running RStudio and setting up your working directory
Prepare your data as described here: Best practices for preparing your data and save it in an external .txt tab or .csv files
Import your data into R as described here: Fast reading of data from txt|csv files into R: readr package.

# Create my_data
my_data <- iris

# Convert to a tibble
library("tibble")
my_data <- as_data_frame(my_data)

# Print
my_data

Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          (dbl)       (dbl)        (dbl)       (dbl)  (fctr)
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...

Install and load dplyr package

Install dplyr

install.packages("dplyr")

Load dplyr:

library("dplyr")

Reorder rows with dplyr::arrange()

The dplyr function arrange() can be used to reorder (sort) rows by one or more variables.

Reorder rows by Sepal.Length in ascending order

arrange(my_data, Sepal.Length)

Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          (dbl)       (dbl)        (dbl)       (dbl)  (fctr)
1           4.3         3.0          1.1         0.1  setosa
2           4.4         2.9          1.4         0.2  setosa
3           4.4         3.0          1.3         0.2  setosa
4           4.4         3.2          1.3         0.2  setosa
5           4.5         2.3          1.3         0.3  setosa
6           4.6         3.1          1.5         0.2  setosa
7           4.6         3.4          1.4         0.3  setosa
8           4.6         3.6          1.0         0.2  setosa
9           4.6         3.2          1.4         0.2  setosa
10          4.7         3.2          1.3         0.2  setosa
..          ...         ...          ...         ...     ...

Reorder rows by Sepal.Length in descending order. Use the function desc():

arrange(my_data, desc(Sepal.Length))

Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
          (dbl)       (dbl)        (dbl)       (dbl)    (fctr)
1           7.9         3.8          6.4         2.0 virginica
2           7.7         3.8          6.7         2.2 virginica
3           7.7         2.6          6.9         2.3 virginica
4           7.7         2.8          6.7         2.0 virginica
5           7.7         3.0          6.1         2.3 virginica
6           7.6         3.0          6.6         2.1 virginica
7           7.4         2.8          6.1         1.9 virginica
8           7.3         2.9          6.3         1.8 virginica
9           7.2         3.6          6.1         2.5 virginica
10          7.2         3.2          6.0         1.8 virginica
..          ...         ...          ...         ...       ...

Instead of using the function desc(), you can prefix a numeric sorting variable with a minus sign to indicate descending order, as follows.

arrange(my_data, -Sepal.Length)

Reorder rows by multiple variables: Sepal.Length and Sepal.Width

arrange(my_data, Sepal.Length, Sepal.Width)

Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          (dbl)       (dbl)        (dbl)       (dbl)  (fctr)
1           4.3         3.0          1.1         0.1  setosa
2           4.4         2.9          1.4         0.2  setosa
3           4.4         3.0          1.3         0.2  setosa
4           4.4         3.2          1.3         0.2  setosa
5           4.5         2.3          1.3         0.3  setosa
6           4.6         3.1          1.5         0.2  setosa
7           4.6         3.2          1.4         0.2  setosa
8           4.6         3.4          1.4         0.3  setosa
9           4.6         3.6          1.0         0.2  setosa
10          4.7         3.2          1.3         0.2  setosa
..          ...         ...          ...         ...     ...

If the data contain missing values, they will always come at the end.
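The following sketch illustrates this on a small made-up data frame (the object df and its values are hypothetical): the row with the missing height comes last in both ascending and descending order.

```r
library("dplyr")

# Hypothetical data with one missing height
df <- data.frame(name = c("a", "b", "c"), height = c(180, NA, 169))

# Ascending order: the NA row is placed last
asc <- arrange(df, height)

# Descending order: the NA row is still placed last
dsc <- arrange(df, desc(height))
```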

dplyr::arrange() is the counterpart of the R base function order(). It requires less typing.

Reorder rows with R base function order()

Reorder rows by Sepal.Length in ascending order

my_data[order(my_data$Sepal.Length), , drop = FALSE]

Reorder rows by Sepal.Length in descending order. Use the additional argument decreasing = TRUE:

row_order <- order(my_data$Sepal.Length, decreasing = TRUE)
my_data[row_order, , drop = FALSE]
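order() also accepts several sorting keys, which gives a base R equivalent of sorting by multiple variables. In this sketch on the built-in iris data, a minus sign in front of a numeric key reverses the direction for that key only.

```r
# Sort by Sepal.Length (ascending), then by Sepal.Width (descending) within ties
row_order <- order(iris$Sepal.Length, -iris$Sepal.Width)
sorted <- iris[row_order, ]
head(sorted)
```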

Summary

To order rows by the values of a column, use the function arrange() [in dplyr package].

Infos

This analysis has been performed using R (ver. 3.2.3).

Reordering Data Frame Columns in R

Thu, 14 Apr 2016 21:26:35 +0200

Preliminary tasks
Reorder column by position
Reorder column by name
Summary
Related articles
References
Infos

Previously, we described the essentials of R programming and provided quick start guides for importing data into R as well as converting your data into a tibble data format, which is the best and most modern way to work with your data. We also described crucial steps to reshape your data with R for easier analyses.

Here, you’ll learn how to reorder the columns of your data table by either column positions or column names.

Preliminary tasks

Launch RStudio as described here: Running RStudio and setting up your working directory
Prepare your data as described here: Best practices for preparing your data and save it in an external .txt tab or .csv files
Import your data into R as described here: Fast reading of data from txt|csv files into R: readr package.

# Create my_data
my_data <- iris

# Convert to a tibble
library("tibble")
my_data <- as_data_frame(my_data)

# Print
my_data

Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          (dbl)       (dbl)        (dbl)       (dbl)  (fctr)
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...

Reorder column by position

# Get column names
colnames(my_data)

[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"

my_data contains 5 columns ordered as follows:

Sepal.Length
Sepal.Width
Petal.Length
Petal.Width
Species

But we want:

the variable “Species” to be the first column (1)
the variable “Petal.Width” to be the second column (2)

It’s possible to reorder the columns by position as follows:

my_data2 <- my_data[, c(5, 4, 1, 2, 3)]
my_data2

Source: local data frame [150 x 5]

   Species Petal.Width Sepal.Length Sepal.Width Petal.Length
    (fctr)       (dbl)        (dbl)       (dbl)        (dbl)
1   setosa         0.2          5.1         3.5          1.4
2   setosa         0.2          4.9         3.0          1.4
3   setosa         0.2          4.7         3.2          1.3
4   setosa         0.2          4.6         3.1          1.5
5   setosa         0.2          5.0         3.6          1.4
6   setosa         0.4          5.4         3.9          1.7
7   setosa         0.3          4.6         3.4          1.4
8   setosa         0.2          5.0         3.4          1.5
9   setosa         0.2          4.4         2.9          1.4
10  setosa         0.1          4.9         3.1          1.5
..     ...         ...          ...         ...          ...

Reorder column by name

col_order <- c("Species", "Petal.Width", "Sepal.Length",
               "Sepal.Width", "Petal.Length")

my_data2 <- my_data[, col_order]
my_data2

Source: local data frame [150 x 5]

   Species Petal.Width Sepal.Length Sepal.Width Petal.Length
    (fctr)       (dbl)        (dbl)       (dbl)        (dbl)
1   setosa         0.2          5.1         3.5          1.4
2   setosa         0.2          4.9         3.0          1.4
3   setosa         0.2          4.7         3.2          1.3
4   setosa         0.2          4.6         3.1          1.5
5   setosa         0.2          5.0         3.6          1.4
6   setosa         0.4          5.4         3.9          1.7
7   setosa         0.3          4.6         3.4          1.4
8   setosa         0.2          5.0         3.4          1.5
9   setosa         0.2          4.4         2.9          1.4
10  setosa         0.1          4.9         3.1          1.5
..     ...         ...          ...         ...          ...
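With many columns, typing every name is tedious. One possible shortcut, sketched here on the built-in iris data, is to name only the columns that should come first and let setdiff() supply the rest. The object names front and reordered are hypothetical.

```r
# Columns to place first (names taken from the iris data)
front <- c("Species", "Petal.Width")

# setdiff() returns the remaining column names in their original order
col_order <- c(front, setdiff(names(iris), front))
reordered <- iris[, col_order]
names(reordered)
```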

Summary

It’s possible to reorder columns by either column position (i.e., number) or column names.

References

Infos

This analysis has been performed using R (ver. 3.2.3).

Data Manipulation in R

Thu, 14 Apr 2016 18:39:11 +0200

Read the articles below

Computing and Adding new Variables to a Data Frame in R

Preliminary tasks
Install and load dplyr package for renaming columns
dplyr::mutate(): Add new variables by preserving existing ones
dplyr::transmute(): Make new variables by dropping existing ones
Use mutate() and transmute() programmatically inside a function:
transform(): R base function to compute and add new variables
Summary
Related articles
Infos

Identifying and Removing Duplicate Data in R

Preliminary tasks
R base functions
Find and drop duplicate elements: duplicated()
Extract unique elements: unique()
Remove duplicate rows using dplyr
Summary
Related articles
Infos

Subsetting Data Frame Columns in R

Preliminary tasks
Install and load dplyr package
Selecting column by position
Select columns by names
Drop columns
Use select() programmatically inside an R function
Summary
Related articles
Infos

Subsetting Data Frame Rows in R

Preliminary tasks
Install and load dplyr package
Extracting rows by position: dplyr::slice()
Extracting rows by criteria: dplyr::filter()
Logical comparisons
Extracting rows based on logical criteria
Removing missing values
Using filter() programmatically inside an R function
Extracting rows by criteria with R base functions: subset()
Select random rows from a table
Select top n rows ordered by a variable
Summary
Related articles
Infos

Renaming Data Frame Columns in R

Preliminary tasks
Install and load dplyr package for renaming columns
Renaming columns with dplyr::rename()
Renaming columns with dplyr::select()
Renaming columns with R base functions
Summary
Related articles
Infos

Reordering Data Frame Rows in R

Preliminary tasks
Install and load dplyr package
Reorder rows with dplyr::arrange()
Reorder rows with R base function order()
Summary
Related articles
Infos

Reordering Data Frame Columns in R

Preliminary tasks
Reorder column by position
Reorder column by name
Summary
Related articles
References
Infos
