<?xml version="1.0" encoding="UTF-8" ?>
<!-- RSS generated by PHPBoost on Tue, 28 Apr 2026 18:09:36 +0200 -->

<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title><![CDATA[Easy Guides]]></title>
		<atom:link href="https://www.sthda.com/english/syndication/rss/wiki/41" rel="self" type="application/rss+xml"/>
		<link>https://www.sthda.com</link>
		<description><![CDATA[Last articles of the category: Data Manipulation in R]]></description>
		<copyright>(C) 2005-2026 PHPBoost</copyright>
		<language>en</language>
		<generator>PHPBoost</generator>
		
		
		<item>
			<title><![CDATA[Computing and Adding new Variables to a Data Frame in R]]></title>
			<link>https://www.sthda.com/english/wiki/computing-and-adding-new-variables-to-a-data-frame-in-r</link>
			<guid>https://www.sthda.com/english/wiki/computing-and-adding-new-variables-to-a-data-frame-in-r</guid>
			<description><![CDATA[<!-- START HTML -->

            
  <div id="rdoc">


<div id="TOC">
<ul>
<li><a href="#pleleminary-tasks">Preliminary tasks</a></li>
<li><a href="#install-and-load-dplyr-package-for-renaming-columns">Install and load dplyr package</a></li>
<li><a href="#dplyrmutate-add-new-variables-by-preserving-existing-ones">dplyr::mutate(): Add new variables by preserving existing ones</a></li>
<li><a href="#dplyrtransmute-make-new-variables-by-dropping-existing-ones">dplyr::transmute(): Make new variables by dropping existing ones</a></li>
<li><a href="#use-mutate-and-transmute-programmatically-inside-a-function">Use mutate() and transmute() programmatically inside a function</a></li>
<li><a href="#transform-r-base-function-to-compute-and-add-new-variables">transform(): R base function to compute and add new variables</a></li>
<li><a href="#summary">Summary</a></li>
<li><a href="#related-articles">Related articles</a></li>
<li><a href="#infos">Infos</a></li>
</ul>
</div>

<p>
</p>
<p>Previously, we described the <a href="https://www.sthda.com/english/english/wiki/r-basics-quick-and-easy">essentials of R programming</a> and provided quick start guides for <a href="https://www.sthda.com/english/english/wiki/importing-data-into-r">importing data</a> into <strong>R</strong> as well as converting your data into a <a href="https://www.sthda.com/english/english/wiki/tibble-data-format-in-r-best-and-modern-way-to-work-with-your-data">tibble data format</a>, which is the modern, conventional way to work with your data. We also described <a href="https://www.sthda.com/english/wiki/tidyr-crutial-step-reshaping-data-with-r-for%20easier-analyses">crucial steps to reshape your data with R</a> for easier analyses.</p>
<br/>
<div class="block">
Here, you’ll learn how to compute and <strong>add new variables</strong> to a <strong>data frame</strong> in <strong>R</strong>. This can be done easily using the functions <strong>mutate</strong>() and <strong>transmute</strong>() in the <strong>dplyr</strong> R package.
</div>
<p><br/></p>
<ul>
<li><strong>mutate</strong>(): Computes and adds new variable(s). Preserves existing variables. It’s similar to the R base function <strong>transform</strong>().</li>
<li><strong>transmute</strong>(): Computes new variable(s). Drops existing variables.</li>
</ul>
<p><img src="https://www.sthda.com/english/sthda/RDoc/images/computing-adding-new-variables.png" alt="Computing and Adding New Variables to a Data Frame in R" /> <br/> <span class="small">Figure adapted from <a href="https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf">RStudio data wrangling cheatsheet</a></span></p>
<div id="pleleminary-tasks" class="section level1">
<h1>Preliminary tasks</h1>
<ol style="list-style-type: decimal">
<li><p><strong>Launch RStudio</strong> as described here: <a href="https://www.sthda.com/english/english/wiki/running-rstudio-and-setting-up-your-working-directory-easy-r-programming">Running RStudio and setting up your working directory</a></p></li>
<li><p><strong>Prepare your data</strong> as described here: <a href="https://www.sthda.com/english/english/wiki/best-practices-in-preparing-data-files-for-importing-into-r">Best practices for preparing your data</a> and save it in an external tab-delimited .txt or .csv file</p></li>
<li><p><strong>Import your data</strong> into <strong>R</strong> as described here: <a href="https://www.sthda.com/english/english/wiki/fast-reading-of-data-from-txt-csv-files-into-r-readr-package">Fast reading of data from txt|csv files into R: readr package</a>.</p></li>
</ol>
<p><span class="success">Here, we’ll use the <a href="https://www.sthda.com/english/english/wiki/r-built-in-data-sets#iris">R built-in iris data set</a>, which we start by converting to a <a href="https://www.sthda.com/english/english/wiki/tibble-data-format-in-r-best-and-modern-way-to-work-with-data"><strong>tibble data frame</strong> (<strong>tbl_df</strong>)</a>. A tibble is a modern rethinking of the data frame that provides a nicer printing method, which is useful when working with large data sets.</span></p>
<pre class="r"><code># Create my_data
my_data <- iris[, -5]

# Convert to a tibble
library("tibble")
my_data <- as_tibble(my_data)  # as_data_frame() is deprecated in recent tibble versions

# Print
my_data</code></pre>
<pre><code>Source: local data frame [150 x 4]

   Sepal.Length Sepal.Width Petal.Length Petal.Width
          <dbl>       <dbl>        <dbl>       <dbl>
1           5.1         3.5          1.4         0.2
2           4.9         3.0          1.4         0.2
3           4.7         3.2          1.3         0.2
4           4.6         3.1          1.5         0.2
5           5.0         3.6          1.4         0.2
6           5.4         3.9          1.7         0.4
7           4.6         3.4          1.4         0.3
8           5.0         3.4          1.5         0.2
9           4.4         2.9          1.4         0.2
10          4.9         3.1          1.5         0.1
..          ...         ...          ...         ...</code></pre>
</div>
<div id="install-and-load-dplyr-package-for-renaming-columns" class="section level1">
<h1>Install and load dplyr package</h1>
<ul>
<li>Install <strong>dplyr</strong></li>
</ul>
<pre class="r"><code>install.packages("dplyr")</code></pre>
<ul>
<li>Load <strong>dplyr</strong>:</li>
</ul>
<pre class="r"><code>library("dplyr")</code></pre>
</div>
<div id="dplyrmutate-add-new-variables-by-preserving-existing-ones" class="section level1">
<h1>dplyr::mutate(): Add new variables by preserving existing ones</h1>
<ul>
<li>Add a new column (sepal_by_petal_l) while preserving the existing ones:</li>
</ul>
<pre class="r"><code>mutate(my_data,
       sepal_by_petal_l = Sepal.Length/Petal.Length
       )</code></pre>
<pre><code>Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width sepal_by_petal_l
          (dbl)       (dbl)        (dbl)       (dbl)            (dbl)
1           5.1         3.5          1.4         0.2         3.642857
2           4.9         3.0          1.4         0.2         3.500000
3           4.7         3.2          1.3         0.2         3.615385
4           4.6         3.1          1.5         0.2         3.066667
5           5.0         3.6          1.4         0.2         3.571429
6           5.4         3.9          1.7         0.4         3.176471
7           4.6         3.4          1.4         0.3         3.285714
8           5.0         3.4          1.5         0.2         3.333333
9           4.4         2.9          1.4         0.2         3.142857
10          4.9         3.1          1.5         0.1         3.266667
..          ...         ...          ...         ...              ...</code></pre>
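<p>The following is a minimal sketch, assuming dplyr is installed; the names my_data2 and sepal_by_petal_w are only illustrative. It shows that mutate() returns a new data frame and leaves its input unchanged, so the result has to be assigned if you want to keep the new columns. Several columns can be created in a single call.</p>
<pre class="r"><code>library("dplyr")

my_data <- iris[, -5]

# mutate() does not modify my_data in place; store the result to keep it
my_data2 <- mutate(my_data,
                   sepal_by_petal_l = Sepal.Length / Petal.Length,
                   sepal_by_petal_w = Sepal.Width / Petal.Width)

ncol(my_data)   # the original still has its 4 columns
ncol(my_data2)  # the result has the 4 original columns plus the 2 new ones</code></pre>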
</div>
<div id="dplyrtransmute-make-new-variables-by-dropping-existing-ones" class="section level1">
<h1>dplyr::transmute(): Make new variables by dropping existing ones</h1>
<ul>
<li>Compute new columns (sepal_by_petal_*) and drop the existing ones:</li>
</ul>
<pre class="r"><code>transmute(my_data, 
            sepal_by_petal_l = Sepal.Length/Petal.Length,
            sepal_by_petal_w = Sepal.Width/Petal.Width
            )</code></pre>
<pre><code>Source: local data frame [150 x 2]

   sepal_by_petal_l sepal_by_petal_w
              (dbl)            (dbl)
1          3.642857         17.50000
2          3.500000         15.00000
3          3.615385         16.00000
4          3.066667         15.50000
5          3.571429         18.00000
6          3.176471          9.75000
7          3.285714         11.33333
8          3.333333         17.00000
9          3.142857         14.50000
10         3.266667         31.00000
..              ...              ...</code></pre>
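<p>As a rough illustration, assuming dplyr 1.0.0 or later (where mutate() has a .keep argument), the sketch below checks that transmute() keeps only the newly computed column and that mutate() with .keep = "none" gives the same column set. The object names are hypothetical.</p>
<pre class="r"><code>library("dplyr")

my_data <- iris[, -5]

# transmute() keeps only the columns computed in the call
res_transmute <- transmute(my_data,
                           sepal_by_petal_l = Sepal.Length / Petal.Length)
names(res_transmute)

# mutate() with .keep = "none" also drops the columns that were not created
res_mutate <- mutate(my_data,
                     sepal_by_petal_l = Sepal.Length / Petal.Length,
                     .keep = "none")
identical(names(res_transmute), names(res_mutate))</code></pre>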
</div>
<div id="use-mutate-and-transmute-programmatically-inside-a-function" class="section level1">
<h1>Use mutate() and transmute() programmatically inside a function</h1>
<br/>
<div class="block">
<strong>mutate</strong>() and <strong>transmute</strong>() are best suited for interactive use. The functions <strong>mutate_</strong>() and <strong>transmute_</strong>() should be used when calling from a function. In this case the input must be “quoted”. Note that these underscore variants have been deprecated since dplyr 0.7.0 in favor of tidy evaluation.
</div>
<p><br/></p>
<p>There are three ways to quote inputs that dplyr understands:</p>
<ul>
<li>With a formula, ~Sepal.Length.</li>
<li>With quote(), quote(Sepal.Length).</li>
<li>As a string: “Sepal.Length”.</li>
</ul>
<pre class="r"><code># Use formula
mutate_(my_data, 
            sepal_by_petal_l = ~Sepal.Length/Petal.Length,
            sepal_by_petal_w = ~Sepal.Width/Petal.Width
            )

# Or use quote
transmute_(my_data, 
            sepal_by_petal_l = quote(Sepal.Length/Petal.Length),
            sepal_by_petal_w = quote(Sepal.Width/Petal.Width)
            )

# or, this
transmute_(my_data, 
            sepal_by_petal_l = "Sepal.Length/Petal.Length",
            sepal_by_petal_w = "Sepal.Width/Petal.Width"
            )</code></pre>
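<p>In current dplyr versions, the same goal is usually reached with tidy evaluation rather than the underscore functions. The sketch below assumes dplyr 1.0.0 or later; add_ratio() and add_ratio_chr() are hypothetical helpers written for this example, not part of dplyr.</p>
<pre class="r"><code>library("dplyr")

# Hypothetical helper taking unquoted column names:
# {{ }} forwards the function arguments to mutate()
add_ratio <- function(data, numerator, denominator) {
  mutate(data, ratio = {{ numerator }} / {{ denominator }})
}

# Hypothetical helper taking column names as strings:
# the .data pronoun looks the columns up by name
add_ratio_chr <- function(data, numerator, denominator) {
  mutate(data, ratio = .data[[numerator]] / .data[[denominator]])
}

my_data <- iris[, -5]
res1 <- add_ratio(my_data, Sepal.Length, Petal.Length)
res2 <- add_ratio_chr(my_data, "Sepal.Length", "Petal.Length")

# Both helpers compute the same new column
identical(res1$ratio, res2$ratio)</code></pre>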
</div>
<div id="transform-r-base-function-to-compute-and-add-new-variables" class="section level1">
<h1>transform(): R base function to compute and add new variables</h1>
<p><strong>dplyr::mutate</strong>() works similarly to the R base function <strong>transform</strong>(), except that in mutate() you can refer to variables you’ve just created. This is not possible in transform().</p>
<pre class="r"><code>my_data2 <- transform(my_data, neg_sepal_length = -Sepal.Length)
head(my_data2)</code></pre>
<pre><code>  Sepal.Length Sepal.Width Petal.Length Petal.Width neg_sepal_length
1          5.1         3.5          1.4         0.2             -5.1
2          4.9         3.0          1.4         0.2             -4.9
3          4.7         3.2          1.3         0.2             -4.7
4          4.6         3.1          1.5         0.2             -4.6
5          5.0         3.6          1.4         0.2             -5.0
6          5.4         3.9          1.7         0.4             -5.4</code></pre>
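<p>The difference described above can be sketched as follows, assuming dplyr is installed and that no object named ratio exists in your workspace. The column names ratio and double_ratio are only illustrative.</p>
<pre class="r"><code>library("dplyr")

my_data <- iris[, -5]

# mutate(): double_ratio can use ratio, created earlier in the same call
res <- mutate(my_data,
              ratio = Sepal.Length / Petal.Length,
              double_ratio = 2 * ratio)
head(res$double_ratio, 3)

# transform(): all expressions are evaluated against the original columns,
# so ratio is not available yet when double_ratio is computed
res_base <- try(
  transform(my_data,
            ratio = Sepal.Length / Petal.Length,
            double_ratio = 2 * ratio),
  silent = TRUE)

# Check whether the transform() call failed
inherits(res_base, "try-error")</code></pre>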
</div>
<div id="summary" class="section level1">
<h1>Summary</h1>
<br/>
<div class="block">
<ul>
<li><strong>dplyr::mutate</strong>(iris, sepal = 2*Sepal.Length): Computes and appends new variable(s).</li>
<li><strong>dplyr::transmute</strong>(iris, sepal = 2*Sepal.Length): Makes new variable(s) and drops existing ones.</li>
<li><strong>transform</strong>(iris, sepal = 2*Sepal.Length): R base function similar to mutate().</li>
</ul>
</div>
<p><br/></p>
</div>
<div id="related-articles" class="section level1">
<h1>Related articles</h1>
<ul>
<li><a href="https://www.sthda.com/english/english/wiki/r-basics-quick-and-easy">R Programming Basics</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/importing-data-into-r">Importing Data into R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/exporting-data-from-r">Exporting Data from R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/preparing-and-reshaping-data-in-r-for-easier-analyses">Preparing and Reshaping Data in R for Easier Analyses</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/reordering-data-frame-columns-in-r">Reordering Data Frame Columns in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/reordering-data-frame-rows-in-r">Reordering Data Frame Rows in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/renaming-data-frame-columns-in-r">Renaming Data Frame Columns in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/subsetting-data-frame-rows-in-r">Subsetting Data Frame Rows in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/subsetting-data-frame-columns-in-r">Subsetting Data Frame Columns in R</a></li>
</ul>
</div>
<div id="infos" class="section level1">
<h1>Infos</h1>
<p><span class="warning"> This analysis has been performed using R (ver. 3.2.4). </span></p>
</div>

<script>jQuery(document).ready(function () {
    jQuery('h1').addClass('wiki_paragraph1');
    jQuery('h2').addClass('wiki_paragraph2');
    jQuery('h3').addClass('wiki_paragraph3');
    jQuery('h4').addClass('wiki_paragraph4');
    });//add phpboost class to header</script>
<style>.content{padding:0px;}</style>
</div><!--end rdoc-->


<!-- END HTML -->]]></description>
			<pubDate>Sun, 01 May 2016 10:43:30 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Identifying and Removing Duplicate Data in R]]></title>
			<link>https://www.sthda.com/english/wiki/identifying-and-removing-duplicate-data-in-r</link>
			<guid>https://www.sthda.com/english/wiki/identifying-and-removing-duplicate-data-in-r</guid>
			<description><![CDATA[<!-- START HTML -->

  <div id="rdoc">


<div id="TOC">
<ul>
<li><a href="#pleleminary-tasks">Preliminary tasks</a></li>
<li><a href="#r-base-functions">R base functions</a><ul>
<li><a href="#find-and-drop-duplicate-elements-duplicated">Find and drop duplicate elements: duplicated()</a></li>
<li><a href="#extract-unique-elements-unique">Extract unique elements: unique()</a></li>
</ul></li>
<li><a href="#remove-duplicate-rows-using-dplyr">Remove duplicate rows using dplyr</a></li>
<li><a href="#summary">Summary</a></li>
<li><a href="#related-articles">Related articles</a></li>
<li><a href="#infos">Infos</a></li>
</ul>
</div>

<p>
</p>
<p>Previously, we described the <a href="https://www.sthda.com/english/english/wiki/r-basics-quick-and-easy">essentials of R programming</a> and provided quick start guides for <a href="https://www.sthda.com/english/english/wiki/importing-data-into-r">importing data</a> into <strong>R</strong> as well as converting your data into a <a href="https://www.sthda.com/english/english/wiki/tibble-data-format-in-r-best-and-modern-way-to-work-with-your-data">tibble data format</a>, which is the best modern way to work with your data.</p>
<br/>
<div class="block">
Here, you’ll learn how to <strong>remove duplicate</strong> data using <strong>R</strong> base functions (<strong>duplicated</strong>() and <strong>unique</strong>()) as well as the function <strong>distinct</strong>() [in the <strong>dplyr</strong> package].
</div>
<p><br/></p>
<p><img src="https://www.sthda.com/english/sthda/RDoc/images/remove-duplicate-data-r.png" alt="Identifying and Removing Duplicate Data in R" /> <br/></p>
<div id="pleleminary-tasks" class="section level1">
<h1>Preliminary tasks</h1>
<ol style="list-style-type: decimal">
<li><p><strong>Launch RStudio</strong> as described here: <a href="https://www.sthda.com/english/english/wiki/running-rstudio-and-setting-up-your-working-directory-easy-r-programming">Running RStudio and setting up your working directory</a></p></li>
<li><p><strong>Prepare your data</strong> as described here: <a href="https://www.sthda.com/english/english/wiki/best-practices-in-preparing-data-files-for-importing-into-r">Best practices for preparing your data</a> and save it in an external tab-delimited .txt or .csv file</p></li>
<li><p><strong>Import your data</strong> into <strong>R</strong> as described here: <a href="https://www.sthda.com/english/english/wiki/fast-reading-of-data-from-txt-csv-files-into-r-readr-package">Fast reading of data from txt|csv files into R: readr package</a>.</p></li>
</ol>
<p><span class="success">Here, we’ll use the <a href="https://www.sthda.com/english/english/wiki/r-built-in-data-sets#iris">R built-in iris data set</a>, which we start by converting to a <a href="https://www.sthda.com/english/english/wiki/tibble-data-format-in-r-best-and-modern-way-to-work-with-data"><strong>tibble data frame</strong> (<strong>tbl_df</strong>)</a>. A tibble is a modern rethinking of the data frame that provides a nicer printing method, which is useful when working with large data sets.</span></p>
<pre class="r"><code># Create my_data
my_data <- iris

# Convert to a tibble
library("tibble")
my_data <- as_tibble(my_data)  # as_data_frame() is deprecated in recent tibble versions

# Print
my_data</code></pre>
<pre><code>Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...</code></pre>
</div>
<div id="r-base-functions" class="section level1">
<h1>R base functions</h1>
<br/>
<div class="block">
In this section, we’ll describe the function <strong>unique</strong>() [for extracting unique elements] and the function <strong>duplicated</strong>() [for identifying duplicated elements].
</div>
<p><br/></p>
<div id="find-and-drop-duplicate-elements-duplicated" class="section level2">
<h2>Find and drop duplicate elements: duplicated()</h2>
<p>The function <strong>duplicated</strong>() returns a logical vector in which TRUE marks the elements of a vector (or the rows of a data frame) that duplicate an earlier element.</p>
<p>Given the following vector:</p>
<pre class="r"><code>x <- c(1, 1, 4, 5, 4, 6)</code></pre>
<ul>
<li>To find the position of duplicate elements in x, use this:</li>
</ul>
<pre class="r"><code>duplicated(x)</code></pre>
<pre><code>[1] FALSE  TRUE FALSE FALSE  TRUE FALSE</code></pre>
<ul>
<li>Extract duplicate elements:</li>
</ul>
<pre class="r"><code>x[duplicated(x)]</code></pre>
<pre><code>[1] 1 4</code></pre>
<ul>
<li>If you want to remove duplicated elements, use !duplicated(), where ! is a logical negation:</li>
</ul>
<pre class="r"><code>x[!duplicated(x)]</code></pre>
<pre><code>[1] 1 4 5 6</code></pre>
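<p>By default, duplicated() keeps the first occurrence and flags the later copies. The short base R sketch below shows two related options on the same vector: the fromLast argument, which keeps the last occurrence instead, and anyDuplicated(), which returns the index of the first duplicate.</p>
<pre class="r"><code>x <- c(1, 1, 4, 5, 4, 6)

# Scan from the end, so that the last occurrence of each value is kept
duplicated(x, fromLast = TRUE)

# Flag every element whose value occurs more than once,
# including the first occurrence
duplicated(x) | duplicated(x, fromLast = TRUE)

# Index of the first duplicated element (0 if there is none)
anyDuplicated(x)</code></pre>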
<ul>
<li>Using duplicated() in the same way, you can remove duplicate rows from a data frame based on the values of a column, as follows:</li>
</ul>
<pre class="r"><code># Remove duplicates based on the Sepal.Width column
my_data[!duplicated(my_data$Sepal.Width), ]</code></pre>
<pre><code>Source: local data frame [23 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           4.4         2.9          1.4         0.2  setosa
9           5.4         3.7          1.5         0.2  setosa
10          5.8         4.0          1.2         0.2  setosa
..          ...         ...          ...         ...     ...</code></pre>
<p><span class="warning"><strong>!</strong> is a logical negation. <strong>!duplicated</strong>() means that we don’t want duplicate rows.</span></p>
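<p>The same idea extends to several columns: duplicated() applied to a data frame compares whole rows, so you can pass it only the columns of interest. This is a minimal base R sketch; key_cols is a hypothetical name for the vector of column names.</p>
<pre class="r"><code>my_data <- iris

# Columns that define what counts as a duplicate
key_cols <- c("Sepal.Length", "Petal.Width")

# Keep the first row for each combination of values in key_cols
res <- my_data[!duplicated(my_data[, key_cols]), ]

# Number of rows kept
nrow(res)</code></pre>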
</div>
<div id="extract-unique-elements-unique" class="section level2">
<h2>Extract unique elements: unique()</h2>
<p>Given the following vector:</p>
<pre class="r"><code>x <- c(1, 1, 4, 5, 4, 6)</code></pre>
<p>You can extract unique elements as follows:</p>
<pre class="r"><code>unique(x)</code></pre>
<pre><code>[1] 1 4 5 6</code></pre>
<p>It’s also possible to apply <strong>unique</strong>() to a data frame to remove duplicated rows, as follows:</p>
<pre class="r"><code>unique(my_data)</code></pre>
<pre><code>Source: local data frame [149 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...</code></pre>
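<p>To see which rows unique() drops, you can combine it with duplicated(). The base R sketch below uses the plain iris data frame and checks that unique() keeps the same rows as indexing with !duplicated().</p>
<pre class="r"><code>my_data <- iris

# Rows that repeat an earlier row
my_data[duplicated(my_data), ]

# Number of rows before and after removing duplicates
nrow(my_data)
nrow(unique(my_data))

# unique() keeps the same rows as indexing with !duplicated()
identical(unique(my_data), my_data[!duplicated(my_data), ])</code></pre>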
</div>
</div>
<div id="remove-duplicate-rows-using-dplyr" class="section level1">
<h1>Remove duplicate rows using dplyr</h1>
<br/>
<div class="block">
The function <strong>distinct</strong>() in the <strong>dplyr</strong> package can be used to keep only unique/distinct rows from a data frame. If there are duplicate rows, only the first row is preserved. It’s an efficient version of the R base function <strong>unique</strong>().
</div>
<p><br/></p>
<p>The <strong>dplyr</strong> package can be installed and loaded as follows:</p>
<pre class="r"><code># Install
install.packages("dplyr")

# Load
library("dplyr")</code></pre>
<ul>
<li>Remove duplicate rows based on all columns:</li>
</ul>
<pre class="r"><code>distinct(my_data)</code></pre>
<pre><code>Source: local data frame [149 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          (dbl)       (dbl)        (dbl)       (dbl)  (fctr)
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...</code></pre>
<ul>
<li>Remove duplicate rows based on certain columns (variables):</li>
</ul>
<pre class="r"><code># Remove duplicated rows based on Sepal.Length
# (.keep_all = TRUE keeps the other columns; needed in dplyr >= 0.5.0)
distinct(my_data, Sepal.Length, .keep_all = TRUE)</code></pre>
<pre><code>Source: local data frame [35 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          (dbl)       (dbl)        (dbl)       (dbl)  (fctr)
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.4         2.9          1.4         0.2  setosa
8           4.8         3.4          1.6         0.2  setosa
9           4.3         3.0          1.1         0.1  setosa
10          5.8         4.0          1.2         0.2  setosa
..          ...         ...          ...         ...     ...</code></pre>
<pre class="r"><code># Remove duplicated rows based on 
# Sepal.Length and Petal.Width
# (.keep_all = TRUE keeps the other columns; needed in dplyr >= 0.5.0)
distinct(my_data, Sepal.Length, Petal.Width, .keep_all = TRUE)</code></pre>
<pre><code>Source: local data frame [110 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          (dbl)       (dbl)        (dbl)       (dbl)  (fctr)
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           4.4         2.9          1.4         0.2  setosa
9           4.9         3.1          1.5         0.1  setosa
10          5.4         3.7          1.5         0.2  setosa
..          ...         ...          ...         ...     ...</code></pre>
<br/>
<div class="success">
<strong>distinct</strong>() is best suited for interactive use. The function <strong>distinct_</strong>() should be used when calling from a function. In this case the input must be “quoted”. Note that distinct_() has been deprecated since dplyr 0.7.0 in favor of tidy evaluation.
</div>
<p><br/></p>
<pre class="r"><code>distinct_(my_data,  "Sepal.Length", "Petal.Width")</code></pre>
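<p>With current dplyr versions, a function can pass column names to distinct() without the underscore variant. The following is one possible sketch, assuming dplyr 0.7.0 or later together with the rlang package it depends on; distinct_by() is a hypothetical helper, not a dplyr function.</p>
<pre class="r"><code>library("dplyr")

# Hypothetical helper: cols is a character vector of column names.
# rlang::syms() turns the names into symbols and !!! splices them
# into the distinct() call as if they had been typed unquoted.
distinct_by <- function(data, cols) {
  distinct(data, !!!rlang::syms(cols), .keep_all = TRUE)
}

res <- distinct_by(iris, c("Sepal.Length", "Petal.Width"))

# Number of rows kept; all original columns are retained
nrow(res)
ncol(res)</code></pre>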
</div>
<div id="summary" class="section level1">
<h1>Summary</h1>
<br/>
<div class="block">
<ul>
<li><p>Remove duplicate rows based on one or more column values: <strong>dplyr::distinct</strong>(my_data, Sepal.Length, .keep_all = TRUE)</p></li>
<li><p>R base function to extract unique elements from vectors and data frames: <strong>unique</strong>(my_data)</p></li>
<li>R base function to determine duplicate elements: <strong>duplicated</strong>(my_data)</li>
</ul>
</div>
<p><br/></p>
</div>
<div id="related-articles" class="section level1">
<h1>Related articles</h1>
<ul>
<li><a href="https://www.sthda.com/english/english/wiki/r-basics-quick-and-easy">R Programming Basics</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/importing-data-into-r">Importing Data into R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/exporting-data-from-r">Exporting Data from R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/preparing-and-reshaping-data-in-r-for-easier-analyses">Preparing and Reshaping Data in R for Easier Analyses</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/reordering-data-frame-columns-in-r">Reordering Data Frame Columns in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/reordering-data-frame-rows-in-r">Reordering Data Frame Rows in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/renaming-data-frame-columns-in-r">Renaming Data Frame Columns in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/subsetting-data-frame-rows-in-r">Subsetting Data Frame Rows in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/subsetting-data-frame-columns-in-r">Subsetting Data Frame Columns in R</a></li>
</ul>
</div>
<div id="infos" class="section level1">
<h1>Infos</h1>
<p><span class="warning"> This analysis has been performed using R (ver. 3.2.3). </span></p>
</div>

<script>jQuery(document).ready(function () {
    jQuery('h1').addClass('wiki_paragraph1');
    jQuery('h2').addClass('wiki_paragraph2');
    jQuery('h3').addClass('wiki_paragraph3');
    jQuery('h4').addClass('wiki_paragraph4');
    });//add phpboost class to header</script>
<style>.content{padding:0px;}</style>
</div><!--end rdoc-->

<!-- END HTML -->]]></description>
			<pubDate>Thu, 14 Apr 2016 22:53:19 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Subsetting Data Frame Columns in R]]></title>
			<link>https://www.sthda.com/english/wiki/subsetting-data-frame-columns-in-r</link>
			<guid>https://www.sthda.com/english/wiki/subsetting-data-frame-columns-in-r</guid>
			<description><![CDATA[<!-- START HTML -->

  <div id="rdoc">

<div id="TOC">
<ul>
<li><a href="#pleleminary-tasks">Preliminary tasks</a></li>
<li><a href="#install-and-load-dplyr-package">Install and load dplyr package</a></li>
<li><a href="#selecting-column-by-position">Selecting column by position</a></li>
<li><a href="#select-columns-by-names">Select columns by names</a></li>
<li><a href="#drop-columns">Drop columns</a></li>
<li><a href="#use-select-programmatically-inside-an-r-function">Use select() programmatically inside an R function</a></li>
<li><a href="#summary">Summary</a></li>
<li><a href="#related-articles">Related articles</a></li>
<li><a href="#infos">Infos</a></li>
</ul>
</div>

<p>
</p>
<p>Previously, we described the <a href="https://www.sthda.com/english/english/wiki/r-basics-quick-and-easy">essentials of R programming</a> and provided quick start guides for <a href="https://www.sthda.com/english/english/wiki/importing-data-into-r">importing data</a> into <strong>R</strong> as well as converting your data into a <a href="https://www.sthda.com/english/english/wiki/tibble-data-format-in-r-best-and-modern-way-to-work-with-your-data">tibble data format</a>, which is the best modern way to work with your data. We next described <a href="https://www.sthda.com/english/wiki/tidyr-crutial-step-reshaping-data-with-r-for%20easier-analyses">crucial steps to reshape your data with R</a> for easier analyses. Additionally, we provided quick start guides for <a href="https://www.sthda.com/english/english/wiki/subsetting-data-frame-rows-in-r">subsetting data frame rows</a> based on some logical criteria.</p>
<br/>
<div class="block">
Here, you’ll learn how to <strong>subset</strong> data frame <strong>columns</strong> (i.e., variables) by names using the function <strong>select</strong>() [in the <strong>dplyr</strong> package].
</div>
<p><br/></p>
<p><img src="https://www.sthda.com/english/sthda/RDoc/images/subsetting-columns-in-r.png" alt="Subsetting Columns of a Data Frame in R" /> <br/></p>
<div id="pleleminary-tasks" class="section level1">
<h1>Preliminary tasks</h1>
<ol style="list-style-type: decimal">
<li><p><strong>Launch RStudio</strong> as described here: <a href="https://www.sthda.com/english/english/wiki/running-rstudio-and-setting-up-your-working-directory-easy-r-programming">Running RStudio and setting up your working directory</a></p></li>
<li><p><strong>Prepare your data</strong> as described here: <a href="https://www.sthda.com/english/english/wiki/best-practices-in-preparing-data-files-for-importing-into-r">Best practices for preparing your data</a> and save it in an external tab-delimited .txt or .csv file</p></li>
<li><p><strong>Import your data</strong> into <strong>R</strong> as described here: <a href="https://www.sthda.com/english/english/wiki/fast-reading-of-data-from-txt-csv-files-into-r-readr-package">Fast reading of data from txt|csv files into R: readr package</a>.</p></li>
</ol>
<p><span class="success">Here, we’ll use the <a href="https://www.sthda.com/english/english/wiki/r-built-in-data-sets#iris">R built-in iris data set</a>, which we start by converting to a <a href="https://www.sthda.com/english/english/wiki/tibble-data-format-in-r-best-and-modern-way-to-work-with-data"><strong>tibble data frame</strong> (<strong>tbl_df</strong>)</a>. A tibble is a modern rethinking of the data frame that provides a nicer printing method, which is useful when working with large data sets.</span></p>
<pre class="r"><code># Create my_data
my_data <- iris

# Convert to a tibble
library("tibble")
my_data <- as_tibble(my_data)  # as_data_frame() is deprecated in recent tibble versions

# Print
my_data</code></pre>
<pre><code>Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          &lt;dbl&gt;       &lt;dbl&gt;        &lt;dbl&gt;       &lt;dbl&gt;  &lt;fctr&gt;
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...</code></pre>
</div>
<div id="install-and-load-dplyr-package" class="section level1">
<h1>Install and load dplyr package</h1>
<ul>
<li>Install <strong>dplyr</strong></li>
</ul>
<pre class="r"><code>install.packages("dplyr")</code></pre>
<ul>
<li>Load <strong>dplyr</strong>:</li>
</ul>
<pre class="r"><code>library("dplyr")</code></pre>
</div>
<div id="selecting-column-by-position" class="section level1">
<h1>Selecting column by position</h1>
<ul>
<li>Select columns 1 to 2:</li>
</ul>
<pre class="r"><code>my_data[, 1:2]</code></pre>
<ul>
<li>Select columns 1 and 3 but not 2:</li>
</ul>
<pre class="r"><code>my_data[, c(1, 3)]</code></pre>
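<p>Position-based indexing behaves slightly differently for base data frames and tibbles when a single column is requested. The following is a minimal sketch, assuming only the built-in iris data and the tibble package; the object names df, tb and the three result names are illustrative.</p>
<pre class="r"><code>library("tibble")

df <- iris               # base data frame
tb <- as_tibble(iris)    # tibble

# For a base data frame, a single column extracted with [, j] collapses to a vector
one_col_df <- df[, 1]
class(one_col_df)        # "numeric"

# drop = FALSE keeps the data frame structure
one_col_kept <- df[, 1, drop = FALSE]

# A tibble stays a tibble, even with a single column
one_col_tb <- tb[, 1]</code></pre>
<p>This is one reason the examples here convert iris to a tibble first: the result of column subsetting keeps a predictable type.</p>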
</div>
<div id="select-columns-by-names" class="section level1">
<h1>Select columns by names</h1>
<ul>
<li>Select columns by names: Sepal.Length and Petal.Length</li>
</ul>
<pre class="r"><code>select(my_data, Sepal.Length, Petal.Length)</code></pre>
<pre><code>Source: local data frame [150 x 2]

   Sepal.Length Petal.Length
          (dbl)        (dbl)
1           5.1          1.4
2           4.9          1.4
3           4.7          1.3
4           4.6          1.5
5           5.0          1.4
6           5.4          1.7
7           4.6          1.4
8           5.0          1.5
9           4.4          1.4
10          4.9          1.5
..          ...          ...</code></pre>
<ul>
<li>Select all columns from Sepal.Length to Petal.Length</li>
</ul>
<pre class="r"><code>select(my_data, Sepal.Length:Petal.Length)</code></pre>
<pre><code>Source: local data frame [150 x 3]

   Sepal.Length Sepal.Width Petal.Length
          (dbl)       (dbl)        (dbl)
1           5.1         3.5          1.4
2           4.9         3.0          1.4
3           4.7         3.2          1.3
4           4.6         3.1          1.5
5           5.0         3.6          1.4
6           5.4         3.9          1.7
7           4.6         3.4          1.4
8           5.0         3.4          1.5
9           4.4         2.9          1.4
10          4.9         3.1          1.5
..          ...         ...          ...</code></pre>
<p><span class="success">There are several special functions that can be used inside select(): <strong>starts_with</strong>(), <strong>ends_with</strong>(), <strong>contains</strong>(), <strong>matches</strong>(), <strong>one_of</strong>(), etc.</span></p>
<pre class="r"><code># Select columns whose names start with "Petal"
select(my_data, starts_with("Petal"))

# Select columns whose names end with "Width"
select(my_data, ends_with("Width"))

# Select columns whose names contain "etal"
select(my_data, contains("etal"))

# Select columns whose names match a regular expression
select(my_data, matches(".t."))

# Select variables provided in a character vector
select(my_data, one_of(c("Sepal.Length", "Petal.Length")))</code></pre>
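<p>To see what each helper actually matches, it can help to look only at the resulting column names. The sketch below assumes a recent dplyr release (1.0.0 or later) for all_of(); the object names are illustrative.</p>
<pre class="r"><code>library("dplyr")

starts  <- names(select(iris, starts_with("Petal")))
ends    <- names(select(iris, ends_with("Width")))
has     <- names(select(iris, contains("etal")))
pattern <- names(select(iris, matches(".t.")))

starts   # "Petal.Length" "Petal.Width"
ends     # "Sepal.Width"  "Petal.Width"
has      # "Petal.Length" "Petal.Width"
pattern  # every column except "Species", which contains no "t"

# all_of() takes a character vector and fails if a name is missing
wanted <- c("Sepal.Length", "Petal.Length")
chosen <- names(select(iris, all_of(wanted)))</code></pre>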
</div>
<div id="drop-columns" class="section level1">
<h1>Drop columns</h1>
<p><span class="warning">Note that, to remove a column from a data frame, prefix its name with a minus sign (<strong>-</strong>).</span></p>
<ul>
<li>Dropping Sepal.Length and Petal.Length:</li>
</ul>
<pre class="r"><code>select(my_data, -Sepal.Length, -Petal.Length)</code></pre>
<ul>
<li>Dropping columns from Sepal.Length to Petal.Length:</li>
</ul>
<pre class="r"><code>select(my_data, -(Sepal.Length:Petal.Length))</code></pre>
<pre><code>Source: local data frame [150 x 2]

   Petal.Width Species
         (dbl)  (fctr)
1          0.2  setosa
2          0.2  setosa
3          0.2  setosa
4          0.2  setosa
5          0.2  setosa
6          0.4  setosa
7          0.3  setosa
8          0.2  setosa
9          0.2  setosa
10         0.1  setosa
..         ...     ...</code></pre>
<ul>
<li>Dropping columns whose name starts with “Petal”:</li>
</ul>
<pre class="r"><code>select(my_data, -starts_with("Petal"))</code></pre>
<pre><code>Source: local data frame [150 x 3]

   Sepal.Length Sepal.Width Species
          (dbl)       (dbl)  (fctr)
1           5.1         3.5  setosa
2           4.9         3.0  setosa
3           4.7         3.2  setosa
4           4.6         3.1  setosa
5           5.0         3.6  setosa
6           5.4         3.9  setosa
7           4.6         3.4  setosa
8           5.0         3.4  setosa
9           4.4         2.9  setosa
10          4.9         3.1  setosa
..          ...         ...     ...</code></pre>
<p><span class="warning">Note that, if you want to drop columns by position, the syntax is as follows.</span></p>
<pre class="r"><code># Drop column 1
my_data[, -1]

# Drop columns 1 to 3
my_data[, -(1:3)]

# Drop columns 1 and 3 but not 2
my_data[, -c(1, 3)]</code></pre>
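<p>Negative positions can also be passed to select(), which gives the same columns as the bracket syntax. A small sketch, assuming the same tibble as above (object names are illustrative):</p>
<pre class="r"><code>library("dplyr")
library("tibble")

my_data <- as_tibble(iris)

# Drop columns 1 and 3 with base R brackets and with select()
by_bracket <- my_data[, -c(1, 3)]
by_select  <- select(my_data, -c(1, 3))

names(by_bracket)  # "Sepal.Width" "Petal.Width" "Species"
identical(names(by_bracket), names(by_select))</code></pre>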
</div>
<div id="use-select-programmatically-inside-an-r-function" class="section level1">
<h1>Use select() programmatically inside an R function</h1>
<p>dplyr uses non-standard evaluation (NSE), which is great for interactive use and saves you typing. Behind the scenes, in the dplyr version used here, NSE is powered by the <strong>lazyeval</strong> package.</p>
<br/>
<div class="block">
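<strong>select</strong>() is best suited for interactive use. In the dplyr version used here, the function <strong>select_</strong>() should be used when calling from a function; in this case, the input must be “quoted”. Note that <strong>select_</strong>() has been deprecated since dplyr 0.7.0 in favor of tidy evaluation.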
</div>
<p><br/></p>
<p>There are three ways to quote inputs that dplyr understands:</p>
<ul>
<li>With a formula, ~Sepal.Length.</li>
<li>With quote(), quote(Sepal.Length).</li>
<li>As a string: “Sepal.Length”.</li>
</ul>
<p>For example, you can select the column Sepal.Length by typing the following R code:</p>
<pre class="r"><code>select_(my_data, ~Sepal.Length)</code></pre>
<p>Or, by using this:</p>
<pre class="r"><code>select_(my_data, "Sepal.Length")</code></pre>
<p>It’s also possible to use functions inside <strong>select_</strong>(). The R package <strong>lazyeval</strong> is required. It can be installed as follows:</p>
<pre class="r"><code>install.packages("lazyeval")</code></pre>
<p>Use <strong>lazyeval</strong> package to interpret functions inside <strong>select_</strong>():</p>
<pre class="r"><code># Select column names that match ".t."
select_(my_data, lazyeval::interp(~matches(x), x = ".t."))

# Select column names that start with "Petal"
select_(my_data, lazyeval::interp(~starts_with(x), x = "Petal"))

# Dropping columns: Sepal.Length and Sepal.Width
select_(my_data, quote(-Sepal.Length), quote(-Sepal.Width))

# Or use this
select_(my_data, .dots = list(quote(-Petal.Length), quote(-Petal.Width)))</code></pre>
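<p>In current dplyr releases, the same tasks are usually written with tidy evaluation instead of the underscore verbs. The following is a minimal sketch, assuming dplyr 1.0.0 or later; the three helper functions are hypothetical names introduced only for illustration.</p>
<pre class="r"><code>library("dplyr")

# Hypothetical helper: column names arrive as a character vector
select_by_names <- function(data, cols) {
  select(data, all_of(cols))
}

# Hypothetical helper: the column is passed unquoted and forwarded with {{ }}
select_one <- function(data, col) {
  select(data, {{ col }})
}

# Hypothetical helper: drop the columns whose names start with a prefix
drop_prefix <- function(data, prefix) {
  select(data, -starts_with(prefix))
}

by_names  <- select_by_names(iris, c("Sepal.Length", "Petal.Length"))
one       <- select_one(iris, Species)
no_petals <- drop_prefix(iris, "Petal")</code></pre>
<p>With this style, no extra package is needed to pass a helper such as starts_with() into a function, because the argument is an ordinary string.</p>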
</div>
<div id="summary" class="section level1">
<h1>Summary</h1>
<br/>
<div class="block">
<ul>
<li><p>Select columns by position: my_data[, 1:2]</p></li>
<li><p>Select columns by name: dplyr::select(my_data, Sepal.Length, Petal.Length)</p></li>
<li><p>Drop columns: dplyr::select(my_data, -Sepal.Length, -Petal.Length)</p></li>
<li>Helper functions: <strong>starts_with</strong>(), <strong>ends_with</strong>(), <strong>contains</strong>(), <strong>matches</strong>(), <strong>one_of</strong>()
<ul>
<li>dplyr::select(my_data, starts_with("Petal"))</li>
<li>dplyr::select(my_data, ends_with("Length"))</li>
</ul></li>
</ul>
</div>
<p><br/></p>
</div>
<div id="related-articles" class="section level1">
<h1>Related articles</h1>
<ul>
<li>Previous chapters
<ul>
<li><a href="https://www.sthda.com/english/english/wiki/r-basics-quick-and-easy">R Programming Basics</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/importing-data-into-r">Importing Data into R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/exporting-data-from-r">Exporting Data from R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/preparing-and-reshaping-data-in-r-for-easier-analyses">Preparing and Reshaping Data in R for Easier Analyses</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/reordering-data-frame-columns-in-r">Reordering Data Frame Columns in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/reordering-data-frame-rows-in-r">Reordering Data Frame Rows in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/renaming-data-frame-columns-in-r">Renaming Data Frame Columns in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/subsetting-data-frame-rows-in-r">Subsetting Data Frame Rows in R</a></li>
</ul></li>
<li>Next chapters
<ul>
<li><a href="https://www.sthda.com/english/english/wiki/identifying-and-removing-duplicate-data-in-r">Identifying and Removing Duplicate Data in R</a></li>
</ul></li>
</ul>
</div>
<div id="infos" class="section level1">
<h1>Infos</h1>
<p><span class="warning"> This analysis has been performed using R (ver. 3.2.3). </span></p>
</div>

<script>jQuery(document).ready(function () {
    jQuery('h1').addClass('wiki_paragraph1');
    jQuery('h2').addClass('wiki_paragraph2');
    jQuery('h3').addClass('wiki_paragraph3');
    jQuery('h4').addClass('wiki_paragraph4');
    });//add phpboost class to header</script>
<style>.content{padding:0px;}</style>
</div><!--end rdoc-->
<!--====================== stop here when you copy to sthda================-->

<!-- END HTML -->]]></description>
			<pubDate>Thu, 14 Apr 2016 22:48:39 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Subsetting Data Frame Rows in R]]></title>
			<link>https://www.sthda.com/english/wiki/subsetting-data-frame-rows-in-r</link>
			<guid>https://www.sthda.com/english/wiki/subsetting-data-frame-rows-in-r</guid>
			<description><![CDATA[<!-- START HTML -->

  <!--====================== start from here when you copy to sthda================-->  
  <div id="rdoc">

<div id="TOC">
<ul>
<li><a href="#pleleminary-tasks">Preliminary tasks</a></li>
<li><a href="#install-and-load-dplyr-package">Install and load dplyr package</a></li>
<li><a href="#extracting-rows-by-position-dplyrslice">Extracting rows by position: dplyr::slice()</a></li>
<li><a href="#extracting-rows-by-criteria-dplyrfilter">Extracting rows by criteria: dplyr::filter()</a><ul>
<li><a href="#logical-comparisons">Logical comparisons</a></li>
<li><a href="#extracting-rows-based-on-logical-criteria">Extracting rows based on logical criteria</a></li>
<li><a href="#removing-missing-values">Removing missing values</a></li>
<li><a href="#using-filter-programmatically-inside-an-r-function">Using filter() programmatically inside an R function</a></li>
</ul></li>
<li><a href="#extracting-rows-by-criteria-with-r-base-functions-subset">Extracting rows by criteria with R base functions: subset()</a></li>
<li><a href="#select-random-rows-from-a-table">Select random rows from a table</a></li>
<li><a href="#select-top-n-rows-ordered-by-a-variable">Select top n rows ordered by a variable</a></li>
<li><a href="#summary">Summary</a></li>
<li><a href="#related-articles">Related articles</a></li>
<li><a href="#infos">Infos</a></li>
</ul>
</div>

<p>Previously, we described the <a href="https://www.sthda.com/english/english/wiki/r-basics-quick-and-easy">essentials of R programming</a> and provided quick start guides for <a href="https://www.sthda.com/english/english/wiki/importing-data-into-r">importing data</a> into <strong>R</strong> as well as converting your data into a <a href="https://www.sthda.com/english/english/wiki/tibble-data-format-in-r-best-and-modern-way-to-work-with-your-data">tibble data format</a>, a modern way to work with your data. We also described the <a href="https://www.sthda.com/english/wiki/tidyr-crutial-step-reshaping-data-with-r-for%20easier-analyses">crucial steps to reshape your data with R</a> for easier analyses.</p>
<br/>
<div class="block">
Here, you’ll learn how to <strong>subset</strong> (or <strong>filter</strong>) rows of a <strong>data frame</strong> based on certain criteria. This can be done easily using R functions provided by the <strong>dplyr</strong> package. It’s also possible to use the base R function <strong>subset</strong>().
</div>
<p><br/></p>
<p>Among the functions available in <strong>dplyr</strong> package, there are:</p>
<ul>
<li><strong>filter</strong>(iris, Sepal.Length > 7): Extract rows based on logical criteria</li>
<li><strong>distinct</strong>(iris): Remove duplicated rows</li>
<li><strong>sample_n</strong>(iris, 10, replace = FALSE): Select n random rows from a table</li>
<li><strong>sample_frac</strong>(iris, 0.5, replace = FALSE): Select a random fraction of rows</li>
<li><strong>slice</strong>(iris, 3:8): Select rows by position</li>
<li><strong>top_n</strong>(iris, 10, Sepal.Length): Select the top n rows ranked by a variable (by groups if grouped data)</li>
</ul>
<p><span class="success">We’ll start by describing how to <strong>subset rows</strong> based on some criteria, with the dplyr::<strong>filter</strong>() function as well as the base R function <strong>subset</strong>(). Next, we’ll show you how to <strong>select rows randomly</strong> using the <strong>sample_n</strong>() and <strong>sample_frac</strong>() functions. Finally, we’ll describe how to <strong>select the top n</strong> elements in each group, ordered by a given variable.</span></p>
<p><img src="https://www.sthda.com/english/sthda/RDoc/images/subsetting-data-frame-rows-in-r.png" alt="Subsetting Data Frame Rows in R" /> <br/></p>
<div id="pleleminary-tasks" class="section level1">
<h1>Preliminary tasks</h1>
<ol style="list-style-type: decimal">
<li><p><strong>Launch RStudio</strong> as described here: <a href="https://www.sthda.com/english/english/wiki/running-rstudio-and-setting-up-your-working-directory-easy-r-programming">Running RStudio and setting up your working directory</a></p></li>
<li><p><strong>Prepare your data</strong> as described here: <a href="https://www.sthda.com/english/english/wiki/best-practices-in-preparing-data-files-for-importing-into-r">Best practices for preparing your data</a> and save it in an external tab-delimited .txt file or a .csv file</p></li>
<li><p><strong>Import your data</strong> into <strong>R</strong> as described here: <a href="https://www.sthda.com/english/english/wiki/fast-reading-of-data-from-txt-csv-files-into-r-readr-package">Fast reading of data from txt|csv files into R: readr package</a>.</p></li>
</ol>
<p><span class="success">Here, we’ll use the <a href="https://www.sthda.com/english/english/wiki/r-built-in-data-sets#iris">R built-in iris data set</a>, which we start by converting to a <a href="https://www.sthda.com/english/english/wiki/tibble-data-format-in-r-best-and-modern-way-to-work-with-data"><strong>tibble data frame</strong> (<strong>tbl_df</strong>)</a>. A tibble is a modern rethinking of the data frame that provides a nicer printing method, which is useful when working with large data sets.</span></p>
<pre class="r"><code># Create my_data
my_data <- iris

# Convert to a tibble
library("tibble")
my_data <- as_tibble(my_data)  # as_data_frame() is deprecated in recent tibble versions

# Print
my_data</code></pre>
<pre><code>Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          &lt;dbl&gt;       &lt;dbl&gt;        &lt;dbl&gt;       &lt;dbl&gt;  &lt;fctr&gt;
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...</code></pre>
</div>
<div id="install-and-load-dplyr-package" class="section level1">
<h1>Install and load dplyr package</h1>
<ul>
<li>Install <strong>dplyr</strong></li>
</ul>
<pre class="r"><code>install.packages("dplyr")</code></pre>
<ul>
<li>Load <strong>dplyr</strong>:</li>
</ul>
<pre class="r"><code>library("dplyr")</code></pre>
</div>
<div id="extracting-rows-by-position-dplyrslice" class="section level1">
<h1>Extracting rows by position: dplyr::slice()</h1>
<p>Select rows 1 to 6:</p>
<pre class="r"><code>my_data[1:6, ]</code></pre>
<p>Alternatively, you can use the function <strong>slice</strong>() [in <strong>dplyr</strong>]:</p>
<pre class="r"><code>slice(my_data, 1:6)</code></pre>
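<p>slice() also accepts negative positions and the helper n(), which stands for the number of rows. A short sketch, assuming the tibble created above (object names are illustrative):</p>
<pre class="r"><code>library("dplyr")
library("tibble")

my_data <- as_tibble(iris)

# First six rows
first_six <- slice(my_data, 1:6)

# Last row: n() is the number of rows in the data
last_row <- slice(my_data, n())

# Everything except the first six rows
without_first_six <- slice(my_data, -(1:6))

nrow(without_first_six)  # 144</code></pre>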
</div>
<div id="extracting-rows-by-criteria-dplyrfilter" class="section level1">
<h1>Extracting rows by criteria: dplyr::filter()</h1>
<br/>
<div class="block">
The function <strong>filter</strong>() is used to filter rows that meet some logical criteria.
</div>
<p><br/></p>
<div id="logical-comparisons" class="section level2">
<h2>Logical comparisons</h2>
<p>Before continuing, we introduce the notion of logical comparisons and operators, which are important to know for filtering data.</p>
<p>The “logical” comparison operators available in R are:</p>
<br/>
<div class="block">
<ol style="list-style-type: decimal">
<li><strong>Logical comparisons</strong>
<ul>
<li><strong><</strong>: for less than</li>
<li><strong>></strong>: for greater than</li>
<li><strong><=</strong>: for less than or equal to</li>
<li><strong>>=</strong>: for greater than or equal to</li>
<li><strong>==</strong>: for equal to each other</li>
<li><strong>!=</strong>: not equal to each other</li>
<li><strong>%in%</strong>: group membership. For example, “value <strong>%in%</strong> c(2, 3)” means that value can take the value 2 or 3.</li>
<li><strong>is.na</strong>(): is NA</li>
<li><strong>!is.na</strong>(): is not NA.</li>
</ul></li>
<li><strong>Logical operators</strong>
<ul>
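<li><strong>|</strong>: means or. For example, value == 2 <strong>|</strong> value == 3 means that the value equals 2 or 3. Note that value == 2<strong>|</strong>3 does not do this: it is read as (value == 2) | 3 and is always TRUE. value <strong>%in%</strong> c(2, 3) is a shortcut for value == 2 <strong>|</strong> value == 3.</li>
<li><strong>&amp;</strong>: means and. For example, sex == “female” &amp; age > 25</li>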
</ul></li>
</ol>
</div>
<p><br/></p>
<p><span class="error">The most frequent mistake made by beginners in R is to use = instead of == when testing for equality. Remember that, when you are testing for equality, you should always use == (not =).</span></p>
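<p>A second common pitfall concerns operator precedence when combining comparisons with |. The sketch below uses a small illustrative vector (value) to show why the comparison has to be repeated on both sides of the operator:</p>
<pre class="r"><code>value <- c(1, 2, 3, 4)

# Pitfall: parsed as (value == 2) | 3, and a non-zero number counts as TRUE
always_true <- value == 2 | 3
always_true      # TRUE TRUE TRUE TRUE

# Correct: repeat the comparison on both sides of |
either <- value == 2 | value == 3
either           # FALSE TRUE TRUE FALSE

# Shortcut with %in%
member <- value %in% c(2, 3)
identical(either, member)   # TRUE</code></pre>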
</div>
<div id="extracting-rows-based-on-logical-criteria" class="section level2">
<h2>Extracting rows based on logical criteria</h2>
<ul>
<li><strong>One-column based criteria</strong>: Extract rows where Sepal.Length > 7:</li>
</ul>
<pre class="r"><code>filter(my_data, Sepal.Length > 7)</code></pre>
<pre><code>Source: local data frame [12 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
          (dbl)       (dbl)        (dbl)       (dbl)    (fctr)
1           7.1         3.0          5.9         2.1 virginica
2           7.6         3.0          6.6         2.1 virginica
3           7.3         2.9          6.3         1.8 virginica
4           7.2         3.6          6.1         2.5 virginica
5           7.7         3.8          6.7         2.2 virginica
6           7.7         2.6          6.9         2.3 virginica
7           7.7         2.8          6.7         2.0 virginica
8           7.2         3.2          6.0         1.8 virginica
9           7.2         3.0          5.8         1.6 virginica
10          7.4         2.8          6.1         1.9 virginica
11          7.9         3.8          6.4         2.0 virginica
12          7.7         3.0          6.1         2.3 virginica</code></pre>
<ul>
<li><strong>Multiple-column based criteria</strong>: Extract rows where Sepal.Length > 6.7 and Sepal.Width ≤ 3:</li>
</ul>
<pre class="r"><code>filter(my_data, Sepal.Length > 6.7, Sepal.Width <= 3)</code></pre>
<pre><code>Source: local data frame [10 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
          (dbl)       (dbl)        (dbl)       (dbl)     (fctr)
1           6.8         2.8          4.8         1.4 versicolor
2           7.1         3.0          5.9         2.1  virginica
3           7.6         3.0          6.6         2.1  virginica
4           7.3         2.9          6.3         1.8  virginica
5           6.8         3.0          5.5         2.1  virginica
6           7.7         2.6          6.9         2.3  virginica
7           7.7         2.8          6.7         2.0  virginica
8           7.2         3.0          5.8         1.6  virginica
9           7.4         2.8          6.1         1.9  virginica
10          7.7         3.0          6.1         2.3  virginica</code></pre>
<ul>
<li><strong>Test for equality</strong> (==): Extract rows where Sepal.Length > 6.7 and Species == “versicolor”:</li>
</ul>
<pre class="r"><code>filter(my_data, Sepal.Length > 6.7, Species == "versicolor")</code></pre>
<pre><code>Source: local data frame [3 x 5]

  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
         (dbl)       (dbl)        (dbl)       (dbl)     (fctr)
1          7.0         3.2          4.7         1.4 versicolor
2          6.9         3.1          4.9         1.5 versicolor
3          6.8         2.8          4.8         1.4 versicolor</code></pre>
<ul>
<li><strong>Using the OR operator</strong> (|): Extract rows where Sepal.Length > 6.7 and (Species == “versicolor” or Species == “virginica”):</li>
</ul>
<p>Use this:</p>
<pre class="r"><code>filter(my_data, Sepal.Length > 6.7, 
       Species == "versicolor" | Species == "virginica" )</code></pre>
<p>Or, equivalently, use this shortcut (<strong>%in%</strong> operator):</p>
<pre class="r"><code>filter(my_data, Sepal.Length > 6.7, 
      Species %in% c("versicolor", "virginica" ))</code></pre>
<pre><code>Source: local data frame [20 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
          (dbl)       (dbl)        (dbl)       (dbl)     (fctr)
1           7.0         3.2          4.7         1.4 versicolor
2           6.9         3.1          4.9         1.5 versicolor
3           6.8         2.8          4.8         1.4 versicolor
4           7.1         3.0          5.9         2.1  virginica
5           7.6         3.0          6.6         2.1  virginica
6           7.3         2.9          6.3         1.8  virginica
7           7.2         3.6          6.1         2.5  virginica
8           6.8         3.0          5.5         2.1  virginica
9           7.7         3.8          6.7         2.2  virginica
10          7.7         2.6          6.9         2.3  virginica
11          6.9         3.2          5.7         2.3  virginica
12          7.7         2.8          6.7         2.0  virginica
13          7.2         3.2          6.0         1.8  virginica
14          7.2         3.0          5.8         1.6  virginica
15          7.4         2.8          6.1         1.9  virginica
16          7.9         3.8          6.4         2.0  virginica
17          7.7         3.0          6.1         2.3  virginica
18          6.9         3.1          5.4         2.1  virginica
19          6.9         3.1          5.1         2.3  virginica
20          6.8         3.2          5.9         2.3  virginica</code></pre>
<p><span class="success">Note that, <strong>filter</strong>() works similarly to the R base function <strong>subset</strong>(), which will be described in the next sections.</span></p>
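<p>The equivalence between the | form and the %in% shortcut can be checked directly. This is a minimal sketch, assuming the tibble created above; with_or and with_in are illustrative names.</p>
<pre class="r"><code>library("dplyr")
library("tibble")

my_data <- as_tibble(iris)

with_or <- filter(my_data, Sepal.Length > 6.7,
                  Species == "versicolor" | Species == "virginica")
with_in <- filter(my_data, Sepal.Length > 6.7,
                  Species %in% c("versicolor", "virginica"))

# Both calls should keep exactly the same rows
isTRUE(all.equal(with_or, with_in))
nrow(with_in)</code></pre>
<p>The %in% form tends to scale better: adding a third species only means adding one element to the vector.</p>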
</div>
<div id="removing-missing-values" class="section level2">
<h2>Removing missing values</h2>
<p>As described in the chapter named <a href="https://www.sthda.com/english/english/wiki/easy-r-programming-basics#case-of-missing-values">R programming basics</a>, it’s possible to use the function <strong>is.na</strong>(x) to check whether data contain missing values. It takes a <a href="https://www.sthda.com/english/english/wiki/easy-r-programming-basics#vectors">vector</a> x as input and returns a logical vector in which TRUE indicates that the corresponding element in x is NA.</p>
<ul>
<li>Create a tbl with missing values using <strong>tibble</strong>() [in the <strong>tibble</strong> package, re-exported by <strong>dplyr</strong>]; the older <strong>data_frame</strong>() is deprecated. In R, <strong>NA</strong> (Not Available) is used to represent missing values:</li>
</ul>
<pre class="r"><code># Create a data frame with missing data
friends_data <- tibble(
  name = c("Nicolas", "Thierry", "Bernard", "Jerome"),
  age = c(27, 25, 29, 26),
  height = c(180, NA, NA, 169),
  married = c("yes", "yes", "no", "no")
)
# Print
friends_data</code></pre>
<pre><code>Source: local data frame [4 x 4]

     name   age height married
    (chr) (dbl)  (dbl)   (chr)
1 Nicolas    27    180     yes
2 Thierry    25     NA     yes
3 Bernard    29     NA      no
4  Jerome    26    169      no</code></pre>
<ul>
<li>Extract rows where height is NA:</li>
</ul>
<pre class="r"><code>filter(friends_data, is.na(height))</code></pre>
<pre><code>Source: local data frame [2 x 4]

     name   age height married
    (chr) (dbl)  (dbl)   (chr)
1 Thierry    25     NA     yes
2 Bernard    29     NA      no</code></pre>
<ul>
<li>Exclude (drop) rows where height is NA:</li>
</ul>
<pre class="r"><code>filter(friends_data, !is.na(height))</code></pre>
<pre><code>Source: local data frame [2 x 4]

     name   age height married
    (chr) (dbl)  (dbl)   (chr)
1 Nicolas    27    180     yes
2  Jerome    26    169      no</code></pre>
<p><span class="success">In the R code above, <strong>!is.na()</strong> means that “we don’t want” NAs.</span></p>
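<p>Two related points are worth a quick sketch: comparing a column with NA using == never selects anything, and base R offers complete.cases() to keep rows without any missing value. The example below rebuilds a reduced version of the friends data (two columns only) so that it runs on its own; the result names are illustrative.</p>
<pre class="r"><code>library("tibble")
library("dplyr")

friends_data <- tibble(
  name = c("Nicolas", "Thierry", "Bernard", "Jerome"),
  height = c(180, NA, NA, 169)
)

# Count the missing values in one column
n_missing <- sum(is.na(friends_data$height))

# Pitfall: height == NA evaluates to NA for every row, so no row is kept
wrong <- filter(friends_data, height == NA)

# Base R alternative: keep the rows that are complete in every column
complete_rows <- friends_data[complete.cases(friends_data), ]

n_missing            # 2
nrow(wrong)          # 0
complete_rows$name   # "Nicolas" "Jerome"</code></pre>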
</div>
<div id="using-filter-programmatically-inside-an-r-function" class="section level2">
<h2>Using filter() programmatically inside an R function</h2>
<br/>
<div class="block">
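<strong>filter</strong>() is best suited for interactive use. In the dplyr version used here, the function <strong>filter_</strong>() should be used when calling from a function; in this case, the input must be “quoted”. Note that <strong>filter_</strong>() has been deprecated since dplyr 0.7.0 in favor of tidy evaluation.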
</div>
<p><br/></p>
<p>There are three ways to quote inputs that dplyr understands:</p>
<ul>
<li>With a formula, ~Sepal.Length.</li>
<li>With quote(), quote(Sepal.Length).</li>
<li>As a string: “Sepal.Length”.</li>
</ul>
<pre class="r"><code># Extract rows where Sepal.Length > 7
filter_(my_data, "Sepal.Length > 7")

# Extract rows where Sepal.Length > 7 and Sepal.Width <= 3
filter_(my_data, "Sepal.Length > 7 &amp; Sepal.Width <= 3")

# Extract rows where Sepal.Length > 6.7 and
# (Species == "versicolor" or Species == "virginica")
filter_(my_data, quote(Sepal.Length > 6.7 &amp; 
      Species %in% c("versicolor", "virginica" )))</code></pre>
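<p>With current dplyr releases, a function that wraps filter() is usually written with tidy evaluation rather than filter_(). The following is a minimal sketch, assuming dplyr 1.0.0 or later; filter_above() and filter_above_chr() are hypothetical helpers introduced only for illustration.</p>
<pre class="r"><code>library("dplyr")

# Hypothetical helper: the column is passed unquoted and forwarded with {{ }}
filter_above <- function(data, col, threshold) {
  filter(data, {{ col }} > threshold)
}

# Hypothetical helper: the column name is a string, looked up with the .data pronoun
filter_above_chr <- function(data, col_name, threshold) {
  filter(data, .data[[col_name]] > threshold)
}

res_unquoted <- filter_above(iris, Sepal.Length, 7)
res_string   <- filter_above_chr(iris, "Sepal.Length", 7)

nrow(res_unquoted)
nrow(res_string)</code></pre>
<p>The first form reads like interactive dplyr code; the second is convenient when column names come from user input or a configuration file.</p>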
</div>
</div>
<div id="extracting-rows-by-criteria-with-r-base-functions-subset" class="section level1">
<h1>Extracting rows by criteria with R base functions: subset()</h1>
<ul>
<li>Extract rows where Sepal.Length > 7 and Sepal.Width ≤ 3:</li>
</ul>
<p>You can use this:</p>
<pre class="r"><code>my_data[my_data$Sepal.Length > 7 &amp; my_data$Sepal.Width <= 3, ]</code></pre>
<p>Or use the R base function <strong>subset</strong>():</p>
<pre class="r"><code>subset(my_data, Sepal.Length > 7 &amp; Sepal.Width <= 3)</code></pre>
<ul>
<li>Extract rows where Sepal.Length > 6.7 and (Species == “versicolor” or Species == “virginica”):</li>
</ul>
<pre class="r"><code>subset(my_data, Sepal.Length > 6.7 &amp; 
      Species %in% c("versicolor", "virginica"))</code></pre>
<p><span class="notice"><strong>subset</strong>() also works with vectors, as follows.</span></p>
<pre class="r"><code>my_vec <- 1:10
subset(my_vec, my_vec > 5 &amp; my_vec < 8)</code></pre>
<pre><code>[1] 6 7</code></pre>
<p><span class="warning">Note that base R solutions require more typing than dplyr::filter(), so we recommend the dplyr approach.</span></p>
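<p>One difference to keep in mind is that conditions in subset() must be combined in a single expression, because its third argument (select) chooses columns rather than adding another row condition. A minimal sketch, assuming the tibble created above (result names are illustrative):</p>
<pre class="r"><code>library("dplyr")
library("tibble")

my_data <- as_tibble(iris)

# Base R: the second argument filters rows, the third (select) picks columns
base_res <- subset(my_data, Sepal.Length > 7, select = c(Sepal.Length, Species))

# dplyr: the same two steps written as separate verbs
dplyr_res <- my_data %>%
  filter(Sepal.Length > 7) %>%
  select(Sepal.Length, Species)

dim(base_res)
dim(dplyr_res)</code></pre>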
</div>
<div id="select-random-rows-from-a-table" class="section level1">
<h1>Select random rows from a table</h1>
<br/>
<div class="block">
It’s possible to select either n random rows with the function <strong>sample_n</strong>() or a random fraction of rows with <strong>sample_frac</strong>().
</div>
<p><br/></p>
<p>We first use the function set.seed() to initialize the random number generator. This is important so that users can reproduce the analysis.</p>
<pre class="r"><code>set.seed(1234)
# Extract 5 random rows without replacement
sample_n(my_data, 5, replace = FALSE)</code></pre>
<pre><code>Source: local data frame [5 x 5]

  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
         (dbl)       (dbl)        (dbl)       (dbl)     (fctr)
1          5.1         3.5          1.4         0.3     setosa
2          5.8         2.6          4.0         1.2 versicolor
3          5.5         2.6          4.4         1.2 versicolor
4          6.1         3.0          4.6         1.4 versicolor
5          7.2         3.2          6.0         1.8  virginica</code></pre>
<pre class="r"><code># Extract 5% of rows, randomly without replacement
sample_frac(my_data, 0.05, replace = FALSE)</code></pre>
<pre><code>Source: local data frame [8 x 5]

  Sepal.Length Sepal.Width Petal.Length Petal.Width    Species
         (dbl)       (dbl)        (dbl)       (dbl)     (fctr)
1          5.7         2.9          4.2         1.3 versicolor
2          4.9         3.0          1.4         0.2     setosa
3          4.9         3.1          1.5         0.2     setosa
4          6.2         2.9          4.3         1.3 versicolor
5          6.6         3.0          4.4         1.4 versicolor
6          6.3         3.3          6.0         2.5  virginica
7          6.0         2.9          4.5         1.5 versicolor
8          5.0         3.5          1.3         0.3     setosa</code></pre>
<p><span class="notice">Note that, it’s also possible to use the R base function <strong>sample</strong>(), but it requires more typing.</span></p>
<pre class="r"><code>set.seed(1234)
my_data[sample(1:nrow(my_data), 5, replace = FALSE), , drop = FALSE]</code></pre>
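<p>The effect of set.seed() can be verified by drawing the same sample twice. The sketch below uses slice_sample(), which newer dplyr releases (1.0.0 or later, an assumption here) provide as the successor of sample_n() and sample_frac(); object names are illustrative.</p>
<pre class="r"><code>library("dplyr")
library("tibble")

my_data <- as_tibble(iris)

# Same seed, same call: the two samples should be identical
set.seed(1234)
s1 <- slice_sample(my_data, n = 5)
set.seed(1234)
s2 <- slice_sample(my_data, n = 5)
identical(s1, s2)

# Sample a proportion of the rows instead of a fixed number
s3 <- slice_sample(my_data, prop = 0.1)
nrow(s3)</code></pre>
<p>Without the call to set.seed(), each run would generally return different rows.</p>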
</div>
<div id="select-top-n-rows-ordered-by-a-variable" class="section level1">
<h1>Select top n rows ordered by a variable</h1>
<br/>
<div class="block">
As mentioned above, the function <strong>top_n</strong>() can be used to select the top n entries in each group.
</div>
<p><br/></p>
<ul>
<li>The format is as follows:</li>
</ul>
<pre class="r"><code>top_n(x, n, wt)</code></pre>
<br/>
<div class="block">
<ul>
<li><strong>x</strong>: Data table</li>
<li><strong>n</strong>: Number of rows to return. If x is grouped, this is the number of rows per group. May include more than n if there are ties.</li>
<li><strong>wt</strong> (optional): The variable to use for ordering. If not specified, defaults to the last variable in the data table.</li>
</ul>
</div>
<p><br/></p>
<ul>
<li>Select the top 5 rows ordered by Sepal.Length</li>
</ul>
<pre class="r"><code>top_n(my_data, 5, Sepal.Length)</code></pre>
<pre><code>Source: local data frame [5 x 5]

  Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
         (dbl)       (dbl)        (dbl)       (dbl)    (fctr)
1          7.7         3.8          6.7         2.2 virginica
2          7.7         2.6          6.9         2.3 virginica
3          7.7         2.8          6.7         2.0 virginica
4          7.9         3.8          6.4         2.0 virginica
5          7.7         3.0          6.1         2.3 virginica</code></pre>
<ul>
<li>Group by the column Species and select the top 5 of each group ordered by Sepal.Length:</li>
</ul>
<pre class="r"><code>my_data %>% 
  group_by(Species) %>%
  top_n(5, Sepal.Length)</code></pre>
<p><span class="success">Note that the <strong>dplyr</strong> package lets you use the forward-pipe operator (<strong>%>%</strong>) to chain multiple operations. For example, <strong>x %>% f</strong> is equivalent to <strong>f(x)</strong>. The output of each operation is passed as the first argument to the next operation.</span></p>
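<p>In dplyr 1.0.0 and later, top_n() is marked as superseded in favor of slice_max(). The sketch below assumes such a version is installed and uses the built-in iris data directly; the name top_by_group is only illustrative.</p>
<pre class="r"><code>library("dplyr")

# Keep the 5 longest sepals within each species.
# with_ties = FALSE returns exactly n rows per group, even when values tie.
top_by_group <- iris %>%
  group_by(Species) %>%
  slice_max(Sepal.Length, n = 5, with_ties = FALSE) %>%
  ungroup()

# Number of rows kept per species
count(top_by_group, Species)</code></pre>
<p>With the default with_ties = TRUE, slice_max() may return more than n rows per group when values tie, as top_n() does.</p>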
</div>
<div id="summary" class="section level1">
<h1>Summary</h1>
<br/>
<div class="block">
<ul>
<li><p>Filter rows by logical criteria: dplyr::filter(iris, Sepal.Length > 7)</p></li>
<li><p>Select n random rows: dplyr::sample_n(iris, 10)</p></li>
<li><p>Select a random fraction of rows: dplyr::sample_frac(iris, 0.1)</p></li>
<li><p>Select top n rows by values: dplyr::top_n(iris, 10, Sepal.Length)</p></li>
</ul>
</div>
<p><br/></p>
</div>
<div id="related-articles" class="section level1">
<h1>Related articles</h1>
<ul>
<li>Previous chapters
<ul>
<li><a href="https://www.sthda.com/english/english/wiki/r-basics-quick-and-easy">R Programming Basics</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/importing-data-into-r">Importing Data into R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/exporting-data-from-r">Exporting Data from R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/preparing-and-reshaping-data-in-r-for-easier-analyses">Preparing and Reshaping Data in R for Easier Analyses</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/reordering-data-frame-columns-in-r">Reordering Data Frame Columns in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/reordering-data-frame-rows-in-r">Reordering Data Frame Rows in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/renaming-data-frame-columns-in-r">Renaming Data Frame Columns in R</a></li>
</ul></li>
<li>Next chapters
<ul>
<li><a href="https://www.sthda.com/english/english/wiki/subsetting-data-frame-columns-in-r">Subsetting Data Frame Columns in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/identifying-and-removing-duplicate-data-in-r">Identifying and Removing Duplicate Data in R</a></li>
</ul></li>
</ul>
</div>
<div id="infos" class="section level1">
<h1>Infos</h1>
<p><span class="warning"> This analysis has been performed using R (ver. 3.2.3). </span></p>
</div>

<script>jQuery(document).ready(function () {
    jQuery('h1').addClass('wiki_paragraph1');
    jQuery('h2').addClass('wiki_paragraph2');
    jQuery('h3').addClass('wiki_paragraph3');
    jQuery('h4').addClass('wiki_paragraph4');
    });//add phpboost class to header</script>
<style>.content{padding:0px;}</style>
</div><!--end rdoc-->
<!--====================== stop here when you copy to sthda================-->


<!-- END HTML -->]]></description>
			<pubDate>Thu, 14 Apr 2016 22:36:43 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Renaming Data Frame Columns in R]]></title>
			<link>https://www.sthda.com/english/wiki/renaming-data-frame-columns-in-r</link>
			<guid>https://www.sthda.com/english/wiki/renaming-data-frame-columns-in-r</guid>
			<description><![CDATA[<!-- START HTML -->

  <!--====================== start from here when you copy to sthda================-->  
  <div id="rdoc">


<div id="TOC">
<ul>
<li><a href="#pleleminary-tasks">Preliminary tasks</a></li>
<li><a href="#install-and-load-dplyr-package-for-renaming-columns">Install and load dplyr package for renaming columns</a></li>
<li><a href="#renaming-columns-with-dplyrrename">Renaming columns with dplyr::rename()</a></li>
<li><a href="#renaming-columns-with-dplyrselect">Renaming columns with dplyr::select()</a></li>
<li><a href="#renaming-columns-with-r-base-functions">Renaming columns with R base functions</a></li>
<li><a href="#summary">Summary</a></li>
<li><a href="#related-articles">Related articles</a></li>
<li><a href="#infos">Infos</a></li>
</ul>
</div>

<p>
</p>
<p>Previously, we described the <a href="https://www.sthda.com/english/english/wiki/r-basics-quick-and-easy">essentials of R programming</a> and provided quick start guides for <a href="https://www.sthda.com/english/english/wiki/importing-data-into-r">importing data</a> into <strong>R</strong> as well as converting your data into a <a href="https://www.sthda.com/english/english/wiki/tibble-data-format-in-r-best-and-modern-way-to-work-with-your-data">tibble data format</a>, which is the best and modern way to work with your data. We also described <a href="https://www.sthda.com/english/wiki/tidyr-crutial-step-reshaping-data-with-r-for%20easier-analyses">crucial steps to reshape your data with R</a> for easier analyses.</p>
<br/>
<div class="block">
Here, you’ll learn how to <strong>rename</strong> the columns of a <strong>data frame</strong> in <strong>R</strong>. This can be done easily using the function <strong>rename</strong>() in <strong>dplyr</strong>. It’s also possible to use R base functions, but they require more typing.
</div>
<p><br/></p>
<p><img src="https://www.sthda.com/english/sthda/RDoc/images/rename-columns-data-frame-r.png" alt="Renaming Columns of a Data Table in R" /> <br/></p>
<div id="pleleminary-tasks" class="section level1">
<h1>Preliminary tasks</h1>
<ol style="list-style-type: decimal">
<li><p><strong>Launch RStudio</strong> as described here: <a href="https://www.sthda.com/english/english/wiki/running-rstudio-and-setting-up-your-working-directory-easy-r-programming">Running RStudio and setting up your working directory</a></p></li>
<li><p><strong>Prepare your data</strong> as described here: <a href="https://www.sthda.com/english/english/wiki/best-practices-in-preparing-data-files-for-importing-into-r">Best practices for preparing your data</a> and save it in an external tab-delimited .txt or .csv file</p></li>
<li><p><strong>Import your data</strong> into <strong>R</strong> as described here: <a href="https://www.sthda.com/english/english/wiki/fast-reading-of-data-from-txt-csv-files-into-r-readr-package">Fast reading of data from txt|csv files into R: readr package</a>.</p></li>
</ol>
<p><span class="success">Here, we’ll use the <a href="https://www.sthda.com/english/english/wiki/r-built-in-data-sets#iris">R built-in iris data set</a>, which we start by converting to a <a href="https://www.sthda.com/english/english/wiki/tibble-data-format-in-r-best-and-modern-way-to-work-with-data"><strong>tibble data frame</strong> (<strong>tbl_df</strong>)</a>. Tibble is a modern rethinking of data frame providing a nicer printing method. This is useful when working with large data sets.</span></p>
<pre class="r"><code># Create my_data
my_data <- iris

# Convert to a tibble
# (as_tibble() replaces as_data_frame(), which is deprecated in current tibble releases)
library("tibble")
my_data <- as_tibble(my_data)

# Print
my_data</code></pre>
<pre><code>Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          &lt;dbl&gt;       &lt;dbl&gt;        &lt;dbl&gt;       &lt;dbl&gt;  &lt;fctr&gt;
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...</code></pre>
</div>
<div id="install-and-load-dplyr-package-for-renaming-columns" class="section level1">
<h1>Install and load dplyr package for renaming columns</h1>
<ul>
<li>Install <strong>dplyr</strong></li>
</ul>
<pre class="r"><code>install.packages("dplyr")</code></pre>
<ul>
<li>Load <strong>dplyr</strong>:</li>
</ul>
<pre class="r"><code>library("dplyr")</code></pre>
</div>
<div id="renaming-columns-with-dplyrrename" class="section level1">
<h1>Renaming columns with dplyr::rename()</h1>
<ul>
<li>Rename the column Sepal.Length to sepal_length and Sepal.Width to sepal_width:</li>
</ul>
<pre class="r"><code>rename(my_data, sepal_length = Sepal.Length,
       sepal_width = Sepal.Width)</code></pre>
<pre><code>Source: local data frame [150 x 5]

   sepal_length sepal_width Petal.Length Petal.Width Species
          (dbl)       (dbl)        (dbl)       (dbl)  (fctr)
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...</code></pre>
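<p>When many columns follow the same naming pattern, renaming them one by one becomes tedious. The sketch below assumes dplyr 1.0.0 or later, where rename_with() is available, and applies a function to every column name. The helper to_snake is a hypothetical name introduced only for this illustration.</p>
<pre class="r"><code>library("dplyr")

# Hypothetical helper: replace dots with underscores and lower-case the name
to_snake <- function(x) tolower(gsub(".", "_", x, fixed = TRUE))

# Apply the helper to all column names of the built-in iris data
renamed <- rename_with(iris, to_snake)
names(renamed)</code></pre>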
</div>
<div id="renaming-columns-with-dplyrselect" class="section level1">
<h1>Renaming columns with dplyr::select()</h1>
<p><span class="success">select() can also be used to rename variables, as follows.</span></p>
<pre class="r"><code>select(my_data, sepal_length = Sepal.Length,
       sepal_width = Sepal.Width)</code></pre>
<pre><code>Source: local data frame [150 x 2]

   sepal_length sepal_width
          (dbl)       (dbl)
1           5.1         3.5
2           4.9         3.0
3           4.7         3.2
4           4.6         3.1
5           5.0         3.6
6           5.4         3.9
7           4.6         3.4
8           5.0         3.4
9           4.4         2.9
10          4.9         3.1
..          ...         ...</code></pre>
<p><span class="warning">Note that select() keeps only the variables you mention. To keep all variables, use the function <strong>rename</strong>(), which is an alternative to <strong>select</strong>().</span></p>
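<p>If you still prefer select(), one possible workaround is to add the dplyr selection helper everything() after the renamed variables. As a sketch on the built-in iris data, the renamed column is moved to the front and all remaining columns are kept in their original order:</p>
<pre class="r"><code>library("dplyr")

# Rename Sepal.Length, then keep every other column after it
renamed <- select(iris, sepal_length = Sepal.Length, everything())
names(renamed)</code></pre>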
</div>
<div id="renaming-columns-with-r-base-functions" class="section level1">
<h1>Renaming columns with R base functions</h1>
<p>To rename the column Sepal.Length to sepal_length, the procedure is as follows:</p>
<ol style="list-style-type: decimal">
<li>Get column names using the function <strong>names</strong>() or <strong>colnames</strong>()</li>
<li>Change column names where name = Sepal.Length</li>
</ol>
<pre class="r"><code># get column names
colnames(my_data)</code></pre>
<pre><code>[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     </code></pre>
<pre class="r"><code># Rename the columns named "Sepal.Length" and "Sepal.Width"
names(my_data)[names(my_data) == "Sepal.Length"] <- "sepal_length"
names(my_data)[names(my_data) == "Sepal.Width"] <- "sepal_width"
my_data</code></pre>
<pre><code>Source: local data frame [150 x 5]

   sepal_length sepal_width Petal.Length Petal.Width Species
          (dbl)       (dbl)        (dbl)       (dbl)  (fctr)
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...</code></pre>
<p><span class="success">It’s also possible to rename by index in the names vector, as follows.</span></p>
<pre class="r"><code>names(my_data)[1] <- "sepal_length"
names(my_data)[2] <- "sepal_width"</code></pre>
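<p>Renaming by index changes the wrong column if the column order ever differs from what you expect. A minimal base R sketch that looks up positions by name instead is shown below; the objects df, lookup and idx are hypothetical names used only for this illustration.</p>
<pre class="r"><code>df <- iris

# Hypothetical lookup table: old names on the left, new names on the right
lookup <- c(Sepal.Length = "sepal_length", Sepal.Width = "sepal_width")

# Find the position of each old name and stop early if one is missing
idx <- match(names(lookup), names(df))
stopifnot(!anyNA(idx))

# Replace only the matched names
names(df)[idx] <- unname(lookup)
names(df)</code></pre>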
</div>
<div id="summary" class="section level1">
<h1>Summary</h1>
<br/>
<div class="block">
To rename the columns of a data frame, use the function <strong>rename</strong>() [in the <strong>dplyr</strong> package].
</div>
<p><br/></p>
</div>
<div id="related-articles" class="section level1">
<h1>Related articles</h1>
<ul>
<li>Previous chapters
<ul>
<li><a href="https://www.sthda.com/english/english/wiki/r-basics-quick-and-easy">R Programming Basics</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/importing-data-into-r">Importing Data into R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/exporting-data-from-r">Exporting Data from R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/preparing-and-reshaping-data-in-r-for-easier-analyses">Preparing and Reshaping Data in R for Easier Analyses</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/reordering-data-frames-columns-in-r">Reordering Data Frame Columns in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/reordering-data-frame-rows-in-r">Reordering Data Frame Rows in R</a></li>
</ul></li>
<li>Next chapters
<ul>
<li><a href="https://www.sthda.com/english/english/wiki/subsetting-data-frame-rows-in-r">Subsetting Data Frame Rows in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/subsetting-data-frame-columns-in-r">Subsetting Data Frame Columns in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/identifying-and-removing-duplicate-data-in-r">Identifying and Removing Duplicate Data in R</a></li>
</ul></li>
</ul>
</div>
<div id="infos" class="section level1">
<h1>Infos</h1>
<p><span class="warning"> This analysis has been performed using R (ver. 3.2.3). </span></p>
</div>

<script>jQuery(document).ready(function () {
    jQuery('h1').addClass('wiki_paragraph1');
    jQuery('h2').addClass('wiki_paragraph2');
    jQuery('h3').addClass('wiki_paragraph3');
    jQuery('h4').addClass('wiki_paragraph4');
    });//add phpboost class to header</script>
<style>.content{padding:0px;}</style>
</div><!--end rdoc-->
<!--====================== stop here when you copy to sthda================-->

<!-- END HTML -->]]></description>
			<pubDate>Thu, 14 Apr 2016 22:25:10 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Reordering Data Frame Rows in R]]></title>
			<link>https://www.sthda.com/english/wiki/reordering-data-frame-rows-in-r</link>
			<guid>https://www.sthda.com/english/wiki/reordering-data-frame-rows-in-r</guid>
			<description><![CDATA[<!-- START HTML -->

  <!--====================== start from here when you copy to sthda================-->  
  <div id="rdoc">

<div id="TOC">
<ul>
<li><a href="#pleleminary-tasks">Preliminary tasks</a></li>
<li><a href="#install-and-load-dplyr-package">Install and load dplyr package</a></li>
<li><a href="#reorder-rows-with-dplyrarrange">Reorder rows with dplyr::arrange()</a></li>
<li><a href="#reorder-rows-with-r-base-function-order">Reorder rows with R base function order()</a></li>
<li><a href="#summary">Summary</a></li>
<li><a href="#related-articles">Related articles</a></li>
<li><a href="#infos">Infos</a></li>
</ul>
</div>

<p>
</p>
<p>Previously, we described the <a href="https://www.sthda.com/english/english/wiki/r-basics-quick-and-easy">essentials of R programming</a> and provided quick start guides for <a href="https://www.sthda.com/english/english/wiki/importing-data-into-r">importing data</a> into <strong>R</strong> as well as converting your data into a <a href="https://www.sthda.com/english/english/wiki/tibble-data-format-in-r-best-and-modern-way-to-work-with-your-data">tibble data format</a>, which is the best and modern way to work with your data. We also described <a href="https://www.sthda.com/english/wiki/tidyr-crutial-step-reshaping-data-with-r-for%20easier-analyses">crucial steps to reshape your data with R</a> for easier analyses.</p>
<br/>
<div class="block">
Here, you’ll learn how to <strong>reorder</strong> (i.e., <strong>sort</strong>) the rows of your data table by the values of one or more columns (i.e., variables). This can be done using either the <strong>R</strong> base function <strong>order</strong>() or the function <strong>arrange</strong>() [in the <strong>dplyr</strong> package]. We recommend dplyr::arrange() because it requires less typing.
</div>
<p><br/></p>
<p><img src="https://www.sthda.com/english/sthda/RDoc/images/reorder-rows-data-table-r.png" alt="Reordering Data Frame Rows by Variables in R" /> <br/></p>
<div id="pleleminary-tasks" class="section level1">
<h1>Preliminary tasks</h1>
<ol style="list-style-type: decimal">
<li><p><strong>Launch RStudio</strong> as described here: <a href="https://www.sthda.com/english/english/wiki/running-rstudio-and-setting-up-your-working-directory-easy-r-programming">Running RStudio and setting up your working directory</a></p></li>
<li><p><strong>Prepare your data</strong> as described here: <a href="https://www.sthda.com/english/english/wiki/best-practices-in-preparing-data-files-for-importing-into-r">Best practices for preparing your data</a> and save it in an external tab-delimited .txt or .csv file</p></li>
<li><p><strong>Import your data</strong> into <strong>R</strong> as described here: <a href="https://www.sthda.com/english/english/wiki/fast-reading-of-data-from-txt-csv-files-into-r-readr-package">Fast reading of data from txt|csv files into R: readr package</a>.</p></li>
</ol>
<p><span class="success">Here, we’ll use the <a href="https://www.sthda.com/english/english/wiki/r-built-in-data-sets#iris">R built-in iris data set</a>, which we start by converting to a <a href="https://www.sthda.com/english/english/wiki/tibble-data-format-in-r-best-and-modern-way-to-work-with-data"><strong>tibble data frame</strong> (<strong>tbl_df</strong>)</a>. Tibble is a modern rethinking of data frame providing a nicer printing method. This is useful when working with large data sets.</span></p>
<pre class="r"><code># Create my_data
my_data <- iris

# Convert to a tibble
# (as_tibble() replaces as_data_frame(), which is deprecated in current tibble releases)
library("tibble")
my_data <- as_tibble(my_data)

# Print
my_data</code></pre>
<pre><code>Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          &lt;dbl&gt;       &lt;dbl&gt;        &lt;dbl&gt;       &lt;dbl&gt;  &lt;fctr&gt;
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...</code></pre>
</div>
<div id="install-and-load-dplyr-package" class="section level1">
<h1>Install and load dplyr package</h1>
<ul>
<li>Install <strong>dplyr</strong></li>
</ul>
<pre class="r"><code>install.packages("dplyr")</code></pre>
<ul>
<li>Load <strong>dplyr</strong>:</li>
</ul>
<pre class="r"><code>library("dplyr")</code></pre>
</div>
<div id="reorder-rows-with-dplyrarrange" class="section level1">
<h1>Reorder rows with dplyr::arrange()</h1>
<br/>
<div class="block">
The dplyr function <strong>arrange</strong>() can be used to <strong>reorder</strong> (<strong>sort</strong>) rows by one or more variables.
</div>
<p><br/></p>
<ul>
<li><strong>Reorder rows</strong> by Sepal.Length in <strong>ascending</strong> order</li>
</ul>
<pre class="r"><code>arrange(my_data, Sepal.Length)</code></pre>
<pre><code>Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          (dbl)       (dbl)        (dbl)       (dbl)  (fctr)
1           4.3         3.0          1.1         0.1  setosa
2           4.4         2.9          1.4         0.2  setosa
3           4.4         3.0          1.3         0.2  setosa
4           4.4         3.2          1.3         0.2  setosa
5           4.5         2.3          1.3         0.3  setosa
6           4.6         3.1          1.5         0.2  setosa
7           4.6         3.4          1.4         0.3  setosa
8           4.6         3.6          1.0         0.2  setosa
9           4.6         3.2          1.4         0.2  setosa
10          4.7         3.2          1.3         0.2  setosa
..          ...         ...          ...         ...     ...</code></pre>
<ul>
<li><strong>Reorder rows</strong> by Sepal.Length in <strong>descending</strong> order. Use the function <strong>desc</strong>():</li>
</ul>
<pre class="r"><code>arrange(my_data, desc(Sepal.Length))</code></pre>
<pre><code>Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width   Species
          (dbl)       (dbl)        (dbl)       (dbl)    (fctr)
1           7.9         3.8          6.4         2.0 virginica
2           7.7         3.8          6.7         2.2 virginica
3           7.7         2.6          6.9         2.3 virginica
4           7.7         2.8          6.7         2.0 virginica
5           7.7         3.0          6.1         2.3 virginica
6           7.6         3.0          6.6         2.1 virginica
7           7.4         2.8          6.1         1.9 virginica
8           7.3         2.9          6.3         1.8 virginica
9           7.2         3.6          6.1         2.5 virginica
10          7.2         3.2          6.0         1.8 virginica
..          ...         ...          ...         ...       ...</code></pre>
<p><span class="success">Instead of using the function <strong>desc</strong>(), you can prepend a minus sign to a numeric sorting variable to indicate descending order, as follows.</span></p>
<pre class="r"><code>arrange(my_data, -Sepal.Length)</code></pre>
<ul>
<li><strong>Reorder rows</strong> by multiple variables: Sepal.Length and Sepal.Width</li>
</ul>
<pre class="r"><code>arrange(my_data, Sepal.Length, Sepal.Width)</code></pre>
<pre><code>Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          (dbl)       (dbl)        (dbl)       (dbl)  (fctr)
1           4.3         3.0          1.1         0.1  setosa
2           4.4         2.9          1.4         0.2  setosa
3           4.4         3.0          1.3         0.2  setosa
4           4.4         3.2          1.3         0.2  setosa
5           4.5         2.3          1.3         0.3  setosa
6           4.6         3.1          1.5         0.2  setosa
7           4.6         3.2          1.4         0.2  setosa
8           4.6         3.4          1.4         0.3  setosa
9           4.6         3.6          1.0         0.2  setosa
10          4.7         3.2          1.3         0.2  setosa
..          ...         ...          ...         ...     ...</code></pre>
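<p>Sort directions can also be mixed across variables. The minus sign shortcut applies to numeric columns only, whereas desc() also works on factors and character columns. As a sketch on the built-in iris data, the following sorts Species in descending order and, within each species, Sepal.Length in ascending order:</p>
<pre class="r"><code>library("dplyr")

# Species descending (virginica first), then Sepal.Length ascending
sorted <- arrange(iris, desc(Species), Sepal.Length)
head(sorted, 3)</code></pre>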
<p><span class="notice">If the data contain missing values, they will always come at the end.</span></p>
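<p>The behaviour with missing values can be checked on a small example. The data frame df below is made up for this illustration; the missing value ends up last whether the sort is ascending or descending.</p>
<pre class="r"><code>library("dplyr")

# Small illustrative data frame with one missing value
df <- data.frame(x = c(2, NA, 1))

asc <- arrange(df, x)        # x is 1, 2, NA
dsc <- arrange(df, desc(x))  # x is 2, 1, NA</code></pre>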
<p><span class="success">dplyr::<strong>arrange</strong>() is the dplyr counterpart of the R base function <strong>order</strong>(). It requires less typing.</span></p>
</div>
<div id="reorder-rows-with-r-base-function-order" class="section level1">
<h1>Reorder rows with R base function order()</h1>
<ul>
<li><strong>Reorder rows</strong> by Sepal.Length in <strong>ascending</strong> order</li>
</ul>
<pre class="r"><code>my_data[order(my_data$Sepal.Length), , drop = FALSE]</code></pre>
<ul>
<li><strong>Reorder rows</strong> by Sepal.Length in <strong>descending</strong> order. Use the additional argument <strong>decreasing = TRUE</strong>:</li>
</ul>
<pre class="r"><code>row_order <- order(my_data$Sepal.Length, decreasing = TRUE)
my_data[row_order, , drop = FALSE]</code></pre>
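<p>order() also accepts several sorting keys. Note that decreasing = TRUE applies to all keys at once, so one way to mix directions is to negate the numeric keys that should be descending. A sketch on the built-in iris data, with illustrative object names:</p>
<pre class="r"><code># Sepal.Length ascending, ties broken by Sepal.Width descending
row_order <- order(iris$Sepal.Length, -iris$Sepal.Width)
sorted <- iris[row_order, , drop = FALSE]
head(sorted, 4)</code></pre>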
</div>
<div id="summary" class="section level1">
<h1>Summary</h1>
<br/>
<div class="block">
To order rows by the values of a column, use the function <strong>arrange</strong>() [in the <strong>dplyr</strong> package].
</div>
<p><br/></p>
</div>
<div id="related-articles" class="section level1">
<h1>Related articles</h1>
<ul>
<li>Previous chapters
<ul>
<li><a href="https://www.sthda.com/english/english/wiki/r-basics-quick-and-easy">R Programming Basics</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/importing-data-into-r">Importing Data into R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/exporting-data-from-r">Exporting Data from R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/preparing-and-reshaping-data-in-r-for-easier-analyses">Preparing and Reshaping Data in R for Easier Analyses</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/reordering-data-frame-columns-in-r">Reordering Data Frame Columns in R</a></li>
</ul></li>
<li>Next chapters
<ul>
<li><a href="https://www.sthda.com/english/english/wiki/renaming-data-frame-columns-in-r">Renaming Data Frame Columns in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/subsetting-data-frame-rows-in-r">Subsetting Data Frame Rows in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/subsetting-data-frame-columns-in-r">Subsetting Data Frame Columns in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/identifying-and-removing-duplicate-data-in-r">Identifying and Removing Duplicate Data in R</a></li>
</ul></li>
</ul>
</div>
<div id="infos" class="section level1">
<h1>Infos</h1>
<p><span class="warning"> This analysis has been performed using R (ver. 3.2.3). </span></p>
</div>

<script>jQuery(document).ready(function () {
    jQuery('h1').addClass('wiki_paragraph1');
    jQuery('h2').addClass('wiki_paragraph2');
    jQuery('h3').addClass('wiki_paragraph3');
    jQuery('h4').addClass('wiki_paragraph4');
    });//add phpboost class to header</script>
<style>.content{padding:0px;}</style>
</div><!--end rdoc-->
<!--====================== stop here when you copy to sthda================-->


<!-- END HTML -->]]></description>
			<pubDate>Thu, 14 Apr 2016 21:42:21 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Reordering Data Frame Columns in R]]></title>
			<link>https://www.sthda.com/english/wiki/reordering-data-frame-columns-in-r</link>
			<guid>https://www.sthda.com/english/wiki/reordering-data-frame-columns-in-r</guid>
			<description><![CDATA[<!-- START HTML -->

  <!--====================== start from here when you copy to sthda================-->  
  <div id="rdoc">

<div id="TOC">
<ul>
<li><a href="#pleleminary-tasks">Preliminary tasks</a></li>
<li><a href="#reorder-column-by-position">Reorder column by position</a></li>
<li><a href="#reorder-column-by-name">Reorder column by name</a></li>
<li><a href="#summary">Summary</a></li>
<li><a href="#related-articles">Related articles</a></li>
<li><a href="#references">References</a></li>
<li><a href="#infos">Infos</a></li>
</ul>
</div>

<p>
</p>
<p>Previously, we described the <a href="https://www.sthda.com/english/english/wiki/r-basics-quick-and-easy">essentials of R programming</a> and provided quick start guides for <a href="https://www.sthda.com/english/english/wiki/importing-data-into-r">importing data</a> into <strong>R</strong> as well as converting your data into a <a href="https://www.sthda.com/english/english/wiki/tibble-data-format-in-r-best-and-modern-way-to-work-with-your-data">tibble data format</a>, which is the best and modern way to work with your data. We also described <a href="https://www.sthda.com/english/wiki/tidyr-crutial-step-reshaping-data-with-r-for%20easier-analyses">crucial steps to reshape your data with R</a> for easier analyses.</p>
<br/>
<div class="block">
Here, you’ll learn how to <strong>reorder columns</strong> in your data table, by either column position or column name.
</div>
<p><br/></p>
<p><img src="https://www.sthda.com/english/sthda/RDoc/images/reordering-data-frame-columns.png" alt="Reordering Data Table Columns in R" /> <br/></p>
<div id="pleleminary-tasks" class="section level1">
<h1>Preliminary tasks</h1>
<ol style="list-style-type: decimal">
<li><p><strong>Launch RStudio</strong> as described here: <a href="https://www.sthda.com/english/english/wiki/running-rstudio-and-setting-up-your-working-directory-easy-r-programming">Running RStudio and setting up your working directory</a></p></li>
<li><p><strong>Prepare your data</strong> as described here: <a href="https://www.sthda.com/english/english/wiki/best-practices-in-preparing-data-files-for-importing-into-r">Best practices for preparing your data</a> and save it in an external tab-delimited .txt or .csv file</p></li>
<li><p><strong>Import your data</strong> into <strong>R</strong> as described here: <a href="https://www.sthda.com/english/english/wiki/fast-reading-of-data-from-txt-csv-files-into-r-readr-package">Fast reading of data from txt|csv files into R: readr package</a>.</p></li>
</ol>
<p><span class="success">Here, we’ll use the <a href="https://www.sthda.com/english/english/wiki/r-built-in-data-sets#iris">R built-in iris data set</a>, which we start by converting to a <a href="https://www.sthda.com/english/english/wiki/tibble-data-format-in-r-best-and-modern-way-to-work-with-data"><strong>tibble data frame</strong> (<strong>tbl_df</strong>)</a>. Tibble is a modern rethinking of data frame providing a nicer printing method. This is useful when working with large data sets.</span></p>
<pre class="r"><code># Create my_data
my_data <- iris

# Convert to a tibble
# (as_tibble() replaces as_data_frame(), which is deprecated in current tibble releases)
library("tibble")
my_data <- as_tibble(my_data)

# Print
my_data</code></pre>
<pre><code>Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          &lt;dbl&gt;       &lt;dbl&gt;        &lt;dbl&gt;       &lt;dbl&gt;  &lt;fctr&gt;
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...</code></pre>
</div>
<div id="reorder-column-by-position" class="section level1">
<h1>Reorder column by position</h1>
<pre class="r"><code># Get column names
colnames(my_data)</code></pre>
<pre><code>[1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  "Species"     </code></pre>
<p>my_data contains 5 columns ordered as follows:</p>
<ol style="list-style-type: decimal">
<li>Sepal.Length</li>
<li>Sepal.Width</li>
<li>Petal.Length</li>
<li>Petal.Width</li>
<li>Species</li>
</ol>
<p>But we want:</p>
<ul>
<li>the variable “Species” to be the first column (1)</li>
<li>the variable “Petal.Width” to be the second column (2)</li>
</ul>
<p>It’s possible to reorder the columns by position by indexing the data with a vector that lists the current column positions in the desired order:</p>
<pre class="r"><code>my_data2 <- my_data[, c(5, 4, 1, 2, 3)]
my_data2</code></pre>
<pre><code>Source: local data frame [150 x 5]

   Species Petal.Width Sepal.Length Sepal.Width Petal.Length
    <fctr>       <dbl>        <dbl>       <dbl>        <dbl>
1   setosa         0.2          5.1         3.5          1.4
2   setosa         0.2          4.9         3.0          1.4
3   setosa         0.2          4.7         3.2          1.3
4   setosa         0.2          4.6         3.1          1.5
5   setosa         0.2          5.0         3.6          1.4
6   setosa         0.4          5.4         3.9          1.7
7   setosa         0.3          4.6         3.4          1.4
8   setosa         0.2          5.0         3.4          1.5
9   setosa         0.2          4.4         2.9          1.4
10  setosa         0.1          4.9         3.1          1.5
..     ...         ...          ...         ...          ...</code></pre>
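<p>Listing every position by hand becomes tedious when a data set has many columns. As a minimal sketch (the helper variables <code>front</code> and <code>rest</code> are illustrative names, not part of the original example), the positions to move can be given explicitly and the remaining ones computed with the base R functions <code>setdiff()</code> and <code>seq_along()</code>. The example uses the plain iris data frame so that it runs without any extra package:</p>
<pre class="r"><code># Start from the built-in iris data frame (no package required)
my_data <- iris

# Positions of the columns to move to the front (hypothetical helper name)
front <- c(5, 4)

# Remaining positions, kept in their original order
rest <- setdiff(seq_along(my_data), front)

# Reorder and check the resulting column names
my_data2 <- my_data[, c(front, rest)]
colnames(my_data2)</code></pre>
<p>This should give the same column order as the explicit vector <code>c(5, 4, 1, 2, 3)</code> used above.</p>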
</div>
<div id="reorder-column-by-name" class="section level1">
<h1>Reorder columns by name</h1>
<pre class="r"><code>col_order <- c("Species", "Petal.Width", "Sepal.Length",
               "Sepal.Width", "Petal.Length")

my_data2 <- my_data[, col_order]
my_data2</code></pre>
<pre><code>Source: local data frame [150 x 5]

   Species Petal.Width Sepal.Length Sepal.Width Petal.Length
    <fctr>       <dbl>        <dbl>       <dbl>        <dbl>
1   setosa         0.2          5.1         3.5          1.4
2   setosa         0.2          4.9         3.0          1.4
3   setosa         0.2          4.7         3.2          1.3
4   setosa         0.2          4.6         3.1          1.5
5   setosa         0.2          5.0         3.6          1.4
6   setosa         0.4          5.4         3.9          1.7
7   setosa         0.3          4.6         3.4          1.4
8   setosa         0.2          5.0         3.4          1.5
9   setosa         0.2          4.4         2.9          1.4
10  setosa         0.1          4.9         3.1          1.5
..     ...         ...          ...         ...          ...</code></pre>
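<p>The same idea can be sketched with column names, which avoids typing the full list of names. Here, <code>front</code> is an illustrative variable name, and the remaining names are assumed to keep their original order; only base R functions are used, on the plain iris data frame:</p>
<pre class="r"><code># Start from the built-in iris data frame (no package required)
my_data <- iris

# Names of the columns to move to the front (hypothetical helper name)
front <- c("Species", "Petal.Width")

# Put the chosen names first, followed by all other names in their original order
col_order <- c(front, setdiff(colnames(my_data), front))

# Reorder and check the resulting column names
my_data2 <- my_data[, col_order]
colnames(my_data2)</code></pre>
<p>Reordering by name is generally more robust than reordering by position, because the code keeps working if columns are later added to or removed from the data.</p>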
</div>
<div id="summary" class="section level1">
<h1>Summary</h1>
<br/>
<div class="block">
It’s possible to reorder data frame columns by indexing with either column positions (i.e., numbers) or column names.
</div>
<p><br/></p>
</div>
<div id="related-articles" class="section level1">
<h1>Related articles</h1>
<ul>
<li>Previous chapters
<ul>
<li><a href="https://www.sthda.com/english/english/wiki/r-basics-quick-and-easy">R Programming Basics</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/importing-data-into-r">Importing Data into R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/exporting-data-from-r">Exporting Data from R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/preparing-and-reshaping-data-in-r-for-easier-analyses">Preparing and Reshaping Data in R for Easier Analyses</a></li>
</ul></li>
<li>Next chapters
<ul>
<li><a href="https://www.sthda.com/english/english/wiki/reordering-data-frame-rows-in-r">Reordering Data Frame Rows in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/reordering-data-frames-columns-in-r">Reordering Data Frame Columns in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/renaming-data-frame-columns-in-r">Renaming Data Frame Columns in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/subsetting-data-frame-rows-in-r">Subsetting Data Frame Rows in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/subsetting-data-frame-columns-in-r">Subsetting Data Frame Columns in R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/identifying-and-removing-duplicate-data-in-r">Identifying and Removing Duplicate Data in R</a></li>
</ul></li>
</ul>
</div>
<div id="infos" class="section level1">
<h1>Infos</h1>
<p><span class="warning"> This analysis has been performed using R (ver. 3.2.3). </span></p>
</div>

<script>jQuery(document).ready(function () {
    jQuery('h1').addClass('wiki_paragraph1');
    jQuery('h2').addClass('wiki_paragraph2');
    jQuery('h3').addClass('wiki_paragraph3');
    jQuery('h4').addClass('wiki_paragraph4');
    });//add phpboost class to header</script>
<style>.content{padding:0px;}</style>
</div><!--end rdoc-->
<!--====================== stop here when you copy to sthda================-->

<!-- END HTML -->]]></description>
			<pubDate>Thu, 14 Apr 2016 21:26:35 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Data Manipulation in R]]></title>
			<link>https://www.sthda.com/english/wiki/data-manipulation-in-r</link>
			<guid>https://www.sthda.com/english/wiki/data-manipulation-in-r</guid>
			<description><![CDATA[Read the articles below]]></description>
			<pubDate>Thu, 14 Apr 2016 18:39:11 +0200</pubDate>
			
		</item>
		
	</channel>
</rss>
