<?xml version="1.0" encoding="UTF-8" ?>
<!-- RSS generated by PHPBoost on Thu, 16 Apr 2026 18:36:57 +0200 -->

<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
	<channel>
		<title><![CDATA[Easy Guides]]></title>
		<atom:link href="https://www.sthda.com/english/syndication/rss/wiki/40" rel="self" type="application/rss+xml"/>
		<link>https://www.sthda.com</link>
		<description><![CDATA[Last articles of the category: Preparing and Reshaping Data in R for Easier Analyses]]></description>
		<copyright>(C) 2005-2026 PHPBoost</copyright>
		<language>en</language>
		<generator>PHPBoost</generator>
		
		
		<item>
			<title><![CDATA[Preparing and Reshaping Data in R for Easier Analyses]]></title>
			<link>https://www.sthda.com/english/wiki/preparing-and-reshaping-data-in-r-for-easier-analyses</link>
			<guid>https://www.sthda.com/english/wiki/preparing-and-reshaping-data-in-r-for-easier-analyses</guid>
			<description><![CDATA[<!-- START HTML -->

  <!--====================== start from here when you copy to sthda================-->  
  <div id="rdoc">


<p><br/></p>
<style>#rdoc .course_material a{font-size:1.5em;} #rdoc .readmore a{font-size:1em;}</style>
<p>Previously, we described the <a href="https://www.sthda.com/english/english/wiki/r-basics-quick-and-easy">essentials of R programming</a> and provided quick start guides for <a href="https://www.sthda.com/english/english/wiki/importing-data-into-r">importing data</a> into <strong>R</strong>. The next crucial step is to arrange your data in a consistent structure for easier analyses. Here, you’ll learn modern conventions for <strong>preparing</strong> and <strong>reshaping</strong> <strong>data</strong> to facilitate analyses in R.</p>
<div class="block">
<ul>
<li><a href="https://www.sthda.com/english/english/wiki/tibble-data-format-in-r-best-and-modern-way-to-work-with-your-data">Tibble Data Format in R: Best and Modern Way to Work with your Data</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/tidyr-crucial-step-reshaping-data-with-r-for-easier-analyses">Tidyr: crucial Step Reshaping Data with R for Easier Analyses</a></li>
</ul>
</div>
<p><br/></p>
<img src="https://www.sthda.com/english/sthda/RDoc/images/preparing-reshaping-r-data-analyses.png" alt="Preparing and reshaping data in R" /> <br/>
<hr/>
<p><br/></p>
<br/>
<div class="course_material">
<ol style="list-style-type: decimal">
<li><a href="https://www.sthda.com/english/english/wiki/tibble-data-format-in-r-best-and-modern-way-to-work-with-your-data"><i class="fa fa-folder-open"></i> <strong>Tibble Data Format in R: Best and Modern Way to Work with your Data</strong></a></li>
</ol>
<ul>
<li>Installing and loading the tibble package: type <strong>install.packages</strong>("tibble") to install it and <strong>library</strong>("tibble") to load it.</li>
<li>Create a new tibble: <strong>data_frame</strong>(x = rnorm(100), y = rnorm(100)).</li>
<li>Convert your data to a tibble: <strong>as_data_frame</strong>(iris)</li>
<li>Advantages of tibbles compared to data frames: <strong>nice printing methods</strong> for large data sets, specification of <strong>column types</strong>.</li>
</ul>
<p><br/>
<img src="https://www.sthda.com/english/sthda/RDoc/images/tibble-data-format-tbl_df.png" alt="tibble data format: tbl_df" /></p>
<p><span class="success readmore">Read more: <a href="https://www.sthda.com/english/english/wiki/tibble-data-format-in-r-best-and-modern-way-to-work-with-your-data"><i class="fa fa-play"></i> Tibble Data Format in R: Best and Modern Way to Work with your Data</a></span> 
</p>
<ol start="2" style="list-style-type: decimal">
<li><a href="https://www.sthda.com/english/english/wiki/tidyr-crucial-step-reshaping-data-with-r-for-easier-analyses"><i class="fa fa-folder-open"></i> <strong>Tidyr: crucial Step Reshaping Data with R for Easier Analyses</strong></a></li>
</ol>
<ul>
<li>What is a tidy data set?: a data structure convention where each column is a variable and each row an observation</li>
<li>Reshaping data using tidyr package
<ul>
<li>Installing and loading tidyr: type <strong>install.packages</strong>("tidyr") to install it and <strong>library</strong>("tidyr") to load it.</li>
<li>Example data sets: USArrests</li>
<li><strong>gather</strong>(): collapse columns into rows</li>
<li><strong>spread</strong>(): spread two columns into multiple columns</li>
<li><strong>unite</strong>(): Unite multiple columns into one</li>
<li><strong>separate</strong>(): separate one column into multiple</li>
<li><strong>%>%</strong>: Chaining multiple operations</li>
</ul></li>
</ul>
<p><br/>
<img src="https://www.sthda.com/english/sthda/RDoc/images/tidyr.png" alt="Tidyr: crucial Step Reshaping Data with R for Easier Analyses" /></p>
<p><span class="success readmore">Read more: <a href="https://www.sthda.com/english/english/wiki/tidyr-crucial-step-reshaping-data-with-r-for-easier-analyses"><i class="fa fa-play"></i> Tidyr: crucial Step Reshaping Data with R for Easier Analyses</a></span></p>
<p>
</p>
</div>
<p><br/></p>

<script>jQuery(document).ready(function () {
    jQuery('h1').addClass('wiki_paragraph1');
    jQuery('h2').addClass('wiki_paragraph2');
    jQuery('h3').addClass('wiki_paragraph3');
    jQuery('h4').addClass('wiki_paragraph4');
    });//add phpboost class to header</script>
<style>.content{padding:0px;}</style>
</div><!--end rdoc-->
<!--====================== stop here when you copy to sthda================-->

<!-- END HTML -->]]></description>
			<pubDate>Mon, 17 Oct 2016 03:43:56 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Tidyr: Crucial Step Reshaping Data with R for Easier Analyses]]></title>
			<link>https://www.sthda.com/english/wiki/tidyr-crucial-step-reshaping-data-with-r-for-easier-analyses</link>
			<guid>https://www.sthda.com/english/wiki/tidyr-crucial-step-reshaping-data-with-r-for-easier-analyses</guid>
			<description><![CDATA[<!-- START HTML -->

  <!--====================== start from here when you copy to sthda================-->  
  <div id="rdoc">

<div id="TOC">
<ul>
<li><a href="#what-is-a-tidy-data-set">What is a tidy data set?</a></li>
<li><a href="#preleminary-tasks">Preliminary tasks</a></li>
<li><a href="#reshaping-data-using-tidyr-package">Reshaping data using tidyr package</a><ul>
<li><a href="#installing-and-loading-tidyr">Installing and loading tidyr</a></li>
<li><a href="#example-data-sets">Example data sets</a></li>
<li><a href="#gather-collapse-columns-into-rows">gather(): collapse columns into rows</a></li>
<li><a href="#spread-spread-two-columns-into-multiple-columns">spread(): spread two columns into multiple columns</a></li>
<li><a href="#unite-unite-multiple-columns-into-one">unite(): Unite multiple columns into one</a></li>
<li><a href="#separate-separate-one-column-into-multiple">separate(): separate one column into multiple</a></li>
<li><a href="#chaining-multiple-operations">Chaining multiple operations</a></li>
</ul></li>
<li><a href="#summary">Summary</a></li>
<li><a href="#related-articles">Related articles</a></li>
<li><a href="#references">References</a></li>
<li><a href="#infos">Infos</a></li>
</ul>
</div>

<p>
</p>
<p>Previously, we described the <a href="https://www.sthda.com/english/english/wiki/r-basics-quick-and-easy">essentials of R programming</a> and provided quick start guides for <a href="https://www.sthda.com/english/english/wiki/importing-data-into-r">importing data</a> into <strong>R</strong> as well as converting your data into a <a href="https://www.sthda.com/english/english/wiki/tibble-data-format-in-r-best-and-modern-way-to-work-with-your-data">tibble data format</a>, which is the best and modern way to work with your data.</p>
<br/>
<div class="block">
Here, you’ll learn how to <strong>organize</strong> (or <strong>reshape</strong>) your data in order to make the analysis easier. This process is called <strong>tidying your data</strong>.
</div>
<p><br/></p>
<p><img src="https://www.sthda.com/english/sthda/RDoc/images/tidyr.png" alt="Tidyr: Crucial Step Reshaping Data with R for Easier Analyses" /> <br/> <span class="small">[Figure adapted from RStudio data wrangling cheatsheet (see reference section)]</span></p>
<div id="what-is-a-tidy-data-set" class="section level1">
<h1>What is a tidy data set?</h1>
<p>A data set is called <strong>tidy</strong> when:</p>
<ul>
<li>each column represents a variable</li>
<li>and each row represents an observation</li>
</ul>
<p><span class="notice">The opposite of <strong>tidy</strong> is <strong>messy data</strong>, which corresponds to any other arrangement of the data.</span></p>
<p><img src="https://www.sthda.com/english/sthda/RDoc/images/tidy-data.png" alt="Tidy data" /></p>
<p><span class="error">Having your data in <strong>tidy</strong> format is crucial for facilitating the tasks of data analysis including <strong>data manipulation</strong>, <strong>modeling</strong> and <strong>visualization</strong>.</span></p>
<p><span class="success">The R package <strong>tidyr</strong>, developed by Hadley Wickham, provides functions to help you <strong>organize</strong> (or <strong>reshape</strong>) your data set into <strong>tidy</strong> format. It’s particularly designed to work in combination with <strong>magrittr</strong> and <strong>dplyr</strong> to build a solid data analysis pipeline.</span></p>
</div>
<div id="preleminary-tasks" class="section level1">
<h1>Preliminary tasks</h1>
<ol style="list-style-type: decimal">
<li><p><strong>Launch RStudio</strong> as described here: <a href="https://www.sthda.com/english/english/wiki/running-rstudio-and-setting-up-your-working-directory-easy-r-programming">Running RStudio and setting up your working directory</a></p></li>
<li><p><strong>Import your data</strong> as described here: <a href="https://www.sthda.com/english/english/wiki/importing-data-into-r">Importing data into R</a></p></li>
</ol>
</div>
<div id="reshaping-data-using-tidyr-package" class="section level1">
<h1>Reshaping data using tidyr package</h1>
<p>The <strong>tidyr</strong> package provides four functions to help you <strong>change the layout</strong> of your data set:</p>
<ul>
<li><strong>gather</strong>(): gather (collapse) columns into rows</li>
<li><strong>spread</strong>(): spread rows into columns</li>
<li><strong>separate</strong>(): separate one column into multiple</li>
<li><strong>unite</strong>(): unite multiple columns into one</li>
</ul>
<div id="installing-and-loading-tidyr" class="section level2">
<h2>Installing and loading tidyr</h2>
<pre class="r"><code># Installing
install.packages("tidyr")

# Loading
library("tidyr")</code></pre>
</div>
<div id="example-data-sets" class="section level2">
<h2>Example data sets</h2>
<p>We’ll use the <a href="https://www.sthda.com/english/english/wiki/r-built-in-data-sets#usarrests">R built-in USArrests data set</a>. We start by extracting a small subset, which will be used as the example data set in the next sections:</p>
<pre class="r"><code>my_data <- USArrests[c(1, 10, 20, 30), ]
my_data</code></pre>
<pre><code>           Murder Assault UrbanPop Rape
Alabama      13.2     236       58 21.2
Georgia      17.4     211       60 25.8
Maryland     11.3     300       67 27.8
New Jersey    7.4     159       89 18.8</code></pre>
<p>Row names are states, so let’s use the function cbind() to add a column named “state” in the data. This will make the data tidy and the analysis easier.</p>
<pre class="r"><code>my_data <- cbind(state = rownames(my_data), my_data)
my_data</code></pre>
<pre><code>                state Murder Assault UrbanPop Rape
Alabama       Alabama   13.2     236       58 21.2
Georgia       Georgia   17.4     211       60 25.8
Maryland     Maryland   11.3     300       67 27.8
New Jersey New Jersey    7.4     159       89 18.8</code></pre>
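<p>As a minimal sketch (not part of the original example), the row names can be dropped once the state column exists, so that each state is stored only once. The name my_data_clean is an assumption used for illustration only; the rest of this article keeps using my_data.</p>
<pre class="r"><code># Rebuild the example data with an explicit state column
my_data <- USArrests[c(1, 10, 20, 30), ]
my_data <- cbind(state = rownames(my_data), my_data)

# Drop the now-redundant row names (my_data_clean is a hypothetical name)
my_data_clean <- my_data
rownames(my_data_clean) <- NULL
my_data_clean</code></pre>
<p>After this step, R falls back to the default row names (1, 2, 3, 4) and the state information lives only in the state column.</p>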
</div>
<div id="gather-collapse-columns-into-rows" class="section level2">
<h2>gather(): collapse columns into rows</h2>
<br/>
<div class="block">
The function <strong>gather</strong>() collapses multiple columns into key-value pairs. It produces a “long” data format from a “wide” one. It’s an alternative to the <strong>melt</strong>() function [in <strong>reshape2</strong> package].
</div>
<p><br/></p>
<p><img src="https://www.sthda.com/english/sthda/RDoc/images/tidyr-gather.png" alt="tidyr gather" /></p>
<ol style="list-style-type: decimal">
<li><strong>Simplified format</strong>:</li>
</ol>
<pre class="r"><code>gather(data, key, value, ...)</code></pre>
<br/>
<div class="block">
<ul>
<li><strong>data</strong>: A data frame</li>
<li><strong>key, value</strong>: Names of key and value columns to create in output</li>
<li><strong>…</strong>: Specification of columns to gather. Allowed values are:
<ul>
<li>variable names</li>
<li>if you want to select all variables between a and e, use a:e</li>
<li>if you want to exclude a column name y use -y</li>
<li>for more options, see: dplyr::select()</li>
</ul></li>
</ul>
</div>
<p><br/></p>
<ol start="2" style="list-style-type: decimal">
<li><strong>Examples of usage</strong>:</li>
</ol>
<ul>
<li>Gather all columns except the column state</li>
</ul>
<pre class="r"><code>my_data2 <- gather(my_data,
                   key = "arrest_attribute",
                   value = "arrest_estimate",
                   -state)
my_data2</code></pre>
<pre><code>        state arrest_attribute arrest_estimate
1     Alabama           Murder            13.2
2     Georgia           Murder            17.4
3    Maryland           Murder            11.3
4  New Jersey           Murder             7.4
5     Alabama          Assault           236.0
6     Georgia          Assault           211.0
7    Maryland          Assault           300.0
8  New Jersey          Assault           159.0
9     Alabama         UrbanPop            58.0
10    Georgia         UrbanPop            60.0
11   Maryland         UrbanPop            67.0
12 New Jersey         UrbanPop            89.0
13    Alabama             Rape            21.2
14    Georgia             Rape            25.8
15   Maryland             Rape            27.8
16 New Jersey             Rape            18.8</code></pre>
<p><span class="success">Note that all column names (except state) have been collapsed into a single key column (here “arrest_attribute”). Their values have been put into a value column (here “arrest_estimate”).</span></p>
<ul>
<li>Gather only Murder and Assault columns</li>
</ul>
<pre class="r"><code>my_data2 <- gather(my_data,
                   key = "arrest_attribute",
                   value = "arrest_estimate",
                   Murder, Assault)
my_data2</code></pre>
<pre><code>       state UrbanPop Rape arrest_attribute arrest_estimate
1    Alabama       58 21.2           Murder            13.2
2    Georgia       60 25.8           Murder            17.4
3   Maryland       67 27.8           Murder            11.3
4 New Jersey       89 18.8           Murder             7.4
5    Alabama       58 21.2          Assault           236.0
6    Georgia       60 25.8          Assault           211.0
7   Maryland       67 27.8          Assault           300.0
8 New Jersey       89 18.8          Assault           159.0</code></pre>
<p><span class="warning">Note that the two columns Murder and Assault have been collapsed and the values of the remaining columns (state, UrbanPop and Rape) have been repeated for each collapsed column.</span></p>
<ul>
<li>Gather all variables between Murder and UrbanPop</li>
</ul>
<pre class="r"><code>my_data2 <- gather(my_data,
                   key = "arrest_attribute",
                   value = "arrest_estimate",
                   Murder:UrbanPop)
my_data2</code></pre>
<pre><code>        state Rape arrest_attribute arrest_estimate
1     Alabama 21.2           Murder            13.2
2     Georgia 25.8           Murder            17.4
3    Maryland 27.8           Murder            11.3
4  New Jersey 18.8           Murder             7.4
5     Alabama 21.2          Assault           236.0
6     Georgia 25.8          Assault           211.0
7    Maryland 27.8          Assault           300.0
8  New Jersey 18.8          Assault           159.0
9     Alabama 21.2         UrbanPop            58.0
10    Georgia 25.8         UrbanPop            60.0
11   Maryland 27.8         UrbanPop            67.0
12 New Jersey 18.8         UrbanPop            89.0</code></pre>
<p><span class="warning">The remaining state column is duplicated.</span></p>
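<p>For readers on a recent version of tidyr, the same reshaping can be sketched with <strong>pivot_longer</strong>(), the newer counterpart of gather(). This is a hedged illustration that assumes tidyr 1.0.0 or later is installed; the name my_long is hypothetical and is not used elsewhere in this article.</p>
<pre class="r"><code>library("tidyr")

# Rebuild the example data so that this block runs on its own
my_data <- USArrests[c(1, 10, 20, 30), ]
my_data <- cbind(state = rownames(my_data), my_data)

# names_to plays the role of key, values_to the role of value
my_long <- pivot_longer(my_data,
                        cols = Murder:UrbanPop,
                        names_to = "arrest_attribute",
                        values_to = "arrest_estimate")
my_long</code></pre>
<p>The result holds the same 12 key-value pairs as the gather() call above, although pivot_longer() returns a tibble and orders the rows by state rather than by attribute.</p>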
<ol start="3" style="list-style-type: decimal">
<li><strong>How to use gather() programmatically inside an R function</strong>?</li>
</ol>
<p><span class="error">You should use the function <strong>gather_</strong>(), which takes character vectors containing column names instead of unquoted column names.</span></p>
<p>The simplified syntax is as follows:</p>
<pre class="r"><code>gather_(data, key_col, value_col, gather_cols)</code></pre>
<br/>
<div class="block">
<ul>
<li><strong>data</strong>: a data frame</li>
<li><strong>key_col</strong>, <strong>value_col</strong>: Strings specifying the names of key and value columns to create</li>
<li><strong>gather_cols</strong>: Character vector specifying the names of the columns to be gathered into a pair of key-value columns.</li>
</ul>
</div>
<p><br/></p>
<p>As an example, type this:</p>
<pre class="r"><code>gather_(my_data,
       key_col = "arrest_attribute",
       value_col = "arrest_estimate",
       gather_cols = c("Murder", "Assault"))</code></pre>
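<p>The same idea can be sketched as a small reusable function. The helper gather_columns() below is hypothetical (it is not part of tidyr). It assumes tidyr 1.0.0 or later and relies on pivot_longer() together with tidyselect::all_of(), which accept column names stored in character vectors.</p>
<pre class="r"><code>library("tidyr")

# Hypothetical helper: gather the columns named in `cols`
# into a pair of key-value columns
gather_columns <- function(data, cols, key_name, value_name) {
  pivot_longer(data,
               cols = tidyselect::all_of(cols),
               names_to = key_name,
               values_to = value_name)
}

# Rebuild the example data so that this block runs on its own
my_data <- USArrests[c(1, 10, 20, 30), ]
my_data <- cbind(state = rownames(my_data), my_data)

res <- gather_columns(my_data,
                      cols = c("Murder", "Assault"),
                      key_name = "arrest_attribute",
                      value_name = "arrest_estimate")
res</code></pre>
<p>Because every column name is passed as a string, the helper can be called with names computed at run time, which is the use case gather_() was designed for.</p>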
</div>
<div id="spread-spread-two-columns-into-multiple-columns" class="section level2">
<h2>spread(): spread two columns into multiple columns</h2>
<br/>
<div class="block">
The function <strong>spread</strong>() does the reverse of <strong>gather</strong>(). It takes two columns (key and value) and spreads them into multiple columns. It produces a “wide” data format from a “long” one. It’s an alternative to the function <strong>dcast</strong>() [in <strong>reshape2</strong> package].
</div>
<p><br/></p>
<p><img src="https://www.sthda.com/english/sthda/RDoc/images/tidyr-spread.png" alt="tidyr spread" /></p>
<ol style="list-style-type: decimal">
<li><strong>Simplified format</strong>:</li>
</ol>
<pre class="r"><code>spread(data, key, value)</code></pre>
<br/>
<div class="block">
<ul>
<li><strong>data</strong>: A data frame</li>
<li><strong>key</strong>: The (unquoted) name of the column whose values will be used as column headings.</li>
<li><strong>value</strong>: The (unquoted) name of the column whose values will populate the cells.</li>
</ul>
</div>
<p><br/></p>
<ol start="2" style="list-style-type: decimal">
<li><strong>Examples of usage</strong>:</li>
</ol>
<p>Spread “my_data2” to turn back to the original data:</p>
<pre class="r"><code>my_data3 <- spread(my_data2, 
                   key = "arrest_attribute",
                   value = "arrest_estimate"
                   )
my_data3</code></pre>
<pre><code>       state Rape Assault Murder UrbanPop
1    Alabama 21.2     236   13.2       58
2    Georgia 25.8     211   17.4       60
3   Maryland 27.8     300   11.3       67
4 New Jersey 18.8     159    7.4       89</code></pre>
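<p>On recent versions of tidyr, the same step can be sketched with <strong>pivot_wider</strong>(), the newer counterpart of spread(). This is a hedged illustration that assumes tidyr 1.0.0 or later; the names my_long and my_wide are hypothetical.</p>
<pre class="r"><code>library("tidyr")

# Rebuild the long data so that this block runs on its own
my_data <- USArrests[c(1, 10, 20, 30), ]
my_data <- cbind(state = rownames(my_data), my_data)
my_long <- gather(my_data,
                  key = "arrest_attribute",
                  value = "arrest_estimate",
                  Murder:UrbanPop)

# names_from plays the role of key, values_from the role of value
my_wide <- pivot_wider(my_long,
                       names_from = arrest_attribute,
                       values_from = arrest_estimate)
my_wide</code></pre>
<p>The result has one row per state again. Unlike spread(), pivot_wider() keeps the new columns in their order of appearance instead of sorting them alphabetically.</p>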
<ol start="3" style="list-style-type: decimal">
<li><strong>How to use spread() programmatically inside an R function</strong>?</li>
</ol>
<p><span class="error">You should use the function <strong>spread_</strong>(), which takes strings specifying the key and value columns instead of unquoted column names.</span></p>
<p>The simplified syntax is as follows:</p>
<pre class="r"><code>spread_(data, key_col, value_col)</code></pre>
<br/>
<div class="block">
<ul>
<li><strong>data</strong>: a data frame.</li>
<li><strong>key_col</strong>, <strong>value_col</strong>: Strings specifying the names of key and value columns.</li>
</ul>
</div>
<p><br/></p>
<p>As an example, type this:</p>
<pre class="r"><code>spread_(my_data2, 
       key_col = "arrest_attribute",
       value_col = "arrest_estimate"
       )</code></pre>
</div>
<div id="unite-unite-multiple-columns-into-one" class="section level2">
<h2>unite(): Unite multiple columns into one</h2>
<br/>
<div class="block">
The function <strong>unite</strong>() takes multiple columns and pastes them together into one.
</div>
<p><br/></p>
<p><img src="https://www.sthda.com/english/sthda/RDoc/images/tidyr-unite.png" alt="tidyr unite" /></p>
<ol style="list-style-type: decimal">
<li><strong>Simplified format</strong>:</li>
</ol>
<pre class="r"><code>unite(data, col, ..., sep = "_")</code></pre>
<br/>
<div class="block">
<ul>
<li><strong>data</strong>: A data frame</li>
<li><strong>col</strong>: The (unquoted) name of the new column to add.</li>
<li><strong>sep</strong>: Separator to use between values</li>
</ul>
</div>
<p><br/></p>
<ol start="2" style="list-style-type: decimal">
<li><strong>Examples of usage</strong>:</li>
</ol>
<p>The R code below uses the data set “my_data” and unites the columns Murder and Assault</p>
<pre class="r"><code>my_data4 <- unite(my_data,
                  col = "Murder_Assault",
                  Murder, Assault,
                  sep = "_")
my_data4</code></pre>
<pre><code>                state Murder_Assault UrbanPop Rape
Alabama       Alabama       13.2_236       58 21.2
Georgia       Georgia       17.4_211       60 25.8
Maryland     Maryland       11.3_300       67 27.8
New Jersey New Jersey        7.4_159       89 18.8</code></pre>
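<p>By default, unite() removes the input columns from the result. As a minimal sketch, the <strong>remove</strong> argument of unite() can be set to FALSE to keep them next to the new column. The name my_data5 is hypothetical.</p>
<pre class="r"><code>library("tidyr")

# Rebuild the example data so that this block runs on its own
my_data <- USArrests[c(1, 10, 20, 30), ]
my_data <- cbind(state = rownames(my_data), my_data)

# remove = FALSE keeps Murder and Assault alongside Murder_Assault
my_data5 <- unite(my_data,
                  col = "Murder_Assault",
                  Murder, Assault,
                  sep = "_",
                  remove = FALSE)
my_data5</code></pre>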
<ol start="3" style="list-style-type: decimal">
<li><strong>How to use unite() programmatically inside an R function</strong>?</li>
</ol>
<p><span class="error">You should use the function <strong>unite_</strong>() as follows.</span></p>
<pre class="r"><code>unite_(data, col, from, sep = "_")</code></pre>
<br/>
<div class="block">
<ul>
<li><strong>data</strong>: A data frame.</li>
<li><strong>col</strong>: String giving the name of the new column to be added</li>
<li><strong>from</strong>: Character vector specifying the names of existing columns to be united</li>
<li><strong>sep</strong>: Separator to use between values.</li>
</ul>
</div>
<p><br/></p>
<p>As an example, type this:</p>
<pre class="r"><code>unite_(my_data,
    col = "Murder_Assault",
    from = c("Murder", "Assault"),
    sep = "_")</code></pre>
</div>
<div id="separate-separate-one-column-into-multiple" class="section level2">
<h2>separate(): separate one column into multiple</h2>
<br/>
<div class="block">
The function <strong>separate</strong>() is the reverse of <strong>unite</strong>(). It takes the values inside a single character column and separates them into multiple columns.
</div>
<p><br/></p>
<p><img src="https://www.sthda.com/english/sthda/RDoc/images/tidyr-separate.png" alt="tidyr separate" /></p>
<ol style="list-style-type: decimal">
<li><strong>Simplified format</strong>:</li>
</ol>
<pre class="r"><code>separate(data, col, into, sep = "[^[:alnum:]]+")</code></pre>
<br/>
<div class="block">
<ul>
<li><strong>data</strong>: A data frame</li>
<li><strong>col</strong>: Unquoted name of the column to split</li>
<li><strong>into</strong>: Character vector specifying the names of new variables to be created.</li>
<li><strong>sep</strong>: Separator between columns:
<ul>
<li>If character, it is interpreted as a regular expression.</li>
<li>If numeric, it is interpreted as positions to split at. Positive values start at 1 at the far-left of the string; negative values start at -1 at the far-right of the string.</li>
</ul></li>
</ul>
</div>
<p><br/></p>
<ol start="2" style="list-style-type: decimal">
<li><strong>Examples of usage</strong>:</li>
</ol>
<p>Separate the column “Murder_Assault” [in my_data4] into two columns Murder and Assault:</p>
<pre class="r"><code>separate(my_data4,
         col = "Murder_Assault",
         into = c("Murder", "Assault"),
         sep = "_")</code></pre>
<pre><code>                state Murder Assault UrbanPop Rape
Alabama       Alabama   13.2     236       58 21.2
Georgia       Georgia   17.4     211       60 25.8
Maryland     Maryland   11.3     300       67 27.8
New Jersey New Jersey    7.4     159       89 18.8</code></pre>
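<p>One detail that the printed output does not show: the columns created by separate() are character vectors by default. The sketch below uses the <strong>convert</strong> argument of separate() to turn them back into numbers, and also illustrates a numeric <strong>sep</strong>. The names chr_cols, num_cols and codes, as well as the toy values "ab12" and "cd34", are assumptions made up for illustration.</p>
<pre class="r"><code>library("tidyr")

# Rebuild my_data4 so that this block runs on its own
my_data <- USArrests[c(1, 10, 20, 30), ]
my_data <- cbind(state = rownames(my_data), my_data)
my_data4 <- unite(my_data, col = "Murder_Assault",
                  Murder, Assault, sep = "_")

# Default behaviour: the new columns are character vectors
chr_cols <- separate(my_data4, col = "Murder_Assault",
                     into = c("Murder", "Assault"), sep = "_")
class(chr_cols$Murder)

# convert = TRUE asks separate() to guess a suitable type
num_cols <- separate(my_data4, col = "Murder_Assault",
                     into = c("Murder", "Assault"), sep = "_",
                     convert = TRUE)
class(num_cols$Murder)

# Numeric sep: split after the second character (toy data)
codes <- data.frame(code = c("ab12", "cd34"))
split_codes <- separate(codes, col = "code",
                        into = c("prefix", "number"), sep = 2)
split_codes</code></pre>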
<ol start="3" style="list-style-type: decimal">
<li><strong>How to use separate() programmatically inside an R function</strong>?</li>
</ol>
<p><span class="error">You should use the function <strong>separate_</strong>() as follows.</span></p>
<pre class="r"><code>separate_(data, col, into, sep = "[^[:alnum:]]+")</code></pre>
<br/>
<div class="block">
<ul>
<li><strong>data</strong>: A data frame.</li>
<li><strong>col</strong>: String giving the name of the column to split</li>
<li><strong>into</strong>: Character vector specifying the names of new columns to create</li>
<li><strong>sep</strong>: Separator between columns (as above).</li>
</ul>
</div>
<p><br/></p>
<p>As an example, type this:</p>
<pre class="r"><code>separate_(my_data4,
         col = "Murder_Assault",
         into = c("Murder", "Assault"),
         sep = "_")</code></pre>
</div>
<div id="chaining-multiple-operations" class="section level2">
<h2>Chaining multiple operations</h2>
<p>It’s possible to combine multiple operations using the <strong>magrittr</strong> forward-pipe operator: <strong>%>%</strong>.</p>
<p><span class="success">For example, <strong>x %>% f</strong> is equivalent to <strong>f(x)</strong>. </span></p>
<p>In the following R code:</p>
<ul>
<li>first, my_data is passed to the gather() function</li>
<li>next, the output of gather() is passed to the unite() function</li>
</ul>
<pre class="r"><code>my_data %>% gather(key = "arrest_attribute",
                   value = "arrest_estimate",
                   Murder:UrbanPop) %>%
            unite(col = "attribute_estimate",
                  arrest_attribute, arrest_estimate)</code></pre>
<pre><code>        state Rape attribute_estimate
1     Alabama 21.2        Murder_13.2
2     Georgia 25.8        Murder_17.4
3    Maryland 27.8        Murder_11.3
4  New Jersey 18.8         Murder_7.4
5     Alabama 21.2        Assault_236
6     Georgia 25.8        Assault_211
7    Maryland 27.8        Assault_300
8  New Jersey 18.8        Assault_159
9     Alabama 21.2        UrbanPop_58
10    Georgia 25.8        UrbanPop_60
11   Maryland 27.8        UrbanPop_67
12 New Jersey 18.8        UrbanPop_89</code></pre>
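<p>As a further hedged sketch of chaining, the pipeline below gathers three columns and immediately spreads them again, which should give back one row per state with the original set of columns. The name round_trip is hypothetical.</p>
<pre class="r"><code>library("tidyr")

# Rebuild the example data so that this block runs on its own
my_data <- USArrests[c(1, 10, 20, 30), ]
my_data <- cbind(state = rownames(my_data), my_data)

# gather() and then spread(): a wide -> long -> wide round trip
round_trip <- my_data %>%
  gather(key = "arrest_attribute",
         value = "arrest_estimate",
         Murder:UrbanPop) %>%
  spread(key = "arrest_attribute",
         value = "arrest_estimate")
round_trip

# x %>% f is equivalent to f(x)
identical(my_data %>% nrow, nrow(my_data))</code></pre>
<p>The column order of round_trip differs from my_data because spread() sorts the new columns alphabetically, but the set of columns and the values are the same.</p>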
</div>
</div>
<div id="summary" class="section level1">
<h1>Summary</h1>
<p>You should tidy your data for easier data analysis using the R package <strong>tidyr</strong>, which provides the following functions.</p>
<br/>
<div class="block">
<ul>
<li><p>Collapse multiple columns together into key-value pairs (long data format): <strong>gather</strong>(data, key, value, …)</p></li>
<li><p>Spread key-value pairs into multiple columns (wide data format): <strong>spread</strong>(data, key, value)</p></li>
<li><p>Unite multiple columns into one: <strong>unite</strong>(data, col, …)</p></li>
<li>Separate one column into multiple: <strong>separate</strong>(data, col, into)</li>
</ul>
</div>
<p><br/></p>
</div>
<div id="related-articles" class="section level1">
<h1>Related articles</h1>
<ul>
<li>Previous chapters
<ul>
<li><a href="https://www.sthda.com/english/english/wiki/r-basics-quick-and-easy">R Programming Basics</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/importing-data-into-r">Importing Data into R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/exporting-data-from-r">Exporting Data from R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/tibble-data-format-in-r-best-and-modern-way-to-work-with-your-data">Tibble Data Format in R: Best and Modern Way to Work with your Data</a></li>
</ul></li>
<li>Next chapters
<ul>
<li><a href="https://www.sthda.com/english/english/wiki/data-manipulation-in-r">Data Manipulation in R</a></li>
</ul></li>
</ul>
</div>
<div id="references" class="section level1">
<h1>References</h1>
<ul>
<li>The figures illustrating <strong>tidyr</strong> functions have been adapted from <a href="https://www.rstudio.com/wp-content/uploads/2015/02/data-wrangling-cheatsheet.pdf">RStudio data wrangling cheatsheet</a></li>
<li>Learn more about tidy data: <a href="http://www.jstatsoft.org/v59/i10/paper">Hadley Wickham. Tidy Data. Journal of Statistical Software, August 2014, Volume 59, Issue 10.</a></li>
</ul>
</div>
<div id="infos" class="section level1">
<h1>Infos</h1>
<p><span class="warning"> This analysis has been performed using R (ver. 3.2.3). </span></p>
</div>

<script>jQuery(document).ready(function () {
    jQuery('h1').addClass('wiki_paragraph1');
    jQuery('h2').addClass('wiki_paragraph2');
    jQuery('h3').addClass('wiki_paragraph3');
    jQuery('h4').addClass('wiki_paragraph4');
    });//add phpboost class to header</script>
<style>.content{padding:0px;}</style>
</div><!--end rdoc-->
<!--====================== stop here when you copy to sthda================-->



<!-- END HTML -->]]></description>
			<pubDate>Fri, 22 Apr 2016 07:37:31 +0200</pubDate>
			
		</item>
		
		<item>
			<title><![CDATA[Tibble Data Format in R: Best and Modern Way to Work with Your Data]]></title>
			<link>https://www.sthda.com/english/wiki/tibble-data-format-in-r-best-and-modern-way-to-work-with-your-data</link>
			<guid>https://www.sthda.com/english/wiki/tibble-data-format-in-r-best-and-modern-way-to-work-with-your-data</guid>
			<description><![CDATA[<!-- START HTML -->

  <!--====================== start from here when you copy to sthda================-->  
  <div id="rdoc">


<div id="TOC">
<ul>
<li><a href="#preleminary-tasks">Preliminary tasks</a></li>
<li><a href="#installing-and-loading-tibble-package">Installing and loading tibble package</a></li>
<li><a href="#create-a-new-tibble">Create a new tibble</a></li>
<li><a href="#convert-your-data-as-a-tibble">Convert your data to a tibble</a></li>
<li><a href="#advantages-of-tibbles-compared-to-data-frames">Advantages of tibbles compared to data frames</a></li>
<li><a href="#summary">Summary</a></li>
<li><a href="#related-articles">Related articles</a></li>
<li><a href="#infos">Infos</a></li>
</ul>
</div>

<p>
</p>
<p>Previously, we described the <a href="https://www.sthda.com/english/english/wiki/r-basics-quick-and-easy">essentials of R programming</a> and provided quick start guides for <a href="https://www.sthda.com/english/english/wiki/importing-data-into-r">importing data</a> into <strong>R</strong>. The traditional <a href="https://www.sthda.com/english/english/wiki/reading-data-from-txt-csv-files-r-base-functions">R base functions read.table(), read.delim() and read.csv()</a> import data into R as a <a href="https://www.sthda.com/english/english/wiki/easy-r-programming-basics#data-frames"><strong>data frame</strong></a>. However, the more recent R package <a href="https://www.sthda.com/english/english/wiki/fast-reading-of-data-from-txt-csv-files-into-r-readr-package"><strong>readr</strong></a> provides several functions (read_delim(), read_tsv() and read_csv()), which are faster than the R base functions and import data into R as a <strong>tbl_df</strong> (pronounced as “tibble diff”).</p>
<p><span class="success">A <strong>tbl_df</strong> object is a data frame with a nicer printing method, which is useful when working with large data sets.</span></p>
<br/>
<div class="block">
In this article, we’ll present the <strong>tibble</strong> R package, developed by Hadley Wickham. The <strong>tibble</strong> R package provides easy-to-use functions for creating tibbles, which are a modern rethinking of data frames.
</div>
<p><br/></p>
<p><img src="https://www.sthda.com/english/sthda/RDoc/images/tibble-data-format-tbl_df.png" alt="tibble data format: tbl_df" /> <br/></p>
<div id="preleminary-tasks" class="section level1">
<h1>Preliminary tasks</h1>
<p><strong>Launch RStudio</strong> as described here: <a href="https://www.sthda.com/english/english/wiki/running-rstudio-and-setting-up-your-working-directory-easy-r-programming">Running RStudio and setting up your working directory</a></p>
</div>
<div id="installing-and-loading-tibble-package" class="section level1">
<h1>Installing and loading tibble package</h1>
<pre class="r"><code># Installing
install.packages("tibble")

# Loading
library("tibble")</code></pre>
</div>
<div id="create-a-new-tibble" class="section level1">
<h1>Create a new tibble</h1>
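<p>To create a new tibble by combining multiple vectors, use the function <strong>data_frame</strong>() (current releases of the tibble package deprecate it in favor of the equivalent <strong>tibble</strong>()):</p>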
<pre class="r"><code># Create
friends_data <- data_frame(
  name = c("Nicolas", "Thierry", "Bernard", "Jerome"),
  age = c(27, 25, 29, 26),
  height = c(180, 170, 185, 169),
  married = c(TRUE, FALSE, TRUE, TRUE)
)

# Print
friends_data</code></pre>
<pre><code>Source: local data frame [4 x 4]

     name   age height married
    <chr> <dbl>  <dbl>   <lgl>
1 Nicolas    27    180    TRUE
2 Thierry    25    170   FALSE
3 Bernard    29    185    TRUE
4  Jerome    26    169    TRUE</code></pre>
<br/>
<div class="success">
<p>Compared to the traditional <a href="https://www.sthda.com/english/wiki/easy-r-programming-basics#data-frames"><strong>data.frame</strong>()</a>, the modern <strong>data_frame</strong>():</p>
<ul>
<li>never converts strings to factors</li>
<li>never changes the names of variables</li>
<li>never creates row names</li>
</ul>
</div>
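<p>The three points above can be checked with a short sketch. It uses <strong>tibble</strong>(), the current name of the constructor, and a hypothetical two-row data set whose first column name contains a space. Note that <em>stringsAsFactors = TRUE</em> is set explicitly as an assumption, to reproduce the behavior of <strong>data.frame</strong>() in R versions before 4.0.0, where it was the default.</p>
<pre class="r"><code>library("tibble")

# Hypothetical data; `first name` is deliberately a non-syntactic column name
base_df <- data.frame(
  `first name` = c("Nicolas", "Thierry"),
  age = c(27, 25),
  stringsAsFactors = TRUE  # the default before R 4.0.0
)
tbl <- tibble(
  `first name` = c("Nicolas", "Thierry"),
  age = c(27, 25)
)

# Variable names: data.frame() rewrites them, tibble() keeps them
names(base_df)  # "first.name" "age"
names(tbl)      # "first name" "age"

# Strings: converted to a factor by data.frame(), kept as character by tibble()
class(base_df[[1]])  # "factor"
class(tbl[[1]])      # "character"

# Row names: a tibble does not carry any
has_rownames(tbl)    # FALSE</code></pre>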
<p><br/></p>
</div>
<div id="convert-your-data-as-a-tibble" class="section level1">
<h1>Convert your data to a tibble</h1>
<p><span class="warning">Note that if you use the <a href="https://www.sthda.com/english/english/wiki/fast-reading-of-data-from-txt-csv-files-into-r-readr-package"><strong>readr</strong></a> package to import your data into R, you don’t need this step: <strong>readr</strong> already imports data as <strong>tbl_df</strong>.</span></p>
<p>To convert a traditional data frame to a tibble, use the function <strong>as_data_frame</strong>() [in the <strong>tibble</strong> package], which works on data frames, lists, matrices and tables (current releases of the tibble package deprecate it in favor of the equivalent <strong>as_tibble</strong>()):</p>
<pre class="r"><code>library("tibble")

# Loading data
data("iris")
# Class of iris
class(iris)</code></pre>
<pre><code>[1] "data.frame"</code></pre>
<pre class="r"><code># Print the first 6 rows
head(iris, 6)</code></pre>
<pre><code>  Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1          5.1         3.5          1.4         0.2  setosa
2          4.9         3.0          1.4         0.2  setosa
3          4.7         3.2          1.3         0.2  setosa
4          4.6         3.1          1.5         0.2  setosa
5          5.0         3.6          1.4         0.2  setosa
6          5.4         3.9          1.7         0.4  setosa</code></pre>
<pre class="r"><code># Convert iris data to a tibble
my_data <- as_data_frame(iris)
class(my_data)</code></pre>
<pre><code>[1] "tbl_df"     "tbl"        "data.frame"</code></pre>
<pre class="r"><code># Print my data
my_data</code></pre>
<pre><code>Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...</code></pre>
<p><span class="success">Note that only the first 10 rows are displayed.</span></p>
<p><span class="warning">To turn a tibble back into a data frame, use the function <strong>as.data.frame</strong>(my_data).</span></p>
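<p>A minimal sketch of the round trip, written with <strong>as_tibble</strong>(), the current name of the conversion function. The object names and the small list are hypothetical and only serve as an illustration.</p>
<pre class="r"><code>library("tibble")

# Data frame to tibble
iris_tbl <- as_tibble(iris)
class(iris_tbl)

# Tibble back to a plain data frame
iris_df <- as.data.frame(iris_tbl)
class(iris_df)

# A named list of equal-length vectors can be converted as well
list_tbl <- as_tibble(list(id = 1:3, label = c("a", "b", "c")))
list_tbl</code></pre>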
</div>
<div id="advantages-of-tibbles-compared-to-data-frames" class="section level1">
<h1>Advantages of tibbles compared to data frames</h1>
<ol style="list-style-type: decimal">
<li><p>Tibbles have a nice printing method that shows only the first 10 rows and all the columns that fit on the screen. This is useful when you work with large data sets.</p></li>
<li>When printed, the data type of each column is specified (see below):
<ul>
<li><dbl>: for double</li>
<li><fctr>: for factor</li>
<li><chr>: for character</li>
<li><lgl>: for logical</li>
</ul></li>
</ol>
<pre class="r"><code>my_data</code></pre>
<pre><code>Source: local data frame [150 x 5]

   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
          <dbl>       <dbl>        <dbl>       <dbl>  <fctr>
1           5.1         3.5          1.4         0.2  setosa
2           4.9         3.0          1.4         0.2  setosa
3           4.7         3.2          1.3         0.2  setosa
4           4.6         3.1          1.5         0.2  setosa
5           5.0         3.6          1.4         0.2  setosa
6           5.4         3.9          1.7         0.4  setosa
7           4.6         3.4          1.4         0.3  setosa
8           5.0         3.4          1.5         0.2  setosa
9           4.4         2.9          1.4         0.2  setosa
10          4.9         3.1          1.5         0.1  setosa
..          ...         ...          ...         ...     ...</code></pre>
<p>It’s possible to change the default printing appearance as follows:</p>
<br/>
<div class="warning">
<ul>
<li>Print only the first 6 rows when a tibble has more than 20 rows: <em>options(tibble.print_max = 20, tibble.print_min = 6)</em></li>
<li>Always show all rows: <em>options(tibble.print_max = Inf)</em></li>
<li>Always show all columns: <em>options(tibble.width = Inf)</em></li>
</ul>
</div>
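<p>The printing options listed above could be tried out as in the following sketch. Saving and restoring the previous option values is an addition made here so that the session is left unchanged. The <em>n</em> argument of <strong>print</strong>() overrides the number of rows for a single call.</p>
<pre class="r"><code>library("tibble")

my_data <- as_tibble(iris)

# Set the options and keep the previous values so they can be restored
old_opts <- options(tibble.print_max = 20, tibble.print_min = 6)

# iris has 150 rows, more than print_max, so only print_min rows are shown
my_data

# One-off override for a single call
print(my_data, n = 15)

# Restore the previous settings
options(old_opts)</code></pre>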
<p><br/></p>
<ol start="3" style="list-style-type: decimal">
<li>Subsetting a tibble with <em>[</em> always returns a tibble. Unlike with traditional data frames, you don’t need to use <em>drop = FALSE</em> to keep a single column as a table.</li>
</ol>
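<p>The subsetting behavior can be illustrated with the <em>iris</em> data used above. This is a minimal sketch and the object names are chosen for illustration only.</p>
<pre class="r"><code>library("tibble")

base_df <- iris
tbl <- as_tibble(iris)

# Selecting one column of a data frame drops it to a vector...
class(base_df[, 1])                # "numeric"
# ...unless drop = FALSE is given
class(base_df[, 1, drop = FALSE])  # "data.frame"

# The same selection on a tibble stays a tibble
class(tbl[, 1])                    # "tbl_df" "tbl" "data.frame"

# To extract the column as a vector, use [[ or $
class(tbl[[1]])                    # "numeric"</code></pre>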
</div>
<div id="summary" class="section level1">
<h1>Summary</h1>
<br/>
<div class="block">
<ul>
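<li><p>Create a tibble: <strong>data_frame</strong>() (named <strong>tibble</strong>() in current releases of the package)</p></li>
<li><p>Convert your data to a tibble: <strong>as_data_frame</strong>() (named <strong>as_tibble</strong>() in current releases of the package)</p></li>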
<li>Change default printing appearance of a tibble: <strong>options(tibble.print_max = 20, tibble.print_min = 6)</strong></li>
</ul>
</div>
<p><br/></p>
</div>
<div id="related-articles" class="section level1">
<h1>Related articles</h1>
<ul>
<li>Previous chapters
<ul>
<li><a href="https://www.sthda.com/english/english/wiki/r-basics-quick-and-easy">R Programming Basics</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/importing-data-into-r">Importing Data into R</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/exporting-data-from-r">Exporting Data from R</a></li>
</ul></li>
<li>Next chapters
<ul>
<li><a href="https://www.sthda.com/english/english/wiki/tidyr-crutial-step-reshaping-data-with-r-for-easier-analyses">Tidyr: Crucial Step Reshaping Data with R for Easier Analyses</a></li>
<li><a href="https://www.sthda.com/english/english/wiki/data-manipulation-in-r">Data Manipulation in R</a></li>
</ul></li>
</ul>
</div>
<div id="infos" class="section level1">
<h1>Infos</h1>
<p><span class="warning"> This analysis has been performed using R (ver. 3.2.3). </span></p>
</div>

<script>jQuery(document).ready(function () {
    jQuery('h1').addClass('wiki_paragraph1');
    jQuery('h2').addClass('wiki_paragraph2');
    jQuery('h3').addClass('wiki_paragraph3');
    jQuery('h4').addClass('wiki_paragraph4');
    });//add phpboost class to header</script>
<style>.content{padding:0px;}</style>
</div><!--end rdoc-->
<!--====================== stop here when you copy to sthda================-->



<!-- END HTML -->]]></description>
			<pubDate>Thu, 14 Apr 2016 18:44:02 +0200</pubDate>
			
		</item>
		
	</channel>
</rss>
