Tidy Data Principles

Environments, Ordering, and the Nuts and Bolts of R

Robert W. Walker

1/17/23

Let’s Get Started:

An Overview of R and RStudio

RStudio has tons of Addins that are useful for specialized tasks.

  • We will make use of one today: esquisse.
  • NB: It exports to PowerPoint with the help of some additional packages.
install.packages("esquisse")

Do that.

Let’s Have a Look at R

R is an object-oriented programming language for data. All kinds of things can be objects: data, models, graphics, essentially anything that results from applying a function to an object.

  • We will have functions on objects that create new objects.
  • It has a command prompt. >
  • Valid R functions, objects, and/or assignments ( <-) go there.
  • The help for any function is provided by ? before the command.
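
A minimal sketch of those points as they might be typed at the prompt. The object names x and total are illustrative and do not come from the course files.

```r
# Assignment with <- creates an object.
x <- c(2, 4, 6)

# Applying a function to an object creates a new object.
total <- sum(x)
total

# Help for a function: put ? before its name.
# (Commented out because it opens the help viewer when run interactively.)
# ?sum
```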

tidy data

We will think and talk about data organized in a tidy way, where rows represent cases/units/observations and columns represent variables. Many standard forms of enterprise data are not stored this way, though they could be. Accounting data come to mind, where there are data in the column names. There are tools that we will later encounter for pivoting between wide and long forms; for our purposes, the tidy and long forms are synonymous.
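
As a rough preview of that kind of pivot, here is a sketch that assumes the tidyr package is installed. The accounting table, its column names, and its amounts are invented for illustration.

```r
library(tidyr)

# Hypothetical accounting-style table: the years are data,
# but they sit in the column names.
wide <- data.frame(
  account = c("Rent", "Payroll"),
  `2021` = c(100, 250),
  `2022` = c(110, 265),
  check.names = FALSE
)

# One row per account-year; year becomes a variable of its own.
long <- pivot_longer(wide, cols = c(`2021`, `2022`),
                     names_to = "year", values_to = "amount")
long
```

In the long result, each row is one observation (an account in a year) and each column is one variable.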

R’s Variable Types

  • Factor: Qualitative labels with attached numbers. Think key-value.
  • Character: Strings of letters and numbers demarcated by quotation marks.
There is 'something' or there is "something"
There is 'Hello World!' or there is "Hello World!"
  • Numeric [integer, double]
  • Complex [if you don’t know what this means, worry not]
  • Logical
  • Date
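
One way to see these types is to create one object of each and ask R for its class. This is only a sketch and the object names are illustrative.

```r
size   <- factor(c("small", "large", "small"))  # factor: labels stored with integer codes
name   <- "Hello World!"                        # character
count  <- 3L                                    # numeric, integer
weight <- 2.5                                   # numeric, double
z      <- 1 + 2i                                # complex
flag   <- weight > 2                            # logical
day    <- as.Date("2023-01-17")                 # Date

class(size); class(name); class(count); class(weight)
class(z); class(flag); class(day)

# The key-value idea behind factors: levels are the labels, codes are integers.
levels(size)
as.integer(size)
```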

The global environment in RStudio helps us out. There is a special combined data structure in R – the data.frame – that combines data of different types organized with units defining the rows and variables defining the columns.
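
A small sketch of a data.frame that mixes types. The is_salad column and the object name menu are made up for illustration.

```r
# Each row is a unit, each column a variable with its own type.
menu <- data.frame(
  restaurant = c("Mcdonalds", "Taco Bell"),  # character
  calories   = c(380, 540),                  # numeric
  is_salad   = c(FALSE, FALSE),              # logical (illustrative column)
  stringsAsFactors = FALSE
)

str(menu)   # shows the type of each column
dim(menu)   # rows, then columns
```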

A Great Little Chapter

Though it is all base R, Keith McNulty, the Global Leader of Talent Science and Analytics at McKinsey and Company, has a great chapter on the basics of R that is linked here. It’s a really nice book.

A Tour of the RStudio

Their website is great. They have an excellent collection of webinars on a variety of special topics.

Tools > Global Options provides a lot of customization.

Markdown Quick Reference and Cheatsheets, under Help and on the RStudio website, are both great.

A Brief Bit on R-Markdown

It is a wonderful technology for repeated tasks and for transparent communication with data. I will use it extensively. An R Markdown document is a sandbox: it does not start with packages, libraries, or commands, so everything it needs has to be loaded inside the document. It is best to work in RStudio with the play button for individual code chunks and the play all option above to make sure that everything syncs up. One can find a template, with associated help, under Help.

File > New file > R Markdown

We are working with FastFood.Rmd – an R Markdown file.

I would strongly encourage you to check out this video on R Markdown from RStudio. I forgot to mention that Markdown is the language of Reddit.

An Example with Excel Data

When the Environment tab is active in the top right of RStudio, you will see a tab called Import Dataset. The first thing to note is that R reads a number of data types [and links to databases and things].

To import Excel data, choose From Excel:
- We have to give it a file name. NB: Paths.

library(readxl)
url <- "https://github.com/robertwwalker/DADMStuff/raw/master/FastFood.xlsx"
destfile <- "FastFood.xlsx"
curl::curl_download(url, destfile)
FastFood <- read_excel(destfile)

That’s not exactly what I hoped for.
There are intermediate steps, downloading the file and checking its sheets, that the import tool does not do with remote files. The code automagically reflects this.

To import Excel data, choose From Excel:
- We have to give it a file name. NB Paths.
- We have to choose a sheet.
- Types of variables.
- Missing data values.
- Ranges

library(readxl)
url <- "https://github.com/robertwwalker/DADMStuff/raw/master/FastFood.xlsx" 
destfile <- "FastFood.xlsx"
curl::curl_download(url, destfile)
FastFood <- read_excel(destfile, sheet = "FastFood", na = "NA")

This will have to be added to the R Markdown file.

Some Crash R

Operators:

  • +
  • -
  • *
  • /
  • and many others.
  • We will also be concerned with the difference between equals as assigning and equals in math [denoted with two equals signs in succession].
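
A short sketch of the arithmetic operators and of the difference between assigning and testing equality. The numbers are arbitrary.

```r
7 + 2    # addition
7 - 2    # subtraction
7 * 2    # multiplication
7 / 2    # division

x <- 7   # assignment: x now holds 7
x = 7    # a single equals sign also assigns
x == 7   # two equals signs ask a question and return a logical
x == 8
```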

Scoping and Environments

The hardest thing for many learners about R is the scoping requirements and environments. We will deploy a collection of packages that take a very data-centric view of this problem. Let me use our Fast Food example. I can type the name to see what it is. It dumps a lot to my screen.

FastFood
# A tibble: 515 × 17
   restau…¹ item  calor…² cal_fat total…³ sat_fat trans…⁴ chole…⁵ sodium total…⁶
   <chr>    <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>  <dbl>   <dbl>
 1 Mcdonal… Arti…     380      60       7       2     0        95   1110      44
 2 Mcdonal… Sing…     840     410      45      17     1.5     130   1580      62
 3 Mcdonal… Doub…    1130     600      67      27     3       220   1920      63
 4 Mcdonal… Gril…     750     280      31      10     0.5     155   1940      62
 5 Mcdonal… Cris…     920     410      45      12     0.5     120   1980      81
 6 Mcdonal… Big …     540     250      28      10     1        80    950      46
 7 Mcdonal… Chee…     300     100      12       5     0.5      40    680      33
 8 Mcdonal… Clas…     510     210      24       4     0        65   1040      49
 9 Mcdonal… Doub…     430     190      21      11     1        85   1040      35
10 Mcdonal… Doub…     770     400      45      21     2.5     175   1290      42
# … with 505 more rows, 7 more variables: fiber <dbl>, sugar <dbl>,
#   protein <dbl>, vit_a <dbl>, vit_c <dbl>, calcium <dbl>, salad <chr>, and
#   abbreviated variable names ¹​restaurant, ²​calories, ³​total_fat, ⁴​trans_fat,
#   ⁵​cholesterol, ⁶​total_carb

The Data

In the Beginning [Base R]

We referred to things via $ to unpack them. RStudio remains smart about this; the elements of an object are [mostly] unpacked via $. To find just calories in FastFood, I would need:

FastFood$calories
  [1]  380  840 1130  750  920  540  300  510  430  770  380  620  530  700  250
 [16]  290  640  580  740  350  380  480  580  520  680  570  530  530  670  560
 [31]  730  690  630  800  370  480  760 1210 1510 2430  180  270  440  890 1770
 [46]  640  960 1600  140  270  490  190  320  490  220  350  520  430  310  270
 [61]  120  230  350  470  500  130  190  260  390  970  490  440   70  110  140
 [76]  210  430  450  500  450  540  350  860  720  710  640  340  410  380  450
 [91]  600  870  640  650  740  710  720  800 1280 1120 1130 1220 1120  450  450
[106]  450  610  680  430  570  450  280  470  740 1000 1350  380  560  350  610
[121]  970 1080 1190  330  440  550  730  100  370  430  410  340  320  500  210
[136]  830  320  330  400  450  630  650  690  690  540  650  690  550  540  360
[151]  510  640  480  710  740  750  610  300  680  710  840  240  360  600  680
[166]  550  710  520  800  620  740  590  600  430  650 1030  730  470  980  290
[181]  290  220  210  240  300  240  200   70  430  420  230  720 1550 1000  330
[196]  290 1040  730 1100  300  520  450  360  900  580 1220  260  550  990  940
[211]  310 1250  730  970 1100  770  900  990  660  760  340  380  590  720  550
[226]  690  530  670  560  700  320  450  220  230  830  440  530  410  480  730
[241]  290  190  290  950  470  570  580  430  670  470  330  310  300  630  340
[256]  410  840  210  530  410  700  760 1000  800  630  540  570  400  630 1030
[271] 1260  420  390  380  330  290  350  310  250  550 1050  760  260  470  400
[286]  640  540  350  780  580  910  350  500  640  520  280  600  350  380  150
[301]  360  280   20  550  320  640  430  860  580 1160  500 1000  180  290  580
[316]  330  660  570 1140  570 1140  460  920  370  740  470  940  410  820  550
[331] 1100  480  960  320  640  200  320  640  350  700  480  960  380  760  310
[346]  620  370  740  420  840  380  760  470  940  390  780  180  280  560  280
[361]  560  490  980  150  230  460  390  780  300  150  400  330  110  360  280
[376]  150  510  180  220  230  230  310  140  140  310  210  140  200  200  310
[391]  110  110   50  760  730  810  740  680  790  820  540  460  510  370  550
[406]  440  410  420  390  390  760  780  740  420  380  610  610  630  550  620
[421]  630  650  400  710  650  670  540  550  570  410  880  830  820  170  320
[436]  160  200  170  200  320  350  340  350  380  320  170  200  250  320  170
[451]  200  230  200  250  340  340  370  600  420  440  600  350  340  340  150
[466]  140  150  170  490  490  490  490  570  300  270  270  710  760  430  320
[481]  260  410  210  420  430  560  580  540  480  190  520  620  380  290  650
[496]  540  270  580  470  540  270  440  440  460  180  400  200  390  520  700
[511]  780  580  780  720  720

Functions

To embed that in a function, I could write a function that takes some variable, call it x, adds up all of its values, and divides by the number of values:

\bar{x}=\frac{1}{n}\sum_{i=1}^{n}x_i

mean(FastFood$calories)
[1] 530.9126
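
For comparison, a hand-written version of the same calculation might look like the sketch below. The name my_mean is hypothetical, and the five numbers are simply the first five calorie values printed above.

```r
# Hypothetical hand-written average: add up x, divide by how many values there are.
my_mean <- function(x) {
  sum(x) / length(x)
}

calories <- c(380, 840, 1130, 750, 920)  # first five calorie values printed above
my_mean(calories)
mean(calories)   # the built-in function gives the same answer
```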

Now let’s try that for protein.

mean(FastFood$protein)
[1] NA

Though there is only one missing datum, that is enough to render the result missing.

How to Adjust for NA?

If we wish to fix this, we need, from ?mean:

mean(FastFood$protein, na.rm=TRUE)
[1] 27.89105
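
The same behavior can be seen on a tiny vector. This sketch uses illustrative values, not the full protein column.

```r
protein <- c(37, 46, NA, 55)   # illustrative values with one missing datum

is.na(protein)                 # flags the missing entry
sum(is.na(protein))            # counts the missing values

mean(protein)                  # NA: one missing value makes the result missing
mean(protein, na.rm = TRUE)    # drops the NA first, then averages the remaining three
```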

Deviations about the mean sum to zero

By definition, the sum of the deviations about the average must be zero.

sum(FastFood$protein - mean(FastFood$protein, na.rm=TRUE))
[1] NA

I tried to operate on a missing value. Let’s fix that for sum, also. NB: because of floating point arithmetic, the result is a very small number rather than exactly zero.

sum(FastFood$protein - mean(FastFood$protein, na.rm=TRUE), na.rm=TRUE)
[1] -3.490896e-11
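
Because of that rounding, comparisons with zero are usually made with a tolerance rather than with ==. A sketch with illustrative values:

```r
x <- c(37, 46, 70, 55, 46)     # illustrative values
dev_sum <- sum(x - mean(x))    # sum of the deviations about the mean

dev_sum                        # zero, or a tiny number very close to it
dev_sum == 0                   # an exact comparison can fail because of rounding
isTRUE(all.equal(dev_sum, 0))  # comparison with a numerical tolerance
```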

On Summaries

The average is sensitive to outlying values. Think income in Seattle and the samples that include Jeff Bezos, Paul Allen, MacKenzie Scott, and Bill Gates. That is why we examine the median – the value such that half are above and half are below; magnitude doesn’t matter.

The median is a percentile; it is the 50th percentile. We are often also interested in the middle 50 percent: the 25th and 75th percentiles or the first and third quartiles. In R, generically, these are quantiles.

quantile(FastFood$protein, probs = c(0,0.25,0.5,0.75,1), na.rm=TRUE)
   0%   25%   50%   75%  100% 
  1.0  16.0  24.5  36.0 186.0 

As a technical matter, the median is only unique with an odd number of observations; with an even number, we approximate it with the midpoint of the middle two.
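
A small sketch of both points, the midpoint rule and the sensitivity of the average to an outlying value. The vectors are made up.

```r
odd  <- c(1, 2, 3, 4, 100)
even <- c(1, 2, 3, 4)

median(odd)    # the middle value
median(even)   # midpoint of the two middle values, 2 and 3
mean(odd)      # the outlying 100 pulls the average far above the median

quantile(odd, probs = 0.5)   # the median is the 50th percentile
```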

The Mode

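The most frequent value. If it is unique, it is meaningful, but often it is not unique or even confined to a small set of values. R has no built-in function to calculate it [mode() reports how an object is stored, not the most frequent value]. But it is visible in a density plot or histogram.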
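
One common workaround is to tabulate the values and pick the most frequent. This is a sketch, not a built-in, and the name most_frequent is illustrative. The data are the first ten trans_fat values shown in the FastFood printout above.

```r
trans_fat <- c(0, 1.5, 3, 0.5, 0.5, 1, 0.5, 0, 1, 2.5)

counts <- table(trans_fat)   # frequency of each distinct value
counts

# Keep every value that reaches the maximum count (several if there are ties).
most_frequent <- as.numeric(names(counts)[counts == max(counts)])
most_frequent
```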

Variation

With means, we describe the standard deviation. Note, it is singular; it implies the two sides of the center are the same – symmetry. Because the deviations sum to zero, to measure variation, we can’t use untransformed deviation from an average. We work with squares [variance, in the squared metric] or the square root of squares [to maintain the original metric].

s=\sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_{i}-\bar{x})^2}

mean(FastFood$protein, na.rm=TRUE)
[1] 27.89105
sd(FastFood$protein, na.rm=TRUE)
[1] 17.68392
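
To connect the formula to the function, the standard deviation can be computed by hand and compared with sd(). A sketch with illustrative values:

```r
x <- c(37, 46, 70, 55, 46)   # illustrative values
n <- length(x)

# Squared deviations, summed, divided by n - 1, then the square root.
by_hand <- sqrt(sum((x - mean(x))^2) / (n - 1))
by_hand
sd(x)    # the built-in function agrees
var(x)   # the variance is the squared standard deviation
```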

Variation in Percentiles

We typically measure a range [min to max] or the interquartile range (IQR) – the span of the middle 50%.

quantile(FastFood$protein, probs = c(0,0.25,0.5,0.75,1), na.rm=TRUE)
   0%   25%   50%   75%  100% 
  1.0  16.0  24.5  36.0 186.0 
IQR(FastFood$protein, na.rm=TRUE)
[1] 20

Here it spans 20 grams of protein. The total range is 185 grams of protein.
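
The same two summaries on a small vector. The five values are illustrative, chosen so the arithmetic is easy to follow.

```r
x <- c(1, 16, 24.5, 36, 186)   # illustrative values

q <- quantile(x, probs = c(0.25, 0.75))
unname(q[2] - q[1])            # interquartile range computed from the quartiles
IQR(x)                         # the same thing from the built-in function

range(x)                       # minimum and maximum
diff(range(x))                 # total range as a single number
```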

Scoping

That operator $ is the first encounter with the scope of something. We are trying to pull protein from FastFood. There are other basic operators in R to accomplish the same thing. We could have asked for the [even less typing-efficient] relevant row and column definition with:

FastFood[,"protein"]
# A tibble: 515 × 1
   protein
     <dbl>
 1      37
 2      46
 3      70
 4      55
 5      46
 6      25
 7      15
 8      25
 9      25
10      51
# … with 505 more rows
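
A sketch of the base R ways to reach into a data frame, using a tiny stand-in built from the first three protein and calorie values above. The object name df is illustrative.

```r
df <- data.frame(protein = c(37, 46, 70), calories = c(380, 840, 1130))

a <- df$protein                       # a vector
b <- df[["protein"]]                  # the same vector, with the name as a string
d <- df[, "protein", drop = FALSE]    # a one-column data frame
e <- df[, c("protein", "calories")]   # more than one column needs a vector of names

identical(a, b)
```

Note that a plain data.frame returns a vector from df[, "protein"] unless drop = FALSE is given, whereas the tibble above keeps its one-column form.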

If I want more than one, this becomes cumbersome. The tidyverse, built around a piping operator, %>%, was developed as a solution for a data.frame-centric form of analysis. Here’s how it works. We start with data and pipe it so that the names are understood in the context of the data that we begin with. The main initial helper that we will make use of is skim from the skimr library.

library(dplyr)   # provides group_by(), used below
library(skimr)
library(kableExtra)
FastFood %>% skim() %>% kable() %>% scroll_box(height="400px")
skim_type  skim_variable  n_missing  complete_rate  character.min  character.max  character.empty  character.n_unique  character.whitespace  numeric.mean  numeric.sd   numeric.p0  numeric.p25  numeric.p50  numeric.p75  numeric.p100  numeric.hist
character  restaurant     0          1.0000000      5              11             0                8                   0                     NA            NA           NA          NA           NA           NA           NA            NA
character  item           0          1.0000000      5              63             0                505                 0                     NA            NA           NA          NA           NA           NA           NA            NA
character  salad          0          1.0000000      5              5              0                1                   0                     NA            NA           NA          NA           NA           NA           NA            NA
numeric    calories       0          1.0000000      NA             NA             NA               NA                  NA                    530.9126214   282.4361471  20          330.0        490.0        690          2430          ▇▆▁▁▁
numeric    cal_fat        0          1.0000000      NA             NA             NA               NA                  NA                    238.8135922   166.4075099  0           120.0        210.0        310          1270          ▇▃▁▁▁
numeric    total_fat      0          1.0000000      NA             NA             NA               NA                  NA                    26.5902913    18.4118761   0           14.0         23.0         35           141           ▇▃▁▁▁
numeric    sat_fat        0          1.0000000      NA             NA             NA               NA                  NA                    8.1533981     6.4188107    0           4.0          7.0          11           47            ▇▃▁▁▁
numeric    trans_fat      0          1.0000000      NA             NA             NA               NA                  NA                    0.4650485     0.8396438    0           0.0          0.0          1            8             ▇▁▁▁▁
numeric    cholesterol    0          1.0000000      NA             NA             NA               NA                  NA                    72.4563107    63.1604061   0           35.0         60.0         95           805           ▇▁▁▁▁
numeric    sodium         0          1.0000000      NA             NA             NA               NA                  NA                    1246.7378641  689.9542781  15          800.0        1110.0       1550         6080          ▇▆▁▁▁
numeric    total_carb     0          1.0000000      NA             NA             NA               NA                  NA                    45.6640777    24.8833420   0           28.5         44.0         57           156           ▅▇▂▁▁
numeric    fiber          12         0.9766990      NA             NA             NA               NA                  NA                    4.1371769     3.0374603    0           2.0          3.0          5            17            ▇▅▂▁▁
numeric    sugar          0          1.0000000      NA             NA             NA               NA                  NA                    7.2621359     6.7613015    0           3.0          6.0          9            87            ▇▁▁▁▁
numeric    protein        1          0.9980583      NA             NA             NA               NA                  NA                    27.8910506    17.6839207   1           16.0         24.5         36           186           ▇▂▁▁▁
numeric    vit_a          214        0.5844660      NA             NA             NA               NA                  NA                    18.8571429    31.3843303   0           4.0          10.0         20           180           ▇▁▁▁▁
numeric    vit_c          210        0.5922330      NA             NA             NA               NA                  NA                    20.1704918    30.5922427   0           4.0          10.0         30           400           ▇▁▁▁▁
numeric    calcium        210        0.5922330      NA             NA             NA               NA                  NA                    24.8524590    25.5220725   0           8.0          20.0         30           290           ▇▁▁▁▁
FastFood %>% group_by(restaurant) %>% skim(calories)
# FastFood %>% group_by(restaurant) %>% skim(calories) %>% arrange(numeric.mean)
# FastFood %>% group_by(restaurant) %>% skim(calories) %>% arrange(desc(numeric.mean))
skim_type  skim_variable  restaurant   n_missing  complete_rate  numeric.mean  numeric.sd  numeric.p0  numeric.p25  numeric.p50  numeric.p75  numeric.p100  numeric.hist
numeric    calories       Arbys        0          1              532.7273      210.3388    70          360.0        550          690          1030          ▃▆▇▇▂
numeric    calories       Burger King  0          1              608.5714      290.4184    190         365.0        555          760          1550          ▇▇▃▂▁
numeric    calories       Chick Fil-A  0          1              384.4444      220.4948    70          220.0        390          480          970           ▇▇▇▁▂
numeric    calories       Dairy Queen  0          1              520.2381      259.3377    20          350.0        485          630          1260          ▂▇▆▂▁
numeric    calories       Mcdonalds    0          1              640.3509      410.6961    140         380.0        540          740          2430          ▇▅▁▁▁
numeric    calories       Sonic        0          1              631.6981      300.8816    100         410.0        570          740          1350          ▃▇▆▂▃
numeric    calories       Subway       0          1              503.0208      282.2210    50          287.5        460          740          1160          ▅▇▃▃▂
numeric    calories       Taco Bell    0          1              443.6522      184.3449    140         320.0        420          575          880           ▆▇▇▃▂
FastFood %>% group_by(restaurant) %>% skim(protein)
skim_type  skim_variable  restaurant   n_missing  complete_rate  numeric.mean  numeric.sd  numeric.p0  numeric.p25  numeric.p50  numeric.p75  numeric.p100  numeric.hist
numeric    protein        Arbys        0          1.0000000      29.25455      12.386101   5           20.0         29           38           62            ▅▆▇▃▁
numeric    protein        Burger King  1          0.9857143      30.01449      19.469050   5           16.0         29           36           134           ▇▅▁▁▁
numeric    protein        Chick Fil-A  0          1.0000000      31.70370      16.927026   11          23.5         29           37           103           ▇▆▁▁▁
numeric    protein        Dairy Queen  0          1.0000000      24.83333      11.544013   1           17.0         23           34           49            ▂▇▅▅▃
numeric    protein        Mcdonalds    0          1.0000000      40.29825      29.479390   7           25.0         33           46           186           ▇▂▁▁▁
numeric    protein        Sonic        0          1.0000000      29.18868      14.532532   6           18.0         30           35           67            ▆▆▇▁▂
numeric    protein        Subway       0          1.0000000      30.31250      16.144292   3           18.0         26           40           78            ▆▇▆▂▁
numeric    protein        Taco Bell    0          1.0000000      17.41739      7.135263    6           12.0         16           22           37            ▇▇▆▃▂
FastFood %>% group_by(restaurant) %>% skim(sodium)
skim_type  skim_variable  restaurant   n_missing  complete_rate  numeric.mean  numeric.sd  numeric.p0  numeric.p25  numeric.p50  numeric.p75  numeric.p100  numeric.hist
numeric    sodium         Arbys        0          1              1515.273      663.6651    100         960.0        1480         2020.0       3350          ▂▇▇▃▁
numeric    sodium         Burger King  0          1              1223.571      499.8841    310         850.0        1150         1635.0       2310          ▅▇▅▆▂
numeric    sodium         Chick Fil-A  0          1              1151.481      726.9203    220         700.0        1000         1405.0       3660          ▇▇▂▁▁
numeric    sodium         Dairy Queen  0          1              1181.786      609.9398    15          847.5        1030         1362.5       3500          ▂▇▂▁▁
numeric    sodium         Mcdonalds    0          1              1437.895      1036.1721   20          870.0        1120         1780.0       6080          ▇▅▁▁▁
numeric    sodium         Sonic        0          1              1350.755      665.1340    470         900.0        1250         1550.0       4520          ▇▆▂▁▁
numeric    sodium         Subway       0          1              1272.969      743.6346    65          697.5        1130         1605.0       3540          ▅▇▃▁▂
numeric    sodium         Taco Bell    0          1              1013.913      474.0544    290         615.0        960          1300.0       2260          ▇▇▆▂▂

dplyr verbs

There are four main dplyr verbs that we will play with, and some helpers.

  • filter
  • select
  • mutate
  • summarize or summarise
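
Before taking the verbs one at a time, here is a rough sketch of all four on a six-row stand-in. It assumes dplyr is installed. The object name ff and the derived column sodium_per_cal are illustrative; the calories and sodium values are the first three Mcdonalds and Taco Bell rows printed in this deck.

```r
library(dplyr)

ff <- tibble(
  restaurant = c("Mcdonalds", "Mcdonalds", "Mcdonalds",
                 "Taco Bell", "Taco Bell", "Taco Bell"),
  calories   = c(380, 840, 1130, 540, 460, 510),
  sodium     = c(1110, 1580, 1920, 1360, 1320, 1090)
)

taco       <- ff %>% filter(restaurant == "Taco Bell")         # keep some rows
two_cols   <- ff %>% select(restaurant, calories)              # keep some columns
with_ratio <- ff %>% mutate(sodium_per_cal = sodium / calories) # add a new column
by_rest    <- ff %>%
  group_by(restaurant) %>%
  summarise(mean_calories = mean(calories))                    # collapse to one row per group

by_rest
```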

filter()

filter [1 of 3]

filter selects rows according to some set of conditions.

  • Valid with ==
FastFood %>% filter(restaurant == "Taco Bell")
# A tibble: 115 × 17
   restau…¹ item  calor…² cal_fat total…³ sat_fat trans…⁴ chole…⁵ sodium total…⁶
   <chr>    <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>  <dbl>   <dbl>
 1 Taco Be… 1/2 …     540     230      26       7       1      45   1360      59
 2 Taco Be… 1/2 …     460     170      18       7       1      45   1320      53
 3 Taco Be… 7-La…     510     170      19       7       0      20   1090      68
 4 Taco Be… Bean…     370     100      11       4       0       5    960      56
 5 Taco Be… Beef…     550     200      22       8       0      35   1270      68
 6 Taco Be… Beef…     440     160      18       5       0      20   1030      55
 7 Taco Be… Blac…     410     110      12       4       0      10   1100      62
 8 Taco Be… Burr…     420     140      16       7       0      35   1090      53
 9 Taco Be… Burr…     390     110      12       5       0      40   1050      52
10 Taco Be… Burr…     390     120      13       5       0      30   1090      52
# … with 105 more rows, 7 more variables: fiber <dbl>, sugar <dbl>,
#   protein <dbl>, vit_a <dbl>, vit_c <dbl>, calcium <dbl>, salad <chr>, and
#   abbreviated variable names ¹​restaurant, ²​calories, ³​total_fat, ⁴​trans_fat,
#   ⁵​cholesterol, ⁶​total_carb

filter [2 of 3]

filter selects rows according to some set of conditions.
- Can use %in% with a vector c().

FastFood %>% filter(restaurant%in%c("Taco Bell","Burger King"))
# A tibble: 185 × 17
   restau…¹ item  calor…² cal_fat total…³ sat_fat trans…⁴ chole…⁵ sodium total…⁶
   <chr>    <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>  <dbl>   <dbl>
 1 Burger … Amer…    1550    1134     126      47     8       805   1820      21
 2 Burger … Baco…    1000     585      65      24     3       200   1320      48
 3 Burger … Baco…     330     140      16       7     0        55    830      32
 4 Burger … Baco…     290     120      14       6     0.5      40    720      28
 5 Burger … Baco…    1040     630      48      28     2.5     220   1900      48
 6 Burger … Baco…     730     351      39       9     0        90   1930      63
 7 Burger … BBQ …    1100     675      75      29     3       220   1850      51
 8 Burger … Chee…     300     130      14       6     0        45    710      28
 9 Burger … Doub…     520     280      31      14     1       105   1180      33
10 Burger … Doub…     450     230      26      12     1        95    960      29
# … with 175 more rows, 7 more variables: fiber <dbl>, sugar <dbl>,
#   protein <dbl>, vit_a <dbl>, vit_c <dbl>, calcium <dbl>, salad <chr>, and
#   abbreviated variable names ¹​restaurant, ²​calories, ³​total_fat, ⁴​trans_fat,
#   ⁵​cholesterol, ⁶​total_carb
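
To see why %in% rather than == is the right tool for matching against several values, consider this base R sketch. The vector names are illustrative.

```r
restaurant <- c("Taco Bell", "Arbys", "Burger King", "Sonic")
keep <- c("Taco Bell", "Burger King")

restaurant %in% keep               # one TRUE/FALSE per element of restaurant
restaurant[restaurant %in% keep]   # the matching elements

# == compares element by element and recycles the shorter vector,
# so it silently misses "Burger King" here.
restaurant == keep
```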

filter [3 of 3]

filter selects rows according to some set of conditions.

  • Or more elaborate combinations of elements
  • It has to return a logical [TRUE/FALSE] to filter the rows.
FastFood %>% filter(startsWith(restaurant,"S")==TRUE) %>% group_by(restaurant) %>% skim() %>% kable()  %>%
  scroll_box(width = "100%", height = "500px")
skim_type skim_variable restaurant n_missing complete_rate character.min character.max character.empty character.n_unique character.whitespace numeric.mean numeric.sd numeric.p0 numeric.p25 numeric.p50 numeric.p75 numeric.p100 numeric.hist
character item Sonic 0 1.0000000 8 48 0 53 0 NA NA NA NA NA NA NA NA
character item Subway 0 1.0000000 7 58 0 96 0 NA NA NA NA NA NA NA NA
character salad Sonic 0 1.0000000 5 5 0 1 0 NA NA NA NA NA NA NA NA
character salad Subway 0 1.0000000 5 5 0 1 0 NA NA NA NA NA NA NA NA
numeric calories Sonic 0 1.0000000 NA NA NA NA NA 631.6981132 300.8816267 100.0 410.00 570.0 740.00 1350 ▃▇▆▂▃
numeric calories Subway 0 1.0000000 NA NA NA NA NA 503.0208333 282.2209653 50.0 287.50 460.0 740.00 1160 ▅▇▃▃▂
numeric cal_fat Sonic 0 1.0000000 NA NA NA NA NA 338.3018868 197.1404795 100.0 180.00 290.0 430.00 900 ▇▅▂▁▁
numeric cal_fat Subway 0 1.0000000 NA NA NA NA NA 165.1041667 134.8351414 10.0 48.75 137.5 242.50 620 ▇▅▂▁▁
numeric total_fat Sonic 0 1.0000000 NA NA NA NA NA 37.6415094 21.9729680 11.0 20.00 32.0 48.00 100 ▇▅▂▁▁
numeric total_fat Subway 0 1.0000000 NA NA NA NA NA 18.4791667 14.6092827 1.0 6.00 15.0 26.50 62 ▇▃▃▁▁
numeric sat_fat Sonic 0 1.0000000 NA NA NA NA NA 11.4150943 8.6673214 2.5 5.00 8.0 15.00 36 ▇▂▂▁▁
numeric sat_fat Subway 0 1.0000000 NA NA NA NA NA 6.1979167 5.2417751 0.0 2.00 4.5 9.25 22 ▇▃▃▁▁
numeric trans_fat Sonic 0 1.0000000 NA NA NA NA NA 0.9339623 1.2597227 0.0 0.00 0.0 2.00 4 ▇▂▂▁▁
numeric trans_fat Subway 0 1.0000000 NA NA NA NA NA 0.2187500 0.5222043 0.0 0.00 0.0 0.00 2 ▇▁▁▁▁
numeric cholesterol Sonic 0 1.0000000 NA NA NA NA NA 86.9811321 63.7704690 0.0 40.00 80.0 110.00 260 ▇▇▃▁▂
numeric cholesterol Subway 0 1.0000000 NA NA NA NA NA 61.3020833 40.9315526 0.0 40.00 50.0 85.00 190 ▅▇▅▁▁
numeric sodium Sonic 0 1.0000000 NA NA NA NA NA 1350.7547170 665.1340208 470.0 900.00 1250.0 1550.00 4520 ▇▆▂▁▁
numeric sodium Subway 0 1.0000000 NA NA NA NA NA 1272.9687500 743.6345941 65.0 697.50 1130.0 1605.00 3540 ▅▇▃▁▂
numeric total_carb Sonic 0 1.0000000 NA NA NA NA NA 47.2075472 21.5463351 16.0 33.00 44.0 51.00 126 ▆▇▂▁▁
numeric total_carb Subway 0 1.0000000 NA NA NA NA NA 54.7187500 33.3143570 8.0 25.75 47.0 92.00 118 ▇▇▂▆▂
numeric fiber Sonic 0 1.0000000 NA NA NA NA NA 2.6603774 1.7752941 0.0 2.00 2.0 3.00 8 ▃▇▂▂▁
numeric fiber Subway 0 1.0000000 NA NA NA NA NA 6.5625000 3.2373235 3.0 4.00 5.0 10.00 16 ▇▁▃▁▁
numeric sugar Sonic 0 1.0000000 NA NA NA NA NA 6.5283019 3.9448300 0.0 4.00 7.0 9.00 17 ▃▃▇▂▁
numeric sugar Subway 0 1.0000000 NA NA NA NA NA 10.0937500 5.6084112 3.0 6.00 8.0 14.00 36 ▇▃▁▁▁
numeric protein Sonic 0 1.0000000 NA NA NA NA NA 29.1886792 14.5325319 6.0 18.00 30.0 35.00 67 ▆▆▇▁▂
numeric protein Subway 0 1.0000000 NA NA NA NA NA 30.3125000 16.1442918 3.0 18.00 26.0 40.00 78 ▆▇▆▂▁
numeric vit_a Sonic 4 0.9245283 NA NA NA NA NA 6.9387755 5.6767368 0.0 2.00 6.0 10.00 20 ▇▃▃▃▁
numeric vit_a Subway 0 1.0000000 NA NA NA NA NA 22.3854167 15.1354375 6.0 10.00 16.0 30.00 60 ▇▃▁▁▂
numeric vit_c Sonic 4 0.9245283 NA NA NA NA NA 5.7551020 4.8111442 0.0 2.00 6.0 8.00 25 ▇▇▁▁▁
numeric vit_c Subway 0 1.0000000 NA NA NA NA NA 41.9687500 44.0113510 4.0 20.00 40.0 50.00 400 ▇▁▁▁▁
numeric calcium Sonic 4 0.9245283 NA NA NA NA NA 17.2448980 12.0701605 1.0 8.00 15.0 27.00 40 ▇▆▂▆▂
numeric calcium Subway 0 1.0000000 NA NA NA NA NA 39.1250000 25.1321769 4.0 20.00 35.0 60.00 100 ▇▇▆▂▁
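
What filter receives in that example is just a logical vector. A base R sketch of the condition on its own, with an illustrative vector:

```r
restaurant <- c("Sonic", "Subway", "Taco Bell", "Arbys")

starts_s <- startsWith(restaurant, "S")   # already logical; == TRUE is not required
starts_s
restaurant[starts_s]                      # keeps the TRUE positions
restaurant[!starts_s]                     # ! flips TRUE and FALSE
```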

Inversion with !

FastFood %>% filter(!(startsWith(restaurant,"S")==TRUE)) %>% group_by(restaurant) %>% skim() %>% kable()  %>%
  scroll_box(width = "100%", height = "500px")
skim_type skim_variable restaurant n_missing complete_rate character.min character.max character.empty character.n_unique character.whitespace numeric.mean numeric.sd numeric.p0 numeric.p25 numeric.p50 numeric.p75 numeric.p100 numeric.hist
character item Arbys 0 1.0000000 10 39 0 55 0 NA NA NA NA NA NA NA NA
character item Burger King 0 1.0000000 9 63 0 70 0 NA NA NA NA NA NA NA NA
character item Chick Fil-A 0 1.0000000 12 36 0 27 0 NA NA NA NA NA NA NA NA
character item Dairy Queen 0 1.0000000 7 45 0 42 0 NA NA NA NA NA NA NA NA
character item Mcdonalds 0 1.0000000 5 49 0 57 0 NA NA NA NA NA NA NA NA
character item Taco Bell 0 1.0000000 7 46 0 113 0 NA NA NA NA NA NA NA NA
character salad Arbys 0 1.0000000 5 5 0 1 0 NA NA NA NA NA NA NA NA
character salad Burger King 0 1.0000000 5 5 0 1 0 NA NA NA NA NA NA NA NA
character salad Chick Fil-A 0 1.0000000 5 5 0 1 0 NA NA NA NA NA NA NA NA
character salad Dairy Queen 0 1.0000000 5 5 0 1 0 NA NA NA NA NA NA NA NA
character salad Mcdonalds 0 1.0000000 5 5 0 1 0 NA NA NA NA NA NA NA NA
character salad Taco Bell 0 1.0000000 5 5 0 1 0 NA NA NA NA NA NA NA NA
numeric calories Arbys 0 1.0000000 NA NA NA NA NA 532.7272727 210.3388320 70.0 360.00 550.0 690.000 1030 ▃▆▇▇▂
numeric calories Burger King 0 1.0000000 NA NA NA NA NA 608.5714286 290.4184174 190.0 365.00 555.0 760.000 1550 ▇▇▃▂▁
numeric calories Chick Fil-A 0 1.0000000 NA NA NA NA NA 384.4444444 220.4947816 70.0 220.00 390.0 480.000 970 ▇▇▇▁▂
numeric calories Dairy Queen 0 1.0000000 NA NA NA NA NA 520.2380952 259.3376939 20.0 350.00 485.0 630.000 1260 ▂▇▆▂▁
numeric calories Mcdonalds 0 1.0000000 NA NA NA NA NA 640.3508772 410.6961203 140.0 380.00 540.0 740.000 2430 ▇▅▁▁▁
numeric calories Taco Bell 0 1.0000000 NA NA NA NA NA 443.6521739 184.3448829 140.0 320.00 420.0 575.000 880 ▆▇▇▃▂
numeric cal_fat Arbys 0 1.0000000 NA NA NA NA NA 237.8363636 113.1696144 45.0 135.00 250.0 310.000 495 ▆▅▇▃▂
numeric cal_fat Burger King 0 1.0000000 NA NA NA NA NA 333.7571429 194.4993898 90.0 172.50 285.0 431.500 1134 ▇▅▂▁▁
numeric cal_fat Chick Fil-A 0 1.0000000 NA NA NA NA NA 145.3703704 102.3572733 18.0 67.50 126.0 171.000 423 ▆▇▂▁▁
numeric cal_fat Dairy Queen 0 1.0000000 NA NA NA NA NA 260.4761905 156.4850555 0.0 160.00 220.0 310.000 670 ▃▇▃▂▁
numeric cal_fat Mcdonalds 0 1.0000000 NA NA NA NA NA 285.6140351 220.8992785 50.0 160.00 240.0 320.000 1270 ▇▃▁▁▁
numeric cal_fat Taco Bell 0 1.0000000 NA NA NA NA NA 188.0000000 84.8109352 35.0 120.00 180.0 250.000 380 ▆▇▆▅▃
numeric total_fat Arbys 0 1.0000000 NA NA NA NA NA 26.9818182 13.2448325 5.0 15.50 28.0 35.000 59 ▅▅▇▂▂
numeric total_fat Burger King 0 1.0000000 NA NA NA NA NA 36.8142857 21.2434425 10.0 19.25 31.5 48.000 126 ▇▅▂▁▁
numeric total_fat Chick Fil-A 0 1.0000000 NA NA NA NA NA 16.1481481 11.3771150 2.0 7.50 14.0 19.000 47 ▆▇▂▁▁
numeric total_fat Dairy Queen 0 1.0000000 NA NA NA NA NA 28.8571429 17.5187306 0.0 18.00 24.5 34.750 75 ▃▇▃▂▁
numeric total_fat Mcdonalds 0 1.0000000 NA NA NA NA NA 31.8070175 24.5156208 5.0 18.00 27.0 36.000 141 ▇▃▁▁▁
numeric total_fat Taco Bell 0 1.0000000 NA NA NA NA NA 20.8956522 9.4082587 4.0 13.00 20.0 28.000 42 ▅▇▅▅▃
numeric sat_fat Arbys 0 1.0000000 NA NA NA NA NA 7.9727273 4.1626850 1.5 4.50 7.0 11.000 17 ▇▅▅▅▃
numeric sat_fat Burger King 0 1.0000000 NA NA NA NA NA 11.1500000 8.7676737 2.0 5.00 8.0 13.750 47 ▇▂▂▁▁
numeric sat_fat Chick Fil-A 0 1.0000000 NA NA NA NA NA 4.1111111 3.9034830 0.0 1.50 3.0 4.750 16 ▇▂▂▁▁
numeric sat_fat Dairy Queen 0 1.0000000 NA NA NA NA NA 10.4404762 8.2522742 0.0 5.00 9.0 12.500 43 ▇▆▂▁▁
numeric sat_fat Mcdonalds 0 1.0000000 NA NA NA NA NA 8.2894737 5.5347501 0.5 4.50 7.0 11.000 27 ▇▇▃▁▁
numeric sat_fat Taco Bell 0 1.0000000 NA NA NA NA NA 6.5913043 2.9813621 1.0 4.00 6.0 9.000 14 ▃▇▃▅▂
numeric trans_fat Arbys 0 1.0000000 NA NA NA NA NA 0.4181818 0.5913233 0.0 0.00 0.0 1.000 2 ▇▂▂▁▁
numeric trans_fat Burger King 0 1.0000000 NA NA NA NA NA 0.8642857 1.3457432 0.0 0.00 0.0 1.375 8 ▇▂▁▁▁
numeric trans_fat Chick Fil-A 0 1.0000000 NA NA NA NA NA 0.0370370 0.1924501 0.0 0.00 0.0 0.000 1 ▇▁▁▁▁
numeric trans_fat Dairy Queen 0 1.0000000 NA NA NA NA NA 0.6785714 0.7141550 0.0 0.00 1.0 1.000 2 ▇▁▇▁▂
numeric trans_fat Mcdonalds 0 1.0000000 NA NA NA NA NA 0.4649123 0.7249363 0.0 0.00 0.0 1.000 3 ▇▁▁▁▁
numeric trans_fat Taco Bell 0 1.0000000 NA NA NA NA NA 0.2565217 0.3993702 0.0 0.00 0.0 0.500 1 ▇▁▂▁▂
numeric cholesterol Arbys 0 1.0000000 NA NA NA NA NA 70.4545455 34.3089548 15.0 45.00 65.0 90.000 155 ▆▇▇▃▂
numeric cholesterol Burger King 0 1.0000000 NA NA NA NA NA 100.8571429 107.3223893 5.0 40.00 85.0 115.000 805 ▇▁▁▁▁
numeric cholesterol Chick Fil-A 0 1.0000000 NA NA NA NA NA 79.0740741 47.3131352 25.0 55.00 70.0 87.500 285 ▇▅▁▁▁
numeric cholesterol Dairy Queen 0 1.0000000 NA NA NA NA NA 71.5476190 43.7493985 0.0 41.25 60.0 100.000 180 ▃▇▃▂▂
numeric cholesterol Mcdonalds 0 1.0000000 NA NA NA NA NA 109.7368421 79.5322321 0.0 70.00 95.0 125.000 475 ▇▃▁▁▁
numeric cholesterol Taco Bell 0 1.0000000 NA NA NA NA NA 39.0434783 19.1931052 0.0 25.00 35.0 55.000 85 ▂▇▇▃▂
numeric sodium Arbys 0 1.0000000 NA NA NA NA NA 1515.2727273 663.6650610 100.0 960.00 1480.0 2020.000 3350 ▂▇▇▃▁
numeric sodium Burger King 0 1.0000000 NA NA NA NA NA 1223.5714286 499.8841481 310.0 850.00 1150.0 1635.000 2310 ▅▇▅▆▂
numeric sodium Chick Fil-A 0 1.0000000 NA NA NA NA NA 1151.4814815 726.9202882 220.0 700.00 1000.0 1405.000 3660 ▇▇▂▁▁
numeric sodium Dairy Queen 0 1.0000000 NA NA NA NA NA 1181.7857143 609.9398425 15.0 847.50 1030.0 1362.500 3500 ▂▇▂▁▁
numeric sodium Mcdonalds 0 1.0000000 NA NA NA NA NA 1437.8947368 1036.1721052 20.0 870.00 1120.0 1780.000 6080 ▇▅▁▁▁
numeric sodium Taco Bell 0 1.0000000 NA NA NA NA NA 1013.9130435 474.0543600 290.0 615.00 960.0 1300.000 2260 ▇▇▆▂▂
numeric total_carb Arbys 0 1.0000000 NA NA NA NA NA 44.8727273 19.0963003 4.0 34.00 46.0 54.000 83 ▂▃▇▅▂
numeric total_carb Burger King 0 1.0000000 NA NA NA NA NA 39.3142857 15.5559233 7.0 28.00 41.0 52.000 69 ▃▅▅▇▃
numeric total_carb Chick Fil-A 0 1.0000000 NA NA NA NA NA 28.6296296 20.4265064 1.0 8.00 29.0 42.500 70 ▇▂▆▃▂
numeric total_carb Dairy Queen 0 1.0000000 NA NA NA NA NA 38.6904762 23.7296646 0.0 25.25 34.0 44.750 121 ▃▇▁▁▁
numeric total_carb Mcdonalds 0 1.0000000 NA NA NA NA NA 48.7894737 26.4424826 9.0 32.00 46.0 62.000 156 ▆▇▂▁▁
numeric total_carb Taco Bell 0 1.0000000 NA NA NA NA NA 46.6347826 22.5183475 12.0 29.00 44.0 64.000 107 ▇▇▇▃▂
numeric fiber Arbys 0 1.0000000 NA NA NA NA NA 2.7090909 1.4099216 1.0 2.00 2.0 4.000 6 ▇▂▂▁▁
numeric fiber Burger King 10 0.8571429 NA NA NA NA NA 2.3833333 1.3541107 0.0 1.75 2.0 3.000 7 ▅▇▅▂▁
numeric fiber Chick Fil-A 2 0.9259259 NA NA NA NA NA 2.3200000 3.0512293 0.0 1.00 1.0 3.000 15 ▇▂▁▁▁
numeric fiber Dairy Queen 0 1.0000000 NA NA NA NA NA 2.8333333 2.9210478 0.0 1.00 2.0 3.000 12 ▇▂▁▁▁
numeric fiber Mcdonalds 0 1.0000000 NA NA NA NA NA 3.2280702 1.6585013 0.0 2.00 3.0 4.000 8 ▂▇▃▃▁
numeric fiber Taco Bell 0 1.0000000 NA NA NA NA NA 5.7130435 3.0572468 1.0 3.00 5.0 7.000 17 ▇▆▃▁▁
numeric sugar Arbys 0 1.0000000 NA NA NA NA NA 7.5636364 5.8587182 0.0 3.50 6.0 9.000 23 ▆▇▁▂▁
numeric sugar Burger King 0 1.0000000 NA NA NA NA NA 8.1857143 6.1862233 0.0 6.00 7.5 10.000 37 ▇▇▁▁▁
numeric sugar Chick Fil-A 0 1.0000000 NA NA NA NA NA 4.1481481 3.6658896 0.0 1.00 4.0 7.000 12 ▇▃▅▂▂
numeric sugar Dairy Queen 0 1.0000000 NA NA NA NA NA 6.3571429 5.0258912 0.0 3.00 6.0 8.750 30 ▇▅▁▁▁
numeric sugar Mcdonalds 0 1.0000000 NA NA NA NA NA 11.0701754 13.3454910 0.0 4.00 9.0 13.000 87 ▇▁▁▁▁
numeric sugar Taco Bell 0 1.0000000 NA NA NA NA NA 3.7043478 1.8870469 1.0 2.00 4.0 5.000 8 ▇▅▇▂▂
numeric protein Arbys 0 1.0000000 NA NA NA NA NA 29.2545455 12.3861006 5.0 20.00 29.0 38.000 62 ▅▆▇▃▁
numeric protein Burger King 1 0.9857143 NA NA NA NA NA 30.0144928 19.4690499 5.0 16.00 29.0 36.000 134 ▇▅▁▁▁
numeric protein Chick Fil-A 0 1.0000000 NA NA NA NA NA 31.7037037 16.9270262 11.0 23.50 29.0 37.000 103 ▇▆▁▁▁
numeric protein Dairy Queen 0 1.0000000 NA NA NA NA NA 24.8333333 11.5440126 1.0 17.00 23.0 34.000 49 ▂▇▅▅▃
numeric protein Mcdonalds 0 1.0000000 NA NA NA NA NA 40.2982456 29.4793904 7.0 25.00 33.0 46.000 186 ▇▂▁▁▁
numeric protein Taco Bell 0 1.0000000 NA NA NA NA NA 17.4173913 7.1352628 6.0 12.00 16.0 22.000 37 ▇▇▆▃▂
numeric vit_a Arbys 30 0.4545455 NA NA NA NA NA 12.5600000 16.8252786 0.0 2.00 6.0 20.000 60 ▇▂▁▁▁
numeric vit_a Burger King 70 0.0000000 NA NA NA NA NA NaN NA NA NA NA NA NA
numeric vit_a Chick Fil-A 6 0.7777778 NA NA NA NA NA 12.6190476 18.5188450 0.0 0.00 2.0 30.000 60 ▇▁▃▁▁
numeric vit_a Dairy Queen 15 0.6428571 NA NA NA NA NA 14.0000000 11.0069908 0.0 9.00 10.0 20.000 50 ▇▃▂▁▁
numeric vit_a Mcdonalds 0 1.0000000 NA NA NA NA NA 33.7192982 64.1598880 0.0 2.00 6.0 20.000 180 ▇▁▁▁▂
numeric vit_a Taco Bell 89 0.2260870 NA NA NA NA NA 11.8461538 3.9364177 6.0 8.50 12.5 15.000 20 ▅▃▁▇▁
numeric vit_c Arbys 30 0.4545455 NA NA NA NA NA 8.4000000 6.3442888 0.0 2.00 10.0 10.000 20 ▇▃▇▂▃
numeric vit_c Burger King 70 0.0000000 NA NA NA NA NA NaN NA NA NA NA NA NA
numeric vit_c Chick Fil-A 2 0.9259259 NA NA NA NA NA 14.0800000 15.2613892 0.0 2.00 8.0 20.000 50 ▇▁▁▂▁
numeric vit_c Dairy Queen 15 0.6428571 NA NA NA NA NA 4.3703704 5.8713656 0.0 0.00 4.0 6.000 30 ▇▁▁▁▁
numeric vit_c Mcdonalds 0 1.0000000 NA NA NA NA NA 18.2982456 18.0771807 0.0 2.00 15.0 25.000 70 ▇▆▂▁▁
numeric vit_c Taco Bell 89 0.2260870 NA NA NA NA NA 4.5384615 2.4368959 0.0 2.00 4.0 6.000 10 ▇▇▇▃▁
numeric calcium Arbys 30 0.4545455 NA NA NA NA NA 17.3600000 12.7111500 2.0 8.00 15.0 25.000 45 ▇▃▃▂▂
numeric calcium Burger King 70 0.0000000 NA NA NA NA NA NaN NA NA NA NA NA NA
numeric calcium Chick Fil-A 2 0.9259259 NA NA NA NA NA 11.3200000 10.4949194 0.0 2.00 8.0 20.000 35 ▇▂▃▂▁
numeric calcium Dairy Queen 15 0.6428571 NA NA NA NA NA 16.4074074 19.0167118 0.0 6.00 10.0 20.000 100 ▇▂▁▁▁
numeric calcium Mcdonalds 0 1.0000000 NA NA NA NA NA 20.5964912 37.9274965 0.0 6.00 15.0 20.000 290 ▇▁▁▁▁
numeric calcium Taco Bell 89 0.2260870 NA NA NA NA NA 24.8076923 11.3314403 6.0 15.00 25.0 35.000 45 ▃▆▂▇▂

select()

select [1/2]

select selects columns according to some set of names/conditions.

FastFood %>% select(restaurant, calories)
# A tibble: 515 × 2
   restaurant calories
   <chr>         <dbl>
 1 Mcdonalds       380
 2 Mcdonalds       840
 3 Mcdonalds      1130
 4 Mcdonalds       750
 5 Mcdonalds       920
 6 Mcdonalds       540
 7 Mcdonalds       300
 8 Mcdonalds       510
 9 Mcdonalds       430
10 Mcdonalds       770
# … with 505 more rows

select [2/2]

select selects columns according to some set of names/conditions. Helper functions such as starts_with() pick every column whose name matches a pattern.

FastFood %>% select(restaurant,starts_with("vit"))
# A tibble: 515 × 3
   restaurant vit_a vit_c
   <chr>      <dbl> <dbl>
 1 Mcdonalds      4    20
 2 Mcdonalds      6    20
 3 Mcdonalds     10    20
 4 Mcdonalds      6    25
 5 Mcdonalds      6    20
 6 Mcdonalds     10     2
 7 Mcdonalds     10     2
 8 Mcdonalds      0     4
 9 Mcdonalds     20     4
10 Mcdonalds     20     6
# … with 505 more rows

select [negative selection]

select selects columns according to some set of names/conditions. Putting a minus sign in front of a name drops that column and keeps everything else.

FastFood %>% select(-restaurant)
# A tibble: 515 × 16
   item     calor…¹ cal_fat total…² sat_fat trans…³ chole…⁴ sodium total…⁵ fiber
   <chr>      <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>  <dbl>   <dbl> <dbl>
 1 Artisan…     380      60       7       2     0        95   1110      44     3
 2 Single …     840     410      45      17     1.5     130   1580      62     2
 3 Double …    1130     600      67      27     3       220   1920      63     3
 4 Grilled…     750     280      31      10     0.5     155   1940      62     2
 5 Crispy …     920     410      45      12     0.5     120   1980      81     4
 6 Big Mac      540     250      28      10     1        80    950      46     3
 7 Cheeseb…     300     100      12       5     0.5      40    680      33     2
 8 Classic…     510     210      24       4     0        65   1040      49     3
 9 Double …     430     190      21      11     1        85   1040      35     2
10 Double …     770     400      45      21     2.5     175   1290      42     3
# … with 505 more rows, 6 more variables: sugar <dbl>, protein <dbl>,
#   vit_a <dbl>, vit_c <dbl>, calcium <dbl>, salad <chr>, and abbreviated
#   variable names ¹​calories, ²​total_fat, ³​trans_fat, ⁴​cholesterol, ⁵​total_carb
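
The three flavors of select() above can be tried on a small made-up table. This is only a sketch: the menu tibble and its values are hypothetical stand-ins for FastFood, not the real data.

```r
library(dplyr)

# Hypothetical toy data standing in for FastFood
menu <- tibble(
  restaurant = c("A", "A", "B"),
  item       = c("Burger", "Salad", "Taco"),
  calories   = c(540, 220, 310),
  vit_a      = c(4, 20, 6),
  vit_c      = c(2, 30, 4)
)

by_name   <- menu %>% select(restaurant, calories)            # keep two named columns
by_helper <- menu %>% select(restaurant, starts_with("vit"))  # keep columns by name prefix
dropped   <- menu %>% select(-restaurant)                     # keep everything except restaurant

names(by_name)
names(by_helper)
names(dropped)
```

Each call returns a new tibble; menu itself is unchanged unless we assign the result back to it.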

mutate()

mutate [and transmute]

mutate() and transmute() are the core methods for adding variables [columns] to existing data. The key difference is that mutate() retains the existing variables while transmute() drops them. Let’s see it for sodium, rescaled from milligrams to grams.

mutate will keep all columns.

FastFood %>% 
  mutate(Sodium.Grams = sodium / 1000) %>%
  select(restaurant,Sodium.Grams,sodium,everything())
# A tibble: 515 × 18
   restau…¹ Sodiu…² sodium item  calor…³ cal_fat total…⁴ sat_fat trans…⁵ chole…⁶
   <chr>      <dbl>  <dbl> <chr>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>   <dbl>
 1 Mcdonal…    1.11   1110 Arti…     380      60       7       2     0        95
 2 Mcdonal…    1.58   1580 Sing…     840     410      45      17     1.5     130
 3 Mcdonal…    1.92   1920 Doub…    1130     600      67      27     3       220
 4 Mcdonal…    1.94   1940 Gril…     750     280      31      10     0.5     155
 5 Mcdonal…    1.98   1980 Cris…     920     410      45      12     0.5     120
 6 Mcdonal…    0.95    950 Big …     540     250      28      10     1        80
 7 Mcdonal…    0.68    680 Chee…     300     100      12       5     0.5      40
 8 Mcdonal…    1.04   1040 Clas…     510     210      24       4     0        65
 9 Mcdonal…    1.04   1040 Doub…     430     190      21      11     1        85
10 Mcdonal…    1.29   1290 Doub…     770     400      45      21     2.5     175
# … with 505 more rows, 8 more variables: total_carb <dbl>, fiber <dbl>,
#   sugar <dbl>, protein <dbl>, vit_a <dbl>, vit_c <dbl>, calcium <dbl>,
#   salad <chr>, and abbreviated variable names ¹​restaurant, ²​Sodium.Grams,
#   ³​calories, ⁴​total_fat, ⁵​trans_fat, ⁶​cholesterol

transmute()

transmute

transmute will only keep the called columns.

FastFood %>% transmute(Sodium.Grams = sodium / 1000)
# A tibble: 515 × 1
   Sodium.Grams
          <dbl>
 1         1.11
 2         1.58
 3         1.92
 4         1.94
 5         1.98
 6         0.95
 7         0.68
 8         1.04
 9         1.04
10         1.29
# … with 505 more rows
# To keep a variable, copy it.
# FastFood %>% transmute(restaurant = restaurant, Sodium.Grams = sodium / 1000)
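
To see the difference side by side, here is a minimal sketch on a hypothetical two-row table (menu and sodium_g are made-up names). The last line assumes dplyr 1.0 or later, where mutate() has a .keep argument; recent dplyr documentation describes transmute() as superseded by that form.

```r
library(dplyr)

# Hypothetical toy data; sodium is in milligrams
menu <- tibble(item = c("Burger", "Taco"), sodium = c(950, 680))

kept <- menu %>% mutate(sodium_g = sodium / 1000)     # old columns plus the new one
only <- menu %>% transmute(sodium_g = sodium / 1000)  # just the new column

# Assumes dplyr >= 1.0: same columns as transmute() here
same <- menu %>% mutate(sodium_g = sodium / 1000, .keep = "none")

names(kept)
names(only)
names(same)
```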

NB: Reassigning or newly assigning

A mutate() on its own only prints a result. To make the new variable part of the data, we assign the result: either back to the original name [reassigning] or to a new name.

FastFood <- FastFood %>% mutate(Sodium.Grams = sodium / 1000)
My.Fast.Food <- FastFood %>% mutate(Sodium.Grams = sodium / 1000)

mutate() and transmute() can be fancy

Fixing a Frustration and a little Visual

Virtually all of these functions can embed other functions. We can use mutate with functions to do pretty fancy things. Let me isolate the chicken items.

FastFood <- FastFood %>% mutate(Chicken = stringr::str_detect(item, 'Chicken|Chick-n'))
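
A quick sketch of what str_detect() returns, using made-up item names rather than the real menu. Note that the pattern is case sensitive, so a lowercase "chicken" would not match unless we ask for it; the regex(..., ignore_case = TRUE) variant is an illustration, not what the line above does.

```r
library(stringr)

# Hypothetical item names
items <- c("Crispy Chicken Sandwich", "Big Mac", "Chick-n-Strips", "chicken nuggets")

# "|" means "or": TRUE when either spelling appears in the string
exact <- str_detect(items, "Chicken|Chick-n")

# Same idea, ignoring upper/lower case
loose <- str_detect(items, regex("chicken|chick-n", ignore_case = TRUE))

exact
loose
```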

What’s the distribution of Chicken items by chain?

ggplot(FastFood) +
 aes(x = restaurant, fill = Chicken) +
 geom_bar(position = "dodge") +
 coord_flip() +
 theme_minimal() +
 labs(x="", y="Menu Items", title="Chicken Menu Items by Fast Food Chain")

group_by()

The Magic of group_by

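group_by is a core tidyverse operator for repeating something by groups. By itself, it changes no values: it only marks the rows of a data object as belonging to groups defined by the grouping variable(s), so that the verbs that follow work within each group.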

But that is exactly what a pivot table does.
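
One way to see what group_by() does on its own is to compare the same summarise() call with and without it. A minimal sketch on hypothetical data (menu and its values are made up):

```r
library(dplyr)

# Hypothetical toy data
menu <- tibble(
  restaurant = c("A", "A", "B"),
  calories   = c(500, 300, 700)
)

grouped <- menu %>% group_by(restaurant)

nrow(grouped)        # same rows as before; nothing was removed or reshaped
group_vars(grouped)  # the grouping is stored as metadata

overall  <- menu    %>% summarise(Mean = mean(calories))  # one row for everything
by_chain <- grouped %>% summarise(Mean = mean(calories))  # one row per restaurant

overall
by_chain
```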

Grouping and pipes

FastFood %>% group_by(restaurant) %>% skim() %>% kable()  %>%
  scroll_box(width = "100%", height = "500px")
skim_type skim_variable restaurant n_missing complete_rate character.min character.max character.empty character.n_unique character.whitespace logical.mean logical.count numeric.mean numeric.sd numeric.p0 numeric.p25 numeric.p50 numeric.p75 numeric.p100 numeric.hist
character item Arbys 0 1.0000000 10 39 0 55 0 NA NA NA NA NA NA NA NA NA NA
character item Burger King 0 1.0000000 9 63 0 70 0 NA NA NA NA NA NA NA NA NA NA
character item Chick Fil-A 0 1.0000000 12 36 0 27 0 NA NA NA NA NA NA NA NA NA NA
character item Dairy Queen 0 1.0000000 7 45 0 42 0 NA NA NA NA NA NA NA NA NA NA
character item Mcdonalds 0 1.0000000 5 49 0 57 0 NA NA NA NA NA NA NA NA NA NA
character item Sonic 0 1.0000000 8 48 0 53 0 NA NA NA NA NA NA NA NA NA NA
character item Subway 0 1.0000000 7 58 0 96 0 NA NA NA NA NA NA NA NA NA NA
character item Taco Bell 0 1.0000000 7 46 0 113 0 NA NA NA NA NA NA NA NA NA NA
character salad Arbys 0 1.0000000 5 5 0 1 0 NA NA NA NA NA NA NA NA NA NA
character salad Burger King 0 1.0000000 5 5 0 1 0 NA NA NA NA NA NA NA NA NA NA
character salad Chick Fil-A 0 1.0000000 5 5 0 1 0 NA NA NA NA NA NA NA NA NA NA
character salad Dairy Queen 0 1.0000000 5 5 0 1 0 NA NA NA NA NA NA NA NA NA NA
character salad Mcdonalds 0 1.0000000 5 5 0 1 0 NA NA NA NA NA NA NA NA NA NA
character salad Sonic 0 1.0000000 5 5 0 1 0 NA NA NA NA NA NA NA NA NA NA
character salad Subway 0 1.0000000 5 5 0 1 0 NA NA NA NA NA NA NA NA NA NA
character salad Taco Bell 0 1.0000000 5 5 0 1 0 NA NA NA NA NA NA NA NA NA NA
logical Chicken Arbys 0 1.0000000 NA NA NA NA NA 0.2363636 FAL: 42, TRU: 13 NA NA NA NA NA NA NA NA
logical Chicken Burger King 0 1.0000000 NA NA NA NA NA 0.4857143 FAL: 36, TRU: 34 NA NA NA NA NA NA NA NA
logical Chicken Chick Fil-A 0 1.0000000 NA NA NA NA NA 0.9259259 TRU: 25, FAL: 2 NA NA NA NA NA NA NA NA
logical Chicken Dairy Queen 0 1.0000000 NA NA NA NA NA 0.3095238 FAL: 29, TRU: 13 NA NA NA NA NA NA NA NA
logical Chicken Mcdonalds 0 1.0000000 NA NA NA NA NA 0.6491228 TRU: 37, FAL: 20 NA NA NA NA NA NA NA NA
logical Chicken Sonic 0 1.0000000 NA NA NA NA NA 0.4150943 FAL: 31, TRU: 22 NA NA NA NA NA NA NA NA
logical Chicken Subway 0 1.0000000 NA NA NA NA NA 0.1666667 FAL: 80, TRU: 16 NA NA NA NA NA NA NA NA
logical Chicken Taco Bell 0 1.0000000 NA NA NA NA NA 0.2000000 FAL: 92, TRU: 23 NA NA NA NA NA NA NA NA
numeric calories Arbys 0 1.0000000 NA NA NA NA NA NA NA 532.7272727 210.3388320 70.000 360.0000 550.00 690.0000 1030.00 ▃▆▇▇▂
numeric calories Burger King 0 1.0000000 NA NA NA NA NA NA NA 608.5714286 290.4184174 190.000 365.0000 555.00 760.0000 1550.00 ▇▇▃▂▁
numeric calories Chick Fil-A 0 1.0000000 NA NA NA NA NA NA NA 384.4444444 220.4947816 70.000 220.0000 390.00 480.0000 970.00 ▇▇▇▁▂
numeric calories Dairy Queen 0 1.0000000 NA NA NA NA NA NA NA 520.2380952 259.3376939 20.000 350.0000 485.00 630.0000 1260.00 ▂▇▆▂▁
numeric calories Mcdonalds 0 1.0000000 NA NA NA NA NA NA NA 640.3508772 410.6961203 140.000 380.0000 540.00 740.0000 2430.00 ▇▅▁▁▁
numeric calories Sonic 0 1.0000000 NA NA NA NA NA NA NA 631.6981132 300.8816267 100.000 410.0000 570.00 740.0000 1350.00 ▃▇▆▂▃
numeric calories Subway 0 1.0000000 NA NA NA NA NA NA NA 503.0208333 282.2209653 50.000 287.5000 460.00 740.0000 1160.00 ▅▇▃▃▂
numeric calories Taco Bell 0 1.0000000 NA NA NA NA NA NA NA 443.6521739 184.3448829 140.000 320.0000 420.00 575.0000 880.00 ▆▇▇▃▂
numeric cal_fat Arbys 0 1.0000000 NA NA NA NA NA NA NA 237.8363636 113.1696144 45.000 135.0000 250.00 310.0000 495.00 ▆▅▇▃▂
numeric cal_fat Burger King 0 1.0000000 NA NA NA NA NA NA NA 333.7571429 194.4993898 90.000 172.5000 285.00 431.5000 1134.00 ▇▅▂▁▁
numeric cal_fat Chick Fil-A 0 1.0000000 NA NA NA NA NA NA NA 145.3703704 102.3572733 18.000 67.5000 126.00 171.0000 423.00 ▆▇▂▁▁
numeric cal_fat Dairy Queen 0 1.0000000 NA NA NA NA NA NA NA 260.4761905 156.4850555 0.000 160.0000 220.00 310.0000 670.00 ▃▇▃▂▁
numeric cal_fat Mcdonalds 0 1.0000000 NA NA NA NA NA NA NA 285.6140351 220.8992785 50.000 160.0000 240.00 320.0000 1270.00 ▇▃▁▁▁
numeric cal_fat Sonic 0 1.0000000 NA NA NA NA NA NA NA 338.3018868 197.1404795 100.000 180.0000 290.00 430.0000 900.00 ▇▅▂▁▁
numeric cal_fat Subway 0 1.0000000 NA NA NA NA NA NA NA 165.1041667 134.8351414 10.000 48.7500 137.50 242.5000 620.00 ▇▅▂▁▁
numeric cal_fat Taco Bell 0 1.0000000 NA NA NA NA NA NA NA 188.0000000 84.8109352 35.000 120.0000 180.00 250.0000 380.00 ▆▇▆▅▃
numeric total_fat Arbys 0 1.0000000 NA NA NA NA NA NA NA 26.9818182 13.2448325 5.000 15.5000 28.00 35.0000 59.00 ▅▅▇▂▂
numeric total_fat Burger King 0 1.0000000 NA NA NA NA NA NA NA 36.8142857 21.2434425 10.000 19.2500 31.50 48.0000 126.00 ▇▅▂▁▁
numeric total_fat Chick Fil-A 0 1.0000000 NA NA NA NA NA NA NA 16.1481481 11.3771150 2.000 7.5000 14.00 19.0000 47.00 ▆▇▂▁▁
numeric total_fat Dairy Queen 0 1.0000000 NA NA NA NA NA NA NA 28.8571429 17.5187306 0.000 18.0000 24.50 34.7500 75.00 ▃▇▃▂▁
numeric total_fat Mcdonalds 0 1.0000000 NA NA NA NA NA NA NA 31.8070175 24.5156208 5.000 18.0000 27.00 36.0000 141.00 ▇▃▁▁▁
numeric total_fat Sonic 0 1.0000000 NA NA NA NA NA NA NA 37.6415094 21.9729680 11.000 20.0000 32.00 48.0000 100.00 ▇▅▂▁▁
numeric total_fat Subway 0 1.0000000 NA NA NA NA NA NA NA 18.4791667 14.6092827 1.000 6.0000 15.00 26.5000 62.00 ▇▃▃▁▁
numeric total_fat Taco Bell 0 1.0000000 NA NA NA NA NA NA NA 20.8956522 9.4082587 4.000 13.0000 20.00 28.0000 42.00 ▅▇▅▅▃
numeric sat_fat Arbys 0 1.0000000 NA NA NA NA NA NA NA 7.9727273 4.1626850 1.500 4.5000 7.00 11.0000 17.00 ▇▅▅▅▃
numeric sat_fat Burger King 0 1.0000000 NA NA NA NA NA NA NA 11.1500000 8.7676737 2.000 5.0000 8.00 13.7500 47.00 ▇▂▂▁▁
numeric sat_fat Chick Fil-A 0 1.0000000 NA NA NA NA NA NA NA 4.1111111 3.9034830 0.000 1.5000 3.00 4.7500 16.00 ▇▂▂▁▁
numeric sat_fat Dairy Queen 0 1.0000000 NA NA NA NA NA NA NA 10.4404762 8.2522742 0.000 5.0000 9.00 12.5000 43.00 ▇▆▂▁▁
numeric sat_fat Mcdonalds 0 1.0000000 NA NA NA NA NA NA NA 8.2894737 5.5347501 0.500 4.5000 7.00 11.0000 27.00 ▇▇▃▁▁
numeric sat_fat Sonic 0 1.0000000 NA NA NA NA NA NA NA 11.4150943 8.6673214 2.500 5.0000 8.00 15.0000 36.00 ▇▂▂▁▁
numeric sat_fat Subway 0 1.0000000 NA NA NA NA NA NA NA 6.1979167 5.2417751 0.000 2.0000 4.50 9.2500 22.00 ▇▃▃▁▁
numeric sat_fat Taco Bell 0 1.0000000 NA NA NA NA NA NA NA 6.5913043 2.9813621 1.000 4.0000 6.00 9.0000 14.00 ▃▇▃▅▂
numeric trans_fat Arbys 0 1.0000000 NA NA NA NA NA NA NA 0.4181818 0.5913233 0.000 0.0000 0.00 1.0000 2.00 ▇▂▂▁▁
numeric trans_fat Burger King 0 1.0000000 NA NA NA NA NA NA NA 0.8642857 1.3457432 0.000 0.0000 0.00 1.3750 8.00 ▇▂▁▁▁
numeric trans_fat Chick Fil-A 0 1.0000000 NA NA NA NA NA NA NA 0.0370370 0.1924501 0.000 0.0000 0.00 0.0000 1.00 ▇▁▁▁▁
numeric trans_fat Dairy Queen 0 1.0000000 NA NA NA NA NA NA NA 0.6785714 0.7141550 0.000 0.0000 1.00 1.0000 2.00 ▇▁▇▁▂
numeric trans_fat Mcdonalds 0 1.0000000 NA NA NA NA NA NA NA 0.4649123 0.7249363 0.000 0.0000 0.00 1.0000 3.00 ▇▁▁▁▁
numeric trans_fat Sonic 0 1.0000000 NA NA NA NA NA NA NA 0.9339623 1.2597227 0.000 0.0000 0.00 2.0000 4.00 ▇▂▂▁▁
numeric trans_fat Subway 0 1.0000000 NA NA NA NA NA NA NA 0.2187500 0.5222043 0.000 0.0000 0.00 0.0000 2.00 ▇▁▁▁▁
numeric trans_fat Taco Bell 0 1.0000000 NA NA NA NA NA NA NA 0.2565217 0.3993702 0.000 0.0000 0.00 0.5000 1.00 ▇▁▂▁▂
numeric cholesterol Arbys 0 1.0000000 NA NA NA NA NA NA NA 70.4545455 34.3089548 15.000 45.0000 65.00 90.0000 155.00 ▆▇▇▃▂
numeric cholesterol Burger King 0 1.0000000 NA NA NA NA NA NA NA 100.8571429 107.3223893 5.000 40.0000 85.00 115.0000 805.00 ▇▁▁▁▁
numeric cholesterol Chick Fil-A 0 1.0000000 NA NA NA NA NA NA NA 79.0740741 47.3131352 25.000 55.0000 70.00 87.5000 285.00 ▇▅▁▁▁
numeric cholesterol Dairy Queen 0 1.0000000 NA NA NA NA NA NA NA 71.5476190 43.7493985 0.000 41.2500 60.00 100.0000 180.00 ▃▇▃▂▂
numeric cholesterol Mcdonalds 0 1.0000000 NA NA NA NA NA NA NA 109.7368421 79.5322321 0.000 70.0000 95.00 125.0000 475.00 ▇▃▁▁▁
numeric cholesterol Sonic 0 1.0000000 NA NA NA NA NA NA NA 86.9811321 63.7704690 0.000 40.0000 80.00 110.0000 260.00 ▇▇▃▁▂
numeric cholesterol Subway 0 1.0000000 NA NA NA NA NA NA NA 61.3020833 40.9315526 0.000 40.0000 50.00 85.0000 190.00 ▅▇▅▁▁
numeric cholesterol Taco Bell 0 1.0000000 NA NA NA NA NA NA NA 39.0434783 19.1931052 0.000 25.0000 35.00 55.0000 85.00 ▂▇▇▃▂
numeric sodium Arbys 0 1.0000000 NA NA NA NA NA NA NA 1515.2727273 663.6650610 100.000 960.0000 1480.00 2020.0000 3350.00 ▂▇▇▃▁
numeric sodium Burger King 0 1.0000000 NA NA NA NA NA NA NA 1223.5714286 499.8841481 310.000 850.0000 1150.00 1635.0000 2310.00 ▅▇▅▆▂
numeric sodium Chick Fil-A 0 1.0000000 NA NA NA NA NA NA NA 1151.4814815 726.9202882 220.000 700.0000 1000.00 1405.0000 3660.00 ▇▇▂▁▁
numeric sodium Dairy Queen 0 1.0000000 NA NA NA NA NA NA NA 1181.7857143 609.9398425 15.000 847.5000 1030.00 1362.5000 3500.00 ▂▇▂▁▁
numeric sodium Mcdonalds 0 1.0000000 NA NA NA NA NA NA NA 1437.8947368 1036.1721052 20.000 870.0000 1120.00 1780.0000 6080.00 ▇▅▁▁▁
numeric sodium Sonic 0 1.0000000 NA NA NA NA NA NA NA 1350.7547170 665.1340208 470.000 900.0000 1250.00 1550.0000 4520.00 ▇▆▂▁▁
numeric sodium Subway 0 1.0000000 NA NA NA NA NA NA NA 1272.9687500 743.6345941 65.000 697.5000 1130.00 1605.0000 3540.00 ▅▇▃▁▂
numeric sodium Taco Bell 0 1.0000000 NA NA NA NA NA NA NA 1013.9130435 474.0543600 290.000 615.0000 960.00 1300.0000 2260.00 ▇▇▆▂▂
numeric total_carb Arbys 0 1.0000000 NA NA NA NA NA NA NA 44.8727273 19.0963003 4.000 34.0000 46.00 54.0000 83.00 ▂▃▇▅▂
numeric total_carb Burger King 0 1.0000000 NA NA NA NA NA NA NA 39.3142857 15.5559233 7.000 28.0000 41.00 52.0000 69.00 ▃▅▅▇▃
numeric total_carb Chick Fil-A 0 1.0000000 NA NA NA NA NA NA NA 28.6296296 20.4265064 1.000 8.0000 29.00 42.5000 70.00 ▇▂▆▃▂
numeric total_carb Dairy Queen 0 1.0000000 NA NA NA NA NA NA NA 38.6904762 23.7296646 0.000 25.2500 34.00 44.7500 121.00 ▃▇▁▁▁
numeric total_carb Mcdonalds 0 1.0000000 NA NA NA NA NA NA NA 48.7894737 26.4424826 9.000 32.0000 46.00 62.0000 156.00 ▆▇▂▁▁
numeric total_carb Sonic 0 1.0000000 NA NA NA NA NA NA NA 47.2075472 21.5463351 16.000 33.0000 44.00 51.0000 126.00 ▆▇▂▁▁
numeric total_carb Subway 0 1.0000000 NA NA NA NA NA NA NA 54.7187500 33.3143570 8.000 25.7500 47.00 92.0000 118.00 ▇▇▂▆▂
numeric total_carb Taco Bell 0 1.0000000 NA NA NA NA NA NA NA 46.6347826 22.5183475 12.000 29.0000 44.00 64.0000 107.00 ▇▇▇▃▂
numeric fiber Arbys 0 1.0000000 NA NA NA NA NA NA NA 2.7090909 1.4099216 1.000 2.0000 2.00 4.0000 6.00 ▇▂▂▁▁
numeric fiber Burger King 10 0.8571429 NA NA NA NA NA NA NA 2.3833333 1.3541107 0.000 1.7500 2.00 3.0000 7.00 ▅▇▅▂▁
numeric fiber Chick Fil-A 2 0.9259259 NA NA NA NA NA NA NA 2.3200000 3.0512293 0.000 1.0000 1.00 3.0000 15.00 ▇▂▁▁▁
numeric fiber Dairy Queen 0 1.0000000 NA NA NA NA NA NA NA 2.8333333 2.9210478 0.000 1.0000 2.00 3.0000 12.00 ▇▂▁▁▁
numeric fiber Mcdonalds 0 1.0000000 NA NA NA NA NA NA NA 3.2280702 1.6585013 0.000 2.0000 3.00 4.0000 8.00 ▂▇▃▃▁
numeric fiber Sonic 0 1.0000000 NA NA NA NA NA NA NA 2.6603774 1.7752941 0.000 2.0000 2.00 3.0000 8.00 ▃▇▂▂▁
numeric fiber Subway 0 1.0000000 NA NA NA NA NA NA NA 6.5625000 3.2373235 3.000 4.0000 5.00 10.0000 16.00 ▇▁▃▁▁
numeric fiber Taco Bell 0 1.0000000 NA NA NA NA NA NA NA 5.7130435 3.0572468 1.000 3.0000 5.00 7.0000 17.00 ▇▆▃▁▁
numeric sugar Arbys 0 1.0000000 NA NA NA NA NA NA NA 7.5636364 5.8587182 0.000 3.5000 6.00 9.0000 23.00 ▆▇▁▂▁
numeric sugar Burger King 0 1.0000000 NA NA NA NA NA NA NA 8.1857143 6.1862233 0.000 6.0000 7.50 10.0000 37.00 ▇▇▁▁▁
numeric sugar Chick Fil-A 0 1.0000000 NA NA NA NA NA NA NA 4.1481481 3.6658896 0.000 1.0000 4.00 7.0000 12.00 ▇▃▅▂▂
numeric sugar Dairy Queen 0 1.0000000 NA NA NA NA NA NA NA 6.3571429 5.0258912 0.000 3.0000 6.00 8.7500 30.00 ▇▅▁▁▁
numeric sugar Mcdonalds 0 1.0000000 NA NA NA NA NA NA NA 11.0701754 13.3454910 0.000 4.0000 9.00 13.0000 87.00 ▇▁▁▁▁
numeric sugar Sonic 0 1.0000000 NA NA NA NA NA NA NA 6.5283019 3.9448300 0.000 4.0000 7.00 9.0000 17.00 ▃▃▇▂▁
numeric sugar Subway 0 1.0000000 NA NA NA NA NA NA NA 10.0937500 5.6084112 3.000 6.0000 8.00 14.0000 36.00 ▇▃▁▁▁
numeric sugar Taco Bell 0 1.0000000 NA NA NA NA NA NA NA 3.7043478 1.8870469 1.000 2.0000 4.00 5.0000 8.00 ▇▅▇▂▂
numeric protein Arbys 0 1.0000000 NA NA NA NA NA NA NA 29.2545455 12.3861006 5.000 20.0000 29.00 38.0000 62.00 ▅▆▇▃▁
numeric protein Burger King 1 0.9857143 NA NA NA NA NA NA NA 30.0144928 19.4690499 5.000 16.0000 29.00 36.0000 134.00 ▇▅▁▁▁
numeric protein Chick Fil-A 0 1.0000000 NA NA NA NA NA NA NA 31.7037037 16.9270262 11.000 23.5000 29.00 37.0000 103.00 ▇▆▁▁▁
numeric protein Dairy Queen 0 1.0000000 NA NA NA NA NA NA NA 24.8333333 11.5440126 1.000 17.0000 23.00 34.0000 49.00 ▂▇▅▅▃
numeric protein Mcdonalds 0 1.0000000 NA NA NA NA NA NA NA 40.2982456 29.4793904 7.000 25.0000 33.00 46.0000 186.00 ▇▂▁▁▁
numeric protein Sonic 0 1.0000000 NA NA NA NA NA NA NA 29.1886792 14.5325319 6.000 18.0000 30.00 35.0000 67.00 ▆▆▇▁▂
numeric protein Subway 0 1.0000000 NA NA NA NA NA NA NA 30.3125000 16.1442918 3.000 18.0000 26.00 40.0000 78.00 ▆▇▆▂▁
numeric protein Taco Bell 0 1.0000000 NA NA NA NA NA NA NA 17.4173913 7.1352628 6.000 12.0000 16.00 22.0000 37.00 ▇▇▆▃▂
numeric vit_a Arbys 30 0.4545455 NA NA NA NA NA NA NA 12.5600000 16.8252786 0.000 2.0000 6.00 20.0000 60.00 ▇▂▁▁▁
numeric vit_a Burger King 70 0.0000000 NA NA NA NA NA NA NA NaN NA NA NA NA NA NA
numeric vit_a Chick Fil-A 6 0.7777778 NA NA NA NA NA NA NA 12.6190476 18.5188450 0.000 0.0000 2.00 30.0000 60.00 ▇▁▃▁▁
numeric vit_a Dairy Queen 15 0.6428571 NA NA NA NA NA NA NA 14.0000000 11.0069908 0.000 9.0000 10.00 20.0000 50.00 ▇▃▂▁▁
numeric vit_a Mcdonalds 0 1.0000000 NA NA NA NA NA NA NA 33.7192982 64.1598880 0.000 2.0000 6.00 20.0000 180.00 ▇▁▁▁▂
numeric vit_a Sonic 4 0.9245283 NA NA NA NA NA NA NA 6.9387755 5.6767368 0.000 2.0000 6.00 10.0000 20.00 ▇▃▃▃▁
numeric vit_a Subway 0 1.0000000 NA NA NA NA NA NA NA 22.3854167 15.1354375 6.000 10.0000 16.00 30.0000 60.00 ▇▃▁▁▂
numeric vit_a Taco Bell 89 0.2260870 NA NA NA NA NA NA NA 11.8461538 3.9364177 6.000 8.5000 12.50 15.0000 20.00 ▅▃▁▇▁
numeric vit_c Arbys 30 0.4545455 NA NA NA NA NA NA NA 8.4000000 6.3442888 0.000 2.0000 10.00 10.0000 20.00 ▇▃▇▂▃
numeric vit_c Burger King 70 0.0000000 NA NA NA NA NA NA NA NaN NA NA NA NA NA NA
numeric vit_c Chick Fil-A 2 0.9259259 NA NA NA NA NA NA NA 14.0800000 15.2613892 0.000 2.0000 8.00 20.0000 50.00 ▇▁▁▂▁
numeric vit_c Dairy Queen 15 0.6428571 NA NA NA NA NA NA NA 4.3703704 5.8713656 0.000 0.0000 4.00 6.0000 30.00 ▇▁▁▁▁
numeric vit_c Mcdonalds 0 1.0000000 NA NA NA NA NA NA NA 18.2982456 18.0771807 0.000 2.0000 15.00 25.0000 70.00 ▇▆▂▁▁
numeric vit_c Sonic 4 0.9245283 NA NA NA NA NA NA NA 5.7551020 4.8111442 0.000 2.0000 6.00 8.0000 25.00 ▇▇▁▁▁
numeric vit_c Subway 0 1.0000000 NA NA NA NA NA NA NA 41.9687500 44.0113510 4.000 20.0000 40.00 50.0000 400.00 ▇▁▁▁▁
numeric vit_c Taco Bell 89 0.2260870 NA NA NA NA NA NA NA 4.5384615 2.4368959 0.000 2.0000 4.00 6.0000 10.00 ▇▇▇▃▁
numeric calcium Arbys 30 0.4545455 NA NA NA NA NA NA NA 17.3600000 12.7111500 2.000 8.0000 15.00 25.0000 45.00 ▇▃▃▂▂
numeric calcium Burger King 70 0.0000000 NA NA NA NA NA NA NA NaN NA NA NA NA NA NA
numeric calcium Chick Fil-A 2 0.9259259 NA NA NA NA NA NA NA 11.3200000 10.4949194 0.000 2.0000 8.00 20.0000 35.00 ▇▂▃▂▁
numeric calcium Dairy Queen 15 0.6428571 NA NA NA NA NA NA NA 16.4074074 19.0167118 0.000 6.0000 10.00 20.0000 100.00 ▇▂▁▁▁
numeric calcium Mcdonalds 0 1.0000000 NA NA NA NA NA NA NA 20.5964912 37.9274965 0.000 6.0000 15.00 20.0000 290.00 ▇▁▁▁▁
numeric calcium Sonic 4 0.9245283 NA NA NA NA NA NA NA 17.2448980 12.0701605 1.000 8.0000 15.00 27.0000 40.00 ▇▆▂▆▂
numeric calcium Subway 0 1.0000000 NA NA NA NA NA NA NA 39.1250000 25.1321769 4.000 20.0000 35.00 60.0000 100.00 ▇▇▆▂▁
numeric calcium Taco Bell 89 0.2260870 NA NA NA NA NA NA NA 24.8076923 11.3314403 6.000 15.0000 25.00 35.0000 45.00 ▃▆▂▇▂
numeric Sodium.Grams Arbys 0 1.0000000 NA NA NA NA NA NA NA 1.5152727 0.6636651 0.100 0.9600 1.48 2.0200 3.35 ▂▇▇▃▁
numeric Sodium.Grams Burger King 0 1.0000000 NA NA NA NA NA NA NA 1.2235714 0.4998841 0.310 0.8500 1.15 1.6350 2.31 ▅▇▅▆▂
numeric Sodium.Grams Chick Fil-A 0 1.0000000 NA NA NA NA NA NA NA 1.1514815 0.7269203 0.220 0.7000 1.00 1.4050 3.66 ▇▇▂▁▁
numeric Sodium.Grams Dairy Queen 0 1.0000000 NA NA NA NA NA NA NA 1.1817857 0.6099398 0.015 0.8475 1.03 1.3625 3.50 ▂▇▂▁▁
numeric Sodium.Grams Mcdonalds 0 1.0000000 NA NA NA NA NA NA NA 1.4378947 1.0361721 0.020 0.8700 1.12 1.7800 6.08 ▇▅▁▁▁
numeric Sodium.Grams Sonic 0 1.0000000 NA NA NA NA NA NA NA 1.3507547 0.6651340 0.470 0.9000 1.25 1.5500 4.52 ▇▆▂▁▁
numeric Sodium.Grams Subway 0 1.0000000 NA NA NA NA NA NA NA 1.2729688 0.7436346 0.065 0.6975 1.13 1.6050 3.54 ▅▇▃▁▂
numeric Sodium.Grams Taco Bell 0 1.0000000 NA NA NA NA NA NA NA 1.0139130 0.4740544 0.290 0.6150 0.96 1.3000 2.26 ▇▇▆▂▂

A Two Variable Pivot

FastFood %>% group_by(restaurant,Chicken) %>% skim(Sodium.Grams) %>% kable()  %>% scroll_box(width = "100%", height = "50%")
skim_type skim_variable restaurant  Chicken n_missing complete_rate numeric.mean numeric.sd numeric.p0 numeric.p25 numeric.p50 numeric.p75 numeric.p100 numeric.hist
numeric   Sodium.Grams  Arbys       FALSE           0             1    1.5728571  0.6980191      0.100      0.9850       1.515      2.0875         3.35 ▂▇▆▅▁
numeric   Sodium.Grams  Arbys       TRUE            0             1    1.3292308  0.5179038      0.640      0.9500       1.210      1.7500         2.11 ▆▇▂▃▆
numeric   Sodium.Grams  Burger King FALSE           0             1    1.2105556  0.5032009      0.450      0.8050       1.160      1.5825         2.27 ▇▇▆▇▂
numeric   Sodium.Grams  Burger King TRUE            0             1    1.2373529  0.5035348      0.310      0.8550       1.150      1.6400         2.31 ▃▇▅▆▃
numeric   Sodium.Grams  Chick Fil-A FALSE           0             1    1.4800000  0.3959798      1.200      1.3400       1.480      1.6200         1.76 ▇▁▁▁▇
numeric   Sodium.Grams  Chick Fil-A TRUE            0             1    1.1252000  0.7457888      0.220      0.6700       0.990      1.3500         3.66 ▇▇▂▁▁
numeric   Sodium.Grams  Dairy Queen FALSE           0             1    1.0812069  0.4583348      0.015      0.9000       1.000      1.2500         2.21 ▁▂▇▂▁
numeric   Sodium.Grams  Dairy Queen TRUE            0             1    1.4061538  0.8378200      0.670      0.8200       1.190      1.5300         3.50 ▇▅▁▁▁
numeric   Sodium.Grams  Mcdonalds   FALSE           0             1    1.3100000  0.9047535      0.480      0.8275       1.025      1.3775         4.45 ▇▃▁▁▁
numeric   Sodium.Grams  Mcdonalds   TRUE            0             1    1.5070270  1.1063902      0.020      0.9600       1.190      1.8200         6.08 ▇▅▁▁▁
numeric   Sodium.Grams  Sonic       FALSE           0             1    1.1874194  0.4170449      0.470      0.8650       1.180      1.4100         2.31 ▆▇▇▃▁
numeric   Sodium.Grams  Sonic       TRUE            0             1    1.5809091  0.8672557      0.670      0.9700       1.420      2.0250         4.52 ▇▆▂▁▁
numeric   Sodium.Grams  Subway      FALSE           0             1    1.2979375  0.7695501      0.065      0.7150       1.195      1.6050         3.54 ▅▇▃▁▂
numeric   Sodium.Grams  Subway      TRUE            0             1    1.1481250  0.6028070      0.280      0.6550       1.080      1.4100         2.28 ▇▆▅▂▅
numeric   Sodium.Grams  Taco Bell   FALSE           0             1    0.9808696  0.4597793      0.290      0.5925       0.930      1.2700         2.26 ▇▇▆▂▁
numeric   Sodium.Grams  Taco Bell   TRUE            0             1    1.1460870  0.5169644      0.460      0.7400       1.070      1.4950         2.23 ▇▇▇▁▃

summarise / summarize

summarise

summarise() is the R analog of a pivot table: it collapses each group, for whatever groupings we wish, into one row of summary values.

FastFood %>% group_by(restaurant, Chicken) %>% summarise(Mean.Protein = mean(protein), Mean.Protein.NA = mean(protein, na.rm=TRUE))
# A tibble: 16 × 4
# Groups:   restaurant [8]
   restaurant  Chicken Mean.Protein Mean.Protein.NA
   <chr>       <lgl>          <dbl>           <dbl>
 1 Arbys       FALSE           29.6            29.6
 2 Arbys       TRUE            28              28  
 3 Burger King FALSE           NA              34.1
 4 Burger King TRUE            25.8            25.8
 5 Chick Fil-A FALSE           33.5            33.5
 6 Chick Fil-A TRUE            31.6            31.6
 7 Dairy Queen FALSE           23.3            23.3
 8 Dairy Queen TRUE            28.3            28.3
 9 Mcdonalds   FALSE           36.2            36.2
10 Mcdonalds   TRUE            42.5            42.5
11 Sonic       FALSE           28.9            28.9
12 Sonic       TRUE            29.5            29.5
13 Subway      FALSE           28.7            28.7
14 Subway      TRUE            38.4            38.4
15 Taco Bell   FALSE           16.4            16.4
16 Taco Bell   TRUE            21.3            21.3
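
The NA in the Mean.Protein column for Burger King comes from a missing protein value: mean() returns NA whenever any input is missing, unless we tell it to drop the missing values first. A minimal illustration with a made-up vector:

```r
# Hypothetical protein values with one missing entry
protein <- c(20, NA, 30)

with_na    <- mean(protein)                # NA: one missing value poisons the mean
without_na <- mean(protein, na.rm = TRUE)  # 25: the mean of the two observed values

with_na
without_na
```

Whether dropping missing values is appropriate is a judgment about the data, which is why R makes us ask for it.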

ungroup()

ungroup()

Combining group_by() with mutate() attaches a group-level statistic to every row in the group. Objects retain their grouped status unless we actively remove it, so we finish with ungroup(); otherwise every later verb would keep working restaurant by restaurant.

FastFood <- FastFood %>% 
  group_by(restaurant) %>% 
  mutate(Avg.Protein = mean(protein, na.rm=TRUE), Protein.Dev = protein - Avg.Protein) %>%
  ungroup()
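
Here is the same pattern on a hypothetical four-row table, small enough to check by hand (menu and its values are made up). The last two lines show why the ungroup() matters for whatever comes next.

```r
library(dplyr)

# Hypothetical toy data
menu <- tibble(
  restaurant = c("A", "A", "B", "B"),
  protein    = c(10, 30, 20, 40)
)

with_dev <- menu %>%
  group_by(restaurant) %>%
  mutate(Avg = mean(protein),   # group mean, repeated on every row of the group
         Dev = protein - Avg) %>%
  ungroup()

with_dev
is_grouped_df(with_dev)  # FALSE: the grouping has been removed

# Later verbs behave differently depending on the grouped status
n_ungrouped <- with_dev %>% summarise(n = n())                          # 1 row
n_grouped   <- menu %>% group_by(restaurant) %>% summarise(n = n())     # 2 rows
```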

arrange()

arrange() [1/2]

We can use arrange to sort a result. For example,

FastFood %>% 
  group_by(restaurant) %>% 
  summarise(Avg.Calories = mean(calories)) %>% 
  arrange(Avg.Calories)
# A tibble: 8 × 2
  restaurant  Avg.Calories
  <chr>              <dbl>
1 Chick Fil-A         384.
2 Taco Bell           444.
3 Subway              503.
4 Dairy Queen         520.
5 Arbys               533.
6 Burger King         609.
7 Sonic               632.
8 Mcdonalds           640.

arrange() [2/2]

We can use arrange to sort a result, and desc() to flip it. For example,

FastFood %>% 
  group_by(restaurant) %>% 
  summarise(Avg.Calories = mean(calories)) %>% 
  arrange(desc(Avg.Calories))
# A tibble: 8 × 2
  restaurant  Avg.Calories
  <chr>              <dbl>
1 Mcdonalds           640.
2 Sonic               632.
3 Burger King         609.
4 Arbys               533.
5 Dairy Queen         520.
6 Subway              503.
7 Taco Bell           444.
8 Chick Fil-A         384.
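
arrange() also takes more than one sorting variable, with later variables breaking ties in earlier ones. A small sketch on made-up counts (the counts tibble is hypothetical):

```r
library(dplyr)

# Hypothetical toy data
counts <- tibble(
  restaurant = c("A", "A", "B", "B"),
  Chicken    = c(FALSE, TRUE, FALSE, TRUE),
  Count      = c(42, 13, 2, 25)
)

ascending  <- counts %>% arrange(Count)                    # smallest first
descending <- counts %>% arrange(desc(Count))              # largest first
nested     <- counts %>% arrange(restaurant, desc(Count))  # largest first within each restaurant

ascending
descending
nested
```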

A Basic Table: Counts

( Restaurant.Table <- FastFood %>% group_by(restaurant) %>% summarise(Count = n()) %>% arrange(Count) )
# A tibble: 8 × 2
  restaurant  Count
  <chr>       <int>
1 Chick Fil-A    27
2 Dairy Queen    42
3 Sonic          53
4 Arbys          55
5 Mcdonalds      57
6 Burger King    70
7 Subway         96
8 Taco Bell     115

A More Elaborate Table: Counts

( Rest.Chicken.Table <- FastFood %>% group_by(restaurant, Chicken) %>% summarise(Count = n()) )
# A tibble: 16 × 3
# Groups:   restaurant [8]
   restaurant  Chicken Count
   <chr>       <lgl>   <int>
 1 Arbys       FALSE      42
 2 Arbys       TRUE       13
 3 Burger King FALSE      36
 4 Burger King TRUE       34
 5 Chick Fil-A FALSE       2
 6 Chick Fil-A TRUE       25
 7 Dairy Queen FALSE      29
 8 Dairy Queen TRUE       13
 9 Mcdonalds   FALSE      20
10 Mcdonalds   TRUE       37
11 Sonic       FALSE      31
12 Sonic       TRUE       22
13 Subway      FALSE      80
14 Subway      TRUE       16
15 Taco Bell   FALSE      92
16 Taco Bell   TRUE       23
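
Notice the "# Groups: restaurant [8]" line in that output: with two grouping variables, summarise() only peels off the last one. As a sketch on hypothetical data, and assuming dplyr 1.0 or later for the .groups argument, we can drop the grouping explicitly, or use count() as a shortcut for the whole group_by() and summarise(n()) pattern.

```r
library(dplyr)

# Hypothetical toy data
menu <- tibble(
  restaurant = c("A", "A", "A", "B"),
  Chicken    = c(TRUE, FALSE, TRUE, TRUE)
)

# By hand, dropping all grouping from the result
by_hand <- menu %>%
  group_by(restaurant, Chicken) %>%
  summarise(Count = n(), .groups = "drop")

# count() groups, counts, and returns an ungrouped result in one step
shortcut <- menu %>% count(restaurant, Chicken, name = "Count")

by_hand
shortcut
```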

A First Data Visualisation

FastFood %>% group_by(restaurant) %>% summarise(Count = n()) %>% ggplot() + aes(x=restaurant, y=Count) + geom_col()

Adding in Chicken

FastFood %>% group_by(restaurant, Chicken) %>% summarise(Count = n()) %>%
    ggplot() + aes(x=restaurant, y=Count, fill=Chicken) + geom_col()

More Chaining [fct_reorder()]

FastFood %>% 
  group_by(restaurant) %>% 
  summarise(Count = n()) %>% 
  ggplot() + 
  aes(x=fct_reorder(restaurant, Count), y=Count) + 
  geom_col() + 
  labs(x="Chain", y="Count") + 
  coord_flip()

Even More Chaining

FastFood %>% 
  group_by(restaurant) %>% 
  summarise(Count = n()) %>% 
  ggplot() + 
  aes(x=fct_reorder(restaurant, desc(Count)), y=Count) + 
  geom_col() + 
  labs(x="Chain", y="Count") + 
  coord_flip()
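
What fct_reorder() changes is the order of the factor levels, which is what ggplot uses to place the bars. A minimal sketch with made-up chains and counts; the .desc = TRUE argument is an alternative to wrapping the count in desc() and is shown only for illustration.

```r
library(forcats)

# Hypothetical chains and counts
chain <- factor(c("A", "B", "C"))
count <- c(55, 27, 115)

alphabetical <- levels(chain)                                   # default: alphabetical
by_count     <- levels(fct_reorder(chain, count))               # smallest count first
by_count_rev <- levels(fct_reorder(chain, count, .desc = TRUE)) # largest count first

alphabetical
by_count
by_count_rev
```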

A Note on Skim

skim() is a convenience. We could compute the same summaries by hand with summarise().

FastFood %>% group_by(restaurant) %>% 
  summarise(Mean = mean(calories, na.rm=TRUE), 
            SD = sd(calories, na.rm=TRUE), 
            Min = min(calories, na.rm=TRUE), 
            Median = median(calories, na.rm=TRUE), 
            Max = max(calories, na.rm=TRUE), 
            Q1 = quantile(calories, 0.25, na.rm=TRUE), 
            Q3 = quantile(calories, 0.75, na.rm=TRUE))
# A tibble: 8 × 8
  restaurant   Mean    SD   Min Median   Max    Q1    Q3
  <chr>       <dbl> <dbl> <dbl>  <dbl> <dbl> <dbl> <dbl>
1 Arbys        533.  210.    70    550  1030  360    690
2 Burger King  609.  290.   190    555  1550  365    760
3 Chick Fil-A  384.  220.    70    390   970  220    480
4 Dairy Queen  520.  259.    20    485  1260  350    630
5 Mcdonalds    640.  411.   140    540  2430  380    740
6 Sonic        632.  301.   100    570  1350  410    740
7 Subway       503.  282.    50    460  1160  288.   740
8 Taco Bell    444.  184.   140    420   880  320    575

A Recap

Four dplyr verbs:
- filter()
- select()
- mutate() / transmute()
- summarise()

Two helpers:
- group_by() and ungroup()
- arrange() and desc()

Something A Little Bit Crazy 🤷

Take the by-hand summary from above, but now wrap it in a function so that it works for any variable. Though it will work with any column name [calories would do], I will be explicit. There are two programming tricks here, ensym() to capture the bare column name and !! to unquote it inside summarise(), and the same pattern could be adapted to any functions we might want.

library(rlang)
summarise.me <- function(data, var) {
  # Convert to a plain data.frame (this also drops any grouping)
  data <- as.data.frame(data)
  # Capture the bare column name as a symbol
  var <- ensym(var)
  # !! unquotes the symbol so summarise() sees the column itself
  Res <- data %>% summarise(
    Mean   = mean(!!var, na.rm = TRUE),
    SD     = sd(!!var, na.rm = TRUE),
    Min    = min(!!var, na.rm = TRUE),
    Q1     = quantile(!!var, 0.25, na.rm = TRUE),
    Median = median(!!var, na.rm = TRUE),
    Q3     = quantile(!!var, 0.75, na.rm = TRUE),
    Max    = max(!!var, na.rm = TRUE)
  )
  return(Res)
}
FastFood %>% summarise.me(., protein)
      Mean       SD Min Q1 Median Q3 Max
1 27.89105 17.68392   1 16   24.5 36 186
# Equivalent to summarise.me(FastFood, protein)
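
For comparison, here is a sketch of the same idea written with the embrace operator {{ }}, which does the capture and the unquoting in one step. This is a different technique from the ensym() and !! version above, it assumes a reasonably recent dplyr and rlang, and summarise_var and the menu data are hypothetical names made up for the illustration. Because it does not convert to a data.frame, it also respects group_by().

```r
library(dplyr)

# Hypothetical helper: summarise any one column, passed as a bare name
summarise_var <- function(data, var) {
  data %>% summarise(
    Mean   = mean({{ var }}, na.rm = TRUE),
    Median = median({{ var }}, na.rm = TRUE),
    Max    = max({{ var }}, na.rm = TRUE)
  )
}

# Hypothetical toy data with one missing value
menu <- tibble(
  restaurant = c("A", "A", "B", "B"),
  protein    = c(10, 20, NA, 30)
)

overall  <- summarise_var(menu, protein)                           # one row
by_chain <- menu %>% group_by(restaurant) %>% summarise_var(protein)  # one row per restaurant

overall
by_chain
```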