# A tibble: 7 × 4
name alignment gender publisher
<chr> <chr> <chr> <chr>
1 Magneto bad male Marvel
2 Storm good female Marvel
3 Mystique bad female Marvel
4 Batman good male DC
5 Joker bad male DC
6 Catwoman bad female DC
7 Hellboy good male Dark Horse Comics
How’s that done?
brands
# A tibble: 3 × 2
publisher yr_founded
<chr> <dbl>
1 DC 1934
2 Marvel 1939
3 Image 1992
Joins
Are the core idea behind SQL. R does all of these things though it does them in an R way. We will figure that out. The assignment? To learn joins. How? Here are the commands; use them and see how they work. It should be simple. inner_join(heroes,brands) produces what? Same for semi? left? right? full? In the following explanations, when I use left and right data, I mean heroes as left and brands as right where the sides are with respect to the ,.
inner_join
How’s that done?
inner_join(heroes,brands)
Joining with `by = join_by(publisher)`
# A tibble: 6 × 5
name alignment gender publisher yr_founded
<chr> <chr> <chr> <chr> <dbl>
1 Magneto bad male Marvel 1939
2 Storm good female Marvel 1939
3 Mystique bad female Marvel 1939
4 Batman good male DC 1934
5 Joker bad male DC 1934
6 Catwoman bad female DC 1934
We keep only those rows that exist in both the left and the right datasets and we join all columns.
semi_join
How’s that done?
semi_join(heroes,brands)
Joining with `by = join_by(publisher)`
# A tibble: 6 × 4
name alignment gender publisher
<chr> <chr> <chr> <chr>
1 Magneto bad male Marvel
2 Storm good female Marvel
3 Mystique bad female Marvel
4 Batman good male DC
5 Joker bad male DC
6 Catwoman bad female DC
The rows are the same as inner join but it does not keep columns from right data.
left_join
How’s that done?
left_join(heroes,brands)
Joining with `by = join_by(publisher)`
# A tibble: 7 × 5
name alignment gender publisher yr_founded
<chr> <chr> <chr> <chr> <dbl>
1 Magneto bad male Marvel 1939
2 Storm good female Marvel 1939
3 Mystique bad female Marvel 1939
4 Batman good male DC 1934
5 Joker bad male DC 1934
6 Catwoman bad female DC 1934
7 Hellboy good male Dark Horse Comics NA
Keep all rows from the left
right_join
How’s that done?
right_join(heroes,brands)
Joining with `by = join_by(publisher)`
# A tibble: 7 × 5
name alignment gender publisher yr_founded
<chr> <chr> <chr> <chr> <dbl>
1 Magneto bad male Marvel 1939
2 Storm good female Marvel 1939
3 Mystique bad female Marvel 1939
4 Batman good male DC 1934
5 Joker bad male DC 1934
6 Catwoman bad female DC 1934
7 <NA> <NA> <NA> Image 1992
Keep all rows from the right
full_join
How’s that done?
full_join(heroes,brands)
Joining with `by = join_by(publisher)`
# A tibble: 8 × 5
name alignment gender publisher yr_founded
<chr> <chr> <chr> <chr> <dbl>
1 Magneto bad male Marvel 1939
2 Storm good female Marvel 1939
3 Mystique bad female Marvel 1939
4 Batman good male DC 1934
5 Joker bad male DC 1934
6 Catwoman bad female DC 1934
7 Hellboy good male Dark Horse Comics NA
8 <NA> <NA> <NA> Image 1992
Keep everything, all rows and all columns
anti_join
How’s that done?
anti_join(heroes,brands)
Joining with `by = join_by(publisher)`
# A tibble: 1 × 4
name alignment gender publisher
<chr> <chr> <chr> <chr>
1 Hellboy good male Dark Horse Comics
Only keep left rows that do not exist in the right data.
The most important option here is by. Here is an example using it explicitly. We need to use c() to combine the two variable names where each name is specified as character name, with quotes surrounding it.
# A tibble: 1 × 4
name alignment gender publisher
<chr> <chr> <chr> <chr>
1 Hellboy good male Dark Horse Comics
If there were two [not applicable in this case], it would be by = c("publisher" = "publisher", "acronym" = "acronym") and can apply to as many variables as one wishes. The other argument of potential consequence is na_matches. By default, NA values are matched [as are NotaNumber – NaN values]. The option na_matches = "never" prevents this. This is also completely inconsequential in this case but will be shown for completeness.