Bonds
A dataset for illustrating the available visualizations needs some richness at a manageable size. The Bonds dataset contains three categorical variables (Type, Fees, and Risk), a fund identifier, and five quantitative variables, which is sufficient to show what we might wish.
Loading the Data
library(tidyverse)  # assumed setup: supplies ggplot2 and the %>% pipe used throughout
Bonds <- read.csv(url("https://raw.githubusercontent.com/robertwwalker/DADMStuff/master/BondFunds.csv"))
A Summary
library(skimr)
Bonds %>%
  skim()
Name | Piped data |
Number of rows | 184 |
Number of columns | 9 |
_______________________ | |
Column type frequency: | |
character | 4 |
numeric | 5 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Fund.Number | 0 | 1 | 4 | 6 | 0 | 184 | 0 |
Type | 0 | 1 | 20 | 23 | 0 | 2 | 0 |
Fees | 0 | 1 | 2 | 3 | 0 | 2 | 0 |
Risk | 0 | 1 | 7 | 13 | 0 | 3 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Assets | 0 | 1 | 910.65 | 2253.27 | 12.40 | 113.72 | 268.4 | 621.95 | 18603.50 | ▇▁▁▁▁ |
Expense.Ratio | 0 | 1 | 0.71 | 0.26 | 0.12 | 0.53 | 0.7 | 0.90 | 1.94 | ▂▇▅▁▁ |
Return.2009 | 0 | 1 | 7.16 | 6.09 | -8.80 | 3.48 | 6.4 | 10.72 | 32.00 | ▁▇▅▁▁ |
X3.Year.Return | 0 | 1 | 4.66 | 2.52 | -13.80 | 4.05 | 5.1 | 6.10 | 9.40 | ▁▁▁▅▇ |
X5.Year.Return | 0 | 1 | 3.99 | 1.49 | -7.30 | 3.60 | 4.3 | 4.90 | 6.80 | ▁▁▁▅▇ |
Most data types are represented. There is no time variable so dates and the visualizations that go with time series are omitted.
Data Visualization
First, let us look at visualizations for one variable.
Bar plots and column plots
There are two ways to construct a barplot; we can let ggplot handle it on the raw data or calculate it ourselves. Let me focus on Risk.
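The two routes can be compared directly. The sketch below uses a small made-up data frame (`toy`, a hypothetical stand-in, not the Bonds data) and checks that the counts `geom_bar()` computes internally match those from `table()`, which is what `geom_col()` would be handed.

```r
library(ggplot2)

# Hypothetical stand-in for the Risk column; not the Bonds data.
toy <- data.frame(Risk = c("Average", "Above average", "Below average",
                           "Average", "Average", "Below average"))

# Route 1: let ggplot count the raw rows (geom_bar uses stat_count).
p_raw <- ggplot(toy, aes(x = Risk)) + geom_bar()

# Route 2: count first, then plot the counts as given.
toy_counts <- data.frame(table(toy$Risk))   # columns Var1 and Freq
p_col <- ggplot(toy_counts, aes(x = Var1, y = Freq)) + geom_col()

# The bar heights ggplot computed for route 1, in x-axis order.
raw_heights <- layer_data(p_raw)$count

# Both routes should agree on the heights.
all(raw_heights == toy_counts$Freq)
```

Printing `p_raw` and `p_col` should draw the same bars; only the axis titles differ, because the second plot inherits the `Var1` and `Freq` names from the table.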
geom_bar()
Bonds %>%
  ggplot() + aes(x = Risk) + geom_bar()
Raw Data Bar Plot [color]
Bonds %>%
  ggplot() + aes(x = Risk, color = Risk) + geom_bar()
Raw Data Bar Plot [color and fill]
We could color the outlines and fill the bars white.
Bonds %>%
  ggplot() + aes(x = Risk, color = Risk) + geom_bar(fill = "white") + guides(color = "none")
Raw Data Bar Plot [Fill]
We can fill the shapes.
# guides(fill = "none") removes the legend
Bonds %>%
  ggplot(., aes(x = Risk, fill = Risk)) + geom_bar() + guides(fill = "none")
geom_bar() meets fill
We can also deploy fill on its own, but then x is no longer a meaningful axis; it is some constant value, and the single bar is divided by fill according to the frequencies. This will require some prettying.
A Cumulative Bar Plot
Basic.Bar <- Bonds %>%
  ggplot(., aes(x = "", fill = Risk)) + geom_bar()
Basic.Bar
The prettying will require that I eliminate the x axis [set it to empty], include a theme, and give it proper labels.
Enhanced Cumulative Bar Plot
Bonds %>%
  ggplot(., aes(x = "", fill = Risk)) + geom_bar() + labs(x = "", y = "Number of Funds") +
  theme_minimal() + theme(axis.text.x = element_blank())
Proportion Bar Plot
Bonds %>%
  ggplot(., aes(x = "", fill = Risk)) + geom_bar(position = "fill") +
  labs(x = "", y = "Proportion of Funds")
The same prettying applies here: an empty x axis label, a theme, and proper labels.
Enhanced Proportion Bar Plot
Bonds %>%
  ggplot(., aes(x = "", fill = Risk)) + geom_bar(position = "fill") +
  labs(x = "", y = "Proportion of Funds") + theme_minimal()
geom_col()
Risk.Table <- table(Bonds$Risk) %>%
  data.frame()
Risk.Table %>%
  ggplot(., aes(x = Var1, y = Freq)) + geom_col()
Beautifying geom_col()
Now it really needs some beautification.
Risk.Table %>%
  ggplot(., aes(x = Var1, y = Freq, fill = Var1)) + geom_col() +
  labs(x = "Risk Levels", y = "Number of Funds") + theme_minimal() + theme(axis.text.x = element_blank()) +
  scale_fill_viridis_d() + guides(fill = "none")
position = "fill"
The two commands, geom_bar() and geom_col(), are symmetric in the sense that x as an axis always splits the plot into multiple parts. fill will prove very useful with a two-dimensional table.
Risk.Table %>%
  ggplot(., aes(x = 1, y = Freq, fill = Var1)) + geom_col(position = "fill") +
  labs(x = "Risk Levels", y = "Proportion of Funds") + theme_minimal() + theme(axis.text.x = element_blank()) +
  scale_fill_viridis_d() + guides(fill = "none")
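The remark about two-dimensional tables can be sketched as follows. This is only an illustration under stated assumptions: the data frame `toy` and its values are made up, and the column names Fees and Risk merely mirror two of the categorical variables in the summary. With a second variable on x, position = "fill" rescales every bar to height one, so the Risk shares can be compared across Fees groups.

```r
library(ggplot2)

# Hypothetical cross-classified funds; values are made up for illustration.
toy <- data.frame(
  Fees = c("Yes", "Yes", "Yes", "No", "No", "No", "No", "No"),
  Risk = c("Average", "Average", "Below average", "Above average",
           "Average", "Average", "Below average", "Below average")
)

# Two-way table flattened to long form: columns Fees, Risk, Freq.
two_way <- data.frame(table(Fees = toy$Fees, Risk = toy$Risk))

# x gives one bar per Fees group; position = "fill" rescales each bar
# to height 1 so the Risk shares within each group are comparable.
p_two_way <- ggplot(two_way, aes(x = Fees, y = Freq, fill = Risk)) +
  geom_col(position = "fill") +
  labs(x = "Fees", y = "Proportion of Funds", fill = "Risk Level") +
  theme_minimal()

# The same shares computed by hand: each row (Fees group) sums to 1.
shares <- prop.table(table(toy$Fees, toy$Risk), margin = 1)
```

Printing `p_two_way` draws the plot; `shares` holds the numbers behind the segment heights.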
A lollipop chart
A lollipop chart is a combination of two geometries. It is a basic scatterplot combining one qualitative variable and the quantitative count of the number of observations. The head of the lollipop is a point while there is an accompanying line segment from (x,0) to (x,Freq) where Freq is the default name for a count from a table.
Basic Lollipop Chart
Risk.Table %>%
  ggplot(., aes(x = Var1, y = Freq, color = Var1)) + geom_point(size = 6) +
  labs(x = "Risk Level", y = "Number of Funds", color = "Risk Level") +
  geom_segment(aes(xend = Var1, y = 0, yend = Freq)) + theme_minimal()
Slicked Lollipop Chart by Adjusting Segment Size
Risk.Table %>%
  ggplot(., aes(x = Var1, y = Freq, color = Var1)) + geom_point(size = 6) +
  labs(x = "Risk Levels", y = "Number of Funds") +
  geom_segment(aes(xend = Var1, y = 0, yend = Freq), linewidth = 1.5) +  # linewidth replaces size for lines as of ggplot2 3.4.0
  theme_minimal() + guides(color = "none")
Risk.Table %>%
  ggplot(., aes(x = Var1, y = Freq, color = Var1)) + geom_point(size = 6) +
  labs(x = "Risk Levels", y = "Number of Funds") +
  geom_segment(aes(xend = Var1, y = 0, yend = Freq)) +
  theme_minimal() + scale_color_viridis_d() + guides(color = "none") + coord_flip()
A Lollipop Table [geom_label()]
Now I will switch up the points to be the actual values as text. For this, I use geom_label, a geometry that, like geom_text, requires a label aesthetic to be assigned. I also want to put down the lines before the text to avoid overlap.
Risk.Table %>%
  ggplot(., aes(x = Var1, y = Freq, color = Var1, label = Freq)) +
  labs(x = "Risk Levels", y = "Number of Funds") +
  geom_segment(aes(xend = Var1, y = 0, yend = Freq)) +
  geom_label(size = 6) + theme_minimal() + scale_color_viridis_d() + guides(color = "none") +
  coord_flip()
A Lollipop Table [geom_label() inverse]
A ggplot is built in layers, so drawing the segment before the label makes sure that the white text shows up on top. The fill and a discrete color are combined to create this graphical table.
Risk.Table %>%
  ggplot(., aes(x = Var1, y = Freq, color = Var1, fill = Var1, label = Freq)) +
  geom_segment(aes(xend = Var1, y = 0, yend = Freq), linewidth = 1.5) +
  geom_label(size = 6, color = "white") + labs(x = "Risk Levels", y = "Number of Funds") + theme_minimal() +
  scale_color_viridis_d() + scale_fill_viridis_d() + guides(fill = "none", color = "none") +
  coord_flip()
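The point about layering can be checked on a small example. This is a minimal sketch, assuming ggplot2 3.4.0 or later (for `linewidth`) and using a made-up data frame `toy_counts` with the same column names as Risk.Table; its values are hypothetical. It lists the geometries in the order the plot stores them, which is also the order in which they are drawn.

```r
library(ggplot2)

# Hypothetical counts in the same shape as Risk.Table (Var1, Freq).
toy_counts <- data.frame(Var1 = c("Above average", "Average", "Below average"),
                         Freq = c(3, 5, 2))

p_layers <- ggplot(toy_counts, aes(x = Var1, y = Freq, label = Freq)) +
  geom_segment(aes(xend = Var1, y = 0, yend = Freq), linewidth = 1.5) +  # added first, drawn first
  geom_label(size = 6)                                                   # added second, drawn on top

# Layers are kept in the order they were added; report each layer's geom class.
layer_order <- unname(vapply(p_layers$layers,
                             function(l) class(l$geom)[1],
                             character(1)))
layer_order
```

Swapping the two `geom_` calls would reverse `layer_order` and let the segment run over the label.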
I HATE PIE CHARTS
A pie chart is fairly easy to do. Let’s go back and show something that I find pretty amazing. A pie chart is a bar chart [the fill variety] with coordinates that fill a circle rather than a square. We take the most basic bar plot – Basic.Bar – and add three things: new coordinates that are polar, labels, and a blank theme to eliminate axis labels.
Basic.Bar + coord_polar("y", start = 0) + labs(x = "", y = "") + theme_void()
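To see why the stacked bar becomes a pie, it may help to compute the slice angles by hand. The sketch below uses a made-up data frame (`toy`, hypothetical, not the Bonds data): each level's share of the stacked bar becomes the same share of the full circle once y is wrapped around it.

```r
library(ggplot2)

# Hypothetical stand-in for the Risk column; not the Bonds data.
toy <- data.frame(Risk = c("Average", "Above average", "Below average",
                           "Average", "Average", "Below average"))

# One stacked bar: constant x, counts split by fill.
toy_bar <- ggplot(toy, aes(x = "", fill = Risk)) + geom_bar()

# Wrapping the y axis around a circle turns the stacked bar into a pie.
toy_pie <- toy_bar + coord_polar("y", start = 0) +
  labs(x = "", y = "") + theme_void()

# Each slice's angle is its share of the total count times 360 degrees.
slice_degrees <- 360 * prop.table(table(toy$Risk))
```

With these made-up counts of 1, 3, and 2, the slices span 60, 180, and 120 degrees.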