<- read.csv(url("https://raw.githubusercontent.com/robertwwalker/DADMStuff/master/BondFunds.csv")) Bonds
Bonds
A dataset for illustrating the various available visualizations needs a certain degree of richness with manageable size. The dataset on Bonds contains three categorical and a few quantitative indicators sufficient to show what we might wish.
Loading the Data
A Summary
library(skimr)
%>%
Bonds skim()
Name | Piped data |
Number of rows | 184 |
Number of columns | 9 |
_______________________ | |
Column type frequency: | |
character | 4 |
numeric | 5 |
________________________ | |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
Fund.Number | 0 | 1 | 4 | 6 | 0 | 184 | 0 |
Type | 0 | 1 | 20 | 23 | 0 | 2 | 0 |
Fees | 0 | 1 | 2 | 3 | 0 | 2 | 0 |
Risk | 0 | 1 | 7 | 13 | 0 | 3 | 0 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
---|---|---|---|---|---|---|---|---|---|---|
Assets | 0 | 1 | 910.65 | 2253.27 | 12.40 | 113.72 | 268.4 | 621.95 | 18603.50 | ▇▁▁▁▁ |
Expense.Ratio | 0 | 1 | 0.71 | 0.26 | 0.12 | 0.53 | 0.7 | 0.90 | 1.94 | ▂▇▅▁▁ |
Return.2009 | 0 | 1 | 7.16 | 6.09 | -8.80 | 3.48 | 6.4 | 10.72 | 32.00 | ▁▇▅▁▁ |
X3.Year.Return | 0 | 1 | 4.66 | 2.52 | -13.80 | 4.05 | 5.1 | 6.10 | 9.40 | ▁▁▁▅▇ |
X5.Year.Return | 0 | 1 | 3.99 | 1.49 | -7.30 | 3.60 | 4.3 | 4.90 | 6.80 | ▁▁▁▅▇ |
Most data types are represented. There is no time variable so dates and the visualizations that go with time series are omitted.
Data Visualization
There are three primary visualizations that we might use for the combination of one qualitative and one quantitative indicator. I will deploy boxplots, violin plots, overlaid densities, and dotplots.
geom_boxplot()
This will construct a boxplot of quantitative y for each value of the qualitative variable placed on x. This is very hard to read because of the extrema.
%>%
Bonds ggplot(., aes(x = Risk, y = Assets)) + geom_boxplot()
geom_dotplot()
The number of bins, the axis they are placed on, and the size of the dots are core to dotplots. It is often simply a matter of trial and error.
%>%
Bonds ggplot(., aes(x = Risk, y = Assets, color = Risk)) + geom_dotplot(binaxis = "y",
bins = 50, dotsize = 0.3)
Warning in geom_dotplot(binaxis = "y", bins = 50, dotsize = 0.3): Ignoring
unknown parameters: `bins`
Bin width defaults to 1/30 of the range of the data. Pick better value with
`binwidth`.
Improved
%>%
Bonds ggplot(., aes(x = Risk, y = Assets, color = Risk)) + geom_dotplot(binaxis = "y",
bins = 50, dotsize = 0.3) + guides(color = FALSE) + theme_minimal()
Warning in geom_dotplot(binaxis = "y", bins = 50, dotsize = 0.3): Ignoring
unknown parameters: `bins`
Warning: The `<scale>` argument of `guides()` cannot be `FALSE`. Use "none" instead as
of ggplot2 3.3.4.
Bin width defaults to 1/30 of the range of the data. Pick better value with
`binwidth`.
geom_violin()
The most basic violin plot shows a two-sided density plot for each value of x. By default, all violins have the same area.
%>%
Bonds ggplot(., aes(x = Risk, y = Assets)) + geom_violin()
Adjusting the area: scale=count
We can adjust the violins to have area proportional to the count of observations by deploying the scale argument set equal to count.
%>%
Bonds ggplot(., aes(x = Risk, y = Assets)) + geom_violin(scale = "count")
Adjusting the area: scale=width
We can also adjust the violins to have equal width.
%>%
Bonds ggplot(., aes(x = Risk, y = Assets)) + geom_violin(scale = "width")
Adjusting the bandwidth
We can also make the violins smoother or more rigid with the adjust=
argument. Numbers greater than one make it smoother.
%>%
Bonds ggplot(., aes(x = Risk, y = Assets)) + geom_violin(scale = "count", adjust = 2)
Numbers less than one make it less smooth.
%>%
Bonds ggplot(., aes(x = Risk, y = Assets)) + geom_violin(scale = "count", adjust = 1/2)
geom_density()
with color/fill
We can color the lines of a density plot to try and showcase the various distributions.
%>%
Bonds ggplot(., aes(x = Assets, color = Risk)) + geom_density()
We fill the shape of density plot to try and showcase the various distributions.
%>%
Bonds ggplot(., aes(x = Assets, fill = Risk)) + geom_density()
This almost always needs lightening.
%>%
Bonds ggplot(., aes(x = Assets, fill = Risk)) + geom_density(alpha = 0.2)
In this case, it helps to remove the outline.
%>%
Bonds ggplot(., aes(x = Assets, fill = Risk)) + geom_density(alpha = 0.2, color = NA)
geom_beeswarm()
Related to the dotplot is the beeswarm. It requires installing a package with the geometry known as ggbeeswarm.
install.packages("ggbeeswarm")
library(ggbeeswarm)
%>%
Bonds ggplot() + aes(x = Risk, y = Assets) + geom_beeswarm()
library(ggbeeswarm)
%>%
Bonds ggplot() + aes(x = Risk, y = Assets, color = Risk) + geom_beeswarm() + guides(color = FALSE)