Visualizing Two Quantitative Variables

R
dataviz
Author

Robert W. Walker

Published

January 25, 2023

Bonds

A dataset for illustrating the various available visualizations needs a certain degree of richness with manageable size. The dataset on Bonds contains three categorical and a few quantitative indicators sufficient to show what we might wish.

Loading the Data

Bonds <- read.csv(url("https://raw.githubusercontent.com/robertwwalker/DADMStuff/master/BondFunds.csv"))

A Summary

library(skimr)
Bonds %>%
    skim()
Data summary
Name Piped data
Number of rows 184
Number of columns 9
_______________________
Column type frequency:
character 4
numeric 5
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
Fund.Number 0 1 4 6 0 184 0
Type 0 1 20 23 0 2 0
Fees 0 1 2 3 0 2 0
Risk 0 1 7 13 0 3 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
Assets 0 1 910.65 2253.27 12.40 113.72 268.4 621.95 18603.50 ▇▁▁▁▁
Expense.Ratio 0 1 0.71 0.26 0.12 0.53 0.7 0.90 1.94 ▂▇▅▁▁
Return.2009 0 1 7.16 6.09 -8.80 3.48 6.4 10.72 32.00 ▁▇▅▁▁
X3.Year.Return 0 1 4.66 2.52 -13.80 4.05 5.1 6.10 9.40 ▁▁▁▅▇
X5.Year.Return 0 1 3.99 1.49 -7.30 3.60 4.3 4.90 6.80 ▁▁▁▅▇

Most data types are represented. There is no time variable so dates and the visualizations that go with time series are omitted.

Data Visualization

The primary visualization that we might use for a combination of two quantities is the scatterplot: geom_point(). I will also make use of facet. Let me work with Assets and Returns, then with Risk.

geom_point()

Bonds %>%
    ggplot() + aes(x = Assets, y = Return.2009) + geom_point()

A little better: theming

I do not like the default theme.

Bonds %>%
    ggplot() + aes(x = Assets, y = Return.2009) + geom_point() + theme_minimal()

Fixing Overplotting

ggplot has an alternative geometry known as geom_jitter() that jitters – adds arbitrarily small amounts to the x and y coordinates.

Bonds %>%
    ggplot() + aes(x = Assets, y = Return.2009) + geom_jitter() + theme_minimal()

The two controls for the jitter are width:

width

Bonds %>%
    ggplot() + aes(x = Assets, y = Return.2009) + geom_jitter(width = 0.25) + theme_minimal()

Bonds %>%
    ggplot() + aes(x = Assets, y = Return.2009) + geom_jitter(height = 0.25) + theme_minimal()

and the height

height ## Some other elements

Since I am going to work with Risk, let’s try that in color.

Bonds %>%
    ggplot() + aes(x = Assets, y = Return.2009, color = Risk) + geom_point() + theme_minimal()

A bubble plot?

Now I will put Risk on x and Returns on y with the size of the bubble determined by Assets. First, without jitter.

Bonds %>%
    ggplot() + aes(x = Risk, y = Return.2009, size = Assets, color = Risk) + geom_point() +
    theme_minimal() + guides(color = "none")

Bonds %>%
    ggplot() + aes(x = Risk, y = Return.2009, color = Risk, size = Assets) + geom_jitter(width = 0.45) +
    theme_minimal() + guides(color = "none")

Deploying Facets

Bonds %>%
    ggplot() + aes(x = Assets, y = Return.2009, color = Risk, facet = Risk) + geom_point() +
    theme_minimal() + guides(color = "none") + facet_wrap(~Risk)