Gemini on Models

Author

Robert W. Walker

Published

January 14, 2026

Bond Funds Analysis: Complete Conversation Log

This document contains the complete analytical workflow generated during our session, formatted for use in Quarto.

1. Initial Exploration: Asset Distribution

We began by examining the distribution of assets across the funds using a histogram with density overlays.

How’s that done?
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde

# Load the dataset
df = pd.read_csv('./BondFunds.csv')
assets = df['Assets'].dropna()

# Define bin parameters
bin_size = 100
bins = np.arange(0, assets.max() + bin_size, bin_size)

# Create the plot
plt.figure(figsize=(12, 7))

# 1. Histogram
plt.hist(assets, bins=bins, density=True, color='skyblue', edgecolor='black', alpha=0.5, label='Histogram (Bins=100m)')

# Generate points for the KDE lines
x_eval = np.linspace(0, assets.max(), 1000)

# 2. Density with Default Bandwidth
kde_default = gaussian_kde(assets)
plt.plot(x_eval, kde_default(x_eval), color='blue', lw=2, label='Density (Default BW)')

# 3. Density with bandwidth factor = 2 (a scalar bw_method multiplies the sample standard deviation)
kde_bw2 = gaussian_kde(assets, bw_method=2.0)
plt.plot(x_eval, kde_bw2(x_eval), color='red', lw=2, linestyle='--', label='Density (BW Factor = 2)')

plt.title('Histogram of Assets with Density Overlays')
plt.xlabel('Assets (in millions)')
plt.ylabel('Density')
plt.legend()
plt.tight_layout()
plt.show()
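
A scalar `bw_method` in `gaussian_kde` is a multiplier on the sample standard deviation, not a bandwidth in the units of the data. The sketch below illustrates this on synthetic, right-skewed values; the lognormal sample and its parameters are assumptions standing in for the Assets column, not the real data.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic, right-skewed stand-in for the Assets column (an assumption, not the real data)
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=5.0, sigma=1.0, size=200)

kde_default = gaussian_kde(sample)               # Scott's rule: factor = n ** (-1/5) in 1-D
kde_wide = gaussian_kde(sample, bw_method=2.0)   # a scalar is used directly as the factor

# Kernel standard deviation = factor * sample standard deviation
sample_sd = sample.std(ddof=1)
bw_default = kde_default.factor * sample_sd
bw_wide = kde_wide.factor * sample_sd

print(f"default factor: {kde_default.factor:.3f} -> kernel sd {bw_default:.1f}")
print(f"factor 2.0:     {kde_wide.factor:.3f} -> kernel sd {bw_wide:.1f}")
```

With a few hundred observations the default factor is well below 1, so a factor of 2 produces a much smoother curve than the default.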

2. Advanced Distribution Visualization: Boxenplots

To visualize the distribution of assets by fund type, with particular attention to the tails, we used boxenplots (letter-value plots), which show more quantiles than a standard boxplot.

2.1 Standard Scale

How’s that done?
import seaborn as sns

plt.figure(figsize=(10, 6))
sns.boxenplot(x='Type', y='Assets', hue='Type', data=df, palette='muted', legend=False)

plt.title('Boxenplot of Assets by Fund Type')
plt.xlabel('Fund Type')
plt.ylabel('Assets (in millions)')
plt.grid(axis='y', linestyle='--', alpha=0.3)
plt.tight_layout()
plt.show()

2.2 Logarithmic Scale

Recognizing the skew in the data, we reconstructed the boxenplot on a logarithmic scale.

How’s that done?
plt.figure(figsize=(10, 6))
# Assign hue and disable the legend, as seaborn's FutureWarning recommends for palette use
ax = sns.boxenplot(x='Type', y='Assets', hue='Type', data=df, palette='muted', legend=False)
ax.set_yscale("log")

plt.title('Boxenplot of Assets by Fund Type (Logarithmic Scale)')
plt.xlabel('Fund Type')
plt.ylabel('Assets (in millions, Log Scale)')
plt.grid(axis='y', which='both', linestyle='--', alpha=0.3)
plt.tight_layout()
plt.show()
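
As a rough numeric companion to the log-scale plot, the sketch below compares sample skewness before and after a log transform. It uses a synthetic lognormal sample as a stand-in for Assets (an assumption; the real column may behave differently), and it checks for non-positive values, which a log axis cannot display.

```python
import numpy as np
from scipy.stats import skew

# Synthetic, right-skewed stand-in for Assets (an assumption, not the real data)
rng = np.random.default_rng(1)
assets_like = rng.lognormal(mean=5.0, sigma=1.0, size=500)

raw_skew = skew(assets_like)
log_skew = skew(np.log10(assets_like))

# A log axis cannot show zero or negative values, so count them first
n_nonpositive = int((assets_like <= 0).sum())

print(f"skewness on the raw scale: {raw_skew:.2f}")
print(f"skewness on the log scale: {log_skew:.2f}")
print(f"non-positive values: {n_nonpositive}")
```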

3. Performance Analysis: 2009 Returns

We then shifted focus to the returns in 2009.

3.1 Boxenplot of Returns by Type

How’s that done?
plt.figure(figsize=(10, 6))
sns.boxenplot(x='Type', y='Return 2009', hue='Type', data=df, palette='muted', legend=False)

plt.title('Boxenplot of 2009 Returns by Fund Type')
plt.xlabel('Fund Type')
plt.ylabel('Return in 2009 (%)')
plt.grid(axis='y', linestyle='--', alpha=0.3)
plt.tight_layout()
plt.show()

3.2 Violin Plot by Risk and Type

We introduced Risk into the analysis, visualizing returns with split violin plots.

How’s that done?
# Define the categorical order for Risk
risk_order = ['Below average', 'Average', 'Above average']
# Make Risk an ordered categorical so plots follow risk_order;
# any label not listed in risk_order becomes NaN
if 'Risk' in df.columns:
    df['Risk'] = pd.Categorical(df['Risk'], categories=risk_order, ordered=True)

plt.figure(figsize=(12, 7))
sns.violinplot(data=df, x='Risk', y='Return 2009', hue='Type', split=True, inner="quart")

plt.title('2009 Returns by Risk Level and Fund Type')
plt.xlabel('Risk Level')
plt.ylabel('Return 2009 (%)')
plt.legend(title='Fund Type')
plt.grid(axis='y', linestyle='--', alpha=0.3)
plt.tight_layout()
plt.show()
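
Converting Risk to an ordered categorical silently turns any label that is not in `risk_order` into NaN, so it is worth checking for mismatches before plotting. The sketch below uses hypothetical labels, including a deliberately mismatched one, to show one way to find and normalize them; the real column may need different cleaning.

```python
import pandas as pd

risk_order = ['Below average', 'Average', 'Above average']

# Hypothetical labels; 'average ' (lowercase, trailing space) is a deliberate mismatch
raw = pd.Series(['Average', 'Below average', 'average ', 'Above average'])

risk = pd.Series(pd.Categorical(raw, categories=risk_order, ordered=True))

# Labels that were present in the raw data but dropped to NaN by the conversion
unmatched = raw[risk.isna() & raw.notna()]
print("unmatched labels:", unmatched.tolist())

# One possible normalization: trim whitespace and capitalize only the first letter
cleaned = raw.str.strip().str.capitalize()
risk_clean = pd.Series(pd.Categorical(cleaned, categories=risk_order, ordered=True))
print("missing after cleaning:", int(risk_clean.isna().sum()))
```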

4. Horizontal Raincloud Plots

We refined the visualization into “Raincloud” plots to show density, raw data, and summary statistics simultaneously, oriented horizontally.

4.1 Bandwidth = 0.5 (Smoother)

How’s that done?
import numpy as np
from scipy.stats import gaussian_kde

types = sorted(df['Type'].unique())
colors = sns.color_palette("Set2", len(types))

fig, ax = plt.subplots(figsize=(12, 8))
type_offset = 0.4

for i, risk in enumerate(risk_order):
    for j, fund_type in enumerate(types):
        mask = (df['Risk'] == risk) & (df['Type'] == fund_type)
        if not mask.any(): continue
        data = df[mask]['Return 2009'].dropna()
        if len(data) < 2:
            continue
            
        # KDE with bandwidth factor 0.5
        kde = gaussian_kde(data, bw_method=0.5)
        x_range = np.linspace(df['Return 2009'].min() - 2, df['Return 2009'].max() + 2, 500)
        y_kde = (kde(x_range) / kde(x_range).max()) * 0.35
        baseline = i - (j * type_offset)
        
        # Plot Cloud
        ax.fill_between(x_range, baseline, baseline + y_kde, color=colors[j], alpha=0.5, label=fund_type if i == 0 else "")
        ax.plot(x_range, baseline + y_kde, color=colors[j], lw=1.5, alpha=0.8)
        # Plot Rain
        jitter = np.random.uniform(-0.12, -0.05, size=len(data))
        ax.scatter(data, baseline + jitter, color=colors[j], s=12, alpha=0.4, edgecolors='none')
        # Plot Box
        ax.boxplot(data, positions=[baseline - 0.22], vert=False, widths=0.1, patch_artist=True, showfliers=False, medianprops=dict(color='black', lw=1.5))
ax.set_yticks([i - (type_offset/2) for i in range(len(risk_order))])
ax.set_yticklabels(risk_order)
ax.set_xlabel('Return 2009 (%)')
ax.set_title('Horizontal Raincloud Plot: 2009 Returns (BW=0.5)')
ax.legend(title="Fund Type", loc='upper right')
plt.tight_layout()
plt.show()

4.2 Bandwidth = 0.25 (More Sensitive)

How’s that done?
fig, ax = plt.subplots(figsize=(12, 8))

for i, risk in enumerate(risk_order):
    for j, fund_type in enumerate(types):
        mask = (df['Risk'] == risk) & (df['Type'] == fund_type)
        if not mask.any(): continue
        data = df[mask]['Return 2009'].dropna()
        if len(data) < 2:
            continue
            
        # KDE with bandwidth factor 0.25
        kde = gaussian_kde(data, bw_method=0.25)
        x_range = np.linspace(df['Return 2009'].min() - 2, df['Return 2009'].max() + 2, 500)
        y_kde = (kde(x_range) / kde(x_range).max()) * 0.35
        baseline = i - (j * type_offset)
        
        ax.fill_between(x_range, baseline, baseline + y_kde, color=colors[j], alpha=0.5, label=fund_type if i == 0 else "")
        ax.plot(x_range, baseline + y_kde, color=colors[j], lw=1.5, alpha=0.8)
        jitter = np.random.uniform(-0.12, -0.05, size=len(data))
        ax.scatter(data, baseline + jitter, color=colors[j], s=12, alpha=0.4, edgecolors='none')
        ax.boxplot(data, positions=[baseline - 0.22], vert=False, widths=0.1, patch_artist=True, showfliers=False, medianprops=dict(color='black', lw=1.5))
ax.set_yticks([i - (type_offset/2) for i in range(len(risk_order))])
ax.set_yticklabels(risk_order)
ax.set_xlabel('Return 2009 (%)')
ax.set_title('Horizontal Raincloud Plot: 2009 Returns (BW=0.25)')
ax.legend(title="Fund Type", loc='upper right')
plt.tight_layout()
plt.show()
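
The loops in 4.1 and 4.2 differ only in the bandwidth factor, so the drawing steps could be collected into one helper. The sketch below is one possible refactoring, not the code used in the session: `draw_raincloud` and its parameters are hypothetical names, and the data are synthetic. A seeded generator is passed in so the jitter is reproducible.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs headless
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde


def draw_raincloud(ax, values, baseline, color, bw, x_range, rng, label=None):
    """Draw one cloud (KDE), rain (jittered points), and box along a horizontal axis."""
    values = np.asarray(values, dtype=float)
    density = gaussian_kde(values, bw_method=bw)(x_range)
    cloud = density / density.max() * 0.35  # scale the peak height to 0.35
    # Cloud: filled density above the baseline
    ax.fill_between(x_range, baseline, baseline + cloud, color=color, alpha=0.5, label=label)
    ax.plot(x_range, baseline + cloud, color=color, lw=1.5, alpha=0.8)
    # Rain: raw points jittered just below the baseline
    jitter = rng.uniform(-0.12, -0.05, size=len(values))
    ax.scatter(values, baseline + jitter, color=color, s=12, alpha=0.4, edgecolors='none')
    # Box: narrow horizontal boxplot below the rain
    ax.boxplot(values, positions=[baseline - 0.22], vert=False, widths=0.1,
               patch_artist=True, showfliers=False,
               medianprops=dict(color='black', lw=1.5))


# Synthetic values standing in for one Risk/Type group (an assumption, not the real data)
rng = np.random.default_rng(42)
demo = rng.normal(loc=7.0, scale=4.0, size=80)
x_range = np.linspace(demo.min() - 2, demo.max() + 2, 500)

fig, axes = plt.subplots(1, 2, figsize=(12, 4), sharex=True)
for ax, bw in zip(axes, (0.5, 0.25)):
    draw_raincloud(ax, demo, baseline=0.0, color='tab:green', bw=bw, x_range=x_range, rng=rng)
    ax.set_yticks([])
    ax.set_title(f'BW factor = {bw}')
    ax.set_xlabel('Synthetic values')
fig.tight_layout()
```

In the real workflow the helper would be called inside the nested loops over `risk_order` and `types`, with `baseline = i - (j * type_offset)`; call `plt.show()` or `fig.savefig(...)` to view the result.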

5. Scatterplot Analysis

We concluded by looking at relationships between variables.

5.1 Returns vs. Assets

How’s that done?
plt.figure(figsize=(12, 7))
sns.scatterplot(data=df, x='Assets', y='Return 2009', hue='Type', alpha=0.7)

plt.title('Scatterplot of 2009 Returns by Assets')
plt.xlabel('Assets (in millions)')
plt.ylabel('Return 2009 (%)')
plt.grid(True, linestyle='--', alpha=0.5)
plt.legend(title='Fund Type')
plt.tight_layout()
plt.show()

5.2 Returns vs. Expense Ratio (Custom Symbols)

Here we used custom markers for Fees (a checkmark for 'Yes', an X for 'No') and color for Risk.

How’s that done?
# Define custom markers: Check for 'Yes', X for 'No'
markers = {"Yes": r'$\checkmark$', "No": "X"}

plt.figure(figsize=(12, 8))
sns.scatterplot(
    data=df,
    x='Expense Ratio',
    y='Return 2009',
    hue='Risk',
    style='Fees',
    markers=markers,
    s=150,
    alpha=0.8
)

plt.title('2009 Returns vs. Expense Ratio (Risk Color-coded, Fees by Symbol)')
plt.xlabel('Expense Ratio (%)')
plt.ylabel('Return 2009 (%)')
plt.grid(True, linestyle='--', alpha=0.5)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Risk / Fees")
plt.tight_layout()
plt.show()

5.3 Interactive Plotly Scatterplot

Finally, here is the code to generate an interactive version of the previous plot using Plotly, with a star in place of the checkmark for funds that charge fees.

How’s that done?
import plotly.express as px

fig = px.scatter(
    df, 
    x='Expense Ratio', 
    y='Return 2009', 
    color='Risk', 
    symbol='Fees',
    symbol_map={'Yes': 'star', 'No': 'x'},
    category_orders={'Risk': risk_order},
    title='Interactive: 2009 Returns vs. Expense Ratio',
    labels={'Expense Ratio': 'Expense Ratio (%)', 'Return 2009': 'Return 2009 (%)'},
    hover_data=['Fund Number', 'Type', 'Assets'],
    template='plotly_white'
)
fig.show()
