Gemini on Models

Author

Robert W. Walker

Published

January 14, 2026

Bond Funds Analysis: Complete Conversation Log

This document contains the complete analytical workflow generated during our session, formatted for use in Quarto.

1. Initial Exploration: Asset Distribution

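We began by examining the distribution of fund assets with a histogram (bins 100 million wide, scaled to a density). Two kernel density estimates are overlaid on it: one uses SciPy's default bandwidth and the other a bandwidth factor of 2.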

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde

# Load the dataset
df = pd.read_csv('./BondFunds.csv')
assets = df['Assets'].dropna()

# Define bin parameters
bin_size = 100
bins = np.arange(0, assets.max() + bin_size, bin_size)

# Create the plot
plt.figure(figsize=(12, 7))

# 1. Histogram, normalized to a density so it shares a y-scale with the KDE curves
plt.hist(assets, bins=bins, density=True, color='skyblue', edgecolor='black', alpha=0.5, label='Histogram (bin width = 100m)')

# Generate points for the KDE lines
x_eval = np.linspace(0, assets.max(), 1000)

# 2. Density with Default Bandwidth
kde_default = gaussian_kde(assets)
plt.plot(x_eval, kde_default(x_eval), color='blue', lw=2, label='Density (Default BW)')

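# 3. Density with bandwidth factor 2: SciPy uses a scalar bw_method as kde.factor,
#    so the kernel SD is 2 x the sample SD of assets (much wider than the default)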
kde_bw2 = gaussian_kde(assets, bw_method=2.0)
plt.plot(x_eval, kde_bw2(x_eval), color='red', lw=2, linestyle='--', label='Density (BW Factor = 2)')

plt.title('Histogram of Assets with Density Overlays')
plt.xlabel('Assets (in millions)')
plt.ylabel('Density')
plt.legend()
plt.tight_layout()
plt.show()
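This note explains what `bw_method=2.0` means. In SciPy's `gaussian_kde`, a scalar `bw_method` becomes the bandwidth factor (`kde.factor`). The kernel standard deviation is that factor times the sample standard deviation. The default factor follows Scott's rule, n^(-1/5), which is about 0.35 for n = 180. A factor of 2 is therefore not "twice the default bandwidth", which is what `adjust = 2` means in R's `density()`. It is a much wider kernel. The sketch below compares the two settings. It uses synthetic lognormal data as a stand-in for the Assets column, not the fund data, and the helper `kernel_sd` is hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic, right-skewed stand-in for the Assets column (illustration only)
rng = np.random.default_rng(0)
x = rng.lognormal(mean=5.5, sigma=1.0, size=180)

kde_default = gaussian_kde(x)                                    # Scott's rule: n ** (-1/5)
kde_factor2 = gaussian_kde(x, bw_method=2.0)                     # factor fixed at 2
kde_double = gaussian_kde(x, bw_method=2 * kde_default.factor)   # twice the default bandwidth


def kernel_sd(kde):
    """Kernel standard deviation for 1-D data: sqrt of the 1x1 kernel covariance."""
    return float(np.sqrt(kde.covariance[0, 0]))


sample_sd = x.std(ddof=1)
for name, kde in [("default", kde_default), ("bw_method=2.0", kde_factor2), ("2 x default", kde_double)]:
    print(f"{name:>14}: factor = {kde.factor:.3f}, kernel sd = {kernel_sd(kde):8.1f} "
          f"(sample sd = {sample_sd:.1f})")
```

If the intent was R-style `adjust = 2`, one option would be to pass `bw_method=2 * gaussian_kde(assets).factor` in the histogram code. The plotted `bw_method=2.0` curve is considerably smoother than that.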

2. Advanced Distribution Visualization: Boxenplots

To compare the distribution of assets across fund types, with particular attention to the tails, we used boxenplots (letter-value plots). These add successively narrower boxes for quantiles beyond the quartiles.
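The nested boxes of a boxenplot can be computed directly. The innermost box spans the fourths (25th to 75th percentiles). The next spans the eighths (12.5th to 87.5th), then the sixteenths, and so on, with the tail probability halving at each level. The sketch below computes those quantile pairs with NumPy on synthetic data. Three assumptions apply:

- `letter_values` is a hypothetical helper.
- The depth rule k = floor(log2 n) - 3 is Tukey's rule of thumb. Seaborn offers this rule as `k_depth="tukey"`.
- Seaborn's own implementation may choose the depth or interpolate quantiles differently.

```python
import numpy as np


def letter_values(x, k):
    """Return (p, lower, upper) for depths 1..k; depth d uses tail probability 2**-(d+1)."""
    x = np.asarray(x, dtype=float)
    rows = []
    for depth in range(1, k + 1):
        p = 0.5 ** (depth + 1)  # 0.25, 0.125, 0.0625, ...
        rows.append((p, np.quantile(x, p), np.quantile(x, 1 - p)))
    return rows


# Synthetic right-skewed sample (illustration only)
rng = np.random.default_rng(1)
sample = rng.lognormal(mean=5.5, sigma=1.0, size=180)

k = max(1, int(np.log2(len(sample))) - 3)  # Tukey's rule of thumb (assumption)
print("median:", round(float(np.median(sample)), 1))
for p, lo, hi in letter_values(sample, k):
    print(f"{p:.4f} to {1 - p:.4f}: [{lo:8.1f}, {hi:8.1f}]")
```

Each extra level reaches further into the tails, so a long right tail shows up as a stack of boxes on one side only.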

2.1 Standard Scale

import seaborn as sns

plt.figure(figsize=(10, 6))
sns.boxenplot(data=df, x='Type', y='Assets', hue='Type', palette='muted', legend=False)

plt.title('Boxenplot of Assets by Fund Type')
plt.xlabel('Fund Type')
plt.ylabel('Assets (in millions)')
plt.grid(axis='y', linestyle='--', alpha=0.3)
plt.tight_layout()
plt.show()

2.2 Logarithmic Scale

Because assets are strongly right-skewed, we redrew the boxenplot with a logarithmic y-axis.

plt.figure(figsize=(10, 6))
ax = sns.boxenplot(data=df, x='Type', y='Assets', hue='Type', palette='muted', legend=False)
ax.set_yscale("log")

plt.title('Boxenplot of Assets by Fund Type (Logarithmic Scale)')
plt.xlabel('Fund Type')
plt.ylabel('Assets (in millions, Log Scale)')
plt.grid(axis='y', which='both', linestyle='--', alpha=0.3)
plt.tight_layout()
plt.show()
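The case for the log axis rests on skew. One way to quantify it is to compare sample skewness before and after a log transform. The sketch below uses synthetic lognormal data, not the fund data, so the printed numbers illustrate the idea only. A log transform also needs strictly positive values, so the snippet counts any that would not survive it.

```python
import numpy as np
from scipy.stats import skew

# Synthetic right-skewed stand-in for fund assets (illustration only)
rng = np.random.default_rng(2)
assets_demo = rng.lognormal(mean=5.5, sigma=1.0, size=180)

n_nonpositive = int((assets_demo <= 0).sum())  # values a log transform cannot handle
raw_skew = skew(assets_demo)
log_skew = skew(np.log10(assets_demo))

print(f"non-positive values: {n_nonpositive}")
print(f"skewness, raw scale:   {raw_skew:.2f}")
print(f"skewness, log10 scale: {log_skew:.2f}")
```

On the real data, the same two `skew` calls applied to `df['Assets'].dropna()` would show how much of the asymmetry the log axis removes.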

3. Performance Analysis: 2009 Returns

We then shifted focus to the returns in 2009.

3.1 Boxenplot of Returns by Type

plt.figure(figsize=(10, 6))
sns.boxenplot(data=df, x='Type', y='Return 2009', hue='Type', palette='muted', legend=False)

plt.title('Boxenplot of 2009 Returns by Fund Type')
plt.xlabel('Fund Type')
plt.ylabel('Return in 2009 (%)')
plt.grid(axis='y', linestyle='--', alpha=0.3)
plt.tight_layout()
plt.show()
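A small numeric table can accompany the boxenplot. The minimal sketch below assumes a DataFrame shaped like `df`, with `Type` and `Return 2009` columns. The toy values are hypothetical and were chosen so the quartiles are easy to check by hand.

```python
import pandas as pd

# Hypothetical toy data in the same shape as df (not the bond-fund data)
toy = pd.DataFrame({
    "Type": ["A"] * 5 + ["B"] * 3,
    "Return 2009": [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 20.0, 30.0],
})

# Quartiles of 2009 returns within each fund type, one row per type
summary = (
    toy.groupby("Type")["Return 2009"]
    .quantile([0.25, 0.5, 0.75])
    .unstack()
)
print(summary)
```

On the real data, the same chain would start from `df.groupby('Type')['Return 2009']`.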

3.2 Violin Plot by Risk and Type

Next we brought Risk into the analysis, showing 2009 returns with split violin plots. Each risk level gets one violin, and each fund type occupies one half of it.

# Fix the display order of the Risk levels; labels not in risk_order become NaN
risk_order = ['Below average', 'Average', 'Above average']
if 'Risk' in df.columns:
    df['Risk'] = pd.Categorical(df['Risk'], categories=risk_order, ordered=True)

plt.figure(figsize=(12, 7))
sns.violinplot(data=df, x='Risk', y='Return 2009', hue='Type', split=True, inner="quart")

plt.title('2009 Returns by Risk Level and Fund Type')
plt.xlabel('Risk Level')
plt.ylabel('Return 2009 (%)')
plt.legend(title='Fund Type')
plt.grid(axis='y', linestyle='--', alpha=0.3)
plt.tight_layout()
plt.show()
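Making `Risk` an ordered categorical fixes the left-to-right order of the violins. The sketch below uses a hypothetical list of labels to show two behaviours worth knowing. First, sorting and comparisons follow `risk_order` rather than the alphabet. Second, any label not listed in `risk_order` (here the made-up `'High'`) silently becomes missing.

```python
import pandas as pd

risk_order = ["Below average", "Average", "Above average"]

# Hypothetical labels; 'High' is deliberately not one of the risk_order levels
raw = pd.Series(["Average", "Above average", "Below average", "High", "Average"])
risk = pd.Series(pd.Categorical(raw, categories=risk_order, ordered=True))

print(risk.sort_values().tolist())              # follows risk_order; the NaN sorts last
print([int((risk == r).sum()) for r in risk_order])  # counts per level, in risk_order
print(int(risk.isna().sum()), "label(s) outside risk_order became NaN")
```

Before plotting, `df['Risk'].isna().sum()` is a quick check that no real label was lost this way.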

4. Horizontal Raincloud Plots

We then redrew the comparison as horizontal “raincloud” plots. For every combination of risk level and fund type, each plot shows a half-density “cloud”, the jittered raw observations (“rain”), and a boxplot.

4.1 Bandwidth Factor = 0.5 (Smoother)

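import numpy as np
from scipy.stats import gaussian_kde

types = sorted(df['Type'].unique())
colors = sns.color_palette("Set2", len(types))
rng = np.random.default_rng(2009)  # seeded so the jitter is reproducible

# Shared evaluation grid covering every 2009 return, padded by 2 points on each side
x_range = np.linspace(df['Return 2009'].min() - 2, df['Return 2009'].max() + 2, 500)

fig, ax = plt.subplots(figsize=(12, 8))
type_offset = 0.4  # vertical gap between fund types within one risk level
labelled = set()   # fund types already added to the legend

for i, risk in enumerate(risk_order):
    for j, fund_type in enumerate(types):
        mask = (df['Risk'] == risk) & (df['Type'] == fund_type)
        data = df.loc[mask, 'Return 2009'].dropna()
        if len(data) < 2:  # a KDE needs at least two observations
            continue

        # KDE with bandwidth factor 0.5, rescaled so the cloud's peak height is 0.35
        density = gaussian_kde(data, bw_method=0.5)(x_range)
        y_kde = density / density.max() * 0.35
        baseline = i - (j * type_offset)
        label = fund_type if fund_type not in labelled else ""
        labelled.add(fund_type)

        # Plot Cloud: half-density above the baseline
        ax.fill_between(x_range, baseline, baseline + y_kde, color=colors[j], alpha=0.5, label=label)
        ax.plot(x_range, baseline + y_kde, color=colors[j], lw=1.5, alpha=0.8)
        # Plot Rain: jittered raw returns just below the baseline
        jitter = rng.uniform(-0.12, -0.05, size=len(data))
        ax.scatter(data, baseline + jitter, color=colors[j], s=12, alpha=0.4, edgecolors='none')
        # Plot Box: narrow horizontal boxplot beneath the rain, filled in the type's color
        ax.boxplot(data, positions=[baseline - 0.22], vert=False, widths=0.1, patch_artist=True,
                   showfliers=False, boxprops=dict(facecolor=colors[j], alpha=0.6),
                   medianprops=dict(color='black', lw=1.5))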
ax.set_yticks([i - type_offset * (len(types) - 1) / 2 for i in range(len(risk_order))])  # centre each label on its types
ax.set_yticklabels(risk_order)
ax.set_xlabel('Return 2009 (%)')
ax.set_title('Horizontal Raincloud Plot: 2009 Returns (BW=0.5)')
ax.legend(title="Fund Type", loc='upper right')
plt.tight_layout()
plt.show()

4.2 Bandwidth Factor = 0.25 (More Sensitive)

fig, ax = plt.subplots(figsize=(12, 8))
rng = np.random.default_rng(2009)  # seeded so the jitter is reproducible
x_range = np.linspace(df['Return 2009'].min() - 2, df['Return 2009'].max() + 2, 500)
labelled = set()  # fund types already added to the legend

for i, risk in enumerate(risk_order):
    for j, fund_type in enumerate(types):
        mask = (df['Risk'] == risk) & (df['Type'] == fund_type)
        data = df.loc[mask, 'Return 2009'].dropna()
        if len(data) < 2:  # a KDE needs at least two observations
            continue

        # KDE with bandwidth factor 0.25: narrower kernels, so more local detail
        density = gaussian_kde(data, bw_method=0.25)(x_range)
        y_kde = density / density.max() * 0.35
        baseline = i - (j * type_offset)
        label = fund_type if fund_type not in labelled else ""
        labelled.add(fund_type)

        ax.fill_between(x_range, baseline, baseline + y_kde, color=colors[j], alpha=0.5, label=label)
        ax.plot(x_range, baseline + y_kde, color=colors[j], lw=1.5, alpha=0.8)
        jitter = rng.uniform(-0.12, -0.05, size=len(data))
        ax.scatter(data, baseline + jitter, color=colors[j], s=12, alpha=0.4, edgecolors='none')
        ax.boxplot(data, positions=[baseline - 0.22], vert=False, widths=0.1, patch_artist=True,
                   showfliers=False, boxprops=dict(facecolor=colors[j], alpha=0.6),
                   medianprops=dict(color='black', lw=1.5))
ax.set_yticks([i - type_offset * (len(types) - 1) / 2 for i in range(len(risk_order))])  # centre each label on its types
ax.set_yticklabels(risk_order)
ax.set_xlabel('Return 2009 (%)')
ax.set_title('Horizontal Raincloud Plot: 2009 Returns (BW=0.25)')
ax.legend(title="Fund Type", loc='upper right')
plt.tight_layout()
plt.show()
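The 4.1 and 4.2 loops are identical apart from `bw_method`. One possible refactor moves a single raincloud into a helper and calls it once per (risk, type) group. `raincloud` below is a hypothetical helper, not a seaborn or matplotlib function. It keeps the geometry used above:

- the cloud is scaled to a peak height of 0.35 above the baseline;
- the rain is jittered 0.05 to 0.12 below it;
- the box is centred 0.22 below it.

The demo runs on two synthetic normal groups with the non-interactive Agg backend.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde


def raincloud(ax, data, baseline, color, x_grid, bw=0.5, height=0.35, rng=None, label=None):
    """Draw one horizontal raincloud at `baseline`; return the scaled cloud heights."""
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=float)
    density = gaussian_kde(data, bw_method=bw)(x_grid)
    cloud = density / density.max() * height  # tallest point of the cloud equals `height`
    ax.fill_between(x_grid, baseline, baseline + cloud, color=color, alpha=0.5, label=label)
    ax.plot(x_grid, baseline + cloud, color=color, lw=1.5, alpha=0.8)
    jitter = rng.uniform(-0.12, -0.05, size=data.size)  # rain sits just below the baseline
    ax.scatter(data, baseline + jitter, color=color, s=12, alpha=0.4, edgecolors="none")
    ax.boxplot(data, positions=[baseline - 0.22], vert=False, widths=0.1, patch_artist=True,
               showfliers=False, boxprops=dict(facecolor=color, alpha=0.6),
               medianprops=dict(color="black", lw=1.5))
    return cloud


# Demo on two hypothetical synthetic groups, comparing the two bandwidth factors
rng = np.random.default_rng(3)
groups = {"Group A": rng.normal(5, 2, 40), "Group B": rng.normal(8, 3, 40)}
x_grid = np.linspace(-5, 20, 500)

fig, axes = plt.subplots(1, 2, figsize=(12, 4), sharex=True)
for ax, bw in zip(axes, [0.5, 0.25]):
    for j, (name, values) in enumerate(groups.items()):
        raincloud(ax, values, baseline=-0.4 * j, color=f"C{j}", x_grid=x_grid,
                  bw=bw, rng=rng, label=name)
    ax.set_title(f"bw_method = {bw}")
axes[0].legend()
plt.close(fig)
```

With the helper, the comparison between bandwidth factors becomes a loop over `[0.5, 0.25]` rather than two copies of the plotting code.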

5. Scatterplot Analysis

We concluded by looking at relationships between variables.

5.1 Returns vs. Assets

plt.figure(figsize=(12, 7))
sns.scatterplot(data=df, x='Assets', y='Return 2009', hue='Type', alpha=0.7)

plt.title('Scatterplot of 2009 Returns by Assets')
plt.xlabel('Assets (in millions)')
plt.ylabel('Return 2009 (%)')
plt.grid(True, linestyle='--', alpha=0.5)
plt.legend(title='Fund Type')
plt.tight_layout()
plt.show()
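Assets are strongly right-skewed, so a handful of very large funds can dominate a Pearson correlation between assets and returns. A rank-based Spearman correlation is unchanged by any strictly increasing transform of assets, such as the log. The sketch below checks this on synthetic data, not the fund data. On the real data, the same calls would take `df['Assets']` and `df['Return 2009']` after dropping missing values.

```python
import numpy as np
from scipy.stats import pearsonr, spearmanr

# Synthetic skewed "assets" and a loosely related "return" (illustration only)
rng = np.random.default_rng(4)
assets_demo = rng.lognormal(mean=5.5, sigma=1.0, size=180)
returns_demo = 0.5 * np.log(assets_demo) + rng.normal(0, 1, size=180)

pearson_raw = pearsonr(assets_demo, returns_demo)[0]
pearson_log = pearsonr(np.log(assets_demo), returns_demo)[0]
spearman_raw = spearmanr(assets_demo, returns_demo)[0]
spearman_log = spearmanr(np.log(assets_demo), returns_demo)[0]

print(f"Pearson  raw: {pearson_raw:.3f}   log: {pearson_log:.3f}")
print(f"Spearman raw: {spearman_raw:.3f}   log: {spearman_log:.3f}  (same ranks, same value)")
```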

5.2 Returns vs. Expense Ratio (Custom Symbols)

Here marker shape encodes Fees: a checkmark where Fees is “Yes” and an X where it is “No”. Color encodes Risk.

# Define custom markers: Check for 'Yes', X for 'No'
markers = {"Yes": r'$\checkmark$', "No": "X"}

plt.figure(figsize=(12, 8))
sns.scatterplot(
    data=df,
    x='Expense Ratio',
    y='Return 2009',
    hue='Risk',
    style='Fees',
    markers=markers,
    s=150,
    alpha=0.8
)

plt.title('2009 Returns vs. Expense Ratio (Risk Color-coded, Fees by Symbol)')
plt.xlabel('Expense Ratio (%)')
plt.ylabel('Return 2009 (%)')
plt.grid(True, linestyle='--', alpha=0.5)
plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Risk / Fees")
plt.tight_layout()
plt.show()

5.3 Interactive Plotly Scatterplot

Finally, here is the code for an interactive Plotly version of the previous plot. The checkmark from the static plot is replaced here by Plotly's 'star' symbol.

import plotly.express as px

fig = px.scatter(
    df, 
    x='Expense Ratio', 
    y='Return 2009', 
    color='Risk', 
    symbol='Fees',
    symbol_map={'Yes': 'star', 'No': 'x'},
    category_orders={'Risk': risk_order},
    title='Interactive: 2009 Returns vs. Expense Ratio',
    labels={'Expense Ratio': 'Expense Ratio (%)', 'Return 2009': 'Return 2009 (%)'},
    hover_data=['Fund Number', 'Type', 'Assets'],
    template='plotly_white'
)
fig.show()
