This document contains the complete analytical workflow generated during our session, formatted for use in Quarto.
1. Initial Exploration: Asset Distribution
We began by examining the distribution of assets across the funds using a histogram with density overlays.
    import pandas as pd
    import matplotlib.pyplot as plt
    import numpy as np
    from scipy.stats import gaussian_kde

    # Load the dataset
    df = pd.read_csv('./BondFunds.csv')
    assets = df['Assets'].dropna()

    # Define bin parameters
    bin_size = 100
    bins = np.arange(0, assets.max() + bin_size, bin_size)

    # Create the plot
    plt.figure(figsize=(12, 7))

    # 1. Histogram
    plt.hist(assets, bins=bins, density=True, color='skyblue', edgecolor='black', alpha=0.5, label='Histogram (Bins=100m)')

    # Generate points for the KDE lines
    x_eval = np.linspace(0, assets.max(), 1000)

    # 2. Density with Default Bandwidth
    kde_default = gaussian_kde(assets)
    plt.plot(x_eval, kde_default(x_eval), color='blue', lw=2, label='Density (Default BW)')

    # 3. Density with Bandwidth = 2
    kde_bw2 = gaussian_kde(assets, bw_method=2.0)
    plt.plot(x_eval, kde_bw2(x_eval), color='red', lw=2, linestyle='--', label='Density (BW Factor = 2)')

    plt.title('Histogram of Assets with Density Overlays')
    plt.xlabel('Assets (in millions)')
    plt.ylabel('Density')
    plt.legend()
    plt.tight_layout()
    plt.show()
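The two density curves differ only in the `bw_method` argument. As a rough illustration of what that argument does, the sketch below uses synthetic lognormal values as a stand-in for the asset column (the real `BondFunds.csv` is not needed) and inspects the fitted `gaussian_kde` objects. With no `bw_method`, SciPy applies Scott's rule, which for one-dimensional data is `n ** (-1/5)`; a scalar `bw_method` is taken as the factor itself, and the kernel standard deviation is that factor times the sample standard deviation.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Synthetic, right-skewed stand-in for the asset values (an assumption, not the real data)
rng = np.random.default_rng(0)
sample = rng.lognormal(mean=5.0, sigma=1.0, size=200)

kde_default = gaussian_kde(sample)              # Scott's rule
kde_wide = gaussian_kde(sample, bw_method=2.0)  # scalar is used directly as the factor

# Scott's factor for 1-D data with n points
scott = len(sample) ** (-1 / 5)

# Kernel standard deviation = factor * sample standard deviation (ddof=1)
sample_sd = sample.std(ddof=1)
kernel_sd_default = float(np.sqrt(kde_default.covariance[0, 0]))
kernel_sd_wide = float(np.sqrt(kde_wide.covariance[0, 0]))

print("default factor:", kde_default.factor, "Scott:", scott)
print("wide factor:", kde_wide.factor)
print("kernel sd ratio (wide / sample sd):", kernel_sd_wide / sample_sd)
```

A factor of 2 is therefore several times wider than the default for a sample of this size, which is why the dashed curve looks much smoother than the solid one.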
2. Advanced Distribution Visualization: Boxenplots
To visualize the distribution of assets by fund type, specifically focusing on tail behavior, we used boxenplots.
2.1 Standard Scale
    import seaborn as sns

    plt.figure(figsize=(10, 6))
    # Assign `hue` and disable the legend: passing `palette` without `hue` is deprecated in seaborn
    sns.boxenplot(x='Type', y='Assets', hue='Type', data=df, palette='muted', legend=False)
    plt.title('Boxenplot of Assets by Fund Type')
    plt.xlabel('Fund Type')
    plt.ylabel('Assets (in millions)')
    plt.grid(axis='y', linestyle='--', alpha=0.3)
    plt.tight_layout()
    plt.show()
2.2 Logarithmic Scale
Recognizing the skew in the data, we reconstructed the boxenplot on a logarithmic scale.
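    plt.figure(figsize=(10, 6))
    # `ax` is the Axes returned by seaborn; the call mirrors the boxenplot in 2.1
    ax = sns.boxenplot(x='Type', y='Assets', hue='Type', data=df, palette='muted', legend=False)
    ax.set_yscale("log")
    plt.title('Boxenplot of Assets by Fund Type (Logarithmic Scale)')
    plt.xlabel('Fund Type')
    plt.ylabel('Assets (in millions, Log Scale)')
    plt.grid(axis='y', which='both', linestyle='--', alpha=0.3)
    plt.tight_layout()
    plt.show()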
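To see why a logarithmic axis helps with right-skewed values, the sketch below compares sample skewness before and after a base-10 log transform. The data are synthetic lognormal draws standing in for the asset column, so the numbers are illustrative only. A log axis does not change the data, but it spaces values the same way the transformed series would be spaced on a linear axis.

```python
import numpy as np
from scipy.stats import skew

# Synthetic right-skewed values standing in for fund assets (an assumption)
rng = np.random.default_rng(1)
assets_like = rng.lognormal(mean=5.0, sigma=1.2, size=500)

skew_raw = float(skew(assets_like))            # strongly positive for a long right tail
skew_log = float(skew(np.log10(assets_like)))  # close to zero once log-transformed

print("skewness on the raw scale:", round(skew_raw, 2))
print("skewness on the log10 scale:", round(skew_log, 2))
```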
3. Performance Analysis: 2009 Returns
We then shifted focus to the returns in 2009.
3.1 Boxenplot of Returns by Type
    plt.figure(figsize=(10, 6))
    # Assign `hue` and disable the legend: passing `palette` without `hue` is deprecated in seaborn
    sns.boxenplot(x='Type', y='Return 2009', hue='Type', data=df, palette='muted', legend=False)
    plt.title('Boxenplot of 2009 Returns by Fund Type')
    plt.xlabel('Fund Type')
    plt.ylabel('Return in 2009 (%)')
    plt.grid(axis='y', linestyle='--', alpha=0.3)
    plt.tight_layout()
    plt.show()
3.2 Violin Plot by Risk and Type
We introduced Risk into the analysis, visualizing returns with split violin plots.
    # Define categorical order for Risk
    risk_order = ['Below average', 'Average', 'Above average']

    # Ensure Risk column is categorical with order
    if 'Risk' in df.columns:
        unique_risks = df['Risk'].unique()
        # Filter risk_order to only include present categories
        present_risks = [r for r in risk_order if r in unique_risks]
        # Add any missing risks to the end if necessary, or just use present_risks
        # This logic handles potential data mismatch
        df['Risk'] = pd.Categorical(df['Risk'], categories=risk_order, ordered=True)

    plt.figure(figsize=(12, 7))
    sns.violinplot(data=df, x='Risk', y='Return 2009', hue='Type', split=True, inner="quart")
    plt.title('2009 Returns by Risk Level and Fund Type')
    plt.xlabel('Risk Level')
    plt.ylabel('Return 2009 (%)')
    plt.legend(title='Fund Type')
    plt.grid(axis='y', linestyle='--', alpha=0.3)
    plt.tight_layout()
    plt.show()
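The ordered categorical is what puts the risk levels in a meaningful order on the axis. The small sketch below, on made-up labels rather than the fund data, shows the two effects worth knowing about: sorting follows `risk_order` instead of the alphabet, and any label that is not in `categories` (here a hypothetical lower-case `'average'`) silently becomes missing.

```python
import pandas as pd

risk_order = ['Below average', 'Average', 'Above average']

# Hypothetical raw labels; the last one has a case mismatch on purpose
raw = pd.Series(['Average', 'Above average', 'Below average', 'average'])

risk = pd.Series(pd.Categorical(raw, categories=risk_order, ordered=True))

# Labels outside `categories` are converted to NaN
n_unmatched = int(risk.isna().sum())

# Sorting respects the declared order, not alphabetical order
sorted_labels = risk.dropna().sort_values().astype(str).tolist()

print(n_unmatched)
print(sorted_labels)
```

Checking `isna().sum()` right after the conversion is a cheap way to catch spelling or case mismatches before they drop rows from a plot.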
4. Horizontal Raincloud Plots
We refined the visualization into “Raincloud” plots to show density, raw data, and summary statistics simultaneously, oriented horizontally.
    # `ax` and `type_offset` come from the plotting code that precedes these lines
    ax.set_yticks([i - (type_offset/2) for i in range(len(risk_order))])
    ax.set_yticklabels(risk_order)
    ax.set_xlabel('Return 2009 (%)')
    ax.set_title('Horizontal Raincloud Plot: 2009 Returns (BW=0.5)')
    ax.legend(title="Fund Type", loc='upper right')
    plt.tight_layout()
    plt.show()
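The lines above only label and finish the figure. One way the three layers of a horizontal raincloud could be drawn is sketched below: a half-density "cloud" from `gaussian_kde`, jittered raw points as "rain", and a slim horizontal boxplot. This is a sketch under assumptions, not the session's original loop: the returns are synthetic, and the fund type labels, colors, `type_offset` value, and vertical spacings are all hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere; drop for interactive use
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(42)
risk_order = ['Below average', 'Average', 'Above average']
fund_types = ['Type A', 'Type B']                            # hypothetical labels
colors = {'Type A': 'tab:blue', 'Type B': 'tab:orange'}      # hypothetical colors
type_offset = 0.45                                           # hypothetical gap between the two types
bw = 0.5                                                     # KDE bandwidth factor, as in the title

# Synthetic 2009-style returns for each (risk, type) group
returns = {
    (risk, ftype): rng.normal(loc=5 + 2 * i, scale=3 + i, size=40)
    for i, risk in enumerate(risk_order)
    for ftype in fund_types
}

fig, ax = plt.subplots(figsize=(12, 7))
x_grid = np.linspace(-15, 30, 300)

for i, risk in enumerate(risk_order):
    for j, ftype in enumerate(fund_types):
        values = returns[(risk, ftype)]
        base = i - j * type_offset  # baseline of this group's cloud

        # Cloud: half-density drawn above the baseline, scaled to a fixed height
        density = gaussian_kde(values, bw_method=bw)(x_grid)
        cloud = 0.3 * density / density.max()
        ax.fill_between(x_grid, base, base + cloud, color=colors[ftype], alpha=0.4,
                        label=ftype if i == 0 else '_nolegend_')
        ax.plot(x_grid, base + cloud, color=colors[ftype], lw=1)

        # Rain: raw points with a small vertical jitter just below the baseline
        jitter = rng.uniform(-0.12, -0.04, size=values.size)
        ax.scatter(values, base + jitter, s=8, color=colors[ftype], alpha=0.6)

        # Summary: slim horizontal boxplot under the rain, outliers hidden
        ax.boxplot(values, positions=[base - 0.16], widths=0.05, vert=False,
                   showfliers=False, patch_artist=True, manage_ticks=False)

# Same finishing steps as in the text: one tick per risk level, centred between the two types
ax.set_yticks([i - (type_offset / 2) for i in range(len(risk_order))])
ax.set_yticklabels(risk_order)
ax.set_xlabel('Return 2009 (%)')
ax.set_title(f'Horizontal Raincloud Plot (synthetic data, BW={bw})')
ax.legend(title='Fund Type', loc='upper right')
fig.tight_layout()
```

Changing `bw` from 0.5 to 0.25 makes each cloud follow the individual points more closely, which is the comparison the two raincloud figures are meant to show.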
    # `ax` and `type_offset` come from the plotting code that precedes these lines
    ax.set_yticks([i - (type_offset/2) for i in range(len(risk_order))])
    ax.set_yticklabels(risk_order)
    ax.set_xlabel('Return 2009 (%)')
    ax.set_title('Horizontal Raincloud Plot: 2009 Returns (BW=0.25)')
    ax.legend(title="Fund Type", loc='upper right')
    plt.tight_layout()
    plt.show()
5. Scatterplot Analysis
We concluded by looking at relationships between variables.
5.1 Returns vs. Assets
    plt.figure(figsize=(12, 7))
    sns.scatterplot(data=df, x='Assets', y='Return 2009', hue='Type', alpha=0.7)
    plt.title('Scatterplot of 2009 Returns by Assets')
    plt.xlabel('Assets (in millions)')
    plt.ylabel('Return 2009 (%)')
    plt.grid(True, linestyle='--', alpha=0.5)
    plt.legend(title='Fund Type')
    plt.tight_layout()
    plt.show()
5.2 Returns vs. Expense Ratio (Custom Symbols)
Here we used custom markers to denote Fees (Checkmark vs X) and color for Risk.
    # Define custom markers: Check for 'Yes', X for 'No'
    markers = {"Yes": r'$\checkmark$', "No": "X"}

    plt.figure(figsize=(12, 8))
    sns.scatterplot(
        data=df,
        x='Expense Ratio',
        y='Return 2009',
        hue='Risk',
        style='Fees',
        markers=markers,
        s=150,
        alpha=0.8
    )
    plt.title('2009 Returns vs. Expense Ratio (Risk Color-coded, Fees by Symbol)')
    plt.xlabel('Expense Ratio (%)')
    plt.ylabel('Return 2009 (%)')
    plt.grid(True, linestyle='--', alpha=0.5)
    plt.legend(bbox_to_anchor=(1.05, 1), loc='upper left', title="Risk / Fees")
    plt.tight_layout()
    plt.show()
5.3 Interactive Plotly Scatterplot
Finally, here is the code to generate an interactive version of the previous plot using Plotly.