Importing Excel Data

R
Author

Robert W. Walker

Published

September 13, 2020

How To Import a Microsoft Excel File

The go to tool comes from the readxl library in R. We can install it with:

install.packages("readxl")

To use it, the Markdown must call it – make it active – just as we must at the command line to make it work. The Files pane will make this easier, we can right click to import and get code from the subsequent interaction.

library(readxl)

The command to read Excel files comes in three forms:
- read_excel()
- read_xls()
- read_xlsx()

where the first works for all file types while the second is written specifically for older .xls files and the third is written for newer .xlsx file types. If we type ?read_excel we can obtain the help file that guides us through a host of situations including specifying a range of cells, whether or not to use the first row as column names, the data type in a column, what is missing [NA], whether rows should be skipped, and a host of others. The one thing that is required is the file that we wish to import [known to this command as the path]. We can acquire the file from the following link. If my file is known to my computer as /home/rob/Downloads/BondFunds.xlsx then I also want to be careful about the sheet, in this case, it is JustData:

library(tidyverse)
library(readxl)
Bonds <- read_excel(path="~/Downloads/BondFunds.xlsx", sheet="JustData")
Bonds
# A tibble: 184 × 9
   `Fund Number` Type         Assets Fees  Expen…¹ Retur…² 3-Yea…³ 5-Yea…⁴ Risk 
   <chr>         <chr>         <dbl> <chr>   <dbl>   <dbl>   <dbl>   <dbl> <chr>
 1 FN-1          Intermediat…  7268. No       0.45     6.9     6.9     5.5 Belo…
 2 FN-2          Intermediat…   475. No       0.5      9.8     7.5     6.1 Belo…
 3 FN-3          Intermediat…   193  No       0.71     6.3     7       5.6 Aver…
 4 FN-4          Intermediat… 18604. No       0.13     5.4     6.6     5.5 Aver…
 5 FN-5          Intermediat…   143. No       0.6      5.9     6.7     5.4 Aver…
 6 FN-6          Intermediat…  1402. No       0.54     5.7     6.4     6.2 Aver…
 7 FN-7          Intermediat…   986. No       0.49     3       6.8     5.3 Aver…
 8 FN-8          Intermediat…  2189. No       0.55     7.4     6.4     5.2 Belo…
 9 FN-9          Intermediat…   391. No       0.67     5.3     6.1     5   Belo…
10 FN-10         Intermediat…   544. No       0.63     5.7     6.2     5.1 Belo…
# … with 174 more rows, and abbreviated variable names ¹​`Expense Ratio`,
#   ²​`Return 2009`, ³​`3-Year Return`, ⁴​`5-Year Return`

That works just as planned. If I leave the specification of the sheet out, it will load the first sheet.

FirstSheet <- read_excel(path="~/Downloads/BondFunds.xlsx")
New names:
• `` -> `...1`
• `` -> `...2`
• `` -> `...3`
• `` -> `...4`
• `` -> `...5`
• `` -> `...6`
• `` -> `...7`
• `` -> `...8`
FirstSheet
# A tibble: 18 × 9
   ...1                    ...2      ...3  ...4  ...5  ...6  ...7  ...8  mosai…¹
   <chr>                   <chr>     <chr> <chr> <chr> <lgl> <lgl> <lgl> <lgl>  
 1 <NA>                    <NA>      <NA>  <NA>  <NA>  NA    NA    NA    NA     
 2 Fees and Bond Type      Fees      <NA>  <NA>  <NA>  NA    NA    NA    NA     
 3 Type                    No        Yes   Gran… <NA>  NA    NA    NA    NA     
 4 Intermediate Government 0.288043… 0.18… 0.47… <NA>  NA    NA    NA    NA     
 5 Short Term Corporate    0.418478… 0.10… 0.52… <NA>  NA    NA    NA    NA     
 6 Grand Total             0.706521… 0.29… 1     <NA>  NA    NA    NA    NA     
 7 <NA>                    <NA>      <NA>  <NA>  <NA>  NA    NA    NA    NA     
 8 <NA>                    <NA>      <NA>  <NA>  <NA>  NA    NA    NA    NA     
 9 <NA>                    <NA>      <NA>  <NA>  <NA>  NA    NA    NA    NA     
10 <NA>                    <NA>      <NA>  <NA>  <NA>  NA    NA    NA    NA     
11 <NA>                    <NA>      <NA>  <NA>  <NA>  NA    NA    NA    NA     
12 <NA>                    <NA>      <NA>  <NA>  <NA>  NA    NA    NA    NA     
13 <NA>                    <NA>      <NA>  <NA>  <NA>  NA    NA    NA    NA     
14 Risk and Bond Type      Risk      <NA>  <NA>  <NA>  NA    NA    NA    NA     
15 Type                    Above av… Aver… Belo… Gran… NA    NA    NA    NA     
16 Intermediate Government 29        32    26    87    NA    NA    NA    NA     
17 Short Term Corporate    30        37    30    97    NA    NA    NA    NA     
18 Grand Total             59        69    56    184   NA    NA    NA    NA     
# … with abbreviated variable name ¹​`mosaic(xtabs(~Type+Fees, data=Bonds))`

That is not well-formatted because the first sheet is a table of sorts. A little bit of Rvangelism.

A look at the magic of R.

I am going to use R’s ability to define variable(s) to make my life easier. I do not wish to know what the tabs are; let R figure it out.

AllSheets <- function(path) { # Feed a path to an Excel file
  WorkBook <- excel_sheets(path=path) %>% # Get a vector of sheet names
    map(~ read_excel(path=path, sheet=.x)) # Load each sheet by name
  names(WorkBook) <- excel_sheets(path=path) # Rename the sheets by their names
  return(WorkBook)
}
MyWB <- AllSheets(path="~/Downloads/BondFunds.xlsx")
New names:
New names:
• `` -> `...1`
• `` -> `...2`
• `` -> `...3`
• `` -> `...4`
• `` -> `...5`
• `` -> `...6`
• `` -> `...7`
• `` -> `...8`
MyWB
$ContingencyTable
# A tibble: 18 × 9
   ...1                    ...2      ...3  ...4  ...5  ...6  ...7  ...8  mosai…¹
   <chr>                   <chr>     <chr> <chr> <chr> <lgl> <lgl> <lgl> <lgl>  
 1 <NA>                    <NA>      <NA>  <NA>  <NA>  NA    NA    NA    NA     
 2 Fees and Bond Type      Fees      <NA>  <NA>  <NA>  NA    NA    NA    NA     
 3 Type                    No        Yes   Gran… <NA>  NA    NA    NA    NA     
 4 Intermediate Government 0.288043… 0.18… 0.47… <NA>  NA    NA    NA    NA     
 5 Short Term Corporate    0.418478… 0.10… 0.52… <NA>  NA    NA    NA    NA     
 6 Grand Total             0.706521… 0.29… 1     <NA>  NA    NA    NA    NA     
 7 <NA>                    <NA>      <NA>  <NA>  <NA>  NA    NA    NA    NA     
 8 <NA>                    <NA>      <NA>  <NA>  <NA>  NA    NA    NA    NA     
 9 <NA>                    <NA>      <NA>  <NA>  <NA>  NA    NA    NA    NA     
10 <NA>                    <NA>      <NA>  <NA>  <NA>  NA    NA    NA    NA     
11 <NA>                    <NA>      <NA>  <NA>  <NA>  NA    NA    NA    NA     
12 <NA>                    <NA>      <NA>  <NA>  <NA>  NA    NA    NA    NA     
13 <NA>                    <NA>      <NA>  <NA>  <NA>  NA    NA    NA    NA     
14 Risk and Bond Type      Risk      <NA>  <NA>  <NA>  NA    NA    NA    NA     
15 Type                    Above av… Aver… Belo… Gran… NA    NA    NA    NA     
16 Intermediate Government 29        32    26    87    NA    NA    NA    NA     
17 Short Term Corporate    30        37    30    97    NA    NA    NA    NA     
18 Grand Total             59        69    56    184   NA    NA    NA    NA     
# … with abbreviated variable name ¹​`mosaic(xtabs(~Type+Fees, data=Bonds))`

$DATA
# A tibble: 184 × 11
   `Fund Number` Type   Assets Fees  Expen…¹ Retur…² 3-Yea…³ 5-Yea…⁴ Risk  ...10
   <chr>         <chr>   <dbl> <chr>   <dbl>   <dbl>   <dbl>   <dbl> <chr> <lgl>
 1 FN-1          Inter…  7268. No       0.45     6.9     6.9     5.5 Belo… NA   
 2 FN-2          Inter…   475. No       0.5      9.8     7.5     6.1 Belo… NA   
 3 FN-3          Inter…   193  No       0.71     6.3     7       5.6 Aver… NA   
 4 FN-4          Inter… 18604. No       0.13     5.4     6.6     5.5 Aver… NA   
 5 FN-5          Inter…   143. No       0.6      5.9     6.7     5.4 Aver… NA   
 6 FN-6          Inter…  1402. No       0.54     5.7     6.4     6.2 Aver… NA   
 7 FN-7          Inter…   986. No       0.49     3       6.8     5.3 Aver… NA   
 8 FN-8          Inter…  2189. No       0.55     7.4     6.4     5.2 Belo… NA   
 9 FN-9          Inter…   391. No       0.67     5.3     6.1     5   Belo… NA   
10 FN-10         Inter…   544. No       0.63     5.7     6.2     5.1 Belo… NA   
# … with 174 more rows, 1 more variable: `par(mfrow=c(2,2))` <chr>, and
#   abbreviated variable names ¹​`Expense Ratio`, ²​`Return 2009`,
#   ³​`3-Year Return`, ⁴​`5-Year Return`

$IGDATA
# A tibble: 87 × 11
   `Fund Number` Type  Assets Fees  Expen…¹ Retur…² 3-Yea…³ 5-Yea…⁴ Risk    Bins
   <chr>         <chr>  <dbl> <chr>   <dbl>   <dbl>   <dbl>   <dbl> <chr>  <dbl>
 1 FN-1          Inte…  7268. No       0.45     6.9     6.9     5.5 Belo… -10.0 
 2 FN-2          Inte…   475. No       0.5      9.8     7.5     6.1 Belo…  -5.01
 3 FN-3          Inte…   193  No       0.71     6.3     7       5.6 Aver…  -0.01
 4 FN-4          Inte… 18604. No       0.13     5.4     6.6     5.5 Aver…   4.99
 5 FN-5          Inte…   143. No       0.6      5.9     6.7     5.4 Aver…   9.99
 6 FN-6          Inte…  1402. No       0.54     5.7     6.4     6.2 Aver…  15.0 
 7 FN-7          Inte…   986. No       0.49     3       6.8     5.3 Aver…  20.0 
 8 FN-8          Inte…  2189. No       0.55     7.4     6.4     5.2 Belo…  25.0 
 9 FN-9          Inte…   391. No       0.67     5.3     6.1     5   Belo…  30.0 
10 FN-10         Inte…   544. No       0.63     5.7     6.2     5.1 Belo…  35.0 
# … with 77 more rows, 1 more variable: Midpoints <dbl>, and abbreviated
#   variable names ¹​`Expense Ratio`, ²​`Return 2009`, ³​`3-Year Return`,
#   ⁴​`5-Year Return`

$STCDATA
# A tibble: 97 × 11
   `Fund Number` Type  Assets Fees  Expen…¹ Retur…² 3-Yea…³ 5-Yea…⁴ Risk    Bins
   <chr>         <chr>  <dbl> <chr>   <dbl>   <dbl>   <dbl>   <dbl> <chr>  <dbl>
 1 FN-88         Shor…  139.  No       0.51     5.5     5.1     4.3 Belo… -10.0 
 2 FN-89         Shor…  124.  No       0.32     5       4.4     4.1 Belo…  -5.01
 3 FN-90         Shor… 1922   Yes      1.08    12.1     5.5     5   Aver…  -0.01
 4 FN-91         Shor…  203.  Yes      1        8.3     5.2     4.4 Belo…   4.99
 5 FN-92         Shor…   66.1 No       0.71     6.8     4.9     4   Belo…   9.99
 6 FN-93         Shor… 1346   No       0.65     8.6     6.1     4.8 Belo…  15.0 
 7 FN-94         Shor… 4773.  No       0.56     5       4.9     4.2 Belo…  20.0 
 8 FN-95         Shor…   77.5 No       0.51     2.2     3.7     3.5 Belo…  25.0 
 9 FN-96         Shor…   76.2 No       0.68     2.5     5.2     4.2 Belo…  30.0 
10 FN-97         Shor…  146.  No       0.55    12.2     5.8     4.8 Aver…  35.0 
# … with 87 more rows, 1 more variable: Midpoints <dbl>, and abbreviated
#   variable names ¹​`Expense Ratio`, ²​`Return 2009`, ³​`3-Year Return`,
#   ⁴​`5-Year Return`

$RETURN2009
# A tibble: 97 × 2
   `Intermediate Government` `Short Term Corporate`
                       <dbl>                  <dbl>
 1                       6.9                    5.5
 2                       9.8                    5  
 3                       6.3                   12.1
 4                       5.4                    8.3
 5                       5.9                    6.8
 6                       5.7                    8.6
 7                       3                      5  
 8                       7.4                    2.2
 9                       5.3                    2.5
10                       5.7                   12.2
# … with 87 more rows

$JustData
# A tibble: 184 × 9
   `Fund Number` Type         Assets Fees  Expen…¹ Retur…² 3-Yea…³ 5-Yea…⁴ Risk 
   <chr>         <chr>         <dbl> <chr>   <dbl>   <dbl>   <dbl>   <dbl> <chr>
 1 FN-1          Intermediat…  7268. No       0.45     6.9     6.9     5.5 Belo…
 2 FN-2          Intermediat…   475. No       0.5      9.8     7.5     6.1 Belo…
 3 FN-3          Intermediat…   193  No       0.71     6.3     7       5.6 Aver…
 4 FN-4          Intermediat… 18604. No       0.13     5.4     6.6     5.5 Aver…
 5 FN-5          Intermediat…   143. No       0.6      5.9     6.7     5.4 Aver…
 6 FN-6          Intermediat…  1402. No       0.54     5.7     6.4     6.2 Aver…
 7 FN-7          Intermediat…   986. No       0.49     3       6.8     5.3 Aver…
 8 FN-8          Intermediat…  2189. No       0.55     7.4     6.4     5.2 Belo…
 9 FN-9          Intermediat…   391. No       0.67     5.3     6.1     5   Belo…
10 FN-10         Intermediat…   544. No       0.63     5.7     6.2     5.1 Belo…
# … with 174 more rows, and abbreviated variable names ¹​`Expense Ratio`,
#   ²​`Return 2009`, ³​`3-Year Return`, ⁴​`5-Year Return`