timetk is really, really handy

R
time_series
tidy
Author

Robert W. Walker

Published

March 24, 2021

timetk

The timetk (time toolkit) package for R provides a glorious complement to the tsibble structure adopted in the FPP 3 text [and to the earlier forecast package that fpp2 was built around]. If you want to know about time series data types, I cannot stress enough how useful and complete the vignette on time series coercion in the timetk documentation is.

wrangling

Wrangling is tidy, and some of the things that you may have found frustrating about aggregation before may make more sense when shown using this approach. For example, we needed some effort to massage daily data on equities; that is a specific consequence of the indexing in a daily time series where markets are closed on weekends, so time needs to be redefined. Questions like this get some attention here.

condense_period

This function allows you to easily convert data.frame data to a lower frequency by declaring a time variable; it keeps the first or last observation in each period. Elsewhere, I have an example of pivoting data from wide to long. It is very handy.
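
Before turning to the real data, here is a minimal sketch of what condense_period does on a tiny made-up data frame. The object `toy` and its columns `date` and `value` are hypothetical names invented for illustration; only `condense_period()` and its `.period` and `.side` arguments come from timetk.

```r
library(dplyr)
library(timetk)

# Hypothetical toy data: ten consecutive days that straddle a month boundary
toy <- tibble(
  date  = seq(as.Date("2021-01-27"), by = "day", length.out = 10),
  value = 1:10
)

# Keep only the last observation that falls in each month
toy_end <- toy %>%
  condense_period(date, .period = "1 month", .side = "end")

# Keep only the first observation that falls in each month
toy_start <- toy %>%
  condense_period(date, .period = "1 month", .side = "start")

toy_end
toy_start
```

Note that nothing is averaged or summed here; each month is represented by a single row that already existed in the daily data.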

library(tidyverse)
NWS <- read.csv(
  url("https://www.weather.gov/source/pqr/climate/webdata/Portland_dailyclimatedata.csv"), 
                skip=6, 
                na.strings = c("M","-", "")) %>% 
  rename(Variable = X) %>%
  mutate(across(where(is.character), 
                ~str_remove(.x, "/A"))) %>%
  filter(!(MO==1 & YR==2020))
library(magrittr)
# I really like the magrittr %<>% pipe for updating data during cleaning
# Start the daily data
NWS.Daily <- NWS %>% select(-AVG.or.Total)
# Rename the columns because Variable is actually X
names(NWS.Daily) <- c("YR","MO","Variable",paste0("Day.",1:31))
# Create the daily data frame though it contains days that do not actually exist. 
# Every month nominally has 31 days.
NWS.DF <- NWS.Daily %>% 
  pivot_longer(., cols=starts_with("Day."), names_to = "Day", values_to = "value") %>% 
  mutate(Day = str_remove(Day, "Day."))
NWS.DF %<>% pivot_wider(., names_from = "Variable", values_from = "value")
NWS.DF %<>% mutate(date = as.Date(paste(MO,Day,YR,sep="-"), format="%m-%d-%Y"))
NWS.DF$SN[NWS.DF$date==as.Date("1978-12-07")] <- 0
NWS.DF %<>% 
  mutate(PR = recode(PR, T = "0.005"), 
         SN = recode(SN, T = "0.005")) %>%
  mutate(High = as.numeric(TX), 
         Low = as.numeric(TN), 
         Precipitation = as.numeric(PR), 
         Snow = as.numeric(SN)
         ) %>%
    select(date, High, Low, Precipitation, Snow)
library(kableExtra)
head(NWS.DF, n=40) %>% kable() %>% scroll_box(width="600px", height="400px")
date High Low Precipitation Snow
1940-10-01 NA NA NA NA
1940-10-02 NA NA NA NA
1940-10-03 NA NA NA NA
1940-10-04 NA NA NA NA
1940-10-05 NA NA NA NA
1940-10-06 NA NA NA NA
1940-10-07 NA NA NA NA
1940-10-08 NA NA NA NA
1940-10-09 NA NA NA NA
1940-10-10 NA NA NA NA
1940-10-11 NA NA NA NA
1940-10-12 NA NA NA NA
1940-10-13 75 57 0.010 0
1940-10-14 70 53 0.005 0
1940-10-15 64 52 0.005 0
1940-10-16 72 50 0.000 0
1940-10-17 72 58 0.130 0
1940-10-18 78 58 0.000 0
1940-10-19 78 59 0.005 0
1940-10-20 64 54 0.140 0
1940-10-21 63 48 0.050 0
1940-10-22 61 41 0.000 0
1940-10-23 58 53 0.630 0
1940-10-24 57 48 1.030 0
1940-10-25 57 41 0.000 0
1940-10-26 57 38 0.000 0
1940-10-27 56 37 0.005 0
1940-10-28 53 45 0.180 0
1940-10-29 59 48 0.580 0
1940-10-30 59 50 0.500 0
1940-10-31 52 46 0.250 0
1940-11-01 52 40 0.170 0
1940-11-02 53 38 0.020 0
1940-11-03 47 36 0.005 0
1940-11-04 55 32 0.000 0
1940-11-05 51 42 0.070 0
1940-11-06 58 46 0.280 0
1940-11-07 56 46 0.850 0
1940-11-08 50 42 0.290 0
1940-11-09 48 35 0.020 0
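
Because every month nominally has 31 days in the wide data, some of the pasted date strings describe days that do not exist. A small base R sketch, using made-up strings for February 2021 rather than the NWS data, shows what as.Date does with them when given an explicit format. The names `nominal` and `parsed` are just for illustration, and dropping the NA rows is one option you might choose, not something the pipeline above does.

```r
# Hypothetical strings in the same month-day-year layout used above
nominal <- paste(2, c(28, 29, 30, 31), 2021, sep = "-")

# With an explicit format, days that do not exist parse to NA
parsed <- as.Date(nominal, format = "%m-%d-%Y")
is.na(parsed)

# One way to keep only real calendar days
real_days <- parsed[!is.na(parsed)]
real_days
```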

Aggregating

This would normally be a pain; timetk makes it easy. I want to condense the daily data to the last observation of whatever month it happens to be.

library(timetk)
NWS.Analysis.M <- NWS.DF %>% condense_period(., date, .period="1 month", .side="end")
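
If what you want is a monthly summary rather than the last day of each month, timetk also has summarise_by_time. The following is a sketch on made-up data, not the NWS series; `toy`, `high`, and `mean_high` are hypothetical names chosen for the example.

```r
library(dplyr)
library(timetk)

# Hypothetical daily highs: 40 degrees every day in January, 50 in February
toy <- tibble(
  date = seq(as.Date("2021-01-01"), as.Date("2021-02-28"), by = "day"),
  high = c(rep(40, 31), rep(50, 28))
)

# One row per month, holding the mean of the daily values in that month
monthly_mean <- toy %>%
  summarise_by_time(date, .by = "month", mean_high = mean(high, na.rm = TRUE))

monthly_mean
```

The choice between the two depends on the question: condensing keeps an actual observed day, while summarising replaces the month with a statistic computed over its days.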

fpp3

library(fpp3)
── Attaching packages ──────────────────────────────────────────── fpp3 0.4.0 ──
✔ lubridate   1.9.0     ✔ feasts      0.3.0
✔ tsibble     1.1.3     ✔ fable       0.3.2
✔ tsibbledata 0.4.1     
── Conflicts ───────────────────────────────────────────────── fpp3_conflicts ──
✖ lubridate::date()        masks base::date()
✖ magrittr::extract()      masks tidyr::extract()
✖ dplyr::filter()          masks stats::filter()
✖ kableExtra::group_rows() masks dplyr::group_rows()
✖ tsibble::intersect()     masks base::intersect()
✖ tsibble::interval()      masks lubridate::interval()
✖ dplyr::lag()             masks stats::lag()
✖ magrittr::set_names()    masks purrr::set_names()
✖ tsibble::setdiff()       masks base::setdiff()
✖ tsibble::union()         masks base::union()
NWS.TS.M <- NWS.Analysis.M %>% mutate(YM=yearmonth(date)) %>% as_tsibble(index=YM)
NWS.Analysis.M %>%
  mutate(YM=yearmonth(date)) %>%
  select(YM, High, Low) %>%
  pivot_longer(c(High, Low)) %>%
  mutate(Temperature=name) %>%
  select(-name) %>%
  as_tsibble(index=YM, key=Temperature) %>%
  autoplot(value, alpha=0.3) + 
  labs(title="High and Low Temperatures in Portland, Oregon",
       x = "Month",
       y = "Temperature (F)") +
  theme_minimal()