Something is easier to forecast if:
``He who sees the past as surprise-free is bound to have a future full of surprises.’’ - Amos Tversky
tsibble
objects are index
and key
tsibble
objects# A tsibble: 15,150 x 6 [1Y]
# Key: Country [263]
Year Country GDP Imports Exports Population
<dbl> <fct> <dbl> <dbl> <dbl> <dbl>
1 1960 Afghanistan 537777811. 7.02 4.13 8996351
2 1961 Afghanistan 548888896. 8.10 4.45 9166764
3 1962 Afghanistan 546666678. 9.35 4.88 9345868
4 1963 Afghanistan 751111191. 16.9 9.17 9533954
5 1964 Afghanistan 800000044. 18.1 8.89 9731361
6 1965 Afghanistan 1006666638. 21.4 11.3 9938414
7 1966 Afghanistan 1399999967. 18.6 8.57 10152331
8 1967 Afghanistan 1673333418. 14.2 6.77 10372630
9 1968 Afghanistan 1373333367. 15.2 8.90 10604346
10 1969 Afghanistan 1408888922. 15.0 10.1 10854428
# … with 15,140 more rows
tsibble
objects# A tsibble: 24,320 x 5 [1Q]
# Key: Region, State, Purpose [304]
Quarter Region State Purpose Trips
<qtr> <chr> <chr> <chr> <dbl>
1 1998 Q1 Adelaide SA Business 135.
2 1998 Q2 Adelaide SA Business 110.
3 1998 Q3 Adelaide SA Business 166.
4 1998 Q4 Adelaide SA Business 127.
5 1999 Q1 Adelaide SA Business 137.
6 1999 Q2 Adelaide SA Business 200.
7 1999 Q3 Adelaide SA Business 169.
8 1999 Q4 Adelaide SA Business 134.
9 2000 Q1 Adelaide SA Business 154.
10 2000 Q2 Adelaide SA Business 169.
# … with 24,310 more rows
tsibble
objectsA tsibble
allows storage and manipulation of multiple time series in R.
It contains:
It works with tidyverse functions.
tsibble
indextsibble
indextsibble
indexFor observations more frequent than once per year, we need to use a time class function on the index.
tsibble
indexFor observations more frequent than once per year, we need to use a time class function on the index.
tsibble
indexCommon time index variables can be created with these functions:
Frequency | Function |
---|---|
Annual | start:end |
Quarterly | yearquarter() |
Monthly | yearmonth() |
Weekly | yearweek() |
Daily | as_date() , ymd() |
Sub-daily | as_datetime() |
# A tibble: 3,072 × 6
date state gender legal indigenous count
<date> <chr> <chr> <chr> <chr> <dbl>
1 2005-03-01 ACT Female Remanded ATSI 0
2 2005-03-01 ACT Female Remanded Other 2
3 2005-03-01 ACT Female Sentenced ATSI 0
4 2005-03-01 ACT Female Sentenced Other 0
5 2005-03-01 ACT Male Remanded ATSI 7
6 2005-03-01 ACT Male Remanded Other 58
7 2005-03-01 ACT Male Sentenced ATSI 0
8 2005-03-01 ACT Male Sentenced Other 0
9 2005-03-01 NSW Female Remanded ATSI 51
10 2005-03-01 NSW Female Remanded Other 131
# … with 3,062 more rows
# A tibble: 3,072 × 7
date state gender legal indigenous count Quarter
<date> <chr> <chr> <chr> <chr> <dbl> <qtr>
1 2005-03-01 ACT Female Remanded ATSI 0 2005 Q1
2 2005-03-01 ACT Female Remanded Other 2 2005 Q1
3 2005-03-01 ACT Female Sentenced ATSI 0 2005 Q1
4 2005-03-01 ACT Female Sentenced Other 0 2005 Q1
5 2005-03-01 ACT Male Remanded ATSI 7 2005 Q1
6 2005-03-01 ACT Male Remanded Other 58 2005 Q1
7 2005-03-01 ACT Male Sentenced ATSI 0 2005 Q1
8 2005-03-01 ACT Male Sentenced Other 0 2005 Q1
9 2005-03-01 NSW Female Remanded ATSI 51 2005 Q1
10 2005-03-01 NSW Female Remanded Other 131 2005 Q1
# … with 3,062 more rows
# A tibble: 3,072 × 6
state gender legal indigenous count Quarter
<chr> <chr> <chr> <chr> <dbl> <qtr>
1 ACT Female Remanded ATSI 0 2005 Q1
2 ACT Female Remanded Other 2 2005 Q1
3 ACT Female Sentenced ATSI 0 2005 Q1
4 ACT Female Sentenced Other 0 2005 Q1
5 ACT Male Remanded ATSI 7 2005 Q1
6 ACT Male Remanded Other 58 2005 Q1
7 ACT Male Sentenced ATSI 0 2005 Q1
8 ACT Male Sentenced Other 0 2005 Q1
9 NSW Female Remanded ATSI 51 2005 Q1
10 NSW Female Remanded Other 131 2005 Q1
# … with 3,062 more rows
# A tsibble: 3,072 x 6 [1Q]
# Key: state, gender, legal, indigenous [64]
state gender legal indigenous count Quarter
<chr> <chr> <chr> <chr> <dbl> <qtr>
1 ACT Female Remanded ATSI 0 2005 Q1
2 ACT Female Remanded ATSI 1 2005 Q2
3 ACT Female Remanded ATSI 0 2005 Q3
4 ACT Female Remanded ATSI 0 2005 Q4
5 ACT Female Remanded ATSI 1 2006 Q1
6 ACT Female Remanded ATSI 1 2006 Q2
7 ACT Female Remanded ATSI 1 2006 Q3
8 ACT Female Remanded ATSI 0 2006 Q4
9 ACT Female Remanded ATSI 0 2007 Q1
10 ACT Female Remanded ATSI 1 2007 Q2
# … with 3,062 more rows
The Pharmaceutical Benefits Scheme (PBS) is the Australian government drugs subsidy scheme.
tsibble
objects# A tsibble: 67,596 x 9 [1M]
# Key: Concession, Type, ATC1, ATC2 [336]
Month Concession Type ATC1 ATC1_…¹ ATC2 ATC2_…² Scripts Cost
<mth> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 1991 Jul Concessional Co-payments A Alimen… A01 STOMAT… 18228 67877
2 1991 Aug Concessional Co-payments A Alimen… A01 STOMAT… 15327 57011
3 1991 Sep Concessional Co-payments A Alimen… A01 STOMAT… 14775 55020
4 1991 Oct Concessional Co-payments A Alimen… A01 STOMAT… 15380 57222
5 1991 Nov Concessional Co-payments A Alimen… A01 STOMAT… 14371 52120
6 1991 Dec Concessional Co-payments A Alimen… A01 STOMAT… 15028 54299
7 1992 Jan Concessional Co-payments A Alimen… A01 STOMAT… 11040 39753
8 1992 Feb Concessional Co-payments A Alimen… A01 STOMAT… 15165 54405
9 1992 Mar Concessional Co-payments A Alimen… A01 STOMAT… 16898 61108
10 1992 Apr Concessional Co-payments A Alimen… A01 STOMAT… 18141 65356
# … with 67,586 more rows, and abbreviated variable names ¹ATC1_desc,
# ²ATC2_desc
tsibble
objectsWe can use the filter()
function to select rows.
# A tsibble: 816 x 9 [1M]
# Key: Concession, Type, ATC1, ATC2 [4]
Month Concession Type ATC1 ATC1_…¹ ATC2 ATC2_…² Scripts Cost
<mth> <chr> <chr> <chr> <chr> <chr> <chr> <dbl> <dbl>
1 1991 Jul Concessional Co-paymen… A Alimen… A10 ANTIDI… 89733 2.09e6
2 1991 Aug Concessional Co-paymen… A Alimen… A10 ANTIDI… 77101 1.80e6
3 1991 Sep Concessional Co-paymen… A Alimen… A10 ANTIDI… 76255 1.78e6
4 1991 Oct Concessional Co-paymen… A Alimen… A10 ANTIDI… 78681 1.85e6
5 1991 Nov Concessional Co-paymen… A Alimen… A10 ANTIDI… 70554 1.69e6
6 1991 Dec Concessional Co-paymen… A Alimen… A10 ANTIDI… 75814 1.84e6
7 1992 Jan Concessional Co-paymen… A Alimen… A10 ANTIDI… 64186 1.56e6
8 1992 Feb Concessional Co-paymen… A Alimen… A10 ANTIDI… 75899 1.73e6
9 1992 Mar Concessional Co-paymen… A Alimen… A10 ANTIDI… 89445 2.05e6
10 1992 Apr Concessional Co-paymen… A Alimen… A10 ANTIDI… 97315 2.23e6
# … with 806 more rows, and abbreviated variable names ¹ATC1_desc, ²ATC2_desc
tsibble
objectsWe can use the select()
function to select columns.
# A tsibble: 816 x 4 [1M]
# Key: Concession, Type [4]
Month Concession Type Cost
<mth> <chr> <chr> <dbl>
1 1991 Jul Concessional Co-payments 2092878
2 1991 Aug Concessional Co-payments 1795733
3 1991 Sep Concessional Co-payments 1777231
4 1991 Oct Concessional Co-payments 1848507
5 1991 Nov Concessional Co-payments 1686458
6 1991 Dec Concessional Co-payments 1843079
7 1992 Jan Concessional Co-payments 1564702
8 1992 Feb Concessional Co-payments 1732508
9 1992 Mar Concessional Co-payments 2046102
10 1992 Apr Concessional Co-payments 2225977
# … with 806 more rows
tsibble
objectsWe can use the summarise()
function to summarise over keys.
PBS %>%
filter(ATC2 == "A10") %>%
select(Month, Concession, Type, Cost) %>%
summarise(total_cost = sum(Cost))
# A tsibble: 204 x 2 [1M]
Month total_cost
<mth> <dbl>
1 1991 Jul 3526591
2 1991 Aug 3180891
3 1991 Sep 3252221
4 1991 Oct 3611003
5 1991 Nov 3565869
6 1991 Dec 4306371
7 1992 Jan 5088335
8 1992 Feb 2814520
9 1992 Mar 2985811
10 1992 Apr 3204780
# … with 194 more rows
tsibble
objectsWe can use the mutate()
function to create new variables.
PBS %>%
filter(ATC2 == "A10") %>%
select(Month, Concession, Type, Cost) %>%
summarise(total_cost = sum(Cost)) %>%
mutate(total_cost = total_cost/1e+06)
# A tsibble: 204 x 2 [1M]
Month total_cost
<mth> <dbl>
1 1991 Jul 3.53
2 1991 Aug 3.18
3 1991 Sep 3.25
4 1991 Oct 3.61
5 1991 Nov 3.57
6 1991 Dec 4.31
7 1992 Jan 5.09
8 1992 Feb 2.81
9 1992 Mar 2.99
10 1992 Apr 3.20
# … with 194 more rows
tsibble
objectsWe can use the mutate()
function to create new variables.
# A tsibble: 204 x 2 [1M]
Month total_cost
<mth> <dbl>
1 1991 Jul 3.53
2 1991 Aug 3.18
3 1991 Sep 3.25
4 1991 Oct 3.61
5 1991 Nov 3.57
6 1991 Dec 4.31
7 1992 Jan 5.09
8 1992 Feb 2.81
9 1992 Mar 2.99
10 1992 Apr 3.20
# … with 194 more rows
Trend: pattern exists when there is a long-term increase or decrease in the data.
Seasonal: pattern exists when a series is influenced by seasonal factors (e.g., the quarter of the year, the month, or day of the week).
Cyclic: pattern exists when data exhibit rises and falls that are (duration usually of at least 2 years).
Differences between seasonal and cyclic patterns:
The timing of peaks and troughs is predictable with seasonal data, but unpredictable in the long term with cyclic data.
gg_season()
gg_subseries()
# A tsibble: 52,608 x 5 [30m] <Australia/Melbourne>
Time Demand Temperature Date Holiday
<dttm> <dbl> <dbl> <date> <lgl>
1 2012-01-01 00:00:00 4383. 21.4 2012-01-01 TRUE
2 2012-01-01 00:30:00 4263. 21.0 2012-01-01 TRUE
3 2012-01-01 01:00:00 4049. 20.7 2012-01-01 TRUE
4 2012-01-01 01:30:00 3878. 20.6 2012-01-01 TRUE
5 2012-01-01 02:00:00 4036. 20.4 2012-01-01 TRUE
6 2012-01-01 02:30:00 3866. 20.2 2012-01-01 TRUE
7 2012-01-01 03:00:00 3694. 20.1 2012-01-01 TRUE
8 2012-01-01 03:30:00 3562. 19.6 2012-01-01 TRUE
9 2012-01-01 04:00:00 3433. 19.1 2012-01-01 TRUE
10 2012-01-01 04:30:00 3359. 19.0 2012-01-01 TRUE
# … with 52,598 more rows
# A tsibble: 640 x 3 [1Q]
# Key: State [8]
State Quarter Trips
<chr> <qtr> <dbl>
1 ACT 1998 Q1 196.
2 ACT 1998 Q2 127.
3 ACT 1998 Q3 111.
4 ACT 1998 Q4 170.
5 ACT 1999 Q1 108.
6 ACT 1999 Q2 125.
7 ACT 1999 Q3 178.
8 ACT 1999 Q4 218.
9 ACT 2000 Q1 158.
10 ACT 2000 Q2 155.
# … with 630 more rows
# A tsibble: 74 x 7 [1Q]
Quarter Beer Tobacco Bricks Cement Electricity Gas
<qtr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1992 Q1 443 5777 383 1289 38332 117
2 1992 Q2 410 5853 404 1501 39774 151
3 1992 Q3 420 6416 446 1539 42246 175
4 1992 Q4 532 5825 420 1568 38498 129
5 1993 Q1 433 5724 394 1450 39460 116
6 1993 Q2 421 6036 462 1668 41356 149
7 1993 Q3 410 6570 475 1648 42949 163
8 1993 Q4 512 5675 443 1863 40974 138
9 1994 Q1 449 5311 421 1468 40162 127
10 1994 Q2 381 5717 475 1755 41199 159
# … with 64 more rows
Each graph shows y_t plotted against y_{t-k} for different values of k. - The autocorrelations are the correlations associated with these scatterplots. - ACF (autocorrelation function): - r_1=\text{Correlation}(y_{t}, y_{t-1}) - r_2=\text{Correlation}(y_{t}, y_{t-2}) - r_3=\text{Correlation}(y_{t}, y_{t-3}) - etc.
Covariance and correlation: measure extent of linear relationship between two variables (y and X).
Autocovariance and autocorrelation: measure linear relationship between lagged values of a time series y.
We measure the relationship between:
We denote the sample autocovariance at lag k by c_k and the sample autocorrelation at lag k by r_k. Then define
c_k = \frac{1}{T}\sum_{t=k+1}^T (y_t-\bar{y})(y_{t-k}-\bar{y}) r_{k} = c_k/c_0
Results for first 9 lags for beer data:
Results for first 9 lags for beer data:
google_2015 <- gafa_stock %>%
filter(Symbol == "GOOG", year(Date) == 2015) %>%
select(Date, Close)
google_2015
# A tsibble: 252 x 2 [!]
Date Close
<date> <dbl>
1 2015-01-02 522.
2 2015-01-05 511.
3 2015-01-06 499.
4 2015-01-07 498.
5 2015-01-08 500.
6 2015-01-09 493.
7 2015-01-12 490.
8 2015-01-13 493.
9 2015-01-14 498.
10 2015-01-15 499.
# … with 242 more rows
White noise data is uncorrelated across time with zero mean and constant variance.
(Technically, we require independence as well.)
r_{1} | r_{2} | r_{3} | r_{4} | r_{5} | r_{6} | r_{7} | r_{8} | r_{9} | r_{10} |
---|---|---|---|---|---|---|---|---|---|
0.014 | -0.163 | 0.163 | -0.259 | -0.198 | 0.064 | -0.139 | -0.032 | 0.199 | -0.024 |
Sampling distribution of r_k for white noise data is asymptotically N(0,1/T).
Monthly total number of pigs slaughtered in the state of Victoria, Australia, from January 2014 through December 2018 (Source: Australian Bureau of Statistics.)
These show the series is not a white noise series.
You can compute the daily changes in the Google stock price in 2018 using
Does diff
look like white noise?
Models of Choice and Forecasting