Generalized exponential smoothing in prediction of hierarchical functional time series

Daniel Kosiorowski ([email protected]), Department of Statistics, Cracow University of Economics, Kraków, Poland;
Dominik Mielczarek ([email protected]) and Jerzy P. Rydlewski ([email protected]), AGH University of Science and Technology, Faculty of Applied Mathematics, al. A. Mickiewicza 30, 30-059 Kraków, Poland;
Małgorzata Snarska ([email protected]), Department of Financial Markets, Cracow University of Economics, Kraków, Poland

7 - 9 November 2016, MSA 2016 Conference, Łódź

Plan

- What are functional time series?
- What are hierarchical time series?
- What are hierarchical functional time series?
- How do other researchers deal with the prediction problem?
- Our proposal.
- Empirical examples and a simulation study.

EXAMPLE OF FUNCTIONAL TIME SERIES (FTS)

Figure 1: FTS of yield curves for the USA, 2001–2011 (yield vs. maturity)

EXAMPLE OF HIERARCHIC FTS (HFTS)

Figure 2: Electricity demand in Australia – total and by state (NSW, VIC, SA, QLD, TAS); demand vs. time in each panel

RECONCILIATION OF FORECASTS PROBLEM

The problem has been discussed in the economic literature, e.g. Orcutt et al. (1968), Weale (1988), Kahn (1998), Fliedner (2001), Hyndman et al. (2010).

Figure 3: samples from GARCH

Figure 4: samples from SV

WHAT ARE FUNCTIONAL TIME SERIES (FTS)?

It is advantageous to look at random functions observed at regular time intervals as a functional time series (FTS). Alternatively, an FTS may arise by splitting an almost continuous time record into natural consecutive intervals such as hours, days, weeks, months or years. We regard a random curve X = {X(t), t ∈ [0, 1]} as a random element of the space L2 = L2([0, 1]) equipped with the Borel σ-algebra. The space L2 is a separable Hilbert space with the inner product ⟨x, y⟩ = ∫ x(t) y(t) dt. In the Bosq (2000) monograph it is carefully proved that such constructions are valid for functional processes with values in Banach or Hilbert spaces.
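As a quick illustration of the L2 setting, here is a minimal R sketch approximating the inner product of two curves sampled on a common grid; the grid and the curves are invented for illustration.

```r
# Approximate <x, y> = integral of x(t) y(t) dt over [0, 1] for curves on a grid.
grid <- seq(0, 1, length.out = 201)
dt   <- grid[2] - grid[1]
x <- sin(2 * pi * grid)
y <- cos(2 * pi * grid)
inner_l2 <- sum(x * y) * dt   # Riemann-sum approximation, close to the exact value 0
inner_l2
```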

WHAT ARE HIERARCHICAL TIME SERIES?

Imagine that data are observed at different levels of aggregation, e.g. the national income and expenditure balance. A natural problem arises: how to use the information obtained at those levels to make predictions at all levels.

WHAT ARE HIERARCHICAL FTS?

Figure 5: Hierarchic time series at time n – a total series X_n^total, three mid-level series X_n^1, X_n^2, X_n^3, and nine bottom-level series X_n^11, ..., X_n^33

APPROACHES TO HFTS PREDICTION

1. Hyndman and Shang (2016): grouped functional time series forecasting as a combination of individual forecasts using generalized least squares regression.
2. Bottom-up method: forecast each of the disaggregated series at the lowest level of the hierarchy, and then use simple aggregation to obtain forecasts at the higher levels of the hierarchy.
3. Top-down method: forecast the aggregated series at the top level of the hierarchy, and then use disaggregation based on historical proportions to obtain forecasts at the lower levels.

Remark: A related problem is how to predict the individual FTS: Bosq, Besse et al., Hyndman and Shang (2009), Aue et al. (2016).
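To make the bottom-up and top-down schemes concrete, here is a toy numerical sketch in R; the series names, base forecasts and historical proportions are invented for illustration.

```r
# One total series with two children A and B; numbers are purely illustrative.
fc_bottom <- c(A = 60, B = 45)           # base forecasts of the disaggregated series
fc_total  <- 100                         # base forecast of the aggregate
hist_prop <- c(A = 0.55, B = 0.45)       # historical shares of A and B in the total

bottom_up_total <- sum(fc_bottom)        # bottom-up: aggregate the children (= 105)
top_down_bottom <- fc_total * hist_prop  # top-down: split the total (A = 55, B = 45)
```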

DEPTHS

Let X_N = {X_1, X_2, ..., X_N} be scalars and let F_N(x) = N^{−1} Σ_{i=1}^{N} I{X_i ≤ x}.

The halfspace depth of x_i is defined as HD_N(x_i) = min{F_N(x_i), 1 − F_N(x_i)}; if x_i is the median, then F_N(x_i) = 1/2 and so HD_N(x_i) = 1/2. Another depth is D_N(x_i) = 1 − |1/2 − F_N(x_i)|, for which the largest possible value is 1.
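A minimal R sketch of these two univariate depths (the function names are ours, not from any package):

```r
# Halfspace depth HD_N and the depth D_N = 1 - |1/2 - F_N| for scalar data.
half_depth <- function(x, sample) {
  Fn <- mean(sample <= x)             # empirical cdf F_N(x)
  min(Fn, 1 - Fn)                     # HD_N(x)
}
other_depth <- function(x, sample) {
  1 - abs(0.5 - mean(sample <= x))    # D_N(x)
}

set.seed(1)
xs <- rnorm(11)
sapply(xs, half_depth,  sample = xs)  # the sample median attains the largest value
sapply(xs, other_depth, sample = xs)
```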

DEPTHS FOR FUNCTIONAL DATA

Depths express the centrality of an object with respect to a sample or a probability distribution, see e.g. Zuo and Serfling (2000). Depths for functions have been defined via various approaches, e.g. Nieto-Reyes and Battey (2016), Nagy et al. (2016).

Figure 6: Generalized band depth calculation

LOCAL CORRECTED BAND DEPTH

Local depths: Paindaveine & Van Bever (2013), Sguera et al. (2016).

Figure 7: The idea of symmetrisation of a sample (original data, new points from symmetrisation, symmetrised point)

GLOBAL VS. LOCAL DEPTH

Figure 8: Global depth

Figure 9: Local depth

Let now X = {x_1, ..., x_n} denote a sample of continuous curves defined on a compact interval T. Let λ denote the Lebesgue measure and let a(i_1, i_2) = {t ∈ T : x_{i_2}(t) − x_{i_1}(t) ≥ 0}, where x_{i_1} and x_{i_2} are the band-delimiting curves, and let L_{i_1,i_2} = λ(a(i_1, i_2)) / λ(T). The corrected band depth of a curve x with respect to the sample X (see López-Pintado and Jörnsten 2007) is

cBD(x|X) = 2/(n(n−1)) Σ_{1 ≤ i_1 < i_2 ≤ n} max{L_{i_1,i_2}, L_{i_2,i_1}} I{G(x) ⊂ B_c},   (1)

where the sum is taken over pairs with max{L_{i_1,i_2}, L_{i_2,i_1}} > 1/2 and B_c = I{L_{i_1,i_2} ≥ 1/2} B_c(x_{i_1}, x_{i_2}) + I{L_{i_2,i_1} > 1/2} B_c(x_{i_2}, x_{i_1}), with B_c(x_{i_1}, x_{i_2}) = {(t, y) : t ∈ a(i_1, i_2), x_{i_1}(t) ≤ y ≤ x_{i_2}(t)}.

A corrected generalized band depth of a curve x with respect to the sample X (see López-Pintado and Jörnsten 2007) is

cGBD(x|X) = 2/(n(n−1)) Σ_{1 ≤ i_1 < i_2 ≤ n} λ(A_c(x; x_{i_1}, x_{i_2})) / λ(T).

The band depth is thus modified so as to consider only the proportion of the domain where the delimiting curves define a contiguous region of non-zero width.
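For illustration, a minimal R sketch of a band depth of this generalized type for curves observed on a common grid; it uses the plain proportion-of-domain version and omits the contiguity correction of cGBD, so it is a sketch of the idea rather than the authors' exact definition.

```r
# Rows of `curves` are curves evaluated on a common, equally spaced grid.
gbd <- function(x, curves) {
  n <- nrow(curves); acc <- 0
  for (i1 in 1:(n - 1)) for (i2 in (i1 + 1):n) {
    lo <- pmin(curves[i1, ], curves[i2, ])
    hi <- pmax(curves[i1, ], curves[i2, ])
    acc <- acc + mean(x >= lo & x <= hi)  # proportion of the domain inside the band
  }
  2 * acc / (n * (n - 1))                 # average over all pairs (i1, i2)
}

set.seed(2)
grid   <- seq(0, 1, length.out = 50)
curves <- t(replicate(20, sin(2 * pi * grid) + rnorm(1, sd = 0.3)))  # toy sample of curves
round(gbd(curves[1, ], curves), 3)
```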

To define the local depth we consider the depth regions of order α for cGBD, i.e. R_α(P) = {x : cGBD(x, P) ≥ α}. For any depth function D(x, P), the depth regions R_α(P) = {x ∈ R^d : D(x, P) ≥ α} are of paramount importance, as they reveal very diverse characteristics of P: location, scatter, dependency structure (clearly these regions are nested, and inner regions correspond to larger depth). When defining the local depth it is more appropriate to index the family {R_α(P)} by probability content.

Local depth

Consequently, for any β ∈ [0, 1] we define the smallest depth region with P-probability equal to or larger than β as

R^β(P) = ∩_{α ∈ A(β)} R_α(P),

where A(β) = {α ≥ 0 : P(R_α(P)) ≥ β}. The depth regions R_α(P) or R^β(P) provide neighbourhoods of the deepest point only. However, we can replace P by its symmetrised version P_x = (1/2) P^X + (1/2) P^{2x−X}, which yields neighbourhoods of x of a nonparametric nature; β is the locality parameter.

Definition. Let D(·, P) be a depth function. The corresponding sample local depth function at locality level β ∈ (0, 1] is LD^β(x, P^(n)) = D(x, P_x^{β(n)}), where P_x^{β(n)} denotes the empirical measure of those data points that belong to R_x^β(P^(n)), and R_x^β(P^(n)) is the smallest sample depth region that contains at least a proportion β of the 2n random vectors X_1, ..., X_n, 2x − X_1, ..., 2x − X_n. The local depth is always well defined and inherits affine invariance from the original depth.

For β = 1 we obtain the global depth, while for β → 0 we obtain extreme localization. As in the population case, the sample local depth requires considering, for any x ∈ R^d, the symmetrized distribution P_x^n, which is the empirical distribution associated with X_1, ..., X_n, 2x − X_1, ..., 2x − X_n. Sample properties of the local versions of depths follow from the general results in Paindaveine & Van Bever (2013). For other local depths see e.g. Sguera et al. (2016).
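A minimal R sketch of the sample local depth for scalar data, reusing half_depth() from the earlier sketch. Approximating the smallest depth region by the points whose depth exceeds a depth quantile, and recomputing the depth over the original observations falling in that region, are our simplifications of the construction above.

```r
# Local depth at locality level beta: symmetrise the sample around x, take the
# deepest ~beta proportion of the 2n points as the neighbourhood, and recompute
# the depth of x with respect to the original observations in that neighbourhood.
local_depth <- function(x, sample, beta, depth_fun = half_depth) {
  sym    <- c(sample, 2 * x - sample)               # X_1..X_n, 2x - X_1..2x - X_n
  d_sym  <- sapply(sym, depth_fun, sample = sym)
  cutoff <- quantile(d_sym, probs = 1 - beta)       # keep roughly beta of the 2n points
  nbhd   <- range(sym[d_sym >= cutoff])             # on the line the region is an interval
  keep   <- sample[sample >= nbhd[1] & sample <= nbhd[2]]
  depth_fun(x, keep)                                # for beta = 1 this is the global depth
}

set.seed(3)
xs <- c(rnorm(30, -2), rnorm(30, 2))                # bimodal toy sample
half_depth(0, xs)                                   # global halfspace depth of 0
local_depth(0, xs, beta = 0.3)                      # local depth of 0 at locality 0.3
```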

IMPLEMENTATIONS – DEPTHPROC R PACKAGE

Implementations of global and local versions of several depths, including the projection, Student, simplicial, Lp, regression and modified band depths, can be found in the free R package DepthProc; see Kosiorowski and Zawadzki (2014). For choosing the locality parameter β we recommend cross-validation with respect to a merit criterion (i.e., choosing the resolution appropriate for studying the phenomenon at hand).

MOVING LOCAL FUNCTIONAL MEDIAN

For a sample of N functions X_N = {X_i(s), s ∈ [s_0, s_L]}, let FD^β(Y | X_N) denote the sample functional depth of Y(s) with locality parameter β. The sample β-local median is defined as

MED_{FD^β}(X_N) = arg max_i FD^β(X_i | X_N).

Assume we observe a data stream X_i(s), i = 1, 2, ...

PROPOSAL 1: As the predictor for moment n + 1 take

X̂_{n+1}(s) = MED_{FD^β}(W_{n,k}),

where W_{n,k} denotes a moving window of length k ending at moment n, i.e., W_{n,k} = {X_{n−k+1}(s), ..., X_n(s)}.
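A minimal R sketch of Proposal 1 under simplifying assumptions: the curves live on a common grid (rows of a matrix), the depth is the plain band depth gbd() from the earlier sketch rather than a local depth with parameter β, and the window holds the k most recent curves.

```r
# Predictor for X_{n+1}: the deepest curve (functional median) of the window W_{n,k}.
moving_fmed_forecast <- function(window, depth_fun = gbd) {
  depths <- apply(window, 1, depth_fun, curves = window)
  window[which.max(depths), ]
}

# Usage: forecast the next curve from the last k = 10 rows of `curves`
# (the toy matrix generated in the band-depth sketch above).
k <- 10
n <- nrow(curves)
x_hat <- moving_fmed_forecast(curves[(n - k + 1):n, ])
```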

LOCAL MOVING FUNCTIONAL MEDIAN – EXAMPLE

Figure 10: 100 obs from FAR(1)

Figure 11: 10-obs moving fmed

LOCAL FUNCTIONAL MEDIAN – EXAMPLE CONT.

Figure 12: 100 obs from FTS

Figure 13: 10-obs moving fmed

SIMPLE EXPONENTIAL SMOOTHING

For a sample of N functions X_N = {X_i(s), s ∈ [s_0, s_L]}, let FD^β(Y | X_N) denote the sample functional depth of Y(s) with locality parameter β. The sample α-trimmed mean with locality parameter β is defined as

ave(α, β)(X_N) = ave(X_i : FD^β(X_i | X_N) < α).

Assume we observe a data stream X_i(s), i = 1, 2, ...

PROPOSAL 2: As the predictor for moment n + 1 take

X̂_{n+1}(s) = z · ave(α_1, β_1)(W_{n_1,k_1}) + (1 − z) · ave(α_1, β_1)(W_{n_2,k_2}),

where W_{n,k} denotes a moving window of length k ending at moment n, i.e., W_{n,k} = {X_{n−k+1}(s), ..., X_n(s)}, z ∈ [0, 1] is a forgetting parameter, and n_2 < n_1.
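A minimal R sketch of Proposal 2 under the same simplifying assumptions (global band depth gbd() instead of a local depth); the α-trimmed mean is implemented here, as is common for depth-based trimming, as the average of the deepest curves in the window, and all window positions and tuning constants are illustrative.

```r
# Depth-trimmed functional mean of a window of curves (rows = curves).
trimmed_fmean <- function(window, alpha, depth_fun = gbd) {
  depths <- apply(window, 1, depth_fun, curves = window)
  keep   <- depths >= quantile(depths, probs = alpha)   # drop the alpha least deep curves
  colMeans(window[keep, , drop = FALSE])
}

# Proposal 2: convex combination of trimmed means over two windows,
# a recent one ending at n1 = n and an earlier one ending at n2 < n1.
smooth_forecast <- function(curves, z = 0.7, alpha = 0.2, k1 = 10, k2 = 15, lag = 5) {
  n1 <- nrow(curves); n2 <- n1 - lag
  w1 <- curves[(n1 - k1 + 1):n1, ]
  w2 <- curves[(n2 - k2 + 1):n2, ]
  z * trimmed_fmean(w1, alpha) + (1 - z) * trimmed_fmean(w2, alpha)
}

x_hat2 <- smooth_forecast(curves)   # uses the `curves` matrix from the earlier sketch
```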

PROPOSAL OF HIERARCHICAL FTS PREDICTOR

Figure 14: Hierarchic time series at time n (the same hierarchy as in Figure 5: a total X_n^total, three mid-level series X_n^1, X_n^2, X_n^3, and nine bottom-level series X_n^11, ..., X_n^33)

OUR PROPOSAL CONT.

Step 1: At each aggregation level we independently compute

X̂_{n+1}^{level} = MED_{FD^β}{X^{level}(W_{n,k})}

or

X̂_{n+1}^{level} = z · ave(α_1, β_1)(W_{n_1,k_1}) + (1 − z) · ave(α_1, β_1)(W_{n_2,k_2}).

In Hyndman & Shang (2016) the authors make these base predictions using functional regression.

OUR PROPOSAL CONT.

Step 2 – smart reconciliation of the individual forecasts:

X̂_{n+1} = F(X̂_{n+1}^{level_{i1}}, ..., X̂_{n+1}^{level_{ik}}),

where F is a generalized least squares estimator (see Shang & Hyndman 2016).

OUR PROPOSAL CONT.

We can write our model in the form R_t = S_t b_t, where R_t is a vector of all series at all levels, b_t is a vector of the most disaggregated series, and S_t is a summing matrix that describes the relation between them. We propose the base forecast

R̂_{n+1} = S_{n+1} β_{n+1} + ε_{n+1},

where R̂_{n+1} is the matrix of base forecasts for all series at all levels, β_{n+1} = E[b_{n+1} | R_1, ..., R_n] is the unknown mean of the forecast distribution of the most disaggregated series, and ε_{n+1} denotes the forecast errors.

OUR PROPOSAL – CONT.

We propose to use the generalized least squares method as in Shang & Hyndman (2016),

β̂_{n+1} = (S_{n+1}^T W^{−1} S_{n+1})^{−1} S_{n+1}^T W^{−1} R̂_{n+1},

modified so that we use a robust estimator of the dispersion matrix W: instead of a diagonal matrix containing the forecast variances of each series, we use a robust measure of forecast dispersion.
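A minimal R sketch of this reconciliation step for a small illustrative hierarchy (one total and three bottom-level series, so the summing matrix S is 4 × 3); the base forecasts and the robust dispersions in w are made up.

```r
# Summing matrix: first row aggregates the three bottom series into the total.
S <- rbind(total = c(1, 1, 1), diag(3))

# GLS reconciliation: beta_hat = (S' W^-1 S)^-1 S' W^-1 R_hat, then S %*% beta_hat.
reconcile <- function(base_fc, S, w) {
  Winv     <- diag(1 / w)                      # W is diagonal with robust dispersions
  beta_hat <- solve(t(S) %*% Winv %*% S, t(S) %*% Winv %*% base_fc)
  S %*% beta_hat                               # reconciled forecasts at all levels
}

# Usage: base forecasts of the 4 series evaluated on a grid of 24 points.
set.seed(4)
base_fc <- matrix(rnorm(4 * 24, mean = rep(c(30, 10, 12, 8), 24)), nrow = 4)
w       <- c(3, 1, 1, 1)                       # e.g. MAD-based dispersions per series
rec_fc  <- reconcile(base_fc, S, w)
```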

If we consider a hierarchy as in the picture above, our dispersion matrix takes the form

W = diag{v^total, v^{level_1}, v^{level_2}, v^{level_3}, v^{level_11}, ..., v^{level_33}},

where

v^level = V( ∫_0^T ( X_{nk}^{level} − X̂_n^{level} )^2 ds ),   k ∈ K_level, n = 1, ..., N,

where K_level indexes the observations at the considered level at time n, n = 1, ..., N, and V is a robust measure of dispersion. We propose to use c·MAD, or to take the dependence structure between the series into account using the MCD (minimum covariance determinant) estimator.
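A minimal R sketch of one diagonal entry v^level, computed as c·MAD of the integrated squared forecast residuals; the curves, fitted values and grid step are illustrative, and R's mad() already includes the consistency constant c ≈ 1.4826.

```r
# obs and fitted: matrices with one row per curve at the given level, columns = grid points.
v_level <- function(obs, fitted, ds = 1) {
  ise <- rowSums((obs - fitted)^2) * ds   # integrated squared residual of each curve
  mad(ise)                                # robust dispersion V of the integrated errors
}

# Usage with toy residual curves on a grid of 24 points
set.seed(5)
obs    <- matrix(rnorm(10 * 24, mean = 5), nrow = 10)
fitted <- matrix(5, nrow = 10, ncol = 24)
v_level(obs, fitted, ds = 1 / 24)
```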

PROPERTIES OF OUR PROPOSAL – REAL DATA

Figure 15: Predicted electricity demand in Australia (β = 0.2, k = 10) – total and by state (NSW, VIC, SA, QLD, TAS); demand vs. time in each panel

PROPERTIES OF OUR PROPOSALS – REAL DATA

Figure 16: Integrated errors of 297 forecasts of electricity demand in Australia; per-panel MAD – total: 1033249, NSW: 131862, VIC: 108059, SA: 13844, QLD: 3333760, TAS: 1509

PROPERTIES OF OUR PROPOSAL – SIMULATIONS

- We generated samples from SV, GARCH, Wiener, Brownian bridge and FAR processes, and from various mixtures of these processes.
- We considered several locality parameters (different across the levels of the hierarchy), several moving-window lengths, and samples with and without functional outliers.
- The performance of our proposal was compared with the Hyndman & Shang (2016) proposals with respect to the sum of integrated squared forecast residuals and the MAD of integrated squared forecast residuals.
- Our proposal seems to be more robust to functional outliers than the Hyndman and Shang (2016) proposal. Moreover, it is more appropriate for detecting regime changes in the HFTS setup.

SAMPLE DISTRIBUTIONS, CONFIDENCE REGIONS, STATISTICAL INFERENCE, P-VALUES

Principal component score series are considered as surrogates of the original functional time series. See Vinod and de Lacalle (2009) on the maximum entropy bootstrap for time series, Hörmann and Kokoszka (2012), Shang (2016) on bootstrap methods for stationary functional time series, and Rana et al. (2016).

UNCERTAINTY EVALUATION

As in Hyndman and Shang (2016), we propose to use the maximum entropy bootstrap methodology to obtain confidence regions and to conduct statistical inference. The meboot and DepthProc R packages provide the appropriate computational support.

SUMMARY AND CONCLUSIONS

- Hierarchical functional time series methodology opens new areas of statistical as well as economic research. The e-economy provides a great deal of HFTS data.
- Our proposal of an HFTS predictor based on the local moving functional median performs surprisingly well in comparison with the milestone proposal of Hyndman and Shang (2016).
- Using the locality parameter in our proposals, we can take into account the different resolutions at which the considered phenomena are observed.

Selected references

Aue, A., Dubart Norinho, D., Hörmann, S. (2015). On the prediction of stationary functional time series. Journal of the American Statistical Association, 110(509), 378–392.
Bosq, D. (2000). Linear Processes in Function Spaces. Springer.
Febrero-Bande, M., Oviedo de la Fuente, M. (2012). Statistical computing in functional data analysis: the R package fda.usc. Journal of Statistical Software, 51(4), 1–28.
Horváth, L., Kokoszka, P. (2012). Inference for Functional Data with Applications. Springer.
Hyndman, R. J., Ahmed, R. A., Athanasopoulos, G., Shang, H. L. (2011). Optimal combination forecasts for hierarchical time series. Computational Statistics & Data Analysis, 55(9), 2579–2589.
Hyndman, R. J., Shang, H. L. (2010). Rainbow plots, bagplots, and boxplots for functional data. Journal of Computational and Graphical Statistics, 19(1), 29–45.
Hyndman, R. J., Shang, H. L. (2009). Forecasting functional time series. Journal of the Korean Statistical Society, 38(3), 199–221.
Kosiorowski, D., Zawadzki, Z. (2014). DepthProc: an R package for robust exploration of multidimensional economic phenomena. http://arxiv.org/pdf/1408.4542.pdf
Kosiorowski, D. (2016). Dilemmas of robust analysis of economic data streams. Journal of Mathematical Sciences (Springer), 218(2), 167–181.
Krzyśko, M., Deręgowski, K., Górecki, T. (2012). Jądrowa i funkcjonalna analiza głównych składowych [Kernel and functional principal component analysis]. MSA 2012, Łódź.
López-Pintado, S., Jörnsten, R. (2007). Functional analysis via extensions of the band depth. IMS Lecture Notes–Monograph Series, Complex Datasets and Inverse Problems: Tomography, Networks and Beyond, Vol. 54, 103–120. Institute of Mathematical Statistics.
Nagy, S., Hlubinka, D., Gijbels, I. (2016). Integrated depth for functional data: statistical properties and consistency. ESAIM: Probability and Statistics.
Nieto-Reyes, A., Battey, H. (2016). A topologically valid definition of depth for functional data. Statistical Science, 31(1), 61–79.
Paindaveine, D., Van Bever, G. (2013). From depth to local depth: a focus on centrality. Journal of the American Statistical Association, 108(503), Theory and Methods, 1105–1119.
Ramsay, J. O., Hooker, G., Graves, S. (2009). Functional Data Analysis with R and Matlab. Springer.
Sguera, C., Galeano, P., Lillo, R. E. (2016). arXiv:1607.05042v1.
Shang, H. L., Hyndman, R. J. (2016). Grouped functional time series forecasting: an application to age-specific mortality rates. Journal of Computational and Graphical Statistics.
Shang, H. L. (2016). Bootstrap methods for stationary functional time series. Statistics and Computing.
Vinod, H. D., López de Lacalle, J. (2009). Maximum entropy bootstrap for time series: the meboot R package. Journal of Statistical Software, 29(5).