Maximum Length Weighted Nearest Neighbor Approach for Electricity Load Forecasting Tommaso Colombo, Irena Koprinska, and Massimo Panella

Abstract: In this paper we present a new approach for time series forecasting, called Maximum Length Weighted Nearest Neighbor (MLWNN), which combines prediction based on sequence similarity with optimization techniques. MLWNN predicts the 24 hourly electricity loads for the next day from a time sequence of previous electricity loads up to the current day. We evaluate MLWNN using electricity load data for two years, for three countries (Australia, Portugal and Spain), and compare its performance with three state-of-the-art methods (weighted nearest neighbor, pattern sequence-based forecasting and iterative neural network) and with two baselines. The results show that MLWNN is a promising approach for one day ahead electricity load forecasting.

Tommaso Colombo and Massimo Panella are with the Department of Information Engineering, Electronics and Telecommunications (DIET) of the University of Rome "La Sapienza", Via Eudossiana 18, 00184 Rome, Italy. Email: [email protected]. Irena Koprinska is with the School of Information Technology, University of Sydney, Sydney, NSW 2006, Australia. Email: [email protected].

I. INTRODUCTION

This paper proposes a new machine learning approach for forecasting the hourly electricity loads for the next day. Given a time series of electricity loads measured every hour up to a given day, we aim at forecasting the 24 hourly electricity loads for the next day. This task is categorized as 'short-term load forecasting'. It is needed for the planning and operation of power systems, to ensure reliable and cost-effective electricity supply. It is also fundamental in deregulated electricity markets, where electricity prices are determined through a bidding process, to support the market participants in their transactions.

Predicting the load for the next day accurately is a challenging task, as the electricity time series has a number of nested cycles (daily, weekly, seasonal and yearly) and also shows random fluctuations depending on household electricity usage, weather changes, large industrial units with irregular hours of operation and other variations. More generally, the dynamics of the electricity loads influence the behavior of the energy prices. This behavior is complex and has shown large unexpected volatility in the last decade. In this context, a tool providing accurate forecasts of electricity loads and prices is very useful. Non-linear regression models, such as Neural Networks (NNs), have been successfully used for forecasting financial time series, electricity prices and other commodity prices, and were shown to be able to capture important characteristics such as fat tails, volatility, persistence and leverage effects [1]–[16].

Most of the existing approaches for short-term load forecasting consider one step ahead prediction, i.e. at time $h$ the task is to predict the load for time $h + 1$. In this paper we consider predicting all 24 hourly values for the next day simultaneously. This task was previously considered by the Weighted Nearest Neighbor (WNN) approach [17], which is a state-of-the-art approach. Suppose that $X_d$ is a 24-dimensional vector consisting of the hourly loads for a day $d$. To predict the loads $X_{d+1}$ for the next day $d + 1$, WNN first finds the $K$ nearest neighbors of the previous day $X_d$; then, the prediction for the new day is a weighted linear combination of the loads for the days following the nearest neighbors, where the weights are determined by the distance of the neighbors to $X_d$.

In this paper we propose a new approach for predicting the hourly electricity load for the next day, called Maximum Length Weighted Nearest Neighbor (MLWNN), which extends the state-of-the-art WNN approach in several ways. Our contribution can be stated as follows:

1) Our proposed MLWNN approach can be seen as a generalization of WNN. While WNN finds the most similar sequence of previous days (when determining the nearest neighbors), MLWNN finds the longest sequence of hourly loads with a similarity higher than a given threshold. The best value of this threshold is determined by an optimization procedure. As a consequence, MLWNN is applicable to any forecasting horizon, as it directly operates on a sequence of hourly loads instead of a sequence of days, and it does not require the time series to be organized into 24-dimensional vectors corresponding to each day.

2) We conduct an evaluation of MLWNN using electricity data for two years, for Australia, Portugal and Spain. We compare the forecasting accuracy of MLWNN on the three datasets with WNN and two other advanced forecasting approaches, Iterative Neural Network (INN) [18] and Pattern Sequence-Based Forecasting (PSF) [19], and also with two baselines.

The paper is organized as follows. In Section II we provide an overview of the most important and recent research works in the field of short-term load forecasting.
The proposed MLWNN approach is presented in Section III. Section IV presents the experimental setup and Section V shows and discusses the results. Finally, our conclusions are drawn in Section VI.

II. RELATED WORKS

There are two main groups of approaches for short-term load forecasting: statistical and computational intelligence. Prominent examples of the first group are exponential smoothing, Autoregressive Integrated Moving Average (ARIMA) and Linear Regression (LR), and notable examples of the second group are NNs and support vector regression.

A. Statistical Approaches

Taylor et al. [20] considered the prediction of the hourly electricity load for Rio de Janeiro, from 1 to 24 hours ahead. They compared four methods: ARIMA, double seasonal Holt-Winters exponential smoothing, backpropagation NN and a PCA-based LR. They found that the most accurate method was exponential smoothing, which was also the fastest method. In [21] Taylor and McSharry compared exponential smoothing and PCA-based LR with two new methods (a different formulation of exponential smoothing and a periodic autoregression) using hourly data for Italy, Norway and Sweden. The results again showed that the double seasonal Holt-Winters exponential smoothing was the most accurate method.

Soares and Medeiros [22] proposed a novel forecasting model with two components: a deterministic one for trends, seasonality and special days, and a stochastic one that uses linear autocorrelation. A different model was built for each hour of the day. An evaluation using Brazilian hourly data showed that the proposed approach obtained promising results, outperforming ARIMA and other methods.

Fan and Hyndman [23] proposed a semi-parametric additive regression methodology that was used to forecast the half-hourly electricity loads one day ahead for the states of Victoria and South Australia in Australia. A separate model was built for each half hour, using the previous lagged electricity loads, calendar and temperature variables.
The forecasting model showed excellent performance both on historical data and when applied in real time on site.

B. Computational Intelligence Approaches

NN-based approaches are probably the most popular approaches for load forecasting due to their ability to learn the time series from examples and to capture non-linear relationships between the predictor variables and the target variable [24]–[26]. Most of the proposed computational intelligence approaches considered the task of one step ahead prediction (e.g. 1 hour ahead); below we review approaches that predict all 24 hourly values for the next day.

Apart from WNN [17], which was already mentioned, there are two other notable approaches that predict the 24 hourly values for the next day: PSF and INN. PSF [19] is a generalization of WNN, which combines clustering with sequence matching. It first groups all vectors $X_d$ from the training data into $K$ clusters and labels them with the cluster number. Then it extracts a sequence of consecutive days, from day $d$ backwards, and matches the cluster labels of this sequence against the training data to find a set $ES_d$ of

sequences that are the same. It then follows a nearest neighbor approach similar to WNN: it finds the following day for each element of $ES_d$ and averages the 24 hourly loads of these following days, in order to produce the final 24 hourly predictions for day $d + 1$. The results showed that both WNN and PSF are very competitive approaches, outperforming ARIMA, NNs and other methods.

INN [18] is an iterative prediction method. At time $h$ it makes a prediction for time $h + 1$; this prediction is added to the available data and used to make a prediction for time $h + 2$, and so on for all 24 points of the forecasting horizon. It uses a mutual information feature selector and a neural network forecasting algorithm. The results showed that it was able to provide accurate predictions, outperforming the non-iterative methods WNN and PSF.

Wavelet-based approaches predicting the load for the next day have also been proposed. Reis and Alves da Silva [27] considered the task of 1-24 hours ahead prediction of hourly North American data. They used a multilevel wavelet decomposition to split the electricity load into several components that were predicted separately by NNs trained with the backpropagation algorithm. Chen et al. [28] also considered the task of predicting the electricity load one day ahead from previous hourly loads using wavelet transformation and backpropagation NNs. They selected a day that is similar to the day to be forecast in terms of weekly index and weather, decomposed the load for this day into two wavelet components and then trained a separate NN for each component. Non-wavelet features such as temperature, humidity, cloud cover and precipitation were also used as inputs to the NNs. An evaluation using four years of data for New England was conducted, showing a mean absolute percentage error of 1.24-2.22%.

III. THE PROPOSED APPROACH TO SHORT-TERM ELECTRICITY LOAD FORECASTING

We consider a 'one day ahead' prediction problem: given the hourly loads recorded in the past up to day $d$, the goal is to forecast the 24 hourly loads corresponding to day $d + 1$. More precisely, given a time series $S(n)$, $n > 0$, of hourly loads that are known up to hour $h$, we want to determine the hourly loads $S(h + 1), \dots, S(h + 24)$, which means considering a forecasting horizon of 24 time steps. This formulation is a generalization of the standard 'one day ahead' prediction problem, as it does not require a reorganization of the hourly time series data into 24-dimensional vectors. Thus, our proposed approach is general and can be applied to any forecasting horizon $f \geq 1$, to predict the samples $S(h + 1), \dots, S(h + f)$.

We first introduce the main parameters of the method, which have to be specified in advance:
• forecasting horizon, an integer value $f > 0$;
• number of nearest neighbor patterns (subsequences), an integer value $K \geq 1$;
• upper bound of the dimension of the subsequences associated with the nearest neighbors, an integer value $W \geq 1$;
• similarity threshold under which two subsequences are not considered as sufficiently similar with respect to a

suitably defined similarity measure, a real-valued parameter $\theta > 0$.

The proposed MLWNN algorithm is based on the following steps, applied at any time instant $h$ using the available data:

1) Let $w$ be a counter variable initialized to 1.

2) Store a dataset of observations in a matrix $Y_w \in \mathbb{R}^{(h-f-w+1) \times w}$, where the $i$th row $\mathbf{y}_{w,i}$ of $Y_w$, $i = 1 \dots (h - f - w + 1)$, is a vector of $w$ hourly loads:

$$\mathbf{y}_{w,i} = [S(i) \;\; S(i+1) \;\; \dots \;\; S(i+w-1)]. \tag{1}$$

We note that the most recent data sample stored in $Y_w$ is $S(h - f)$, and that it is located in the last row and the rightmost column.

3) Let $\mathbf{z}_w$ be the vector containing the most recent known $w$ samples of hourly loads:

$$\mathbf{z}_w = [S(h-w+1) \;\; S(h-w+2) \;\; \dots \;\; S(h)]. \tag{2}$$

4) Using the Chebyshev distance, find the $K$ nearest neighbors for the vector $\mathbf{z}_w$ among the rows of $Y_w$. The Chebyshev distance $D(\mathbf{z}_w, \mathbf{y}_{w,i})$ between $\mathbf{z}_w$ and $\mathbf{y}_{w,i}$ is defined as:

$$\|\mathbf{z}_w - \mathbf{y}_{w,i}\|_\infty = \max_{j=1 \dots w} \left| z_w(j) - y_{w,i}(j) \right|. \tag{3}$$

5) Let $D_{\max}$ be the maximum Chebyshev distance among the found $K$ nearest neighbors and let $z_{\max}$ be the maximum absolute value of the elements in $\mathbf{z}_w$. If $D_{\max} / z_{\max} > \theta$ then go to step (8).

6) Store $w_{\text{best}} \leftarrow w$ as the new best value of $w$, and also the reference vector $\mathbf{q}_0 \leftarrow \mathbf{z}_w$. Store the Nearest Set (NS) of vectors $\{\mathbf{q}_1, \mathbf{q}_2, \dots, \mathbf{q}_K\}$ using the $K$ nearest neighbor vectors found, where $\mathbf{q}_1$ is the nearest neighbor of $\mathbf{q}_0$ and $\mathbf{q}_K$ is the furthest of the $K$ nearest neighbors.

7) If $w < \min\{h - f - K + 1, W\}$ then increment $w \leftarrow w + 1$ and go back to step (2).¹

8) Extract the $f$ following samples for each vector in NS and compute a weighted average to produce the prediction of $S(h + 1), \dots, S(h + f)$, as explained below.

At the end of this procedure, we will obtain the set of nearest neighbors NS for $\mathbf{q}_0$; each of them can be associated with the vector containing the $f$ following load values. For example, if $\mathbf{q}_n = \mathbf{y}_{w_{\text{best}},i}$, $1 \leq n \leq K$, as in (1) with $w = w_{\text{best}}$, then:

$$\mathbf{q}_n^{(f)} = [S(i+w) \;\; S(i+w+1) \;\; \dots \;\; S(i+w+f-1)]. \tag{4}$$

¹ The number of rows in $Y_w$ must be greater than or equal to the number $K$ of nearest neighbors, i.e. $h - f - w + 1 \geq K$, and hence $w \leq h - f - K + 1$. Also, since $w \geq 1$, we must have a sufficient number of observations so that $h \geq f + K$.

The loads to be predicted are $\mathbf{q}_0^{(f)}$:

$$\mathbf{q}_0^{(f)} = [S(h+1) \;\; S(h+2) \;\; \dots \;\; S(h+f)]. \tag{5}$$

The weighted average, which determines $\mathbf{q}_0^{(f)}$, is therefore based on the following formula:

$$\mathbf{q}_0^{(f)} = \frac{1}{\sum_{n=1}^{K} \alpha_n} \sum_{n=1}^{K} \alpha_n \mathbf{q}_n^{(f)}, \tag{6}$$

where the weights $\alpha_n$ are obtained as follows:

$$\alpha_n = \frac{\|\mathbf{q}_K, \mathbf{q}_0\|_\infty - \|\mathbf{q}_n, \mathbf{q}_0\|_\infty}{\|\mathbf{q}_K, \mathbf{q}_0\|_\infty - \|\mathbf{q}_1, \mathbf{q}_0\|_\infty}. \tag{7}$$

Here $\|\mathbf{q}_n, \mathbf{q}_0\|_\infty$ denotes the Chebyshev distance between $\mathbf{q}_n$ and $\mathbf{q}_0$, so the nearest neighbor receives weight 1 and the furthest receives weight 0.

Regarding the novelty of the proposed MLWNN approach, there are three main differences with respect to the original WNN method. Firstly, the length of the similar vectors we are looking for in the previous data, in order to generate NS, is iteratively and automatically increased. This is based on the assumption that the longer the similar sequence is, the better it represents the local behavior of the time series. The subsequence is lengthened subject to a similarity threshold whose optimal value is determined using an optimization technique, as described in the next section.

Secondly, MLWNN uses the Chebyshev distance while WNN uses the Euclidean distance. The Chebyshev distance is the largest of the element-by-element distances between the two vectors, and it has been chosen so that similarity between sequences is assessed while keeping every single element-by-element difference under control. MLWNN stops lengthening the subsequence when the Chebyshev distance between the compared vectors is above the similarity threshold (expressed as a percentage similarity). We note that if the Chebyshev distance is below the threshold, then every element-by-element distance is below this threshold.

Finally, WNN uses a window of previous days, while MLWNN does not require this, which means that a larger part of the available data (potentially all available data) can be used to make the prediction. The underlying assumption is that if we use more data, we may be able to find better candidates.

The parameter $w$ in (1) can be considered as the embedding dimension of the time series, that is, the number of previous loads that feed the regression model for estimating the next value to be predicted [19], [29]–[31]. This value can be found by other means, e.g. fractal dimensions and statistical methods, often assuming a chaotic behavior of the observed time series [32].
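Steps 1) to 8) and equations (1) to (7) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the authors' implementation: the function names are hypothetical, indices are 0-based rather than 1-based, and two cases the text leaves open are filled by labeled assumptions (equal weights when all neighbor distances coincide, and keeping the $w = 1$ neighbors when the threshold is already exceeded at $w = 1$).

```python
from typing import List, Sequence


def chebyshev(a: Sequence[float], b: Sequence[float]) -> float:
    """Eq. (3): largest element-by-element absolute difference."""
    return max(abs(x - y) for x, y in zip(a, b))


def neighbor_weights(dists: Sequence[float]) -> List[float]:
    """Eq. (7) for distances sorted in ascending order (d_1 <= ... <= d_K)."""
    d1, dK = dists[0], dists[-1]
    if dK == d1:
        # Assumption: eq. (7) is 0/0 here (e.g. K = 1); use equal weights.
        return [1.0] * len(dists)
    return [(dK - d) / (dK - d1) for d in dists]


def mlwnn_forecast(S: Sequence[float], f: int, K: int, W: int,
                   theta: float) -> List[float]:
    """Predict the f samples that follow the known series S (0-based)."""
    h = len(S)
    if h < f + K:
        raise ValueError("need h >= f + K observations")
    nearest, w_best, w = None, 0, 1
    while True:
        z = S[h - w:]  # eq. (2): the most recent w samples
        # Rows of Y_w, eq. (1): only windows whose f successors are known.
        scored = sorted((chebyshev(z, S[i:i + w]), i)
                        for i in range(h - f - w + 1))[:K]  # step 4
        d_max = scored[-1][0]
        z_max = max(abs(v) for v in z)
        # Step 5 without a division: D_max / z_max > theta.
        too_far = d_max > theta * z_max
        if too_far and nearest is not None:
            break
        nearest, w_best = scored, w  # step 6
        if too_far:
            break  # Assumption: if w = 1 already fails, keep its neighbors.
        if w >= min(h - f - K + 1, W):  # step 7
            break
        w += 1
    alphas = neighbor_weights([d for d, _ in nearest])
    total = sum(alphas)
    # Step 8, eqs. (4) and (6): weighted average of the f following samples.
    forecast = []
    for j in range(f):
        acc = sum(a * S[i + w_best + j] for a, (_, i) in zip(alphas, nearest))
        forecast.append(acc / total)
    return forecast
```

For a perfectly periodic toy series such as `[1, 2, 3, 4] * 10` with `f=4, K=2, W=8, theta=0.1`, the window grows to the upper bound `W` and the forecast reproduces the next period.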
The MLWNN algorithm aims at a more general heuristic for prediction, which tries to overcome the dependence of classical embedding approaches on how well the optimal dimension is estimated. It is also important to note that this approach provides the basis for further extensions that can utilize more complex regression models, as well as neural and fuzzy NNs. In addition, after NS has been determined, the predicted values in (5) can be obtained through a nonlinear inference system that replaces equations (6) and (7).

We note again the high flexibility of our proposed MLWNN algorithm, especially with respect to the forecasting horizon $f$. For example, MLWNN can be applied for one hour ahead prediction using the value $f = 1$. In the following sections, we report the results for $f = 24$, for the sake of comparison with the WNN, PSF and INN approaches.

IV. EXPERIMENTAL SETUP

We use electricity load data collected from Australia, Portugal and Spain for two years: from 1 January 2010 to 31 December 2011. The data are sampled every hour, thus the total

number of samples in each dataset is 2 × 365 × 24 = 17520. All datasets are publicly available: the Australian data is for the State of New South Wales (NSW); it is provided by the Australian Energy Market Operator (AEMO) and available from http://www.aemo.com.au. The Portuguese and Spanish data are provided by the Spanish Electricity Price Market Operator (OMEL) and available from http://www.omelholding.es.

The electricity load has three main cycles: daily, weekly and yearly. Figs. 1-3 show the hourly electricity load for the three countries for one month, June 2011. From these plots we can clearly see the daily and weekly cycles, which are correlated with human, industrial and commercial activities. During the day, the load has a minimum at 4 am and a first peak at 9-10 am, stays relatively stable until the end of the working day and then reaches a second peak at 6-7 pm. As expected, the load during the weekend is lower than the load during the weekdays. We can also see that the load for Portugal and Spain is more variable than the load for Australia, which might be due to greater weather and temperature fluctuations, and also to higher activity variations, especially given that the Australian data is for one region only (the state of NSW).

Fig. 1. Australian hourly loads (June 2011).

Fig. 2. Portuguese hourly loads (June 2011).

Fig. 3. Spanish hourly loads (June 2011).

There is also a yearly pattern of the electricity load, e.g. the load for 2010 is very similar to the load for 2011. This motivates using the data for 2010 as the training set to build prediction models and then using these models to predict the load for 2011 (test set data). Although the three datasets have similar cycles, the range of values is different. It is highest for the Spanish data (from 10000 MW to 40000 MW), followed by the Australian data (from 5000 MW to 14000 MW) and lowest for the Portuguese data (from 1000 MW to 8000 MW).

The objective function to be minimized was the Mean Absolute Percentage Error (MAPE), which is defined as:

$$\mathrm{MAPE} = \frac{100}{N} \sum_{n=1}^{N} \left| \frac{S_n - \hat{S}_n}{S_n} \right|, \tag{8}$$

where $\hat{S}_n$ is the estimated value of the time series at time $n$ and $N$ is the total number of samples in the test set (when evaluating performance) or validation set (when tuning parameters).

As previously explained, given that $f$ was set to 24, MLWNN requires three other parameters to be pre-specified or estimated from data: $\theta$, $W$ and $K$, for which a suitable upper bound was found to be 50. The training phase on the 2010 data was three-fold, due to the computational time constraint that we set (i.e., a prediction must be completed within an hour, before a new sample is available and the next one is to be predicted). Firstly, two suitable initial values for $W$ and $K$ were determined: $W_0$ and $K_0$. Then, a near-optimal value for $\theta$ was found through a global optimization technique. This was a 'nested' modification of a full-search global optimization algorithm: at every iteration, $m$ values of $\theta$ were drawn at random in a given interval; then, the MAPE was computed $m$ times in order to select the value $\theta^*$ minimizing it; finally, the interval for the next iteration was restricted and centered on the $\theta^*$ value. The output of such an algorithm, although not guaranteed to be the global optimum, is a 'funnel' leading quickly to a local minimum. If the objective function is convex, then it leads to the global minimum. Subsequently, the determination of local optima for $W$ and $K$ was addressed: the values of these parameters were iteratively increased until a local minimum was found. This procedure was applied to both parameters, one at a time. For each tuning, the performance was evaluated by measuring the MAPE on a validation set, which consisted of the most recent 25% of the training data samples. The optimal parameters determined for the three datasets are summarized in Table I.

TABLE I
OPTIMAL MLWNN PARAMETERS FOR EACH DATASET

Parameter   Australian   Portuguese   Spanish
θ           0.09         0.22         0.19
W           25           40           40
K           8            7            9
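The 'nested' random search for $\theta$ described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' code: the function name, the shrink factor, the number of iterations, the default $m$ and the fixed seed are all hypothetical choices, and `objective` stands in for the validation MAPE obtained with a given $\theta$.

```python
import random
from typing import Callable, Tuple


def nested_random_search(objective: Callable[[float], float],
                         lo: float, hi: float,
                         m: int = 20, iterations: int = 8,
                         shrink: float = 0.5,
                         seed: int = 0) -> Tuple[float, float]:
    """Draw m candidates per iteration, keep the best one found so far,
    then restrict the interval and center it on that best candidate."""
    rng = random.Random(seed)
    best_theta, best_err = lo, float("inf")
    for _ in range(iterations):
        for _ in range(m):
            theta = rng.uniform(lo, hi)
            err = objective(theta)  # e.g. validation MAPE for this theta
            if err < best_err:
                best_theta, best_err = theta, err
        # Assumption: the new interval is `shrink` times as wide and is
        # clipped so that it never leaves the previous interval.
        half = (hi - lo) * shrink / 2.0
        lo, hi = max(lo, best_theta - half), min(hi, best_theta + half)
    return best_theta, best_err
```

With a convex stand-in objective such as `lambda t: (t - 0.3) ** 2` on the interval [0, 1], the search narrows quickly around 0.3, which matches the 'funnel' behavior described in the text.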

We can see that the value of the parameter $\theta$ is considerably different for each dataset. It is lowest for the least volatile time series (Australia) and highest for the most volatile time series (Portugal). This result reflects the variability of the three time series, for which $\theta$ is the similarity threshold expressed as a percentage: a lower value (stricter constraint) is suitable for a less volatile time series, while a higher value (more relaxed constraint) is needed when dealing with more volatile time series. The values of $W$, and especially the values of $K$, are similar for the three datasets ($K$ is between 7 and 9).

V. PERFORMANCE EVALUATION

Once the training was completed and the parameters were tuned, MLWNN was evaluated on the test data (the 2011 data) in terms of MAPE. The evaluation procedure consisted of 365 one day ahead predictions, starting from the first hour of each day. The MAPE for each predicted hour, averaged over the whole test set, and the related standard deviations are reported in Table II. The overall average error and standard deviation were also computed and are shown in the last row. The hourly MAPE results for the three datasets are shown in Fig. 4. Although the graphs are similar, we can see that the peaks in the MAPE are not aligned: the higher the average error, the more the peak is shifted towards the hours in the middle of the day.

TABLE II
MAPE RESULTS FOR MLWNN

Hour   Australian     Portuguese       Spanish
1      1.15 ± 1.01    8.08 ± 7.30      4.29 ± 3.44
2      1.22 ± 1.11    9.06 ± 7.88      5.39 ± 4.54
3      1.31 ± 1.11    9.94 ± 8.00      6.57 ± 5.77
4      1.36 ± 1.15    10.32 ± 8.42     6.98 ± 6.39
5      1.70 ± 1.48    10.22 ± 8.50     7.31 ± 6.77
6      2.74 ± 2.79    10.44 ± 8.60     7.17 ± 7.06
7      3.88 ± 4.40    11.02 ± 9.17     6.85 ± 7.61
8      4.00 ± 4.67    12.59 ± 10.83    8.63 ± 9.26
9      3.26 ± 3.84    14.56 ± 13.62    10.47 ± 12.45
10     3.02 ± 3.28    17.60 ± 16.73    9.27 ± 10.85
11     3.22 ± 3.24    17.28 ± 17.69    7.75 ± 8.95
12     3.54 ± 3.45    16.07 ± 16.88    7.15 ± 8.19
13     3.84 ± 3.82    14.99 ± 15.60    6.97 ± 7.72
14     4.13 ± 4.29    14.16 ± 14.74    6.81 ± 7.67
15     4.37 ± 4.66    14.97 ± 15.13    6.39 ± 6.95
16     4.38 ± 4.71    16.05 ± 16.10    6.66 ± 7.30
17     4.45 ± 4.70    16.60 ± 16.95    7.26 ± 7.96
18     4.23 ± 4.37    16.98 ± 17.69    7.14 ± 7.92
19     3.73 ± 3.96    16.12 ± 16.52    6.72 ± 7.15
20     3.59 ± 3.75    14.20 ± 14.36    6.47 ± 6.85
21     3.43 ± 3.67    13.20 ± 13.03    5.73 ± 5.61
22     3.03 ± 3.25    13.00 ± 12.47    5.30 ± 5.03
23     2.52 ± 2.67    12.42 ± 11.81    4.89 ± 4.40
24     2.40 ± 2.53    12.27 ± 11.51    4.53 ± 4.09
Avg.   3.10 ± 3.63    13.42 ± 13.53    6.78 ± 7.49
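The per-hour figures in Table II can be produced by grouping the absolute percentage errors by hour of the day across all daily forecasts. The sketch below illustrates this aggregation; the function name is hypothetical, and the use of the population standard deviation is an assumption, since the text does not say which estimator was used.

```python
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def hourly_mape(actual_days: Sequence[Sequence[float]],
                predicted_days: Sequence[Sequence[float]]
                ) -> List[Tuple[float, float]]:
    """Return (mean, std) of the absolute percentage error per hour.

    Each element of actual_days / predicted_days holds the loads of one
    day, so entry hr of the result summarizes hour hr over all days.
    """
    n_hours = len(actual_days[0])
    summary = []
    for hr in range(n_hours):
        ape = [100.0 * abs(a[hr] - p[hr]) / abs(a[hr])
               for a, p in zip(actual_days, predicted_days)]
        # Assumption: population standard deviation over the test days.
        summary.append((mean(ape), pstdev(ape)))
    return summary
```

Applied to 365 daily forecasts of 24 values each, this yields 24 (mean, std) pairs, one per row of the table.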

Fig. 4. Hourly MAPE of the three datasets.

For the sake of comparison with other approaches, we also report the Mean Absolute Error (MAE), which is defined as:

$$\mathrm{MAE} = \frac{1}{N} \sum_{n=1}^{N} |S_n - \hat{S}_n|. \tag{9}$$

TABLE III
MAE RESULTS FOR MLWNN

Hour   Australian   Portuguese   Spanish
1      91.72        280.33       870.44
2      90.66        287.06       952.11
3      90.76        296.59       1054.12
4      91.12        296.01       1057.30
5      114.08       289.54       1081.34
6      193.96       289.34       1064.94
7      300.73       298.03       1049.72
8      339.66       348.77       1465.05
9      295.65       425.24       1862.34
10     281.11       596.72       1797.94
11     302.37       634.97       1610.65
12     331.63       615.43       1525.54
13     359.83       604.23       1549.05
14     385.03       556.92       1517.48
15     405.79       575.39       1406.91
16     412.07       620.54       1437.66
17     424.35       626.45       1529.08
18     411.64       619.39       1517.89
19     367.44       591.44       1456.65
20     348.55       549.40       1412.65
21     323.72       531.82       1281.74
22     274.80       524.46       1197.07
23     221.13       502.39       1087.26
24     198.63       478.23       933.97
Avg.   277.35       476.61       1321.62
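The two error measures in (8) and (9) can be written directly from their definitions. The short sketch below (function names are illustrative) shows both side by side.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Eq. (8): mean absolute percentage error, in percent."""
    n = len(actual)
    return 100.0 / n * sum(abs((a - p) / a) for a, p in zip(actual, predicted))


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Eq. (9): mean absolute error, in the units of the series."""
    n = len(actual)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / n
```

Because the MAE is expressed in the units of the load while the MAPE is relative to the load level, the two measures can rank datasets differently: in Tables II and III the Spanish series has the largest MAE, whereas the Portuguese series has the largest MAPE.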

The MAE obtained for each hour, averaged over the 2011 test set, is reported in Table III. Below we discuss the performance of MLWNN on each dataset in more detail. For comparison, we also present the results of the three state-of-the-art approaches mentioned in Sect. II: INN, WNN and PSF. As MLWNN is a generalization of WNN, it is important that we compare the two methods. The other two methods, PSF and INN, were chosen for comparison since PSF is an extension of WNN, and INN was shown to outperform WNN and PSF in [18]. In addition, we also compare the performance of MLWNN with two baselines: $B_{pday}$ and $B_{pweek}$. The daily baseline $B_{pday}$ simply predicts the loads from the previous day. The weekly baseline $B_{pweek}$ predicts the loads from the same day of the previous week.

A. Australian Data

Table IV presents the comparison results for the Australian dataset. We can see that MLWNN is the most accurate approach, followed by INN, WNN, PSF and the two baselines. We also used the t-test to evaluate whether the differences in accuracy are statistically significant. The results showed that the accuracy of MLWNN is significantly higher (+) than that of all the other methods used for comparison, at a confidence level of 99%. Figs. 5, 6 and 7 show the predicted and the actual loads for a month, week and day, respectively, in December 2011. The predicted values are relatively close to the actual values, especially taking into consideration that December is one of the months with the most volatile electricity load in Australia and is hence difficult to predict.

TABLE IV
PERFORMANCE COMPARISON FOR AUSTRALIAN DATASET

Method    MAPE    MAE     t-test
MLWNN     3.10    277.4
INN       3.36    304.9   +
WNN       3.40    307.5   +
PSF       3.96    352.0   +
Bpday     4.82    420.5   +
Bpweek    5.20    471.2   +
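The two baselines defined above amount to copying a past day. A minimal sketch, assuming the known hourly series ends exactly at the last hour of day $d$ and using hypothetical function names:

```python
from typing import List, Sequence

HOURS_PER_DAY = 24


def baseline_prev_day(S: Sequence[float]) -> List[float]:
    """Daily baseline: forecast day d + 1 with the loads of day d."""
    return list(S[-HOURS_PER_DAY:])


def baseline_prev_week(S: Sequence[float]) -> List[float]:
    """Weekly baseline: forecast day d + 1 with the loads of day d - 6,
    i.e. the same weekday one week before the day being predicted."""
    start = len(S) - 7 * HOURS_PER_DAY
    if start < 0:
        raise ValueError("need at least one full week of history")
    return list(S[start:start + HOURS_PER_DAY])
```

Both functions return 24 values, so their output can be scored with the same MAPE and MAE measures as the other methods.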

Fig. 5. NSW: predicted (blue) vs. actual load (red), December 2011.

Fig. 6. NSW: predicted (blue) vs. actual load (red), a week in December 2011.

Fig. 7. NSW: predicted (blue) vs. actual load (red), a day in December 2011.

B. Portuguese Data

Table V shows the accuracy on the Portuguese dataset. We can see that MLWNN is more accurate than WNN. Overall, MLWNN is the second best performing approach after INN; WNN is third, followed by the daily baseline and PSF, and finally by the weekly baseline. The t-test for statistical significance showed that the accuracy of MLWNN is significantly lower (-) than INN and significantly higher (+) than all other methods at a confidence level of 99%. Figs. 8, 9 and 10 show the predicted and actual load for the same month, week and day as for the Australian data. Note that the y-axes are different for the two datasets. The Portuguese dataset is the most difficult to predict. As Fig. 10 shows, the daily pattern for the Portuguese data is more complex and this may be the reason for the lower accuracy.

TABLE V
PERFORMANCE COMPARISON FOR PORTUGUESE DATASET

Method    MAPE     MAE     t-test
MLWNN     13.42    476.6
INN       11.70    426.4   -
WNN       14.95    538.9   +
PSF       16.18    589.8   +
Bpday     16.06    579.6   +
Bpweek    19.12    695.4   +

Fig. 8. Portugal: predicted (blue) vs. actual load (red), December 2011.

Fig. 9. Portugal: predicted (blue) vs. actual load (red), a week in December 2011.

C. Spanish Data

Table VI presents the comparison results for the Spanish dataset. MLWNN is the third most accurate approach, after INN and WNN. However, the accuracy of WNN is only slightly higher than the accuracy of MLWNN and this difference is not statistically significant. More specifically, the t-test for statistical significance of the accuracy differences, at a confidence level of 99%, showed that the accuracy of MLWNN is significantly higher (+) than PSF and the two baselines, significantly lower (-) than INN and not significantly different (=) from the accuracy of WNN.

TABLE VI
PERFORMANCE COMPARISON FOR SPANISH DATASET

Method    MAPE    MAE      t-test
MLWNN     6.78    1321.6
INN       5.79    1134.6   -
WNN       6.03    1179.9   =
PSF       8.87    1711.4   +
Bpday     9.47    1888.0   +
Bpweek    7.45    1460.1   +

The Spanish data is more difficult to predict than the Australian data, but easier than the Portuguese data. Figs. 11, 12 and 13 show the actual loads compared to the predicted

[...]

results to INN. On the Australian data, MLWNN was the best performing algorithm. In terms of computational cost, some optimization efforts might be carried out in future research works, also considering the framework of big data analysis [33] or exhaustive search by quantum computation [34]. Nevertheless, the strictest constraint, namely that the computational time must be less than one hour in the case of one hour ahead prediction, is comfortably met by the algorithm in almost all operating situations.

Fig. 10. Portugal: predicted (blue) vs. actual load (red), a day in December

Figures from 6.18 to 6.20 report the predicted values compared the actual Figure 6.12: Predicted (red) vs actual Portuguese loads for a day to in December 2011. must be December. specified that one of theseem mosttodifficult months but to be ones At December first sight isthe results be inaccurate, it 2011for predicted due to the presence of outliers must be specified that December is one (Christmas of the mostholidays). difficult months to be predicted duecase tofor the presence ofaoutliers ones, month,points week and atoday, in December hour in the of astarting not (Christmas fixed be holidays). hour 1 is smaller2011, than

respectively. in the standard approach. This leads to the logical conclusion that, while in New South Wales forecasting the first hour is simpler than forecasting a general hour of the day, the same can not be said about Portuguese data.

Figure Fig. 6.18:11.Predicted (red) vs(blue) actual loads forDecember December Spain: predicted vs.Spanish actual load (red), 2011.

Figure 6.18: Predicted (red) vs actual Spanish loads for December 58

this is consistent with human activity which deeply influence the behavior of Fig. 12. Spain: predicted (blue) vs. actual load (red), a week in December the timeseries. 2011. Figure 6.19: Predicted (red) vs actual Spanish loads, last week of December

Figure 6.19: Predicted (red) vs actual Spanish loads, last week of December Observing Figure 6.20, the error is higher in the central hours of the day, while the prediction is very for theinfirst last hours Observing Figure 6.20, the accurate error is higher theand central hours of of the the day: day, while the prediction is very accurate for the first and last hours of the day: 62 62

Fig. 13.

Spain: predicted (blue) vs. actual load (red), a day in December

2011. Figure 6.20: Predicted (red) vs actual Spanish loads for a day in December

In table summary, we can conclude that MLWNN able to obtain The following compares hourly MAPE and standardisdeviation when very promising results. It outperformed WNN (the algorithm the starting point is fixed and when it is not. it extends) on the Australian and Portuguese data, and was

hours 1 2 3 4 5 6 7 8 only slightly accurate the Spanish data. MAPE-365 4.29 less5.39 6.56 than 6.98WNN 7.31on 7.16 6.84 8.63 It also outperformed the 6.39 baselines the 7.60 state-of-thestd-365 3.43 4.53 all 5.76 6.77 and 7.05 7.60 art approach PSF on all datasets, and produced comparable MAPE-8760 3.68 4.22 4.65 4.96 5.19 5.38 5.54 5.70 std-8760 3.58 4.44 5.18 5.71 6.04 6.30 6.49 6.67 hours 9 10 11 12 13 14 15 16 MAPE-365 10.47 9.26 7.74 7.15 6.97 6.80 6.38 6.65 std-365 12.45 10.84 8.94 8.19 7.72 7.67 6.94 7.30 MAPE-8760 5.87 6.05 6.23 6.42 6.58 6.73 6.85 6.96 std-8760 6.86 7.02 7.22 7.41 7.59 7.72 7.79 7.86
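The comparison between a fixed starting point (hour 1 of each day) and a starting point that moves hour by hour (1-24, 2-25, 3-26, ...) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the load series is synthetic, the helper names are hypothetical, and the forecaster is a simple persistence rule (repeat the last 24 observed values) standing in for MLWNN, whose implementation is not reproduced here. With one year of hourly test data the two schemes would yield 365 and 8760 forecast origins, which presumably explains the -365 and -8760 row labels.

```python
import math
import random
import statistics

H = 24  # forecast horizon: the next 24 hourly loads

def synthetic_load(n_days, seed=0):
    """Hypothetical hourly load: daily sine cycle plus Gaussian noise (not real data)."""
    rng = random.Random(seed)
    return [100 + 30 * math.sin(2 * math.pi * (t % 24) / 24) + rng.gauss(0, 3)
            for t in range(n_days * 24)]

def persistence_forecast(series, origin):
    """Stand-in forecaster (an assumption, not MLWNN): repeat the 24 values before `origin`."""
    return series[origin - H:origin]

def hourly_ape(series, origins):
    """Absolute percentage errors (%) grouped by step ahead, 1..24."""
    errors = [[] for _ in range(H)]
    for o in origins:
        pred = persistence_forecast(series, o)
        for k in range(H):
            actual = series[o + k]
            errors[k].append(100 * abs(actual - pred[k]) / abs(actual))
    return errors

def summarize(errors):
    """Per-step (MAPE, standard deviation) pairs."""
    return [(statistics.mean(e), statistics.stdev(e)) for e in errors]

load = synthetic_load(n_days=60)
first, last = H, len(load) - H  # each origin needs 24 past and 24 future values

fixed_origins = list(range(first, last + 1, H))  # always hour 1 of a day
rolling_origins = list(range(first, last + 1))   # every hour: 1-24, 2-25, ...

fixed = summarize(hourly_ape(load, fixed_origins))
rolling = summarize(hourly_ape(load, rolling_origins))

for k in (0, 11, 23):
    print(f"step {k + 1:2d}: fixed MAPE={fixed[k][0]:.2f} std={fixed[k][1]:.2f} | "
          f"rolling MAPE={rolling[k][0]:.2f} std={rolling[k][1]:.2f}")
```

With a fixed origin, step k always corresponds to the same clock hour, so the per-step error reflects how hard that hour of the day is to predict. With a rolling origin, every step mixes all clock hours, so the per-step error mainly reflects the distance from the forecast origin.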

VI. CONCLUSION

In this paper, we presented MLWNN, a new approach for time series forecasting that is applicable to any forecasting horizon. It combines nearest neighbor sequence similarity prediction with optimization techniques. To make a prediction for a future value or a set of values, MLWNN finds similar previous sequences of values and combines them to produce a final prediction. In particular, it finds the longest sequences of values with a similarity higher than a given threshold, where the best value of the threshold is determined by an optimization procedure. In contrast to other advanced approaches, MLWNN operates directly on a single sequence and does not require the time series to be organized into vectors of a particular length (e.g. 24-dimensional vectors for the hourly electricity loads of one day). MLWNN was applied and evaluated for predicting the 24 hourly electricity loads for the next day, using data from three different countries. A comparison with several advanced forecasting methods and baselines showed that MLWNN is a promising approach for one day ahead electricity load forecasting. It outperformed WNN, the approach it extends, on two of the datasets and obtained a similar performance on the third one. In future work, we plan to investigate the sensitivity of the proposed approach to some parameter settings and to develop further extensions of MLWNN by using advanced regression models based on NNs and fuzzy NNs.

REFERENCES

[1] T. Mills, Time Series Techniques for Economists. Cambridge, UK: Cambridge University Press, 1990.
[2] A. Pankratz, Forecasting with Dynamic Regression Models. New York, NY, USA: John Wiley and Sons, 1991.
[3] D. Percival and A. Walden, Spectral Analysis for Physical Applications. Cambridge, UK: Cambridge University Press, 1993.
[4] G. Box, G. Jenkins, and G. Reinsel, Time Series Analysis: Forecasting and Control, third edition. Englewood Cliffs, NJ, USA: Prentice-Hall, 1994.
[5] R. Donaldson and M. Kamstra, “An artificial neural network-GARCH model for international stock return volatility,” Journal of Empirical Finance, vol. 4, pp. 17–46, 1997.
[6] F. Miranda and N. Burgess, “Modelling market volatilities: the neural network perspective,” European Journal of Finance, vol. 3, pp. 137–157, 1997.
[7] K. Kyoung-jae, “Financial time series forecasting using support vector machines,” Neurocomputing, vol. 55, no. 1-2, pp. 307–319, 2003.
[8] W. Xie, L. Yu, L. Xu, and S. Wang, “A new method for crude oil price forecasting based on support vector machines,” in Lecture notes in computer science, V. A. et al., Ed. Heidelberg, Germany: Springer, 2006, pp. 444–451.

[9] I. Haidar, S. Kulkarni, and H. Pan, “Forecasting model for crude oil prices based on artificial neural networks,” in Proc. of International Conference on Intelligent Sensors, Sensor Networks and Information Processing (ISSNIP 2008), Sydney, NSW, Australia, 2008, pp. 103–108.
[10] M. Panella, F. Barcellona, and R.L. D’Ecclesia, “Subband prediction of energy commodity prices,” in Proc. of the IEEE Int. Workshop on Signal Processing Advances in Wireless Communications (SPAWC 2012). Cesme, Turkey: IEEE, 2012.
[11] S. Kulkarni and I. Haidar, “Forecasting model for crude oil price using artificial neural networks and commodity futures prices,” International Journal of Computer Science and Information Security (IJCSIS), vol. 2, no. 1, pp. 81–88, 2009.
[12] P. Brockwell and R. Davis, Time Series: Theory and Methods, 2nd ed. New York, NY, USA: Springer-Verlag, 2009.
[13] M. Panella, F. Barcellona, and R.L. D’Ecclesia, “Forecasting energy commodity prices using neural networks,” Advances in Decision Sciences, vol. 2012, 2012.
[14] J. Matzenberger, “Neuronal network based modelling of demand and competing use of forestry commodities for material and energy use,” Energy Procedia, vol. 40, no. 0, pp. 156–164, 2013.
[15] A. Sato, L. Pichl, and T. Kaizoji, Using Neural Networks for Forecasting of Commodity Time Series Trends, ser. Lecture Notes in Computer Science, A. Madaan, S. Kikuchi, and S. Bhalla, Eds. Springer Berlin Heidelberg, 2013, vol. 7813.
[16] M. Panella, L. Liparulo, F. Barcellona, and R.L. D’Ecclesia, “A study on crude oil prices modeled by neurofuzzy networks,” in Proc. of 2013 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE 2013), Hyderabad, India, 2013.
[17] A. Troncoso, J. M. Riquelme, J. C. Riquelme, J. L. Martinez, and A. Gomez, “Electricity market price forecasting based on weighted nearest neighbor techniques,” IEEE Transactions on Power Systems, vol. 22, pp. 1294–1301, 2007.
[18] M. Rana, I. Koprinska, and A. Troncoso, “Forecasting hourly electricity load profile using neural networks,” in Neural Networks (IJCNN), 2014 International Joint Conference on, July 2014, pp. 824–831.
[19] F. Martinez-Alvarez, A. Troncoso, J. C. Riquelme, and J. S. Aguilar-Ruiz, “Energy time series forecasting based on pattern sequence similarity,” IEEE Transactions on Knowledge and Data Engineering, vol. 23, pp. 1230–1243, 2011.
[20] J. W. Taylor, L. M. d. Menezes, and P. E. McSharry, “A comparison of univariate methods for forecasting electricity demand up to a day ahead,” International Journal of Forecasting, vol. 22, pp. 1–16, 2006.

[21] J. W. Taylor and P. E. McSharry, “Short-term load forecasting methods: an evaluation based on European data,” IEEE Transactions on Power Systems, vol. 22, pp. 2213–2219, 2007.
[22] L. J. Soares and M. C. Medeiros, “Modeling and forecasting short-term electricity load: a comparison of methods with an application to Brazilian data,” International Journal of Forecasting, vol. 24, pp. 630–644, 2008.
[23] S. Fan and R. J. Hyndman, “Short-term load forecasting based on a semi-parametric additive model,” IEEE Transactions on Power Systems, vol. 27, pp. 134–141, 2012.
[24] S. Haykin, Neural Networks, a Comprehensive Foundation, 2nd Edition. Englewood Cliffs, NJ, USA: Prentice-Hall, 1999.
[25] S. Theodoridis and K. Koutroumbas, Pattern Recognition, Fourth Edition. Burlington, MA: Academic Press, Elsevier Inc., 2009.
[26] M. Panella, L. Liparulo, and A. Proietti, “A higher-order fuzzy neural network for modeling financial time series,” in Proc. of 2014 International Joint Conference on Neural Networks (IJCNN 2014), July 2014, pp. 3066–3073.
[27] A. Reis and A. A. da Silva, “Feature extraction via multiresolution analysis for short-term load forecasting,” IEEE Transactions on Power Systems, vol. 20, pp. 189–198, 2005.
[28] Y. Chen, P. Luh, C. Guan, Y. Zhao, L. Miche, M. Coolbeth, P. Friedland, and S. Rourke, “Short-term load forecasting: similar day-based wavelet neural network,” IEEE Transactions on Power Systems, vol. 25, pp. 322–330, 2010.
[29] S. Haykin and J. Principe, “Making sense of a complex world [chaotic events modeling],” IEEE Signal Process. Mag., vol. 15, no. 3, pp. 66–81, 1998.
[30] B. Widrow and R. Winter, “Neural nets for adaptive filtering and adaptive pattern recognition,” Computer, vol. 12, no. 3, pp. 25–39, 1988.
[31] M. Panella, “Advances in biological time series prediction by neural networks,” Biomedical Signal Processing and Control, vol. 6, no. 2, pp. 112–120, 2011.
[32] H. Abarbanel, Analysis of Observed Chaotic Data. New York, USA: Springer-Verlag, Inc., 1996.
[33] S. Scardapane, D. Wang, M. Panella, and A. Uncini, “Distributed Learning with Random Vector Functional-Link Networks,” Information Sciences, vol. 301, pp. 271–284, 2015.
[34] M. Panella and G. Martinelli, “Neurofuzzy networks with nonlinear quantum learning,” IEEE Trans. Fuzzy Syst., vol. 17, no. 3, pp. 698–710, 2009.