
electronics Article

A Prediction Methodology of Energy Consumption Based on Deep Extreme Learning Machine and Comparative Analysis in Residential Buildings

Muhammad Fayaz and DoHyeun Kim *

Department of Computer Engineering, Jeju National University, Jejusi 63243, Jeju Special Self-Governing Province, Korea; [email protected]
* Correspondence: [email protected]; Tel.: +82-64-754-3658

Received: 6 August 2018; Accepted: 18 September 2018; Published: 28 September 2018

Abstract: In this paper, we propose a methodology for energy consumption prediction in residential buildings. The proposed method consists of four layers, namely data acquisition, preprocessing, prediction, and performance evaluation. For experimental analysis, we collected real data from four multi-storied residential buildings. The collected data are provided as input to the acquisition layer. In the preprocessing layer, several data cleaning and preprocessing schemes are deployed to remove abnormalities from the data. In the prediction layer, we use the deep extreme learning machine (DELM) for energy consumption prediction. Further, we also use the adaptive neuro-fuzzy inference system (ANFIS) and the artificial neural network (ANN) in the prediction layer. In the DELM, different numbers of hidden layers, different numbers of hidden neurons, and various types of activation functions are used to find the optimal structure of the DELM for energy consumption prediction. Similarly, in the ANN, we employ different combinations of hidden neurons with different types of activation functions to find the optimal structure of the ANN. To obtain the optimal structure of the ANFIS, we employ different numbers and types of membership functions. In the performance evaluation layer, for the comparative analysis of the three prediction algorithms, we use the mean absolute error (MAE), root mean square error (RMSE), and mean absolute percentage error (MAPE). The results indicate that the performance of the DELM is far better than that of the ANN and ANFIS for one-week and one-month hourly energy prediction on the given data.

Keywords: energy; prediction; residential building; machine learning algorithms; consumption

Electronics 2018, 7, 222; doi:10.3390/electronics7100222; www.mdpi.com/journal/electronics

1. Introduction

The energy consumption in residential buildings has significantly increased in the last decade. Energy is an essential part of our lives and almost all things are in some way associated with electricity [1,2]. According to the report issued by the US Energy Information Administration (EIA), global energy demand may grow by 28% by 2040 [3]. Due to improper usage, a tremendous amount of energy is wasted annually; this wastage can be avoided by efficient utilization of energy. Smart solutions are required to ensure the proper use of energy [4]. Energy consumption prediction is very significant for achieving efficient energy maintenance and reducing environmental effects [5–7]. However, in residential buildings, it is quite challenging as there are many types of buildings and different forms of energy. Many factors also influence the energy behaviour of building structures, such as weather circumstances, the physical material used in the building construction, occupant behaviour, sub-level systems, i.e., lighting, heating, ventilating, and air-conditioning (HVAC) systems, and the execution and routines of the sub-level components [8].

Technologies based on the Internet of Things (IoT) are immensely significant for realizing the notion of smart homes. Numerous solutions for obtaining energy consumption predictions in buildings based on the IoT can be found in the literature [9]. Energy management and efficiency is the next most crucial area for IoT applications in South Korea [9]. Since 2003, homes in this country have been getting smarter and smarter with the inclusion of remote communication devices. The energy demand in South Korea is growing day by day; in 2013, South Korea was the eighth largest energy consuming country. The energy consumption in South Korea is distributed between the residential and commercial sectors (38%), the industrial sector (55%), the transport sector (1%), and the public sector (6%) [10], as shown in Figure 1.

Figure 1. Annual energy consumption distribution in the different zones of South Korea [10].

Many solutions based on machine learning algorithms have been developed for energy consumption prediction. These models use historical data, which reflect the behavior of the process to be modeled [11,12]. The machine learning techniques that have been used frequently for prediction purposes are artificial neural networks [7], the adaptive neuro-fuzzy inference system (ANFIS) [13], the support vector machine (SVM) [14], the extreme learning machine (ELM) [15], and so forth. The ELM method has some advantages over conventional NNs: it is easy to use, learns fast, provides good generalization results, can easily be applied, and can attain the least training error and the minimum norm of weights [16]. Nowadays, deep learning approaches, such as the deep neural network, the deep belief network, and the recurrent neural network, have also been used in various areas for prediction purposes [17]. The term deep learning refers to the number of layers through which data are transferred [18]. Deep learning techniques are powerful tools for obtaining better modelling and prediction performance. The datasets used in References [18–20] for time series prediction applications do not have a large quantity of data compared to datasets in the research areas of image processing, speech recognition, and machine vision. Nevertheless, in these applications, the deep learning methods worked efficiently compared to the conventional machine learning approaches due to their slightly deeper architectures and novel learning methods.

In this paper, we propose a methodology for energy prediction having four layers, i.e., the data acquisition layer, the pre-processing layer, the prediction layer, and the performance evaluation layer. We perform different operations on the data in each layer of the proposed model. In the prediction layer, we use the deep extreme learning machine (DELM) approach to improve the performance of the energy consumption prediction. The DELM takes the benefits of both extreme learning and deep learning techniques. The DELM increases the number of hidden layers in the original ELM network structure, arbitrarily initializes the input layer weights and the initial hidden layer weights along with the bias of the initial hidden layer, then calculates the parameters of the remaining hidden layers (excluding the first hidden layer), and finally uses the least square technique to calculate the output network weights. We have used the trial and error method to set the best number of hidden layers, a suitable number of neurons in the hidden layers, and a compatible activation function. The proposed DELM model was evaluated against an adaptive neuro-fuzzy inference system (ANFIS) and an artificial neural network (ANN) with regard to energy consumption prediction.


The rest of the paper is organized as follows. The related work is given in Section 2, and a detailed explanation of the proposed model, comprising the data acquisition, preprocessing, prediction, and performance evaluation modules, is given in Section 3. Section 4 discusses the experimental results based on the prediction algorithms in detail. The discussion and comparative analysis are provided in Section 5. The conclusion and future work are discussed in Section 6.

2. Related Work

Energy is an extremely vital resource and its demand is growing day by day. Saving energy is not only significant for promoting a green atmosphere for future sustainability but also vital for household consumers and energy production corporations. Electricity affects the user's regular expenses, and the user always wants to decrease their monthly expenses. Energy production companies are always under intense pressure to fulfil the growing energy demand of the commercial and domestic sectors. Techniques for proficient energy consumption prediction are essential for all stakeholders. Many researchers have made numerous efforts and developed several methods for energy consumption prediction. Generally, machine learning can achieve more accurate results in different real-world applications. Kalogirou [21] applied a back propagation neural network to predict the required heating load in buildings. The algorithm was trained on the energy consumption data of 225 buildings; these buildings ranged in size from small spaces to big rooms. Olofsson [22] proposed a method to forecast the energy requirement on a yearly basis for a small single-family building situated in Sweden. Yokoyama [23] suggested a method for cooling demand prediction in a building based on the back propagation neural network.
Kreider [24] applied a recurrent neural network to predict hourly energy consumption based on the heating and cooling energy prediction in buildings. The recurrent neural network was also used by Ben-Nakhi [25] for the cooling load prediction of three office buildings. The data used for model training and testing were collected from 1997 to 2000 for short-term energy prediction. Carpinteiro [26] used a hierarchical neural network based model for short-term energy consumption prediction. They used two self-organization maps for load forecasting. The Euclidean distance was used to calculate the distance between two vectors. They used the Brazilian utility dataset for the energy consumption prediction. Their proposed approach performed well for both short-term and long-term forecasting. Another technique based on a regression model was suggested for short-term load forecasting [27]. Irisarri [28] proposed a method of energy load prediction based on summer and winter seasons. Ali, in Reference [29], proposed a technique comprising six stages for smart home energy consumption prediction in South Korea. They used the Kalman filter as a predictor and the Mamdani fuzzy controller to control the actuators. Wahid [30] proposed a technique for energy consumption prediction in residential buildings. They calculated the first two statistical moments, namely the mean and variance, for data that consisted of hourly, daily, weekly, and monthly energy consumption. Then, a multilayer perceptron was applied to the data with statistical moments for energy consumption prediction. The trial and test method was used to find a suitable combination of neurons in the input, hidden, and output layers. Wahid [31] proposed another energy consumption prediction methodology for residential buildings. The introduced method consisted of five stages, namely data source, data collection, feature extraction, prediction, and performance assessment.
Different machine learning algorithms, such as the Multi-Layer Perceptron, Random Forest, and K-Nearest Neighbors (KNN), were used to obtain the energy consumption prediction results. Arghira [32] presented an energy consumption forecasting method for different appliances in homes. The technique used in this study was developed to predict the day-ahead electricity demand for homes. The authors used a historical dataset for homes in France. They suggested a stochastic predictor and tested two other predictors. Two pre-processing methods were also proposed, namely segmentation and aggregation. Li [33] suggested an alternate method called the hybrid genetic algorithm-adaptive network-based fuzzy inference system (GA-ANFIS) for energy consumption prediction. In their proposed approach, the GA algorithm was used as an optimizer, which assisted in developing the rule base, and the premise and consequent parameters were adjusted through ANFIS to optimize the prediction performance. Kassa, in Reference [34], proposed a model based on ANFIS for one-day-ahead energy generation prediction. The proposed model was tested on real data from a wind power generation profile, and the results provided by this method were prominent. In another paper, Ekici [35] proposed a technique using the ANFIS model to predict the energy demands of diverse buildings with different characteristics.

Nowadays, deep learning approaches have been used extensively for energy consumption prediction [18–20]. Due to their greater learning ability, deep learning (DL) methods have been used to improve performance on modelling, classification, and visualization problems. Collobert [36] proposed an approach based on a convolutional neural network (CNN) for natural language processing (NLP). Hinton [37] used a deep auto-encoder to reduce dimensionality, and the results indicate that the deep auto-encoder performs better than principal component analysis (PCA). Qiu [20] used a DL technique to predict time series on small-batch data sets. Li [18] used a deep learning technique to predict traffic flow based on time series data. A review of all these applications indicates that deep learning techniques perform better than their counterpart approaches.

Figure 2 illustrates the proposed conceptual model for energy consumption prediction. The proposed methodology consists of four main modules, namely data acquisition, pre-processing, prediction, and performance evaluation.

Figure 2. A conceptual model of the proposed approach.

3. Proposed Energy Consumption Prediction Methodology

Energy consumption prediction in residential buildings is extremely important; it assists the manager in preserving energy and avoiding wastage. Because the data are unpredictable and noisy, correct energy consumption prediction in residential buildings is a challenging task. In this paper, we have proposed a methodology based on a deep extreme learning machine (DELM) for energy consumption prediction in residential buildings. We have divided the proposed method into four main layers, namely data acquisition, preprocessing, prediction, and performance evaluation. In the data acquisition layer, we discuss in detail the data used in the experimental work. In the preprocessing layer, the moving average is used to remove abnormalities from the data. In the prediction layer, the deep extreme learning machine (DELM) is proposed to enhance the accuracy of the energy consumption results. In the performance evaluation layer, the MAE, RMSE, and MAPE [14] performance measures are used to measure the performance of the prediction algorithms. Figure 3 shows the detailed structure of the proposed method.
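The three performance measures named above can be sketched as follows. This is a minimal illustration that assumes the standard textbook definitions of MAE, RMSE, and MAPE; the function names and the sample readings are hypothetical and are not taken from the paper's dataset.

```python
import math

def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared difference."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )

def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero (the ratio is undefined otherwise).
    """
    return 100.0 * sum(
        abs((a - p) / a) for a, p in zip(actual, predicted)
    ) / len(actual)

# Hypothetical hourly readings in kWh, for illustration only.
actual = [10.0, 20.0, 40.0]
predicted = [12.0, 18.0, 40.0]

print(mae(actual, predicted), rmse(actual, predicted), mape(actual, predicted))
```

MAE and RMSE are expressed in the units of the data (kWh here), whereas MAPE is scale-free, which is presumably why the three are reported together when comparing predictors.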

Figure 3. Detailed processing diagram for the proposed energy consumption prediction approach.

3.1. Data Acquisition Layer

Figure 4 shows the data collection phase. The datasets were collected from four residential buildings from January to December 2010 [10].

[Figure 4: data collection diagram. For a 33-floor building, data are collected over a one-year period (Jan to Dec) for each day of the month (1 to 31); each record holds the temperature (F), humidity reading, environmental condition/circumstances, user occupancy, and energy consumption (kWh).]

Figure 4. Data collection, the Year 2010.

We completed the task of data collection in the data acquisition layer for the proposed work. Sensors were mostly used for the collection of contextual information such as environmental conditions, circumstances, temperature, humidity, user occupancy, and so forth. For user occupancy detection, several Passive Infra-Red (PIR) sensors were used to obtain information in 0/1 form, i.e., busy or not busy. To get the information about user occupancy, the installation of cameras in transition positions between several regions of the building was required. The data collection for the designated residential buildings was carried out from January 2010 to December 2010. The building had 33 floors (394 ft. tall); the floor-wise information was collected from smart meters and used for this work. These meters were installed floor by floor in the chosen buildings. The dataset also indicated a direct relationship between energy utilization and user occupancy. To better explain the entire energy consumption data, a box plot was used. The x-axis represents the hours of the day (24 h), and the y-axis represents the energy consumption in kWh. Each box represents the energy consumption in a particular hour of the day for the whole year. A long box indicates high energy consumption and a short box indicates low energy consumption. As residential buildings are very busy during noon and night times, the energy consumption was higher during these times. The entire dataset of hourly energy consumption used in the proposed work is shown in Figure 5 for better observation.
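The per-hour summary behind such a box plot can be sketched as below. This is only an illustration: the `(hour, kWh)` record layout, the function name `hourly_box_stats`, and the sample values are assumptions, and the quartile convention (the "inclusive" method of Python's `statistics.quantiles`) may differ from the one used to draw Figure 5.

```python
from collections import defaultdict
from statistics import quantiles

def hourly_box_stats(readings):
    """Group (hour, kWh) readings by hour and return, per hour, the
    five numbers a box plot draws: (min, Q1, median, Q3, max).

    Each hour is assumed to have at least two readings.
    """
    by_hour = defaultdict(list)
    for hour, kwh in readings:
        by_hour[hour].append(kwh)

    stats = {}
    for hour, values in sorted(by_hour.items()):
        q1, median, q3 = quantiles(values, n=4, method="inclusive")
        stats[hour] = (min(values), q1, median, q3, max(values))
    return stats

# Hypothetical readings for two hours of the day, for illustration only.
readings = [(12, 1.0), (12, 2.0), (12, 3.0), (12, 4.0), (12, 5.0),
            (3, 0.5), (3, 0.7), (3, 0.9)]
stats = hourly_box_stats(readings)
print(stats[12])
```

With one year of hourly data, each of the 24 hours would collect roughly 365 readings, one per day.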

[Figure 5: box plot titled "Consumption vs. Hour"; x-axis: Hour; y-axis: Consumption (kWh).]

Figure 5. Distribution of data, on an hourly basis, for energy consumption.

3.2. Preprocessing Layer

In the pre-processing layer, we removed abnormalities from the data. The data were assumed to contain noise due to the inherent nature of data recording, where several external aspects affect the reading. In the same way, many factors can produce outliers, such as meter problems, human mistakes, measurement errors, and so forth. Different smoothing filters can be used to remove abnormalities from the data, such as the moving average, loess, lowess, Rloess, RLowess, and Savitzky-Golay filters. In this study, we used the moving average method, which is an important filter widely used by various authors [14] for data smoothing. Equation (1) is the mathematical representation of the moving average filter.

$$y[i] = \frac{1}{M} \sum_{j=0}^{M-1} x[i+j] \tag{1}$$

where $x[\,]$ represents the inputs, $y[\,]$ denotes the outputs, and $M$ indicates the number of points in the moving average. In the proposed work, $M$ was equal to 5, which was a suitable size for data smoothing [38].

Usually, data normalization is required when the sample data are scattered and the sample span is large. Hence, the span of the data was minimized for building models and predictions. The normalization was done, in essence, to have the same range of values for each of the inputs to the machine learning models. It can guarantee a stable convergence of weights and biases. To increase prediction accuracy and improve the training process in machine learning modelling, the complete sample data were normalized to fit the interval [0, 1] by using Equation (2) given below:

$$P = \frac{x_i - x_{min}}{x_{max} - x_{min}}, \quad i = 1, 2, 3, \ldots, N, \tag{2}$$

where $P$ represents the mapped values, $x$ denotes the original data, $x_i$ is the $i$th value of the input data, and $x_{max}$ and $x_{min}$ indicate the maximum and minimum values of the original data, respectively [39].
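The two preprocessing steps, Equations (1) and (2), can be sketched together as follows. This is a minimal illustration rather than the authors' implementation: the function names and sample readings are hypothetical, and the handling of the last M − 1 points, where the window of Equation (1) would run past the end of the data, is an assumption (only full windows are kept).

```python
def moving_average(x, m=5):
    """Equation (1): y[i] = (1/M) * sum_{j=0}^{M-1} x[i + j].

    Assumption: only full windows are evaluated, so the output has
    len(x) - m + 1 values.
    """
    if m < 1 or m > len(x):
        raise ValueError("window size must be between 1 and len(x)")
    return [sum(x[i:i + m]) / m for i in range(len(x) - m + 1)]

def min_max_normalize(x):
    """Equation (2): map every value into [0, 1]."""
    x_min, x_max = min(x), max(x)
    if x_max == x_min:
        raise ValueError("constant data cannot be min-max normalized")
    return [(v - x_min) / (x_max - x_min) for v in x]

# Hypothetical hourly readings in kWh, for illustration only.
readings = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0]
smoothed = moving_average(readings, m=5)
normalized = min_max_normalize(smoothed)
print(smoothed, normalized)
```

When a model is later evaluated on unseen data, the minimum and maximum would normally be taken from the training portion only; the text does not state which convention was used.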

3.3. Prediction Layer

In the prediction layer, we have used three well-known machine learning algorithms to make one-week and one-month energy consumption predictions for the residential buildings.

3.3.1. Deep Extreme Learning Machine (DELM)

The extreme learning machine (ELM) is a well-known technique and it has been used in different fields for energy consumption prediction. The conventional artificial neural network based algorithm requires more training samples, learns more slowly, and may lead to over-fitting of the learning model [40]. The idea of the ELM was first specified in Reference [41]. The ELM is used widely in various areas for classification and regression purposes because it learns very quickly and is computationally efficient. The ELM model comprises an input layer, a single hidden layer, and an output layer. The structural model of an ELM is shown in Figure 6, where p represents the input layer nodes, q represents the hidden layer nodes, and r indicates the output layer nodes.

Figure 6. Structural diagram of an extreme learning machine (ELM).

Initially, take a set of training samples $[A, B] = \{a_j, b_j\}$ $(j = 1, 2, \ldots, Z)$, in which the input feature matrix $A = [a_{k1}\ a_{k2}\ a_{k3}\ \ldots\ a_{kZ}]$ and the target matrix $B = [b_{l1}\ b_{l2}\ b_{l3}\ \ldots\ b_{lZ}]$ consist of the training samples. The matrices A and B can be represented as in Equations (3) and (4), respectively, where p and r represent the dimensions of the input matrix and the output matrix, respectively. Next, the ELM arbitrarily sets the weights between the input layer and the hidden layer, as represented in Equation (5), where $w_{kl}$ is the weight between the kth hidden layer node and the lth input layer node. The weights between the hidden neurons and the output layer neurons can be represented by Equation (6), where $\gamma_{kl}$ is the weight between the kth hidden layer node and the lth output layer node.

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1Z} \\ a_{21} & a_{22} & \cdots & a_{2Z} \\ a_{31} & a_{32} & \cdots & a_{3Z} \\ \vdots & \vdots & & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pZ} \end{bmatrix} \tag{3}$$

$$B = \begin{bmatrix} b_{11} & b_{12} & \cdots & b_{1Z} \\ b_{21} & b_{22} & \cdots & b_{2Z} \\ b_{31} & b_{32} & \cdots & b_{3Z} \\ \vdots & \vdots & & \vdots \\ b_{r1} & b_{r2} & \cdots & b_{rZ} \end{bmatrix} \tag{4}$$

$$w = \begin{bmatrix} w_{11} & w_{12} & \cdots & w_{1p} \\ w_{21} & w_{22} & \cdots & w_{2p} \\ w_{31} & w_{32} & \cdots & w_{3p} \\ \vdots & \vdots & & \vdots \\ w_{q1} & w_{q2} & \cdots & w_{qp} \end{bmatrix} \tag{5}$$

$$\gamma = \begin{bmatrix} \gamma_{11} & \gamma_{12} & \cdots & \gamma_{1r} \\ \gamma_{21} & \gamma_{22} & \cdots & \gamma_{2r} \\ \gamma_{31} & \gamma_{32} & \cdots & \gamma_{3r} \\ \vdots & \vdots & & \vdots \\ \gamma_{q1} & \gamma_{q2} & \cdots & \gamma_{qr} \end{bmatrix} \tag{6}$$

Next, the biases of the hidden layer nodes are randomly selected by the ELM, as in Equation (7), where B here denotes the hidden layer bias vector rather than the target matrix of Equation (4). Further, the ELM selects a function g(x) as the activation function of the network. Considering Figure 6, the resultant matrix can be represented as in Equation (8). The column vectors of the resultant matrix V are represented in Equation (9), where $w_k$ is the kth row of w and $a_j$ is the jth column of A.

$$B = [b_1, b_2, b_3, \cdots, b_q]^{T} \tag{7}$$

$$V = [v_1, v_2, v_3, \cdots, v_Z]_{r \times Z} \tag{8}$$

$$v_j = \begin{bmatrix} v_{1j} \\ v_{2j} \\ v_{3j} \\ \vdots \\ v_{rj} \end{bmatrix} = \begin{bmatrix} \sum_{k=1}^{q} \gamma_{k1}\, g(w_k a_j + b_k) \\ \sum_{k=1}^{q} \gamma_{k2}\, g(w_k a_j + b_k) \\ \sum_{k=1}^{q} \gamma_{k3}\, g(w_k a_j + b_k) \\ \vdots \\ \sum_{k=1}^{q} \gamma_{kr}\, g(w_k a_j + b_k) \end{bmatrix}, \quad (j = 1, 2, 3, \ldots, Z) \tag{9}$$



Next, combining Equations (8) and (9) gives Equation (10), where H denotes the hidden layer output and $V'$ denotes the transpose of V. The values of the weight matrix γ [42,43] are computed using the least squares method, as given in Equation (11).

$$H\gamma = V' \tag{10}$$

$$\gamma = H^{+} V' \tag{11}$$
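The basic ELM training step described by Equations (5)–(11) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it handles a single output neuron, and instead of the generalized inverse $H^{+}$ of Equation (11) it solves the ridge-regularized normal equations, which avoids a pseudo-inverse routine in plain Python. The function names, the `ridge` value, and the uniform weight range are all hypothetical choices.

```python
import math
import random


def sigmoid(x):
    """Logistic activation g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + math.exp(-x))


def hidden_output(a, w, b):
    """Row of H for one sample a: g(w_k . a + b_k) for every hidden node k."""
    return [sigmoid(sum(wi * ai for wi, ai in zip(w_k, a)) + b_k)
            for w_k, b_k in zip(w, b)]


def solve_linear(m, y):
    """Solve m x = y by Gaussian elimination with partial pivoting."""
    n = len(m)
    aug = [row[:] + [y[i]] for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def train_elm(samples, targets, n_hidden=8, ridge=1e-3, seed=0):
    """Fit a single-hidden-layer ELM with one output neuron."""
    rng = random.Random(seed)
    n_in = len(samples[0])
    # Random input weights and biases (Equations (5) and (7)); never retrained.
    w = [[rng.uniform(-1.0, 1.0) for _ in range(n_in)] for _ in range(n_hidden)]
    b = [rng.uniform(-1.0, 1.0) for _ in range(n_hidden)]
    # Hidden layer output H: one row per sample, one column per hidden node.
    h = [hidden_output(a, w, b) for a in samples]
    # Assumption: ridge-regularized normal equations (H^T H + ridge I) gamma = H^T v
    # stand in for the generalized inverse of Equation (11).
    hth = [[sum(row[i] * row[j] for row in h) + (ridge if i == j else 0.0)
            for j in range(n_hidden)] for i in range(n_hidden)]
    htv = [sum(row[i] * v for row, v in zip(h, targets)) for i in range(n_hidden)]
    gamma = solve_linear(hth, htv)
    return w, b, gamma


def predict_elm(a, w, b, gamma):
    """Network output: hidden layer output weighted by gamma."""
    return sum(g_k * h_k for g_k, h_k in zip(gamma, hidden_output(a, w, b)))
```

Calling `train_elm(samples, targets)` on a list of input vectors and a list of scalar targets returns the fixed random parameters together with the fitted output weights, which `predict_elm` then reuses. Only the output weights are solved for, which is what makes ELM training a single linear solve rather than an iterative procedure.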

The regularization term γ has been used to make the network more general and more stable [44]. Deep learning is an emerging and currently very popular research topic. A network having at least four layers, including the input and output layers, meets the requirement of a deep learning network. In a deep neural network, the neurons of each layer are trained on a different set of parameters using the prior layer's output, which enables deep learning networks (DLN) to handle extensive data sets. Deep learning has attracted the attention of many researchers because it solves real-world problems efficiently. In the proposed work, we have used the DELM to combine the advantages of both ELM and deep learning. The configuration of the DELM, illustrated in Figure 7, consists of one input layer with four neurons, six hidden layers with 10 neurons each, and one output layer with one neuron. The trial and error method has been used to select the number of nodes in the hidden layers because no specific mechanism exists for specifying the number of hidden layer neurons. The projected output of the second hidden layer can be obtained as:

$$H_1 = V\gamma^{+} \tag{12}$$

where $\gamma^{+}$ represents the generalized inverse of the matrix γ. Hence, the values of hidden layer 2 can be obtained by means of Equation (11) and the inverse of the activation function.

$$g(W_1 H + B_1) = H_1 \tag{13}$$

In Equation (13), the parameters $W_1$, $B_1$, $H$, and $H_1$ represent the weight matrix of the first two hidden layers, the bias of the first hidden layer neurons, the estimated output of the first hidden layer, and the estimated output of the second hidden layer, respectively.

$$W_{HE} = g^{-1}(H_1)\, H_E^{+} \tag{14}$$

$H_E^{+}$ represents the generalized inverse of $H_E$, and the activation function g(x) is used to compute Equation (5). So, by specifying any proper activation function g(x), the desired result of the second hidden layer is updated as below:

$$H_2 = g(W_{HE} H_E) \tag{15}$$

The weight matrix γ between hidden layer 2 and hidden layer 3 is updated as in Equation (16), where $H_2^{+}$ indicates the generalized inverse of $H_2$. Therefore, the estimated result of layer 3 is represented as in Equation (17).

$$\gamma_{new} = H_2^{+} V \tag{16}$$

$$H_3 = V\gamma_{new}^{+} \tag{17}$$

$\gamma_{new}^{+}$ represents the generalized inverse of the weight matrix $\gamma_{new}$. Then the DELM defines the matrix $W_{HE1} = [B_2, W_2]$. The output of the third layer can be obtained by using Equations (10) and (11).

$$H_3 = g^{-1}(H_2 W_2 + B_2) = g(W_{HE1} H_{E1}) \tag{18}$$

$$W_{HE1} = g^{-1}(H_3)\, H_{E1}^{+} \tag{19}$$



In Equation (18), $H_2$ denotes the desired output of hidden layer 2, $W_2$ is the weight between hidden layer 2 and hidden layer 3, and $B_2$ is the bias of the hidden layer 3 neurons. $H_{E1}^{+}$ represents the generalized inverse of $H_{E1}$, and $g^{-1}(x)$ denotes the inverse of the activation function g(x). The logistic sigmoid function given in Equation (20) has been adopted. The third hidden layer output is computed as in Equation (21).

$$g(x) = \frac{1}{1 + e^{-x}} \tag{20}$$

$$H_3 = g(W_{HE1} H_{E1}) \tag{21}$$
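Because the layer-wise updates rely on both g(x) and its inverse, a short sketch of the pair may help. It assumes the logistic sigmoid of Equation (20); the function names are illustrative only. The inverse (the logit) exists only for values strictly between 0 and 1, so any matrix passed through $g^{-1}$ would need its entries kept inside that interval, which is an assumption of this sketch rather than a step stated in the text.

```python
import math


def g(x):
    """Logistic sigmoid of Equation (20)."""
    return 1.0 / (1.0 + math.exp(-x))


def g_inv(y):
    """Inverse of the logistic sigmoid (the logit), defined for 0 < y < 1."""
    if not 0.0 < y < 1.0:
        raise ValueError("g_inv is only defined on the open interval (0, 1)")
    return math.log(y / (1.0 - y))
```

Applying `g_inv` element-wise to a hidden layer output recovers the pre-activation values used in Equations (14) and (19).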

Finally, the resultant weight matrix between hidden layer 3 and the last layer output is computed as in Equation (22). The estimated result of hidden layer 3 is represented as in Equation (23).

$$\gamma_{new} = H_4^{T}\left(\frac{1}{\lambda} + H_4^{T} H_4\right)^{-1} V \tag{22}$$

$$H_4 = V\gamma_{new}^{+} \tag{23}$$

$\gamma_{new}^{+}$ represents the generalized inverse of the weight matrix $\gamma_{new}$. Then the DELM defines the matrix $W_{HE2} = [B_3, W_3]$. The output of the fourth layer can be obtained by using Equations (15) and (24).

$$H_4 = g^{-1}(H_3 W_3 + B_3) = g(W_{HE1} H_{E1}) \tag{24}$$

$$W_{HE2} = g^{-1}(H_4)\, H_{E2}^{+} \tag{25}$$

In Equation (11), $H_3$ denotes the desired output of the third hidden layer, $W_3$ is the weight between the third hidden layer and the fourth hidden layer, and $B_3$ is the bias of the third hidden layer neurons. $H_{E1}^{+}$ represents the generalized inverse of $H_{E1}$, and $g^{-1}(x)$ denotes the inverse of the activation function g(x). The logistic sigmoid function has been adopted. The output of the fourth hidden layer is computed as in Equation (26):

$$H_4 = g(W_{HE2} H_{E2}) \tag{26}$$

Finally, the output weight matrix between the fourth layer and the output layer is computed as in Equation (27). The estimated result of the fifth layer is given by Equation (28), and the desired output of the DELM network is given by Equation (29).

$$\gamma_{new} = H_5^{T}\left(\frac{1}{\lambda} + H_5^{T} H_5\right)^{-1} V \tag{27}$$

$$H_5 = V\gamma_{new}^{+} \tag{28}$$

$$f(x) = H_5\, \gamma_{new} \tag{29}$$

So far, we have discussed the calculation process of the four hidden layers of the DELM network. The cycle theory has been applied to demonstrate the calculation process of the DELM. Equations (18)–(22) can be recalculated to obtain and record each hidden layer's parameters and, eventually, the final result of the DELM network. If the number of hidden layers increases, the same computation procedure can be reused in the same way. In the proposed work, we have applied a trial and error method [30,31] to determine the optimal neural network structure. As shown in Figure 7, the inputs to the DELM are the hour of the day (X1), the day of the week (X2), the day of the month (X3), and the month (X4), and the output is the energy consumption prediction (ECP).
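The four calendar inputs can be derived directly from the timestamp of each hourly reading. The sketch below builds them for every hour of a year; the numeric encodings (hours 0 to 23, ISO weekdays 1 to 7) and the function name are assumptions made for illustration, since the encoding is not specified in the text.

```python
from datetime import datetime, timedelta


def build_features(year=2010):
    """Return one (X1, X2, X3, X4) tuple per hour of the given year."""
    rows = []
    t = datetime(year, 1, 1, 0)
    while t.year == year:
        rows.append((
            t.hour,          # X1: hour of the day, 0..23 (assumed encoding)
            t.isoweekday(),  # X2: day of the week, 1 (Mon) .. 7 (Sun)
            t.day,           # X3: day of the month, 1..31
            t.month,         # X4: month, 1..12
        ))
        t += timedelta(hours=1)
    return rows
```

For a non-leap year such as 2010 this yields 8760 rows, one per hourly energy reading, each of which would be paired with the measured consumption as the training target.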


Figure 7. Structural diagram of the proposed energy consumption prediction based on the deep extreme learning machine (DELM) approach.

3.3.2. Artificial Neural Network (ANN)

ANNs are based on biological information processing and have been extensively used for energy consumption prediction in residential buildings. ANNs are commonly used because of their robust nonlinear mapping capability. An ANN can be regarded as a regression method that captures the sophisticated nonlinearity between independent and dependent variables [45]. In recent years, researchers have deployed ANN models for analyzing numerous types of prediction problems in a variety of circumstances. The ANN model used in the proposed work is the multilayer perceptron (MLP). MLPs usually have three layers, namely the input, hidden, and output layers, consisting of input nodes, neurons, and synaptic connections. In MLPs, the backpropagation method is used to reduce the prediction residual sum of squares (RSS), which is given in Equation (30).

$$RSS = \sum_{i=1}^{n} \left(Y_i - \hat{Y}_i\right)^{2} \tag{30}$$

where $Y_i$ represents the ith target value in the training data and $\hat{Y}_i$ indicates the corresponding predicted value. The strength of the input signal is represented through synapse weights, which are initially allocated randomly. The sum of the products of each connection's input value and synapse weight is computed and provided as input to each neuron in the hidden layer. Three types of activation functions are commonly used in the hidden and output layers of an MLP, namely linear, tan-sigmoid, and logarithmic sigmoid, as represented in Equations (31)–(33), respectively [30].

$$\chi(x) = \mathrm{linear}(x) \tag{31}$$

$$\Phi(x) = \frac{2}{1 + e^{-2x}} - 1 \tag{32}$$

$$\psi(x) = \frac{1}{1 + e^{-x}} \tag{33}$$

The tan-sigmoid function is used as the activation function in the hidden layer. Selecting the best transfer function for the hidden layer is also largely a matter of trial and error [46]. In the proposed work, we have tested five transfer functions: tan-sigmoid, linear, radial basis, symmetric, and saturating linear. In the output layer, a linear function has been used, which is the most appropriate activation function for the output neuron(s) of ANNs in regression problems. The structure of the ANN used in the proposed approach is shown in Figure 8.
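The pieces described above, namely the three transfer functions of Equations (31)–(33), a forward pass with a tan-sigmoid hidden layer and a linear output neuron, and the RSS of Equation (30), can be sketched in a few lines. The function names and the list-based weight layout are hypothetical, and backpropagation itself is not shown.

```python
import math


def linear(x):
    """Equation (31): identity transfer function."""
    return x


def tansig(x):
    """Equation (32): tan-sigmoid, algebraically equal to tanh(x)."""
    return 2.0 / (1.0 + math.exp(-2.0 * x)) - 1.0


def logsig(x):
    """Equation (33): logarithmic sigmoid."""
    return 1.0 / (1.0 + math.exp(-x))


def mlp_forward(x, w_hidden, b_hidden, w_out, b_out):
    """One tan-sigmoid hidden layer followed by a single linear output neuron."""
    hidden = [tansig(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    return linear(sum(w * h for w, h in zip(w_out, hidden)) + b_out)


def rss(targets, predictions):
    """Equation (30): residual sum of squares."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(targets, predictions))
```

Here `w_hidden` holds one weight row per hidden neuron, so a network with four inputs and ten hidden neurons would use ten rows of four weights each. Training would adjust these weights to reduce `rss` over the training set.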


Figure 8. Structural model of the artificial neural network (ANN) used in the proposed approach.

Different training algorithms, such as Levenberg-Marquardt (LM), Bayesian regularization, and scaled conjugate gradient [47], have been used for network training. Developing an MLP with a number of pre-defined hyper-parameters affects the fitting ability of the model. The selection of the number of neurons in the hidden layer is largely a matter of trial and error [30].

3.3.3. Adaptive Neuro-Fuzzy Inference System (ANFIS)

ANFIS uses a feed-forward network with multiple layers, which uses NN algorithms for learning and fuzzy reasoning for mapping the input space to the output space. ANFIS is used extensively in various areas for prediction [48–50]. ANFIS is a fuzzy inference system (FIS) whose implementation is carried out in the adaptive neural framework. The structure of ANFIS for the first-order Sugeno fuzzy logic model, with two inputs and one output, is shown in Figure 9. In this structure, two membership functions have been defined for each input.

Figure 9. Structural diagram for an adaptive neuro-fuzzy inference system.

The adaptive neuro-fuzzy system consists of five layers, each of which is explained below. Layer 1 nodes are adaptive and produce the degrees of membership of the inputs through membership functions (MFs). Layer 2 nodes are fixed and perform simple multiplication. Layer 3 nodes are also fixed, and their role in the network is normalization. Layer 4 nodes are adaptive, and their output is the product of the normalized firing strength and a first-order Sugeno model; the parameters of this layer are called consequent parameters. Layer 5 has a single fixed node that computes the sum of all incoming signals. Supervised learning is used to train the network. The purpose is to train the adaptive network to approximate the known functions supplied by the training data and then to find the values of the above-mentioned parameters. There is no hard and fast rule for determining a suitable number of membership functions for a variable in ANFIS. In the proposed work, we have applied the trial and error mechanism to determine the effective number of MFs for each variable. Similarly, there are many types of membership functions, such as triangular, trapezoidal, and so forth [49]. In the proposed work, we have considered the bell-shaped membership functions given in Equations (34) and (35).

$$\mu_{X_i}(a) = \frac{1}{1 + \left|\dfrac{a - c_i}{a_i}\right|^{2x_i}}, \quad i = 1, 2 \tag{34}$$

$$\mu_{Y_j}(b) = \frac{1}{1 + \left|\dfrac{b - c_j}{a_j}\right|^{2x_j}}, \quad j = 1, 2 \tag{35}$$
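The five layers can be traced in a short forward pass for the two-input case of Figure 9. This is an illustrative sketch, not the configuration used in the experiments: it assumes one rule per combination of membership functions (grid partitioning), and the parameter names `width`, `slope`, and `center` stand in for $a_i$, $x_i$, and $c_i$ of Equations (34) and (35).

```python
def gbell(value, width, slope, center):
    """Generalized bell MF: 1 / (1 + |(value - center) / width| ** (2 * slope))."""
    return 1.0 / (1.0 + abs((value - center) / width) ** (2.0 * slope))


def anfis_forward(a, b, mf_a, mf_b, consequents):
    """Forward pass of a two-input, first-order Sugeno ANFIS.

    mf_a and mf_b hold one (width, slope, center) triple per membership function;
    consequents holds one (p, q, r) triple per rule, giving f = p*a + q*b + r.
    """
    # Layer 1: membership degrees of each input.
    mu_a = [gbell(a, *params) for params in mf_a]
    mu_b = [gbell(b, *params) for params in mf_b]
    # Layer 2: firing strength of every rule (product of one MF per input).
    firing = [ma * mb for ma in mu_a for mb in mu_b]
    # Layer 3: normalization of the firing strengths.
    total = sum(firing)
    normalized = [w / total for w in firing]
    # Layer 4: normalized strength times the first-order rule output.
    weighted = [w * (p * a + q * b + r)
                for w, (p, q, r) in zip(normalized, consequents)]
    # Layer 5: sum of all incoming signals.
    return sum(weighted)
```

With two membership functions per input, `consequents` needs four triples, one per rule. Training would tune the membership parameters of layer 1 and the consequent parameters of layer 4, which are the two adaptive layers.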

The bell-shaped membership functions are among the most common and effective MFs used in ANFIS for prediction purposes [51].

3.4. Performance Evaluation Layer

Several criteria are used to evaluate the performance of different prediction algorithms. In the performance evaluation layer of the proposed model, the MAE, RMSE, and MAPE performance indices have been used to compare the target values and the predicted values. The MAE is a measure used for the minimization of the error distribution. The RMSE measures the error between the predicted power and the target power, and the MAPE evaluates the prediction difference as a percentage of the target power. The RMSE, MAE, and MAPE are computed as in Equations (36)–(38), respectively:

$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(T_i - P_i\right)^{2}} \tag{36}$$

$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|T_i - P_i\right| \tag{37}$$

$$MAPE = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|T_i - P_i\right|}{T_i} \times 100 \tag{38}$$

where N indicates the total number of values, T represents the target value, and P indicates the predicted value. These metrics provide a single value to measure the accuracy of the outcomes of different algorithms. These statistical measurements have been used in previous studies to analyze energy consumption prediction models [34].

4. Experimental Results Based on Prediction Algorithms

4.1. Model Validation of DELM

To validate the model and analyze the experiments, we have used actual data collected with meters installed in the designated multi-storied residential buildings. The data were collected for a single year, i.e., 1 January 2010 to 31 December 2010. The size of the complete input data is 365 days × 24 h per day = 8760 hourly records. Smart meters have been installed at each floor's sub-distribution switchboard, and these meters are connected to the central server. The energy consumption for each hour is recorded for a year, and the unit used for measurement is the kilowatt hour (kWh). The data set contains floor-wise hourly energy consumption. An example view of two days of hourly collected data is illustrated in Figure 10 for the anonymized Building-04, which has 33 floors.

Figure 10. Example view of two-day hourly energy consumption data collected in Building-IV.

We have used four important parameters, namely the hour of the day (X1), the day of the week (X2), the day of the month (X3), and the month (X4), as inputs to the machine learning algorithms used in the proposed work. Further, to prevent overfitting, we used k-fold cross-validation. It is a popular method because it is simple to understand and generally results in a less biased or less optimistic estimate of the model skill than other methods, such as a simple train/test split [52]. In the proposed work, we have carried out energy consumption prediction for one week and for one month. For one-week energy consumption prediction, the one-year hourly data are divided into 52 folds of approximately equal size. The first fold is treated as the validation set, and the method is fit on the remaining folds: one week of data (7 days × 24 h = 168 h) is used for testing, and the remaining data (358 days × 24 h = 8592 h) are used to train the models. After recording the results, we swapped the training and testing data by randomly selecting another one-week set for testing and using the remainder for training. This process continues for 52 iterations. Similarly, for one-month energy consumption prediction, the data have been divided into 12 (k) sets of approximately equal size. We first used the January data (31 days × 24 h = 744 h) for testing and the remaining 11 months of data (8016 h) for training. Next, we selected the February data (28 days × 24 h = 672 h) for testing and the remaining 11 months of hourly data (8088 h) for training. The process continues until the December hourly data (31 days × 24 h = 744 h) are selected for testing and the remaining 11 months of data (8016 h) are used for training. Finally, the average of the testing results was determined.

The optimum network configuration depends on the number of hidden layers, the number of neurons in the hidden layer(s), and the type of activation function. We applied the trial and error method to achieve the optimum structure [46]. After applying it, we arrived at a well-suited configuration for the proposed DELM approach consisting of 6 hidden layers with 20 neurons in each hidden layer. The sigmoid function is used as the activation function because it is among the most popular activation functions and has been widely used in recent years [51]. We also tried different iteration numbers from 1000 to 3000 in increments of 100 and set the iteration number to 2000. Using the best-suited configuration model, the one-week and one-month hourly energy consumption results were recorded, as shown in Figures 11 and 12, respectively.
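The fold sizes quoted above follow directly from the calendar, as the sketch below shows. It is a bookkeeping illustration only, with hypothetical function names, and it assumes contiguous one-week test blocks for the weekly folds, whereas the text selects the test week at random.

```python
import calendar

HOURS_PER_DAY = 24


def hours_in_year(year):
    """Number of hourly records in a calendar year."""
    return (366 if calendar.isleap(year) else 365) * HOURS_PER_DAY


def monthly_folds(year=2010):
    """Return (month, test_hours, train_hours) for each leave-one-month-out fold."""
    total = hours_in_year(year)
    folds = []
    for month in range(1, 13):
        days = calendar.monthrange(year, month)[1]
        test_hours = days * HOURS_PER_DAY
        folds.append((month, test_hours, total - test_hours))
    return folds


def weekly_folds(n_folds=52, week_hours=7 * HOURS_PER_DAY):
    """Return (start, stop) hour indices of each one-week test fold."""
    return [(k * week_hours, (k + 1) * week_hours) for k in range(n_folds)]
```

For 2010, `monthly_folds` reproduces the 744 h / 8016 h split for January and December and the 672 h / 8088 h split for February. The 52 weekly folds cover 8736 h, so 24 h of the year fall outside the whole weeks; how those hours are assigned is not stated in the text.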



Figure 11. Actual vs. DELM predicted results for one-week energy consumption.

Figure 12. Actual vs. DELM predicted results for one-month energy consumption.

In this work, we have used ANN and ANFIS models for comparison with DELM. ANFIS was selected for its ability to seek useful features and develop the prediction model. The ANN is also a well-known technique that is widely used for energy consumption prediction.

4.2. Model Validation of ANFIS

The structure of the ANFIS for the proposed work is shown in Figure 13 [53]. We have used a trial and error approach to select the type and number of membership functions. For each variable, we have considered two generalized bell-shaped MFs, as shown in Figure 14. In total, 16 rules were specified; the rule viewer is shown in Figure 15.
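The count of 16 rules is consistent with combining every membership function of every input, since four inputs with two MFs each give 2^4 = 16 combinations. The sketch below enumerates them under that grid-partitioning assumption; the input labels and function name are illustrative.

```python
from itertools import product

INPUTS = ("hour_of_day", "day_of_week", "day_of_month", "month")  # X1..X4
MFS_PER_INPUT = 2


def enumerate_rules(inputs=INPUTS, mfs_per_input=MFS_PER_INPUT):
    """List every rule antecedent as a tuple of MF indices, one per input."""
    return list(product(range(1, mfs_per_input + 1), repeat=len(inputs)))
```

Each tuple, such as `(1, 2, 1, 2)`, names which membership function of each input takes part in one If-then rule. Adding a third MF per input would raise the count to 3^4 = 81, which is one reason the number of MFs is kept small.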

Figure 13. Screenshot of the structure of the adaptive neuro-fuzzy inference system (ANFIS) for the proposed work [53].

Figure 14. Screenshot of the membership functions used in ANFIS [53].

Figure 15. Screenshot of the If-then rules after training [53].

The predicted results for one-week and one-month energy consumption are shown in Figures 16 and 17, respectively.



Figure 16. Actual vs. ANFIS predicted results for one-week energy consumption.

Figure 17. Actual vs. ANFIS predicted results for one-month energy consumption.

4.3. Model Validation of ANN

In the proposed work, to achieve the best ANN prediction model, we tried different hidden layer activation functions, different training functions, and different output layer transfer functions rather than analyzing the sensitivity of the input parameters. All the network models have four neurons in the input layer and a single neuron in the output layer. For the hidden layer, we tried neuron counts from 5 to 30 in increments of five to find the best combination of input layer, hidden layer, and output layer neurons. The trial and error method has been applied to determine the number of neurons in the hidden layers [30]. We have considered the model shown in Figure 18 because it provides the least MSE values with the tan-sigmoid function in the hidden layer, a linear function in the output layer, and the Levenberg-Marquardt algorithm for training.

Figure 18. ANN structure model to predict energy consumption.

The recorded energy consumption prediction results for one week and one month are shown in Figures 19 and 20, respectively.



Figure 19. Actual vs. ANN predicted results for one-week energy consumption.

Figure 20. Actual vs. ANN predicted results for one-month energy consumption.

5. Discussion and Comparative Results Analysis 5. Discussion and Comparative Results Analysis In the proposed work, we have applied the deep extreme learning algorithm along with ANN and In on thereal proposed work, we applied the deep extreme learning algorithm along ANN ANFIS data collected forhave one year to predict energy consumption in buildings forwith one-week and one-month. ANFIS on real data have collected for one year to consumption buildings for and The data been pre-processed to predict remove energy abnormalities from theindata and make one-week and one-month. The data have been pre-processed to remove abnormalities from the data the data smooth and error free. In the DELM different number of hidden layers, hidden neurons, and makecombinations the data smooth and errorfunctions free. In the DELM number of hidden layers, hidden different of activation have been different tried to find the best configuration model neurons, different combinations of activation functions have been tried to find the best for energy consumption prediction. For a fair comparison, we also apply trial and error approach configuration model for energy consumption prediction. For a fair comparison, we also apply trial for ANN to find the best configuration model. Hence, we have tried different numbers of neurons in and error approach fortypes ANNoftoactivation find the best configuration model. Hence,ofwe have tried hidden layers, different functions, and different numbers neurons in thedifferent hidden numbers of neurons in hidden layers, different types of activation functions, and different numbers layer. Similarly, we also tested different types of membership functions and different numbers of of neurons infunctions the hidden layer. Similarly, westructure also tested different types ofconsumption membershipprediction. 
membership functions to achieve a suitable structure of ANFIS for energy consumption prediction.

In this work, we have applied the proposed DELM to energy consumption prediction over two different periods of time, along with the optimized ANN and ANFIS approaches, to test the efficiency of these algorithms properly. For one-week energy consumption prediction, the portion of the data assigned to the training set is larger than for one-month energy consumption prediction. Hence, to properly evaluate the performance of the prediction algorithms, both short-term and long-term energy consumption prediction have been carried out.

We have used different statistical measures to assess the performance of the proposed DELM algorithm along with the counterpart algorithms. Tables 1 and 2 record the MAE, RMSE and MAPE values of DELM, ANN and ANFIS for one-week and one-month energy consumption prediction. Because we have computed both the one-week and the one-month energy consumption prediction, the averages of the statistical measures over both periods are given for each prediction algorithm in Table 3. These values indicate that DELM performs better than the counterpart algorithms. The performance of ANFIS is better as compared to the ANN.

Table 1. Performance evaluation of deep extreme learning machine (DELM), adaptive neuro-fuzzy inference system (ANFIS) and artificial neural network (ANN) for one-week energy consumption prediction.

| Statistical Measures | DELM | ANFIS | ANN |
|---|---|---|---|
| MAE | 2.0008 | 2.2679 | 2.3918 |
| MAPE | 5.7077 | 6.3884 | 6.7097 |
| RMSE | 2.2451 | 2.4636 | 2.6030 |
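
As an illustration of how the three statistical measures are obtained, the following minimal Python sketch assumes the standard definitions of MAE, RMSE and MAPE (with MAPE expressed in percent). The function names and the sample readings are hypothetical and are not taken from the collected data.

```python
import math


def mae(actual, predicted):
    """Mean absolute error: average of |actual - predicted|."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean square error: square root of the mean squared error."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Undefined when any actual value is zero (division by zero).
    """
    ratios = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * ratios / len(actual)


# Hypothetical hourly readings (made-up numbers, not the paper's data).
actual = [40.0, 50.0, 20.0, 25.0]
predicted = [38.0, 53.0, 21.0, 24.0]

metrics = {
    "MAE": mae(actual, predicted),
    "RMSE": rmse(actual, predicted),
    "MAPE": mape(actual, predicted),
}
print(metrics)
```

Lower values of all three measures indicate a closer fit. RMSE penalizes large individual errors more strongly than MAE, which is why RMSE is never smaller than MAE on the same data.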

Table 2. Performance evaluation of DELM, ANFIS and ANN for one-month energy consumption prediction.

| Statistical Measures | DELM | ANFIS | ANN |
|---|---|---|---|
| MAE | 2.3347 | 2.6433 | 2.5437 |
| MAPE | 6.5464 | 7.3798 | 7.4562 |
| RMSE | 2.6864 | 3.1712 | 3.2400 |

Table 3. Average values of statistical measures for one-week and one-month energy consumption prediction results of DELM, ANFIS, and ANN.

| Statistical Measures | DELM | ANFIS | ANN |
|---|---|---|---|
| MAE | 2.1677 | 2.4556 | 2.4317 |
| MAPE | 6.1271 | 6.8841 | 7.0830 |
| RMSE | 2.4657 | 2.8174 | 4.8561 |
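
The averaging behind Table 3 can be sketched as follows, assuming that each entry is the simple arithmetic mean of the corresponding one-week (Table 1) and one-month (Table 2) values. The dictionary layout and the helper name `average_metrics` are hypothetical. Under this assumption, the DELM and ANFIS entries and the ANN MAPE entry agree with Table 3 to within rounding. The ANN MAE and RMSE entries printed in Table 3 do not match this simple mean, so the sketch does not reproduce them.

```python
# Values transcribed from Table 1 (one-week) and Table 2 (one-month).
one_week = {
    "DELM":  {"MAE": 2.0008, "MAPE": 5.7077, "RMSE": 2.2451},
    "ANFIS": {"MAE": 2.2679, "MAPE": 6.3884, "RMSE": 2.4636},
    "ANN":   {"MAE": 2.3918, "MAPE": 6.7097, "RMSE": 2.6030},
}
one_month = {
    "DELM":  {"MAE": 2.3347, "MAPE": 6.5464, "RMSE": 2.6864},
    "ANFIS": {"MAE": 2.6433, "MAPE": 7.3798, "RMSE": 3.1712},
    "ANN":   {"MAE": 2.5437, "MAPE": 7.4562, "RMSE": 3.2400},
}


def average_metrics(first, second):
    """Arithmetic mean of two result tables, per model and per measure."""
    return {
        model: {
            measure: (first[model][measure] + second[model][measure]) / 2
            for measure in first[model]
        }
        for model in first
    }


averages = average_metrics(one_week, one_month)
for model, values in averages.items():
    print(model, {measure: f"{value:.4f}" for measure, value in values.items()})
```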

The statistical measures indicate that the proposed DELM achieves the lowest MAE, MAPE and RMSE in Tables 1 to 3, and therefore performs better than the ANN and ANFIS for both short-term (one-week) and long-term (one-month) hourly energy consumption prediction. On the given data, the proposed DELM is thus the best choice among the three algorithms for both short-term and long-term energy consumption prediction.

6. Conclusions and Future Work

Modelling energy consumption in residential buildings is a challenging task because of randomness and noisy disturbances. To obtain better prediction accuracy, in this paper we have proposed a model for energy consumption prediction in residential buildings. The proposed model comprises four layers, namely the data acquisition layer, the preprocessing layer, the prediction layer, and the performance evaluation layer. In the data acquisition layer, the data were collected through smart meters in a designated building to validate the model and analyze the results. In the preprocessing layer, several preprocessing operations were carried out to remove abnormalities from the data. In the prediction layer, we have proposed the deep extreme learning machine and applied it to the pre-processed data for one-week and one-month energy consumption prediction in residential buildings. The purpose of using different machine learning algorithms on the collected data was to obtain better accuracy for practical applications. To obtain the optimal structure of DELM, the number of hidden layers, the number of neurons in each hidden layer, and the activation function have been tuned. We have also applied other well-known machine learning algorithms, namely ANN and ANFIS, to the same data for comparison with the proposed DELM. We have used different statistical measures to evaluate the performance of these machine learning algorithms. These statistical measures indicate that the proposed DELM performs better than the counterpart algorithms. These initial results give us confidence, and we are currently exploring various alternatives and collecting data to extend this work.

Author Contributions: M.F. designed the proposed scheme, implemented the system, performed the experimental work, and wrote the paper. D.K. conceived the overall idea for energy consumption prediction in residential buildings and supervised the overall work.

Funding: This research received no external funding.

Acknowledgments: This research was supported by the 2018 scientific promotion program funded by Jeju National University.

Conflicts of Interest: The authors declare no conflicts of interest.

© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).