The Effect of Missing Data on Wind Resource Estimation

3 downloads 0 Views 2MB Size Report
paper investigates the effect of missing time-series data caused by icing of anemometers ... power output generated from a wind turbine for each time point and ...
The Effect of Missing Data on Wind Resource Estimation Aidan Coville∗, Afzal Siddiqui,† and Klaus-Ole Vogstad‡

The available wind resource at a potential wind farm site is estimated using historical data, usually from an anemometer measuring wind speed and a weather vane measuring wind direction, set up at the site of interest. This paper considers the influence of missing data due to icing of machinery during the winter on the wind resource estimation. Using a mean-reverting, jump-diffusion process to model electricity prices in a deregulated market, the resulting effect on the expected revenue from a wind turbine constructed at the site is also considered. We show that missing data due to icing significantly biases the wind resource estimate downwards. This effect is dampened when considering revenue estimates due to the high volatility experienced in the electricity spot price market. However, when considering either a fixed price or low-volatility market for electricity, the effect of missing data on revenue estimates is highly significant. We develop a methodology to account for the seasonality experienced in both wind speed and direction, which succeeds in removing most of the bias in wind resource and revenue estimation caused by missing data.

1

INTRODUCTION

Changes in energy policy to mitigate the effects of climate change have encouraged investment in renewable energy technologies. In order to develop these technologies optimally, a better understanding of the economic viability of alternative energy sources is required. While the sun and wind provide free and sustainable energy, these sources are highly intermittent which results in a cost to the energy supplier. A deeper understanding of the processes driving this intermittency and the possible sources of bias that may influence resource estimates would add value to the decision-making process regarding renewable energy investments. In this paper, we focus on wind power, which is being developed extensively as more advanced turbines with capacities of up to 3 MW are better able to exploit resources. The planning phase for wind farms follows three distinct stages. Initially, a permit is obtained, giving the investor the right to build on a site. Wind masts are then installed with anemometers to measure the speed and weather vanes to measure the direction of wind in order to estimate the available wind resource at the site.1 Data are then collected over a period of time and, together with information on operating costs, electricity prices, ∗

Department of Economics, University of Oxford, E-mail: [email protected] Department of Statistical Science, University College London, and Department of Computer and Systems Sciences, Stockholm University/KTH, E-mail: [email protected] ‡ Agder Energi, Kristiansand, Norway, E-mail: [email protected]

1

1

INTRODUCTION

2

subsidy levels, and investment costs, a decision is made on whether or not to construct the wind farm along with its scale. This paper concerns itself with the second step of the production process - namely, the collection of site data to support the decision-making process of whether or not to continue to the construction phase of a project. Good-quality, reliable data are needed to make informed decisions, and regardless of how sophisticated the subsequent analyses are, bad-quality data can seriously jeopardise the integrity of a project. In particular, this paper investigates the effect of missing time-series data caused by icing of anemometers on annual wind resource estimates. Many sites ideal for the construction of wind farms are found in the cold, northern regions of Scandinavia. However, high average annual wind speeds and low temperatures mean that anemometers are often susceptible to freezing in the winter. Heated anemometers are slowly becoming more readily available, but these are expensive and not guaranteed to solve the problem. The result of missing data is a time series with gaps in portions of the winter months. It is also well known that wind speeds in the Nordic regions are generally higher in the winter months than during the summer. Thus, the missing data are not missing at random, but are rather excluded from the time series in a systematic way. Due to this systematic missingness, it is possible that, by ignoring the missing data, bias may be introduced into the estimates of annual wind resources. Here, we use a dataset of wind speed and direction for two years at a site in southern Norway unaffected by icing, and, with the aid of Monte Carlo simulation, find the distribution of the annual wind resource available. Data are then artificially removed from the time series to create the effect of missingness due to icing. The distribution of the wind resource is once again estimated and compared to that when the full dataset was used. In this way, the bias due to icing can be estimated. The effect of this change in the annual wind resource distribution on the present value of expected revenues generated from a single wind turbine constructed at the site of interest is then evaluated.2 We consider the case where investors are faced with the uncertainty of electricity spot prices in a deregulated market and compare this to a fixed-price feed-in tariff agreement. A number of features unique to electricity markets are addressed in this paper. Although electricity is a commodity with a price determined by classic supply and demand features, it is highly variable in the short term due to unavailability of storage. The price of electricity is believed to revert to a mean value in line with its marginal cost of production in the long term (Lucia and Schwartz, 2002). At very high (low) prices, suppliers enter (leave) the market, thereby driving the prices down (up). Based on patterns of electricity demand, prices also exhibit spikes and strong levels of seasonality. We use data from Nord Pool’s day-ahead spot market, which provides prices for each hour of the following day based on the intersection of supply and demand. Nord Pool spot prices are low in summer months and increase in the winter as more heating and lighting is used (Bystr¨om, 2005). Prices during off-peak hours such as at night and on weekends also decrease correspondingly to demand, which results in a distinct daily and weekly pattern 1

The wind resource at a site refers to wind available for energy exploitation, usually described by the joint distribution of wind speed and direction over a grid. Details of how the wind resource is estimated will be discussed in the following section. 2 Annual revenue estimates are obtained by multiplying the expected electricity price by the expected power output generated from a wind turbine for each time point and summing over a year.

2

MODELLING TOOLS

3

in the Nord Pool spot price time-series data. From the data, we find that electricity prices and wind speeds have a small, but significant, positive correlation. However, we find a correlation that is not significantly different from zero when we deseasonalise both processes. This indicates that the observed dependency between the two processes can be explained by the underlying seasonality that both processes share. Wind-powered electricity supply to Nord Pool constitutes a small percentage of overall supply, and from these results, it seems reasonable that changes in supply (from changing wind speeds) do not significantly affect electricity spot prices after accounting for seasonal fluctuations.With this conditional independence in mind, we propose a method for removing the seasonality present in wind direction and wind speed before imputing the missing wind values and estimating the wind resource and corresponding revenue at the site. In order to make this problem manageable, a few simplifications are assumed. First, no long-term patterns in wind speed or electricity prices are considered. As such, this paper deals with a one-year time horizon when considering wind resource and revenue estimation. Second, a wind farm usually consists of a number of wind turbines across a site; however, we consider only the case that a single wind turbine with a rated power of 850 kW is to be constructed at the location where the wind speeds are observed. Revenue is assumed to be derived solely from the wind turbine, and no external sources of revenue such as government subsidies are considered. Finally, it is assumed that given the choice of wind turbine, investment and operating costs are constant over the year and are independent of revenue calculations. Costs are not explicitly accounted for in this paper, but, under the given assumption, could be easily incorporated to estimate expected profits based on expected revenues. The structure of this paper is as follows: in Section 2, we survey the current tools available for modelling wind speed, wind energy, and electricity spot prices. In Section 3, the data used for the empirical analysis are considered in more detail. Section 4 outlines the models chosen to estimate wind speed, wind energy, and electricity spot prices, and Section 5 discusses the simulation procedure and its subsequent results. The paper concludes in Section 6 with the implications these results have for the wind energy sector.

2 2.1

MODELLING TOOLS Wind Resource Estimation

Wind speed, w , has often been shown to fit a Rayleigh distribution, but is further generalised to the Weibull distribution (of which Rayleigh is a special case) with the following probability density function (PDF): fW (w) =

κ w κ−1 ( wγ )κ ( ) e , γ γ

w ≥ 0.

(1)

This distribution captures the main components of wind speed: non-negative observations with a positively skewed distribution, thereby indicating the possibility of large wind speeds in the tail of the distribution with a mean value larger than the median. The parameters κ and γ are referred to as the shape and scale parameters, respectively.

2

MODELLING TOOLS

4

It is further understood that wind speed and wind direction are dependent. Wind direction is an important attribute of the wind regime that needs to be understood by developers as wind turbines are sited at optimal angles, perpendicular to the prevailing winds in the region, in order to avoid wake loss3 when multiple turbines are being constructed in close proximity. To deal with wind direction, which is a circular, continuous variable, wind direction data are aggregated into sectors and treated as categorical variables. The loss of accuracy in aggregation, while not ideal, is far outweighed by the simplification of the resulting analysis. In this paper, we aggregate wind direction into 12 sectors of 30 degrees each. It is then assumed that the conditional distribution of the wind speeds in each directional sector is Weibull with different shape and scale parameters. Here, the parameters have been calculated using maximum likelihood estimation (MLE). The power available from wind is proportional to the wind speed cubed, meaning that small variations in an estimate of the wind speed result in very large deviations in the corresponding power output. This makes it important to measure wind speeds accurately. Furthermore, it is assumed that the relationship between the wind speed and the resulting power output from a given wind turbine is described by the deterministic power curve. While each turbine model will have different characteristics, all power curves follow a similar pattern. There is no power output at wind speeds below a certain level (usually about 4 m/s), which is called the cut-in point. The curve then increases in proportion to the cubic value of the wind speed up to a point, increasing at a decreasing rate thereafter. The rated power level is the point at which the curve reaches its maximum, usually between 12 and 15 m/s. The turbine is then controlled through a variety of methods such that the power output does not exceed this level even as wind speeds increase. For safety reasons, there is a cut-off point around 25 m/s at which point the turbine is shut down. Figure 1(a) shows the typical power curve of a Vestas V52 wind turbine with a rated power of 850 kW. Anahua et al. (2007) addresses the fact that these power curves are in fact only estimates of the true relationship between power output and wind speeds and can be affected by site conditions. It accounts for this by modelling power output as a stochastic process; however, this is beyond the scope of our paper. Instead, the reference power curve in this paper is defined by the following simple set of equations, seen graphically in Figure 1(b):

PW (w) =

 3   0.5w − 21.4375

850

  0

3.5 ≤ w < 12 12 ≤ w < 25 otherwise

(2)

When the power curve, PW (w ), the Weibull PDF for each sector, fWi , and the relative frequency of wind directions, ri , for each of the s sectors are known, the annual energy production (AEP) is calculated using the following formula: AEP = n ×

s X

Z

ri

fW i (w)PW (w)dw

(3)

i=1

Here, n is the number of time points in a year, depending on the resolution of the observations available. For instance, if data were available in daily intervals, then n would be 3

Wake loss, in this case, describes the loss of exploitable power from wind due to the turbulence created from wind passing over another turbine.

2

MODELLING TOOLS

5

Figure 1: Power Curves

365. In this form, the AEP does not explicitly consider seasonality since there is no time component to the calculation. We deal with this in Section 5.4 by modelling the seasonal component of the wind speed directly and retaining the time dimension of the data.

2.2

Icing

Icing of instruments is a common problem for potential wind farm developers trying to obtain an initial dataset from which to estimate the wind resource in a region. Lacroix and Manwell (2000) argues that in areas of North America, in high altitudes, icing can be expected up to 10% of the time. As wind speeds tend to increase in winter, these missing data will mean that the wind resource may be underestimated by fitting incorrect parameters to the Weibull distributions. Harstveit et al. (2004) presents a method for modelling the likelihood of instrument icing using altitude, wind speed, cloud cover, and temperature as influential factors. This paper simplifies the approach by assuming that icing will occur in the coldest periods of the year. As such, the effects of icing are mimicked by removing the observations corresponding to the lowest 2%, 5%, and 10% of temperatures. This artificial removal captures some of the elements of missing data due to icing: namely, data are removed from the coldest periods, generally when wind speeds are at their highest, and data are removed in batches. As temperature is highly correlated from one time point to the next, the coldest temperatures are usually found close together. Hence, the removal of these batches mimics the nature of icing where data will be lost until the mast thaws or is de-iced manually.

2.3

Electricity Spot Prices

The three defining features of electricity price processes are seasonality, mean-reversion, and occasional spikes. Electricity markets often exhibit strong seasonality components dictated by changes in demand throughout the year. The electricity price, Pt , can thus

2

MODELLING TOOLS

6

be decomposed into a deterministic seasonal component and a random process as follows: Pt = St eXt

(4)

where Xt is a stochastic process and the seasonality component, St , in this case is multiplicative. By modelling the log of electricity prices, this can be separated from the random component: log(Pt /St ) = Xt (5) In practice, there are a number of ways in which the seasonal component can be estimated. Lucia and Schwartz (2002) posits that the annual seasonality in the Nord Pool Elspot market can be accounted for through a sinusoidal function, and the weekly seasonality can be removed by using a dummy variable for weekends. These parameters can be estimated through nonlinear regression methods. Weron et al. (2004) uses wavelet decomposition, breaking the time series into small and large frequencies, and then estimating the resulting cycles using a sine function, while Cartea and Figueroa (2005) uses Fourier transforms to estimate the annual cycle of the deregulated UK market. The above papers do not model daily patterns and opt to use a single day as the data resolution level, aggregating the hourly observations accordingly. Harvey (1989) presents a method for removing seasonality by using a mixture of sine and cosine functions such that: [s/2]

St =

X

(γ1j cos λj t + γ2j sin λj t),

t = 1, 2, ..., n

(6)

j=1

where s is the number of time steps over which the seasonal cycle occurs and λj = 2πj /s. When s is an odd number, s/2 is rounded down. In this paper, we follow a similar method to that proposed by Lucia and Schartz (2002) to account for yearly and weekly seasonality. Daily seasonality is a more complex process requiring more parameters for accurate modelling. In this case, we follow the method proposed by Harvey (1989) to account for daily fluctuations. After removing the seasonality component, we use the Ornstein-Uhlenbeck model to capture the mean-reverting properties of the process Xt that drives electricity prices: ¯ − Xt )dt + σdZt dXt = η(X

(7)

¯ is the mean-reversion level in the long term, where dZt is a Brownian motion increment X and η is the speed at which the process reverts to the long-term mean. Dixit and Pindyck (1994) shows that this process in its discrete form is equivalent to an autoregressive model of lag 1 - AR(1): Xt − Xt−1 = a + bXt−1 + t (8) Here, t is normally distributed with mean zero and standard deviation σ , and the parameters a and b can be estimated by simple linear regression. The estimates found in the discrete form of the mean-reverting process relate to the continuous form in the following way: ¯ = −ˆ X a/ˆb, ηˆ = −log(1 + ˆb)

(9) (10)

3

DATA

7

and

v u u log(1 + ˆ b) σ ˆ = σˆ t

(1 + ˆb)2 − 1

(11)

Although the simple mean-reverting model captures the basic attributes of electricity spot prices, it fails to describe the inherent spiky nature of the market. By visual inspection of historical data, it is clear that large spikes occur more frequently than one would expect if the changes in electricity prices were truly explained by a normal distribution, as implied by Equation (7). By incorporating a jump process, the model accounts for random spikes that would not be predicted by a simple mean-reverting model. These jumps are captured by a Poisson process, independent of the mean-reversion diffusion process. More formally, ¯ − Xt )dt + σdZt + Jt dqt dXt = η(X

(12)

where Jt is the size of the jump, determined by a chosen distribution and dqt follows a Poisson process with some intensity parameter indicating the frequency of these jumps. Weron et al. (2004) shows that, by choosing an appropriate method to remove spikes from the observed data, the remaining observations are much better suited to the assumption of normally distributed increments.

3

DATA

The company Agder Energi has set up three wind masts within a few kilometres of each other in southern Norway with the intention of estimating the wind resource in the region. For each of the three masts, Geitvassfjellet, Holmavatnet, and Leksarvatnet, data are recorded continuously, but reported as ten-minute interval averages. Average temperature, wind speed, wind direction, and their corresponding variances have been recorded at heights of 48 m and 30 m for a period of two years. These datasets suffer from a small amount of missing data, but Geitvassfjellet at 48 m was chosen as the reference mast with just over 1% of its data missing. It is on the Geitvassfjellet mast that the analysis in this paper is based. The analysis uses data from the time period 15 May 2006 - 14 May 2007, where the second year of observations (15 May 2007 - 14 May 2008) is used as an out-of-sample dataset with which to compare the results obtained using the first year’s data. The Nord Pool electricity spot price market is the oldest and most reliable market of its kind. Hourly data are available from 1996; however, the data used in this paper span the interval of 1 January 1999 - 14 May 2008 and were obtained from the Nord Pool FTP server in Swedish krona (SEK) per MWh.

4

MODEL FITTING

The model fitting and comparison methodology in this paper follows two distinct paths. Alternative electricity spot price models are considered first, after which, an imputation method for calculating the AEP for different levels of missing data is proposed and compared to the case in which missing data are ignored. Revenue estimates are then

4

MODEL FITTING

8

obtained for each method based on energy production and electricity price simulations. This methodological process is summarised as a flowchart in Figure 2.

Figure 2: Methodology Flowchart

4.1

Electricity Spot Price

Initially, seasonality is removed from the time series as Step A.2 from the methodology flowchart. It is assumed that the seasonality component is multiplicative, where the spot price is described by Equation (4).Through descriptive statistics and an observational analysis, the nature of the daily, weekly, and annual seasonality is interpreted. Based on this, the methods described in Section 2.3 are used to remove the deterministic seasonal component from the observed data, allowing the deseasonalised spot price data to remain. The seasonal component is then added back to the data after the diffusion process has been estimated. This method of deseasonalisation is applied to both of the models proposed below. Step A.3 of the methodology involves fitting two models to the Nord Pool electricity spot price. The first model is a simple mean-reverting diffusion process defined by Equation (7). In this case, the mean-reverting level and mean-reverting speed are estimated using the regression model described in Equation (8). The second model incorporates the possibility of large spikes by fitting a jump-diffusion process as described by Equation (12). This paper follows the methodology proposed by Blanco and Soronow (2001) and used by Cartea and Figueroa (2005) and Bierbrauer et al. (2004) to do this. Spikes are first identified in the original time series by differencing the log of the spot price and flagging all differences exceeding three standard deviations. These values are filtered from the time series, and the process is repeated until the number of differences of size three standard deviations or larger is that to be expected under the assumption that the differences where normally distributed. Using standard Normal tables, this would be approximately 0.26% of the data. A mean-reverting process is then fitted to the remaining data, and the observations flagged as spikes are modelled separately under the assumption that the spikes are independent of the diffusion process. A shifted lognormal

4

MODEL FITTING

9

distribution is fitted to the jump sizes using MLE to determine the parameters of the distribution. Although a rather ad hoc method, this process explicitly addresses the fact that the increments in the log spot price are not normally distributed, as assumed in the simple mean-reverting process, and accounts for the presence of extreme jumps away from the mean. In this case, the timing of the jumps is assumed to follow a Poisson process, where the number of jumps expected at each time point is Poisson distributed and the waiting time until the next jump is exponentially distributed. This assumption ignores the fact that jumps are more likely to occur in close proximity to each other, which would contradict the memoryless property of the exponential distribution. A regime-switching mechanism, or a more appropriate renewal process could be incorporated into the model to account for this, however that is beyond the scope of our paper. Using one hour as the time interval, 1000 yearly sample paths are then simulated based on the fitted models and discounted to present value using a continuous discount rate of 10%. The seasonality component is reapplied, and these sample paths are then used in conjunction with the simulated energy output to estimate the distribution of present value revenue.

4.2

Wind Speed and Direction

Using the Geitvassfjellet mast data for wind speed and direction, 2%, 5%, and 10% of the data are artificially removed from the available set as Step B.2 of the methodology. Step B.3 of the methodology involves calculating the AEP for each level of missing data. Initially, Weibull distributions are fitted to each of the twelve directional sectors using the available data and MLE to estimate the shape and scale parameters. The AEP is then calculated using Equation (3) for each level of missing data. In order to assess the significance of these differences, Monte Carlo simulation is used to obtain confidence intervals for the point estimates obtained. For each time step (in this case there are 8760 time steps representing the number of hours in a normal year), the directional sector is first chosen at random, based on the relative frequencies, ri , observed in the dataset. A random value is then generated from the fitted Weibull distribution corresponding to the chosen sector, thus providing the wind speed for each point in time. These wind speeds are then converted to power using the chosen power curve to provide the energy yield for each hour of the year. This process is repeated 1000 times to obtain a distribution for the AEP from which inferences can be made. As Step C of the methodology, the 1000 simulated wind speed sample paths are then multiplied by the corresponding 1000 simulated electricity prices and summed over the year to obtain an estimated yearly revenue distribution based on the current wind regime and electricity spot price market. This allows one to assess how influential the change in AEP estimates due to missing data is in the context of electricity price uncertainty. Step B.3.1 of the methodology deals with the bias observed in the AEP calculations by attempting to remove the seasonality present in both the wind speed and direction before fitting the Weibull distributions to each directional sector. The ten-minutely data are then simulated based on these newly fitted Weibull distributions to obtain a year’s worth of observations, and the seasonality is re-applied to the resulting time series. The year’s simulated values are then aggregated and converted by the power curve to obtain an estimate of the AEP. This is repeated 1000 times in order to obtain a distribution of

5

SIMULATIONS AND RESULTS

10

the expected AEP. The full details of this simulation procedure follow in Section 5.4.

5 5.1

SIMULATIONS AND RESULTS Electricity Spot Price

The price of electricity has a strong daily, weekly, and annual seasonal component. Figure 3 shows the hourly, daily, and monthly averages of the available data, which gives a clear indication of the seasonality present. The annual seasonality can be roughly described by

M

T

W

T

F

S

S

F e bA p r J u n A u g O c t D e c

Figure 3: Spot Price Seasonality a sine or cosine function, and there is a clear dip in prices during weekends, which are considered off-peak times. The daily seasonality also exhibits a clear pattern, but this pattern cannot be described by a single sine or cosine function. In this case, there is a bimodal shape with a commercial peak at 10:00 and a domestic peak at about 19:00. The average price drops sharply during off-peak hours when consumers go to sleep and demand drops. In order to account for these three components of seasonality, a robust regression is performed including twelve sine and twelve cosine functions to account for the daily seasonality as proposed by Harvey (1989) as well as a dummy variable to indicate a weekend or week day and a sine and cosine function to account for the yearly seasonality similar to Lucia and Schwartz (2002). In total, 27 variables are used to address the seasonality present in the data, and the log of electricity prices, Pt , is described by the following equation: log(Pt ) = α1 sin

12 X 2π 2π 2πj 2πj t + α2 cos t + βδt + (γ1j cos t + γ2j sin t) + Xt , 8760 8760 24 24 j=1

t = 1, 2, ..., n

(13)

where α1 and α2 are coefficient estimates for the yearly seasonal process, δt is a dummy variable with a value of one if the observation lies on a weekend and zero otherwise, and β is the corresponding estimated coefficient. The bracketed term in the summation accounts for the daily seasonality and Xt is the error term of the regression, which corresponds to the deseasonalised log electricity price process. The seasonality component is stored, and the error term of the regression is used for subsequent analyses.

5

SIMULATIONS AND RESULTS

11

Using the deseasonalised process and running a regression of the differenced log spot price on the log spot price of lag 1 from Equation (8), we estimate the continuous mean-reverting model to be dXt = 0.006(5.27 − Xt )dt + 0.039dZt

(14)

We then reapply the seasonality component and exponentiate to obtain the electricity spot price process. The mean and standard deviation of the simulated data are 220.55 SEK and 121 SEK, respectively, which compares favourably to the corresponding mean of 221.5 SEK and standard deviation of 118.6 SEK calculated from the observed data. However, this model fails to capture some of the large spikes present in the observed data. The second method involves filtering the spikes and fitting a diffusion process to the spikeless data and an appropriate distribution to the size of the observed spikes. After filtering the data, 4098 observations are removed from the original time series. The intensity parameter of the Poisson distribution used to describe the distribution of jumps over time is estimated as the total number of jumps recorded divided by the total number of observations, which is 0.067. The distribution of spike sizes is slightly skewed with an estimated skewness of 0.4, mean of 0.008, and a roughly equal number of upward and downward spikes. It is assumed that the true spike distribution is symmetric, allowing for a simpler analysis by considering only the absolute value of the spikes. After taking the absolute value and shifting the data so that the smallest spike is of size zero, the distribution of absolute spike sizes is explained well by a truncated lognormal distribution with a truncation point of 1.2, where the parameters are estimated using MLE. Because the lognormal distribution has a longer tail than the empirical spike distribution, we use this truncation to ensure that unrealistically large spikes are not possible. A mean-reverting diffusion process is then fitted to the remaining data, and, by assuming the jump process is an independent Poisson process with truncated lognormal jumps, the fitted model is: dXt = 0.00048(5.3 − Xt )dt + 0.015dZt + Jt dqt

(15)

Here, Jt ∼ lognormal(-3.37, 2.04) with truncation point 1.2, and dqt denotes the increments of a Poisson process with intensity parameter 0.067. By removing the spikes, both the estimated standard error of the increments as well as the mean-reversion speed of the diffusion process have decreased. Spikes generally revert back to the mean level fairly quickly, thus increasing the mean-reversion speed when they are included in the diffusion process. These spikes also increase the standard error of the increments. The results, therefore, follow intuition. The mean electricity price of the simulated data based on the jump-diffusion model is calculated to be 233.8 SEK, and the standard deviation is 140.2 SEK. Both of these values are overestimates of the moments calculated from the observed data.

5.2

Wind Speed and Direction

The wind rose is a graphical tool used to describe the joint distribution of wind speed and direction at a site. Wind speeds are aggregated into manageable groups, and relative

5

SIMULATIONS AND RESULTS

12

frequencies are displayed in a form of circular cumulative bar chart corresponding to the directional sectors of the observations. This tool is very useful as a graphical comparison and presents a simple and understandable summary of the data as seen in Figure 4, which indicates dependence between wind speed and direction. Formally, the Pearson correlation test yields a value of 0.1975 with a corresponding 95% confidence interval of [0.189,0.206], indicating a highly significant correlation between the two variables. Full Dataset N

15 % 10 % 5% W

E

S

Figure 4: Wind Rose

5.3

AEP and Revenue Estimation

After artificially removing data, the resulting AEP calculations, based on the fitted Weibull distributions, are compared in Table 1. These AEP calculations confirm the hypothesis Table 1: AEP Results Data Removed None 2% 5% 10%

AEP (MWh/year) 2641 2610 2559 2541

Bias (%) — 1.2 3.1 3.8

that systematic removal of data during cold periods biases the estimated AEP downwards with increasing bias as the percentage of missing data increases. In order to assess the significance of this bias, the Monte Carlo simulation described in Section 4.2 is used to estimate the AEP distribution under the assumption that the fitted Weibull distributions accurately describe the wind resource at the site and are constant throughout time. This is a strong assumption to make based on only one year of data since, in reality, these are only estimates of the true underlying distributions and, when considering a long-term analysis, due consideration should be given to the fact that the wind resource can change over time. As such, the true AEP variance will be larger to account for these sources of uncertainty. After simulating 1000 sample paths and obtaining the corresponding AEP for each path, the results are well approximated by the normal distribution as to be expected by the Central Limit Theorem. Figure 5 provides a comparison of the approximated normal distributions of the AEP to be expected for each of the missing data levels.

SIMULATIONS AND RESULTS

13

Full 2% 5% 10%

0.000 0.002 0.004 0.006 0.008 0.010 0.012

Density

5

2400

2500

2600

2700

2800

MWh/year

Figure 5: AEP Distribution

This comparison highlights the significance of the bias induced by missing data in relation to the expected variability in the AEP. A value smaller than the point estimate of 2541 MWh/year found using the dataset with 10% missing observations would be expected with a 0.0004 probability if the true wind resource was described by the joint distribution estimated when using the full dataset at Geitvassfjellet. This is a highly significant bias under the assumptions made. These simulated power output series are then multiplied by the simulated electricity spot prices to obtain estimates of the present value of revenue over a one-year time horizon. For the case of the simple mean-reverting process, the resulting revenue distributions have a small positive skewness due to the fact that electricity prices are bounded below by zero, but are not bounded above. This results in a handful of very large simulated revenues. The jump-diffusion process results in a much more pronounced skewness with very large revenue simulations occurring far more frequently than when using the mean-reverting process to model electricity prices. The revenue variance and mean are also larger for the jump-diffusion process. Using the full wind speed dataset and the meanreverting model, the simulated revenue mean and standard deviation are calculated to be 553360 SEK and 57407 SEK, respectively. When using the jump-diffusion process the simulated mean and standard deviation of the revenue are calculated to be 585440 SEK and 277925 SEK, respectively. Most noticeable is the fact that the standard deviation of revenues when using the jump-diffusion process is almost five times larger than that of the mean-reverting process. This indicates how influential a correct model fit for electricity prices is for a potential investor and how much variability there is between models. As to be expected, comparison between revenue estimates based on the full dataset and the set with 10% artificially removed data show smaller differences using the jump-diffusion model than the mean-reverting model with respect to the variability expected. For illustration, only the results from the mean-reverting model are presented in Figure 6. The bias incurred through the missing data is overwhelmed by the variability experienced in the electricity spot price estimates. Although there is still, as expected, a downward shift in the revenue distribution as more data are removed, this shift is relatively

14

0.000 0.001 0.002 0.003 0.004 0.005 0.006 0.007

SIMULATIONS AND RESULTS

Density

5

Full 2% 5% 10%

400

500

600

700

800

Swedish Krona (SEK in 1000's)

Figure 6: Revenue Distribution

small in comparison to the variability expected in revenue estimation. The mean of the revenue estimates with 10% missing data is 532461 SEK - a discrepancy of just over 20000 SEK. While the Pearson correlation coefficient between the wind speed and electricity spot price processes using the raw data is 0.04 with a 95% confidence interval [0.02,0.06], this significant (but small) correlation may pose a problem when modelling these two processes independently. This dependence, however, seems to be induced purely because of the common seasonal components underlying both processes. Since the Pearson correlation coefficient estimate of the correlation between the deseasonalised log price of electricity and deseasonalised wind speed is not significantly different from zero, and calculated to be -0.002 with a 95% confidence interval of [-0.005,0.001], this indicates that the two deseasonalised processes can be modelled independently, as is done in the following section.

5.4

Seasonality Model

The reason that bias occurs due to missingness caused by icing is because wind speeds are generally higher in the winter than in the summer. By removing this seasonality before estimating the wind regime, it is hypothesised that the effect of the systematic missingness will be dampened. In theory, if all of the observed bias is due to a seasonality effect that can be captured by statistical methods, then it should be possible to remove this bias completely. In practice, seasonality estimates will differ based on the data available. There is no guarantee that the deterministic seasonality component estimated with large portions of missing data in the colder months will accurately capture the true seasonality present at the site of interest. The problem is further compounded by the fact that seasonality is present in both wind speed and wind direction, which are dependent variables. An outline behind the process of the seasonality model proposed is presented as a flowchart in Figure 7.

5

SIMULATIONS AND RESULTS

15

Figure 7: Flowchart of Seasonality Model

5.4.1

Seasonality of Wind Direction

Because direction is a circular variable, standard descriptive tools such as the autocorrelation function (ACF) used to assess the nature of the seasonality component cannot be applied here. Instead, wind direction is assigned to one of twelve sectors and treated as a categorical variable. By constructing wind roses for each hour of the day as well as for each month of the year, it becomes clear that wind direction has a definite annual and daily seasonal component. The wind roses do not present a clear cut-off point; however, it is noticeable that evening wind directions differ from those during the day. Monthly wind direction patterns seem to change between the winter months (November, December, January), the summer months (April, May, June, July, August) and the transition months (February, March, September, October), where wind roses are similar within these categories, but differ quite markedly across categories. The assumption is made that wind direction follows a Markov process whereby the future wind directional sector, Dt+1 , is independent of past values, given the present direction: P r(Dt+1 = d|Dt = dt , Dt−1 = dt−1 , ..., D1 = d1 ) = P r(Dt+1 = d|Dt = dt ) ∀ t

(16)

By making this assumption, it is possible to estimate the transition probabilities for a discrete-time, discrete-space Markov chain based on the observed transitions across each of the twelve directional sectors. In this case, there are 122 = 144 transition probabilities to be estimated. To account for the daily and seasonal differences in transition probabilities, these probabilities are estimated for six Markov chains corresponding to each of the possible combinations of day/night and summer/winter/transition months. Wind directions are then simulated over an entire year by generating a starting node with each sector having probability of being chosen proportional to the observed relative frequency of being in that sector. The simulations are then generated using the transition matrices estimated from the observed data and switching between each of the six transition matrices based on the time of day and month in the year. This method captures both the first-order autocorrelation and seasonality experienced in the wind direction observations.

5

SIMULATIONS AND RESULTS

5.4.2

16

Seasonality of Wind Speed

Wind speed seasonality is easier to capture and can rely on some of the methodologies used to deseasonalise the electricity spot price data. It is assumed that the seasonality component, St , is multiplicative to ensure that wind speeds, Wt , are always positive and can be described by: Wt Wt = St vt or = vt (17) St where vt represents the deseasonalised wind speed. By taking logs to make the equation additive and using robust regression, the seasonality component can be estimated. The regression model considered uses a sine and cosine function to account for the yearly seasonality and the same to explain the daily pattern. To account for the fact that the daily variability can change over the year, the second-order interactions of these functions are also included. After estimating the coefficients using robust regression, the log seasonality component is obtained. To obtain St , the estimated function is exponentiated, and to obtain vt the observed wind speeds are divided by St . Using these deseasonalised wind speeds, the Weibull distributions are estimated as usual. Figure 8 shows the wind speed ACF before and after deseasonalisation, indicating the adequacy of this method in removing both the daily and yearly seasonality components.

Figure 8: Deseasonalising Wind Speed - ACF

5

SIMULATIONS AND RESULTS

5.4.3

17

Simulations

Based on the fitted Weibull distributions and directional transition matrices, the wind direction sectors for each time point in the year (52560 for a ten-minute resolution) are simulated. Wind speeds are then generated for each time point based on the Weibull distribution corresponding to the simulated directional sector. One thousand such sample paths are generated, and the seasonality component is reapplied in order to estimate the AEP. Before the AEP estimation is conducted, though, an algorithm is used to reorder the simulated data in such a way that the underlying autocorrelation structure of the observed wind speed is mimicked. 5.4.4

Sorting Algorithm

By generating a random variable from a Weibull distribution for each point in time in order to simulate a sample path for wind speeds over a year, one can ensure that the resulting sample path will have Weibull-distributed wind speeds, but in the process all autocorrelation present in real wind speed data is lost since wind speeds at each time point are independent of the last. When looking to reproduce annual characteristics of wind speed, this method is fine. If, however, the time dependency of the sample path is important, as is the case when considering the seasonality effect, then this feature becomes a problem. In order to reproduce the autocorrelation structure of wind speed over time, but still maintain the overall characteristics of the wind regime - namely the wind speed and direction distributions - a sorting algorithm is proposed. The sample path is first generated using the fitted Weibull distributions and wind directions. This ensures that these distributional properties are maintained. It is then assumed that the autocorrelation in wind direction is fully accounted for through the Markov process used, and as such, the simulated wind directions are not altered. A distribution is then fitted to the increments observed in the deseasonalised wind speeds. As it turns out, the empirical distribution of increments is highly symmetric around zero with a median of zero and a mean of -0.0001. The skewness is estimated to be 0.217, and the absolute value of these increments can be well estimated by a Weibull distribution with shape and scale parameter estimates of 1.05 and 0.59, respectively. This distribution is then used to dictate the change in wind speed from one time point to the next. The algorithm works as follows, with a numerical example presented in Figure 9: • Start with a sample path of 52560 wind speeds each with a corresponding wind direction. • The simulated wind direction process is assumed to retain its autocorrelation through the Markov chain model and is not reordered. • The first simulated point from the wind speed sample path is used as the starting value at t1 . • A random number is then generated from the distribution fitted to the wind speed increments and given an equal probability of being positive or negative (since the actual increment distribution is assumed symmetric about zero, but all values generated from the Weibull distribution are positive).

5

SIMULATIONS AND RESULTS

18

• This increment is then added to the starting value, resulting in the ideal choice for the wind speed at t2 . • From the remaining 52559 simulated wind speed values remaining, only the ones with a corresponding wind direction sector equal to the directional sector at t2 are considered. • From the list of possible wind speeds, the closest value to the ideal choice is taken to be the wind speed at t2 • This process is repeated until all points have been reordered.

Figure 9: Example of Sorting Algorithm

The result of this algorithm is that the increments in wind speeds from the simulation now roughly follow the same distribution as that of the observed data. The fitted Weibull distribution for a random simulation has a shape parameter of 0.89 and a scale parameter of 0.61. The smaller shape parameter indicates that the simulated distribution has a slightly larger spread than the empirical distribution. Although the algorithm succeeds in retaining the wind speed autocorrelation to an extent, it does experience some important flaws. Since only the closest value available is chosen for the next time step, the incremental distribution is not followed exactly. In particular, problems occur at very high and very low wind speed levels, since values never go below zero, and there are very few choices at extremely high wind speeds. In reality, it would be expected that increment sizes are dependent on the actual wind speed. This is seen trivially by the fact that decrements cannot be larger in absolute value than the current wind speed without producing negative values. Another problem that occurs is that as the algorithm proceeds and there are fewer simulated values from which to choose as the ideal next choice, the resulting choices slowly

5

SIMULATIONS AND RESULTS

19

become far from ideal. While these factors hinder the algorithm’s ability to ensure that the simulated increment distribution stays true to the empirical one, they do not influence the overall wind speed and direction distributions in any way as none of the simulated values are actually changed, but only reordered. By accounting for the seasonality in wind speed, and using the sorting algorithm discussed, the daily and annual autocorrelation is captured quite well, as seen in Figure 10.

Figure 10: Simulated vs. Observed Wind Speed ACF

5.4.5

AEP and Revenue Calculations with Seasonality Adjustment

Once the ordered wind speed sample paths are obtained, the seasonality component is reapplied, the series is converted to energy yield, and the AEP distribution is estimated based on 1000 sample paths. Figure 11 shows the AEP distributions approximated by normal distributions for both the full dataset and the dataset with 10% of missing observations. Strikingly, there is very little difference between the two distributions indicating that the proposed seasonality model is successful in restoring the original AEP density function even when this estimate is based on data with 10% missing observations. The mean of 2640 MWh/year for the full dataset is very close to the AEP obtained analytically using Equation (3), confirming that the process itself does not introduce any bias. The mean AEP of 2642 MWh/year found using the dataset with 10% missing observations indicates a very slight, but insignificant, overestimate of the true AEP. The standard deviation is also affected and is slightly underestimated in this case. Corresponding revenue

SIMULATIONS AND RESULTS

20

Full 10%

0.006 0.004 0.000

0.002

Density

0.008

0.010

5

2500

2550

2600

2650

2700

2750

2800

MWh/year

Figure 11: AEP Distribution Accounting for Seasonality

0.000 0.001 0.002 0.003 0.004 0.005 0.006

Density

estimates are then calculated and summarised in Figure 12.

Full 10%

400

500

600

700

800

Swedish Krona (SEK in 1000's)

Figure 12: Revenue Distribution Accounting for Seasonality

The estimate of the revenue mean for the full dataset is 15000 SEK larger than the estimate when not accounting for the seasonality component in wind speed in Section 5.2. This follows from the fact that electricity prices and wind speeds follow similar seasonal patterns with increases during the day and in winter. Larger power output is then produced at times when electricity prices are higher, thereby shifting the revenue estimate upward. The standard deviation of revenues in the seasonal model is also larger as a result of accounting for the autocorrelation in wind speed. Using the out-of-sample data for wind speeds and electricity prices over the period 15 May 2007 - 14 May 2008, the potential revenue for the year, assuming an 850 kW wind turbine had been in place, is calculated to be 661650 SEK. This lies within a 90% confidence interval of the estimated revenue distribution and has a two-tailed p-value of 0.14. In comparison, using the revenue distribution, based on the mean-reversion model, of the untreated data with 10% missing values, we find this result lies outside of the 95% confidence interval with a two-tailed p-value of 0.025. Thus, by not accounting for systematic missing data, confidence intervals for expected revenues may fail to represent

6

CONCLUSION

21

the true revenue range for a given year. When incorporating seasonality to account for this missingness, the mean and standard deviation of the revenue distribution are adjusted, and the associated confidence interval more accurately represents the true point estimate and associated variability that a wind farm investor may face.

6

CONCLUSION

Investment in wind power by the private sector is essential if countries hope to reach their renewable energy production goals. This requires a good understanding of the expected returns and the variability of these returns when constructing a wind farm. This knowledge is also needed to inform government policy as to the level of subsidy that would make these investments viable. First and foremost in the procedure, though, is the confidence needed by stakeholders that revenue estimates are accurate and reliable. In this paper, missing data caused by icing is shown to bias wind resource estimates downwards by up to 3.8%, depending on the level of missingness. This, in turn, results in biased revenue estimates. This bias is relatively small in comparison to the high-volatility nature of the Nord Pool electricity spot price market, but, if an investor is assured of a fixed-price feed-in tariff, or experiences a low-volatility price regime for electricity, then the effect of missing data needs to be addressed to be confident about the resulting revenue calculations. Under a fixed-price contract, the bias found in the AEP calculations would scale up exactly, since revenue estimates would simply amount to multiplying the expected energy supply by a constant and no further variation enters into the process. By modelling wind and electricity seasonality, the method proposed in this paper not only drastically reduces the bias induced by missing data, but also increases the expected revenue estimates and variance thereof. This is a result of the common seasonality patterns experienced in both wind and electricity prices and points to the importance of incorporating a time dimension into revenue calculations when the processes underlying these calculations are affected by seasonal fluctuations. The model is not without its flaws and still requires further development. With a rapidly changing climate, a better understanding of long-term variability of wind speeds is needed, since it is not guaranteed that the true underlying wind resource will remain constant over the lifetime of a wind farm. A more refined analysis of electricity spot prices would also give stronger weight to the revenue analysis in this paper. Nonetheless, ignoring data reliability factors has been shown to have significant consequences. With this in mind, we hope that this paper has laid the foundation for further investigation into the effect of data reliability on the accuracy of wind resource and wind farm revenue estimation.

REFERENCES Anahua, E., Barth, St., Peinke, J. (2007), ‘Markovian power curves for wind turbines’, Wind Energy 11 (3), 219-232. Bierbrauer, M, Tr¨ uck, S. and Weron, R. (2004), ‘Modeling electricity prices with regime switching models’, Chapter from Computational Science - ICCS 2004, Springer, Berlin. Blanco, C. and Soronow, D. (2001), ‘Jump diffusion processes - Energy price processes used for derivatives pricing & risk management’, Commodities Now, September, 83-87

6

CONCLUSION

22

Bystr¨om, H. N. E.(2005), ‘Extreme value theory and extremely large electricity price changes’, International Review of Economics & Finance 14 (1), 41-55. Cartea, A. and Figueroa, M. C. (2005), ‘Pricing in electricity markets: a mean reverting jump diffusion model with stationarity’, Applied Mathematical Finance 12 (4). Dixit, A. K. and Pindyck, R. S. (1994), Investment under uncertainty, Princeton University Press, Princeton, New Jersey. Harstveit, K., Tallhaug, L. and Fidje, A. (2004), ‘Modelling and measuring atmospheric icing at a coastal mountain in Norway’, A study supported by the Norwegian State Power Board (Statkraft). Harvey, A. C. (1989), Forecasting structural time series models and the Kalman Filter, Cambridge University Press, Cambridge, UK. Lacroix, A., Manwell, J. F. (2000), ‘Wind energy: Cold weather issues’, University of Massachusetts at Amherst, Renewable Energy Research Laboratory. Lucia, J. J. and Schwartz, E. (2002), ‘Electricity prices and power derivatives: Evidence from the Nordic power exchange’, Review of Derivatives Research 5, 5-50. Weron, R., Bierbrauer, M. and Tr¨ uck, S. (2004), ‘Modeling electricity prices: jump diffusion and regime switching’, Physica A: Statistical and Theoretical Physics 336 (1-2), 39-48.