The assessment of land valuation in land consolidation schemes: The need for a new land valuation framework
Demetris Demetriou a,b,∗
a Land Consolidation Department, 131 Prodromou street, 1419 Nicosia, Cyprus
b School of Geography, University of Leeds, Leeds LS2 9JT, UK
Published in Land Use Policy, 54 (2016) 487–498 (paper before review)
Abstract
The conventional land valuation process employed in land consolidation schemes is a type of mass appraisal carried out through an empirical process based on visual inspection of all the parcels involved. This process has several weaknesses regarding time, cost, transparency, accuracy, reliability, consistency and fairness. These deficiencies adversely affect the preparation of land consolidation plans and cause disputes between landowners and the authorities carrying out each scheme. Although experts are aware of this issue, there is a lack of research investigating the land valuation factors actually used and the quality of this traditional process. Therefore, this paper discusses, explores and assesses the land valuation process undertaken by the Land Valuation Committee (LVC) in a real case study area in Cyprus and proposes a new framework for carrying out that process. The current process is assessed by employing advanced spatial analysis techniques, including multiple regression analysis (MRA) and geographically weighted regression (GWR) within a GIS. Results show that eight of the fourteen land valuation factors, related to parcel location characteristics, legal factors, physical attributes and economic conditions, are the most significant. In addition, although the basic regression fits are quite good, some of the assumptions required for hypothesis testing are not met, indicating unreliability and inconsistency in the relationships modelled. Furthermore, the presence of spatial autocorrelation reveals important regional variation in these factors, suggesting significant inconsistencies in the valuation policy applied by the LVC. The latter two findings confirm experts' concerns and suggest the need for a new land valuation framework designed to overcome the problems of the current process.
The application of this framework and the investigation of various critical related issues are the core of ongoing further research.
Keywords: land consolidation, land valuation, GIS, multiple regression analysis, geographically weighted regression (GWR), automated valuation models (AVMs).
1. Introduction
Land consolidation is considered the most effective land management planning approach for solving land fragmentation, a problem that hinders rational agricultural development and, more generally, sustainable rural development (Author et al., 2012). Land consolidation, which is applied in 26 of the 28 European Union (EU) countries and in many other places around the world, consists of two main components: the reallocation of land and the provision of rural infrastructure such as roads and irrigation networks. Land reallocation (or land readjustment), which involves land tenure restructuring, is the most critical, complex and time-consuming part of the land consolidation process (Van Dijk, 2003; Thomas, 2006). Land reallocation relies on land valuation, since its fundamental principle is that after consolidation each landowner shall be granted a property whose aggregate value is the same (after deducting the landowner's land contribution for infrastructure) as the value of the property owned prior to consolidation. If the value of the holding is smaller after consolidation, equivalence can be achieved by paying financial compensation. In other words, land value is the crucial factor for the land reallocation process and hence for the success and acceptance of the final land consolidation plan (FAO, 2003). Real estate valuation is the process of estimating the current market value of a home, business, office or land plot/parcel. There is an extensive literature on the appraisal of real estate property (e.g. Longley et al., 1994; Schulz et al., 2013). Valuation is based on the economic theory of consumer behavior (Hamilton and Morgan, 2010) according to the principle of supply and demand, with buyers competing with each other to optimize their utility (the ability to satisfy a desire defined by human needs) in the context of a finite market supply. Thus, theoretically, the equilibrium of demand and supply defines the price of a property, together with the effective purchasing power of individuals to participate in a market (Weber, 2004) and several other objective and subjective factors (Kontrimas and Verikas, 2011). However, this theoretical equilibrium price rarely coincides with what is observed in reality, i.e. the market value. Valuation is a very important process worldwide (Johnstone, 2004) and a risky one, involving many aspects of socioeconomic life in both the developed and the developing world, because it affects real estate properties valued each year at around 1,000 billion euros (Weber, 2004).
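The reallocation principle stated above reduces to simple arithmetic for each landowner. The following minimal Python sketch assumes that the land contribution for infrastructure is expressed as a fraction of the pre-consolidation value (the hypothetical parameter `contribution_rate`); the function name `reallocation_balance` is illustrative and not part of any official procedure.

```python
def reallocation_balance(value_before, value_after, contribution_rate):
    """Compare a landowner's entitlement with the value of the new holding.

    value_before: total value (EUR) of the holding before consolidation.
    value_after: total value (EUR) of the holding allocated after consolidation.
    contribution_rate: hypothetical fraction of value deducted as the land
        contribution for infrastructure (e.g. 0.05 for 5%).
    Returns (entitlement, compensation), where compensation is the cash
    payment needed when the new holding is worth less than the entitlement.
    """
    if not 0 <= contribution_rate < 1:
        raise ValueError("contribution_rate must be in [0, 1)")
    entitlement = value_before * (1 - contribution_rate)
    compensation = max(0.0, entitlement - value_after)
    return entitlement, compensation


# Example: a holding worth EUR 100,000 with a hypothetical 5% contribution
entitlement, compensation = reallocation_balance(100_000, 90_000, 0.05)
print(round(entitlement), round(compensation))  # 95000 5000
```

The principle as stated only covers a new holding that is worth less than the entitlement, so the sketch returns zero compensation in the opposite case rather than guessing how a surplus would be settled.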
Similarly, in the case of land consolidation, land valuation is a mass appraisal process of assigning monetary values (i.e. market value, or agronomic value reflected by soil quality and land productivity, or a relative dimensionless score) to all parcels of the consolidated area and to all of their contents (i.e. trees, wells, buildings, et cetera). It is usually carried out by the committee implementing the project (e.g. in the Netherlands), by an ad hoc Land Valuation Committee (LVC) in which landowners participate (e.g. in Cyprus and Denmark), by agricultural experts (e.g. in Germany), or by a surveying engineer and two trustees (e.g. in Finland and Sweden). Experience shows that land valuation in consolidated areas faces several problems. In Cyprus in particular, it is not based on recognized standards, it is time-consuming (it may take some months on a non-regular work basis) and it is costly, since at least five members of the LVC plus one or two technicians are involved in site visits. In addition, outcomes may present inconsistencies, since the process is undertaken manually and empirically, without systematic analytical tools for accurately measuring and comparing the land parcel attributes that define land value. Owing to these weaknesses, the process is neither standardized nor transparent, and it is sometimes unfair (Sipan et al., 2012). Consequently, it can lead to biased land reallocation and therefore to objections by landowners, who usually compare the land value assigned to their land parcels with that of similar or neighbouring parcels. Although the weaknesses of manual valuation methods have been identified by some researchers (e.g. Jahanshiri et al., 2011), research papers about land valuation in the case of land consolidation are rare (Yomralioglu et al., 2007), and there is a lack of assessment studies of land valuation based on the spatial statistical analysis of a real, full-scale project.
In light of the above, this research aims to discuss, explore and assess the current land valuation process by providing scientific evidence regarding its reliability and consistency, and to set out the general framework for developing a new process based on automated valuation models (AVMs). For this purpose, multiple regression analysis (MRA) and geographically weighted regression (GWR) are combined with a GIS and applied in a case study area in Cyprus. The basic research questions are as follows: What are the current problems with the conventional land valuation process followed in land consolidation schemes? Which land valuation factors are taken into account in land consolidation schemes? What is the importance of each of these factors in the valuation process? What is the quality, in terms of consistency, of the land valuation carried out by the LVC? How can the weaknesses inherent in the process be overcome? The rest of the paper is structured as follows: section 2 deals with the conventional land valuation process carried out in land consolidation schemes, with a focus on Cyprus; section 3 outlines the methods employed in this research, i.e. MRA and GWR; and section 4 presents the land consolidation case study area, the available data and the land valuation factors taken into account. Thereafter, section 5 investigates the quality of the work carried out by the LVC by employing these methods to develop the relevant models, including a sales transaction model. Section 6 suggests a new framework for land valuation with a core based on AVMs. Conclusions and recommendations for further research are contained in the last section (section 7).
2. The conventional land valuation process in land consolidation areas
Land valuation in land consolidation areas in Cyprus is carried out according to the land consolidation legislation (RoC, 1989) by the Land Valuation Committee (LVC), which consists of five members: a valuator nominated by the Head of the Land and Surveys Department, who chairs the Committee; an agriculturalist nominated by the Head of the Land Consolidation Department; an officer nominated by the District Administrative Office of the district concerned; and two landowners directly elected by the entitled landowners of the particular consolidation project. After completing the valuation, the LVC prepares and publishes a list showing the value of each property, together with a cadastral thematic map showing the affected area subdivided into valuation categories that classify each parcel. Any landowner in the land consolidation area interested in any property may, within 21 days of the publication of the list, submit a reasoned objection to the LVC. The LVC examines the objection, notifies the objector of its decision and republishes any affected part of the list and map. Any person aggrieved by a decision of the LVC may, within 21 days of being notified of the decision, appeal against it to the Court. In carrying out any valuation, the LVC follows the rules set out in the Compulsory Acquisition Law (RoC, 1999), without taking into consideration any new roads constructed or planned to be constructed as part of the land consolidation measures. The basic principle of that legislation is that the value of a property should be estimated as the market value which the property might be expected to realize if sold on the open market by a willing seller on the date of publication of the relevant notice of acquisition.
The use of market value for land reallocation instead of agronomic value (as is the case in Germany, the Netherlands, Greece and India) (Bullard, 2007) may seem questionable, given that the goal of land consolidation involves the promotion of rational agricultural development in agricultural zones. However, it is justified in Cyprus, since the country's land area is small and hence the availability of land is limited and, most importantly, limited housing development is permitted in agricultural zones under certain conditions (e.g. only one house can be built on a land parcel larger than 4,000 square meters, if that parcel has adequate access to a registered road). Consequently, the housing prospects of agricultural land should be taken into account.
The literature suggests five main valuation methods for defining market value (Scarrett, 2008), among which the sales comparison method is the most traditional and popular for most types of property and is preferred by both valuators and courts (e.g. Wyatt, 1996; IAAO, 2003). The sales comparison method is based on the sale prices of similar properties in the market. The International Association of Assessing Officers (IAAO, 2003; 2013a) suggests that if agricultural land is appraised at its market value and adequate sales data exist, then the sales comparison method should be preferred. However, if sales data are inadequate, then the income approach would be more appropriate. The income approach is based on the net income the property may produce. In addition, if agricultural land includes improvements, then both the income and sales methods could be employed, with the former most suitable when land is far from urban areas (IAAO, 2003), which is the case for most land consolidation areas in Cyprus. However, in many cases there is not an adequate number of sales transactions within land consolidation areas. Therefore, the LVC initially analyzes the sales transactions (provided by the Land and Surveys Department) within the land consolidation area over the last couple of years and tries to establish a range of minimum and maximum land values by comparing the land parcels' physical and legal characteristics with those of parcels for which a sale value is available. In practice, the five members of the LVC visit every individual parcel within the consolidated area to carry out this comparison and assign land values. Consequently, this comparison is the result of empirical analysis and human judgement and not the outcome of a robust, standardized analysis using appropriate tools such as a GIS.
The latter point is emphasized by the FAO (2002), which states that the critical point in valuation is not the valuation method followed but the method of analysis utilized; hence, if the analysis is successful and accurate (e.g. through a GIS), this will be reflected in the valuation. In addition, the process takes a long time: for the case study area described later, valuation took 25 working days for an area of 266 hectares comprising approximately 500 land parcels. Furthermore, land values may involve several inconsistencies (as shown later), and transparency is limited, since in many cases it cannot be analytically explained to landowners why a parcel was assigned a certain land value. The process is also costly, since at least six persons work for more than a month and there are external costs such as transportation. Thus, there is clearly a need for a new, more efficient and reliable land valuation process that uses AVMs within a GIS environment. This trend, i.e. the integration of AVMs with broader decision-making platforms (especially GIS), has been highlighted as an area of innovation that supports faster and more cost-effective decisions (CML, 2007).
3. Modelling methodologies
This section outlines the two methods utilized for exploring and assessing the quality of the work of the LVC: hedonic regression modelling (i.e. MRA) and geographically weighted regression (GWR), which is a variation of the former. The quality of the work of the LVC is assessed using MRA because MRA reflects the way the LVC carries out valuations in practice. In particular, the LVC uses the sales comparison method, which involves comparing a subject property's characteristics with those of comparable properties that have recently been sold in similar transactions.
Thus, the LVC empirically adjusts the prices of the comparable transactions according to the presence or absence of characteristics, or the extent to which characteristics influence value. All sales comparison methods are variations of hedonic-type measurement, which determines the value of something as the sum of the values of the various components that contribute utility. Therefore, the land values assigned by the LVC are expected to be approximately linear in the parcel characteristics.
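The hedonic logic of the sales comparison method can be illustrated with a small sketch: each comparable sale price is adjusted by the estimated contribution of every characteristic in which the comparable differs from the subject parcel, and the adjusted prices are then averaged. The per-unit contributions (`contributions`), the function names and the example figures are hypothetical; in practice the contributions would come from market evidence or from a fitted regression model.

```python
def adjust_comparable(sale_price, comparable, subject, contributions):
    """Adjust one comparable sale price to the characteristics of the subject.

    comparable, subject: dicts mapping characteristic name -> value.
    contributions: dict mapping characteristic name -> value added per unit
        of that characteristic (hypothetical figures, EUR per decare).
    """
    adjustment = sum(
        rate * (subject[name] - comparable[name])
        for name, rate in contributions.items()
    )
    return sale_price + adjustment


def sales_comparison_value(sales, subject, contributions):
    """Average of the adjusted prices of all comparable sales."""
    adjusted = [adjust_comparable(p, c, subject, contributions) for p, c in sales]
    return sum(adjusted) / len(adjusted)


# Hypothetical example: prices in EUR per decare; access1 is 1 if the parcel has
# access through a registered road, slope is the mean slope in percent.
contributions = {"access1": 4_000, "slope": -150}
subject = {"access1": 1, "slope": 4}
sales = [
    (12_000, {"access1": 1, "slope": 2}),  # subject is steeper: adjust down
    (7_500, {"access1": 0, "slope": 6}),   # subject has road access: adjust up
]
print(sales_comparison_value(sales, subject, contributions))  # 11750.0
```

Because every adjustment is additive, the resulting value is a linear function of the parcel characteristics, which is why a linear regression is a reasonable model of the values the LVC produces.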
3.1 Hedonic regression modelling
Hedonic regression modelling, i.e. linear multiple regression analysis (MRA), is the oldest statistical calibration methodology used for estimating property values, particularly residential properties, and it has attracted the most attention from the late 1960s and early 1970s (Smith, 1971) to the present day (e.g. Eckert, 2006; Schulz et al., 2013). MRA is one of the best-known statistical approaches, with a wide range of applications, especially in prediction and forecasting. It involves estimating the relationships between a dependent variable and one or more independent variables. The general regression model relates Y to a function of X and β, as follows:
$$Y \approx f(\mathbf{X}, \boldsymbol{\beta}) \qquad (1)$$
The dependent variable is Y, whilst the independent variables are denoted by X. The unknown parameters, denoted by β, may represent a scalar or a vector. Developing this function leads to the general multiple regression model with p independent variables:
$$Y_i = \sum_{j=1}^{p} \beta_j X_{ij} + \varepsilon_i = \beta_1 + \beta_2 X_{i2} + \dots + \beta_p X_{ip} + \varepsilon_i \qquad (2)$$
where $X_{ij}$ is the ith observation on the jth independent variable, the first independent variable takes the value 1 for all i (so $\beta_1$ is the regression intercept), and $\varepsilon_i$ is an error term for observation i. Whilst MRA is a well-defined and widely accepted process with a series of easily interpreted goodness-of-fit statistics, it presents some deficiencies (IAAO, 2003). In particular, it ignores spatial autocorrelation (discussed later) and spatial heterogeneity (Jahanshiri et al., 2011), hence specific investigations need to be carried out within a GIS. In addition, MRA requires an adequate, possibly large, sample size and cannot adequately represent non-linear relationships, even with transformed data. Moreover, the performance of MRA depends on the satisfaction of four main assumptions (Norusis, 2005) related to the residuals, i.e. the differences between the actual market values and the predicted values: (i) normality; (ii) constant variance (homoscedasticity); (iii) linearity; and (iv) independence, which are discussed later in the application of the models.
3.2 Geographically weighted regression
Spatial dependency is the co-variation of the attributes of spatial features within geographical space. When nearby features tend to have more similar attribute values than features that are far apart, the dependency is positive; when nearby features tend to have dissimilar values, it is negative. The degree of this dependency is called spatial autocorrelation (SA).
Thus, SA can be either positive or negative, and it is measured using specific indicators such as Moran's I, Geary's C and Getis's G, of which Moran's I is the most popular. SA frequently occurs in geographic observations, such as ecological, morphological and geological datasets, and it is a natural result of environmental processes; hence SA does not inherently constitute a problem. However, in statistical modelling such as MRA, SA in the residuals between model predictions and actual data is problematic because it violates the independence assumption of MRA, and hence the standard significance tests are unreliable. In particular, the standard errors of the coefficients are underestimated, and the model may be misspecified, in that some key independent variables are missing or some variables are redundant, i.e. they exhibit multicollinearity. Therefore, when there is SA in the residuals of the regression, an alternative method is Geographically Weighted Regression (GWR). In addition, GWR may be used when the regression coefficients vary over space, i.e. when the coefficients (and the relevant explanatory variables) are not stationary (heterogeneity) and global across the study area. GWR is a local form of MRA that fits a separate equation for every feature or location in the dataset (Brunsdon et al., 1996), and for best results it should be applied when several hundred features are involved. Thus, equation (2) can be transformed to:
$$Y_i = \sum_{j=1}^{p} \beta_j(i) X_{ij} + \varepsilon_i \qquad (3)$$
where $\beta_j(i)$ is the value of the jth parameter at feature or location i.
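To make both ideas concrete, the sketch below computes global Moran's I for a set of values (for example, MRA residuals) and estimates the local coefficients $\beta_j(i)$ of equation (3) by weighted least squares. Two choices here are assumptions, not the settings used in this study: the spatial weights for Moran's I are a binary neighbour matrix, and GWR uses a Gaussian kernel $w_{ik} = \exp(-\tfrac{1}{2}(d_{ik}/b)^2)$ with a fixed bandwidth $b$. GWR software commonly also offers adaptive (nearest-neighbour) kernels and calibrates the bandwidth, for example by cross-validation or AIC.

```python
import numpy as np


def morans_i(values, weights):
    """Global Moran's I of values observed at n locations.

    weights: n x n spatial weights matrix with a zero diagonal, e.g. 1 for
    neighbouring parcels and 0 otherwise (binary weights, an assumption here).
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()                  # deviations from the mean
    n, s0 = x.size, w.sum()           # s0: sum of all weights
    return (n / s0) * (z @ w @ z) / (z @ z)


def gwr_coefficients(X, y, coords, bandwidth):
    """Local coefficients beta_j(i) of equation (3), one row per location i.

    X: n x p design matrix whose first column is all ones (intercept).
    coords: n x 2 array of parcel centroid coordinates.
    bandwidth: fixed Gaussian kernel bandwidth in coordinate units
        (hypothetical; in practice it is calibrated).
    """
    X, y, coords = (np.asarray(a, dtype=float) for a in (X, y, coords))
    n, p = X.shape
    betas = np.empty((n, p))
    for i in range(n):
        d = np.linalg.norm(coords - coords[i], axis=1)  # distances to location i
        w = np.exp(-0.5 * (d / bandwidth) ** 2)         # nearer observations weigh more
        XtW = X.T * w                                   # X^T W(i) with W(i) = diag(w)
        betas[i] = np.linalg.solve(XtW @ X, XtW @ y)    # weighted least squares
    return betas


# Usage sketch: six parcels in a row, each adjacent to its neighbours.
w = np.eye(6, k=1) + np.eye(6, k=-1)
print(morans_i([1, 1, 1, 0, 0, 0], w))  # clustered values: positive I
print(morans_i([1, 0, 1, 0, 1, 0], w))  # alternating values: negative I
```

Applied to the residuals of an MRA model, a clearly positive I is the kind of signal that motivates moving from the single global equation (2) to the local equations of GWR.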
In contrast to the ordinary least squares (OLS) method used to calibrate MRA models, GWR employs a weighted least squares approach; that is, a different emphasis can be placed on different observations, based on their proximity to location i, when estimating the coefficients. GWR may produce many different local models for a study area, which is divided into neighbourhoods called windows (Jahanshiri et al., 2011).
4. Case study
4.1 The land consolidation area
Choirokoitia is a village in the Larnaca District of Cyprus, well known for its UNESCO World Heritage archaeological site dating from the Neolithic age (around 6000 BC). Choirokoitia is located around 33 kilometers southwest of Larnaca town (Figure 1a). The village is built on a hill at an average altitude of 230 meters, and the land consolidation area is located northwest of the village (Figure 1b), in lowland with a few hills, at an average elevation of 186 meters above sea level. The land consolidation area is part of an agricultural zone, and its eastern boundary almost coincides with the F112 main road (Figure 1b), which connects some of the mountain villages of the Larnaca District with the main A1 motorway of Cyprus (Figure 1b); the A1 connects the two largest cities of Cyprus, i.e. Nicosia and Limassol (Figure 1a). The land consolidation area covers 266 hectares, and the land use is mainly citrus, olives, various fruit trees and cereals. Some land parcels are irrigated through individual drillings and several are irrigated through a network connected to a water reservoir.
Figure 1. The location of Choirokoitia village on the map of Cyprus (a) and the approximate location of the land consolidation area (b)
The land consolidation area originally involved 488 land parcels, with a minimum, maximum, mean and standard deviation of size of 10 m², 47,826 m², 5,456 m² and 5,753 m², respectively. The base of the GIS data model is a digitized cadastral map (Figure 2), provided as a hardcopy at a scale of 1:5,000, showing the original cadastral situation in terms of land parcels before land consolidation. This map is joined with a series of databases containing information about landowners, ownership and land parcels. In addition, the official land valuation map (prepared by the LVC) with an associated catalogue, a soil map and a zoning map were provided in hardcopy form; these were digitized, georeferenced and joined with the base map. Furthermore, contours were digitized from a map at a scale of 1:5,000 with a contour interval of 4 meters. All these data were built within ArcGIS 10.0. The land valuation in the area was carried out by the LVC periodically from October 2008 until February 2009. The highest land value was set at €35,000 per decare (1,000 m²) and the lowest at €2,000 per decare. Land values were grouped into 26 categories in the official LVC map. It should be noted that the LVC separately valued any constructions within the land area, i.e. a house, a farmstead, a drilling or a fence, as well as any isolated trees, e.g. large olive and carob trees. In addition, the LVC added a standard extra value to irrigated land parcels and to land parcels with an organized plantation (rather than isolated trees), e.g. an orchard with citrus or olives.
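The way a parcel's total value is assembled from the LVC valuation can be sketched as follows: the per-decare rate of the parcel's valuation category is multiplied by its area in decares, a supplement is added for irrigated parcels and organized plantations, and the separately valued constructions and isolated trees are added on top. The text does not state how the standard extra value is computed, so the per-decare supplements `IRRIGATION_EXTRA` and `PLANTATION_EXTRA`, like the `Parcel` class and `parcel_value` function, are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical supplements in EUR per decare; the actual LVC figures are not given here.
IRRIGATION_EXTRA = 1_000
PLANTATION_EXTRA = 1_500


@dataclass
class Parcel:
    area_m2: float
    rate_per_decare: float                 # LVC value category, EUR per decare (1,000 m2)
    irrigated: bool = False
    organized_plantation: bool = False
    improvements_eur: list = field(default_factory=list)  # house, drilling, fence, trees


def parcel_value(p: Parcel) -> float:
    """Total value of a parcel: land value plus supplements plus separately valued contents."""
    if not 2_000 <= p.rate_per_decare <= 35_000:
        raise ValueError("rate outside the range reported for the case study")
    rate = p.rate_per_decare
    if p.irrigated:
        rate += IRRIGATION_EXTRA
    if p.organized_plantation:
        rate += PLANTATION_EXTRA
    decares = p.area_m2 / 1_000
    return rate * decares + sum(p.improvements_eur)


# A parcel of mean size (5,456 m2), irrigated, in a EUR 10,000/decare category,
# with a drilling valued separately at a hypothetical EUR 3,000
print(parcel_value(Parcel(5_456, 10_000, irrigated=True, improvements_eur=[3_000])))
```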
Figure 2. The land consolidation case study area
4.2 Land valuation factors
After considering a series of land valuation factors employed in other studies dealing with agricultural land (Clark, 1973) and taking into account the prevailing conditions of the land valuation process within land consolidation areas, 14 land valuation factors were considered in the relevant models. According to Wyatt (1996), who investigated classifications suggested by various authors, these factors can be grouped into two major categories: internal and external to the property. Each category can be split further into two subcategories: physical attributes and legal factors for the former, and locational characteristics and economic conditions for the latter. Each subcategory includes the following factors (with the variable name in parentheses). Physical attributes include: size (size); shape (shape); slope (slope); elevation (elevation); aspect (aspect); existence of a stream (stream); and soil type (soil). Legal factors include the existence of irrigation rights (irrigation) for a parcel. Locational characteristics include: access through a registered road (access1); access through a registered pathway (access2); the distance from residential zones (zone); the distance from the main road that connects the neighbouring villages with the motorway (main_road); and the existence of a sea view (sea_view). Economic conditions include land use/productivity (land_use), reflecting the agricultural economic potential of a parcel, while the general prevailing socioeconomic conditions of the country concerned are represented by Purchasing Power Parity (PPP). It should be noted that the latter factor is taken into account only for building the sales model, which involves transactions over several years. Internal factors are also important in defining land values.
In particular, it is reasonable to include the size of land parcels, since normally the larger a parcel, the lower its land value per decare for parcels with the same attributes. Similarly, the shape of land parcels is critical, because a regular shape encourages rational agricultural development by facilitating agricultural mechanisation, cultivation and harvesting, as well as the exploitation of parcels in general (e.g. Lee and Sallee, 1974). For evaluating the shape of parcels, a recently developed index called the parcel shape index (PSI) is employed (Author et al., 2013), which integrates six geometric parameters (length of sides, acute angles, reflex angles, boundary points, compactness and regularity). This index takes values between 0 (denoting the worst or most irregular shape) and 1 (indicating the best shape, i.e. a rectangle with a length:breadth ratio of 2:1, defined as the best for agricultural purposes in land consolidation areas). Slope measures the steepness of the surface at any location and considerably affects management costs, in terms of the machine power needed for land preparation and harvesting and the extra costs of erosion control (FAO, 2003). It is measured as the mean slope (in percent) of a parcel. In addition, the elevation of a parcel above sea level affects the kind of crops that can be cultivated and the time when they are ready for harvesting, and hence their price. Aspect, which is measured clockwise in degrees (i.e. 0 north, 90 east, 180 south and so on), refers to the compass direction that a hillside or slope faces. Aspect therefore determines the degree to which sunlight strikes a hillside and can be critical in agricultural production. The presence of a stream within a land parcel is a negative factor, both in terms of soil erosion and land lost to cultivation. Soil is also a major component in agriculture, since different types of soil have different properties and hence varying degrees of suitability for a wide spectrum of crops. Based on the soil map provided by the Geological Survey Department, the case study area encompasses two soil types: Skeletic-calcaric-REGOSOLS and calcaric-lithic-LEPTOSOLS, represented by the letter “A”, and calcaric-CAMBISOLS and calcaric-REGOSOLS, represented by the letter “B”.
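Slope and aspect are usually derived from a gridded elevation model built from the digitized contours. The sketch below shows one common way to compute percent slope and compass aspect (clockwise from north) from such a grid with finite differences, and the mean slope of one parcel from a boolean mask; the function names are illustrative, and the grid orientation (row 0 at the northern edge) and the flag of -1 for flat cells are assumptions of this sketch, not necessarily what ArcGIS did for this study.

```python
import numpy as np


def slope_aspect(dem, cell_size):
    """Percent slope and compass aspect (degrees clockwise from north) per cell.

    dem: 2-D elevation array in metres with row 0 at the northern edge.
    cell_size: grid spacing in metres (same in both directions).
    """
    dz_drow, dz_dcol = np.gradient(np.asarray(dem, dtype=float), cell_size)
    dz_dx = dz_dcol    # change in elevation per metre towards the east
    dz_dy = -dz_drow   # rows increase southwards, so flip the sign for north
    slope_pct = 100.0 * np.hypot(dz_dx, dz_dy)
    # A slope faces the direction of steepest descent, i.e. minus the gradient.
    aspect = np.mod(np.degrees(np.arctan2(-dz_dx, -dz_dy)), 360.0)
    aspect[slope_pct == 0] = -1.0  # flat cells have no aspect
    return slope_pct, aspect


def parcel_mean_slope(slope_pct, parcel_mask):
    """Mean percent slope over the cells covered by one parcel (boolean mask)."""
    return float(slope_pct[parcel_mask].mean())


# A 3 x 3 grid rising 1 m per 10 m towards the north: 10% slope facing south.
dem = np.array([[2, 2, 2], [1, 1, 1], [0, 0, 0]])
slope, aspect = slope_aspect(dem, cell_size=10.0)
print(parcel_mean_slope(slope, np.ones_like(slope, dtype=bool)), aspect[1, 1])
```

Aspect is a circular quantity (359° and 1° are close), so averaging it over a parcel would need a circular mean rather than the plain mean used for slope.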
Irrigation rights, either through individual drillings or through public irrigation networks connected to a water reservoir, are a vital issue, especially in the dry climate of Cyprus. Beyond these internal factors, the external factors, and especially the locational characteristics, are fundamental in any valuation. Namely, access1 provides a legal way to reach a land parcel, the lack of which is a serious problem for many farmers in Cyprus. In addition, the existence of such access provides, subject to certain complementary provisions, the right to build a house. Similarly, but with significantly lower importance, access2 (which in most cases also involves a dirt road) is a positive element that needs to be considered. The other three locational factors, i.e. zone, main_road and sea_view, reflect the potential of a land parcel to be used for building a cottage and its future potential in terms of whether an agricultural zone may be converted into a residential zone. The agricultural economic potential of a parcel is reflected in its current land use and the expected production revenue for a certain type of crop (land_use). Such figures, i.e. the expected net revenue per decare for various crops, have been provided by the Agricultural Department of Cyprus. After considering various potential socioeconomic factors, such as gross domestic product (GDP), the unemployment rate, the cost of goods and services, and the development rate, it was found that PPP may adequately represent the prevailing socioeconomic conditions in a country that ultimately determine property values in relation to parcel attributes and characteristics. PPP, which was originally developed as a theory of exchange rate determination, is often used to compare living standards across countries (Lafrance and Schembri, 2002).
Although all the aforementioned factors are relevant to land valuation, selecting which of them should be included in the automated valuation modelling process is critical and laborious, as discussed in the next section. All the above factors for the case study area (except PPP) are represented as GIS thematic layers, shown in Figure 3 (continuous variables) and Figure 4 (categorical variables).
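For modelling, it helps to keep the factor grouping and the variable names used above in one place. The sketch below is simply a Python encoding of that classification (after Wyatt, 1996), with the continuous/categorical split taken from the captions of Figures 3 and 4; the names `LAND_VALUATION_FACTORS`, `CONTINUOUS` and `all_factors` are illustrative and not part of the original workflow.

```python
# Grouping of the 14 land valuation factors into categories and subcategories.
LAND_VALUATION_FACTORS = {
    "internal": {
        "physical attributes": ["size", "shape", "slope", "elevation",
                                "aspect", "stream", "soil"],
        "legal factors": ["irrigation"],
    },
    "external": {
        "locational characteristics": ["access1", "access2", "zone",
                                       "main_road", "sea_view"],
        "economic conditions": ["land_use"],
    },
}

# PPP enters only the sales transaction model, not the 14 parcel factors.
SALES_MODEL_ONLY = ["PPP"]

# Continuous layers as listed for Figure 3; the remaining factors are categorical (Figure 4).
CONTINUOUS = {"size", "shape", "slope", "elevation", "aspect", "zone", "main_road"}


def all_factors():
    """Flat list of the 14 factor variable names, in category order."""
    return [name
            for category in LAND_VALUATION_FACTORS.values()
            for names in category.values()
            for name in names]


categorical = [f for f in all_factors() if f not in CONTINUOUS]
print(len(all_factors()), categorical)
```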
5. Evaluating the work of the Land Valuation Committee
5.1 The LVC model
The scores for all land valuation factors were automatically extracted from ArcGIS (through some programming routines) and passed to IBM SPSS 21.0. In order to investigate and assess the quality of the land valuation carried out by the LVC, a multiple regression model was run using the stepwise method, which is the most popular method for building regression models. It combines two other methods (Norusis, 2005), i.e. forward selection and backward elimination of variables: it begins like forward selection by entering a variable into the model, and at each step it removes any variables already in the model that are no longer significant predictors. The regression summary statistics are very good, namely R = 0.888, R² = 0.789 and adjusted R² = 0.786 (as also shown graphically in Figure 5b), although there are a number of extreme values, i.e. outliers (shown in Figure 5a), that may not be justified. Of the 14 independent variables initially included in the model, eight remained in the final model: zone, access1, irrigation, access2, slope, size, sea_view and land_use. Among these variables, the most important ones taken into account by the LVC (based on statistical significance) are, in order of importance, for each parcel: distance from residential zones (zone), access through a registered road (access1), access through a pathway (access2), mean slope (slope), the availability of irrigation (irrigation) and size (size). Each coefficient reflects how much the value of the dependent variable changes when the value of that independent variable increases by 1 and the values of the other independent variables do not change. The sign of each estimated coefficient indicates the direction of the relationship with the dependent variable.
In other words, a negative sign means that the predicted value of the dependent variable decreases when the value of the independent variable increases, whilst a positive sign means that the predicted value of the dependent variable increases when the value of the independent variable increases. In this model all signs are reasonable. In particular, access1, irrigation, access2, sea_view and land_use each have a positive sign, while zone, slope and size each have a negative sign.
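The stepwise procedure can be sketched outside SPSS. The following is a minimal NumPy version under stated assumptions, not SPSS's implementation: SPSS enters and removes variables by the probability of F, and for a single variable the partial F equals the squared t statistic, so the sketch uses coefficient p-values directly; the thresholds of 0.05 to enter and 0.10 to remove are commonly used defaults rather than values reported for the LVC model; and p-values use the normal approximation to the t distribution, which is reasonable only for samples of a few hundred parcels, as here. The function `ols` fits equation (2) and also returns R² and adjusted R².

```python
import math
import numpy as np


def ols(X, y):
    """Fit equation (2) by OLS; X must already contain a leading column of ones."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    sigma2 = resid @ resid / (n - p)                        # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))  # coefficient standard errors
    t = beta / se
    # Two-sided p-values via the normal approximation to the t distribution
    pvals = np.array([math.erfc(abs(tj) / math.sqrt(2)) for tj in t])
    r2 = 1 - resid @ resid / np.sum((y - y.mean()) ** 2)
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - p)               # p includes the intercept
    return beta, pvals, r2, adj_r2


def stepwise(X, y, names, p_enter=0.05, p_remove=0.10, max_steps=100):
    """Forward entry of the most significant candidate, then backward removal."""
    n, k = X.shape
    ones = np.ones((n, 1))
    selected = []
    for _ in range(max_steps):
        changed = False
        best, best_p = None, p_enter
        for j in range(k):                                  # forward step
            if j not in selected:
                _, pv, _, _ = ols(np.hstack([ones, X[:, selected + [j]]]), y)
                if pv[-1] < best_p:
                    best, best_p = j, pv[-1]
        if best is not None:
            selected.append(best)
            changed = True
        if selected:                                        # backward step
            _, pv, _, _ = ols(np.hstack([ones, X[:, selected]]), y)
            worst = int(np.argmax(pv[1:]))                  # skip the intercept
            if pv[1 + worst] > p_remove:
                selected.pop(worst)
                changed = True
        if not changed:
            break
    return [names[j] for j in selected]


# Simulated data (not the case study data): only x1 and x3 drive y.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = 5 + 3 * X[:, 0] - 2 * X[:, 2] + rng.normal(scale=0.5, size=300)
print(stepwise(X, y, ["x1", "x2", "x3", "x4"]))
```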
Figure 3: The continuous land valuation factors of the case study area: size (a), shape (b), slope (c), elevation (d), aspect (e), zone (f), main_road (g).
Figure 4: The categorical land valuation factors of the case study area: stream (a), soil (b), irrigation (c), access1 and access2 (d), sea_view (e) and land_use (f).
Figure 5: The distribution (a) and the normal P-P plot (b) of standardized residuals, including outliers.
In addition, all variables have a significance value of 0.000 as reported by SPSS (i.e. p < 0.001), except sea_view and land_use, which have values slightly above zero (0.038), indicating that they are slightly less important than all the other variables. This finding is indirectly confirmed by the minutes of the LVC, which note that the committee took into account all the land valuation factors eventually included in the model except sea_view. Accordingly, the LVC seems to have paid less attention to the remaining factors, such as streams, elevation, shape, main_road, aspect and soil type. This outcome also seems reasonable because the aim of the LVC was to estimate the market value of land parcels, which is clearly less dependent on the agriculture-related land valuation factors in the latter list. Overall, it can be concluded that all 14 variables need to be considered (where applicable) for land valuation in consolidated areas. However, the importance of each factor strongly depends on the prevailing socioeconomic conditions in the country concerned and, more specifically, on the local real estate market conditions and the consolidated area in particular.
Although the above results suggest that the land values assigned by the LVC fit a straight line quite well, this does not necessarily mean that the relationships modelled are reliable and consistent. In particular, when fitting a regression line, the four main assumptions needed for hypothesis testing (Norusis, 2005) are: normality, i.e. for each combination of values of the independent variables, the dependent variable is normally distributed; constant variance (homoscedasticity), i.e. the variance of the dependent variable is the same for all values of the independent variables; independence of observations; and linearity of the relationship between the dependent variable and each independent variable. Examining the data for violations of these assumptions is important, since many regression outputs, such as significance levels and confidence intervals, are sensitive to certain types of violation. All four assumptions can be checked by examining the residuals. In the LVC regression model, some of the assumptions are not met: normality is rejected by both statistical tests provided by SPSS, namely Kolmogorov-Smirnov and Shapiro-Wilk, with a significance value of 0.00 in both cases.
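The residual normality checks mentioned above can be reproduced outside SPSS along the following lines. This is a sketch under stated assumptions: `normality_tests` is a hypothetical helper that takes the residuals of an already fitted model and uses SciPy's `shapiro` and `kstest`. SPSS's Explore procedure reports the Kolmogorov-Smirnov test with a Lilliefors significance correction, whereas the plain KS test below treats the estimated mean and standard deviation as known, so its p-values will differ from SPSS (they are conservative).

```python
import numpy as np
from scipy import stats


def normality_tests(residuals):
    """Return (Shapiro-Wilk p, Kolmogorov-Smirnov p) for a vector of regression residuals."""
    r = np.asarray(residuals, dtype=float)
    _, p_sw = stats.shapiro(r)
    # KS against N(0, 1) after standardising with the sample mean and SD. Because these
    # parameters are estimated, this plain KS p-value is conservative and will not match
    # the Lilliefors-corrected value reported by SPSS.
    z = (r - r.mean()) / r.std(ddof=1)
    _, p_ks = stats.kstest(z, "norm")
    return p_sw, p_ks


# Usage: strongly skewed synthetic "residuals" should be flagged as non-normal.
rng = np.random.default_rng(1)
skewed = rng.exponential(size=500) - 1.0
print(normality_tests(skewed))
```

As in the LVC model, p-values below the chosen significance level (e.g. 0.05) lead to rejecting normality of the residuals.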
Thus, although the distribution of residuals appears approximately normal (Figure 5a) (and its standard deviation is very close to one, i.e. 0.99), it includes some outliers, shown in the boxplot within the same figure. Homoscedasticity is also not met, as indicated in Figure 6: the scatterplot of studentised residuals against the standardized predicted values presents a pattern, i.e. a tendency towards a funnel shape, meaning that the variability of the residuals decreases as the predicted values increase. Similarly, linearity is not met for the slope and size variables, as indicated in Figure 7.
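The quantities plotted in the homoscedasticity check can be computed directly. The sketch below, assuming a plain OLS model with an intercept, returns internally studentised residuals, e_i / (s·√(1 − h_ii)), and standardised predicted values, analogous to those SPSS plots. The helper names and the crude `spread_ratio` summary (residual spread above versus below the median prediction) are hypothetical illustrations, not a formal test. The synthetic data are built so that the noise shrinks as the prediction grows, i.e. the narrowing funnel described for Figure 6.

```python
import numpy as np


def sresid_zpred(X, y):
    """Internally studentised residuals and standardised predicted values of an OLS fit."""
    n = len(y)
    A = np.column_stack([np.ones(n), X])           # design matrix with intercept
    H = A @ np.linalg.pinv(A)                      # hat matrix; its diagonal holds leverages
    y_hat = H @ y
    e = y - y_hat
    s = np.sqrt(e @ e / (n - A.shape[1]))          # residual standard error
    sresid = e / (s * np.sqrt(1 - np.diag(H)))
    zpred = (y_hat - y_hat.mean()) / y_hat.std(ddof=1)
    return sresid, zpred


def spread_ratio(sresid, zpred):
    """SD of residuals above the median prediction divided by SD below it.

    Roughly 1 under homoscedasticity; well below 1 suggests a funnel that narrows
    as predictions grow (hypothetical summary, not a formal test).
    """
    upper = zpred > np.median(zpred)
    return sresid[upper].std(ddof=1) / sresid[~upper].std(ddof=1)


# Synthetic data whose noise shrinks as x (and hence the prediction) grows.
rng = np.random.default_rng(2)
x = rng.uniform(0, 10, size=400)
y = 1.0 + 2.0 * x + rng.normal(size=400) * (3.0 - 0.25 * x)
sresid, zpred = sresid_zpred(x.reshape(-1, 1), y)
print(f"spread ratio: {spread_ratio(sresid, zpred):.2f}")
```

Plotting `sresid` against `zpred` gives the kind of scatterplot shown in Figure 6; the ratio simply condenses the visual funnel into one number.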
Figure 6: The scatterplot of studentised residuals against the standardized predicted value
Figure 7: Scatterplots of the slope (a) and size (b) variables against the dependent variable
Independence of residuals is marginally confirmed by the Durbin-Watson test, which equals 1.52, initially suggesting no correlation, since a value between 1.5 and 2.5 is considered acceptable (Norusis, 2005). The Durbin-Watson statistic ranges from 0 to 4, and values close to 2 indicate no first-order autocorrelation. However, because 1.52 lies at the lower edge of the acceptable range, the test suggests that we have to check for potential spatial autocorrelation of the residuals, which could make the significance tests unreliable. Furthermore, the statistics provided by the OLS tool of ArcGIS, namely Koenker (BP) (19.46) and Jarque-Bera (37.65), are both statistically significant. The former suggests that the relationships in the model are not consistent (due either to non-stationarity or to heteroscedasticity), and the latter that the model outputs are biased. Based on the above findings, the SA tool of ArcGIS was used to explore this issue. Figure 10 shows the critical z-scores and the associated p-values for the 90%, 95% and 99% confidence levels (vertical segments in both tails of the normal distribution). Both p-values and z-scores are associated with the standard normal distribution, as shown in Figure 8.
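The three diagnostics cited above have short closed forms, sketched below in textbook form under the assumption that `e` holds the OLS residuals (in the order used for the test) and `X` the explanatory variables. The function names are hypothetical, and ArcGIS and SPSS may differ in small details, so this illustrates the formulas rather than reproducing the reported 1.52, 19.46 and 37.65. Durbin-Watson is Σ(e_t − e_{t−1})² / Σe_t²; Jarque-Bera is n/6·(S² + (K − 3)²/4) with sample skewness S and kurtosis K, referred to a χ² distribution with 2 degrees of freedom; Koenker's studentised Breusch-Pagan statistic is n·R² from regressing the squared residuals on the explanatory variables, referred to χ² with as many degrees of freedom as explanatory variables.

```python
import math

import numpy as np
from scipy import stats


def durbin_watson(e):
    """Durbin-Watson: sum of squared successive differences over the residual sum of squares."""
    e = np.asarray(e, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)


def jarque_bera(e):
    """Jarque-Bera statistic and p-value (chi-square, 2 d.f.) for residual normality."""
    e = np.asarray(e, dtype=float)
    d = e - e.mean()
    m2, m3, m4 = (np.mean(d ** k) for k in (2, 3, 4))   # central moments
    skew, kurt = m3 / m2 ** 1.5, m4 / m2 ** 2
    jb = len(e) / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    return jb, math.exp(-jb / 2.0)   # survival function of chi2(2) is exp(-x/2)


def koenker_bp(e, X):
    """Koenker's studentised Breusch-Pagan test: n * R^2 from regressing e^2 on X."""
    e = np.asarray(e, dtype=float)
    n = e.size
    A = np.column_stack([np.ones(n), X])                 # auxiliary regression design
    u = e ** 2
    coef, *_ = np.linalg.lstsq(A, u, rcond=None)
    r2 = 1 - np.sum((u - A @ coef) ** 2) / np.sum((u - u.mean()) ** 2)
    stat = n * r2
    return stat, stats.chi2.sf(stat, df=A.shape[1] - 1)


# Alternating residuals are negatively autocorrelated: DW = 12 / 4 = 3.
print(durbin_watson([1, -1, 1, -1]))  # prints 3.0

# Residuals whose spread grows with x give a large Koenker statistic.
x = np.linspace(1, 10, 200)
e = x * np.where(np.arange(200) % 2 == 0, 1.0, -1.0)
print(koenker_bp(e, x.reshape(-1, 1)))
```

Note that the Durbin-Watson statistic depends on the order of the observations, and land parcels have no natural sequence; this is one reason why a spatial measure of residual autocorrelation is the more natural follow-up.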
Figure 8: p-value and z-scores in the normal distribution for 90%, 95% and 99% confidence levels.
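ArcGIS's Spatial Autocorrelation (Global Moran's I) tool reports a Moran's I index together with a z-score and p-value, which can be compared with the critical values for the 90%, 95% and 99% confidence levels. A minimal sketch of that calculation follows, assuming a user-supplied n x n spatial weights matrix `W` (for example, 1 for neighbouring parcels and 0 otherwise). For simplicity the variance uses the normality assumption; ArcGIS may compute its z-score under a different (randomisation) assumption, so the numbers will not necessarily match. The function names and the toy row-of-parcels weights are hypothetical.

```python
from statistics import NormalDist

import numpy as np


def morans_i(values, W):
    """Global Moran's I with a z-score and two-tailed p-value (normality assumption)."""
    x = np.asarray(values, dtype=float)
    W = np.asarray(W, dtype=float)                 # n x n weights, zero diagonal
    n = x.size
    z = x - x.mean()                               # deviations from the mean
    s0 = W.sum()
    I = (n / s0) * (z @ W @ z) / (z @ z)
    expected = -1.0 / (n - 1)                      # E[I] with no spatial autocorrelation
    s1 = 0.5 * np.sum((W + W.T) ** 2)
    s2 = np.sum((W.sum(axis=0) + W.sum(axis=1)) ** 2)
    variance = (n**2 * s1 - n * s2 + 3 * s0**2) / ((n**2 - 1) * s0**2) - expected**2
    z_score = (I - expected) / np.sqrt(variance)
    p_value = 2 * (1 - NormalDist().cdf(abs(z_score)))
    return I, z_score, p_value


def critical_z(confidence):
    """Two-tailed critical z-score for a confidence level, e.g. 0.95 -> about 1.96."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def path_weights(n):
    """Hypothetical binary weights: n units in a row, each neighbouring the next."""
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = W[idx + 1, idx] = 1.0
    return W


# A smooth trend is positively autocorrelated; alternating values are negatively so.
W = path_weights(30)
for label, vals in (("trend", np.arange(30)), ("alternating", [1, -1] * 15)):
    I, z, p = morans_i(vals, W)
    print(f"{label}: I={I:.3f} z={z:.2f} p={p:.2g}")
for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: |z| > {critical_z(level):.3f}")  # 1.645, 1.960, 2.576
```

A z-score beyond the critical value for the chosen confidence level (with a correspondingly small p-value) indicates that the residual pattern is unlikely to be spatially random.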
The p-value is a probability indicating how likely it is that the observed spatial pattern arose from a random process. When the p-value is very small (i.e.