Cartography and Geographic Information Systems, Vol. 25, No. 1, pp. 16-32.

Enhancing Operations with Spatial Access Methods in a DBMS for GIS

EMMANUEL STEFANAKIS and TIMOS SELLIS, Department of Electrical and Computer Engineering, National Technical University of Athens, Zographou, Athens, GREECE 15773, E-mail: {stefanak, timos}@cs.ntua.gr

Abstract. Much attention has been devoted in the past to supporting classes of applications which are not well served by conventional database systems. Focusing on the application domain of geographic information systems (GIS), several architectural approaches have been proposed to implement commercial or prototype systems and satisfy the urgent needs for geographic data handling. However, those systems suffer from several limitations, because either they perform much of their processing in an application layer, which lies on top of the database management system (DBMS), or the underlying data models are not rich enough to represent the spatial dimension of geographic entities. This study examines the spatial operations that should be provided by a DBMS for the application domain of GIS and focuses on the various techniques which may be used to support the efficient execution of both simple operations and composite procedures that involve the spatial dimension of geographic entities.

Keywords. GIS operations, spatial access methods, spatial query optimization, database management systems.

Acknowledgments Emmanuel Stefanakis and Timos Sellis are partially supported by a research grant from the General Secretariat of Research and Technology of Greece (YPER’94) and by the European Commission funded TMR project CHOROCHRONOS. The authors wish to thank anonymous reviewers for carefully reading the manuscripts and providing useful comments.

Author identification information. Emmanuel Stefanakis (Dipl.Eng., M.Sc.E., Ph.D.) is a Research Engineer in the Department of Electrical and Computer Engineering at the National Technical University of Athens. Timos Sellis (Dipl.Eng, M.Sc., Ph.D.) is a full Professor in the Department of Electrical and Computer Engineering at the National Technical University of Athens.

1. Introduction

Geographic Information Systems (GIS) are computer-based systems designed to support the capture, management, manipulation, analysis, modeling and display of spatially referenced data at different points in time (Aronoff 1989). Today, GIS are widely used in many government, business and private activities, which fall into three major categories (Maguire et al. 1991): a) socio-economic applications (e.g., urban and regional planning, cadastral registration, archaeology, natural resources, market analysis, etc.); b) environmental applications (e.g., forestry, fire and epidemic control, etc.); and c) management applications (e.g., organization of pipeline networks and other services, such as electricity and telephones, real-time navigation for vessels, planes and cars, etc.). The role of GIS in these applications is to provide users and decision-makers with effective tools for solving complex and usually ill- or semi-structured spatial problems (Hopkins 1984, Abiteboul 1997), while providing an adequate level of performance (Armstrong and Densham 1990, 1994, Densham 1991, Armstrong et al. 1992).

GIS are enormously complicated software systems. A GIS should be supported by an operating system, and depends on the presence of a graphics package for input and output, routines supplied by the programming language in which the GIS is written, and numerous other software products (Goodchild 1991). Many of the more powerful contemporary GIS are designed to rely on a Database Management System (DBMS), relieving developers of many of the common data housekeeping functions. However, traditional DBMS have only dealt with alphanumeric domains and have proved unsuitable for non-standard applications, such as GIS, which are characterized by more complex domains.

The design and implementation of a DBMS for the application domain of GIS is an active area of research (Maguire et al. 1991, Muehrke 1990, Egenhofer 1991, Mueller 1991, Kanakubo 1993, Worboys 1995). Apart from powerful data models for representing the spatial dimension of geographic data, such a system should provide a basic set of operations for geographic data handling, as well as mechanisms for the efficient execution of both simple operations and composite procedures at the physical level.

The purpose of this study is to examine those issues extensively. The discussion starts by presenting the architectural approaches adopted in the past for the development of commercial and prototype systems, together with their major drawbacks (Section 2), and proceeds with the requirements for modeling the spatial dimension of geographic data (Section 3). Next, a classification of the operations that should be provided by a DBMS for the application domain of GIS is given (Section 4). The discussion then focuses on the techniques available to database developers for the efficient execution of spatial operations (Section 5). Specifically, spatial access methods (SAM) organizing the minimum bounding rectangle (MBR) approximations of spatial objects are considered, and it is shown how they can be used to support the execution of different classes of spatial operations. Finally, the execution of composite procedures, which consist of a series of operations and involve both the spatial and non-spatial dimensions of geographic data, is considered, and different techniques for planning the execution strategy (i.e., heuristic rules, systematic cost estimation, precomputed attribute values and relationships) are examined (Section 6). The discussion concludes by summarizing the contributions of the paper and giving hints for future research in the area of DBMS for the application domain of GIS (Section 7).

2. Architectural Approaches

Four architectural approaches have generally been adopted to implement commercial or prototype systems and satisfy the urgent needs for operational spatial data handling. These approaches are (Aronoff 1989, van Oosterom and Vijlbrief 1992, Samet and Aref 1995, Stefanakis and Sellis 1996b):

− Single conventional DBMS: In this approach (Figure 1a), both spatial and non-spatial data are represented in tabular form in the pure relational model. Operators needed to define and manipulate spatial entities are contained in an application layer which is built on top of the conventional DBMS.
− Partial conventional DBMS: In this approach (Figure 1b), a conventional DBMS is used to represent thematic information associated with spatial entities, while a separate subsystem is used to handle spatial data. Operators needed to define and manipulate spatial entities are contained in an application layer which provides a uniform interface to both subsystems. This architecture is adopted by most commercial GIS packages: e.g., Arc/Info (ESRI 1989), MGE (Intergraph 1990), SICAD (Siemens 1987).
− Extended conventional DBMS: In this approach (Figure 1c), a conventional DBMS is modified to also support GIS application domains. This is accomplished by adding new constructs to a conventional DBMS so as to enhance its modeling power and provide better support for spatial applications. These new constructs include support for abstract data types (ADT), procedural fields, complex objects, composite attributes, and so on. Additional operators needed to define and manipulate spatial entities are contained in an application layer which lies on top of the extended DBMS. Several prototypes have been implemented using this approach (Aref and Samet 1991a, Haas and Cody 1991, van Oosterom and Vijlbrief 1991).
− Object-Oriented DBMS: In this approach (Figure 1d), the object-oriented model is used as a basis for the application domain of GIS. The concepts of the underlying model (e.g., classes, inheritance, encapsulation, types, methods) are very convenient. Any additional operators needed to define and manipulate spatial entities are contained in an application layer which lies on top of the object-oriented DBMS. Several prototype systems adopt this approach (Abel 1989, Scholl and Voisard 1992, David et al. 1993).

Figure 1: Architectural approaches.

The role of the application layer in all of the approaches above is to supplement the set of capabilities offered by the underlying system architecture, so that the operational needs for spatial data handling are satisfied. In other words, the application layer provides the set of GIS operations which are not available in the underlying DBMS. Obviously, this set varies from one software package to another and depends heavily upon the system architecture. The execution of operations within the application layer is frequently accompanied by up-/down-loading of data from/to the database, and spatial data are usually voluminous. This is an expensive task which should be avoided in a production environment. The alternative usually suggested is to push the operations of the application layer into the DBMS (Stefanakis and Sellis 1996b, Stefanakis 1997).

Although several commercial and prototype systems have been extensively used to support a wide variety of geographic applications (Maguire et al. 1991), they suffer from several limitations, which render them inefficient tools for geographic data handling (Stefanakis and Sellis 1996a, 1997). Systems based on a single conventional DBMS adopt the relational model, which is not rich or powerful enough to model the structural complexities of geographic data. In addition, relational algebra does not offer the expressive power to support spatio-temporal operations. Systems based on a partial conventional DBMS organize the spatial and non-spatial components of geographic data into separate databases, and consequently processing is mostly performed in the application layer. Systems based on an extended conventional DBMS or an object-oriented DBMS are more promising. However, in the prototype systems developed in the past, either much of the processing is performed in the application layer, or the underlying data models do not accommodate all dimensions of geographic data (i.e., thematic, spatial, temporal, quality and multimedia dimensions).

3. Spatial Data Modeling

A DBMS for the application domain of GIS should provide two alternative views for representing the spatial component of geographic information (Peuquet 1984, Smith et al. 1987, DCDSTF 1988, Gatrell 1991, Mueller 1991, Worboys 1994, 1995, Gueting 1994). Those views are:
1. objects in space (or single spatial objects); and
2. space (or spatially related collections of spatial objects).

Spatial objects are defined as a set of locations together with a set of properties characterizing those locations. The basic abstractions of these objects in 2-D space are: point, line, and region. On the other hand, space is defined as a set of objects, to which associated attributes (properties) may be attached, together with a set of relationships defined on that set. The basic abstractions for spatially related collections of spatial objects are: partitions, networks, nested partitions, and digital terrain (elevation) models.

This study adopts a model first introduced by Tomlin (1990), which treats both single spatial objects and related collections of spatial objects in an elegant manner, and can be used to define spatial operations independently of the two fundamental models available in the literature (Aronoff 1989, Worboys 1995): the vector model and the raster model. In this general model, geographic information can be viewed as a hierarchy of data (Tomlin 1990, Samet and Aref 1995). At the highest level, there is a library of maps (more commonly referred to as layers), all of which are in registration (i.e., they have a common coordinate system) (Figure 2a). Each layer is partitioned into zones (regions), where the zones are sets of individual locations with a common attribute value (Figure 2b). Examples of layers are the land-use layer, which is divided into land-use zones (e.g., wetland, river, desert, city, park and agricultural zones), and the road network layer, which contains the roads that pass through the portion of space that is covered by the layer.

4. GIS Operations

There is no standard algebra defined on geographic data. This means that there is no standardized set of base operations for geographic data handling. Several taxonomies of GIS operations can be found in the literature (Tomlinson and Boyle 1981, Dangermond 1983, Berry 1987, Rhind and Green 1988, Tomlin 1990, Giordano et al. 1994), with different structure, scope, detail and internal consistency (Giordano et al. 1994). The set of operations available in GIS varies from one system to another and depends heavily on the application domain. However, their fundamental capabilities can be expressed in terms of four types of operations (Tomlin 1990, Giordano et al. 1994):

− Programming operations: They consist of a number of routines at the operating system level, which supervise and direct system operations and control communication with the peripheral devices connected to the computer.
− Data preparation operations: They encompass a variety of methods for capturing data from different sources (e.g., digital or paper maps, land measurements), and for processing and storing them appropriately in the database.
− Data presentation operations: They encompass a variety of methods for the presentation of data, such as drawing maps, drafting charts, generating reports, and so on.
− Data interpretation operations: These operations transform data into information and as such they comprise the heart of any geographic information system. Consequently, the discussion that follows focuses on them.

Data interpretation operations available in GIS characterize (Aronoff 1989, Tomlin 1990, Samet and Aref 1995):
− individual locations,
− locations within neighborhoods, and
− locations within zones,
and constitute, respectively, the three classes of local, focal and zonal operations. All data interpretation is done on a layer-by-layer basis. That is, each operation accepts one or more existing layers as input (the operands) and generates a new layer as output (the product), which can be used as an operand in subsequent operations (Figure 2c).

Figure 2: The data model and data interpretation operations.

The first class of data-interpreting operations (local operations) includes those that compute a new value for each location on a layer as a function of existing data explicitly associated with that location (Figure 2d). The data to be processed by these operations may include the zonal values associated with each location on one or more layers. Local operations include:
− Classification and recoding operations, i.e., assignment of new attribute values (based on mathematical transformations or look-up tables) to individual locations on a layer.
− Generalization operations, i.e., reduction of the detail associated with individual locations on a layer.
− Overlay operations, i.e., assignment of new attribute values to individual locations resulting from the combination of two or more layers. This type of operation is analogous to join operations in conventional database systems and is usually termed a spatial join operation.

Focal operations compute new values for every location as a function of its neighborhood (Figure 2e). A neighborhood is defined as any set of one or more locations that bear a specified distance and/or topological or directional relationship to a particular location (or set of locations in general), the neighborhood focus. Focal operations include:
− Neighborhood operations, i.e., assignment of new attribute values to individual locations on a layer, which depict their distance, topology or direction in a neighborhood with respect to the neighborhood focus.
− Interpolation operations, i.e., assignment of new attribute values to individual locations on a layer derived by averaging sets of two or more target values associated with selected locations in their immediate or extended vicinity.
− Surfacial operations, i.e., assignment of new attribute values to individual locations on a layer indicating their surfacial characteristics (slope, aspect, volume, etc.).
− Connectivity operations, i.e., assignment of new attribute values to individual locations on a layer derived from a running total of the results being retained in a quantitative or qualitative step-by-step fashion and considering the values associated with locations in the immediate or extended vicinity (optimum path finding, intervisibility, etc.).

The third and final class of data-interpreting operations (zonal operations) includes those that compute a new value for each location as a function of existing values associated with a zone containing that location (Figure 2f). Zonal operations include:
− Search operations, i.e., retrieval of information characterizing individual locations on a layer that coincide with the zones of another layer (i.e., the mask or filter layer). This type of operation is analogous to selection operations in conventional database systems and is usually termed a spatial selection operation.
− Measurement operations, i.e., assignment of new attribute values to individual locations on a layer that correspond to a measurement (e.g., area, length) characterizing their zones.

Table 1 summarizes the basic classes of data interpretation operations, accompanied by representative examples (Aronoff 1989, Tomlin 1990). Notice that data interpretation operations may be combined to compose one or more procedures (a procedure is any finite sequence of one or more operations that are applied to meaningful data with deliberate intent; Tomlin 1990) and accomplish a composite task posed by the spatial decision-making process. A simplified example for the task of site selection for a residential housing development can be found in (Stefanakis and Sellis 1996b). The basic approach is to create a set of constraints, which restrict the planning activity, and a set of opportunities, which are conducive to the activity. The procedure of site selection is based on the sets of constraints and opportunities, and consists of a sequence of operations which extract the best locations for the planning activity. Notice that a more advanced decision-making process can be achieved by incorporating fuzzy logic methodologies into data interpretation operations (Stefanakis et al. 1996, 1997, Stefanakis 1997).
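The three classes of data interpretation operations described above can be illustrated with a minimal sketch. It assumes a raster-style layer stored as a list of rows, one value per location; the function names (local_overlay, focal_mean, zonal_area) and the toy layers are hypothetical and are not taken from the paper or from any particular GIS.

```python
# Illustrative sketch only: a layer is a list of rows, one value per location.
# All names and data below are assumptions made for this example.

def local_overlay(layer_a, layer_b, combine):
    """Local operation: the new value depends only on the values at the same location."""
    return [[combine(a, b) for a, b in zip(row_a, row_b)]
            for row_a, row_b in zip(layer_a, layer_b)]

def focal_mean(layer, radius=1):
    """Focal operation: the new value depends on a square neighborhood of the location."""
    rows, cols = len(layer), len(layer[0])
    out = []
    for r in range(rows):
        out_row = []
        for c in range(cols):
            window = [layer[i][j]
                      for i in range(max(0, r - radius), min(rows, r + radius + 1))
                      for j in range(max(0, c - radius), min(cols, c + radius + 1))]
            out_row.append(sum(window) / len(window))
        out.append(out_row)
    return out

def zonal_area(zone_layer):
    """Zonal operation: each location receives the cell count (area) of its zone."""
    counts = {}
    for row in zone_layer:
        for zone in row:
            counts[zone] = counts.get(zone, 0) + 1
    return [[counts[zone] for zone in row] for row in zone_layer]

# Two toy layers in registration (same grid).
land_use = [["park", "park", "city"],
            ["park", "city", "city"]]
elevation = [[10, 20, 30],
             [40, 50, 60]]

# Each operation takes layers as operands and produces a new layer as product.
suitable = local_overlay(land_use, elevation,
                         lambda use, z: use == "park" and z < 30)
smoothed = focal_mean(elevation)
areas = zonal_area(land_use)
```

Because every function returns a new layer, the products can be fed into further operations, which is how a composite procedure such as site selection would be assembled.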

Table 1: Basic classes of data interpretation operations.

Classes of Operations                      Examples of Operations

Local Operations
• Classification & recoding                re-code, re-compute, re-classify
• Generalization                           generalize, abstract
• Overlay (spatial join)                   overlay, superimpose

Focal Operations
• Neighborhood
  − window & point queries                 zoom-in, zoom-out, point-in-polygon
  − topological                            disjoint, meet, equal, contains, inside, covers, overlap
  − direction                              north, north-east, weak-bounded-north, north-south
  − metric (distance) & buffer zones       near, about, buffer, corridor, thinning
  − nearest neighbor                       nearest-neighbor, k-nearest-neighbors
• Interpolation
  − location properties                    point-linear, (inverse) distance-weighted
  − thiessen polygons                      thiessen-polygons, voronoi-diagrams
• Surfacial
  − visualization                          contours, TINs
  − location properties                    height, slope, aspect, gradient
• Connectivity
  − routing & allocation (network)         optimum-path-finding, optimum-routing, spread, seek
  − intervisibility                        visible, line-of-sight, viewshed, perspective, illumination

Zonal Operations
• Mask queries (spatial selection)         select-from-where, retrieve
• Measurement                              distance, area, perimeter, volume

5. GIS Operations and Spatial Index Structures

Data interpretation operations fall into two categories, depending on whether their execution can profitably be supported by a spatial index structure. These categories are:
1. Spatially-indexed operations: All operations whose execution involves at least one spatial predicate and can be supported by a spatial index structure. The role of this structure is to facilitate the selection of those database objects that satisfy the spatial predicate.
2. Non-spatially-indexed operations: All other operations, which either involve thematic predicates only and can be supported by traditional index structures, such as B-trees, or are computational in nature and require a sequential scan of the database objects.

Example operations for the first category are neighborhood and overlay operations. On the other hand, recoding, generalization, and routing are examples of non-spatially-indexed operations. Notice that some classes of operations can be partially supported by a spatial index structure. This occurs because those operations are composite and consist of more than one simple operation, some of which might be spatially-indexed. For instance, if the interpolation method is based on k-nearest neighbors (where k is very small compared to the number of database objects), a spatial index may support the retrieval of these neighbors (Section 5.2).

Multidimensional access methods constitute a large class of access methods to support searches in spatial database systems. These access methods are divided into point access methods (PAM) and spatial access methods (SAM). PAM have been designed to support spatial searches on point databases, while SAM are able to manage objects with a spatial extension, such as lines or regions. This study focuses mostly on SAM, because geographic databases are designed to manage extended objects in general.

A large variety of SAM have been suggested in the past (Peucker and Chrisman 1975, Guenther 1988, Samet 1990, Gaede and Guenther 1995). Each method organizes the space and the objects in it in some way, so that only parts of the space and a subset of the database objects need to be considered to support the execution of a spatial operation. Hence, any SAM is tailored for a particular spatial data model and is designed to support one or a limited number of operations.

In what follows, the discussion is restricted by the following assumptions, which are required in order to avoid an exhaustive consideration of an unlimited number of combinations of spatial data models, object abstractions and operations involved. These assumptions are:
− The database consists of objects with a spatial extension. Consequently, neither one-dimensional access methods, such as the B-tree, nor point access methods (PAM), such as the grid file, are able to support indexing.
− Spatial objects are represented in the index structures by their minimum bounding rectangle (MBR) approximation. Therefore, only SAM organizing MBR are considered.

SAM were first introduced to support the basic spatial operations of point and window (range) search in a set of spatial objects. Given a set of geometric objects in n-dimensional Euclidean space E^n, a window search computes all those objects that overlap a given search space S, a subset of E^n. The point search can be viewed as a degenerate window search, which computes all objects containing the given search point P of E^n.

Later on, the need to support other types of spatial operations, such as topological or direction relations, forced computer scientists to find new methods to support spatial searching. An easy solution was to express these types of operations by one or a set of window queries (e.g., Papadias et al. 1995, 1997). For instance, in order to find all European countries which are north of Greece (i.e., direction relation north of), a window search is performed on the map of Europe (i.e., the database of Europe) with the bottom side of the rectilinear rectangle tangent to the northernmost boundary of Greece. However, not all classes of spatial operations can be supported by spatial index structures. For example, optimum path finding in line networks is an operation which cannot be supported completely by an SAM, given that all edges of the network must be considered in general (Sedgewick 1990).

The following subsections concentrate on spatially-indexed operations. The discussion starts by presenting some details on SAM organizing MBR approximations and the techniques they adopt (Subsection 5.1). Then it proceeds with an examination of how data interpretation operations may be supported by SAM organizing MBR. Specifically, it presents different SAM proposed in the past to support various spatial operations and provides suggestions for other data interpretation operations (Subsection 5.2). Finally, it states the research directions in the area of spatial data structures (Subsection 5.3).

5.1 Spatial Access Methods

Numerous SAM which are based on the abstraction of a complex spatial object by its minimum bounding rectangle (MBR) have been proposed in the past. The MBR is the smallest n-dimensional rectangle, with sides parallel to the axes of the n-dimensional space, which entirely contains the spatial object it represents (Figure 3). The most important characteristic of this simple abstraction is that the essential geometric properties of an object, i.e., its position in space and its extents along each axis, are preserved at the cost of a limited number of bytes (the two opposite corner points are adequate to describe the object MBR). On the other hand, MBR approximations are very rough, since they are usually accompanied by large "dead spaces" (the dead space is the area of the MBR that is not covered by the spatial object it approximates), and spatial indexing based on MBR is sensitive to orientation (i.e., the set of MBR used to represent the spatial objects depends on the orientation of the axes in the coordinate system). Alternative abstractions to overcome these drawbacks have been proposed (e.g., the convex polyhedra in Guenther 1988, 1989, and the convex hulls in Brinkhoff et al. 1993 minimize the dead space, while the minimum bounding sphere in van Oosterom and Claasen 1990 is independent of the orientation). However, they are more complicated and, as a consequence, not widely adopted.
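The MBR abstraction, the window search and the "north of" example above can be sketched as follows. This is a minimal illustration under stated assumptions: the objects are 2-D polygons given as vertex lists, a plain dictionary scanned linearly stands in for a real SAM, and every name (MBR, mbr_of, window_search, north_window, dead_space) as well as the coordinates is hypothetical.

```python
from typing import NamedTuple

class MBR(NamedTuple):
    """Axis-parallel rectangle described by its two opposite corner points."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

def mbr_of(points):
    """Smallest axis-parallel rectangle that contains all vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return MBR(min(xs), min(ys), max(xs), max(ys))

def intersects(a, b):
    """True if the two rectangles share at least one point (touching counts)."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax and
            a.ymin <= b.ymax and b.ymin <= a.ymax)

def window_search(index, window):
    """Window search over MBRs; a linear scan stands in for a real SAM."""
    return [name for name, box in index.items() if intersects(box, window)]

def north_window(ref, space):
    """Window whose bottom side is tangent to the northernmost boundary of ref."""
    return MBR(space.xmin, ref.ymax, space.xmax, space.ymax)

def polygon_area(points):
    """Area of a simple polygon (shoelace formula)."""
    total = 0.0
    for i in range(len(points)):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % len(points)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0

def dead_space(points):
    """Area of the MBR that is not covered by the object itself."""
    box = mbr_of(points)
    return (box.xmax - box.xmin) * (box.ymax - box.ymin) - polygon_area(points)

# Made-up objects; "A" plays the role of the reference object.
index = {"A": MBR(0, 0, 2, 2), "B": MBR(1, 3, 2, 5), "C": MBR(3, -2, 4, 1)}
space = MBR(0, -5, 10, 10)

window = north_window(index["A"], space)
candidates = [n for n in window_search(index, window) if n != "A"]

triangle = [(0, 0), (4, 0), (0, 3)]
```

The reference object is excluded by name because the tangent window touches its own MBR. The names returned are candidates only; whether an object really lies north of the reference still has to be checked against the exact geometries.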

Notice that the spatial index only produces a set of candidate solutions (i.e., a set of objects that possibly satisfy a spatial predicate). Hence a refinement step is required (i.e., a two-step strategy), in which the exact geometries of the candidate objects are examined and tested against the search predicate. If the predicate evaluates to true, the object is added to the query result; otherwise it is ignored as a false hit.

Figure 3: MBR approximations of European countries.

SAM organizing MBR approximations are classified into four groups: a) ordering, b) transformation, c) clipping, and d) overlapping. Each of these groups is characterized by a special technique, which allows the extension of a one-dimensional access method or a point access method (PAM) to an SAM for managing MBR. The performance of such an SAM depends on the underlying one-dimensional access method or PAM and on the technique applied for extending it to an SAM.

5.1.1 The Ordering Technique

The ordering technique introduces a one-dimensional ordering among a set of MBR based on both their location and extents. A space filling curve (e.g., Peano, Hilbert, Row, Row prime, etc.) is adopted so that multidimensional MBR are transformed into points or line segments (intervals) in a one-dimensional space (Jagadish 1990). The keys assigned to each MBR are the keys of the cells that are completely covered by that MBR. The transformation tries to preserve distance, that is, points that are close in the n-d space are likely to be close in the 1-d space. The set of ordered MBR can be organized by any one-dimensional access method using the ordering number as a key. Several SAM using the ordering technique have been proposed (Abel and Smith 1983, 1984, Orenstein 1986, Faloutsos 1988, Faloutsos and Rong 1991). The major advantage of this technique is that the SAM inherits the good properties (e.g., logarithmic performance) of the underlying one-dimensional access method (e.g., B+-tree, hashing techniques). However, the introduction of an artificial one-dimensional ordering may require further enlargement of the MBR, which causes extensive overlap among ordering keys. This undesirable overlap has a negative impact on the search performance of SAM using the ordering technique.

5.1.2 The Transformation Technique

The basic idea of the transformation technique is to represent the MBR of spatial objects by points in a space of higher dimensionality (Hinrichs and Nievergelt 1983). An n-dimensional MBR is characterized by 2n coordinates (the two opposite corner points, or the center point plus the n extents of the MBR), and thus it can be considered as a point in a 2n-dimensional space. For instance, a set of 2-d MBR may be represented by an equal number of points in 4-d space. The higher-dimensional points can be organized by any PAM, such as the grid file (Nievergelt et al. 1984) or the LSD-tree (Henrich et al. 1989). Notice that all properties of the underlying PAM are inherited by the SAM. The drawbacks of the transformation technique are: first, the formulation of spatial queries in the point space is much more complicated than it is in the original space; second, MBR lying close together are spread out in the point space, especially when they have widely varying extents; and third, the distribution of points in the point space is non-uniform even when the MBR are uniformly distributed in the original space.

5.1.3 The Clipping Technique

In this technique the data space to be indexed is partitioned into a set of disjoint regions. A spatial object is associated with all regions which intersect its MBR. Specifically, when an MBR R intersects with more than one region, it is partitioned into a minimal set of rectangles {R1, R2, ..., Rk} (or polyhedra in general), whose union gives R, such that each Ri, 1 ≤ i ≤ k, intersects with exactly one disjoint region. These rectangles are represented in a file and may be organized by any PAM, like the grid file (Nievergelt et al. 1984) or the KDB-tree (Robinson 1981).
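The core step of each of the three techniques described so far (ordering, transformation, clipping) can be sketched in a few lines. The sketch makes several assumptions that are not in the paper: a Z-order (bit-interleaving) curve is used as the space filling curve, grid cells are addressed by non-negative integer column and row numbers, and the disjoint regions used for clipping are the cells of a regular grid rather than the regions of a particular PAM. All function names are hypothetical.

```python
import math

def z_order_key(col, row, bits=16):
    """Ordering: interleave the bits of a cell's column and row into one key."""
    key = 0
    for i in range(bits):
        key |= ((col >> i) & 1) << (2 * i)        # column bits at even positions
        key |= ((row >> i) & 1) << (2 * i + 1)    # row bits at odd positions
    return key

def to_corner_point(box):
    """Transformation: a 2-d MBR seen as a 4-d point (two opposite corners)."""
    xmin, ymin, xmax, ymax = box
    return (xmin, ymin, xmax, ymax)

def to_centre_point(box):
    """Transformation: the same MBR as centre point plus the two extents."""
    xmin, ymin, xmax, ymax = box
    return ((xmin + xmax) / 2, (ymin + ymax) / 2, xmax - xmin, ymax - ymin)

def clip_to_grid(box, cell):
    """Clipping: split an MBR R into pieces R1..Rk, one per grid cell it crosses.

    Assumes a regular grid of square cells of side `cell` and an MBR with
    positive extent along both axes.
    """
    xmin, ymin, xmax, ymax = box
    pieces = []
    for col in range(math.floor(xmin / cell), math.ceil(xmax / cell)):
        for row in range(math.floor(ymin / cell), math.ceil(ymax / cell)):
            pieces.append((max(xmin, col * cell), max(ymin, row * cell),
                           min(xmax, (col + 1) * cell), min(ymax, (row + 1) * cell)))
    return pieces

keys = [z_order_key(c, r) for c, r in [(0, 0), (1, 0), (0, 1), (1, 1), (2, 0)]]
pieces = clip_to_grid((1, 1, 3, 2), cell=2)
```

The keys could be stored in any one-dimensional access method and the 4-d points or the clipped pieces in any PAM, which is what lets each technique inherit the properties of the underlying structure.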

8

Examples of SAM using the clipping technique are the R+-tree (Sellis et al. 1987) and the Cell-tree (Guenther 1989). Since these structures avoid overlapping regions, the search operation is very efficient. Additionally, the properties of the underlying PAM are inherited. However, a drawback is obviously the significant data replication (i.e., each object is potentially represented in more than one regions), which requires extra storage and hence more expensive and complex insertion and deletion procedures. 5.1.4 The Overlapping Technique Contrary to clipping, with this technique the space can be partitioned into overlapping regions. Regions are derived from the MBR indexed and not the other way around. In addition, each MBR is inserted into a unique region (i.e., duplications are not allowed). Like the clipping technique, the MBR are represented in a file and may be organized by any PAM. The most popular SAM using the overlapping technique is the R-tree (Guttman 1984). R-tree is a balanced tree generalizing the B+-tree concept (Knuth 1973, Comer 1979) to spatial objects. The advantage of the overlapping technique is that the storage utilization depends only on the PAM, since every MBR is uniquely represented in the file. For instance, the space utilization in the R-tree is at least 50% (and usually around 67%), due to the underlying B+-tree structure. However, the area overlap between regions resulting from this technique may cause significant performance degradation (Faloutsos et al. 1987), thus it must be minimized (Roussopoulos and Leifker 1985, Beckmann et al. 1990). 5.2 GIS Operations and SAM Organizing MBR The scope of this Section is to examine how SAM organizing MBR approximations may be adopted to support the execution of data interpretation operations. As stated previously, SAM have been introduced to support the basic geometric operation of window search (point search can be viewed as a degenerated case). 
Hence, all four techniques for SAM are capable of supporting the operation of window search (for ordering see Sagan 1994; for transformation see Hinrichs 1985; for clipping see Sellis et al. 1987; for overlapping see Guttman 1984 and Beckmann et al. 1990). Several comparative studies have been performed in the past combining the different SAM available with a large variety of data sets (with variable sizes and distributions in space). Gaede and Guenther (1995) present an interesting survey on these studies and their results. Recently, the four techniques for SAM have been adopted to support other types of spatial operations. The basic strategy for handling those operations is to express the spatial predicate through one or more window searches and then refine the qualified set of objects. Specifically, Papadias, Theodoridis, Sellis and Egenhofer (1995) show how queries involving topological relations can be supported by both an overlapping (i.e., R*-tree) and a clipping (i.e., R+-tree) technique from the R-tree family. Papadias, Theodoridis, Stefanakis and Sellis (Theodoridis et al. 1995, 1996, Papadias et al. 1997) extend this work to support direction and distance relations, in addition to topological ones, using an overlapping technique (i.e., R*-tree). Brinkhoff, Kriegel, Schneider and Seeger (Brinkhoff et al. 1993, 1994) present how the overlay operation, or spatial joins in general, can be supported efficiently using an overlapping technique (i.e., R*-tree). Roussopoulos, Kelley and Vincent (1995) show how an adaptive overlapping technique (i.e., R-tree) may support the k-nearest neighbors operation; what differentiates this method from other approaches is that the search window size changes during the traversal of the tree structure. Finally, Gaede (1995) presents how the ordering technique (i.e., z-ordering) can be applied to provide an efficient execution of the spatial join operation.
Obviously, if a spatial predicate can be expressed through a set of window searches, all four techniques for SAM organizing MBR may support the execution of the corresponding data interpretation operation, possibly with a variable cost. An interesting area of research is to examine which technique is the most effective in each case. Apparently, this is interwoven with the set of windows forming the spatial predicate.
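The strategy of expressing a spatial predicate through window searches followed by a refinement step may be sketched as follows. This is only an illustrative Python sketch under stated assumptions: the names `SpatialObject`, `window_search` and `evaluate_predicate` are hypothetical, the objects are circles, and a linear scan over the MBR stands in for a real SAM such as an R-tree.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def rects_intersect(a: Rect, b: Rect) -> bool:
    """True if two axis-parallel rectangles share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class SpatialObject:
    oid: int
    center: Tuple[float, float]  # exact geometry: a circle (assumption)
    radius: float

    @property
    def mbr(self) -> Rect:
        (x, y), r = self.center, self.radius
        return (x - r, y - r, x + r, y + r)


def window_search(objects: Iterable[SpatialObject], window: Rect) -> List[SpatialObject]:
    """Filter step. A linear scan stands in for the SAM (hypothetical)."""
    return [o for o in objects if rects_intersect(o.mbr, window)]


def evaluate_predicate(objects: List[SpatialObject],
                       windows: Iterable[Rect],
                       refine: Callable[[SpatialObject], bool]) -> List[SpatialObject]:
    """Run one window search per window, then refine the candidate set."""
    candidates = {}
    for w in windows:
        for o in window_search(objects, w):
            candidates[o.oid] = o          # an object may qualify more than once
    return [o for o in candidates.values() if refine(o)]


def contains_point(p: Tuple[float, float]) -> Callable[[SpatialObject], bool]:
    """Exact test on the object geometry (refinement step)."""
    def test(o: SpatialObject) -> bool:
        dx, dy = o.center[0] - p[0], o.center[1] - p[1]
        return dx * dx + dy * dy <= o.radius * o.radius
    return test


objs = [SpatialObject(1, (0.0, 0.0), 1.0),
        SpatialObject(2, (3.0, 3.0), 1.0),
        SpatialObject(3, (10.0, 10.0), 1.0)]

# Point search as a degenerate window: the point lies in the MBR of object 2
# but outside the circle itself, so the refinement step rejects the candidate.
p = (2.1, 2.1)
hits = evaluate_predicate(objs, [(p[0], p[1], p[0], p[1])], contains_point(p))
```

In this sketch the filter step returns object 2 as a candidate for the point (2.1, 2.1), while the refinement on the exact geometry discards it; only the filter step is what a SAM organizing MBR approximations would accelerate.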


SAM organizing MBR approximations may be adopted, in a similar manner, to support other classes of data interpretation operations as well. The following Subsections make an attempt to prove this statement for several special operations commonly used in spatial data handling.

5.2.1 Location Properties for Surfaces

Queries on location properties for surfaces, such as information regarding the height or slope of a given point (Table 1), are commonly posed on geographic databases. SAM organizing MBR approximations may support the retrieval of location properties, if original surface data (i.e., spot heights or contour lines) are structured on a Triangular Irregular Network (TIN). TIN structures (Peucker 1977, Kumler 1994) consist of triangular elements with vertices at the sample points of the area under study (i.e., spot heights). In order to compute the locational properties of a surface represented by a TIN (e.g., Figure 4c), it is required to find out the triangle which encloses that location. This constitutes a point-in-polygon operation, and consequently it can be supported by an SAM organizing the MBR approximations of the triangles used to represent the surface. Notice that the absence of the SAM and that of topological structures for the triangles of the TIN (which are not considered here) would impose a sequential scan on TIN’s elements.

5.2.2 Optimum Path Finding in Space

Various algorithms have been proposed for the determination of the optimum path(s) in line networks. Moving in space under constraints is a far more complex problem, where research has been scarce. Examples would be the determination of the fastest path between two villages (Figure 4a); the shortest sea course between two ports (Figure 4b); the most regular gradient on ground path over a mountainous terrain (Figure 4c); the least-risky path in a hostile environment, for instance, the path with the maximum concealment time vis-à-vis an enemy or an observer (Figure 4d).
An interesting graph based approach to the determination of the optimum path in space has been introduced recently by Stefanakis and Kavouras (1995, 1997). The concept behind this approach is to establish a network connecting a finite number of locations (including departure and destination points) in space, so that effective algorithms coming from the weighted graph theory (Gibbons 1985, Sedgewick 1990) and artificial intelligence (Rich and Knight 1991) can be adopted to indicate the optimum path for the desired trip. Specifically, the approach consists of four steps:

1. Determination of a finite number of spots in space.
2. Establishment of a network connecting these spots.
3. Formation of the travel cost model.
4. Determination of the optimum path(s) from the point of reference (i.e., the departure or destination spot).

Figure 4: Examples of optimum paths in space.
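The four steps listed above may be illustrated with a small Python sketch. It is not the implementation of Stefanakis and Kavouras; it assumes a handful of hypothetical spots, a hand-picked set of network edges, the cost model of distance, and Dijkstra's algorithm as one possible weighted-graph algorithm. The names `build_network` and `dijkstra` are illustrative only.

```python
import heapq
import math
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, float]
Graph = Dict[str, List[Tuple[str, float]]]


def build_network(spots: Dict[str, Point],
                  edges: List[Tuple[str, str]],
                  cost: Callable[[Point, Point], float]) -> Graph:
    """Steps 2 and 3: connect the spots and weight each edge by the cost model."""
    graph: Graph = {name: [] for name in spots}
    for u, v in edges:
        w = cost(spots[u], spots[v])
        graph[u].append((v, w))
        graph[v].append((u, w))
    return graph


def dijkstra(graph: Graph, source: str, target: str) -> Tuple[float, List[str]]:
    """Step 4: optimum path from the point of reference (Dijkstra's algorithm)."""
    dist = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u == target:
            break
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return math.inf, []               # target not reachable
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[target], path


# Step 1: a finite number of spots (hypothetical coordinates).
spots = {"A": (0.0, 0.0), "B": (4.0, 0.0), "C": (4.0, 3.0), "D": (0.0, 5.0)}
# Step 2: network edges; Step 3: cost model of distance (Euclidean length).
network = build_network(spots, [("A", "B"), ("B", "C"), ("A", "D"), ("D", "C")], math.dist)
total, path = dijkstra(network, "A", "C")   # path A, B, C with total cost 7.0
```

A different travel cost model (time, expenses, risk) would only change the `cost` function passed to `build_network`; the path determination step is unaffected.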

SAM organizing MBR approximations may support the execution of the third step. The travel cost model assigns weights to the edges of the network established in the previous step. Its form depends on both the space under study and the application needs. The most common examples of travel cost models are:

− the model of distance; e.g., the shortest path finding (minimize overall distance)
− the model of time; e.g., the fastest path finding (minimize overall time)
− the model of expenses; e.g., the least expensive path finding (minimize the overall expenses)
− the model of risk; e.g., the least risky path finding (minimize the overall risk)

Assuming the cost model of distance (i.e., the shortest path finding) the space under study is represented by a set of spots which may be accessible or non-accessible (i.e., lying on obstacles; usually not considered). For instance, spots lying on the sea are accessible, while those lying on the continents are non-accessible for a ship (Figure 4b). The travel cost between two accessible spots is equal to the length of the shortest line connecting them, if they are intervisible (Aronoff 1989). Two spots are intervisible, if the shortest line connecting them passes through no obstacle. If this is not the case, the travel cost between the two spots is computed implicitly passing through intermediate accessible spots. Obviously, if obstacles (e.g., continents) are approximated by their MBR and maintained in an SAM (e.g., an R-tree variant), the process of characterizing a spot as accessible or not, as well as that of concluding whether two spots are intervisible can be supported. The first process constitutes a point-in-polygon operation (i.e., find all spots falling within obstacles), while the second constitutes a spatial join operation (i.e., find all network edges which intersect obstacles). Notice that the absence of the SAM would require a sequential scan on database objects representing obstacles in order to execute both processes. Apparently, the idea may be easily extended to more complex spaces and travel cost models.

5.2.3 Intervisibility in Plane Surface with Obstacles

Intervisibility operations are usually applied in geographic applications. Their objective is to retrieve all spatial objects that are visible (i.e., can be “seen”) from the specified target locations. Intervisibility operations make use of digital elevation data, while effective algorithms are available when a TIN is provided (Petrie and Kennie 1990, Weibel and Heller 1991, Kumler 1994). Assuming the simple case of a plane surface on which obstacles are present (Figure 4d), an SAM organizing MBR approximations may support the retrieval of all spatial objects that are visible to a target location (i.e., those objects intersect the shaded areas of Figure 4d). Notice that this operation consists of a set of window operations on database objects, while the absence of the SAM would impose a sequential scan on the entire set of objects in the database against both the obstacles and target locations.
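The accessibility and intervisibility tests used in Subsections 5.2.2 and 5.2.3 may be sketched in Python as below. The sketch assumes that obstacles are simple polygons in the plane and that both spots passed to `intervisible` are accessible; the function names are hypothetical, and the loop over all obstacles stands in for the window search that an SAM (e.g., an R-tree variant) would perform on the obstacle MBR.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)
Polygon = Sequence[Point]


def mbr(points: Sequence[Point]) -> Rect:
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def rects_intersect(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def orient(a: Point, b: Point, c: Point) -> float:
    """Signed area test: > 0 if c is left of a->b, < 0 if right, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def within_box(a: Point, b: Point, c: Point) -> bool:
    return (min(a[0], b[0]) <= c[0] <= max(a[0], b[0])
            and min(a[1], b[1]) <= c[1] <= max(a[1], b[1]))


def segments_intersect(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    d1, d2 = orient(q1, q2, p1), orient(q1, q2, p2)
    d3, d4 = orient(p1, p2, q1), orient(p1, p2, q2)
    if d1 * d2 < 0 and d3 * d4 < 0:
        return True                      # proper crossing
    return ((d1 == 0 and within_box(q1, q2, p1)) or
            (d2 == 0 and within_box(q1, q2, p2)) or
            (d3 == 0 and within_box(p1, p2, q1)) or
            (d4 == 0 and within_box(p1, p2, q2)))


def point_in_polygon(p: Point, poly: Polygon) -> bool:
    """Ray casting: count polygon edges crossed by a ray going right from p."""
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > p[1]) != (y2 > p[1]):
            x_cross = x1 + (p[1] - y1) * (x2 - x1) / (y2 - y1)
            if p[0] < x_cross:
                inside = not inside
    return inside


def is_accessible(spot: Point, obstacles: List[Polygon]) -> bool:
    """Point-in-polygon operation: MBR filter first, exact test second."""
    window = (spot[0], spot[1], spot[0], spot[1])
    return not any(rects_intersect(window, mbr(o)) and point_in_polygon(spot, o)
                   for o in obstacles)


def intervisible(a: Point, b: Point, obstacles: List[Polygon]) -> bool:
    """Edge-versus-obstacle test: MBR filter first, exact intersection second."""
    window = mbr([a, b])
    for poly in obstacles:
        if not rects_intersect(window, mbr(poly)):
            continue                     # filtered out by the MBR alone
        n = len(poly)
        if any(segments_intersect(a, b, poly[i], poly[(i + 1) % n]) for i in range(n)):
            return False
    return True


obstacles = [[(2.0, -1.0), (3.0, -1.0), (3.0, 1.0), (2.0, 1.0)]]
blocked = intervisible((0.0, 0.0), (5.0, 0.0), obstacles)   # line crosses the obstacle
clear = intervisible((0.0, 0.0), (5.0, 5.0), obstacles)     # MBRs overlap, geometry does not
```

The second call shows a false hit of the filter step: the MBR of the edge overlaps the MBR of the obstacle, yet the exact test finds no intersection, so the two spots remain intervisible.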
5.3 Extensions

The set of data interpretation operations that can be supported by SAM should be extended in order to achieve an efficient execution of operations and procedures in a production environment. There are three general directions where research on spatial data structures should be focused in order to support the DBMS for the application domain of GIS at the physical level:

− Express more data interpretation operations, required by different application domains, through a set of window searches, so that their execution could be supported by the four techniques for SAM organizing MBR approximations.
− Design new SAM for specific operations, such as the shortest path finding in line networks, which cannot be transformed into a set of window searches. This direction should also take into account operations that can be expressed as a set of window searches, like the nearest neighbor, in order to invent more effective techniques to support the execution of specific operations.
− Perform comparative studies in order to evaluate the performance of existing SAM for the range of operations which may be supported and end up with a set of rules capable of indicating which SAM is the most effective according to a small set of parameters (such as operation class, database size, average object size, and object distribution in space). This task involves the formulation of analytical cost models for SAM and exhaustive experimental tests.

6. Planning the Execution Strategy of Spatial Procedures

The optimization of spatial procedure execution is an important issue, which constitutes a relatively undeveloped field in the research area of spatial databases. The term “optimization”, although commonly used, is a misnomer, because in many cases (especially in non-conventional DBMS, like geographic DBMS) the execution strategy chosen by the DBMS is not the optimal strategy; it is just a reasonably efficient strategy for executing a sequence of operations. In conventional DBMS there are two main techniques for query optimization, which are usually combined in a query optimizer (Elmasri and Navathe 1989, Date 1990): a) heuristic rules; and b) systematic cost estimate.

The first technique is based on heuristic rules for ordering the operations in a procedure execution strategy. These rules determine an order for executing the operations forming a procedure. The main heuristic applied in conventional DBMS is to execute first operations that are not costly and reduce the size of intermediate temporary results (e.g., relations). Hence, SELECT and PROJECT operations should be performed as early as possible and before JOIN operations. In addition, SELECT and JOIN operations that are most restrictive (i.e., result in relations with the smallest absolute size) should be executed before other similar operations.

The second technique is based on a systematic estimate of the cost of different execution strategies and choice of the execution plan with the lowest cost estimate. For this technique to work well, there should exist accurate cost estimates for each execution strategy, so that different execution strategies are compared fairly and realistically. Notice that the number of execution strategies considered must be limited, otherwise too much time will be spent on the cost estimation for many possible execution strategies.

The two techniques applied in conventional DBMS can also be adopted and intermixed in the DBMS for the application domain of GIS in order to achieve a reasonably effective strategy for the execution of spatial procedures. However, this is a more complex task (Guenther and Buchmann 1990, Samet and Aref 1995), because the diverse nature of geographic data as well as the different representations and SAM used should be taken into account. In addition, spatial predicates are more complicated than other types of predicates and heavily depend on the application domain.
The spatial query optimization issue has been addressed in several prototype architectures for spatial database systems developed in the past (e.g., Gral in Gueting 1989, GeoQL in Ooi 1990, SAND in Aref and Samet 1991a,b). In all these studies an attempt is made to find some heuristic rules in order to reduce the cost of executing composite queries that involve spatial and non-spatial predicates. The following Subsections summarize and extend the set of heuristic rules (Section 6.1) and show how the systematic cost estimate technique may also be applied in planning the execution strategy of spatial procedures (Section 6.2). Finally, the issue of precomputing and storing certain spatial attributes and relationships, in order to facilitate the execution of spatial operations, is considered (Section 6.3).

6.1 Heuristic rules

Heuristic rules determine an efficient order for executing the individual operations forming a procedure. Their objective is to reduce the overall execution cost (with regard to CPU time, I/O time, number of disk accesses, and storage requirements) by trying to use either indices to restrict an operand, or precomputed and maintained spatial attribute values and relationships (see Subsection 6.3), or temporary results obtained from inexpensive and restrictive predicates (spatial and non-spatial). A set of heuristic rules has been introduced in order to support the execution of composite operations in the prototype architectures for spatial database systems developed in the past. Obviously, those rules extend the ideas of traditional database systems (Elmasri and Navathe 1989, Date 1990) and heavily depend on the system architecture, i.e., spatial data model in use, available SAM, etc. In the following paragraphs an attempt is made to list several standard rules for ordering the execution of individual operations forming a spatial procedure. It is assumed that database objects have spatial and thematic attributes and that indices (spatial and non-spatial) can be built and maintained on them.

− Perform selection and projection operations as early as possible. This is a standard technique for query optimization in conventional database systems. Independently of the data model in use, selection and projection can be seen as operations which typically reduce the size of a single file and never increase its size. On the other hand, the size of the files resulting from join and other binary operations is usually increased and it is a function of the sizes of the input files, in some cases a multiplicative function. Hence, before applying join or other binary operations, it is advantageous to reduce the sizes of the input files, so that storage requirements, I/O time, and disk accesses are all reduced.
− Break up any selection operation with conjunctive conditions (spatial and non-spatial) into a cascade of simple selection operations. This permits a greater degree of freedom in reordering the execution of individual selections, so that the optimal strategy is achieved.
− Perform the most restrictive selections first. This ensures that intermediate temporary files are the smallest possible in both the number of objects (records) and absolute size. The most restrictive selections can be chosen based on the selectivity measure that is usually estimated in the database catalog (Elmasri and Navathe 1989).
− Perform operations that are supported by indices first. Both spatial and non-spatial indices may support the execution of selection operations. Operations that are supported by index structures should be performed first, so that intermediate temporary files, used as input to subsequent operations which are not supported by index structures, accommodate the description of a smaller number of objects and sequential scan is less expensive.
− Group operations that can be executed by a single access routine. When more than one selection operation refers either to the same attribute (spatial or non-spatial) and can be supported by existing index structures, or to the same object and cannot be supported by existing index structures (hence a sequential scan on the database objects in temporary files is imposed), their execution by a single access routine is beneficial.
− Replace some operations by alternatives. This rule is application dependent and can usually be applied to spatial operations. A representative example is the alternative use of the nearer versus the nearest operation (Aref and Samet 1991b), in order to avoid the repetition of the nearest object computation which meets a series of other conditions in a query.
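Two of the rules above (index-supported operations first, most restrictive selections first) may be illustrated with a short Python sketch. It is a simplified illustration, not the optimizer of any of the cited systems: the class `Selection`, the selectivity values and the tie-breaking order between the two rules are assumptions, and the size estimate assumes that the predicates are independent.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Selection:
    name: str
    selectivity: float   # estimated fraction of objects that qualify (from the catalog)
    indexed: bool        # True if a spatial or non-spatial index supports it


def order_selections(conjuncts: List[Selection]) -> List[Selection]:
    """Cascade of simple selections: index-supported ones first,
    and within each group the most restrictive one first (assumed priority)."""
    return sorted(conjuncts, key=lambda s: (not s.indexed, s.selectivity))


def estimated_sizes(n_objects: int, plan: List[Selection]) -> List[float]:
    """Estimated size of each intermediate temporary file (independence assumed)."""
    sizes = []
    size = float(n_objects)
    for s in plan:
        size *= s.selectivity
        sizes.append(size)
    return sizes


# A conjunctive condition broken up into three simple selections (hypothetical values).
conjuncts = [
    Selection("area > 10", 0.5, indexed=False),
    Selection("within window W", 0.05, indexed=True),
    Selection("land_use = 'forest'", 0.2, indexed=True),
]
plan = order_selections(conjuncts)
sizes = estimated_sizes(10_000, plan)   # roughly 500, 100 and 50 objects
```

With this ordering the only selection that needs a sequential scan ("area > 10") is applied to an intermediate file of about 100 objects rather than to the 10,000 objects of the original file.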

6.2 Systematic cost estimate

In order to choose the execution strategy based on a systematic cost estimate, analytical cost models are required to compute the cost of individual operations. Obviously, such models are not available for all operations and corresponding SAM, due to the complexity of the spatial index structures used, whose performance is heavily dependent on various parameters, such as the distribution of objects in space and their extents. This is the reason why prototype systems developed in the past did not adopt optimization techniques based on systematic cost estimate. However, recent developments in spatial database systems have introduced efficient models for predicting the cost of the R-tree index structure, which is commonly used to support the execution of common operations, such as the range (window) and join queries. The following Subsections present briefly the state of the art in the area of systematic cost estimate of these queries.

6.2.1 Cost Models for the Window Queries

Several analytical models to estimate the performance of R-trees on window operations (range queries) have been proposed in the past (Kamel and Faloutsos 1993, Pagel et al. 1993, Faloutsos and Kamel 1994, Theodoridis and Sellis 1996). In the following, the model introduced by Theodoridis and Sellis (1996) is described, which predicts R-tree performance using knowledge of the data set properties only, and is applicable to point and non-point objects. According to this model, the expected retrieval cost (i.e., number of disk accesses) for an n-dimensional query window Q with extents (Q1, Q2, …, Qn) on each dimension is given by the following formula, originally proposed in Kamel and Faloutsos (1993) and Pagel et al. (1993):

$$C(Q) = 1 + \sum_{j=1}^{h-1} \left( N_j \cdot \prod_{i=1}^{n} \left( s_j + Q_i \right) \right) \qquad \text{(Eq. 1)}$$

In this formula the parameters involved are: h, which denotes the height of the tree structure; Nj, which denotes the expected number of nodes at level j of the tree; and sj, which denotes the average node extent on each dimension at level j (the root is assumed at level j=h, and the leaf-nodes at level j=1). The model assumes that the sides of the nodes are equal on each dimension; this is a simplification that is reasonable for “well structured” R-trees (Kamel and Faloutsos 1993). The expression for computing the height h of an R-tree is:

$$h = 1 + \left\lceil \log_{c \cdot M} \frac{N}{c \cdot M} \right\rceil \qquad \text{(Eq. 2)}$$

where N is the number of distinct objects in the database, M is the maximum number of entries in an R-tree node, and c is the average node capacity (typically c = 67%; c⋅M denotes the average number of entries per node). The number Nj of nodes at level j is:

$$N_j = \frac{N}{(c \cdot M)^{j}} \qquad \text{(Eq. 3)}$$

The following formula expresses the average node extent at level j (Theodoridis and Sellis 1996):

$$s_j = \left( \frac{D_j}{N_j} \right)^{1/n} \qquad \text{(Eq. 4)}$$

where Dj denotes the density of the node rectangles at level j which is computed as a function of the density Dj-1 of the node rectangles at level j-1 (notice that the density D of a set of N objects is defined as the sum of the object areas sj divided by the data space):

$$D_j = \left( 1 + \frac{D_{j-1}^{1/n} - 1}{(c \cdot M)^{1/n}} \right)^{n} \qquad \text{(Eq. 5)}$$

Hence, Dj can be recursively computed using D0 which denotes the density D of the data MBR.

6.2.2 A Cost Model for Spatial Joins

Recently, the cost model presented in the previous Section has been extended by Theodoridis, Stefanakis and Sellis (1998) to also estimate efficiently the cost (in terms of disk accesses) of spatial join queries (e.g., overlay operation) between two spatial data sets. Assuming that each data set is indexed on an R-tree, the join query between the two sets is supported by applying a synchronized tree traversal on both R-tree indexes (Brinkhoff et al. 1994). According to the analysis presented in Theodoridis et al. (1998), the cost formula for join queries is the following (considering a simple path buffer scheme):

$$C(R_1, R_2) = \sum_{j=1}^{h-1} \left\{ N_{R_2,j} \cdot N_{R_1,j} \cdot \prod_{k=1}^{n} \left( s_{R_1,j,k} + s_{R_2,j,k} \right) + N_{R_2,j} \cdot N_{R_1,j+1} \cdot \prod_{k=1}^{n} \left( s_{R_1,j+1,k} + s_{R_2,j,k} \right) \right\} \qquad \text{(Eq. 6)}$$

where C(R1,R2) denotes the average number of disk accesses needed to process a join query between the two data sets with NR1, NR2 data objects indexed in trees R1 and R2, respectively. NRi,j denotes the average number of nodes of the tree Ri at level j and is given by the following formula as a function of the actual population NRi of the data set:

$$N_{R_i,j} = \frac{N_{R_i}}{(c \cdot M)^{j}} \qquad \text{(Eq. 7)}$$


h denotes the R-tree height, and is given by the following equation, also as a function of the actual population NRi of the data set:

$$h = 1 + \left\lceil \log_{c \cdot M} \frac{N_{R_i}}{c \cdot M} \right\rceil \qquad \text{(Eq. 8)}$$

and sRi,j,k denotes the average extent of nodes of the tree Ri at each dimension k at level j and is given by the formula:

$$s_{R_i,j,k} = \left( \frac{D_{R_i,j}}{N_{R_i,j}} \right)^{1/n} \qquad \text{(Eq. 9)}$$

as a function of the amount NRi,j and the density DRi,j of the node rectangles at level j, which, in turn, is given by:

$$D_{R_i,j} = \left( 1 + \frac{D_{R_i,j-1}^{1/n} - 1}{(c \cdot M)^{1/n}} \right)^{n} \qquad \text{(Eq. 10)}$$

recursively as a function of the actual density DRi of the data rectangles.

6.2.3 Evaluation of the Proposed Cost Models

The proposed cost formulas for window and join queries are based on primitive data properties only, without requiring the corresponding R-trees to be built. However, the analysis assumes uniformity of data in order to express the density of the R-tree nodes at a parent level as a function of the density of the child nodes. This assumption might produce a model that could be efficient for uniform-like data distributions, but hardly applicable to non-uniform ones. To adapt the model in order to efficiently support any type of data sets (uniform or non-uniform ones) the use of a density surface describing the data set has been adopted instead of a single average density value. The density surface of a real data set from the TIGER database of the US Census Bureau is illustrated in Figure 5.

Figure 5: A real data set and its density surface.

Comparison results for the evaluation of the proposed cost formulas appear in both Theodoridis and Sellis (1996), and Theodoridis et al. (1998). The analytical estimations of disk accesses were compared with experimental results on synthetic (random and skewed) and real data sets. Table 2 summarizes the average relative errors of the experimental results compared to the analytical predictions.

Table 2: Average relative error in estimating the number of disk accesses.

Data sets      Relative error
               range queries   point queries   join queries
Random data    0%-5%           0%-10%          0%-15%
Skewed data    0%-10%          0%-15%          0%-20%
Real data      0%-20%          0%-15%          0%-15%

The estimated cost of window (point or range) and join queries is very close to the actual experimental results both for uniform-like and non-uniform data distributions, with the relative error never exceeding 15% or 20%, respectively.
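The cost formulas of Subsections 6.2.1 and 6.2.2 may be evaluated numerically with a few lines of Python. The sketch below is only an illustration of Eqs. 1 to 10 for the uniform case with a single average density value (not the density surface); it assumes that all extents are normalized to a unit data space, that the same formulas are applied up to the root level, and that both R-trees of a join have the same height. The function names and the default values c = 0.67 and M = 50 are hypothetical.

```python
import math
from typing import Dict, Sequence, Tuple


def tree_height(n_objects: float, fanout: float) -> int:
    """Eq. 2 / Eq. 8: h = 1 + ceil(log_f(N / f)), with f = c * M."""
    return 1 + math.ceil(math.log(n_objects / fanout, fanout))


def level_stats(n_objects: float, density: float, fanout: float,
                n_dims: int, top_level: int) -> Dict[int, Tuple[float, float]]:
    """Return {j: (N_j, s_j)} for levels 1..top_level (Eqs. 3-5 / Eqs. 7, 9, 10)."""
    stats = {}
    d = density                      # D_0: density of the data MBR
    inv_n = 1.0 / n_dims
    for j in range(1, top_level + 1):
        d = (1 + (d ** inv_n - 1) / fanout ** inv_n) ** n_dims   # Eq. 5 / Eq. 10
        n_j = n_objects / fanout ** j                            # Eq. 3 / Eq. 7
        s_j = (d / n_j) ** inv_n                                 # Eq. 4 / Eq. 9
        stats[j] = (n_j, s_j)
    return stats


def window_cost(n_objects: float, density: float, query: Sequence[float],
                c: float = 0.67, m: int = 50) -> float:
    """Eq. 1: expected disk accesses for a window with extents `query`."""
    fanout = c * m
    h = tree_height(n_objects, fanout)
    stats = level_stats(n_objects, density, fanout, len(query), h - 1)
    cost = 1.0                       # the root is always accessed
    for n_j, s_j in stats.values():
        cost += n_j * math.prod(s_j + q for q in query)
    return cost


def join_cost(n1: float, d1: float, n2: float, d2: float,
              n_dims: int = 2, c: float = 0.67, m: int = 50) -> float:
    """Eq. 6: expected disk accesses for a join of two R-trees of equal height."""
    fanout = c * m
    h = tree_height(n1, fanout)
    if h != tree_height(n2, fanout):
        raise ValueError("this sketch assumes trees of equal height")
    r1 = level_stats(n1, d1, fanout, n_dims, h)       # level j+1 is needed for R1
    r2 = level_stats(n2, d2, fanout, n_dims, h - 1)
    cost = 0.0
    for j in range(1, h):
        (n1_j, s1_j), (n1_up, s1_up), (n2_j, s2_j) = r1[j], r1[j + 1], r2[j]
        # Node sides are equal on each dimension, so the product over k is a power.
        cost += n2_j * n1_j * (s1_j + s2_j) ** n_dims
        cost += n2_j * n1_up * (s1_up + s2_j) ** n_dims
    return cost


point_query = window_cost(100_000, 0.5, (0.0, 0.0))
range_query = window_cost(100_000, 0.5, (0.1, 0.1))
join_query = join_cost(100_000, 0.5, 100_000, 0.5)
```

Such estimates could be compared by an optimizer against the cost of alternative access paths, for instance the logarithmic cost of a B+-tree lookup on a thematic attribute, when both a spatial and a thematic predicate are available.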


In conclusion, the above cost models are very useful and can be used in combination with cost models available in traditional databases, such as the logarithmic cost for B+-trees, in order to provide a systematic cost estimate for operations combining spatial and thematic predicates.

6.3 Precomputation of spatial attribute values and relationships

In order to facilitate the processing of spatial operations and procedures it is sometimes beneficial to precompute and store certain spatial attribute values or relationships. Several spatial attribute values, such as the area and perimeter of a region, which are commonly used in spatial operations, require both large CPU time to compute and many disk accesses to load region properties in the main memory. A trivial and effective solution to this problem is to precompute these values while entering (or updating) spatial objects in the database and store them as thematic attributes along with the objects for later use. On the other hand, spatial relationships are seldom stored explicitly due to their large cardinality. Despite the fact that their retrieval can be supported by SAM (Section 5), it is sometimes beneficial to precompute and store certain spatial relationships, if they are relatively stable, frequently referenced, and of manageable size. In order to store the precomputed relationships (e.g., intersect and contain) a special index file, such as a spatial join index (Rotem 1991), can be created and maintained. What the spatial join indices store are pairs of object identifiers having the predefined spatial relationship. Notice that the technique to precompute spatial attribute values and relationships should not be pushed to the extreme, because: a) in most of the cases data are not static and change quite often; and b) the database is usually very large and the relationships between objects are many-to-many for each relation. Obviously, it is expensive to recompute such indices and keep all data consistent and up-to-date.
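The idea of storing pairs of object identifiers, together with the cost of keeping them up-to-date, may be sketched in Python as follows. This is a simplified illustration and not the spatial join index of Rotem (1991): it keeps pairs within a single set of objects, uses MBR intersection as a stand-in for the exact "intersect" relationship, and the class name `SpatialJoinIndex` and its methods are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Rect = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def rects_intersect(a: Rect, b: Rect) -> bool:
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class SpatialJoinIndex:
    """Precomputed pairs (a, b), a < b, of object identifiers whose MBR intersect."""

    def __init__(self) -> None:
        self.mbrs: Dict[int, Rect] = {}
        self.pairs: Set[Tuple[int, int]] = set()

    def remove(self, oid: int) -> None:
        """Deleting an object invalidates every stored pair that mentions it."""
        if oid in self.mbrs:
            del self.mbrs[oid]
            self.pairs = {p for p in self.pairs if oid not in p}

    def insert(self, oid: int, mbr: Rect) -> None:
        """Insert or update: the relationship is recomputed against all objects,
        which is what makes maintenance expensive for volatile data."""
        self.remove(oid)
        for other, rect in self.mbrs.items():
            if rects_intersect(mbr, rect):
                self.pairs.add((min(oid, other), max(oid, other)))
        self.mbrs[oid] = mbr

    def partners(self, oid: int) -> List[int]:
        """Answer an 'intersect' query from the stored pairs, without geometry."""
        return sorted(b if a == oid else a for a, b in self.pairs if oid in (a, b))


index = SpatialJoinIndex()
index.insert(1, (0.0, 0.0, 2.0, 2.0))
index.insert(2, (1.0, 1.0, 3.0, 3.0))
index.insert(3, (5.0, 5.0, 6.0, 6.0))
before = index.partners(3)                 # object 3 intersects nothing yet
index.insert(3, (1.5, 1.5, 5.0, 5.0))      # update of object 3
after = index.partners(3)                  # now paired with objects 1 and 2
```

Queries are answered by a lookup on the stored pairs, while every insertion or update requires a pass over the existing objects; this trade-off is the reason why precomputation pays off mainly for relationships that are relatively stable and frequently referenced.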

7. Conclusion

The design and implementation of a DBMS for the application domain of GIS constitutes a hard task due to peculiarities of spatial data and operations involved (Stefanakis and Sellis 1996a,b, 1997, Stefanakis 1997). This paper examines the basic classes of operations that should be provided by such a system and focuses on the mechanisms available to database developers towards the efficient execution of simple operations and composite procedures involving both the spatial and non-spatial dimensions of geographic data. Specifically, it is shown how the execution of various classes of individual data interpretation operations available in GIS packages can be supported by existing spatial access methods (SAM) organizing minimum bounding rectangle (MBR) approximations of the spatial objects. The general concept behind this approach is to express data interpretation operations through a set of window searches (range queries) which can be readily supported by existing SAM. In addition, the paper provides guidelines for planning the execution strategy of composite procedures consisting of a series of individual operations and involving both the spatial and non-spatial dimensions of geographic data. For this reason different techniques are considered, such as the formulation and adoption of heuristic rules, the systematic cost estimate of various strategies based on available cost models, and the precomputation of spatial attribute values and relationships commonly used in spatial procedures.

Future research is focused on: a) the transformation of more data interpretation operations, required by different application domains, to a set of window searches, so that their execution could be supported by SAM organizing MBR approximations of spatial objects; b) the examination of other techniques, apart from SAM organizing MBR approximations of spatial objects, and how they can be used to support different classes of data interpretation operations; c) the performance of comparative studies in order to find out the most effective spatial index structure for a wide range of data interpretation operations; and d) the mechanisms adopted to plan the execution strategy of composite procedures that involve both the spatial and non-spatial dimensions of geographic data, i.e., on the extension of the set of heuristic rules, the formulation of analytical models that estimate the cost of various individual operations, and finally the design of efficient index structures to store precomputed spatial attribute values and relationships commonly used in spatial procedures.

References

ABEL, D.J., 1989, A database toolkit for Geographical Information Systems. International Journal of Geographical Information Systems, 3, pp. 103-115.
ABEL, D.J., and SMITH, J.L., 1983, A data structure and algorithm based on a linear key for a rectangle retrieval problem. Computer Vision, Graphics and Image Processing, 24, pp. 1-13.
ABEL, D.J., and SMITH, J.L., 1984, A data structure and query algorithm for a database of areal entities. The Australian Computer Journal, 16(2), pp. 147-157.
ABITEBOUL, S., 1997, Querying semi-structured data. In Proceedings of the 6th International Conference on Database Theory, Delphi, Greece, pp. 1-18.
AREF, W.G., and SAMET, H., 1991a, Extending a DBMS with spatial operations. In Proceedings of the 2nd Symposium on Large Spatial Databases, Zurich, Switzerland, pp. 299-318.
AREF, W.G., and SAMET, H., 1991b, Optimization strategies for spatial query processing. In Proceedings of the 17th Symposium on Very Large Data Bases, Barcelona, Spain, pp. 81-90.
ARMSTRONG, M.P., and DENSHAM, P.J., 1990, Database organization strategies for spatial decision support systems. International Journal of Geographical Information Systems, 4, pp. 3-20.
ARMSTRONG, M.P., and DENSHAM, P.J., 1994, Toward the development of a conceptual framework for GIS-based collaborative spatial decision-making. In Proceedings of the 2nd ACM Workshop on Advances in GIS, Baltimore, Maryland, USA, pp. 4-7.
ARMSTRONG, M.P., DENSHAM, P.J., LOLONIS, P., and RUSHTON, G., 1992, Cartographic displays to support locational decision-making. Cartography and Geographic Information Systems, 19(3), pp. 154-164.
ARONOFF, S., 1989, Geographic Information Systems: A Management Perspective (WDL Publications).
BECKMANN, N., KRIEGEL, H.P., SCHNEIDER, R., and SEEGER, B., 1990, The R*-tree: an efficient and robust access method for points and rectangles. In Proceedings of the ACM-SIGMOD Conference on Management of Data, Atlantic City, N.J., USA, pp. 322-331.
BERRY, J.K., 1987, Fundamental operations in computer-assisted map analysis. International Journal of Geographical Information Systems, 1(2), pp. 119-136.
BRINKHOFF, T., KRIEGEL, H.P., and SCHNEIDER, R., 1993, Comparison of approximations of complex objects used for approximation-based query processing in spatial database systems. In Proceedings of the IEEE Conference on Data Engineering, Vienna, Austria, pp. 40-49.
BRINKHOFF, T., KRIEGEL, H.P., SCHNEIDER, R., and SEEGER, B., 1994, Efficient processing of spatial join using R-trees. In Proceedings of the ACM-SIGMOD Conference on Management of Data, Minneapolis, Minnesota, USA, pp. 237-246.
COMER, D., 1979, The ubiquitous B-tree. Computing Surveys, 11, pp. 121-137.
DATE, C.J., 1990, An Introduction to Database Systems (Addison-Wesley).
DAVID, R., RAYNAL, I., SCHORTER, G., and MANSART, V., 1993, GeO2: why objects in a geographical DBMS. In Proceedings of the 3rd Symposium on Large Spatial Databases, Singapore, pp. 264-276.
DANGERMOND, J., 1983, A classification of software components used in GIS. In Peuquet and O’Callaghan 1983.
DCDSTF, 1988, The proposed standard for digital cartographic data. The American Cartographer, 15(1), pp. 9-140.
DENSHAM, P.J., 1991, Spatial decision support systems. In Maguire et al. 1991, pp. 403-412.
EGENHOFER, M.J., 1991, Extending SQL for geographical display. Cartography and Geographic Information Systems, 18(4), pp. 230-245.
ELMASRI, R., and NAVATHE, S.B., 1989, Fundamentals of Database Systems (Benjamin-Cummings).


ESRI, 1989, Arc/Info: the georelational model revisited. ARC News, 11(1), pp. 9.

FALOUTSOS, C., SELLIS, T., and ROUSSOPOULOS, N., 1987, Analysis of object oriented spatial access methods. In Proceedings of the ACM-SIGMOD Conference on Management of Data, San Francisco, California, USA, pp. 426-439.

FALOUTSOS, C., 1988, Gray codes for partial match and range queries. IEEE Transactions on Software Engineering, 14, pp. 1381-1393.

FALOUTSOS, C., and RONG, Y., 1991, DOT: a spatial access method using fractals. In Proceedings of the IEEE Conference on Data Engineering, Kobe, Japan, pp. 152-159.

FALOUTSOS, C., and KAMEL, I., 1994, Beyond uniformity and independence: analysis of R-trees using the concept of fractal dimension. In Proceedings of the 13th ACM Symposium on Principles of Database Systems, Minneapolis, Minnesota, USA, pp. 4-13.

GAEDE, V., 1995, Geometric information makes spatial query processing more efficient. In Proceedings of the 3rd ACM Conference on GIS, Baltimore, Maryland, USA, pp. 45-52.

GAEDE, V., and GUENTHER, O., 1995, Multidimensional access methods. Technical Report TR-96043, Institute of Information Systems, Humboldt-Universitaet, Berlin, Germany (Submitted for publication).

GATRELL, A.C., 1991, Concepts of space and geographical data. In Maguire et al. 1991, pp. 119-134.

GIBBONS, A., 1985, Algorithmic Graph Theory (Cambridge University Press).

GIORDANO, A., VEREGIN, H., ROAK, E., and LANTER, D., 1994, A conceptual model of GIS-based spatial analysis. Cartographica, 31(4), pp. 44-57.

GOODCHILD, M.F., 1991, The technological setting of GIS. In Maguire et al. 1991, pp. 45-54.

GUENTHER, O., 1988, Efficient Structures for Geometric Data Management (Springer-Verlag).

GUENTHER, O., 1989, The design of the cell-tree: an object-oriented index structure for geometric databases. In Proceedings of the IEEE Conference on Data Engineering, Los Angeles, California, USA, pp. 598-605.

GUENTHER, O., and BUCHMANN, A., 1990, Research issues in spatial databases. SIGMOD Record, 19, pp. 61-68.

GUETING, R.H., 1989, Gral: an extensible relational database system for geographic applications. In Proceedings of the 15th Conference on Very Large Data Bases, Amsterdam, The Netherlands, pp. 33-44.

GUETING, R.H., 1994, An introduction to spatial database systems. VLDB Journal - Special Issue on Spatial Database Systems, 3, pp. 357-399.

GUTTMAN, A., 1984, R-trees: a dynamic index structure for spatial searching. In Proceedings of the ACM-SIGMOD Conference on Management of Data, Boston, Massachusetts, USA, pp. 47-57.

HAAS, L.M., and CODY, W.F., 1991, Exploiting extensible DBMS in integrated GIS. In Proceedings of the 2nd Symposium on Large Spatial Databases, Zurich, Switzerland, pp. 423-449.

HENRICH, A., SIX, H.W., and WIDMAYER, P., 1989, The LSD-tree: spatial access to multidimensional point and non-point objects. In Proceedings of the 15th International Conference on Very Large Data Bases, Amsterdam, The Netherlands, pp. 45-53.

HINRICHS, K., 1985, Implementation of the grid-file: design concepts and experience. BIT, 25, pp. 569-592.

HINRICHS, K., and NIEVERGELT, J., 1983, The grid-file: a data structure to support proximity queries on spatial objects. Technical Report 54, Institut fur Informatik, ETH, Zurich, Switzerland.

HOPKINS, L.D., 1984, Evaluation of methods for exploring ill-defined problems. Environmental Planning B: Planning and Design, 11, pp. 339-348.

INTERGRAPH, 1990, MGE: the modular GIS environment. Intergraph Brochure, New York.

JAGADISH, H.V., 1990, Linear clustering of objects with multiple attributes. In Proceedings of the ACM-SIGMOD Conference on Management of Data, Atlantic City, N.J., USA, pp. 332-342.

KAMEL, I., and FALOUTSOS, C., 1993, On packing R-trees. In Proceedings of the 2nd International Conference on Information and Knowledge Management, Washington, D.C., USA, pp. 490-499.

KANAKUBO, 1993, ICA report: the selected main theoretical issues facing cartography. Cartographica, 30(4), pp. 1-20.

KIM, W., 1995, (ed), Modern Database Systems (ACM Press).

KNUTH, D., 1973, The Art of Computer Programming: Sorting and Searching (Addison-Wesley).


KUMLER, M.P., 1994, An intensive comparison of triangular irregular networks (TIN) and digital elevation models (DEM). Cartographica - Monograph 45, 31(2), pp. 1-99.

MAGUIRE, D.J., GOODCHILD, M.F., and RHIND, D.W., 1991, Geographic Information Systems: Principles and Applications (Longman).

MUEHRKE, P.C., 1990, Cartography and GIS. Cartography and Geographic Information Systems, 17(1), pp. 7-15.

MUELLER, J.C., 1991, Advances in Cartography (Elsevier Applied Science, New York).

NIEVERGELT, J., HINTERBERGER, H., and SEVCIK, K.C., 1984, The grid-file: an adaptable, symmetric multikey file structure. ACM Transactions on Database Systems, 9, pp. 38-71.

OOI, B.C., 1990, Efficient Query Processing for Geographic Information Systems (Springer-Verlag).

van OOSTEROM, P., and CLAASEN, E., 1990, Orientation insensitive indexing methods for geometric objects. In Proceedings of the 4th International Symposium on Spatial Data Handling, Zurich, Switzerland, pp. 1016-1029.

van OOSTEROM, P., and VIJLBRIEF, T., 1991, Building a GIS on top of the open DBMS Postgres. In Proceedings of the 2nd European Conference on Geographical Information Systems, pp. 775-787.

van OOSTEROM, P., and VIJLBRIEF, T., 1992, The GEO++ system: an extensible GIS. In Proceedings of the 5th International Symposium on Spatial Data Handling, Charleston, USA, pp. 40-50.

ORENSTEIN, J., 1986, Spatial query processing in an object oriented database system. In Proceedings of the ACM-SIGMOD Conference on Management of Data, Washington, D.C., USA, pp. 326-336.

PAGEL, B., SIX, H., TOBEN, H., and WIDMAYER, P., 1993, Towards an analysis of range query performance. In Proceedings of the 12th ACM Symposium on Principles of Database Systems, Washington, D.C., USA, pp. 214-221.

PAPADIAS, D., THEODORIDIS, Y., SELLIS, T., and EGENHOFER, M.J., 1995, Topological relations in the world of minimum bounding rectangles: a study with R-trees. In Proceedings of the ACM-SIGMOD Conference on Management of Data, San Jose, California, USA, pp. 92-103.

PAPADIAS, D., THEODORIDIS, Y., and STEFANAKIS, E., 1997, Multi-dimensional range query processing with spatial relations. Geographical Systems. In press.

PETRIE, G., and KENNIE, T.J.M., 1990, Terrain Modelling in Surveying and Civil Engineering (Whittles Publishing).

PEUCKER, T.K., 1977, Data structures for digital terrain models: discussion and comparison. In Proceedings of the 1st International Advanced Study Symposium on Topological Data Structures for Geographic Information Systems, pp. 1-15.

PEUCKER, T.K., and CHRISMAN, N.R., 1975, Cartographic data structures. The American Cartographer, 2(1), pp. 55-69.

PEUQUET, D.J., 1984, A conceptual framework and comparison of spatial data models. Cartographica, 21(4), pp. 66-113.

PEUQUET, D.J., and O’CALLAGHAN, J., 1983, (Eds), Design and Implementation of Computer-Based GIS (International Geographic Union Commission on Geographic Data Sensing and Processing).

RHIND, D.W., and GREEN, N.P.A., 1988, Design of a GIS for heterogeneous scientific community. International Journal of Geographical Information Systems, 2(2), pp. 171-189.

RICH, E., and KNIGHT, K., 1991, Artificial Intelligence (McGraw-Hill).

ROBINSON, J.T., 1981, The K-D-B-tree: a search structure for large multidimensional dynamic indexes. In Proceedings of the ACM SIGMOD Conference on Management of Data, pp. 10-18.

ROTEM, D., 1991, Spatial join indices. In Proceedings of the 7th IEEE Conference on Data Engineering, Kobe, Japan, pp. 500-509.

ROUSSOPOULOS, N., and LEIFKER, D., 1985, Direct spatial search on pictorial databases using packed R-trees. In Proceedings of the ACM-SIGMOD Conference on Management of Data, Austin, Texas, USA, pp. 17-31.

ROUSSOPOULOS, N., KELLEY, S., and VINCENT, F., 1995, Nearest neighbor queries. In Proceedings of the ACM SIGMOD Conference on Management of Data, San Jose, California, USA, pp. 71-79.

SAGAN, H., 1994, Space-Filling Curves (Springer-Verlag).

SAMET, H., 1990, The Design and Analysis of Spatial Data Structures (Addison-Wesley).


SAMET, H., and AREF, W.G., 1995, Spatial data models and query processing. In Kim 1995, pp. 339-360.

SCHOLL, M., and VOISARD, A., 1992, Object-oriented database systems for geographic applications: an experiment with O2. In Bancilhon, F., Delobel, C., and Kanellakis, P., (eds), Building an Object-Oriented Database System: The Story of O2 (Morgan Kaufmann).

SEDGEWICK, R., 1990, Algorithms (Addison-Wesley).

SELLIS, T., ROUSSOPOULOS, N., and FALOUTSOS, C., 1987, The R+-tree: a dynamic index for multi-dimensional objects. In Proceedings of the 13th Conference on Very Large Data Bases, Brighton, England, pp. 507-518.

SIEMENS, 1987, SICAD: the GIS for modern mapping. Siemens Data Systems Division Brochure, The Netherlands.

SMITH, T.R., MENON, S., STAR, J.L., and ESTES, J.E., 1987, Requirements and principles for the implementation and construction of large-scale GIS. International Journal of Geographical Information Systems, 1, pp. 13-31.

SMITH, T.R., and GAO, P., 1990, Experimental performance evaluations on spatial access methods. In Proceedings of the 4th International Symposium on Spatial Data Handling, Zurich, Switzerland, pp. 991-1002.

STEFANAKIS, E., 1997, Development of intelligent geographic information systems. Ph.D. Dissertation. Knowledge and Database Systems Laboratory, Department of Electrical and Computer Engineering, National Technical University of Athens, Greece.

STEFANAKIS, E., and KAVOURAS, M., 1995, On the determination of the optimum path in space. In Proceedings of the 2nd International Conference on Spatial Information Theory, Semmering, Austria, pp. 241-257.

STEFANAKIS, E., and KAVOURAS, M., 1997, Navigating in space under constraints. Technical Report 97-08. Knowledge and Database Systems Laboratory, Department of Electrical and Computer Engineering, National Technical University of Athens, Greece (Submitted for publication).

STEFANAKIS, E., and SELLIS, T., 1996a, A DBMS repository for Temporal GIS. Position paper in ESF GISDATA Specialist Meeting on Spatio-Temporal Change in Socio-Economic Units, Nafplion, Greece. In ESF GISDATA book: Time and Motion. To be published by Taylor & Francis in 1998.

STEFANAKIS, E., and SELLIS, T., 1996b, A DBMS repository for the application domain of GIS. In Proceedings of the 7th International Symposium on Spatial Data Handling, Delft, The Netherlands, pp. 3B19-3B29.

STEFANAKIS, E., and SELLIS, T., 1997, Towards the design of a DBMS repository for the application domain of GIS: requirements of users and applications. In Proceedings of the 18th ICA International Cartographic Conference, Stockholm, Sweden, pp. 2030-2037.

STEFANAKIS, E., VAZIRGIANNIS, M., and SELLIS, T., 1996, Incorporating fuzzy logic methodologies into GIS operations. In Proceedings of the 1st International Conference on Geographic Information Systems in Urban, Regional and Environmental Planning, Samos, Greece, pp. 61-68. Also in Proceedings of the XVIII ISPRS Congress, Vienna, Austria.

STEFANAKIS, E., VAZIRGIANNIS, M., and SELLIS, T., 1997, Incorporating fuzzy set methodologies in a DBMS repository for the application domain of GIS. Technical Report 97-06. Knowledge and Database Systems Laboratory, Department of Electrical and Computer Engineering, National Technical University of Athens, Greece (Submitted for publication).

TOMLIN, C.D., 1990, Geographic Information Systems and Cartographic Modeling (Prentice Hall).

TOMLINSON, R.F., and BOYLE, R., 1981, The state of development of systems for handling natural resources inventory data. Cartographica, 18(4), pp. 65-95.

THEODORIDIS, Y., PAPADIAS, D., STEFANAKIS, E., and SELLIS, T., 1995, Direction relations and two-dimensional range queries: optimization techniques. NCGIA Technical Report 95-9, National Center of Geographic Information and Analysis, USA.

THEODORIDIS, Y., PAPADIAS, D., and STEFANAKIS, E., 1996, Supporting direction relations in spatial database systems. In Proceedings of the 7th International Symposium on Spatial Data Handling, Delft, The Netherlands, pp. 12A.1-12A.15.

THEODORIDIS, Y., and SELLIS, T., 1996, A model for the prediction of R-tree performance. In Proceedings of the 15th ACM Symposium on Principles of Database Systems, Montreal, Canada, pp. 161-171.


THEODORIDIS, Y., STEFANAKIS, E., and SELLIS, T., 1998, Cost models for join queries in spatial databases. In Proceedings of the IEEE International Conference on Data Engineering, Orlando, Florida, USA, (to appear).

WEIBEL, R., and HELLER, M., 1991, Digital terrain modeling. In Maguire et al. 1991, pp. 269-297.

WORBOYS, M.F., 1994, Object-oriented approaches to geo-referenced information. International Journal of Geographical Information Systems, 8(4), pp. 385-399.

WORBOYS, M.F., 1995, GIS: A Computing Perspective (Taylor & Francis).


Figure 1: Architectural approaches. Panels: (a) application layer on a relational DBMS; (b) application layer on a relational DBMS and a spatial DBMS; (c) application layer on an extended RDBMS; (d) application layer on an O-O DBMS.

Figure 2: The data model and data interpretation operations.

Figure 3: MBR approximations of European countries. Panels (a) and (b). Country labels in the figure: NO, FI, IC, SW, UK, DE, IR, GE, PL, NL, CZ, BE, LU, FR, RO, AU, HU, CH, BU, YU, AL, PO, SP, IT, GR.

Figure 4: Examples of optimum paths in space. Panels (a) to (d). Labels in the panels: Atlantic Ocean, Bristol, Halifax, lake, village 1, village 2, Departure, Destination, Observer, obstacle.

Figure 5: A real data set and its density surface. (a) LBcounty data set; (b) LBcounty density surface.
