
Nash Equilibrium based Semantic Cache in Mobile Sensor Database Systems Qingfeng Fan, Karine Zeitouni, Naixue Xiong, Qiongli Wu, Seyit Camtepe and Yu-Chu Tian

Abstract—Mobile applications are being deployed on a massive scale in various mobile sensor database systems. With the limited resources of mobile devices, how to process the huge number of queries from mobile users over distributed sensor grid databases becomes a critical problem for such mobile systems. While the fundamental semantic cache technique has been investigated for query optimization in sensor grid database systems, the problem remains difficult because more realistic multi-dimensional constraints have not been considered in existing methods. To solve the problem, a new semantic cache scheme is presented in this paper for location-dependent data queries in distributed sensor grid database systems. It considers multi-dimensional constraints or factors in a unified cost model architecture, determines the parameters of the cost model by using the concept of Nash equilibrium from game theory, and makes semantic cache decisions from the established cost model. The scenario of three factors, i.e., semantics, time and location, is investigated as a special case, which improves existing methods. Experiments are conducted to demonstrate the semantic cache scheme presented in this paper for distributed sensor grid database systems.

Index Terms—Semantic cache; sensor grid database system; location-dependent data query; Nash equilibrium; game theory

I. INTRODUCTION

With the rapid development of wireless network and communication technologies, mobile applications are being increasingly deployed on a massive scale in various sensor grid systems. Serving billions of mobile users who carry portable devices every day, these mobile applications generate a huge number of queries and thus require effective, efficient and distributed query processing. A sensor grid system is a distributed and self-organizing mobile system with SQL-like query processing capability [1]. It gathers, distributes and acts on information about the behavior of all mobile participants, including suppliers and consumers [1], [2], [3]. Such information and users' queries are typically location-dependent, e.g., for tourists to learn about their geographic surroundings [4], [5]. Mobile services from a sensor grid system help tourists to find attractions, hotels, restaurants, petrol stations, shops and car parking.

Manuscript received ???? ??, 2015. Q. Fan and K. Zeitouni are with the Laboratory PRISM, the University of Versailles-Saint-Quentin, 78035 Versailles Cedex, France (email: {qingfeng.fan,karine.zeitouni}@prism.uvsq.fr). N. Xiong is with the School of Computer Science, Colorado Technical University, Colorado Springs, CO, 80907, USA (email: [email protected]). Q. Wu is with the Ecole Centrale de Paris and INRIA Saclay Île-de-France, France (email: [email protected]). S. Camtepe and Y.-C. Tian are with the School of Electrical Engineering, Queensland University of Technology, GPO Box 2414, Brisbane, Australia (email: {seyit.camtepe,y.tian}@qut.edu.au).

How to process the huge number of queries from mobile users over distributed sensor grid databases is a critical problem in sensor grid systems. Mobile devices have limited resources in computing power, memory, storage, energy and network bandwidth. They are not able to respond to users' queries as traditional centralized database systems with powerful servers and resources can do [6], [7], [8]. To save resources in sensor grid systems, when a mobile user generates a query, it seeks answers from its own cache, its neighboring nodes, and the system servers. Therefore, semantic cache and distributed processing of queries are important for mobile applications, particularly for hot-spot queries. This has motivated the research and development in semantic cache optimization [9], [10] for distributed query processing in sensor grid database systems.

In spite of some progress, semantic cache remains a difficult problem in sensor grid systems for dealing with location-dependent applications. There is a lack of quantitative analysis of dynamic and location-dependent information from the sensor grid systems. In the limited effort on this problem [11], [12], only two factors have been considered: semantics and time. While the work in [13] has considered the dynamic location factor, it has not addressed the semantics and time factors at all. Reports on semantic cache considering more than two factors have not been found in the literature, except our preliminary studies in a recent conference paper [14], in which three factors have been investigated, i.e., semantics, time and dynamic location.

To address this difficult problem, a new semantic cache scheme is presented in this paper for location-dependent data (LDD) queries in distributed sensor grid database systems.
It considers multi-dimensional constraints or factors in a unified cost model architecture, derives the model parameters by using the concept of Nash equilibrium from game theory, and makes decisions on semantic cache from the established cost model. In particular, the scenario of three factors, i.e., semantics, time and location, which we discussed recently in a preliminary work [14] for the improvement of existing methods, is investigated as a special case. Practical algorithms are further developed to implement the theoretic results of the semantic cache scheme.

The remainder of the paper is organized as follows. Notations used throughout this paper are listed in Table I. Section II reviews related work and motivates the research of this paper. Section III presents the theoretic development of our semantic cache scheme. This is followed by the algorithm design of the scheme in Section IV. The performance of the semantic cache scheme is evaluated in Section V through simulation


experiments. Finally, Section VI concludes the paper.

TABLE I
PARAMETERS OF THE SEMANTIC CACHE SCHEME

Parameter | Description
B&BGDSF | Branch bound and greedy dual-size frequency
C | CSC clusters: semantically related or adjacent queries
CCache | CPU cost of an in-cache query in B&BGDSF
CDisk | CPU cost of an out-of-cache query in B&BGDSF
Cid | Cluster id in CSC scheme
Ct | Cluster timestamp in CSC scheme
Cseg | A set of segments in a CSC cluster
CSC | Clustering semantic cache
CSDS | Collaborative spatial data sharing
Ci | Cost of bringing object i into cache in B&BGDSF
Ei | ith cell of a grid in CSDS scheme
Fi | Frequency of access to an object i in B&BGDSF
I | Number of mobile devices in LDD
iDistM | Distance between a mobile device and a cell in CSDS
J | Number of semantic cache blocks in LDD
K | Number of factors (e.g., time and location) in LDD
Ki | Priority key of an object in B&BGDSF
LDD | Location-dependent data
L | Running life factor in B&BGDSF
LRU | Least recently used cache replacement policy
Mi | ith mobile device in CSDS scheme
(Lx, Ly) | Coordinates of querying mobile client
Q | Location-dependent data query
QA | LDD query attributes, e.g., hotel name
QC | LDD query result
QL | LDD query bound location, i.e., (Lx, Ly)
QP | LDD query selection conditions, e.g., price and vacancy
QR | LDD query semantics, e.g., hotel
Pi | A specific location that a mobile client passes
PrioA | Replacement priority of node A in CSDS scheme
Qi,j | jth LDD query at location i
S | Semantic segment, cached LDD queries and results
SA | Cached LDD query attributes, e.g., hotel name
SC | Cached semantic results, a link to the first page
Si | Size of the object i in B&BGDSF
SL | Cached LDD query bound location, i.e., (Lx, Ly)
SP | Cached LDD query selection conditions, e.g., price
SR | Cached LDD query semantics, e.g., hotel
STS | Cached LDD query time-stamp
T | Size of the new query result in B&BGDSF
∆t | Time increment in CSDS scheme
tc | Current time in CSDS scheme
tx | Time Mi passes the middle point of Ei in CSDS
tf | A future time later than tx in CSDS scheme
V | Velocity of mobile device Mi in CSDS scheme
(Vx, Vy) | Velocity of a mobile device in x and y directions
Wi,j | Weight of mobile node i and cache j over all factors
Wi,∗ | Linear combination of K weight vectors Wi,∗,k
Wi,∗,k | Weight vector for mobile node i and factor k
wi,j,k | Weight of mobile node i, cache block j and factor k
Xi | 1 if object i is in cache, 0 otherwise in B&BGDSF
(xpos, ypos) | Coordinates of a queried object, e.g., hotel
αi,k | Non-negative scalar coefficient for wi,j,k
Φ | Series of n locations Pi that a mobile client passes
Ψ | Whole set of LDD queries Qi,j
θi | Balanced solution vector angle for player 0 ≤ i ≤ K

II. BACKGROUND, RELATED WORK AND MOTIVATIONS

Semantic cache has been widely used in centralized database systems. It has also been adopted in mobile computing environments. This section begins with a brief introduction of the concepts of LDD query and semantic cache. Then, it reviews three representative types of existing techniques for semantic cache replacement in mobile grid systems: clustering semantic cache (CSC), branch bound and greedy dual-size frequency (B&BGDSF) semantic cache, and collaborative spatial data sharing (CSDS) semantic cache. After that, the motivations of our work in this paper are discussed.

A. LDD Query and Semantic Cache

Modelling location-dependent applications requires representing moving objects and defining an LDD query. A moving object is described by its moving behavior, characterized by its location, speed and direction [15]. An LDD query Q is expressed by the tuple Q = (QR, QA, QP, QL, QC), where QR specifies the query semantics (e.g., Hotel), QA represents the query attributes (e.g., Hotel Name), QP defines the selection conditions, QL gives the location bounds, and QC is the semantic result. A semantic cache S is a set of cached LDD queries with semantic results. Each cached LDD query is denoted as S = (SR, SA, SP, SL, SC), with the same meaning as Q = (QR, QA, QP, QL, QC). A simple example of an LDD cache index is a yellow page database with two relations: Hotel (Hno, Hname, Price, Vacancy, hxpos, hypos) and Restaurant (Rno, Rname, Type, Schedule, rxpos, rypos), as shown in Table II. When a mobile node queries the database, the system computes the query results, which are candidates to be cached afterwards [16].

TABLE II
AN EXAMPLE OF LDD SEMANTIC CACHE INDEX

S  | SR    | SA         | SP                                                       | SL      | SC | STS
S1 | Hotel | Hname      | (Lx − 10 ≤ xpos ≤ Lx + 10) ∧ (Ly − 10 ≤ ypos ≤ Ly + 10)  | 20,40   | 2  | T1
S2 | Rest  | Rname,Type | (Lx − 5 ≤ xpos ≤ Lx + 5) ∧ (Ly − 10 ≤ ypos ≤ Ly + 10)    | -10,30  | 5  | T2
S3 | Hotel | Hname      | (Lx − 5 ≤ xpos ≤ Lx + 5) ∧ (Ly − 15 ≤ ypos ≤ Ly + 15)    | -10,-40 | 7  | T3
Lx, Ly: location of the mobile client in Euclidean space. xpos, ypos: location coordinates in x and y directions, respectively.

A semantic cache scheme consists of semantic caching and a cache replacement policy. A semantic cache is organized in semantic segments, each composed of a set of items attached with related semantic descriptions. The cache replacement policy replaces part of the current cache with new query results. With a semantic cache, a query is processed in several steps. When a new query comes, the system rewrites the query into two disjoint pieces: a probe query and a remainder query. The probe query can be answered locally from the cache, while the remainder query is sent to the database server for a remote response. After the query processing, some query results are stored in the cache as a cache fragment for future use. This concept is applicable to location-dependent applications in mobile grid systems [17].

For cache replacement, several methods have been developed. The most popular method is Least Recently Used (LRU), which replaces the cached document that was requested the least recently. Extensions and variations of LRU have been proposed for various scenarios. The other three representative methods are CSC [11], B&BGDSF [12] and
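The query trimming step described above can be sketched in code, reduced to a one-dimensional interval predicate on xpos; the function and variable names are illustrative, not from the paper.

```python
# Sketch of splitting a new query against a cached segment, reduced to a
# one-dimensional interval predicate on xpos. Names are hypothetical.

def trim_query(query_lo, query_hi, cache_lo, cache_hi):
    """Split the query interval into the part answerable from the cached
    segment and the parts that must be fetched from the server."""
    lo, hi = max(query_lo, cache_lo), min(query_hi, cache_hi)
    in_cache = (lo, hi) if lo <= hi else None
    to_server = []
    if query_lo < cache_lo:                      # uncovered left part
        to_server.append((query_lo, min(query_hi, cache_lo)))
    if query_hi > cache_hi:                      # uncovered right part
        to_server.append((max(query_lo, cache_hi), query_hi))
    return in_cache, to_server

# A query for 10 <= xpos <= 30 against a cached segment covering
# 20 <= xpos <= 40 (cf. segment S1 in Table II):
in_cache, to_server = trim_query(10, 30, 20, 40)
assert in_cache == (20, 30)      # answered locally from the cache
assert to_server == [(10, 20)]   # fetched from the database server
```

After the server answers the uncovered part, its result can be stored as a new cache fragment for future use, as described above.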


CSDS [13], which use different policies to replace cache. They are reviewed below in detail.

B. Clustering Semantic Cache

The CSC technique, originally reported in [11], considers two factors in semantic cache: semantics and time. It divides queries into two groups: semantically related queries and semantically adjacent queries. Each of these two groups of queries is called a cluster C. A cluster C is associated with a unique ID Cid, a set of segments Cseg that are linked as an LRU queue, and a time-stamp Ct, which is the most recent time-stamp among all the segments in Cseg [18], [19]. CSC maintains a semantic cache index, which refers to a cluster structure for a set of clusters. Table III shows an example of a semantic cache index and cluster structure. It is seen from Table III that CSC utilizes the similarity between the semantic predicates SP. Cluster 1 includes S1, S3 and S4 with the last operation time being T4, while cluster 2 includes only S2 with the last operation time being T2. When a CSC system is queried, it determines the cluster that the query belongs to, and then processes the query.

Different from previous semantic replacement strategies that used temporal locality only, CSC uses semantics in addition to temporal locality. This forms a two-level LRU cache replacement strategy. The first-level LRU, at the cluster level, selects the cluster with the oldest time-stamp as a candidate cluster for further examination. Then, for all segments in the cluster, the second-level LRU removes those segments which were least recently accessed. This process continues until the cache space is sufficient to hold the new query, and then the new query is cached together with the query results.

C. B&BGDSF Semantic Cache

The B&BGDSF method was originally reported in [12]. As in CSC discussed above, B&BGDSF also considers the semantics and time factors. But different from CSC, B&BGDSF moved forward from qualitative analysis to quantitative descriptions.
When a new query result needs to be stored in a saturated cache, the most irrelevant queries must be evicted. A cache replacement needs to consider not only the constraints on the cache size but also the cost of access to other spatially distributed objects. B&BGDSF formally defines the cost, and then minimizes the cost with respect to cache replacement

TABLE III
SEMANTIC CACHE INDEX (UPPER PART) AND CLUSTERING STRUCTURE (LOWER PART)

S  | SR    | SA          | SP               | SC | STS | Cluster
S1 | Hotel | Sname       | 20 < Price < 40  | 2  | T1  | 1
S2 | Rest  | Cno,Cname   | Location="Paris" | 5  | T2  | 2
S3 | Hotel | Sname,Price | Price < 40       | 3  | T3  | 1
S4 | Hotel | Sname,Cname | Price < 20       | 7  | T4  | 1

Cid | Ct | Cseg
1   | T4 | S1 → S3 → S4
2   | T2 | S2

options and subject to the cache size constraints. This forms a constrained optimization problem. In B&BGDSF, the object with the smallest key value for an appropriately defined cost function is replaced. When an object i is requested, it is given a priority key Ki as follows:

Ki = Fi · Ci / Si + L, (1)

where Fi is the frequency of access to the object i from itself and all other nodes, Ci is the cost associated with bringing the object i into cache, Si is the size of the object i, and L is a running life factor that starts at 0 and is updated, for each replaced object O, to the priority key of that object in the priority queue, i.e., L = KO. The cost Ci is a function of the CPU costs of an in-cache query result (CCache) and an out-of-cache query result (CDisk), i.e.,

Ci = fi(CCache, CDisk). (2)

The in-cache query result is extracted from within the cache, while the out-of-cache query result is re-computed from the other nodes in the mobile network. A simplified version of Ci is a linear combination of CCache and CDisk, i.e.,

Ci = Xi · CCache + (1 − Xi) · CDisk, (3)

where Xi is set to 1 if object i is kept in cache, or 0 otherwise. With the above quantitative definitions, a spatial cache replacement policy is given by the following constrained optimization problem:

min Σ_{i=1}^{n} (Xi · CCache + (1 − Xi) · CDisk),
s.t. Σ_{i=1}^{n} Xi · Si ≥ T, (4)

where T represents the size of the new query result. The constraint in the above optimization problem means that the new query result to be cached must fit into the size of the cached queries to be replaced. The minimization function implies that the candidate set of objects to be removed must give the minimal in-cache and out-of-cache costs.

D. CSDS Semantic Cache

The CSDS method was presented in [13]. Notably, it considered the dynamic location factor. However, it did not consider the semantics and time factors, which were considered in B&BGDSF. Its principle is described below.

For nodes A and B, PrioA and PrioB are the respective values of the replacement priority. For example, consider a scenario of the integral of distance over time. Node A stays at a place at a distance of 10 meters for approximately 5 minutes and at a place at a distance of 5 meters for approximately 5 minutes. The corresponding priority value is defined as PrioA = 10 m × 5 min + 5 m × 5 min = 75 m·min. Node B stays at a place at a distance of 10 meters for approximately 10 minutes, thus PrioB = 10 m × 10 min = 100 m·min. As PrioB > PrioA, the replacement priority of node B is higher than that of node A.

Let Prob(Mi, Ej, ∆t) denote the probability that a mobile device Mi will access the cell Ej during the time period [tc, tc + ∆t], where tc is the current time, and ∆t is the time increment. The cell Ej is a cell of a grid defined over a geographical area where the mobile devices operate and wish to extract location-dependent data. V denotes the velocity of the mobile device Mi. We thus look ∆t time units ahead and predict how likely Mi will need Ej at that time [13]:

Prob(Mi, Ej, ∆t) ≈ iDistM(Mi, Ej, ∆t)
= ∫_{tc}^{tc+∆t} distM(Mi.pos(t), Ej) dt
= (1/2) |V| [(tx − tc)^2 + (tf − tx)^2], (5)

where tx is the time the mobile device Mi passes the middle point of the cell Ej and tf is a future time point later than tx. It is understood from Eq. (5) that the closer a mobile device Mi gets to a grid cell Ej, the higher the probability that Mi will be interested in Ej. Those cells with large iDistM values are the least likely to be reused in the future, and thus will be chosen for cache replacement in CSDS.

E. Motivations

The three representative methods reviewed above have limitations when applied to real-world systems. This motivates our research in the present paper.

While CSC was an early attempt to address two factors in mobile semantic cache, only qualitative analysis was formally reported in the literature. There was also a lack of quantitative description of the technique in the CSC method. In comparison, our work in this paper not only considers more than two factors, but also describes our semantic cache scheme formally and quantitatively.

As a quantitative semantic cache technique for mobile grid systems, B&BGDSF considered two factors: semantics and time. It described the problem quantitatively through constrained optimization. Therefore, it gave optimal solutions to the minimization of the cost of cache replacement while satisfying the cache size constraint. However, it did not consider location and other factors in mobile grid systems. By contrast, our work in this paper addresses location and other factors in a quantitative semantic cache scheme.

Different from the CSC and B&BGDSF methods, CSDS considered the dynamic location factor. The problem was also formally described with a probability of how likely a cell is needed by a mobile device. However, CSDS did not consider the semantics and time factors as CSC and B&BGDSF did, limiting its applications in mobile grid systems. This motivates our work in this paper with consideration of multiple factors including dynamic location, semantics, time and more.
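As a concrete illustration of the two quantitative policies reviewed above, the following sketch computes the B&BGDSF priority key of Eq. (1) and the CSDS distance integral of Eq. (5); the function names and example values are hypothetical, not taken from the implementations in [12] or [13].

```python
# Illustrative sketch of the two quantitative replacement policies reviewed
# above; names and numbers are hypothetical, not from [12] or [13].

def gdsf_priority_key(freq, cost, size, life):
    """Eq. (1): K_i = F_i * C_i / S_i + L."""
    return freq * cost / size + life

def csds_idist(speed, t_c, t_x, t_f):
    """Eq. (5): iDistM = (1/2) |V| [(t_x - t_c)^2 + (t_f - t_x)^2]."""
    return 0.5 * abs(speed) * ((t_x - t_c) ** 2 + (t_f - t_x) ** 2)

# An object accessed 4 times, costing 2 units to fetch, of size 8,
# with the running life factor L = 1:
k_i = gdsf_priority_key(freq=4, cost=2, size=8, life=1)  # -> 2.0

# A device moving at |V| = 2 m/s, now at t_c = 0 s, passing the cell's
# middle point at t_x = 3 s, looking ahead to t_f = 5 s:
d = csds_idist(speed=2, t_c=0, t_x=3, t_f=5)  # -> 13.0

# In B&BGDSF the object with the smallest key is evicted first; in CSDS
# the cells with the largest iDistM values are chosen for replacement.
```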
Those factors are integrated in a unified cost model architecture for LDD optimization with semantic cache in mobile grid systems. To find a good trade-off among the multiple factors, the concept of Nash equilibrium from game theory is adopted in establishing the cost model. The well-known knapsack problem heuristic [20] is used in our scheme to replace the cache segments with the highest cost model values.

Our preliminary work presented in a conference paper [14] has considered three factors: dynamic location, semantics and

time. As a special case of the more general scenarios investigated in this paper, it is substantially extended in four aspects. 1) More than three factors are considered in this paper. They are quantitatively described in a unified optimization framework. 2) General solutions are derived at Nash equilibrium from game theory. 3) Practical algorithms are presented for implementation of the theoretical results. They are absent in the preliminary conference paper. 4) Comprehensive experiments are conducted to demonstrate the location-dependent semantic cache scheme.

III. THEORETIC SCHEME

This section begins with definitions of some basic concepts. This is followed by mathematical modelling of the weights of mobile nodes for semantic cache in sensor grid systems. After that, the coefficients of the weights in the modelling are standardized. To find a trade-off among multiple weights, the concept of Nash equilibrium is introduced. Finally, theoretical solutions to the cost model with multiple factors are derived at Nash equilibrium.

A. Definitions of Basic Concepts

A quantitative investigation into LDD query optimization requires formal descriptions of the moving behavior and LDD queries of mobile users. The moving behavior of mobile users is characterized by location, speed and direction. To facilitate our theoretic development, a reference coordinate system is used in this paper. It is represented by a pair of coordinates (x, y) in Euclidean space. With this reference coordinate system, the location of a mobile user is characterized by a pair of coordinates (Lx, Ly). The moving velocity V of the mobile user is represented by a vector ⟨Vx, Vy⟩ at the user's location (Lx, Ly), where Vx and Vy are the velocities in the x and y directions, respectively. Moreover, the total numbers of mobile nodes, semantic cache blocks, and considered factors are denoted by I, J and K, respectively.
Define three integer sets I, J and K as:

I ≜ {1, · · · , I}, J ≜ {1, · · · , J}, K ≜ {1, · · · , K}. (6)

It follows that a mobile node i ∈ I, a semantic cache block j ∈ J, and a factor or constraint k ∈ K. With these three integer sets, the weight of a factor k ∈ K for the semantic cache block j ∈ J in mobile node i ∈ I is denoted by wi,j,k. The general weight of the semantic cache j ∈ J in mobile node i ∈ I over all considered K factors is represented by Wi,j.

B. The Weights of Mobile Nodes

The general weight Wi,j of the semantic cache j ∈ J in mobile node i ∈ I is defined as a linear combination of the weights of all individual factors considered for the semantic cache j in the mobile node i:

Wi,j = Σ_{k=1}^{K} αi,k × wi,j,k, i ∈ I, j ∈ J, (7)


where ∀k ∈ K, αi,k ∈ R is a non-negative scalar coefficient for the weight wi,j,k. In our LDD query optimization for semantic cache, the weights at k = 1, 2 and 3 are defined as the semantics weight wi,j,1, the time (life) weight wi,j,2, and the location weight wi,j,3, respectively. They are expressed as

wi,j,1 = Fi,j · Ci,j / Si,j, (8)
wi,j,2 = L, (9)
wi,j,3 = (1/2) |V| [(tx − tc)^2 + (tf − tx)^2]. (10)

These three weights describe different relationships in a sensor grid system. The semantics weight wi,j,1 captures the relationship among the query frequency, cost and size. The life weight wi,j,2 is the live period. The location weight wi,j,3 is the dynamic location relationship between query origins (e.g., cars) and query objects (e.g., restaurants). The three weights are treated as independent. Thus, the three vectors corresponding to the three weights are orthogonal, enabling a neat mathematical treatment under a linear independence condition.

The physical meanings of the life and location weights are illustrated in Fig. 1. For example, a car with a client passes a series of n locations denoted by a set Φ = {P1, ..., Pi, ..., Pn}, n ∈ N, where Pi is a specific location. At each location i, the car makes LDD queries {Qi,1, · · · , Qi,j}, where Qi,j is the jth query at location i. For the series of n locations Φ that the car has passed, the whole set of LDD queries is formed as Ψ = {Q1,1, ..., Q1,j, ..., Q1,m; · · · ; Qi,1, ..., Qi,j, ..., Qi,m; · · · ; Qn,1, ..., Qn,j, ..., Qn,m}, n, m ∈ N. When the car passes location Pn, the semantic cache blocks for location P1 are probably too old, e.g., cached a week ago, and thus can be replaced by new semantic cache blocks.

Fig. 1. Illustration of the life and location weights (a car passing locations P1, · · · , Pi, · · · , Pn and issuing queries Qi,j at each location).

C. Standardization of Scalar Coefficients {αi,k}, k ∈ K

Theorem 1: A general weight derived from a linear combination of multiple weights can be modelled by Eq. (7) with the coefficients αi,1, αi,2, · · · , αi,K satisfying

Σ_{k=1}^{K} αi,k = 1, αi,k ∈ R, αi,k ≥ 0, i ∈ I, k ∈ K. (11)

Proof: Consider a model of the following form

W*i,j = Σ_{k=1}^{K} α*i,k × wi,j,k, i ∈ I, j ∈ J, (12)

with

Σ_{k=1}^{K} α*i,k ≠ 1, α*i,k ∈ R, α*i,k ≥ 0, i ∈ I, k ∈ K. (13)

Applying the following transformations to Eq. (12)

αi,k = α*i,k / Σ_{k∈K} α*i,k,  Wi,j = W*i,j / Σ_{k∈K} α*i,k, (14)

gives a model in Eq. (7) that satisfies Eq. (11).

Remark 1: Theorem 1 gives a standard linear combination model to describe the general weight from multiple weights. This simplifies theoretical investigations into the semantic cache problem.

Remark 2: The proof of Theorem 1 provides a way to standardize the coefficients of individual factor weights.

Remark 3: When K = 3 for the three factors of semantics, time and location, Theorem 1 becomes a refined form of Theorem 1 in our preliminary work in a brief conference paper [14]. This implies that Theorem 1 in our preliminary work [14] is a special case of the scenarios investigated in this paper.

D. Game Balance and Nash Equilibrium

With the cost model given in Eqs. (7) to (11), it is critical for a system to achieve a trade-off among multiple factors. The concepts of game balance and Nash equilibrium are introduced to derive such a trade-off.

A game involves a number of players and their deciding strategies. Consider a game of two players in Table IV, in which (value 1, value 2) are the gains or benefits for players 1 and 2, respectively. In this particular example, if a player chooses strategy M for monopolization, no gain will be achieved regardless of what strategies the other player selects. So, the only solution that gives a win-win trade-off between both players is obtained when both of them choose to cooperate. In this case, the balanced gains are (cos θ1, cos θ2). Geometrically, θ1 and θ2 are the angles of the balanced solution vector to player 1 as a mathematical vector and player 2 as another mathematical vector, respectively. The real win-win game balance is achieved when cos θ1 = cos θ2, implying that θ1 = θ2.

TABLE IV
GAME BALANCE OF TWO PLAYERS WITH THEIR STRATEGIES AND GAINS

                             | Player 2: M (Monopolization) | Player 2: C (Cooperation)
Player 1: M (Monopolization) | (0, 0)                       | (0, 1)
Player 1: C (Cooperation)    | (1, 0)                       | (cos θ1, cos θ2)

When a player chooses a strategy with complete information of the strategy the other player is choosing, the game is a complete information game. The game balance in this case is Nash equilibrium. It has been proven that, if mixed strategies are allowed, then every game with a finite number of players


in which each player can choose from finitely many pure strategies has at least one Nash equilibrium.

The concept of game balance is directly relevant to our semantic cache systems. Each factor considered in the semantic cache system corresponds to a player in game theory. Choosing a value for a weight and its coefficient is the selection of a deciding strategy. The multiple factors are multiple players engaged in a game. They also coexist in a system, which involves common working, mutual interactions and constraints. Thus, they balance with respect to one another [21], [22]. This requires a synthetic consideration of the semantics, life, location and other factors in the cost model in order to draw a delicate balance among the multiple factors [21]. From this perspective, the relationship among the weight vectors of the semantics, life, location and other factors is in Nash equilibrium. At Nash equilibrium, the angle between the overall general weight vector and each of the individual weight vectors is the same. This has been shown in Fig. 2 of our preliminary conference paper [14] for the special case of three factors. For multiple factors, each of the mobile devices chooses scalar coefficients to balance its multiple weight vectors for cache replacement at Nash equilibrium. This leads to a solution with a trade-off between the costs in caching data and the timeliness in responding to queries.

E. Nash Equilibrium of Multiple Factors

Theorem 2: Given K linearly independent vectors

Wi,∗,1 = [wi,1,1, · · · , wi,j,1, · · · , wi,J,1],
Wi,∗,2 = [wi,1,2, · · · , wi,j,2, · · · , wi,J,2],
· · · ,
Wi,∗,K = [wi,1,K, · · · , wi,j,K, · · · , wi,J,K], (15)

and their linear combination

Wi,∗ = Σ_{k=1}^{K} αi,k × Wi,∗,k, ∀αi,k ∈ R, αi,k ≥ 0, (16)

the Nash equilibrium point of Wi,∗,1, Wi,∗,2, · · · , Wi,∗,K is

Wi,∗ · Wi,∗,1 / ∥Wi,∗,1∥ = · · · = Wi,∗ · Wi,∗,k / ∥Wi,∗,k∥ = · · · = Wi,∗ · Wi,∗,K / ∥Wi,∗,K∥. (17)

Proof: The Nash equilibrium point is the point at which the angles between the vectors Wi,∗ and Wi,∗,k are the same ∀k ∈ K. The angle θk between Wi,∗ and Wi,∗,k satisfies

cos θk = Wi,∗ · Wi,∗,k / (∥Wi,∗∥ · ∥Wi,∗,k∥), k ∈ K. (18)

Letting

cos θ1 = cos θ2 = · · · = cos θK (19)

for Nash equilibrium and using Eq. (18) give the results in Eq. (17). This completes the proof.

Remark 4: When three factors are considered for semantics, time and location, i.e., K = 3, Theorem 2 gives a refined form of Theorem 2 in our preliminary work in a brief conference paper [14]. Therefore, Theorem 2 in our preliminary work [14] treated a special case of the scenarios studied in this paper.

With Theorems 1 and 2, theoretical results can be established to set the coefficients αi,1, · · · , αi,k, · · · , αi,K to achieve Nash equilibrium for any given weights of the considered factors. They are summarized in the following theorem.

Theorem 3: For the K linearly independent vectors in Eq. (15) and their linear combination in Eq. (16) with the coefficients satisfying the condition in Eq. (11), the relationships in Eq. (17) for the Nash equilibrium point are achieved when setting

αi,k = (1/2) · [Σ_{k=1}^{K} √(Σ_{j=1}^{J} w²i,j,k) − √(Σ_{j=1}^{J} w²i,j,k)] / Σ_{k=1}^{K} √(Σ_{j=1}^{J} w²i,j,k), k ∈ K. (20)

Proof: From Eqs. (16) and (17), we have

(Σ_{k=1}^{K} αi,k Wi,∗,k) · Wi,∗,1 · (Π_{k=1}^{K} ∥Wi,∗,k∥) / ∥Wi,∗,1∥
= · · · = (Σ_{k=1}^{K} αi,k Wi,∗,k) · Wi,∗,k · (Π_{k=1}^{K} ∥Wi,∗,k∥) / ∥Wi,∗,k∥
= · · · = (Σ_{k=1}^{K} αi,k Wi,∗,k) · Wi,∗,K · (Π_{k=1}^{K} ∥Wi,∗,k∥) / ∥Wi,∗,K∥. (21)

Then, it follows from those relationships and Eq. (11) that

αi,k = (1/2) · [Σ_{k=1}^{K} ∥Wi,∗,k∥ − ∥Wi,∗,k∥] / Σ_{k=1}^{K} ∥Wi,∗,k∥, k ∈ K. (22)

Substituting

∥Wi,∗,k∥ = √(Σ_{j=1}^{J} w²i,j,k) (23)

into Eq. (22) gives the result in Eq. (20).

Remark 5: The expression in Eq. (22) is a compact vector form of Eq. (20) in Theorem 3.

Remark 6: When three factors are considered for semantics, time and location, we have the special case K = 3. For K = 3, the results derived from Theorem 3 are mathematically more compact and elegant than those from our preliminary work in a brief conference paper [14].

IV. ALGORITHM DESIGN

This section discusses how the theoretic results on semantic cache for LDD queries are implemented concretely in algorithms. The implemented algorithms include the Network Manager Algorithm, the Query Processor Algorithm, and the Semantic Cache Manager Algorithm. The Network Manager Algorithm manages the network among the clients and servers. The Query Processor Algorithm generates and processes queries via semantic caches and servers. The Semantic Cache Manager Algorithm calculates semantic cache parameters and manages how semantic cache blocks are stored and replaced.
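Before turning to the concrete algorithms, a minimal sketch of how a semantic cache manager could apply Theorem 3 and Eq. (7) is given below; the function names and example weights are hypothetical, not from the paper's implementation.

```python
import math

# Sketch of Theorem 3 (Eq. (20)) and the linear combination of Eq. (7).
# weight_vectors holds K lists, each of length J: one factor weight vector
# W_{i,*,k} = [w_{i,1,k}, ..., w_{i,J,k}] per factor k.

def nash_coefficients(weight_vectors):
    """Eq. (20)/(22): alpha_{i,k} = (1/2) (total - ||W_{i,*,k}||) / total."""
    # Eq. (23): Euclidean norm of each factor's weight vector.
    norms = [math.sqrt(sum(w * w for w in vec)) for vec in weight_vectors]
    total = sum(norms)
    return [0.5 * (total - n) / total for n in norms]

def general_weights(weight_vectors, alphas):
    """Eq. (7): W_{i,j} = sum_k alpha_{i,k} * w_{i,j,k} per cache block j."""
    J = len(weight_vectors[0])
    return [sum(a * vec[j] for a, vec in zip(alphas, weight_vectors))
            for j in range(J)]

# Example with K = 3 factors (semantics, life, location) and J = 2 blocks.
W = [[0.8, 0.2],   # semantics weights w_{i,j,1}
     [0.5, 0.5],   # life weights      w_{i,j,2}
     [0.1, 0.9]]   # location weights  w_{i,j,3}
alphas = nash_coefficients(W)

# For K = 3, the coefficients of Eq. (20) sum to (K - 1)/2 = 1, which is
# consistent with the standardization condition in Eq. (11).
assert abs(sum(alphas) - 1.0) < 1e-9
```

The resulting general weights rank the cache blocks for replacement, with the knapsack heuristic evicting the blocks with the highest cost model values as described in Section II-E.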


A. Network Manager Algorithm

Managing the network among the clients and servers, the Network Manager Algorithm implements three functions: Expressing Moving Nodes, Building Routing Table, and Searching Routing Table. It is depicted in Algorithm 1. The Function for Expressing Moving Nodes is shown in Lines 1 to 4 in Algorithm 1. This is implemented at each time instant for the whole time horizon considered in the system (Line 3). In Line 4, the changes in the coordinates (X, Y) of the moving objects are used to characterize the movement behavior of the objects, including the location, movement speed and movement direction. The Function for Building Routing Table is implemented in Lines 5 to 11. Each mobile device has a coverage area of wireless communications. This coverage area is characterized by a communication radius, e.g., 100 meters in a typical application. If the distance between two nodes is within this radius (Line 7), then these two nodes are connected through wireless communications (Line 8). Otherwise, they are not connected (Line 10). Each of the wireless nodes in the system records in its routing table all nodes connected to it in 1, 2 and 3 hops (Line 11). The Function for Searching Routing Table shown in Lines 12 to 14 of Algorithm 1 is responsible for searching for a node in routing tables. This is implemented hierarchically in two levels: the first-level local search (3.1 in Line 14) and the second-level remote search (3.2 in Line 14). The local search

Algorithm 1: Network Manager Algorithm
Input: Nodes (empty)
Output: Nodes (full)
1  1). Function for Expressing Moving Nodes:
2  begin
3      for T = 0 to Duration − 1 do
4          Express the movement of the clients according to the changes in their coordinates (X, Y).
5  2). Function for Building Routing Table:
6  begin
7      if The distance of two nodes ≤ Radius then
8          The connection of the two nodes is TRUE;
9      else
10         The connection of the two nodes is FALSE;
11     Each node records in its Routing Table all nodes connected to it in 1, 2 and 3 hops.
12 3). Function for Searching Routing Table:
13 begin
       /* A recursive process, which calls different layers in the Routing Table according to different thresholds for hop 2. */
14     3.1). Local search: Search the node in the Routing Table of its own node.
       3.2). Remote search: Search the node in the Routing Tables of neighbouring nodes.

looks into the routing table of its own node, while the remote search deals with the routing tables of neighboring nodes. Implemented recursively, the search process is executed for different layers in the routing tables according to different thresholds for hop 2.

B. Query Processor Algorithm

Responsible for generating and processing queries, the Query Processor Algorithm is implemented in four main steps: query generation, query in the located node, query in a neighbor node, and query in the system server. The whole process is depicted in Algorithm 2. The algorithm starts with Query Generation (Lines 1 to 3). It generates a query at time t with a number of attributes such as the query node number, type, location, scope, price and time. The query generation is treated as a random process because the query time, query node, and query attributes (e.g., query type, location, scope and price) are randomly generated from their respective value spaces. The system does not know which node will generate what query next, or with what attributes, until the query is generated. Some of the generated queries are cached in the local database of the node, while all of them are submitted to the system server. Maintaining a complete copy of the query database, the system server acts as a query database server. How is a query expressed? In each node object, a DataTable structure is generated as a database structure, and each record in the database has an SQL query structure filled with

Algorithm 2: Query Processor Algorithm
Input: Query, Query-Result (empty)
Output: Query, Query-Result (full)
1  1). Query Generator.
2  begin
3      Generate a query at time t with a number of attributes, e.g., node number, type, location, scope, price, time, etc. This is treated as a random process.
4  2). Query Processing.
5  begin
6      Query in the located node first:
7      Search in the routing table of the node;
8      if A query answer is found locally then
9          Put satisfied records into Query-Results in the Semantic Cache of the located node;
10     else
11         3). Query in neighbouring nodes.
12         Search in the routing tables of neighbour nodes;
13         if A query answer is found then
14             Judge whether or not the data are repeating;
15         else
16             4). Query in the server.
17             Search in the routing table of the server;
18     Output query results;


Boolean conditions for detailed query information. For example, a Boolean condition clarifies what types of hotels are queried (e.g., within a radius of 100 meters), what levels of hotel rates are sought (High, Middle, Low), or what points of interest are searched (Hotels, Schools, Car Parking).

After a query is generated, the algorithm processes the query in Lines 4 to 18. The semantic cache is located on the client side. When a query is requested, the algorithm first tries to process it locally for an answer (Lines 6 to 9). If an answer is found locally (Line 8), the record is put into Query-Result in the semantic cache of the node (Line 9). Otherwise, the algorithm looks into the neighboring peers of the node for a remote answer (Lines 11 to 14). If the neighboring peers cannot answer the query either, the query is sent to the system server for an answer (Lines 16 and 17). Finally, the algorithm outputs the query result (Line 18).

C. Semantic Cache Manager Algorithm

The core part of our algorithms is the Semantic Cache Manager Algorithm. It calculates semantic cache parameters and manages the store and replacement operations of the semantic cache. The algorithm is shown in Algorithm 3.

The algorithm begins with the calculation of the general weights Wi,∗,k for each of the semantic cache blocks, k ∈ K (Lines 1 to 6). For the semantic weight, Wi,∗,1 = Fi ∗ Ci / Si , which requires finding the begin and end addresses (i), the size Si , the frequency Fi , and the cost Ci , i ∈ I. For the time weight, Wi,∗,2 = T , which needs the begin time, the end time, and the time weight. The location weight is calculated as Wi,∗,3 = (1/2) · |V| · [(tx − tc)² + (tf − tx)²], which uses the information about the time T , the velocity V and the location coordinates (Lx , Ly ), etc. Then, the algorithm calculates the available space and executes store/replace operations in Lines 7 to 25.
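The three weight formulas just described can be read as a small routine. The following Python sketch is only illustrative: the parameter names and the interpretation of tx, tc and tf (the current, begin and end times) are our assumptions, and the sample values are made up.

```python
def semantic_weight(frequency, cost, size):
    # W_{i,*,1} = F_i * C_i / S_i: frequently used, costly, small blocks rank high.
    return frequency * cost / size

def time_weight(t):
    # W_{i,*,2} = T: the time attribute of the cache block.
    return t

def location_weight(speed, t_x, t_c, t_f):
    # W_{i,*,3} = (1/2) * |V| * ((t_x - t_c)^2 + (t_f - t_x)^2):
    # velocity combined with the begin (t_c), current (t_x) and end (t_f) times.
    return 0.5 * abs(speed) * ((t_x - t_c) ** 2 + (t_f - t_x) ** 2)

# Illustrative values only.
w1 = semantic_weight(frequency=4, cost=10.0, size=8)        # 5.0
w2 = time_weight(120)
w3 = location_weight(speed=2.0, t_x=5.0, t_c=3.0, t_f=9.0)  # 20.0
```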
If there is enough space to fit the query results (Line 10), the query result is stored in a number of steps (Lines 11 to 18): sort the weights Wi,∗,k (Line 13), find the location to insert the query result (Line 14), store the weights Wi,∗,k (Line 15), calculate the coefficients αi,k from Eq. (20) (Line 16), calculate Wi,∗ from Eq. (16) (Line 17), and finally sort the address weights (Line 18). This completes the store operation.

If the available space is not enough to store the query results, some of the semantic cache blocks have to be replaced (Lines 20 to 25). For this purpose, the number of blocks to be replaced is calculated first (Line 22). Then, the blocks are found and removed (Line 23), and the new blocks are stored (Line 24). Finally, the general weights are sorted (Line 25). This completes the operation for semantic cache replacement.

V. EXPERIMENTAL PERFORMANCE EVALUATION

This section evaluates the performance of the presented location-dependent query optimization scheme for semantic cache in sensor grid databases through simulation studies. It starts with a design of performance evaluation criteria. This is followed by the experimental design and system specifications. After that, the experimental results are described and analyzed. While the scheme supports K ≥ 3 factors, a special case of three factors, i.e., semantics, time and location, is considered in the

experiments. This special case has been investigated in our preliminary study [14].

A. Criteria for Performance Evaluation

Wireless sensor grid applications are growing fast on a massive scale. On November 11, 2014, it took Alibaba only 38 minutes to exceed a business turnover of 10 billion in Chinese currency. The total turnover of that day in Alibaba reached over 57 billion in Chinese currency, and 42.6% of it came through wireless communications. On the Chinese New Year Eve of 2015 (February 18), red packets sent as new year gifts through WeChat were exchanged over 1 billion times, mostly via wireless mobile phones. During CCTV's Chinese New Year TV program, the WeChat interactive Shaking-Shaking totaled over 11 billion interactions, with a peak of 810 million per minute. Chinese New Year greetings traveled 3 trillion kilometers across 185 countries.

With such a huge amount of wireless communication traffic over such a broad physical area, a fast query response, efficient cache and database operations, and significant wireless bandwidth savings are critical for a high-performance query

Algorithm 3: Semantic Cache Manager Algorithm
Input: Semantic-Cache-Index (empty)
Output: Semantic-Cache-Index (full)
1  1). Calculate weights Wi,∗,k ∀ k ∈ K.
2  begin
3      Calculate semantic weight: Wi,∗,1 = Fi ∗ Ci /Si ;
4      Calculate time weight: Wi,∗,2 = T ;
5      Calculate location weight: Wi,∗,3 ;
6      Calculate all other weights: Wi,∗,4 , · · · , Wi,∗,K ;
7  2). Calculate Available Space and then Store/Replace.
8  begin
9      Calculate the available space;
10     if the available space ≥ the query result size then
11         3). Store the query result in the following steps:
12         begin
13             Sort general weights;
14             Find the location for insertion;
15             Store weights Wi,∗,k ∀ k ∈ K;
16             Calculate coefficients αi,k ∀ k ∈ K from Eq. (20);
17             Calculate the general semantic weight Wi,∗ = Σ_{k=1}^{K} αi,k Wi,∗,k from Eq. (16);
18             Sort address weights.
19     else
20         4). Replace cache blocks in the following steps:
21         begin
22             Calculate the numbers of blocks to be replaced;
23             Find the blocks and Remove them;
24             Store the new blocks;
25             Sort general weights;


system. This becomes obvious when we recall the 810 million WeChat interactions per minute over wireless mobile phones.

How fast a query is answered is measured by the query response time, while how efficiently the cache and database are operated to answer queries is characterized by the peer hit ratio. As the system server is generally far from most wireless clients in a practical sensor grid system, reduced communication with the server through an increased number of queries answered by peers implies a significant saving of the overall wireless traffic (as well as a reduced query response time). Thus, wireless bandwidth saving is also characterized by the peer hit ratio. Therefore, two performance criteria are used in our simulation experiments to quantify the performance of the presented location-dependent query optimization scheme: Average Response Time and Average Peer Hit Ratio.

The Average Response Time is calculated as the average of the response times of all queries over a period of time. The response time of a query is measured by the elapsed time from the time instant that a query is issued at a mobile device Mi to the time instant that Mi gets all the answers [23]. A shorter response time means better performance.

The Average Peer Hit Ratio is derived from the concept of storage hits for queries originating from mobile clients. A storage hit occurs when a desired semantic cache block is found to respond to a query before the system server is requested. Therefore, there are two types of storage hits: storage hits on the local node, and storage hits on the peers. Out of the total number of requested cache blocks, the percentage of cache blocks found in the local cache is the local (storage) hit ratio. Similarly, the percentage of cache blocks that are not found in the local cache but are retrieved from the caches of the peers gives the peer (storage) hit ratio.
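Both criteria reduce to simple counting. The minimal Python sketch below is our own illustration, not the simulator's code; the labels "local", "peer" and "server" mark where each requested cache block was found.

```python
def average_response_time(response_times):
    # Mean of the per-query elapsed times (from query issue to full answer).
    return sum(response_times) / len(response_times)

def hit_ratios(answer_sources):
    # answer_sources: one label per requested cache block ("local"/"peer"/"server").
    total = len(answer_sources)
    local = answer_sources.count("local") / total
    peer = answer_sources.count("peer") / total
    return local, peer

avg = average_response_time([0.2, 0.4, 0.6])            # 0.4
local_ratio, peer_ratio = hit_ratios(
    ["local", "peer", "peer", "server"])                # 0.25 and 0.5
```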
For the various semantic cache methods, the local hit ratio differs little, while the peer hit ratio may differ noticeably. Therefore, more attention is paid to the peer hit ratio in this paper. A higher peer hit ratio shows better performance, as it implies more short-range communications with neighboring nodes and consequently more overall wireless bandwidth savings [24] and lower costs of communications with the system servers [10], [25], [26].

B. Experimental Design

In order to evaluate the two performance metrics, i.e., the average response time and the average peer hit ratio, a sensor grid system is designed with two servers and multiple mobile clients. It is interconnected through a wireless network. The mobile clients move in a rectangular area. They submit queries with location, time, and other attributes and requirements. Each of the mobile nodes also maintains its own semantic cache. When a mobile client submits a query, the query is processed locally first through the client's own semantic cache. If the query is not fully answered, it is sent to the neighboring nodes for further processing. If a full answer is still not available, the query is forwarded to the system server for a result [27]. In the mobile grid system, either or both of the servers maintain a full copy of the database and act as database servers.

Each of the mobile clients is equipped with a number of function modules. The modules include the network manager,

query processor, and semantic cache manager. The network manager maintains wireless network communications with other mobile clients and the central servers. The query processor generates and processes queries via the semantic cache. The semantic cache manager manages the store and replacement operations of the semantic cache.

The program for the simulation experiments is designed with two objects: QUERY and NODE. A QUERY in a NODE is the data source in the work flow of the simulation. Each NODE is composed of four sub-objects to store data: QUERY-RESULTS, TEMP-SEMANTIC-CACHE-INDEX-BLOCK, SEMANTIC-CACHE-INDEX-ARRAY and SEMANTIC-CACHE. The names of those sub-objects explain their functions. As shown in Fig. 2, the work flow of the simulation consists of the following steps:
1) At each time instant t, object QUERY generates a query requirement to specify the QUERY-NODE and QUERY-TYPE.
2) From the query requirement, NODE acquires the information about NODE itself, the neighbor NODEs and the servers. Then, it stores this information in NODE.QUERY-RESULT. From NODE.QUERY-RESULT, NODE extracts or calculates query parameters such as the query type, size, cost, and begin and end addresses. It returns those parameters to object QUERY.
3) QUERY stores the query parameters into NODE.TEMP-SEMANTIC-CACHE-INDEX-BLOCK.
4) Also from NODE.QUERY-RESULT, NODE calculates semantic cache variables including the Space weight, Time weight, Semantic weight, and other variables. Then, it stores the results into NODE.TEMP-SEMANTIC-CACHE-INDEX-BLOCK as well.
5) From NODE.TEMP-SEMANTIC-CACHE-INDEX-BLOCK, NODE searches system information in NODE.SEMANTIC-CACHE-INDEX-ARRAY, and decides which STORE, REMOVE and REPLACE operations should be performed.
6) From the information in NODE.SEMANTIC-CACHE-INDEX-ARRAY, NODE performs NODE.SEMANTIC-CACHE operations.
The core steps of the above work flow are steps 4) and 5).
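The query-resolution path in the work flow above — local semantic cache first, then the neighbouring peers, then the server — can be sketched as a single lookup routine. The dictionaries and key names below are illustrative placeholders, not the simulator's actual data structures.

```python
def answer_query(query, local_cache, peer_caches, server_db):
    # Step 1: try the semantic cache of the issuing node.
    if query in local_cache:
        return local_cache[query], "local"
    # Step 2: try the semantic caches of the neighbouring peers.
    for cache in peer_caches:
        if query in cache:
            return cache[query], "peer"
    # Step 3: fall back to the system server, which keeps a full copy.
    return server_db[query], "server"

server = {"hotel@A": "Hotel X", "parking@B": "Lot 7"}
local = {}
peers = [{"hotel@A": "Hotel X"}]
hit1 = answer_query("hotel@A", local, peers, server)    # ('Hotel X', 'peer')
hit2 = answer_query("parking@B", local, peers, server)  # ('Lot 7', 'server')
```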
The performance of the presented semantic cache scheme will be evaluated against the two performance criteria: the average response time and the average peer hit ratio. It will be compared with that of the CSDS and B&BGDSF methods. The CSDS method considers the dynamic location factor but not the time and semantics factors. The B&BGDSF method addresses the time and semantics factors but not the dynamic location factor. Our semantic cache scheme considers the semantics, time and dynamic location factors as well as other factors. The CSC method is not shown in our simulation results as it performs much worse than CSDS, B&BGDSF and our semantic cache scheme.

C. System Specifications and Scenarios

The system settings of our simulation experiments are specified in Table V. Each of the mobile nodes moves in a


rectangular grid with the two sides randomly generated between 50 m and 1 km and between 100 m and 2 km, respectively. Its moving speed is between 1 unit/sec and 10 units/sec. The total number of mobile nodes varies between 100 and 5000, yielding a moderate-scale data set [23], [28], [29]. The wireless Radius is 100 m and may occasionally change between 10 m and 100 m. The wireless network bandwidth is set to a typical value of 19.2 kbps. A maximum of 2 hops for forwarding a routing message is chosen to attain good cache effects without high additional costs [6], [30], [31].

Two size parameters are specified below. Database relations, cached semantic segments and cache maintenance data are all physically stored in pages with a page size of 1,000,000. Client Cache defines the size of the memory cache on the client side, and this size is set to 10,000 in our experiments.

Two time parameters are set as follows. The simulation Period is chosen between 100 sec and 5,000 sec. This long simulation period is evenly divided into many time steps. In each of those time steps, the Average Response Time, the Average Local/Peer Hit Ratio, and other parameters are computed. The time step is set to 50, 100, 150 or 200 sec depending on how long the actual simulation period is.

At the beginning of the simulation, all data are stored in the servers, and no mobile devices store any data in their local

storage. Then, at each time instant t, the system randomly picks a mobile device to issue a query. Depending on the location of the mobile device, the query specifies the query type, price, time, and other attributes.

The experiments consider various scenarios to evaluate the performance of the presented semantic cache scheme. Three types of queries are considered in the experiments: Restaurant, Hotel and Parking Lots. The experimental scenarios are characterized by two variables: the semantic degree and the location degree.

The location degree indicates the degree of location deviation from the location center. It is expressed as a percentage of the maximum possible location deviation. In our experiments, it takes 10 values in the range from 10% to 100% with a fixed increment of 10%. At every time t, the value of the location degree for a mobile device is randomly set from those 10 values. Then, the coordinates of the current location of the mobile device are uniquely determined by adding to its reference location the following deviation: (the maximum possible deviation) times (the location degree in percentage).

The semantic degree takes four values, from which a value is randomly chosen at every time instant for query generation. The mix of query types generated for each semantic degree is specified below:

Semantic degree   Restaurant queries   Hotel queries   Parking lot queries
       1                 33%                33%                33%
       2                 50%                30%                20%
       3                 70%                20%                10%
       4                 80%                10%                10%
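The table can be encoded as one categorical distribution per semantic degree, and query generation then samples a type from it. The sketch below is our own illustration (the fixed seed and helper names are not part of the experimental setup):

```python
import random

# Query-type mix per semantic degree, taken from the table above.
TYPE_MIX = {
    1: [("Restaurant", 0.33), ("Hotel", 0.33), ("Parking", 0.33)],
    2: [("Restaurant", 0.50), ("Hotel", 0.30), ("Parking", 0.20)],
    3: [("Restaurant", 0.70), ("Hotel", 0.20), ("Parking", 0.10)],
    4: [("Restaurant", 0.80), ("Hotel", 0.10), ("Parking", 0.10)],
}

def sample_query_type(degree, rng):
    # Draw one query type with the probabilities of the given semantic degree.
    types, weights = zip(*TYPE_MIX[degree])
    return rng.choices(types, weights=weights, k=1)[0]

rng = random.Random(0)
counts = {"Restaurant": 0, "Hotel": 0, "Parking": 0}
for _ in range(10000):
    counts[sample_query_type(4, rng)] += 1
# For degree 4, roughly 80% of the samples are Restaurant queries.
```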
