
MINISTRY OF EDUCATION AND TRAINING

CAPSTONE PROJECT DOCUMENT INDOOR POSITIONING SYSTEM

IPS Team

Group Members:
Nguyễn Nam Hải (01481)
Ngô Đăng Tùng (01524)
Nguyễn Mạnh Quân (01529)
Trần Tấn Nhật (01561)
Giang Việt Hưng (01734)

Supervisor: Phan Duy Hùng

Hanoi, April 10, 2014

TABLE OF CONTENTS

1 INTRODUCTION AND MOTIVATION
2 RELATED WORK
3 RESEARCH OBJECTIVE
4 RESEARCH APPROACH
5 FOUNDATION
   5.1 Wi-Fi Signal
   5.2 Indoor Localization Methods
6 IMPLEMENTATION
   6.1 Experimental Setup
   6.2 Collecting Data
   6.3 Processing Data
   6.4 Proposed Techniques to Handle Difference of MACs and Device Sensitivities
7 EVALUATION RESULTS
   7.1 Signal Strength Characteristic
   7.2 Evaluation Criteria for Positioning Accuracy
   7.3 Positioning Accuracy Result
      7.3.1 Pilot experiment (14/01/2014)
      7.3.2 Extended experiment (27/02/2014)
      7.3.3 Real size experiment
8 CONCLUSIONS
9 APPENDIX: MATHEMATICAL BACKGROUND
   9.1 Probability Density Function
   9.2 Histogram Method
   9.3 Kernel Function
10 APPENDIX: JAVA, MATLAB, C# CODE, EXCEL, DEMO APPLICATION
   10.1 Source Code
   10.2 Demo Application
   10.3 Detail Comparison between Minimum Same MAC and Normalization by Size
      10.3.1 Euclidean norm
      10.3.2 Gaussian
      10.3.3 Kernel
      10.3.4 Histogram
      10.3.5 Alignment
      10.3.6 Summary
11 REFERENCES


1 Introduction and Motivation

With the development of technology, the needs of doing business and the needs of human daily life, the capability of locating the positions of people, equipment, things and animals is very important. Although it is not a new topic, positioning still attracts attention from scholars and engineers around the world. A literature survey shows that mobile device location awareness has long been an objective of mobility research, for decades, and people are still working on it. In the research areas of location-based services and context-aware computing, locating a position is a fundamental requirement for novel applications. Examples of such applications are: using the positions of people so that the system can provide important information about the area they are in (e.g. directions, guides, warnings), or using the positions of products on supermarket shelves in an on-the-shelf product management system. For some simple applications, just telling people where they are is enough.

Basically, to determine positions we have to consider which position sensors can be introduced or may already be available. As described in (Kjærgaard, 2010), sensors such as mobile feature phones, mobile smartphones, laptops, PDAs or Global Positioning System (GPS) receivers (National Coordination Office for Space-Based Positioning, 2014) are good for determining position when people are able to carry them. Besides, sensor types such as a Radio-Frequency Identification (RFID) tag, an ultrasound tag or an ultra-wideband tag can be attached to animals or equipment. If we do not want people to carry sensors, or to attach sensors to animals or equipment, but still want to determine the position of a target (e.g. a human, an animal, a car), sensors can instead be attached to walls, columns and other parts of buildings; this way also supports determining positions well. With any type of sensor used to determine position, there is always one challenge we cannot avoid: the impact of the environment.

GPS is, to our knowledge, the best solution for outdoor positioning and gives great results in outdoor environments. Even though working outdoors means that the system has to cover a huge area containing a large number of obstructions, and the weather is also a source of noise, GPS overcomes them all: "Real-world data from the FAA show that their high-quality GPS Standard Positioning Service (SPS) receivers provide better than 3 meter horizontal accuracy" (National Coordination Office for Space-Based Positioning, 2014). Unfortunately, GPS performs poorly in indoor environments; in other words, the reliability of GPS indoors is very low, and therefore it is not a solution for indoor positioning. So, what is the solution for indoor positioning?

Over the last decade wireless communication technology has made many great achievements, making mobile tools and devices more powerful in their functionality. In the area of positioning, wireless technology enables engineers to develop location-aware applications and hardware. The problem of positioning in indoor environments now has many solutions, and using Wi-Fi is considered one of the research directions for indoor positioning systems. Nowadays, mobile devices and many buildings, both commercial and residential, are already equipped with off-the-shelf IEEE 802.11 wireless Ethernet, a popular and inexpensive technology (Andrew, et al., 2002).
If a reliable indoor positioning system based only on the Wi-Fi signal is developed, the system could be deployed to other systems, buildings and places.

With a device that works with Wi-Fi, specifically IEEE 802.11 wireless Ethernet, and that can measure the strength of the received signal, this value can be used as a parameter for an indoor positioning system. This direction is also our direction of research to solve the problem of positioning in indoor environments. There are many challenges to pass through. The indoor environment is even more complicated than the outdoor environment. The unpredictability of signal propagation through indoor environments makes it difficult to generate an adequate likelihood model of signal strength measurements (Brian, et al., 2006). The signal is impacted significantly by obstructions, so the received signal strength varies unpredictably. The "traffic" of the indoor environment changes significantly, and it is difficult to handle the influence of this traffic on the signal and on the indoor positioning system. The structure, materials and furniture inside a building are challenges to implementing any method of determining or tracking positions.

Until now, many indoor positioning systems have been developed and deployed. Existing approaches to indoor positioning can be divided into two main categories. The first approach relies on an infrastructure that was designed especially for indoor positioning, in systems such as the Active Badge, the Cricket, the Bat and the Ekahau positioning engine (EPE). However, those systems require specialized hardware combined with a structural plan to build an indoor positioning system, and they do not work when deployed to other places. It is very clear that this approach becomes very expensive and cannot be applied to a large-scale deployment. The second approach does not rely on any infrastructure; it is an infrastructure-less approach that takes advantage of the wireless local network, in this case the Wi-Fi network already mentioned above. This approach can again be divided into two main classes, as described in (Brian, et al., 2006). "The first class of techniques assumes knowledge about the locations of access points and then model the propagation of signals through space to determine the expected signal at any location based on distance from an access point (Bahl & Padmanabhan, 2000) (LaMarca, et al., 2005) (Siedel & Rappaport, 1992) (Ville, et al., 2009) (Yan, et al., 2013) (Youngsu, et al., 2012)." This technique is also called proximity, but its results are poor because of the challenges of signal propagation. "The second class of techniques compute measurement likelihoods using location-specific statistic extracted from calibration data. These local statistics include histograms, Gaussians, or even raw measurements (Bahl & Padmanabhan, 2000) (Howard, et al., 2003) (Andreas, et al., 2004) (Andrew, et al., 2002)." This approach gives better results than proximity. In this research, we follow the second class and compute measurement likelihoods. We build a grid model of our indoor testing environment: the area is divided into a coordinate grid and we attempt to map a user position to a point on that grid.

The content of this work is organized as follows. Section 2 describes some related works from which we gained knowledge for this research. Sections 3 and 4 state the objective and the approach of this research. Section 5 provides foundation knowledge about the Wi-Fi signal and localization methods, and states the reason we chose the location fingerprinting method. Section 6 describes the implementation of our experiments.
That section explains how the Wi-Fi signal is collected and how those data are processed in order to estimate position. We also propose two new techniques for improving positioning accuracy and one method for implementing indoor positioning. Section 7 describes our analysis and evaluation of the experimental results.

In that section, we also present our new methods for handling the differences between RSS vectors. Section 8 presents our conclusions about this research.

2 Related Work

Indoor positioning is not a new topic for scientists and engineers: there are many studies in this area, and some commercial products have been launched. Research in this area applies many technologies, such as wideband radio, ultra-wideband radio, infrared and laser, and many types of algorithms, such as least squares, Bayesian methods, Euclidean distance, hidden Markov models, Gaussian models and kernel methods. Systems based on Wi-Fi signal strength measurements have received a lot of attention from researchers and engineers, which makes Wi-Fi signal strength measurement one of the first approaches to consider when working on an indoor positioning system. RADAR (Bahl & Padmanabhan, 2000) was one of the first systems to use the Wi-Fi signal as the key to determining position; it used several deterministic mathematical models to calculate the coordinates of a mobile device. Later, many studies used the Wi-Fi signal as the main input to determine position (Andreas, et al., 2004) (Brian, et al., 2006) (Hernandez, et al., 2009) (Howard, et al., 2003) (Kamol & Krishnamurthy, 2004) (Kjærgaard, 2010) (LaMarca, et al., 2005) (Lina, et al., 2013) (Ville, et al., 2009) (Yan, et al., 2013) (Youngsu, et al., 2012). Most of them used location fingerprinting as their main method. Lina, et al., 2013 pointed out that significant temporal variation of the received signal strength is mainly responsible for the positioning error. In their work, they tried to solve the problem of generating an accurate Wi-Fi fingerprint database by proposing a double-peak Gaussian distribution algorithm; they concluded that the positioning accuracy improved by up to 40% when applying the new algorithm. Another study (Yan, et al., 2013) integrated human-centric collaborative feedback with a baseline Wi-Fi fingerprinting system; their results showed that in the best case the confidence level reaches approximately 90% with a position error of less than 5 meters, an improvement of the confidence level from 50% to 90%. Ville, et al., 2009 is a comparative survey of WLAN location fingerprinting: it presents the mathematical formulation of the location fingerprinting methods, applies Bayesian and Kalman filters, and provides results for different indoor positioning algorithms. Many things were introduced in this paper: the basic mathematical formulas and algorithms were described briefly and good experimental results were provided, but the implementation was not so clear. Andreas, et al., 2004 created a topological map for localization. In this work, they used Bayesian inference and the 802.11 wireless network protocol to produce a high-precision topological location inference technique, because some applications do not need an indoor positioning system with one- or two-meter precision for a mobile device. For that reason they used a topological model: the building, their indoor environment, was divided into cells, each cell mapping to a region of the building; a cell could be a room, a part of a room, a hallway or a part of a hallway. Their positioning accuracy is the percentage of correct mappings between position and cell, rather than a distance to a point, and they did not care about meter-level precision. Brian, et al., 2006 is a study on applying Gaussian processes to signal strength.

That paper focuses only on the Gaussian process; its direction is a probabilistic mathematical model. The Bayesian filter was introduced and used to estimate the posterior over a person's location, conditioned on all measurements obtained over a period of time. A particle filter, which is an implementation of the Bayesian filter, was used to implement position tracking. The results of that paper are very good: over 3 km of test data, the average error was just 2.12 meters. Location fingerprinting is the choice most researchers and engineers prefer, but Ali & Omar, 2008 and Ryan, et al., n.d. took a different direction. They tried to take advantage of basic properties of the signal and apply physics equations to solve the problem. By synchronizing the clocks of the transmitter (e.g. an access point) and the receiver (e.g. a mobile device), they obtain the time of arrival; combined with the speed of the radio wave and other factors, they can compute the distance between receiver and transmitter. This is a good step towards determining position. However, it has many sources of error, because synchronization is very difficult and a small timing error leads to a significant positioning error. There are already some commercial indoor positioning systems, such as INSIDE (INSIDE, 2014), indoo.rs (indoo.rs, 2013) and Ekahau (Ekahau, 2013). INSIDE is a product of the INSIDE start-up company from Israel. Their system provides location-based services with sub-meter accuracy; the positioning is very impressive. According to several unofficial sources, they determine position by taking advantage of features of the smartphone: using the camera to locate the position and then using sensors to track where the user goes. Indoo.rs is different in terms of technology: it uses the Wi-Fi signal to locate position. Its positioning accuracy is not clear, but it has already launched and is working. Ekahau is an older company compared to INSIDE and indoo.rs. While INSIDE and indoo.rs work only with smartphones, Ekahau provides different kinds of devices for positioning, such as RFID tags and badges.

3 Research Objective

The objective of this research is to study indoor positioning from existing works in order to understand how to determine position in an indoor environment; to carry out research and experiments under our conditions to analyze the capability of indoor positioning methods; and finally to draw conclusions about the implemented methods, to see which method brings the best result and how the experimental results change with different time periods, different devices and different testing areas.

4 Research Approach

The first research approach of this report is to study existing works about indoor positioning to gain the foundation knowledge. After that, the most basic and simple implementation of indoor positioning based on the Wi-Fi signal is applied to test the chance of success, and a scope of research is then created. Another approach is that the research group asks research questions, provides evidence and then answers the questions; this approach is applied in (Kjærgaard, 2010), where the "evidence" is provided by evaluating measured data. The main approach of this research is to perform experiments, obtain results and use those results to evaluate a hypothesis.

At first, experiments are set up based on observation of existing works; then changes are made to set up our own experiments.

5 Foundation

5.1 Wi-Fi Signal

The Wi-Fi signal is one type of radio signal, and thus an electromagnetic wave. Electromagnetic waves are classified by their wavelengths; radio waves have wavelengths of up to around 10^3 meters. Radio signal propagation is a big concern when working with positioning. In a vacuum, radio signals propagate at the speed of light; in such an environment the speed remains stable, so combined with the time delay between transmitter and receiver, the distance between them can be measured easily. However, an indoor environment contains many things that can significantly affect the speed and direction of signal propagation, so the propagation speed cannot be used as a factor to locate a position. The typical range of an access point is up to 46 m indoors, and in general the nearer the access point, the stronger the Wi-Fi signal. IEEE 802.11b is a networking technology that is widely used for mobile devices such as laptops and smartphones; it operates in the 2.4 GHz band and subdivides the used radio spectrum into a set of 13 channels. A beacon frame, which stores information about the base station (access point) such as SSID, RSSI and MAC address, is sent out on 1 of these 13 channels. To get information about a base station, a mobile client either scans through the 13 channels to receive the beacon (passive scanning) or actively sends a request to the base station and waits for the beacon in response (active scanning). IEEE 802.11b does not define a standard way for clients to measure signal strength: the standard only provides a signal strength index as an integer value in the range of 0 to 255 without an associated unit. Therefore, IEEE 802.11b client manufacturers are free to map this index to a value with a unit, mostly in dBm, and different manufacturers have different mappings from the 0-255 signal strength index to dBm. In practice, the received signal strength value lies in the range of 0 dBm to -100 dBm. Besides, different hardware has different sensitivity when receiving the same signal at the same place and time, so received signal strength values are not the same between devices.

5.2 Indoor Localization Methods

The foundation knowledge in the context of localization concerns how the current position is represented and how the current position is determined.

Position Representation
- Place: denotes a geographic site in the real world.
- Location: refers to a logical/semantic location. It describes the semantics of an area, e.g. home, Ba Dinh Square, in front of the library, in the great hall.


- Position: refers to an exact point in Euclidean space, i.e. a two- or three-dimensional coordinate. This term is indispensable for applications requiring precise location information.

In this report, we study methods to determine the position in an indoor environment, more specifically spatial positions. Spatial positions are based on a reference system which contains a coordinate system.

Proximity

Figure 1: Proximity model (Kushki, et al., 2012)

To implement proximity sensing, the positions of all access points must be known. Proximity sensing simply compares the signal strengths received at the current position from all access points; the access point with the highest signal strength is the one the current position is nearest to. For example, the data used when implementing proximity has the form

S_i = {RSSI_i, AP_i, x_i, y_i},  i ∈ {1, 2, ..., N}

where N is the number of APs, RSSI_i is the signal strength of AP_i at the current position, AP_i is the i-th AP, and x_i and y_i are the coordinates of AP_i. From the set S of all these tuples we find the biggest RSSI value, which also gives us the corresponding AP and its coordinates; the current position is nearest to that AP. However, there are problems with this method: the accuracy depends on the density of APs, and it is also very difficult to distinguish between floor levels.
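As an illustration, the following minimal Java sketch (not part of the original implementation; the record type and the sample coordinates are hypothetical, while the MAC addresses are taken from the example table in Section 6.3) picks the access point with the strongest RSSI from a set S of observations:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

public class ProximityDemo {
    // One element S_i = {RSSI_i, AP_i, x_i, y_i} of the observation set S.
    record ApObservation(String mac, int rssiDbm, double x, double y) {}

    // Returns the observation with the strongest (largest) RSSI value.
    static ApObservation nearestAp(List<ApObservation> s) {
        return s.stream().max(Comparator.comparingInt(ApObservation::rssiDbm)).orElseThrow();
    }

    public static void main(String[] args) {
        List<ApObservation> s = Arrays.asList(
                new ApObservation("00:3A:98:AE:3C:D4", -70, 2.0, 4.0),
                new ApObservation("70:8D:09:9A:E1:FE", -37, 6.0, 8.0),   // strongest
                new ApObservation("C8:D7:19:1B:11:8C", -60, 10.0, 2.0));
        ApObservation best = nearestAp(s);
        System.out.printf("Nearest AP: %s at (%.1f, %.1f)%n", best.mac(), best.x(), best.y());
    }
}
```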

Lateration

The lateration method estimates the distance between the mobile unit and an AP whose position is already known. This method has a big concern with Wi-Fi propagation properties. The distance is calculated either by converting RSSI measurements into distance or by applying a TOA (time of arrival) measurement. The first technique works well in empty space, where the signal strength is not affected by obstacles and interference; however, an indoor environment is complicated enough to cause huge errors in distance estimation, so applying this technique requires a lot of effort to handle environmental factors. The TOA measurement also strongly depends on the environment and the propagation characteristics. It measures the time difference between sending a Wi-Fi request and receiving the confirmation, which gives the travel time between the AP and the mobile unit; because the speed of the Wi-Fi wave is known, the distance can then be computed simply. In practice, however, the signal does not propagate directly from the AP to the mobile unit: it is affected by the environment and may travel through many paths before reaching the mobile unit. Moreover, to measure the time difference, the clock of the transmitter (AP) and the clock of the receiver (mobile unit) need to be synchronized down to nanoseconds because of the speed of the radio wave; a slight error in measuring time makes the distance estimate go wrong dramatically. There are two approaches to positioning by means of lateration: circular lateration and hyperbolic lateration.

Figure 2: Circular lateration model (Kushki, et al., 2012)

Circular lateration uses absolute distances. This approach requires at least three APs to be involved. With three estimated absolute distances d1, d2, d3 from a mobile unit to three APs, three circles with radii d1, d2, d3 are drawn, and the intersecting area is the position of the mobile unit that we want to determine.
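For illustration only, the following Java sketch shows how the intersection could be computed under ideal (noise-free) distance estimates by subtracting the circle equations pairwise, which gives a linear system in x and y. This is an illustrative technique, not the method used later in this report, and the AP positions in the example are hypothetical:

```java
public class CircularLateration {
    /**
     * Estimates (x, y) from three AP positions and ideal distances d1..d3.
     * Subtracting circle 1 from circles 2 and 3 yields two linear equations
     * in x and y, solved here with Cramer's rule.
     */
    static double[] locate(double x1, double y1, double d1,
                           double x2, double y2, double d2,
                           double x3, double y3, double d3) {
        double a11 = 2 * (x2 - x1), a12 = 2 * (y2 - y1);
        double a21 = 2 * (x3 - x1), a22 = 2 * (y3 - y1);
        double b1 = d1 * d1 - d2 * d2 - x1 * x1 + x2 * x2 - y1 * y1 + y2 * y2;
        double b2 = d1 * d1 - d3 * d3 - x1 * x1 + x3 * x3 - y1 * y1 + y3 * y3;
        double det = a11 * a22 - a12 * a21;           // zero if the APs are collinear
        return new double[] { (b1 * a22 - b2 * a12) / det,
                              (a11 * b2 - a21 * b1) / det };
    }

    public static void main(String[] args) {
        // APs at (0,0), (10,0), (0,10); the true position is (4,3).
        double[] p = locate(0, 0, 5, 10, 0, Math.sqrt(45), 0, 10, Math.sqrt(65));
        System.out.printf("Estimated position: (%.2f, %.2f)%n", p[0], p[1]);
    }
}
```

With real RSSI-derived distances the three circles generally do not intersect in a single point, which is exactly the error source discussed above.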

Figure 3: Hyperbolic lateration model (Kushki, et al., 2012)

Hyperbolic lateration uses relative distances: instead of using the distances directly, the differences between distances are used. This approach uses TDOA measurements to measure the differences in propagation time from two APs to the receiver (mobile unit).

Angulation

The angulation method estimates the angles between signals. As shown in Figure 4, the line connecting two APs is used as an internal reference. With the angles θ1 and θ2, the position of the mobile unit can be estimated.


Figure 4: Angulation model (Kushki, et al., 2012)

Pattern Recognition

This method tracks the position of the mobile unit by applying a pattern matching algorithm. A database of patterns of the deployment area is needed; those patterns are collected by a camera that captures a sequence of images of the environment. When a mobile unit equipped with a camera provides a sequence of images, that sequence is used to track the position.

Dead Reckoning

This method uses the user's movement history to predict his next position. To apply this method, many input parameters are required, such as velocity (speed and direction), acceleration and elapsed time, as well as measurements of environmental qualities, including frictional forces.

Location Fingerprinting

Literally, a fingerprint is an impression left by the friction ridges of a human finger (Wiki, 2014), and fingerprinting is a technique of human identification that works through a sequence of steps. Firstly, fingerprints of people must be collected and each fingerprint sample is stored in a database; a fingerprint is unique for each person. Secondly, to identify a person from a provided fingerprint, the provided fingerprint is compared with all fingerprints in the database; when a match is found, the identity of that person is revealed. In location awareness, the same idea is applied: every physical location has a fingerprint in Wi-Fi signal strength. We can observe this in the figure below.


Figure 5: Wi-Fi signal strength fingerprint

- The x axis is the order of the MAC address of an SSID, e.g. x = 1 means MAC address number 1; here we do not yet care what that MAC address is. There are 215 MACs in this experiment.
- The y axis is the RSSI at a physical position.

The location fingerprinting method relies on a database of Wi-Fi signal strength fingerprints in which the position of each fingerprint is already known; this database is collected manually. To determine the current position of a mobile unit, the mobile unit scans the Wi-Fi signal strengths, the result of which is also a fingerprint, and this fingerprint is then compared with all fingerprints in the database. The process finds the most similar fingerprint in the database; once it is done, supposing that we get fingerprint 100 whose coordinate (x, y) is (20, 30), we conclude that the current position is next to (20, 30). The advantage of this method is that it gives a good positioning result and does not have to care about the nature of the propagation environment. Its disadvantage is that it depends on the structure and architecture of the indoor environment: once the structure changes, the fingerprint database needs to be recollected. Besides, the data collection process is time consuming, and the difference in device sensitivities is an issue. Nevertheless, in this research we focus on this method for several reasons: it is simple to understand, easy to implement, and we do not have to care much about radio propagation characteristics.


6 Implementation

6.1 Experimental Setup

Experimental Area

There are about 40 Cisco access points (APs) on the first floor of the Detech building; they are equipped and managed by FPT University, but we were not able to obtain detailed information about them. There is no map of the APs because they were installed without a plan. By observing manually, we can see that there is one access point per room; large rooms and large areas such as the library and the main hall have up to two access points, and there is one access point in each small hallway. Because the area of the first floor of the Detech building is not so large, about 70 m x 50 m, the density of the installed access points is considered high and good for our experiments. In the first phase of the research, we focus only on the main hall area of the first floor instead of testing in the whole area; that is, we do the experiment in an area of 18 x 18 meters. We do this for several reasons. Collecting data in the whole area would take more time. By testing the positioning accuracy in a small area with a small number of reference points, if the positioning error turns out to be far bigger than 10 m, we would have to reconsider better methods for this research, or in the worst case even change the topic. Furthermore, this area is small but it has the advantage of being free to access. Because of the current conditions at FPT University, there is no laboratory, no working area and no discussion area, and we have to run experiments for a long time, so there should be no obstacle regarding access authority or availability of the "experimental area". Because the result of research in such a small area does not show much about indoor positioning, in the next phase the experimental area is extended to the whole first floor of the Detech building. However, before implementing phase 1, the IPS group already ran a pilot, about half the size of the phase 1 area, which demonstrated to our instructor the chance of success of this topic. More details about the experiments and their results are presented in Section 7.3.

Device Overview

In this experiment, we mainly use an HP ProBook 450 G0 laptop with a Ralink RT3290 802.11bgn Wi-Fi adapter.

6.2 Collecting Data

Collecting data, also called training data collection, measures the RSSI in the experimental area. The whole area is considered as a flat rectangular grid consisting of small unit squares with sides of 2 meters. Conventionally, the set of nodes of the grid is also the set of Reference Points; that is, every Reference Point is 2 meters away from its four adjacent neighbors connected by the sides of the unit squares. Regarding collecting devices, in order to ensure the consistency of the collected data, we use only one device to measure RSSI for both Reference Points and Test Points. In terms of collecting software, we customized the inSSIDer software, which is written in C# and runs on a Wi-Fi enabled laptop, so that we can manually set up the collecting configuration.

The points of the grid are expressed in a 2D coordinate system with its origin at the bottom-left corner of the grid; the horizontal axis is directed from left to right and the vertical axis from bottom to top. Each point of the grid is represented by a pair of values (x, y). This convention is employed in the computation of the location of a Test Point. The wireless signal strength is measured from all available access points. Each access point has a distinct MAC address and features only one wireless ID. These are two of the five fundamental fields used for processing the data: SSID name, MAC address, RSSI value, capture order and the time the signal was last seen. All information is exported by inSSIDer into a CSV file, which is a readable format for many platforms.

Figure 6. Grid and 2D coordinate.

Figure 7. Data collected in a CSV file, exported by inSSIDer.
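As an illustration, a minimal Java sketch for reading such an export could look as follows. The exact column order of the exported CSV is not specified in this report, so the layout assumed here (SSID, MAC, RSSI, capture order, last-seen time) and the file name are hypothetical:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class CsvReader {
    // One scan record; the field order follows the assumed CSV layout.
    record ScanRecord(String ssid, String mac, int rssiDbm, int order, String lastSeen) {}

    static List<ScanRecord> read(Path csv) throws IOException {
        List<ScanRecord> records = new ArrayList<>();
        List<String> lines = Files.readAllLines(csv);
        for (int i = 1; i < lines.size(); i++) {          // skip the header row
            String[] f = lines.get(i).split(",");
            if (f.length < 5) continue;                   // ignore malformed rows
            records.add(new ScanRecord(f[0].trim(), f[1].trim(),
                    Integer.parseInt(f[2].trim()),
                    Integer.parseInt(f[3].trim()), f[4].trim()));
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical file produced for one reference point.
        for (ScanRecord r : read(Path.of("rp_2_4.csv"))) {
            System.out.println(r.mac() + " -> " + r.rssiDbm() + " dBm");
        }
    }
}
```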

In addition, choosing an appropriate measuring strategy for different physical conditions also needs to be considered. We separate the conditions into two cases: obstruction and non-obstruction.
- Obstruction: in this case we keep the measurer's front side towards the entrance. In this way we mitigate the influence of the measurer's body on the already constrained wireless signal.
- Non-obstruction: in this case we turn the measurer's front side towards the inside of the area, with the back towards the entrance. In this way we ensure that the received signal is the strongest possible signal.

6.3 Processing Data

After collecting data, we construct a data structure representing the set of Reference Points (RPs), called the radio map. Geometrically, a radio map is a rectangular map divided into smaller identical squares that we call unit squares or unit cells. We divide the radio map so that every center of a unit square coincides with an RP and there is only one RP in each cell; this implies that the number of RPs is equal to the number of cells of the radio map. We denote the radio map as B and the number of cells as M. The ith cell of the radio map has the form

$$B_i = \{P_i, \{A_{ij} \mid j = 1, \dots, N_j\}\}, \quad i = 1, \dots, M$$

where $P_i$ is the center of cell $B_i$, $A_i$ is the vector of RSSI values of all MAC addresses, $A_{ij}$ is the vector of RSSI values collected from MAC address j, and $N_j$ is the number of MAC addresses captured in the cell. Each center point $P_i$ of the radio map has a unique coordinate, a pair of values (x, y) that stands for the location of that point on the horizontal and vertical axes. Therefore, the ith cell of the radio map can also be written as

$$B_i = \{P_x, P_y, \{A_{ij} \mid j = 1, \dots, N_j\}\}, \quad i = 1, \dots, M$$

For each cell $B_i$, its RSSI set $A_{ij}$ is a 2-dimensional structure whose first dimension, of string type, represents the identity of a MAC address, and whose second dimension is an array of integer values representing the received signal strengths. For example, a unit square containing 5 different MAC addresses can be seen as a table of 2 columns and 5 rows:

MAC address id      | RSSI
00:3A:98:AE:3C:D4   | -70, -75, -63, -60, -50
00:3A:98:AE:3C:D7   | -84, -83, -98, -60, -44
70:8D:09:9A:E1:FE   | -37, -28, -66, -52, -30
C8:D7:19:1B:11:8C   | -60, -73, -47, -22, -50
64:66:B3:94:CC:2C   | -59, -74, -42, -22, -70
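For illustration, one cell of the radio map could be represented by a small Java class like the following sketch; the class and field names are our own, and the sample values are those of the table above:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// One cell B_i of the radio map: its center coordinate and, per MAC address,
// the list of RSSI samples A_ij collected at that reference point.
public class RadioMapCell {
    final double x, y;
    final Map<String, List<Integer>> rssiByMac = new LinkedHashMap<>();

    RadioMapCell(double x, double y) { this.x = x; this.y = y; }

    void addSamples(String mac, List<Integer> rssiDbm) { rssiByMac.put(mac, rssiDbm); }

    public static void main(String[] args) {
        RadioMapCell cell = new RadioMapCell(2.0, 4.0);   // hypothetical center P_i
        cell.addSamples("00:3A:98:AE:3C:D4", List.of(-70, -75, -63, -60, -50));
        cell.addSamples("70:8D:09:9A:E1:FE", List.of(-37, -28, -66, -52, -30));
        System.out.println("Cell (" + cell.x + ", " + cell.y + ") sees "
                + cell.rssiByMac.size() + " MAC addresses");
    }
}
```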

We assume that within each cell the distribution of the wireless signal is uniform. The measurement vector of the center point $P_i$ is then representative for the whole area of cell $B_i$ in the radio map. In other words, the location of $P_i$ is interpreted as the location of $B_i$, and every point of cell $B_i$ other than $P_i$ is assumed to have a measurement vector identical to that of $P_i$. We define the volume of cell $B_i$ as the number of RPs in cell $B_i$, denoted $|B_i|$; thus $|B_i| = 1$ for all $i = 1, \dots, M$. Theoretically, the condition of uniform distribution in each cell of the radio map is not binding: one cell can have more than one RP. However, in that case we must define for each RP its own area, centered at that RP, in which the signal is uniformly distributed, which makes the computation more complex. For simplicity, the RSSI data of both the RPs and the TP is represented by its mean value. In the actual implementation, each point of the radio map has the following characteristic data fields:
- The mean and variance of the RSSI values of each MAC address
- The histogram of the RSSI values of each MAC address
- The 2-dimensional coordinate (x, y) of the position on the radio map

Our target is to compute the coordinate of a Test Point (TP). In general, we employ the following computation model:
- Input: the set of RPs and the mean RSSI values of the TP
- Output: the coordinate of the TP

We denote the location of the TP as X, the RSSI values of the TP as Y and the vector of mean values of each MAC address as $\hat{Y}$. We aim to compute the value of X based on the data of the radio map and the values in Y. For each MAC address j of point $B_i$, we compute:

Mean value:
$$\hat{A}_{ij} = \frac{\sum_{k=1}^{|A_{ij}|} A_{ij}^{k}}{|A_{ij}|}$$

Variance value:
$$\sigma_{ij}^{2} = \frac{\sum_{k=1}^{|A_{ij}|} \left(A_{ij}^{k} - \hat{A}_{ij}\right)^{2}}{|A_{ij}| - 1}$$

where $|A_{ij}|$ is the number of elements in vector $A_{ij}$.

Histogram of the RSSI of each MAC address: for each MAC address, its collected RSSI values can be represented as a histogram (Honkavirta, 2008). We use the histogram to compute a density estimate in the probabilistic framework. Each histogram is defined by its maximum value $Max_{A_{ij}}$, its minimum value $Min_{A_{ij}}$, the unit height $Height_{A_{ij}}$ and the width of each bin $W_{A_{ij}}$. We denote the histogram of the jth MAC address in cell $B_i$ as $H_{A_{ij}}$. Cell $B_i$ now has the form

$$B_i = \{P_x, P_y, \{H_{A_{ij}} \mid j = 1, \dots, N_j\}\}, \quad i = 1, \dots, M$$

Because a histogram must cover all values in its data, the left-most bin must cover the minimum value and the right-most bin must cover the maximum value. In addition, all bins have the same width, so the number of bins must satisfy:

$$N_{A_{ij}} = \left\lceil \frac{Max_{A_{ij}} - Min_{A_{ij}}}{W_{A_{ij}}} \right\rceil$$

Because the histogram represents the probability distribution of the RSSI values, it must strictly follow the rule that the total area of all bins equals 1. Therefore the unit height of the histogram can be computed as:

$$Height_{A_{ij}} = \frac{1}{N_{A_{ij}} \cdot W_{A_{ij}}}$$

To deduce an appropriate bin width for the histogram, we follow a heuristic: we take the bin width from a sequence of values starting at 0.1, with successive values differing by exactly 0.1 (0.1, 0.2, 0.3, and so on), and examine the average measurement error until we obtain an acceptable value. In fact, for our system we applied a bin width of 1, as it provides the most accurate results.
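The following Java sketch shows how the per-MAC statistics described above (mean, variance and a fixed-width histogram) could be computed from the RSSI samples of one cell. The method names are ours, the bin width of 1 dBm follows the heuristic above, and the bins here are normalized as an empirical density (count divided by the number of samples times the bin width), a common variant of the normalization described above; the bin count is ceil((max - min)/w) + 1 so that both endpoints fall inside a bin:

```java
import java.util.List;

public class CellStatistics {
    // Sample mean of the RSSI values A_ij of one MAC address.
    static double mean(List<Integer> a) {
        return a.stream().mapToInt(Integer::intValue).average().orElse(Double.NaN);
    }

    // Unbiased sample variance (denominator |A_ij| - 1).
    static double variance(List<Integer> a) {
        double m = mean(a), s = 0;
        for (int v : a) s += (v - m) * (v - m);
        return s / (a.size() - 1);
    }

    // Normalized histogram with bin width w; the total area of all bins equals 1.
    static double[] histogram(List<Integer> a, double w) {
        int min = a.stream().min(Integer::compare).orElseThrow();
        int max = a.stream().max(Integer::compare).orElseThrow();
        int bins = (int) Math.ceil((max - min) / w) + 1;
        double[] h = new double[bins];
        for (int v : a) h[(int) ((v - min) / w)] += 1.0 / (a.size() * w);
        return h;
    }

    public static void main(String[] args) {
        List<Integer> a = List.of(-70, -75, -63, -60, -50);   // samples from the table above
        System.out.printf("mean=%.2f variance=%.2f bins=%d%n",
                mean(a), variance(a), histogram(a, 1.0).length);
    }
}
```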


Distance function

In order to compute the "distance" between received data, we consider each point as a point in a multi-dimensional space whose number of dimensions is the number of available MAC addresses. The distance between two points A and B is defined as $D(A, B) = \|A - B\|$, where $\|x\|$ is a norm of vector x. The p-norm is defined as:

$$\|x\|_p = \left(\sum_{i=1}^{n_x} |x_i|^{p}\right)^{1/p}$$

We implemented the p-norm with p = 2 (the Euclidean norm). Substituting p = 2 and x = A - B, we have:

$$D(A, B) = \sqrt{\sum_{i=1}^{n_x} (A_i - B_i)^{2}} \quad (6.3.0)$$

where $n_x$ refers to the number of MAC addresses that exist in the RSSI values of both A and B.
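A minimal Java sketch of this distance, computed over the MAC addresses common to both fingerprints; the map-based representation is our own choice, and the example mean values are those of the table in the previous subsection:

```java
import java.util.Map;

public class FingerprintDistance {
    // Euclidean distance (6.3.0) over the MAC addresses present in both fingerprints,
    // each fingerprint being a map from MAC address to mean RSSI in dBm.
    static double distance(Map<String, Double> a, Map<String, Double> b) {
        double sum = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            Double other = b.get(e.getKey());
            if (other != null) {                       // only MACs seen by both points
                double d = e.getValue() - other;
                sum += d * d;
            }
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        Map<String, Double> tp = Map.of("00:3A:98:AE:3C:D4", -63.6, "70:8D:09:9A:E1:FE", -42.6);
        Map<String, Double> rp = Map.of("00:3A:98:AE:3C:D4", -70.0, "70:8D:09:9A:E1:FE", -37.0);
        System.out.printf("D = %.2f%n", distance(tp, rp));
    }
}
```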

Estimation algorithm

In this section we use the K-nearest neighbor (KNN) algorithm for the estimation phase. The main idea is to choose the K "nearest" RPs, in terms of minimal distance between the TP and the RPs, and to compute the coordinate of the TP as a linear combination of the RPs. We define for each RP $B_i$ a weight $G_i$. The coordinate vector of the Test Point P is represented by the following formula:

$$P = \sum_{i=1}^{M} G_i \cdot P_i \quad (6.3.1)$$

Here we consider two frameworks for computing the weight G: a Deterministic framework and a Probabilistic framework. In the Deterministic framework, every state is considered as a non-random set of RSSI values; that is, for a pre-determined state, the RSSI value of each MAC address is always determined, and each possible RSSI value of each MAC address is associated with one possible state. In the Probabilistic framework, both the state and the RSSI value are treated as continuous random variables. In this case, at each point $B_i$ of the radio map the RSSI samples are distributed over a domain of values. We need a clear model of the relationship between the RSSI values of each MAC address. For that purpose, we can assume that the RSSI values vary around a known value defined by the fingerprint (Honkavirta, 2008), which makes the following measurement model reasonable:

$$Y = h(x) + v(x)$$

where Y is the vector of RSSI values, h(x) is a known measurement vector defined by the fingerprint of state x, and v(x) is the measurement noise.

Because the characteristic value of the RSSI samples is their mean, we take h(x) equal to the vector of mean values $\hat{A}_i$. For the ith MAC address we have $Y_i = \hat{A}_i + v_i$, thus

$$v_i = Y_i - \hat{A}_i \quad (6.3.2)$$

Since the mean value is already determined after the training phase, it is obvious that the probability that the measured RSSI takes the value $Y_i$ is equal to the probability that the measurement noise takes the value $Y_i - \hat{A}_i$.

Deterministic framework with Non-weighted KNN

Here non-weighted implies that every RP has equal weight G = 1. Denoting KNNSET as the set of nearest neighbors, the most common choice of location estimator X is the average of all nearest RPs (Honkavirta, 2008):

$$X = \frac{\sum_{i=1}^{K} KNNSET_i}{K}$$

where $KNNSET_i$ is the ith nearest neighbor RP. In non-weighted KNN, the value of the distance function contributes only to the comparison step; in other words, every nearest RP has an equal degree of importance in the computed result. In fact, a closer RP should have superior importance in determining the state of the TP. Therefore, we continue processing the data with a more accurate algorithm called Weighted K-Nearest Neighbor.

Deterministic framework with Weighted KNN

Since the area of interest can be considered as a convex hull containing the TP, we estimate the state X as a convex combination of all $P_i$:

$$X = \sum_{i=1}^{M} G_i P_i$$

where the values of G must satisfy:

$$\sum_{i=1}^{M} G_i = 1 \quad (6.3.3a)$$

We define for each Reference Point a non-negative value W which is the inverse of the distance function:

$$W_i = \frac{1}{D(Y, \hat{A}_i)} \quad (6.3.3b)$$

From (6.3.3a) and (6.3.3b) we can write $G_i$ as:

$$G_i = \frac{W_i}{\sum_{j=1}^{M} W_j} \quad (6.3.3)$$

From equation (6.3.3), we take into account only the K biggest weights and set the others to zero; this is called the Weighted K-Nearest Neighbor (WKNN) method. In this way, each nearest-neighbor RP has its own weight and they contribute differently to the computed result. In comparison to the KNN method, it improves the accuracy significantly while maintaining the same computational complexity. The complexity of the KNN (WKNN) algorithm consists of three parts:
- Denote $\hat{N}_j$ the average number of available MAC addresses. The cost of computing the distances from all RPs to the TP is $O(M \cdot \hat{N}_j)$.
- The cost of sorting all the above distances: typically we use the built-in sorting function of the programming language (in this case, Java and MATLAB), which runs on average in $O(M \cdot \log M)$.
- The cost of electing the K "nearest" neighbors and computing the final result: $O(K)$.

Overall, the total complexity is

$$O(M \cdot \hat{N}_j + M \cdot \log M + K) \quad (6.3.4)$$

Here O(x) stands for the computational big-O notation in the average case. Theoretically, K can be any number from 1 to the size of the data. In practice we often choose K ranging from 1 to 5, with K = 3 and K = 4 often providing the best result (Honkavirta, et al., 2009). In the simplest case K = 1, the algorithm is nearest neighbor (NN); if the number of RPs is large enough, NN can produce an acceptable measurement error. In the deterministic framework, we use only the mean of the RSSI values for computation and the rest of the data is unused. Because the mean value does not always completely represent the whole data, especially when the variance is large, the final result may not be efficient enough. The probabilistic approach introduced below can be more efficient, because all RSSI values collected in the calibration phase are employed in the computation step, and because of that it can produce a more reliable result.
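A compact Java sketch of the WKNN estimate (6.3.1)-(6.3.3), given the precomputed distances from the TP to every RP; the data layout and the example values are our own simplification:

```java
import java.util.Arrays;
import java.util.Comparator;

public class Wknn {
    /**
     * WKNN estimate: keep the K reference points with the smallest distance to the
     * test point, weight each by W_i = 1 / D_i (6.3.3b), normalize the weights so
     * they sum to 1 (6.3.3), and return the weighted average of the RP coordinates (6.3.1).
     */
    static double[] estimate(double[][] rpCoords, double[] distances, int k) {
        Integer[] idx = new Integer[distances.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, Comparator.comparingDouble(i -> distances[i]));   // ascending distance

        double sumW = 0, x = 0, y = 0;
        for (int n = 0; n < Math.min(k, idx.length); n++) {
            double w = 1.0 / distances[idx[n]];        // inverse-distance weight
            sumW += w;
            x += w * rpCoords[idx[n]][0];
            y += w * rpCoords[idx[n]][1];
        }
        return new double[] { x / sumW, y / sumW };    // normalization by the sum of weights
    }

    public static void main(String[] args) {
        double[][] rps = { {0, 0}, {2, 0}, {0, 2}, {2, 2} };   // hypothetical 2 m RP grid
        double[] d = { 8.4, 3.1, 5.0, 2.2 };                   // distances D(Y, A_i)
        System.out.println(Arrays.toString(estimate(rps, d, 3)));
    }
}
```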

Probabilistic framework

In the probabilistic framework, both the state (location) of points on the map and the RSSI of each access point are considered as continuous random variables distributed over the whole area of interest. Because the distribution is continuous, the weight of the ith RP is now the value of the posterior density of the event that we are at the location of the ith RP given the measurement vector Y. Denoting the probability density function as pdf, we have:

$$G_i = \frac{pdf(i \mid Y)}{\sum_{j=1}^{M} pdf(j \mid Y)} \quad (6.3.5)$$

Using Bayes' rule we have:

$$pdf(i \mid Y) = \frac{pdf(Y \mid i)\, pdf(i)}{pdf(Y)} \quad (6.3.6)$$

where $pdf(Y \mid i)$ is the likelihood, $pdf(i)$ is the prior and $pdf(Y)$ is a normalizing constant. Because the desired location can be any point on the map, that is, if we choose a location on the map at random then each RP has equal probability, we can write the prior as:

$$pdf(i) = \frac{1}{M} \quad (6.3.7)$$

To compute $pdf(Y \mid i)$, we observe that because all access points are independent, the components of the random vector $v_i$ are independent. Applying equation (6.3.2), we have:

$$pdf(Y \mid i) = \prod_{j=1}^{n_y} pdf(Y_j \mid A_{ij}) \quad (6.3.8)$$

There are multiple methods that can be applied for computing the likelihood $pdf(Y \mid i)$, such as the histogram method, the histogram comparison method or the kernel function method. We applied the histogram and kernel function methods in this report:

Method      | Formula applied for $v_{ij}$
Histogram   | $pdf(x) = H_{A_{ij}}(x)$
Kernel      | $pdf(x) = \dfrac{1}{|A_{ij}|\, h} \sum_{k=1}^{|A_{ij}|} K\!\left(\dfrac{x - A_{ij}^{k}}{h}\right)$

The chosen kernel function is Gaussian:

$$K(x) = \frac{\exp\!\left(-\dfrac{x^{2}}{\sigma_{ij}^{2}}\right)}{\sqrt{2\pi\sigma_{ij}^{2}}}$$

It has been shown (Anon., n.d.) that the optimal choice for the bandwidth h of the kernel function is:

$$h = \left(\frac{4\,\sigma_{ij}^{5}}{3\,|A_{ij}|}\right)^{1/5}$$

Substituting equations (6.3.6) and (6.3.7) into (6.3.5), we have:

$$G_i = \frac{pdf(Y \mid i)}{\sum_{j=1}^{M} pdf(Y \mid j)} \quad (6.3.8)$$
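A minimal Java sketch of the kernel-density likelihood for one MAC address, using the Gaussian kernel and the bandwidth rule above (this bandwidth formula is Silverman's rule of thumb); the surrounding data layout is our own simplification:

```java
import java.util.List;

public class KernelLikelihood {
    // Gaussian kernel as defined above, with the per-MAC variance sigma^2.
    static double gaussianKernel(double x, double sigma2) {
        return Math.exp(-x * x / sigma2) / Math.sqrt(2 * Math.PI * sigma2);
    }

    // Kernel density estimate pdf(y | A_ij) for one MAC address of one reference point.
    static double pdf(double y, List<Integer> samples) {
        double mean = samples.stream().mapToInt(Integer::intValue).average().orElse(0);
        double var = 0;
        for (int v : samples) var += (v - mean) * (v - mean);
        var /= (samples.size() - 1);
        double h = Math.pow(4 * Math.pow(Math.sqrt(var), 5) / (3.0 * samples.size()), 0.2);
        double sum = 0;
        for (int v : samples) sum += gaussianKernel((y - v) / h, var);
        return sum / (samples.size() * h);
    }

    public static void main(String[] args) {
        // RSSI samples of one MAC at one RP (from the example table), evaluated at y = -65 dBm.
        System.out.printf("pdf = %.4f%n", pdf(-65, List.of(-70, -75, -63, -60, -50)));
    }
}
```

Multiplying such per-MAC likelihoods over all MAC addresses gives $pdf(Y \mid i)$ of equation (6.3.8), from which the weight $G_i$ follows by normalization.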

Using this weight, we can compute the estimate of the location X. We applied two estimators:

+) Maximum-a-posteriori estimation (MAP): this estimation is applicable only when the posterior (and thus the weight) has its maximum value in exactly one cell, otherwise the estimate would be ambiguous. Calling that cell $B_i$ with center point $P_i$, the center of that cell is the estimate:

$$X_{MAP} = P_i$$

From equations (6.3.6) and (6.3.7), because both $pdf(i)$ and $pdf(Y)$ are constant, the relationship between the posterior and the likelihood is increasing; therefore, choosing the maximum posterior value is equivalent to maximum likelihood estimation.

+) Mean of posterior estimation:

$$X_{MEAN} = \sum_{i=1}^{M} \frac{pdf(i \mid Y)}{\sum_{j=1}^{M} pdf(j \mid Y)} \cdot P_i \quad (6.3.9)$$

The probabilistic framework has a higher computational complexity than the deterministic framework, because we need to traverse all RSSI values of each MAC address instead of using only the mean values. This can be illustrated by the kernel function applied to equation (6.3.8): we go through all elements of vector $A_{ij}$, compute and pass a parameter to the kernel function. This increases the cost of computing the weight of each RP from $O(M)$ to $O(M \cdot \hat{N}_j)$, and this number can be as large as the capacity of the mobile unit, i.e. the maximum number of wireless signals that a mobile device can recognize. The complexity of the probabilistic framework can be written as:
- Denote $\hat{A}_{ij}$ the average number of RSSI samples per available MAC address. The cost of computing the weights of all RPs is $O(M \cdot \hat{N}_j \cdot \hat{A}_{ij})$.
- The cost of sorting all the above weights: as above, the built-in sorting function of the programming language (Java and MATLAB) runs on average in $O(M \cdot \log M)$.
- The cost of electing the K biggest weights and computing the final result: $O(K)$.

In total:

$$O(M \cdot \hat{N}_j \cdot \hat{A}_{ij} + M \cdot \log M + K) \quad (6.3.10)$$

We observed that the kernel method often provides higher precision than the histogram method. This comes from the fact that a histogram can only represent a finite data range divided into intervals: if the data from the calibration phase is incomplete, because the calibration time was not long enough or because of environmental factors such as obstruction or varying direction while collecting data, we may end up processing some incorrect gap bins with zero probability. The kernel method overcomes this limitation by turning the discontinuous collected data into a continuous function that covers the entire space and fills in the inadequate data.

Analysis of complexity by the size of the floor

Since the value of K, the number of "nearest neighbors", is often much smaller than M, the number of RPs, we can ignore this factor in formulas (6.3.4) and (6.3.10). These two formulas now have the form:

Deterministic framework with Euclidean norm: $O(M \cdot \hat{N}_j + M \cdot \log M)$ (6.3.11)

Probabilistic framework: $O(M \cdot \hat{N}_j \cdot \hat{A}_{ij} + M \cdot \log M)$ (6.3.12)

Assume our floor has total area S. Because the floor consists of nothing but a set of non-overlapping unit squares, the total area of all unit squares must equal the area of the floor. Denoting the edge length of a unit square as R, we have $R \cdot R \cdot M = S$, thus

$$M = \frac{S}{R^{2}} \quad (6.3.13)$$

Often the value of S is not known in advance, so in order to adjust the value of M we can only adjust the value of R. In fact, to choose an appropriate value of R we need to consider the following factors:
- Accuracy: our assumption is that the wireless signal is uniformly distributed over the whole unit square, but by examining the data we observed that this condition holds true only for a small area. As the unit square grows, the signals become more inconsistent. Because of that, increasing the size of the unit square, i.e. expanding the distance between RPs, decreases the accuracy of the computation, and vice versa.
- Performance: intuitively we want to choose R as small as possible; however, from equation (6.3.13) we conclude that decreasing R increases the number of RPs and consequently slows down the system. In this case, we must take into account the scalability of the system.

Our expectation is that, for a TP lying completely inside a unit square, the estimation error will not exceed the size of that square, i.e. the Euclidean distance between the estimated result and the actual location is less than or equal to the length of the square's edge. For this reason we choose the value of R equal to our expected measurement error. If we take an upper bound of 2 meters for the acceptable error, then we set R = 2. To appraise the efficiency of this value, we computed the value of M for R = 2 under the assumption that our system runs in the building with the largest floor area in the world (wiki, n.d.):

Building                          | Total floor area | Value of M for R = 2
New Century Global Centre, China  | 1,760,000 m2     | 440,000

With M up to 440,000 and all data processing handled by a computer server (not by the mobile device), we can achieve an acceptable performance, still far from the upper bound of acceptability. We must also consider the factors $\hat{N}_j$ and $\hat{N}_j \cdot \hat{A}_{ij}$ of each method, which need to be limited appropriately. If this value is large, e.g. up to 100, then in the worst case the complexity is up to O(440 * 1E6) and it would take around 4 seconds to complete the task, which is extremely inefficient.

6.4 Proposed Techniques to Handle Difference of MACs and Device Sensitivities

Let us look at the weight calculation formula using the Euclidean norm, for example:

$$W_i = \frac{1}{\left(\sum_{j=1}^{n_i} \left(y_j - \hat{a}_{ij}\right)^{2}\right)^{1/2}}$$

where $n_i$ is the number of MAC addresses shared between the measurement vector y and the reference vector $a_i$. Every weight calculation formula is designed with one purpose: the closer the reference point is to the test point, the bigger the weight. In the case of the Euclidean norm, the closer the reference point, the smaller the difference between y and $a_i$ and therefore the bigger the weight. However, as the reference point gets further away, $n_i$ also tends to decrease, which makes the weight become bigger. In a small area with a few access points, $n_i$ is almost the same for all reference points, but in a larger area with hundreds of access points, this phenomenon significantly influences the measurement results. Not only the Euclidean norm but also all the other methods have this kind of problem (see Section 7.3.3, Real size experiment, for more details). In order to deal with this problem, we propose two improvement methods, Minimum same MAC and Normalization by size, and one estimation method, Alignment.

Minimum same MAC

This method is based on the assumption that the closer a reference point is to the test point, the bigger the number of MAC addresses they share. If $n_{ij}$ refers to the number of MAC addresses shared between test point $T_i$ and reference point $B_j$, then for every test point we can set a lower bound $l_i$ such that if $n_{ij} < l_i$, reference point $B_j$ is considered "out of scope" and is not included in the estimation. Different choices of $l_i$ have different impacts on the estimation result. In this paper, we take $l_i$ as the 20th biggest $n_{ij}$.

Normalization by size

As said above, the difference in $n_{ij}$ makes it difficult to compare the weights of reference points. Thus, instead of comparing the sum/product over each reference point, we use the average/geometric mean. The formulas for our estimation methods become:

- Euclidean norm:

$$W_i = \frac{1}{\dfrac{1}{n_i}\left(\sum_{j=1}^{n_i} \left(y_j - \hat{a}_{ij}\right)^{2}\right)^{1/2}}$$


- Gaussian, Kernel and Histogram:

$$W_i = \left(\prod_{j=1}^{n_i} p(y \mid x_i)\right)^{1/n_i}$$
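A small Java sketch of the two proposed adjustments, assuming the per-MAC contributions have already been computed; the data layout, method names and example values are our own:

```java
public class NormalizedWeights {
    // Normalization by size for the Euclidean norm: divide the root of the
    // squared-difference sum by n_i (the number of shared MAC addresses).
    static double euclideanWeight(double[] diffs) {
        double sum = 0;
        for (double d : diffs) sum += d * d;
        return 1.0 / (Math.sqrt(sum) / diffs.length);
    }

    // Normalization by size for probabilistic weights: geometric mean of the
    // per-MAC likelihoods p(y | x_i), computed in log space for numerical stability.
    static double geometricMeanWeight(double[] likelihoods) {
        double logSum = 0;
        for (double p : likelihoods) logSum += Math.log(p);
        return Math.exp(logSum / likelihoods.length);
    }

    // Minimum same MAC: keep a reference point only if it shares at least
    // minSameMac MAC addresses with the test point.
    static boolean inScope(int sharedMacCount, int minSameMac) {
        return sharedMacCount >= minSameMac;
    }

    public static void main(String[] args) {
        double[] diffs = { 6.4, -5.6, 3.0 };                 // y_j - a_ij over shared MACs
        System.out.println("Euclidean weight: " + euclideanWeight(diffs));
        System.out.println("Geometric mean:   " + geometricMeanWeight(new double[] { 0.02, 0.05, 0.01 }));
        System.out.println("In scope: " + inScope(12, 20));
    }
}
```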

Alignment method

The idea of this method starts with an observation: with different devices, the values of the RSS vector may be different, but the order of the MAC addresses themselves seems to remain. If we can calculate the similarity between the orders of the MAC addresses (sorted by RSS value) of test points and reference points, we can not only determine which reference point is closer, but also eliminate the problems caused by different devices. In order to calculate the similarity between the orders of MAC addresses, the Needleman-Wunsch algorithm (Needleman & Wunsch, 1970) was chosen. It is the foundation of many algorithms used in bioinformatics for DNA comparison. Below are the details of this algorithm.

Consider two strings $A = a_1 a_2 \dots a_n$ and $B = b_1 b_2 \dots b_m$, and let $s(a, b)$ be the similarity function between two elements a and b (match or mismatch). A matrix $H(i, j)$ with $0 \le i \le n$, $0 \le j \le m$ is set up:

$$H(0, 0) = 0$$
$$H(i, 0) = H(i-1, 0) + s(a_i, -)$$
$$H(0, j) = H(0, j-1) + s(-, b_j)$$
$$H(i, j) = \max \begin{cases} H(i-1, j-1) + s(a_i, b_j) & \text{match/mismatch} \\ H(i-1, j) + s(a_i, -) & \text{insertion} \\ H(i, j-1) + s(-, b_j) & \text{deletion} \end{cases}$$

where '-' denotes an empty element, and $H(n, m)$ is the result representing the similarity of the two strings A and B. The return value of $s(a, b)$ may vary from problem to problem; in this paper, we take match = 2 and mismatch = insertion = deletion = -1.


Let us look at one example with the two strings AAUGCCAUUGACGG and CAGCCUCGCUUAG. [The full dynamic-programming score matrix H(i, j) computed for these two strings is not reproduced here.]

Tracing back through the matrix gives the best-scoring alignment. According to this, the two strings have a similarity score of 8 and their alignment is:

AAUGCCAUUGAC--GG
CA-GCC-UCG-CUUAG
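For illustration, a compact Java sketch of the Needleman-Wunsch scoring described above, with match = 2 and mismatch = insertion = deletion = -1; only the score H(n, m) is computed, and the traceback that produces the alignment itself is omitted:

```java
public class NeedlemanWunsch {
    static final int MATCH = 2, MISMATCH = -1, GAP = -1;

    // Fills the matrix H and returns the global alignment score H(n, m).
    static int similarity(String a, String b) {
        int n = a.length(), m = b.length();
        int[][] h = new int[n + 1][m + 1];
        for (int i = 1; i <= n; i++) h[i][0] = h[i - 1][0] + GAP;      // H(i, 0)
        for (int j = 1; j <= m; j++) h[0][j] = h[0][j - 1] + GAP;      // H(0, j)
        for (int i = 1; i <= n; i++) {
            for (int j = 1; j <= m; j++) {
                int diag = h[i - 1][j - 1]
                        + (a.charAt(i - 1) == b.charAt(j - 1) ? MATCH : MISMATCH);
                int up = h[i - 1][j] + GAP;        // insertion
                int left = h[i][j - 1] + GAP;      // deletion
                h[i][j] = Math.max(diag, Math.max(up, left));
            }
        }
        return h[n][m];
    }

    public static void main(String[] args) {
        // The example strings from the text; for MAC-address orderings each
        // "character" would instead stand for a MAC address.
        System.out.println(similarity("AAUGCCAUUGACGG", "CAGCCUCGCUUAG"));
    }
}
```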


7 Evaluation Results

7.1 Signal Strength Characteristic

After collecting and analyzing the signal, we recognized some characteristics of the signal strength. Firstly, in general, the bigger the distance from the access point, the weaker the signal strength. But it is not absolutely true that the signal strength at a further point is always weaker than at a nearer point; we can see this clearly in Figure 9.

Figure 9. The signal strength of an access point in the test area.

Figure 9 is the signal strength map of an access point (FU-Staffs, 00-24-97-76-5e-f4) located at point (1, 5). The point (1, 5) does not have the strongest signal strength even though it is the nearest point to the access point, and there are many examples showing that the signal strength of a nearer point is not always stronger than that of a further point. The signal strength at a point depends not only on the distance from the access point but also on the environment: in this area we have trees, pillars and other obstacles that can change the signal strength at a point. The signal distance (the difference in signal strength between two points) is not directly proportional to the real physical distance between those points. This shows that converting the signal distance between two points into the real physical distance between them is not easy and could lead to big errors in the calculation.


Secondly, consider two signals transmitted from the same access point: the signal strength at a given point differs between them, but their distributions are almost the same.

Figure 10. The signal strength of the network FU-Guest, transmitted from access point 00:24:97:76:5e

Figure 11. The signal strength of the network DH-FPT, transmitted from access point 00:24:97:76:5e

Figures 10 and 11 are the signal strength maps of two networks transmitted from the same access point, located at position (1, 5). We can see that the signal strengths at the same point are quite different, but the distributions are the same. Thirdly, we consider the distribution of the Wi-Fi signal at one point; the following data were collected 100 times.


Data at one point from one device but from different access points (these data were collected at the same time):

Figure 12. The distribution of signal from access point 00:3A:98:AE:3C:D0 (Mean = -67.9836 dBm). (Figures 12 to 33 are histograms with RSSI in dBm on the horizontal axis and the number of times each value was observed on the vertical axis.)

Figure 13. The distribution of signal from access point 00:24:97:82:BF:70 (Mean = -76.6154 dBm)

Figure 14. The distribution of signal from access point 00:3A:98:AE:3D:50 (Mean = -62.2472 dBm)

Figure 15. The distribution of signal from access point 00:3A:98:AE:3D:C0 (Mean = -62.7765 dBm)

Figure 16. The distribution of signal from access point 00:23:04:5B:8C:70 (Mean = -77.3333 dBm)

In figures 12 to 16, we can see that the distribution of the Wi-Fi signal does not follow any standard distribution, but it is quite concentrated: the signal strength is spread over a small interval (under 10 dBm), which makes the mean a reliable value to represent each sample set.
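Since each sample set is tightly concentrated, a reference point's fingerprint can be summarised by the mean RSSI per access point. Below is a minimal Java sketch of that aggregation step; the record and method names are illustrative only and are not the project's actual code.

// Minimal sketch: collapse repeated scans at one point into mean RSSI per MAC.
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class FingerprintSketch {

    /** One scan result: the access point's MAC address and the measured RSSI in dBm. */
    record Sample(String mac, double rssiDbm) {}

    /** Groups all samples collected at one point by MAC and averages the RSSI. */
    static Map<String, Double> meanRssiPerAccessPoint(List<Sample> samples) {
        return samples.stream().collect(
                Collectors.groupingBy(Sample::mac,
                        Collectors.averagingDouble(Sample::rssiDbm)));
    }

    public static void main(String[] args) {
        List<Sample> samples = List.of(
                new Sample("00:3A:98:AE:3C:D0", -67),
                new Sample("00:3A:98:AE:3C:D0", -69),
                new Sample("00:24:97:82:BF:70", -77));
        System.out.println(meanRssiPerAccessPoint(samples));
        // e.g. {00:24:97:82:BF:70=-77.0, 00:3A:98:AE:3C:D0=-68.0}
    }
}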

Data at one point from one access point (00:23:33:A3:A4:00) but from different devices (these data were collected at the same time):

Figure 17. The distribution of signal from laptop HP Probook G0450 (Mean = -64.6800 dBm)

Figure 18. The distribution of signal from laptop Dell N4030 (Mean = -62.6552 dBm)

Figure 19. The distribution of signal from laptop Dell Inspiron 7250 (Mean = -62.9691 dBm)


Figure 20. The distribution of signal from laptop Vaio VPCEJ (Mean = -62.2300 dBm)

Figure 21. The distribution of signal from cell phone Lenovo P770 (Mean = -71.0392 dBm)


Figure 22. The distribution of signal from tablet Kindle Fire HD (Mean = -59.2800 dBm)

The next problem is that the signal strengths from dissimilar devices are different. In figures 17 to 22, we can see that not only the distribution but also the mean of the Wi-Fi signal collected by dissimilar devices are quite different. In particular, there is a big difference between the data collected by laptop and the data collected by cell phone (about 9 dBm). Moreover, at one point, the number of collected signals also differs between devices:

- Laptop HP Probook G0450: 51 signals
- Laptop Dell N4030: 91 signals
- Laptop Vaio VPCEJ: 50 signals
- Laptop Dell Inspiron 7250: 72 signals
- Cell phone Lenovo P770: 37 signals
- Tablet Kindle Fire HD: 77 signals

The reason for the difference is that the standards of the dissimilar network cards are not the same; the physical structure of the devices also contributes to the strength difference.

Data at one point from one access point but at different times:


Laptop (Vaio VPCEJ):

Figure 23. The distribution of signal from laptop Vaio at 08:54 AM (Mean = -62.7300 dBm)

Figure 24. The distribution of signal from laptop Vaio at 09:02 AM (Mean = -63.3700 dBm)

Figure 25. The distribution of signal from laptop Vaio at 09:16 AM (Mean = -62.7400 dBm)

Figure 26. The distribution of signal from laptop Vaio at 09:24 AM (Mean = -62.2100 dBm)


Cell Phone (Lenovo P770):

Figure 27. The distribution of signal from cell phone at 09:12 AM (Mean = -68.8200 dBm)

Figure 28. The distribution of signal from cell phone at 09:24 AM (Mean = -70.2200 dBm)


Tablet (Kindle Fire HD):

Figure 29. The distribution of signal from tablet at 09:17 AM (Mean = -59.2800 dBm)

Figure 30. The distribution of signal from tablet at 09:23 AM (Mean = -58.6500 dBm)

In figures 23 to 30, we can see that the signal strength differs at different times. These data were collected under the same conditions and with the same device, but at different moments. The means differ slightly between collection times, and this behaviour appears on all devices (laptop, tablet, cell phone). This is one reason why we found it very difficult to determine whether the user is standing still or moving if we use only Wi-Fi signal data for positioning.


The impact of people on the signal strength:

Figure 31. The distribution of signal from laptop in a room with 8 people (Mean = -62.7381 dBm)

Figure 32. The distribution of signal from laptop in a room with 21 people (Mean = -62.8370 dBm)


Figure 33. The distribution of signal from laptop in a room with 38 people (Mean = -64.0213 dBm)

In figures 31 to 33, we can see the impact of people on the signal strength. When the number of people in the room increases (from 8 to 21 and then to 38 people), the mean of the data decreases (from -62.7381 dBm to -62.8370 dBm and then to -64.0213 dBm). This means that even though you keep standing at one point, the data (and its mean) collected at that point keep changing. That is another reason why we cannot determine whether the user is standing still or moving if we use only Wi-Fi signal data for positioning.

7.2 Evaluation Criteria for Positioning Accuracy

Like all other evaluation processes, before we make a comparative survey of the estimation methods, we have to decide the criteria by which we determine that one method is better than another. The most common criterion used in articles is the average error between test points and their estimated points, sometimes together with the maximum and minimum errors. However, our team sees some problems with that kind of criterion.

First, let us assume there are two estimation methods, A and B, and the following table shows their estimation errors (in meters) for 5 test points:

Test point         Method A     Method B
T1                 1.2          1.0
T2                 3.2          2.5
T3                 0.5          0.6
T4                 8.3          12.0
T5                 4.0          3.0
Average error      3.44         3.82
Min / Max error    0.5 / 8.3    0.6 / 12.0

According to the traditional criteria, method A, with the smaller result in all three criteria (average, min and max error), is definitely better than method B. However, if we look closer at every test point, we can see that method A is better than method B in only 2 out of 5 test points. Of those two, T3 is only slightly better, and T4 is actually out of scope: in indoor positioning, an error of 8 meters has no more meaning than an error of 12 meters, because both are outside the building. Thus, for practical purposes, method B provides better performance.

Second, consider two methods A and B which give the following estimation results for the same test point. If we take the average location of the 5 reference points with the biggest weights (K Nearest Neighbors), method A has the better result in terms of the error between the test point and the estimated point. However, method B names 5 reference points which are much closer to the test point than those of method A. In other words, the better estimated result of method A is just an "accident".

In order to avoid the above problems, in this article, besides the three traditional criteria (average, min and max errors), we also calculate two more criteria:

- Percentage of errors that are smaller than a threshold
- Total error: the sum of the distances from the 5 biggest-weight reference points to the test point

With these new criteria, the result of the above example is shown below; a small computational sketch of these two criteria follows the Method A / Method B illustration.

Method A / Method B (figure). With: blue circles: reference points; red circle: test point; black circle: estimated point; yellow circles: reference points with the biggest weight.
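The following is a minimal sketch, under illustrative assumptions, of how the two additional criteria could be computed. The point coordinates, class names and data here are hypothetical and only show the calculation, not the project's implementation.

// Minimal sketch of the two additional criteria: (1) percentage of errors
// below a threshold, and (2) "total error", the sum of distances from the
// 5 biggest-weight reference points to the test point.
import java.util.List;

public class EvaluationCriteriaSketch {

    record Point(double x, double y) {}

    static double distance(Point a, Point b) {
        return Math.hypot(a.x() - b.x(), a.y() - b.y());
    }

    /** Percentage of estimation errors (in meters) that are below the threshold. */
    static double percentBelowThreshold(List<Double> errors, double thresholdMeters) {
        long below = errors.stream().filter(e -> e < thresholdMeters).count();
        return 100.0 * below / errors.size();
    }

    /** Sum of distances from the 5 biggest-weight reference points to the test point. */
    static double totalError(Point testPoint, List<Point> fiveBiggestWeightRefs) {
        return fiveBiggestWeightRefs.stream()
                .mapToDouble(p -> distance(testPoint, p))
                .sum();
    }

    public static void main(String[] args) {
        // Errors of the 5 test points from the Method A column of the table above.
        List<Double> methodAErrors = List.of(1.2, 3.2, 0.5, 8.3, 4.0);
        System.out.println(percentBelowThreshold(methodAErrors, 4.0)); // 60.0
    }
}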


Ex1. (Chart comparing Method A and Method B over the error values 0.5, 0.6, 1, 1.2, 2.5, 3, 3.2, 4, 8.3 and 12 meters.)

As you can see, in the “meaningful” area (