
IADIS International Conference Informatics 2009

EXPERIENCES WITH EXPERIMENTS IN AMBIENT INTELLIGENCE ENVIRONMENTS

Piero Zappi1, Clemens Lombriser2, Elisabetta Farella1, Luca Benini1, Gerhard Tröster2

1 Department of Electronics, Computer Science and Systems (DEIS), University of Bologna, Bologna, Italy
2 Wearable Computing Laboratory, ETH Zurich, 8092 Zurich, Switzerland

ABSTRACT

The development of activity recognition techniques relies on the availability of datasets of gestures with which to train and validate the proposed methods. In this work we introduce and describe a new dataset for activity recognition. The dataset comprises 8 scenarios from everyday life and includes 17 activities composed of a total of 64 gestures. Each scenario has been repeated 10 times by 2 users, and all activities and gestures are labeled. Five different sensing modalities are implemented using body-worn sensors, environmental sensors, and smart objects. The paper describes our considerations in setting up the testbed and performing the experiments to record the dataset, reports our experiences with recording the data, and discusses possible research questions to be tackled with the dataset.

1. INTRODUCTION

Ambient intelligence envisions a world where a dense mesh of sensors is integrated into stationary objects, artifacts, and clothing and is able to sense its own physical state, the presence of people, and their state and current activity. With this information, the environment itself can provide context-aware services to support its inhabitants (Ramos, 2008).

Both the design and the validation of activity recognition techniques require large datasets, which must be obtained through time-consuming and expensive test sessions. Thus, several datasets have been released, including, e.g., statistics on populations, diseases, plants, and handwritten characters (Asuncion, 2007), images (Benjamin, 2007), and activities in smart environments (Intille, 2006).

In this paper, we describe our considerations and experiences with collecting data from a sensorized environment, with the end goal of producing a high-quality, freely available reference dataset for benchmarking activity recognition algorithms on different abstraction levels. Our experiments include 5 sensing modalities and up to 12 wireless sensors recording at the same time. The dataset is constructed out of 8 different scenarios of everyday life, which include 17 activities composed of 64 gestures. The activities have been performed by two test subjects 10 times each. While a subject performed the activities, the experiment supervisor recorded time markers to identify the start and duration of each activity.

The dataset is available for research purposes at http://www.wearable.ethz.ch/research/groups/sensor_nets/dataset. It includes the raw sensor data, hand-corrected labels, and synchronized and resampled data, together with Matlab scripts to load it. To the best of our knowledge, compared to other available datasets, this is the first that includes several repetitions of activities labeled in detail down to individual gestures and recorded from body-worn and environmental sensors as well as smart objects. We present our recording setup and summarize the content of our dataset. We hope that others will contribute to the dataset and make it grow into a commonly useful resource.

2. EXPERIMENT SETUP

The 8 scenarios of the dataset involve cooking a soup in the kitchen (scenario Kitchen), assembling a shelf with three boards (Shelf assembly) and then attaching a metal crossbar to it (Crossbar assembly), three sets where the subject works at a desk reading, writing, and using the computer (Relaxing 1, Relaxing 2, and Working), a set where two subjects collaboratively assemble the shelf (Collaboration), and a last set where the subjects perform activities which are not to be recognized, but may cause false positives in the recognition algorithms, such as scratching the head or using a mobile phone (Distractions).

The activities were recorded by body-worn sensors featuring accelerometers at both wrists and on the left leg right above the knee, and bend sensors monitoring the extension of the fingers of the right hand. Further accelerometers were placed on 12 objects and tools the subjects interacted with, as well as on a shelf leg, a shelf board, and a chair. An additional 8 light sensors were placed in drawers and cupboards to monitor whether they had been opened by the test subject. Work on the computer was sensed by recording the number of key presses and mouse movements. A pyroelectric infrared (PIR) motion sensor recorded when the subject entered and left the room after each recording. Finally, a camera filmed the room during the experiment.

The raw data samples from the sensors were collected through wireless communication by a laptop PC, where a supervisor labeled the beginning and the end of each activity using custom software developed for this project. Only the PIR sensor was connected by a serial cable. For synchronization, a timestamp was added at the reception of every message on the recording PC.
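Because each message is timestamped only on reception at the recording PC, individual sensor streams must be aligned on a common timeline before they can be compared. The following Matlab sketch illustrates one way to do this by linear interpolation; the variable names, toy data, and the 50 Hz target rate are our own illustrative assumptions, not the dataset's actual loading scripts.

```matlab
% Toy sketch: align two sensor streams on a common 50 Hz timeline using
% the reception timestamps added by the recording PC. All names and
% values are illustrative; the dataset's own Matlab scripts may differ.
fs = 50;                                  % target sampling rate in Hz

t1 = [0.00 0.02 0.05 0.06]'; x1 = [1.0 1.2 1.1 0.9]';  % stream 1 (toy)
t2 = [0.01 0.03 0.04 0.07]'; x2 = [0.5 0.6 0.4 0.5]';  % stream 2 (toy)

t0   = max(t1(1), t2(1));                 % common start time
tEnd = min(t1(end), t2(end));             % common end time
t    = (t0:1/fs:tEnd)';                   % common timeline

% Linear interpolation onto the common timeline; short gaps caused by
% lost messages are bridged between neighboring samples.
x1r = interp1(t1, x1, t, 'linear');
x2r = interp1(t2, x2, t, 'linear');
```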

2.1 Activities and Labeling

Our goal is to have a detailed record of the gestures performed during our experiments. Thus, we defined 64 gestures, such as picking up a screwdriver or turning a screw, which should allow identifying which of the 17 composite activities the subject has been performing at a given time, such as fixing a crossbar on the shelf. Table 1 lists the different scenarios and the number of different composite activities and gestures they include. Some gestures and composite activities occur multiple times during a recording, and some distracting gestures, such as scratching the head, were occasionally inserted. The average number of labels to be set during a recording is indicated in the last column of Table 1.

All recorded data is annotated with the beginning and the end of each activity and gesture, so that it can be analyzed and used for training and testing of classifiers. One option would have been to record all data without labels and to add the activity and gesture information after the experiments by inspecting the recorded films. However, we expected this to take considerably more time than labeling online and manually checking the labels later on. The drawback of this approach is that, due to the large number of different activities and gestures, their sequence needs to be fixed, such that the supervisor can find them in a list in useful time.

We designed a simple user interface which displays the sequence of activities and gestures to be performed by the subject (see Fig. 1). The experiment supervisor can select the activity and gesture and conveniently start and stop the time during which it was performed. After a short training session, a supervisor is able to use this interface efficiently. The software adds the time-stamped event to the recorded data flow.

An important addition is a sensor health indication showing which sensors are active. A sensor identifier is colored green when data has been received from it during the last second; when data is missing, the identifier turns red, such that the experiment supervisor is alerted and can decide whether or not to stop the recording. We experienced several times during the experiments that sensors failed to deliver data for only a few seconds and then quickly recovered the transmission. By monitoring the network status we were able to stop an experiment if critical sensors were not responding for an extended amount of time.
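The health logic itself is simple. A minimal Matlab sketch of the green/red decision, using invented reception times and the one-second threshold described above, could look as follows.

```matlab
% Toy sketch of the sensor health indication: a sensor is shown green if
% a message was received from it during the last second, red otherwise.
% Reception times and sensor count are illustrative only.
lastRx  = [12.3 12.9 11.1];        % reception time (s) of the newest message
tNow    = 13.0;                    % current time (s)
timeout = 1.0;                     % health threshold of one second

healthy = (tNow - lastRx) <= timeout;    % one logical flag per sensor
for k = 1:numel(healthy)
    if healthy(k)
        fprintf('sensor %d: green\n', k);
    else
        fprintf('sensor %d: red\n', k);
    end
end
```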

Figure 1. Labeling software with a) atomic label sequence, b) composite label sequence, c) start/stop button, and d) sensor health status


Table 1. Number of composite and atomic activities and total number of activity occurrences

Scenario            Composite activities   Atomic activities   Average occurrences
Relaxing 1                   3                     9                   27
Relaxing 2                   2                     9                   18
Crossbar assembly            4                    12                   27
Kitchen                      8                    37                   68
Shelf assembly               5                    14                   99
Working                      5                    17                   31
Collaboration                6                    17                   87
Distractions                 6                    10                   60
Totals                      17                    64                  400


Table 2. Overall experiment message loss

Category         Position          Message loss (%)
Infrastructure   PIR                    38.0
                 Computer                0.0
Tools            Hammer                  0.9
                 Screw Driver            2.1
                 Scissors                0.0
                 Knife                  10.9
                 Book 1                 12.7
                 Book 2                  9.8
                 Phone                  13.8
                 Stirring spoon          3.7
                 Drill                   0.6
                 Wrench Small           23.7
                 Wrench Big             15.3
                 Pen                     0.7
Furniture        Shelf board             3.4
                 Chair                   4.3
                 Food cupboard           2.0
                 Dish cupboard           2.1
                 Cutlery drawer          2.6
                 Garbage                 0.6
                 Pot drawer              1.8
                 Shelf leg               5.9
                 Tool Drawer             0.1
                 Desk drawer             2.6
Body worn        Glove                   0.5
                 Wrist                   0.6
                 Left leg               10.2
Average                                  8.7

In a post-processing step, the labels were inspected and corrected manually by cross-checking them against the video recording of the experiment. The accuracy of the online labeling by the experiment supervisors was evaluated, so that it can be compared to the automatic context recognition algorithms that will be tested on the dataset. For the evaluation, we accept a human-set label as a true positive if it overlaps the corresponding ground truth label by at least one sample. This definition does not allow reporting events entirely early or late: labels that have no overlap with any ground truth label are counted as false positives, while ground truth labels without a matching event are counted as false negatives. Multiple matches between labels and ground truth were all counted as true positives. It is interesting to note that even the "gold reference" labeling performed by a human is not perfect: the accuracy of the human labeling was determined to be 96.75%, with a precision of 97.83% and a recall of 98.87%. Most errors came from mixing up two different activities or setting a label too late.
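A minimal Matlab sketch of this overlap-based matching, on invented label intervals for a single class, could look as follows; the dataset's actual evaluation scripts are not reproduced here.

```matlab
% Toy sketch of the evaluation criterion: a human-set label counts as a
% true positive if it overlaps a ground-truth label of the same class by
% at least one sample. Intervals are [start, stop] in samples (toy data).
labels = [ 10  40;  60  80; 120 130];   % supervisor labels (one class)
truth  = [ 12  45;  85 100];            % ground-truth labels (same class)

overlaps = @(a, b) a(1) <= b(2) && b(1) <= a(2);   % interval overlap test

tp = 0; fp = 0;
matched = false(size(truth, 1), 1);
for i = 1:size(labels, 1)
    hit = false;
    for j = 1:size(truth, 1)
        if overlaps(labels(i, :), truth(j, :))
            hit = true; matched(j) = true;    % multiple matches all count
        end
    end
    if hit, tp = tp + 1; else, fp = fp + 1; end
end
fn = sum(~matched);                     % unmatched ground truth labels

precision = tp / (tp + fp);
recall    = tp / (tp + fn);
```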

2.2 Wireless Communication

Data from the sensors was collected over different media by a single laptop PC. The PIR sensor readings were gathered over a serial cable, data from the right wrist and the bend sensors was sent over a Bluetooth radio, and all other sensor nodes are based on the Tmote platform and use the TinyOS wireless stack based on IEEE 802.15.4. The accelerometers on the sensor nodes were sampled at 50 Hz on three axes, requiring a sensor data throughput of 2.4 kbit/s per node (50 samples/s x 3 axes x 16 bit per sample). As expected (Shnayder, 2005), early tests showed that at this data rate we suffer high message loss when more than 3 nodes stream on a single channel; we therefore decided to use multiple parallel channels and to assign only 2 sensors to each.

During the experiments we experienced node failures and communication losses which reduced the quality of the acquired streams. Message losses vary from 0% up to 38.0% (PIR sensor), with an average of 8.7% (see Table 2). The higher packet losses are due to sensors that failed at the beginning of an experiment and whose failure was not recognized until the end of the test, while the lower data losses are mainly due to temporary occlusion caused by movements of the user or to interference between sensors streaming on the same channel. The high data loss on the PIR sensor is not a real loss of packets, but simply due to the fact that this sensor was not available during all experiments. This analysis shows that we succeeded in reducing data loss by using multiple channels. Low-power communication is inherently unreliable and depends on body positions and movements (Kusserow, 2009), which activity recognition algorithms for ambient intelligence environments have to take into account.
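Per-sensor loss figures like those in Table 2 can be estimated from gaps in packet sequence numbers. The sketch below assumes each node stamps its messages with an incrementing counter; this is our assumption for illustration, not a documented feature of the recordings.

```matlab
% Toy sketch: estimating message loss from the sequence numbers of
% received packets, assuming each node sends an incrementing counter.
seq = [0 1 2 4 5 8 9];                 % toy received sequence numbers

expected = seq(end) - seq(1) + 1;      % messages the node must have sent
received = numel(seq);                 % messages that actually arrived
lossPercent = 100 * (1 - received / expected);
fprintf('message loss: %.1f%%\n', lossPercent);   % prints 30.0% here
```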

3. BENCHMARK FOR CONTEXT RECOGNITION

The dataset we built is beneficial for activity recognition research on topics such as:

• Comparison of Approaches. Typical activity recognition techniques rely on sensors that are placed either on the user's body, on objects, or in the environment. As a consequence, different algorithms are tested on different datasets, making their comparison difficult. We believe that the possibility to test different approaches on a common benchmark will allow a better understanding of the benefit provided by each technique. Furthermore, since sensor networks are dynamic systems, a researcher can compare different scenarios where the user may or may not be equipped with smart garments, performs his activity with or without smart objects, and moves within or outside a smart environment.

• Distributed vs. centralized recognition. The authors of (Amft, 2007) have shown how distributed recognition of activities can be performed. An evaluation can be made of a centralized recognition algorithm, which has all the data available, vs. a distributed recognition algorithm, which recognizes activities locally on the individual sensors while a central node only fuses their results. There are also intermediate solutions where the data on different sensors may be correlated.

• Hierarchical Activity Recognition. In a multilevel hierarchical approach to activity recognition, researchers can develop activity recognition techniques specifically for each level. By feeding their recognition results back into the dataset, designers of higher-level or lower-level algorithms can investigate the influence of approaches on one level on the overall performance, and can test the benefit of cross-level information exchange.

• Context-aware activity recognition. Knowledge of the higher-level activity may be used to restrict the search space of the lower-level activities and thereby improve their recognition accuracy. As classifiers then have to discriminate only a limited subset, an improvement of their performance can be expected. For example, micro-activity detection can benefit from the knowledge that the user is currently mounting a shelf board by restricting detection to the relevant micro activities and omitting the detection of activities related to cooking (a minimal sketch of this idea follows below).
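As a minimal illustration of this restriction, the sketch below masks the score vector of a hypothetical gesture classifier with the subset of gestures plausible under the current composite activity. All class names, scores, and the subset itself are invented for illustration.

```matlab
% Toy sketch of context-aware gesture recognition: the recognized
% composite activity restricts the set of gestures the low-level
% classifier has to discriminate. Names and scores are illustrative.
allGestures = {'pick screwdriver', 'turn screw', 'stir pot', 'cut bread'};
scores      = [0.20, 0.35, 0.30, 0.15];   % toy classifier scores

% Gestures plausible while the user is assembling the shelf (assumption):
shelfSubset = [1 2];                      % indices into allGestures

restricted = -inf(size(scores));
restricted(shelfSubset) = scores(shelfSubset);  % mask implausible classes
[~, best] = max(restricted);
fprintf('recognized gesture: %s\n', allGestures{best});  % 'turn screw'
```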

4. CONCLUSION

This paper summarizes our experiences in setting up and running a diverse set of scenarios within an ambient intelligence environment. The experiments included multiple sensing modalities, with sensors mounted on the body, embedded within tools used by the subject, and placed in the environment. We described how to perform efficient labeling and how to ensure a good performance of the wireless channel. The resulting dataset is publicly available, and we hope that it will support researchers in engaging with the research challenges we have outlined.

ACKNOWLEDGEMENT

This work was partly supported by the SENSEI project (www.sensei-project.eu) of the European 7th Framework Programme, contract number 215923, and by the ARTISTDESIGN Network of Excellence, funded under the European 7th Framework Programme, contract number 214373.

REFERENCES

Amft, O. et al., 2007. Recognition of user activity sequences using distributed event detection. Proceedings of the 2nd European Conference on Smart Sensing and Context, Kendal, UK, pp. 126–141.

Asuncion, A. and Newman, D. J., 2007. UCI Machine Learning Repository. Irvine, CA: University of California, School of Information and Computer Science. http://www.ics.uci.edu/~mlearn/MLRepository.html

Benjamin, Y. et al., 2007. Introduction to a large scale general purpose ground truth dataset: methodology, annotation tool, and benchmarks. Proceedings of EMMCVPR 2007.

Intille, S. S. et al., 2006. Using a live-in laboratory for ubiquitous computing research. Proceedings of Pervasive Computing, Dublin, Ireland, pp. 349–365.

Kusserow, M. et al., 2009. BodyANT: Miniature wireless sensors for naturalistic monitoring of daily activity. Proceedings of the 4th International Conference on Body Area Networks, Los Angeles, USA.

Ramos, C. et al., 2008. Ambient Intelligence: the Next Step for Artificial Intelligence. IEEE Intelligent Systems, Vol. 23, No. 2, pp. 15–18.

Shnayder, V. et al., 2005. Sensor networks for medical care. Technical Report TR-08-05, Division of Engineering and Applied Sciences, Harvard University.
