MicroFilters: Harnessing Twitter For Disaster Management

Andrew Ilyas
Waterloo Collegiate Institute
Waterloo, Ontario, Canada
[email protected]

Abstract—As social media grows more rapidly each day, new ways to harness worldwide connectivity are continually being discovered. The role of social media in disaster management emerged in 2012: social media data can yield rescue and aid opportunities for humanitarians. Immediately after a natural disaster, an overwhelming amount of this data floods relief workers. Unfortunately, the majority of it carries no value to disaster responders, who are interested only in the location and severity of damage. MicroFilters is a system designed to take advantage of image data by scraping tweets and the links therein for images, then using machine learning to classify them. This classification eliminates images that do not show direct damage and are therefore not useful to rescue efforts. This paper outlines the development of the MicroFilters system from start to finish, including key technical problems such as data sparseness, feature engineering, and classifier selection. The experimental evaluation validates the proposal and shows the effectiveness of our techniques (average 88% recall and 70% precision). We also discuss opportunities for future development of the MicroFilters system.

978-1-4799-7193-0/14/$31.00 ©2014 IEEE

I. INTRODUCTION

Twitter's fast and concise way of spreading ideas makes it favourable for survivors trying to reach loved ones and aid organizations after a natural disaster. Tweet data is very useful in the hands of humanitarians [1]. With the growing volume of tweets, however, manually sifting through them for relevant information is becoming impractical and, in some cases, prohibitively expensive. Humanitarians need methods to quickly and efficiently extract only those tweets showing relevant, mappable, on-the-ground damage [2].

In December 2012, Supertyphoon Pablo struck the Philippines, leaving disaster management rescue teams just hours to reach disaster-affected victims [3]. To get to the people in need, however, rescue workers from organizations such as the UN needed to know which areas were affected. The Philippines is one of the most social-network-active countries in the world, with 9.5M Twitter users and growing in 2012 [4], and with around 83% of the Filipino population having a social media account in 2008 [5]. With this in mind, the UN called upon a new kind of social worker: the digital humanitarian. While organizations realized the potential of the mountains of tweets, they also knew that only a fraction of these tweets would be useful. Sorting them was the task given to the digital humanitarians, who manually sorted through all the tweets and tagged them according to whether damage was being shown. From the efforts of these humanitarians and many volunteers, the first-ever UN social media crisis map was created [3].

As much as the crisis response to Supertyphoon Pablo was a success, Twitter is constantly growing, with a rising number of tweets per day, and manual search for specific objectives is quickly becoming obsolete. This led to the birth of automated tweet classification: programs that use machine learning to sort tweets by relevancy. However, despite the rapid growth of machine learning within the field of disaster management, a solution directed particularly at image classification has not been implemented. An image-based approach to disaster management would provide richer, contextual information by complementing text solutions, and could also be used to reduce false positives caused by misleading text. For example, if the text of a tweet reads "Huge tornado in Oklahoma with lots of damage to buildings" with a link attached, a text classifier may classify this tweet as relevant even if the picture(s) behind the link are simply weather maps or pictures of the disaster from space. These two advantages of an image solution working alongside current text solutions were the motivation for the MicroFilters system, which further accelerates disaster management and provides richer, contextual information to rescue teams.

II. RELATED WORK

The field of disaster management technologies is a quickly growing one. Recently, many technologies have been released to accelerate, enrich, and assist disaster management. For example, the Tweet4Act system [6] is a period detection and tweet filtering technology for disaster management. The system filters out irrelevant tweets based on n-gram data and then classifies them by period (before, during, or after a disaster).
Although the Tweet4Act system reports results on par with the MicroFilters system, Tweet4Act uses solely text data to make classification decisions, and uses that data to classify by period. Not only does this serve a different goal than MicroFilters, but Tweet4Act can actually be used in conjunction with the MicroFilters system to provide disaster responders with relevant images sorted into during- and post-disaster time periods.

Another immensely popular system recently released into disaster management technologies is AIDR: Artificial Intelligence for Disaster Response [7]. AIDR sorts tweets into specific user-defined categories, with an AUC score almost identical to that of MicroFilters. Once again, AIDR uses solely text data to give information, and therefore serves an entirely different purpose than MicroFilters: AIDR serves as an information manager, whereas MicroFilters serves as a rich data manager. Because of these differences, the MicroFilters system can be combined with the AIDR system to further enhance disaster management, providing disaster response teams with important textual information through AIDR and relevant graphical information through MicroFilters. Lastly, many other systems for information extraction and disaster management enhancement have been introduced; however, none offer image classification and extraction.

IEEE 2014 Global Humanitarian Technology Conference

III. TECHNICAL CHALLENGES

Building MicroFilters presented two main technical challenges, namely sparseness of the training data and classifier selection. These challenges and their solutions are discussed in this section.

A. Data Sparseness and Feature Engineering

Since computers cannot understand images directly, a machine learning classifier must first extract numerical features from the images, each number representing a different aspect. The data sparseness problem is a common problem in machine learning that limits the number of features a classifier can extract when presented with a fixed amount of training data. The premise behind the problem is that as more features are extracted from the data (in this case, images), more dimensions are added to the classification space (see Figure 1). This allows for increased separability of the data, but more training data is required to support the classification. To minimize the training data required, the MicroFilters system was limited to a single image aspect, so choosing this aspect well is a necessity. This section describes the three features that were experimented with for use in MicroFilters.
The methods behind colour histogramming, image segmentation, and Haralick features are discussed below, along with their advantages and disadvantages and their incorporation into the MicroFilters system.

Fig. 1: An illustration of the data sparseness problem. The data shown cannot be separated using one feature (on one axis), but the combination of features allows for more separability. However, the data is also less dense, which means that additional training data is needed.

1) Colour Histogram: A colour histogram is one of the simpler methods used to depict the colour quantities and ratios in an image. The method involves sorting each pixel of each channel of an image into a bin based on its shade. For example, in an RGB image, 3 bins might be defined (0-85, 86-170, 171-255) for each colour channel (R, G, and B), and the pixels sorted into these bins. A pixel with the values (13, 98, 65) would then sit at position (1, 2, 1) in the colour histogram, on the R, G, and B axes respectively.

Because of its particular advantages and disadvantages, the colour histogram's use in machine learning is very specific. A colour histogram is very easy to implement and extremely fast to compute, making it popular for simple applications, such as differentiating between winter and summer scenes, or for applications where the object in the image is known (for example, classifying red chairs versus green chairs, where all the data concerns a chair). For more complex applications, however, the colour histogram is often not indicative of the result, because it incorporates no edge, texture, or semantic data. For example, a colour histogram will not distinguish between a red ground under a blue sky (a flower patch) and a blue ground under a red sky (a lake at sunset). The second drawback of colour histograms is their sensitivity to light obstruction and noise: if a photo is taken in sub-optimal photography conditions (quite a common case in disaster management), those conditions greatly affect the histogram formed from the image. Because of these drawbacks, colour histograms were not used in MicroFilters.

2) Image Segmentation: Image segmentation is another feature used in image classification: the idea is to split an image into several "regions" based on pixel similarity. Although there are a multitude of possible image segmentation algorithms, the one discussed here is the Felzenszwalb-Huttenlocher algorithm [8], as this was the algorithm experimented with for this system. The Felzenszwalb segmentation algorithm is graph-based: it converts the target image I into an undirected graph G(V, E), where V is the set of vertices v_i and E is the set of edges (v_i, v_j). The algorithm defines the weight of an edge, w(v_i, v_j), as the dissimilarity between v_i and v_j under a user-provided measure, such as colour or intensity. A segmentation S of the set of vertices V of G is then defined as a division of V into a list of components C = C_1, C_2, ..., C_n. To segment the image, the objective is to minimize edge weights within the same component and to maximize edge weights between different components. To quantify this, two functions Int(C) and Dif(C_1, C_2) are used, representing the intra-component and inter-component weights respectively:

Int(C) = max w(e), e ∈ MST(C, E)
Dif(C_1, C_2) = min w(v_i, v_j), v_i ∈ C_1, v_j ∈ C_2, (v_i, v_j) ∈ E

where MST(C, E) is a minimum spanning tree of the component C. Using these two functions, the algorithm starts with a completely segmented image, that is, with each pixel considered its own segment. Then, for each segment, if the inter-component weight between it and an adjacent component is less than the intra-component weight of that segment, the two segments are joined into one. The algorithm stops once a complete pass over the image has been made without modifications [8].

Image segmentation proves extremely useful for object extraction from images (for example, finding a ball), and many aspects of a segmentation can be used as input features, such as the average segment size, the standard deviation of segment sizes, or geometrical themes in the segmentation. With regard to the MicroFilters project, image segmentation proved useful at times, yet debris and rubble in the images often led to useless segmentation results, which led to the elimination of image segmentation as a feature.

3) Haralick Features: In the case of automatic classification, the goal is to make the computer "see" what a human can. A common human impulse when classifying an object is to feel it; the texture of an object is often very indicative of its type. This is analogous to image classification: although a colour histogram seems like a reasonable approach since humans see colour, texture features can often provide even greater direction as to the class of image being dealt with. This section briefly outlines the mechanism by which texture, and specifically Haralick texture, can be extracted from an image. Unlike three-dimensional objects, digital images do not have a physical texture. Instead, a computer evaluates "peaks" and "valleys" of colour intensity rather than of elevation.
This is achieved by computing a "co-occurrence matrix" for the image, a matrix that represents the similarity of pixels to their direct neighbours; an example is shown in Figure 2. Using this method, the computer can extract texture data from the image. Robert M. Haralick proposed 14 features computed from this co-occurrence matrix, all shown in Figure 3. Haralick features are quite commonly used because they are usually very indicative; however, they are more expensive to calculate than simpler features such as colour histograms. Through feature experimentation it was found that Haralick texture features tended to be the most useful features in disaster image classification. This may be the result of the distinctive texture of non-disaster images such as weather maps, or of the scattered texture of disaster images caused by debris. Haralick features were therefore chosen for the MicroFilters project, and the coefficients were averaged to minimize the number of numerical features used by the classifier.

Fig. 2: An example of a horizontal co-occurrence matrix. The image is shown on the left and the resulting matrix on the right.

Fig. 3: The 14 Haralick coefficients computed from the co-occurrence matrix

B. Classifier Selection

Upon extracting the features from the images, the next stage in MicroFilters calls for a classifier to label the images. Two well-known classification models, Naive Bayes and Support Vector Machines, are outlined in this section, along with their trade-offs and their application in MicroFilters.

1) Naive Bayes Classifier: The Naive Bayes Classifier (NBC) is a popular statistical classifier owing to its ease of implementation and its effectiveness with little training data. The NBC assumes that all attributes are statistically independent, and uses Bayes' Theorem for the conditional probabilities of two events A and B: P(A|B) = P(A)P(B|A) / P(B). The probability of a hypothesis H being correct given the features is computed as P(H|A), where A = {a1, ..., an} is the set of all features. The correct H is the one that maximizes the probability P(H|A), and therefore the probability P(H)P(A|H), which under the independence assumption equals P(H)P(a1|H)...P(an|H). Therefore,

H = argmax_H P(H)P(a1|H)P(a2|H)...P(an|H)

P(H) is acquired through prior knowledge and each P(ai|H) is determined from the data. The NBC is particularly effective when training data is scarce, and prior data can be used to further lessen the need for training data; for a cancer classifier, if 2% of people in the world have cancer, then P(H) = 0.02. However, the downfall of the NBC comes with increasingly complex and inseparable data, frequently found in MicroFilters, which led to the NBC being discarded as a classification option.

2) Support Vector Machine: The key behind the SVM is the so-called kernel trick [9] (Figure 4), which uses a mathematical function (kernel) to re-project data that is not linearly separable. Linearly separable data is simply data that can be separated by a line (more generally, a hyperplane) based on feature values. The advantage of the SVM is that it can classify complex data. The SVM, however, does not give confidence values for its predictions and uses more resources than the NBC. Nevertheless, because of its effectiveness, the SVM was chosen for MicroFilters.
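The intuition behind the kernel trick can be illustrated with an explicit feature map rather than an implicit kernel (the one-dimensional data below is invented for illustration): points that no single threshold can separate become linearly separable once each value x is re-projected to (x, x²):

```python
def feature_map(x):
    """Re-project a 1-D point into 2-D, mimicking what a polynomial kernel does implicitly."""
    return (x, x * x)

# Class A sits at the extremes, class B in the middle:
# no single threshold on x separates them.
class_a = [-2.0, 2.0]
class_b = [-0.5, 0.5]

# After mapping, the second coordinate (x^2) separates the
# classes with the horizontal line y = 1.
mapped_a = [feature_map(x) for x in class_a]
mapped_b = [feature_map(x) for x in class_b]
print(all(y > 1 for _, y in mapped_a))  # → True
print(all(y < 1 for _, y in mapped_b))  # → True
```

A real SVM with an RBF kernel performs an analogous re-projection implicitly, without ever computing the mapped coordinates.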

Fig. 4: An illustration of the kernel trick

IV. THE MICROFILTERS SYSTEM

MicroFilters has been implemented and is now a working system, hosted at microfilter.cs.uwaterloo.ca and ready to be taken to the large-scale deployment phase come a natural disaster. That noted, improvements are continuously being made to the accuracy and speed of MicroFilters.

A. User Experience

When a user visits microfilter.cs.uwaterloo.ca, they are immediately presented with the home page, which contains information and usage instructions. From the home page, they can navigate to the scraping interface, shown in Figure 5. From the scraping interface, a user uploads a comma-separated values (CSV) file of a particular format¹. Since tweets often contain retweets, the user is given the option to remove retweets and duplicate tweets from their file. An option to extract images from English tweets only is also present, but in beta. The user then selects the size(s) of images they wish to extract: Small (200×200 px to 300×300 px), Medium (300×300 px to 500×500 px), or Large (500×500 px and above). The user is also asked to enter their email address, to which a link is sent for downloading the list of extracted images along with TXT, CSV, and JSON output files, which may be sorted into chronological order if the user so chooses.

Once the user receives the email, they can take the image folder and create three sub-folders: Yes, No, and Test. The Yes and No folders should contain approximately 200-250 labeled images in total, with the rest of the unlabeled images placed in the Test sub-folder. The main folder is then compressed and re-uploaded to the MicroFilters machine learning interface (Figure 6), where the user enters their email address again and then receives a text file listing the name of each image in the Test sub-folder and its relevancy. MicroFilters at this stage is completely usable and ready for deployment; small interface changes and usability improvements are continuously being added.

B. Implementation

Regular expressions were used to extract links from tweets. The open-source Python web scraping framework Scrapy was used to crawl the pages that the links referred to, and another Python tool, Scrapyd, was used to host the scraping server at the MicroFilters domain. The server at microfilter.cs.uwaterloo.ca was granted to MicroFilters by the University of Waterloo. XPath filtering was used to look for <img> tags in each page's source code. To avoid downloading images of ads or banners, all images contained in <img> tags were downloaded except those also contained in <a> (link) tags. Once all the images are downloaded, they are put through a size test: to avoid downloading styling resources or profile pictures, images smaller than 200×200 px are discarded. To avoid banners or other irrelevant pictures, a final aspect ratio test is done, discarding any image with an aspect ratio greater than 2:1 or less than 1:2. Once this is complete, the program returns the set of candidate images in an email to the user.

¹ Detailed on the MicroFilters website at microfilter.cs.uwaterloo.ca
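The link-extraction step can be sketched with a simple regular expression (the pattern below is illustrative, not the exact one used by MicroFilters; production crawlers use more robust extraction):

```python
import re

# A deliberately simple URL pattern for illustration.
URL_PATTERN = re.compile(r'https?://[^\s]+')

def extract_links(tweet_text):
    """Return all URLs found in the text of a tweet."""
    return URL_PATTERN.findall(tweet_text)

tweet = "Huge tornado in Oklahoma with lots of damage http://example.com/photo.jpg"
print(extract_links(tweet))  # → ['http://example.com/photo.jpg']
```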

Fig. 5: The scraping interface of the MicroFilters System

Fig. 6: The machine learning interface of the MicroFilters System
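The size and aspect-ratio tests from the implementation above reduce to a short predicate (the function name is invented for this sketch; the thresholds are the ones given in the text):

```python
def passes_filters(width, height):
    """Keep only images at least 200x200 px whose aspect ratio
    lies between 1:2 and 2:1 (inclusive)."""
    if width < 200 or height < 200:
        return False  # likely a styling resource or profile picture
    ratio = width / height
    return 0.5 <= ratio <= 2.0  # discard banner-shaped images

print(passes_filters(640, 480))  # typical photo → True
print(passes_filters(728, 90))   # banner ad → False
print(passes_filters(48, 48))    # icon → False
```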

Once the user has sorted and re-uploaded the folder of images, the classifier program uses the Python feature extraction library Mahotas [10] to extract 13 Haralick texture features from each image. The Python machine learning library Milk was used to learn from the training data and classify the new images. The classifier implemented uses feature normalization, feature selection, and a Radial Basis Function (RBF) kernel for the Support Vector Machine classifier.

V. TESTING AND DATA COLLECTION

The full MicroFilters testing workflow is shown in Figure 8. To assess the effectiveness of the MicroFilters system, the Typhoon Yolanda and Oklahoma Tornado datasets were used. Images were extracted and uploaded to CrowdCrafting, an online microtasking platform, to obtain ground truth (or "golden") data (Figure 9). Volunteers were asked to tag images according to whether they contained direct damage, defined as infrastructural, vehicular, or property damage. About 200 images from each disaster are used as training data, and the rest are classified automatically and compared to the ground truth data. To test scraping, images are manually extracted from random web pages and compared to automatic extraction. Standard measures such as precision, recall, and F1 were used to evaluate scraping and classification.

VI. RESULTS

Fig. 7: A control flow diagram detailing the MicroFilters system.

Fig. 8: The MicroFilters testing workflow

A. Scraping

To test the scraping portion of the MicroFilters system, a set of random webpages was chosen and manually scraped for images. The pages were then scraped again with the MicroFilters scraping mechanism described in Section IV-B, and a recall of almost 99% was measured.

B. Machine Learning Classifier

To test the classifier, we obtained image datasets for the Oklahoma Tornado and Hurricane Sandy disasters. 200-250 training images and around 500-600 testing images were used for each disaster, all ground-truthed through crowdsourcing or manually. Standard machine learning measures for the classifier were calculated (Table I), and a receiver operating characteristic (ROC) graph [11] was computed (Figure 10). Taking the area under the curve (AUC) of the ROC graph gives an estimate of the classifier's accuracy. The standard measures were compared to the AIDR method of tweet classification, which depends only on text data, and were found highly comparable. MicroFilters' AUC score, 78%, rivals that of AIDR, cited as achieving a "maximum classification quality (in terms of AUC) up to 80%" [7]. MicroFilters is also comparable to AIDR in the amount of training data used (200-250 for MicroFilters vs 200 for AIDR). AIDR does not provide any other comparable metrics, leaving MicroFilters on par with other disaster management systems, with the added benefit of image data. Note that a comparison of MicroFilters and Tweet4Act is impossible, as the filtering mechanism in Tweet4Act (no training) is based on crisis-relatedness [6], whereas MicroFilters (machine learning) is based on strict contribution to the rescue cause through an identifiable portrayal of direct damage. In addition, because MicroFilters fills the specific and novel niche of classifying images, it can be combined with other automated, text- or metadata-based disaster management solutions to further enhance disaster management.
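The measures reported in Table I can all be derived from the classifier's scores and the ground-truth labels; the AUC, in particular, equals the probability that a randomly chosen relevant image is scored above a randomly chosen irrelevant one [11]. A minimal sketch (the labels and scores below are invented for illustration):

```python
def precision_recall_f1(predicted, actual):
    """Precision, recall, and F1 from binary relevance labels."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    fn = sum(1 for p, a in zip(predicted, actual) if not p and a)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def auc(positive_scores, negative_scores):
    """AUC as the fraction of (positive, negative) score pairs
    ranked correctly, counting ties as half a win."""
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positive_scores for n in negative_scores)
    return wins / (len(positive_scores) * len(negative_scores))

# Toy evaluation: 3 true positives, 1 false positive, 1 false negative.
predicted = [1, 1, 1, 1, 0, 0]
actual    = [1, 1, 1, 0, 1, 0]
print(precision_recall_f1(predicted, actual))  # → (0.75, 0.75, 0.75)

# Scores a classifier might assign to relevant vs. irrelevant images.
print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.2]))  # 8 of 9 pairs ranked correctly
```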

Name      Precision   Recall   F1     AUC    FN %
Average   0.70        0.88     0.78   0.78   0.039

TABLE I: The results table for the classifier portion of the MicroFilters system

Fig. 9: The CrowdCrafting interface used to obtain the ground truth data for testing MicroFilters

Fig. 10: The receiver operating characteristic graph of MicroFilters' performance

VII. IMPACT OF MICROFILTERS

MicroFilters was used as a layer of the MicroMappers platform, a suite of disaster management tools, during the Pakistan earthquake and Typhoon Haiyan. The MicroMappers platform as a whole was used by the United Nations Office for the Coordination of Humanitarian Affairs (UNOCHA) to accelerate disaster management and to provide rich image data to disaster responders.

VIII. FUTURE WORK

To further expand and improve the MicroFilters software, usability and interface improvements can be made, such as an online interface for labeling images, a more extensive help guide, and further integration of the scraping and machine learning interfaces. On the backend, efficiency improvements such as semi-supervised learning and active learning are being tested in order to minimize the amount of training required. In addition, semantic data regarding the image can be taken into account. All of these improvements will help MicroFilters become an even more effective disaster management tool; nevertheless, MicroFilters is ready to be deployed as it stands.

IX. CONCLUSION

MicroFilters is an effective, novel disaster response tool, saving both time and resources in many disaster management cases. MicroFilters provides rich and relevant image data through extraction and classification shown to be on par with other systems. Overall, MicroFilters highlights the importance of machine-human collaboration for disaster management, particularly in this age dominated by social media. By delegating time-consuming repetitive tasks to computers and complex tasks to humans, MicroFilters enhances disaster management significantly, leaving the all-important job of saving lives to us.

ACKNOWLEDGMENT

Thanks to my mentor Dr. Patrick Meier for his crisis management expertise, to E. Liebster, B. Caldwell, and I. Morrison for tagging 1000 images each, and to the University of Waterloo for providing the server at microfilter.cs.uwaterloo.ca.

REFERENCES

[1] S. Vieweg, "Microblogged contributions to the emergency arena: Discovery, interpretation and implications," in CSCW, 2010.
[2] P. Meier, "What is Big (Crisis) Data?" iRevolution, June 2013. [Online]. Available: http://irevolution.net/2013/06/27/what-is-big-crisis-data/
[3] P. Meier, "How the UN Used Social Media in Response to Typhoon Pablo (updated)," iRevolution, December 2012. [Online]. Available: http://irevolution.net/2012/12/08/digital-response-typhoon-pablo/
[4] P. Montecillo, "Philippines has 9.5M Twitter users, ranks 10th," Inquirer.net, 2012. [Online]. Available: http://technology.inquirer.net/15189/philippines-has-9-5m-twitter-users-ranks-10th
[5] U. McCann, "Wave 3: Power to the people social media tracking," Technical Study, March 2008.
[6] S. R. Chowdhury, S. Amer-Yahia, C. Castillo, M. Imran, and M. R. Asghar, "Tweet4act: Using incident-specific profiles for classifying crisis-related messages," in 10th International ISCRAM Conference, 2013.
[7] M. Imran, C. Castillo, J. Lucas, P. Meier, and S. Vieweg, "AIDR: Artificial intelligence for disaster response," in 23rd International Conference on World Wide Web, 2014.
[8] P. Felzenszwalb and D. Huttenlocher, "Efficient graph-based image segmentation," Intl. Journal of Computer Vision, vol. 59, no. 2, pp. 167-181, 2004.
[9] K. Singh, A. Kumar, and H. Chandra, Forecasting Techniques in Agriculture. Indian Agricultural Statistics Research Institute, ch. 16, pp. 171-179.
[10] L. P. Coelho, "Mahotas: Open source software for scriptable computer vision," Journal of Open Research Software, 2013.
[11] T. Fawcett, "An introduction to ROC analysis," Pattern Recognition Letters, vol. 27, pp. 861-874, 2006.