ERCIM News 111

Research and Innovation

best represent the topic and its aspects. The Profile Generator takes as input the user population U and the set of protected attributes P, and produces a set of user profiles suitable for testing whether the OIP discriminates against users in U on the basis of the protected attributes in P (e.g., gender). The Result Processing component takes the results returned by the OIP and applies machine learning and data mining algorithms, such as topic modelling and opinion mining, to determine the value of the differentiating aspects (e.g., whether a result takes a positive stance). Central to the system is the Ground Truth module, although obtaining the ground truth is hard in practice. Finally, the Compute Bias component calculates the bias of the OIP from the subject and object bias metrics and the ground truth.

Because bias is multifaceted, it can be difficult to quantify; in [L6] we propose several subject and object bias metrics. Obtaining the ground truth and building the user population are among the most formidable tasks in measuring bias, since it is difficult to find objective evaluators and to generate large samples of user accounts for the different protected attributes. There are also many engineering and technical challenges for the query generation and result processing components, involving knowledge representation, data integration, entity detection and resolution, sentiment detection, and so on. Lastly, because OIPs do not provide access to their internals, governments may have an interest in creating legislation that gives access to sufficient data for measuring bias.

Links:
[L1] https://kwz.me/hLc
[L2] https://kwz.me/hLy
[L3] https://kwz.me/hLH
[L4] https://futureoflife.org/ai-principles/
[L5] https://en.oxforddictionaries.com/definition/bias
[L6] http://arxiv.org/abs/1704.05730

References:
[1] J. Stoyanovich, S. Abiteboul, and G. Miklau. Data, responsibly: Fairness, neutrality and transparency in data analysis. In EDBT, 2016.
[2] J. Kulshrestha, M. Eslami, J. Messias, M. B. Zafar, S. Ghosh, K. P. Gummadi, and K. Karahalios. Quantifying search bias: Investigating sources of bias for political searches in social media. In CSCW, 2017.
[3] M. B. Zafar, I. Valera, M. G. Rodriguez, and K. P. Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In WWW, 2017.

Please contact: Irini Fundulaki, Foundation for Research and Technology – Hellas, Institute of Computer Science, Greece, [email protected]

The Approximate Average Common Submatrix for Computing the Image Similarity
by Alessia Amelio (Univ. of Calabria)

The similarity between images can be computed using a new method that compares image patches in which a portion of the pixels is omitted at regular intervals. The method is accurate and reduces execution time relative to conventional methods.

To date, researchers have not fully solved the problem of automatically computing the similarity between two images. This is mainly due to the difficulty of bridging the gap between human visual similarity and the similarity captured by the machine. Bridging this gap requires two important objectives to be fulfilled: (i) finding a reliable and accurate image descriptor that captures the most important image characteristics, and (ii) using a robust measure to evaluate the similarity between the two images according to their descriptors. Usually a trade-off is needed between these two objectives and the execution time. An important challenge, therefore, is to compute an accurate descriptor and a robust measure while reducing execution time.

In this article, I present a new measure for computing the similarity between two images, based on the comparison of image patches in which a portion of the pixels is omitted at regular intervals. This measure, called the Approximate Average Common Submatrix (A-ACSM) [1], is an extension of the Average Common Submatrix (ACSM) measure [2]. Its advantage is that it does not require complex descriptors to be extracted from the images before comparison. Instead, each image is treated as a matrix, and the similarity is evaluated by measuring the average area of the largest sub-matrices that the two images have in common. The underlying principle is that two images can be considered similar if they share large patches representing image patterns.
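
The average-area idea behind ACSM can be illustrated with a minimal brute-force sketch in Python. It assumes that each image is a list of lists of quantized pixel values (for example, the four colour indices of Figure 1), that a sub-matrix "in common" means an exact square match found anywhere in the second image, and that the average is taken over every position of the first image. The function names are hypothetical, and the sketch omits any normalization or speed-ups used in the published method [2].

```python
from typing import List, Sequence

# Hypothetical representation: an image as a matrix of quantized pixel values.
Image = Sequence[Sequence[int]]


def occurs_in(img2: Image, patch: List[List[int]]) -> bool:
    """True if `patch` appears as a square sub-matrix anywhere in img2 (exact match)."""
    k = len(patch)
    rows, cols = len(img2), len(img2[0])
    for p in range(rows - k + 1):
        for q in range(cols - k + 1):
            if all(img2[p + r][q + c] == patch[r][c]
                   for r in range(k) for c in range(k)):
                return True
    return False


def largest_common_square(img1: Image, img2: Image, i: int, j: int) -> int:
    """Side of the largest square sub-matrix starting at (i, j) in img1 that also occurs in img2."""
    # The square must fit inside img1 from (i, j) and inside img2 somewhere.
    max_k = min(len(img1) - i, len(img1[0]) - j, len(img2), len(img2[0]))
    for k in range(max_k, 0, -1):  # try the largest size first
        patch = [list(img1[i + r][j:j + k]) for r in range(k)]
        if occurs_in(img2, patch):
            return k
    return 0


def acsm_similarity(img1: Image, img2: Image) -> float:
    """Average area of the largest common square sub-matrices over all positions of img1."""
    areas = [largest_common_square(img1, img2, i, j) ** 2
             for i in range(len(img1)) for j in range(len(img1[0]))]
    return sum(areas) / len(areas)


img = [[1, 2], [3, 4]]
print(acsm_similarity(img, img))               # 1.75
print(acsm_similarity(img, [[1, 2], [9, 9]]))  # 0.5
```

Comparing [[1, 2], [3, 4]] with itself gives 1.75: the 2×2 square at the top-left corner contributes an area of 4, and the three remaining positions can only hold 1×1 squares. The exhaustive search over every position and size is what makes exact matching expensive on large images.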

Two patches are considered identical if they match on a portion of the pixels, extracted at regular intervals along the rows and columns of the patches. Hence, the measure relies on a simple match over a subset of the pixels. This introduces an approximation based on the “naive” consideration that two images do not need to match exactly in the intensity of every pixel to be considered similar. The approximation makes the similarity measure more robust to noise, i.e., small variations in pixel intensity due to errors in image generation, and considerably reduces its execution time on large images, because a portion of the pixels does not need to be checked. Figure 1 shows an example of a match between two image patches with an interval of two along the columns and one along the rows of the patches. Figure 2 depicts the algorithm for computing the A-ACSM similarity measure.
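
The approximate match between two patches can be sketched as a function that compares only the sampled pixels. In this minimal Python sketch, which is an illustration rather than the author's code, `dx` is the interval along the columns and `dy` the interval along the rows (Δx and Δy in Figure 1); the helper name `approx_match` is hypothetical. It samples a plain rectangular grid starting at the patch corner, so it does not reproduce the chessboard-like arrangement of the checked elements shown in Figure 1.

```python
from typing import Sequence

# Hypothetical representation: an image as a matrix of quantized pixel values.
Image = Sequence[Sequence[int]]


def approx_match(img1: Image, i: int, j: int,
                 img2: Image, p: int, q: int,
                 k: int, dx: int, dy: int) -> bool:
    """Compare the k x k patch of img1 at (i, j) with the k x k patch of img2 at (p, q),
    checking only every dy-th row and every dx-th column of the patches."""
    for r in range(0, k, dy):        # sampled rows (interval dy)
        for c in range(0, k, dx):    # sampled columns (interval dx)
            if img1[i + r][j + c] != img2[p + r][q + c]:
                return False         # one sampled mismatch rejects the pair
    return True                      # unsampled pixels are never inspected


a = [[1, 2, 3], [4, 1, 2], [3, 4, 1]]
b = [[1, 9, 3], [4, 9, 2], [3, 9, 1]]  # differs from a only in the middle column
print(approx_match(a, 0, 0, b, 0, 0, k=3, dx=2, dy=1))  # True
print(approx_match(a, 0, 0, b, 0, 0, k=3, dx=1, dy=1))  # False
```

With `dx=2, dy=1` the two 3×3 patches are treated as identical even though every pixel in their middle column differs, while the exact comparison (`dx=dy=1`) rejects them. This is the robustness to noise, and the saving in checked pixels, described above.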

ERCIM NEWS 111 October 2017

Figure 3 shows how to find the largest square sub-matrix at a sample position (5,3) of the first image.
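
The search illustrated in Figure 3 can be sketched by trying square sizes from the largest that fits down to 1 and scanning the second image for an approximate match of each size; the first hit gives the largest square sub-matrix and its position in the second image. The minimal Python sketch below also averages the resulting areas over all positions of the first image to obtain an A-ACSM-style similarity score. It is self-contained and repeats a compact version of the sampled-pixel comparison; the function names, the default intervals, and the averaging over every position are assumptions for illustration, not the exact algorithm of Figure 2 or [1].

```python
from typing import Optional, Sequence, Tuple

# Hypothetical representation: an image as a matrix of quantized pixel values.
Image = Sequence[Sequence[int]]


def approx_match(a: Image, i: int, j: int, b: Image, p: int, q: int,
                 k: int, dx: int, dy: int) -> bool:
    """k x k patches match if they agree on every dy-th row and every dx-th column."""
    return all(a[i + r][j + c] == b[p + r][q + c]
               for r in range(0, k, dy) for c in range(0, k, dx))


def largest_square_at(a: Image, b: Image, i: int, j: int,
                      dx: int = 1, dy: int = 1) -> Tuple[int, Optional[Tuple[int, int]]]:
    """Largest k such that the k x k sub-matrix of `a` at (i, j) approximately matches
    some k x k sub-matrix of `b`; returns (k, position in b), or (0, None)."""
    max_k = min(len(a) - i, len(a[0]) - j, len(b), len(b[0]))
    for k in range(max_k, 0, -1):  # e.g. size 3 fails, then size 2 is tried
        for p in range(len(b) - k + 1):
            for q in range(len(b[0]) - k + 1):
                if approx_match(a, i, j, b, p, q, k, dx, dy):
                    return k, (p, q)
    return 0, None


def a_acsm_similarity(a: Image, b: Image, dx: int = 1, dy: int = 1) -> float:
    """Average area of the largest approximately matching squares over all positions of `a`."""
    areas = [largest_square_at(a, b, i, j, dx, dy)[0] ** 2
             for i in range(len(a)) for j in range(len(a[0]))]
    return sum(areas) / len(areas)


a = [[1, 2], [3, 4]]
b = [[1, 9], [3, 9]]
print(a_acsm_similarity(a, b, dx=2, dy=1))  # 1.25
print(a_acsm_similarity(a, b))              # 0.5
```

For these two 2×2 images the approximate score (`dx=2, dy=1`) is 1.25, against 0.5 for exact matching, because the mismatching right column is never checked. Searching from the largest size down lets the loop stop at the first match it finds, as in Figure 3 where size 3 fails and size 2 succeeds.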

Figure 1: An example of a match between two patches belonging to images 1 and 2. Each image has four colours, numbered from 1 to 4. Intervals of Δx=2 and Δy=1 are set along the columns and rows of the patches, respectively. Accordingly, the match is only checked between the elements in the red circles, which are selected as on a chessboard. In this case, the two image patches match perfectly because all the elements in the red circles correspond to one another.

The A-ACSM similarity measure, together with its corresponding dissimilarity measure, has been extensively tested on benchmark image databases and compared with the ACSM measure and other well-known measures in terms of accuracy and execution time. The results showed that A-ACSM outperformed its competitors, achieving higher accuracy with a lower execution time. The project on average common submatrix measures began at the Georgia Institute of Technology, USA, and is currently in progress at DIMES, University of Calabria, Italy. Future work will extend the submatrix similarity measures with new features and will evaluate similarity on other types of data, including documents, sensor data, and satellite images [3]. Recent developments in this direction are in collaboration with the Technical Faculty in Bor, University of Belgrade, Serbia.

Figure 2: Flowchart of the A-ACSM algorithm.

References:
[1] A. Amelio: “Approximate Matching in ACSM dissimilarity measure”, Procedia Computer Science 96: 1479-1488, 2016.
[2] A. Amelio, C. Pizzuti: “A patch-based measure for image dissimilarity”, Neurocomputing 171: 362-378, 2016.
[3] A. Amelio, D. Brodić: “The ε-Average Common Submatrix: Approximate Searching in a Restricted Neighborhood”, IWCIA: 7-11 (short comm.), arXiv:1706.06026, 2017.

Please contact: Alessia Amelio, DIMES, University of Calabria, Italy, [email protected]

Figure 3: The largest square sub-matrix at position (5,3) of the first image. Square sub-matrices of different sizes, anchored at (5,3) and extending along the main diagonal, are checked for a match inside the second image. In this case, the sub-matrix of size 3 has no match inside the second image, so the sub-matrix of size 2 is considered. Because it has a correspondence at position (3,2) in the second image, it is the largest square sub-matrix at position (5,3).