This is the author copy of Tool Independence for the Web Accessibility Quantitative Metric. Disability & Rehabilitation: Assistive Technology 4(4), 248-263. Informa Healthcare. Available at http://informahealthcare.com/doi/abs/10.1080/17483100902903291 Note that there might be some inconsistencies between this and the above publication so use this copy at your own risk.
Tool Independence for the Web Accessibility Quantitative Metric
Markel Vigo1, Giorgio Brajnik2, Myriam Arrue1 and Julio Abascal1
1 University of the Basque Country, Informatika Fakultatea, 20018 Donostia, Spain. markel, myriam, [email protected]
2 Dipartimento di Matematica e Informatica, Università de Udine, 33100 Udine, Italy. [email protected]
Abstract. The Web Accessibility Quantitative Metric (WAQM) aims at accurately measuring the accessibility of web pages. One of the main features of WAQM is that it is independent of the evaluation tool in ranking and accessibility monitoring scenarios. This paper proposes a method to attain evaluation tool independence for all foreseeable scenarios. After demonstrating that homepages have an error profile similar to that of any other web page in a given web site, 15 homepages were measured with 10000 different values of WAQM parameters using the automatic accessibility evaluation tools EvalAccess and LIFT. A similar procedure was followed with random pages and with several test files, yielding several tuples that minimize the difference between the two tools. 1449 web pages from 15 web sites were measured with these tuples, and those values that minimized the difference between the tools were selected. Once WAQM was tuned, the accessibility of 15 web sites was measured with two metrics for web sites. The conclusion is that, even if similar values can be produced, obtaining the same scores is undesirable since evaluation tools behave differently.

1 Introduction

In recent years a great deal of research has been carried out in the field of metrics for web accessibility assessment, that is, how to measure the accessibility level of a web site. Measuring the accessibility level is necessary since more and more scenarios are based on such levels (see examples below), and in many cases they require high accuracy in order to rate and assess the accessibility of web sites. The most broadly used metric for accessibility is the qualitative metric proposed by the Web Accessibility Initiative1 (WAI) in the context of the Web Content Accessibility Guidelines 1.0 (WCAG 1.0; Chisholm et al., 1999), which measures the conformance of a web page with that guideline set.
A web page satisfying all checkpoints obtains an “AAA” rating; if it satisfies all priority 1 and priority 2 checkpoints it gets “AA”; the “A” rating is obtained if only all priority 1 checkpoints are satisfied; and finally the web page is “non conformant” otherwise. Notice how violating a single priority 1 checkpoint is enough to make the page non conformant. This metric, which yields the ordered symbolic values {AAA, AA, A, NC}, is not precise enough to rate and classify web applications according to their accessibility level. For example, a web page fulfilling all priority 1 checkpoints would obtain the same accessibility value as another web page fulfilling all priority 1 checkpoints and almost all priority 2 checkpoints: both of them would get the A level conformance. This criterion seems to be based on the assumption that if a web page fails one of the checkpoints in a level, it is as inaccessible as if it had failed all of them. Although this is possible, several scenarios require a metric with a higher resolution that goes beyond a small ordered scale of accessibility scores.

1 The Web Accessibility Initiative, WAI. Available at http://www.w3.org/WAI/

Quality Assurance (QA) within Web Engineering

Web Engineering defines specific methodologies, models and techniques for web application development. Since its final goal is to obtain high quality web applications, a Quality Assurance process is of paramount importance. This entails the application of metrics, methods and quality models throughout the development process. As a consequence, web usability and accessibility should be measured during the different stages of the lifecycle, and the measurement should be capable of yielding precise and standard results: precision is needed in order to distinguish levels of accessibility that may be close to each other, and standardization is needed in order to compare different versions of a web site. In this sense, quality models such as 2QCV3Q (Mich et al., 2003) and WebQEM (Olsina and Rossi, 2002) have been defined. The characteristics of web applications and the necessary metrics for their evaluation are included in these models.
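To make the limited resolution of the WCAG 1.0 conformance scale discussed earlier in this section concrete, the following is a minimal Python sketch. It is an illustration only: the `CheckpointResults` container, the `conformance_level` function and the violation counts are hypothetical names and made-up inputs, not part of any evaluation tool or of WAQM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckpointResults:
    """Number of violated WCAG 1.0 checkpoints per priority (hypothetical container)."""
    p1_violations: int
    p2_violations: int
    p3_violations: int


def conformance_level(results: CheckpointResults) -> str:
    """Map checkpoint violations to the ordered scale {AAA, AA, A, NC}."""
    if results.p1_violations > 0:
        return "NC"   # a single priority 1 violation makes the page non conformant
    if results.p2_violations > 0:
        return "A"    # all priority 1 satisfied, some priority 2 violated
    if results.p3_violations > 0:
        return "AA"   # all priority 1 and 2 satisfied, some priority 3 violated
    return "AAA"      # every checkpoint satisfied


# Made-up inputs: one page fails many priority 2 checkpoints, the other only one.
page_many_p2_failures = CheckpointResults(p1_violations=0, p2_violations=20, p3_violations=5)
page_one_p2_failure = CheckpointResults(p1_violations=0, p2_violations=1, p3_violations=0)

print(conformance_level(page_many_p2_failures))  # prints "A"
print(conformance_level(page_one_p2_failure))    # prints "A"
```

Both hypothetical pages receive the same “A” rating although they differ widely in the number of priority 2 violations, which is the loss of resolution that motivates a quantitative metric.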
In both quality models, web accessibility is an attribute that should be measured in order to guarantee the quality of the product. Quantitative metrics are therefore essential for accurately measuring usability and accessibility properties, since both are needed for QA. In addition, ranking prototypes of a product according to their accessibility level is useful for assessing the impact of changes, updates in functionalities, etc. in any iterative development process, which again calls for quantitative metrics.

Considering Web Accessibility in Information Retrieval

Ivory et al. (2004) conducted a study with visually impaired users in order to determine the factors that would improve search engine results for those users. They concluded that some users would like to know additional details about search results, such as whether retrieved pages are accessible to them, and the paper recommends sorting results according to accessibility or usability criteria. Re-ranking results according to users’ visual abilities would improve their search experience. In this sense, Google has launched “Google Accessible Search”2, where results are ordered according to the criteria stated in its FAQ: “pages with few visual distractions and pages that are likely to render well with images turned off”. However, this is not a comprehensive approach, since only some guidelines for visually impaired users are considered and users with other types of disabilities are not taken into account at all. Arrue et al. (2008) propose a model that combines results provided by traditional IR systems, such as search engines for the WWW, with accurate accessibility measurement. Results are either re-ranked according to their accessibility level, or shown in the order provided by the search engine but labelled with their accessibility score. The former modality attaches importance to the accessibility of results, while the latter gives preference to the relevance of the results for a given query. In any case, these modalities can only be achieved if accessibility can be measured.

2 Available at http://labs.google.com/accessible/

Accessibility Monitoring

Once a web site has been developed, keeping track of the evolution of its accessibility level is of paramount importance since it may be subject to legal restrictions. Due to the nature of the WWW, updates in web sites are quite frequent and little control can be exerted on their content (for example, consider collaborative web sites like Flickr or a news web site like CNN.com). Since updates can decrease the accessibility level, some web sites find themselves in a limbo that can result in legal issues and possibly administrative fines. Therefore, monitoring the evolution of web accessibility requires metrics that are precise enough to avoid those circumstances. These metrics should be used for ranking purposes, so ordinal values may be enough. Accessibility monitoring processes may also be helpful for public institutions in order to keep track of the accessibility level of their e-government web sites. In addition, they may be useful for comparing the accessibility level of different web sites and creating ranking lists, as an accessibility observatory does.

Vigo et al.
(2007) report a study on the behaviour of the Web Accessibility Quantitative Metric (WAQM) when evaluating the accessibility of web sites with two different tools, EvalAccess and LIFT. Using automated tools for testing accessibility is seen by many as one of the most effective ways to cope with accessibility, since only by using tools (possibly in addition to human judgment) can the solutions outlined in the previous scenarios be made viable. In the study, 1363 web pages from 15 web sites were evaluated against WCAG 1.0 using both tools and measured using WAQM. The conclusion was that the values produced by WAQM on the basis of data obtained by the two tools are completely different, but strongly correlated; Spearman’s correlation test on the accessibility index produced by WAQM leads to ρ(1363)=0.719 with a high significance level (p