The BenToWeb Test Case Suites for the Web Content Accessibility Guidelines (WCAG) 2.0

Christophe Strobbe¹, Johannes Koch², Evangelos Vlachogiannis³, Reinhard Ruemer⁴, Carlos A. Velasco² and Jan Engelen¹

¹ Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, B-3001 Heverlee-Leuven {Christophe.Strobbe, Jan.Engelen}@esat.kuleuven.be
² Fraunhofer-Institut für Angewandte Informationstechnik (FIT), Schloss Birlinghoven, D-53757 Sankt Augustin, Germany {Johannes.Koch, Carlos.Velasco}@fit.fraunhofer.de
³ University of the Aegean, Voulgaraktonou 30, GR-11472 Exarchia, Athens, Greece [email protected]
⁴ University of Linz "integriert studieren - integrated study" (i3s3), Altenbergerstrasse 69, A-4040 Linz, Austria [email protected]

Abstract. This paper presents work carried out under the umbrella of the EU-funded project BenToWeb to develop XHTML test case suites for three drafts of WCAG 2.0 (June 2005, April 2006, May 2007). These suites of test cases demonstrate pass and failure examples for WCAG 2.0 and its accompanying Techniques document. The test cases were validated during the BenToWeb project and are currently being migrated to the WAI Test Sample Development Task Force, where the work will be continued.

Keywords: web accessibility, test case, test suite, Web Content Accessibility Guidelines, WCAG, BenToWeb.

1 Introduction

The Web Content Accessibility Guidelines 1.0 are a de facto standard for the accessibility of web sites: they have become the basis for legislation in various countries (for example, BITV in Germany¹), for evaluation methods and accessibility labels, and for many tools, ranging from browser toolbars² to enterprise-level tools that can be plugged into content management systems. However, WCAG 1.0 dates from 1999 and is meant to be succeeded by WCAG 2.0. As a consequence, many evaluation methodologies and tools will need to be updated. The BenToWeb project³ (September 2004-September 2007, co-funded by the European Commission) aimed to support this transition by developing new tools and test case suites for WCAG 2.0.

The suites of test cases developed by BenToWeb have several goals. First, they support the Web Accessibility Initiative’s (WAI) working groups in the development of support documents, such as technology-specific techniques, for WCAG 2.0. The test files show whether techniques can be implemented, and help uncover ambiguities and loopholes in the WCAG 2.0 documents. Second, test case suites can be used to benchmark accessibility evaluation and repair tools (ERT); previous attempts [1,8,9,10] did not have a validated basis for comparing such tools [13].

The BenToWeb test case suites were developed for three drafts of WCAG 2.0: June 2005 [2], April 2006 [3] and May 2007 [4]. They focus on XHTML and CSS, and use other technologies where necessary (JavaScript, images, audio, video, applets). The three versions of the test case suite are publicly available on the BenToWeb website⁴: each version has its own home page with a listing of the test cases; for each test case, there is a link to the XML metadata and an HTML view of the metadata; the HTML view of the metadata contains links to the test files. The third version of the test case suite is being migrated to the WAI Test Samples Development Task Force⁵, a joint task force of the Evaluation and Repair Tools Working Group (ERT WG) and the Web Content Accessibility Guidelines Working Group (WCAG WG), in which BenToWeb partners are also represented.

¹ Verordnung zur Schaffung barrierefreier Informationstechnik nach dem Behindertengleichstellungsgesetz (Barrierefreie Informationstechnik-Verordnung) (BITV): http://www.bmi.bund.de/cln_012/nn_122688/Internet/Content/Gesetze/B/BITV.html
² See for example the Web Accessibility Tools Consortium (WAT-C): http://www.wat-c.org/
³ http://www.bentoweb.org/

2 Test Suite Development

2.1 The Structure of a Test Case

For each WCAG 2.0 success criterion, BenToWeb created at least two test cases. In the XHTML test suites, a test case is an XHTML file (sometimes multiple files) that either passes or fails the success criterion, and an accompanying test case description that contains all relevant metadata about the XHTML file. These metadata are specified in an XML vocabulary specially developed for this purpose: Test Case Description Language (TCDL) [14]⁶. For each individual test case, the metadata specify the following:
• formal metadata, such as author, date, title and a short description;
• technologies used in the test case (HTML, XHTML, CSS, ECMAScript, etcetera), possibly identifying the specific features of these technologies that are used;
• test mode (that is, validation through end-user testing, expert evaluation or tool-based evaluation) and test scenarios for end-user evaluation;
• the “rules” (that is, WCAG 2.0 success criteria) that are tested, possibly with references to relevant WCAG 2.0 techniques or failures;
• namespace mappings for XPath expressions used elsewhere in the metadata file.

⁴ http://www.bentoweb.org/ts
⁵ http://www.w3.org/WAI/ER/tests/Overview
⁶ See also the RDDL file at http://www.bentoweb.org/refs/TCDL1.1/ for more technical details.
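To make the five groups of metadata listed above more concrete, the following Python sketch models them as plain data classes. It is only an illustration: the class names, field names, identifier and example values are assumptions made for this sketch and are not the element names defined by TCDL.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class EvaluationMode(Enum):
    """The three test modes named in the text."""
    END_USER = "end-user testing"
    EXPERT = "expert evaluation"
    TOOL = "tool-based evaluation"


@dataclass
class RuleReference:
    """A tested "rule": a success criterion plus optional technique references."""
    success_criterion: str                      # e.g. "1.3.1"
    expected_result: str                        # "pass" or "fail"
    techniques: List[str] = field(default_factory=list)


@dataclass
class CaseDescription:
    """Hypothetical in-memory view of one test case description."""
    identifier: str                             # formal metadata
    title: str
    author: str
    date: str
    description: str
    technologies: List[str]                     # e.g. ["XHTML", "CSS"]
    modes: List[EvaluationMode]                 # how the case is validated
    rules: List[RuleReference]                  # success criteria under test
    scenarios: List[str] = field(default_factory=list)        # end-user scenarios
    namespaces: Dict[str, str] = field(default_factory=dict)  # prefix -> URI for XPath

    def needs_end_user_evaluation(self) -> bool:
        return EvaluationMode.END_USER in self.modes


# Invented example values, for illustration only.
example = CaseDescription(
    identifier="example-001",
    title="Large bold text used instead of a heading element",
    author="A. Author",
    date="2007-05-30",
    description="A page whose section titles are styled paragraphs.",
    technologies=["XHTML", "CSS"],
    modes=[EvaluationMode.EXPERT],
    rules=[RuleReference("1.3.1", "fail")],
    namespaces={"xhtml": "http://www.w3.org/1999/xhtml"},
)
```

A structure of this kind keeps the success criterion and the expected pass or fail result together, which is what later makes it possible to count passing and failing test cases per success criterion.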

2.2 The Life-Cycle of a Test Case

The test case author first creates the test file or files and writes the metadata sections for formal metadata, technologies, “rules” and namespace mappings; he or she can also make a proposal for the test section. A member of the evaluation team then reviews the test case and adds or completes the test-related information. The reviewer uses an expert system (“Parsifal”) to edit test-related sections and to add and translate test scenarios. At this stage, he or she can send the test case back to the author and request changes. Test cases that require end-user evaluation are loaded into the web-based user-testing framework (“Amfortas”) and presented to end users with a matching user profile. After end-user testing, the test results are summarized and fed back to the development team. (The tools used in the evaluation process of the BenToWeb test suites were discussed in [7]. Amfortas, the web-based back-end of the user testing framework, will be made available as open source code.)

2.3 The Evolution of the Test Case Suites

The BenToWeb test case suites were developed for three drafts of WCAG 2.0: June 2005 [2], April 2006 [3] and May 2007 [4]. The first version contained 477 test cases (for a WCAG 2.0 draft with 67 success criteria), the second version contained 602 test cases (for 56 success criteria), and the third version 609 test cases (for 56 success criteria). After the publication of the last two drafts, the test cases needed to be remapped from the success criteria in the preceding draft to those in the newer draft. The remapping of test cases took one of the following forms:
• if several success criteria were folded into a single success criterion, the test cases were renamed to match the single resulting success criterion;
• if a success criterion was split into several success criteria, the test cases were divided over the resulting success criteria depending on the purpose of the test cases;
• if a success criterion was removed, the corresponding test cases were deleted, unless they could be remapped to another relevant success criterion;
• if a success criterion was removed or changed (including a level change), the test cases were renamed to match the resulting success criterion;
• if a success criterion was added, it initially had no test cases.

The remapping always reduced the number of test cases: from 477 to just over 450 during the first migration, and from 602 to 480 during the second migration. Each of the remaining test cases needed to be reviewed for relevance to the matching success criterion, and for accuracy of the test purpose and the fail/pass statement if the success criterion had changed. After this review, new test cases were created. All test cases then went through the review process outlined above (and in chapter 4 of [12]).
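The remapping rules above can be sketched in a few lines of Python. This is a simplified illustration, not the procedure or tooling used in BenToWeb: the `Case` type, the `remap` function and the criterion labels ("A1", "B1", and so on) are hypothetical. Only the mechanical cases are automated; test cases of split or removed success criteria are set aside for the human decisions described in the text.

```python
from typing import Dict, List, NamedTuple, Tuple


class Case(NamedTuple):
    name: str
    criterion: str  # success criterion in the preceding draft


def remap(
    cases: List[Case], mapping: Dict[str, List[str]]
) -> Tuple[List[Case], List[Case], List[Case]]:
    """Split cases into (remapped, needs_manual_split, deletion_candidates).

    `mapping` lists, for each old criterion that changed, the criteria that
    replace it in the newer draft: one target for a fold or a rename, several
    targets for a split, and an empty list for a removed criterion.
    Criteria absent from `mapping` are treated as unchanged.
    """
    remapped: List[Case] = []
    to_split: List[Case] = []
    to_delete: List[Case] = []
    for case in cases:
        targets = mapping.get(case.criterion, [case.criterion])
        if len(targets) == 1:
            # Fold, rename or unchanged: point the case at the single target.
            remapped.append(case._replace(criterion=targets[0]))
        elif not targets:
            # Removed criterion: delete unless a reviewer finds another home.
            to_delete.append(case)
        else:
            # Split criterion: the target depends on the purpose of the case.
            to_split.append(case)
    return remapped, to_split, to_delete


# Hypothetical labels: A* stand for the older draft, B* for the newer one.
mapping = {"A1": ["B1"], "A2": ["B1"], "A3": ["B2", "B3"], "A4": []}
cases = [Case("tc1", "A1"), Case("tc2", "A2"), Case("tc3", "A3"),
         Case("tc4", "A4"), Case("tc5", "A5")]
remapped, to_split, to_delete = remap(cases, mapping)
```

Newly added success criteria simply have no entry among the remapped cases, which matches the observation that they initially had no test cases.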

2.4 Coverage of WCAG 2.0 and “Techniques and Failures for WCAG 2.0”

For each WCAG 2.0 success criterion, the XHTML test suite contains at least one test case that fails and at least one test case that passes. Some success criteria, however, have more than fifty test cases. The variability in the number of test cases per success criterion is often related to the number of XHTML elements or attributes that can be used to pass or fail a success criterion: for example, this number is much higher for Success Criterion 1.3.1 (“Information and relationships conveyed through presentation can be programmatically determined or are available in text, and notification of changes to these is available to user agents, including assistive technologies.” [4]) than for Success Criterion 3.1.4 (“A mechanism for finding the expanded form or meaning of abbreviations is available.” [4]). On the one hand, HTML contains many presentational elements that can be abused to suggest structure (for example large font size for headings) and semantic elements that can be abused for their presentational effect (for example, the blockquote element for indented text), hence the large number of test cases for Success Criterion 1.3.1. On the other hand, HTML has only two elements for marking up abbreviations and acronyms, hence the lower number of test cases for Success Criterion 3.1.4.

In the second version of the test case suite, authors added references to relevant WCAG 2.0 techniques and failures; in the third test case suite, this was done systematically and in a machine-readable way. This allowed the project to monitor coverage of the techniques and failures for WCAG 2.0 [5]. In the last version of the test case suite (for the May 2007 working draft), 370 out of 609 test cases contain a mapping to one or more WCAG 2.0 techniques or failures, covering 117 out of the 220 techniques and failures for WCAG 2.0.
Not all test cases can be mapped to a WCAG 2.0 technique or failure because many test cases illustrate techniques or failures that are not covered in the WCAG documents. “Techniques for WCAG 2.0” [5] does not aim to be exhaustive; in fact, the version of May 2007 has no failures for 29 out of the 56 success criteria [17]. During its work, the development team ran into several issues, which were reported back to the WCAG Working Group (see examples in [13] and [15]).
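The coverage figures reported in this section (370 of 609 test cases, 117 of 220 techniques and failures) correspond to roughly 61% and 53%. The sketch below shows how such figures could be derived from machine-readable references; the function names, test case names and technique identifiers ("TECH-A", "FAIL-A") are placeholders invented for this illustration, not entries from the WCAG documents or the BenToWeb suite.

```python
from typing import Dict, Iterable, List, Tuple


def percentage(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage rounded to one decimal."""
    return round(100 * part / whole, 1)


def technique_coverage(
    case_refs: Dict[str, List[str]], catalogue: Iterable[str]
) -> Tuple[int, List[str], List[str]]:
    """Return (number of mapped test cases, covered items, uncovered items).

    `case_refs` maps a test case name to the techniques or failures it
    references; `catalogue` is the full list of techniques and failures.
    """
    known = set(catalogue)
    mapped_cases = sum(1 for refs in case_refs.values() if refs)
    referenced = {ref for refs in case_refs.values() for ref in refs}
    covered = sorted(referenced & known)
    uncovered = sorted(known - referenced)
    return mapped_cases, covered, uncovered


# Placeholder data, for illustration only.
catalogue = ["TECH-A", "TECH-B", "TECH-C", "FAIL-A"]
case_refs = {
    "tc-001": ["TECH-A"],
    "tc-002": ["TECH-A", "FAIL-A"],
    "tc-003": [],  # illustrates something the catalogue does not cover
}
mapped_cases, covered, uncovered = technique_coverage(case_refs, catalogue)

# The figures reported for the May 2007 suite:
share_of_mapped_cases = percentage(370, 609)
share_of_covered_items = percentage(117, 220)
```

The list of uncovered items is the more useful output in practice, since it points to the techniques and failures for which test cases are still missing.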

3 Guidelines for the Development of Test Suites

The experience with developing and evaluating test case suites gathered during the BenToWeb project also resulted in a set of guidelines [16]. These guidelines cover test case metadata, types of test cases, scope of a test suite, criteria for completeness of a test suite, the validation process and aspects of end-user evaluations.

3.1 Guidelines Regarding Scope and Completeness of the Test Case Suite

It is important to define the scope of the test case suite, in other words, what exactly it is meant to cover. A test case suite for a set of accessibility guidelines can contain different types of test cases, depending on the number of files and technologies (HTML, CSS, JavaScript, etcetera) used in a test case, and on the number of success criteria or checkpoints it refers to. When developing an HTML or XHTML test case suite, it is important to define if and how other technologies such as CSS, JavaScript, video and audio can be used. Since there are always multiple success criteria that can be made relevant to every test case, it is also important to define how this will be handled. For example, TCDL 1.1 allows a distinction between “primary” success criteria and “secondary” success criteria: the primary success criteria were those against which test cases were reviewed and evaluated, whereas references to secondary success criteria are optional and merely informational. It is also necessary to define which human languages will be covered, especially for those sections in the accessibility guidelines that address language issues, such as Guideline 3.1 in WCAG 2.0. The choice of languages also has implications for the validation process, especially if this process involves end-user testing.

There are several ways to define “completeness” of a test suite. A test case suite for WCAG 2.0 can be complete or incomplete with reference to different criteria, for example:
• for each success criterion, there must be a test case that covers each relevant feature of HTML, with at least one “fail” and one “pass” example; or
• for each WCAG 2.0 technique or failure that applies to HTML, there must be at least one test case.
A test suite can be complete in the former sense without being complete in the latter, and vice versa.

3.2 Guidelines Regarding the Validation of Test Cases

A validation or review process defines the phases that a test case needs to go through before it is finally accepted as a valid test case. This process defines several states, and input and output channels for each state. The validation process needs to be shown to be complete and unambiguous. It is good practice to test the validation process by running a few example test cases through the complete process. (For BenToWeb’s validation process, see chapter 4 in [12].)

An evaluation process can also involve end users. There may be several reasons for doing this, for example: comparing test cases that use different techniques to address the same accessibility issue or WCAG success criterion, evaluating test cases that are too “exotic” to be covered in a technical specification, or evaluating test cases that may lead to different outcomes depending on the support by user agents and assistive technologies. The decision to involve end users affects the metadata format and the tools that are used to develop the test cases. For example, in BenToWeb, the metadata format TCDL [14] defined several types of questions (yes-no questions, open questions, Likert scale, …) that can be presented to users as part of a “scenario”; it allowed test case validators to define the disabilities and the combinations of user agents and assistive technologies that are needed to go through the scenario; and the validation process defined a state (tracked in TCDL) in which feedback from end-user testing could be integrated into the test case.
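One way to check that a validation process is "complete and unambiguous" is to write its states and transitions down as data and test them mechanically. The sketch below does this for an invented workflow: the state names, event names and the `check_process` function are assumptions made for illustration and do not reproduce the states of BenToWeb's actual validation process (see [12] for that). Here "unambiguous" is read as "no state reacts to the same event in two different ways", and "complete" as "every state is reachable and every non-final state has a way forward".

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Transition = Tuple[str, str, str]  # (state, event, next_state)


def check_process(
    transitions: List[Transition], start: str, final: Set[str]
) -> List[str]:
    """Return a list of problems found in a review workflow (empty if none)."""
    problems: List[str] = []

    # Unambiguous: a (state, event) pair must lead to exactly one next state.
    table: Dict[Tuple[str, str], str] = {}
    for state, event, target in transitions:
        key = (state, event)
        if key in table and table[key] != target:
            problems.append(f"ambiguous: {state!r} on {event!r}")
        table[key] = target

    # Build the successor sets of every state that is mentioned anywhere.
    outgoing: Dict[str, Set[str]] = {}
    for state, _event, target in transitions:
        outgoing.setdefault(state, set()).add(target)
        outgoing.setdefault(target, set())

    # Breadth-first search from the start state.
    seen = {start}
    queue = deque([start])
    while queue:
        for nxt in outgoing.get(queue.popleft(), ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)

    # Complete: no unreachable states, no dead ends outside the final states.
    for state in sorted(outgoing):
        if state not in seen:
            problems.append(f"unreachable: {state!r}")
        elif state not in final and not outgoing[state]:
            problems.append(f"dead end: {state!r}")
    return problems


# Invented workflow, loosely modelled on an author/reviewer/end-user cycle.
workflow = [
    ("draft", "submit", "in review"),
    ("in review", "request changes", "draft"),
    ("in review", "needs user testing", "user testing"),
    ("in review", "accept", "accepted"),
    ("in review", "reject", "rejected"),
    ("user testing", "results integrated", "in review"),
]
ok = check_process(workflow, "draft", {"accepted", "rejected"})
broken = check_process(
    workflow + [("in review", "accept", "archived")],
    "draft",
    {"accepted", "rejected"},
)
```

A check of this kind complements, but does not replace, the practice of running a few example test cases through the complete process.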

3.3 Guidelines Regarding Metadata

Metadata, regardless of how they are stored (in a database, as XML, as RDF, etcetera), need to support the development and validation process of the test suite. A good metadata format will support a number of important tasks such as:
• monitoring the completeness of a test suite,
• tracking the status of test cases,
• evolving the metadata format to support changing needs or to better support existing needs.

The W3C Quality Assurance Working Group has published a note that defines 14 metadata elements [6]. BenToWeb’s TCDL 1.1 implements all but one of these elements, but also goes beyond that set because of additional requirements such as end-user testing. It is also important to define the purpose and meaning of each metadata element. For example, BenToWeb’s TCDL format uses elements such as ‘title’, ‘description’ and ‘purpose’ and defines the following rules for these elements.
• The title should be sufficiently descriptive to allow a quick identification of the issue illustrated by the test case. It does not need to be unique. It must not be formulated as a guideline (for example: “All data tables must have a summary”).
• The description contains a summary of the test materials and how they are to be used in the test case. It indicates to the accessibility expert/scenario author what to expect and what will happen when a user interacts with the test materials. The importance of accurate descriptions cannot be overstated, because bad descriptions cause many requests for clarification, invalid scenarios and other problems that slow down the validation process.
• The purpose contains a description of the intention of the test materials. The purpose contains an explanation of the expected evaluation result in regard to the relevant rules, checkpoints or success criteria. It does not need to repeat the “rule” or success criterion for which the test case is developed.

When a metadata format needs to support the development of several test case suites or multiple versions of a test case suite, one should expect the requirements of the metadata format to evolve. If the metadata format is defined in XML, one will need to consider the following questions.
• Should the schema (DTD, XML Schema or other format) be updated or should it be replaced with a new one?
• If the schema does not need to change drastically and is just updated, what is the impact on existing metadata, both those in the current test case suite(s) and in previous test case suite(s)?
• If the schema is replaced with a new one, will metadata for previous test case suites be migrated to the new format or not?
• What is the impact of the changes on the tools (including XSLT) that support the metadata format?

4 Future of the Test Case Suites

In order to support the WCAG Working Group and the transfer of test cases from BenToWeb to the Web Accessibility Initiative, it was necessary to set up a structure that enables this transfer of materials. In 2006, the Web Content Accessibility Guidelines Working Group (WCAG WG) and the Evaluation and Repair Tools Working Group (ERT WG) set up a joint task force, the WCAG 2.0 Test Samples Development Task Force (TSD TF). The objective of this task force is “to develop test samples for WCAG 2.0 Techniques (content examples that demonstrate correct or incorrect implementation of WCAG 2.0 Techniques)” [19]. The task force started meeting in July 2006, with considerable participation by BenToWeb partners.

Before starting work on test cases (or “test samples”), the task force needed to define a metadata format, a review process and a test case management system. BenToWeb’s TCDL 1.1 was adapted for this purpose by removing conventions and values specific to BenToWeb, incorporating more Dublin Core metadata and EARL pointers (HTTP Vocabulary in RDF [11]), and by adding more support for internationalization and extensibility. This resulted in TCDL 2.0 [18], a subset of which is being used by the task force [21]. The task force has also defined a review process [20].

Before the publication of a new working draft of WCAG 2.0 on 11 December 2007, participants of the BenToWeb project migrated almost 200 test cases to the task force’s repository. The publication of newer working drafts of WCAG 2.0 affects the migration process of the remaining test cases, and requires co-ordination between the task force and the WCAG WG in order to prioritize the migration of test cases for those success criteria that are most stable. The test samples may later also help in the identification of “accessibility supported technologies”, a concept introduced in the 17 May 2007 draft of WCAG 2.0 as a replacement for “baseline”.

Acknowledgments. This work was undertaken in the framework of the project BenToWeb (IST-2-004275-STP), funded by the IST Programme of the European Commission.

References

1. Brajnik, G.: “Comparing accessibility evaluation tools: a method for tool effectiveness.” Univ Access Inf Soc 3: pp. 252-263 (2004). DOI: 10.1007/s10209-004-0105-y
2. Caldwell, B., Chisholm, W., Slatin, J., Vanderheiden, G., White, J. (eds.): “Web Content Accessibility Guidelines 2.0. W3C Working Draft 30 June 2005.” (2005) http://www.w3.org/TR/2005/WD-WCAG20-20050630/
3. Caldwell, B., Chisholm, W., Slatin, J., Vanderheiden, G. (eds.): “Web Content Accessibility Guidelines 2.0. W3C Working Draft 27 April 2006.” (2006) http://www.w3.org/TR/2006/WD-WCAG20-20060427/
4. Caldwell, B., Cooper, M., Guarino Reid, L., Vanderheiden, G. (eds.): “Web Content Accessibility Guidelines 2.0. W3C Working Draft 17 May 2007.” (2007) http://www.w3.org/TR/2007/WD-WCAG20-20070517/
5. Caldwell, B., Cooper, M., Guarino Reid, L., Vanderheiden, G. (eds.): “Techniques for WCAG 2.0: Techniques and Failures for Web Content Accessibility Guidelines 2.0. W3C Working Draft 17 May 2007.” (2007) http://www.w3.org/TR/2007/WD-WCAG20-TECHS-20070517/
6. Curran, P., Dubost, K. (eds.): “Test Metadata. W3C Working Group Note 14 September 2005.” (2005) http://www.w3.org/TR/2005/NOTE-test-metadata-20050914/
7. Herramhof, S., Petrie, H., Strobbe, C., Vlachogiannis, E., Weimann, K., Weber, G., Velasco, C. A.: “Test Case Management Tools for Accessibility Testing.” In: Miesenberger K. et al (eds). ICCHP 2006, LNCS, vol. 4061, pp. 215-222. Springer, Heidelberg (2006)
8. Ivory, M. Y., Sinha, R. R., Hearst, M. A.: “Empirically validated web page design metrics.” Proceedings of the Conference on Human Factors in Computing Systems (Seattle, WA, March), pp. 53-60. ACM Press, New York, NY (2001)
9. Ivory, M. Y., Chevalier, A.: “A Study of Automated Web Site Evaluation Tools.” Technical Report UW-CSE-02-10-01. University of Washington, Department of Computer Science and Engineering (2002). ftp://ftp.cs.washington.edu/tr/2002/10/UW-CSE-02-10-01.pdf
10. Ivory-Ndiaye, M. Y.: “An Empirical Approach to Automated Web Site Evaluation.” Journal of Digital Information Management, 1 (2), June 2003, pp. 75-102.
11. Koch, J., Velasco, C. A., Abou-Zahra, S. (eds.): “HTTP Vocabulary in RDF. W3C Working Draft Note 23 March 2007.” (2007) http://www.w3.org/TR/2007/WD-HTTP-in-RDF-20070323/
12. Petrie, H. (ed.): “Evaluation and Validation Report for Test Suite (Second Iteration).” BenToWeb deliverable D 3.7b (2007). http://webcc.fit.fraunhofer.de/downloads/projects/bentoweb/deliverables/BenToWeb_D3.7b_rev.pdf
13. Strobbe, C., Herramhof, S., Vlachogiannis, E., Koch, J., Velasco, C. A.: “The BenToWeb XHTML Test Suite for the Web Content Accessibility Guidelines 2.0.” In: Miesenberger K. et al (eds). ICCHP 2006, LNCS, vol. 4061, pp. 172-175. Springer, Heidelberg (2006)
14. Strobbe, C., Herramhof, S., Vlachogiannis, E., Velasco, C. A.: “Test Case Description Language (TCDL): Test Case Metadata for Conformance Evaluation.” In: Miesenberger K. et al (eds). ICCHP 2006, LNCS, vol. 4061, pp. 164-171. Springer, Heidelberg (2006)
15. Strobbe, C., Engelen, J., Koch, J., Velasco, C. A., Vlachogiannis, E., Ortner, D.: “The BenToWeb XHTML 1.0 Test Suite for the Web Content Accessibility Guidelines 2.0 - Last Call Working Draft.” HCI International 2007, LNCS, vol. 4556, pp. 160-166. Springer, Heidelberg (2007)
16. Strobbe, C. (ed.): “Guidelines for the Development of Test Suites.” BenToWeb deliverable D4.6 (2007) http://webcc.fit.fraunhofer.de/downloads/projects/bentoweb/deliverables/BenToWeb_D4.6.pdf
17. Strobbe, C.: “WCAG 2.0 success criteria without failures.” http://lists.w3.org/Archives/Public/public-wai-ert-tsdtf/2007Dec/0019.html
18. Strobbe, C. (ed.): “Test Case Description Language 2.0: Specification and Guide.” http://www.bentoweb.org/refs/TCDL2.0.html
19. “WCAG 2.0 Test Samples Development Task Force (TSD TF) Work Statement.” (2006) http://www.w3.org/WAI/ER/2006/tests/tests-tf
20. “WCAG 2.0 Test Samples Development Task Force (TSD TF) Review Process.” http://www.w3.org/WAI/ER/tests/process
21. “WCAG 2.0 Test Samples Metadata.” http://www.w3.org/WAI/ER/tests/usingTCDL