Using Quantitative Usability Goals in the Design of a User Interface for a Cellular Phone

Timo Jokela*, Jani Pirkola
Nokia Mobile Phones, P.O. Box 50, FIN-90570 Oulu, Finland
(timo.jokela, jani.pirkola)@nokia.com
* Also: University of Oulu, Dept. of Information Processing Science

We experimented with the use of quantitative usability goals in a project where we designed a user interface concept for a mobile phone. A special challenge was that the usability evaluation had to be done at an early milestone of the project, at a stage where no working prototypes were available, so that the set of keys and the type of display could be decided early. We used two usability attributes (average efficiency, overall usability). We defined the usability goals relative to a reference product based on these two attributes, and measured the usability of our design at the end of the project through expert evaluation and keystroke-level analysis. The experience was mainly positive. The use of quantitative usability goals gave clear direction to the work in the project, and the evaluation of our design was relatively easy and fast. The evaluation results indicate that we met the targets well. The main open question is the validity of the evaluation results. Even if we feel that we achieved good usability in our design, we can be sure of that only after the design is evaluated in true usability tests with working products.

Keywords: usability, case study, measurements, efficiency

1. INTRODUCTION

Small size is characteristic of the user interface of a cellular phone. Small size means tough restrictions on the design of the user interface: the display is small, and the means for input are limited. Another characteristic of cellular phones is that they are mechanical devices. Because phones are manufactured in very large quantities, the requirements for the quality of the mechanical design are very high. In order to allow time for industrial and mechanical design, one has to make a decision about two elementary components of the user interface - the set of keys and the type of display - very early in the development life cycle.

Especially the early decision about the keys makes a difference in the development process compared with desktop software systems. A design issue that typically comes up in the detailed design phase of user interface development is something that needs to be decided very early in the development of a cellular phone. This need for early decision making sets special requirements for the user interface development process. The decision about the set of keys and the type of display must be made at an early milestone of the development project, when neither working prototypes nor software are available. In a typical case, we do not even have time to build mock-up simulations that run on a computer. Thus, the challenge is that we need to make a decision about fundamental characteristics of the user interface concept at a phase of development where no extensive usability tests can be made in the traditional way.

It is generally regarded as good quality and project management practice to have quantitative goals for product characteristics (e.g. Wesner, 1994). Clear goals give a project a clear direction. It is also recommended to use quantitative goals for usability. For example, the evolving standard (ISO 13407, 1999) states that one should specify a 'clear statement of human-centred design goals and provide measurable benchmarks in the requirements phase'. Traditionally, usability has not been among the quantitative goals in product development programmes at Nokia Mobile Phones. Either it has not been regarded as relevant to define usability goals, or it has been found too difficult to do so. In a previous user interface development project, the usability goals were defined qualitatively. The experience, however, was negative: the evaluation of the design was found very problematic. Qualitative goal setting meant that the evaluation was, in practice, based on subjective opinions.

Consequently, this time there was a clear demand from the management board of the project for quantitative usability evaluation to support decision making: "There need to be quantitative quality goals for the project, to give the project the right direction and to make an objective evaluation of the quality of the concept feasible." In this paper, we describe how we used quantitative usability goals as part of this kind of development process. We discuss how we defined quantitative usability goals for the milestone at which the decision about the set of keys and display had to be made: how we defined the usability attributes and the goals based on these attributes, how the measuring was done, and what lessons were learnt.

2. DEFINING THE USABILITY ATTRIBUTES

2.1 Requirements for the Usability Attributes

As the first step, we needed to define the relevant usability attributes and how to state the usability goals unambiguously. We had to select a set of appropriate attributes from the many potential ones. For example, Nielsen (1993) proposes five main usability attributes: learnability, efficiency, errors, memorability, and satisfaction.

The main restriction was that the goals should be at a high enough level. This was a requirement from the management board of the project. Typically, five quantitative goals are set for a project. As usability goals are not the only goals in a project, the number of usability goals was limited to fewer than four. In addition, there were a number of factors that we had to take into account when selecting the attributes:

• A large number of different end user tasks is characteristic of cellular phones. For illustration, one can identify many tasks related to the basic phone functions alone:
- calling by entering digits manually
- calling a person from the phone book
- using quick dialling to call
- inserting a new entry into the phone book
- editing a name or number in the phone book
- answering a call
- answering a waiting call
- swapping between two calls.
In addition, when we take into account all the other applications that are typical of an advanced cellular phone - e.g. voice messaging, text messaging, different settings (ringing tone, volume) and modes of operation, call transfer and forwarding functions, calendar, the new data applications - the number of tasks is very high. The high number of tasks means that measuring usability cannot be based on a limited set of tasks only; that would not give valid data about the total usability of the phone.

• Another characteristic of cellular phones is that they are products that are used frequently, many times a day. This means that efficiency in everyday use is a critical usability attribute. This aspect should be taken into account when selecting the attributes.

• The usability attributes should be such that they can be practically measured early in the life cycle. We should be able to have usability measured in a situation where we do not have a functioning prototype of the product. The attributes should be defined so that measuring at this stage is feasible.

• The goals should be relevant to the business case. They should direct the design of the UI in the right way to support the business. This means that special attention should be paid to applications that are critical business-wise.

2.2 Selection of Attributes

In the selection of attributes, our first decision was to set the attributes in relation to a reference product. This made the goal setting easier, and the measuring itself became more practical, as it was not necessary to have any previous benchmark data. After a number of brainstorming sessions and discussions with the management, we came up with the following set of attributes:

1. Average Efficiency. Efficiency is a usability attribute that is relevant to the use of a phone in daily life. Our strategy for setting the efficiency goal was as follows:
• Identifying a representative set of time-critical user tasks. Such tasks are, for example, those that users would perform frequently in their daily life.
• Comparing the efficiency of the new and the reference user interface for each of these tasks.
• Using the average efficiency increase as the quantitative usability attribute.
We planned to measure efficiency using a simple version of the keystroke-level analysis method (Card, 1983); a sketch of this kind of calculation is shown at the end of this section.

2. Overall Usability. This attribute describes the overall design of the user interface: how quickly users learn to use the phone; whether the design metaphors and paradigms are of good quality and the use of keys is logical; whether the style remains consistent across the different applications; and whether users avoid errors (and, if they do make them, can recover from them). There was also a practical reason for selecting this attribute: the way we planned to do the evaluation. Three usability experts (expert teams) would carry out expert evaluations independently and give quantitative scores based on their overall view, using a subjective scale of 1 … 7. The new UI concept and the reference concept would be evaluated on the same scale.

With these two different attributes, we hoped to get a balanced picture of the total usability. Different attributes and different evaluators would bring credibility to the evaluation results, especially if their conclusions were in line with each other.

At the stage of goal setting, we had only preliminary ideas of how to do the measuring. The exact way of carrying out the measuring was worked out later.
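
To illustrate the kind of keystroke-level calculation we had in mind, the sketch below estimates task times from sequences of KLM operators. The operator times are the standard estimates from Card, Moran and Newell (1983); the task sequences themselves are hypothetical examples, not the project's actual tasks.

```python
# Minimal keystroke-level model (KLM) sketch. Operator times follow
# Card, Moran & Newell (1983); the task sequences are hypothetical.
OPERATOR_TIMES = {
    "K": 0.28,  # one key press (typical user; 0.08-1.20 depending on skill)
    "M": 1.35,  # mental preparation
    "H": 0.40,  # homing the hand on the device
}

def klm_time(operators: str) -> float:
    """Estimate task time in seconds from a string of KLM operators."""
    return sum(OPERATOR_TIMES[op] for op in operators)

# Hypothetical comparison of one task on the two user interfaces:
new_concept = "M" + "K" * 3   # e.g. one mental step, three key presses
reference = "M" + "K" * 6     # e.g. a longer key sequence
print(f"new: {klm_time(new_concept):.2f} s, "
      f"reference: {klm_time(reference):.2f} s")
```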

3. SETTING USABILITY GOALS

In goal setting, we identified two categories of applications (sets of functions that are in the phone). The focus applications were those where a clear improvement in usability was seen as strategically important for the business; the baseline applications were those where usability is important but where a more modest increase in usability was acceptable. As a result, we identified four usability attributes to be measured: efficiency and overall usability for each of the two categories of applications, focus and baseline.

Setting quantitative goals for these attributes was more guesswork than science. The goals were set together with the management on the basis of what we thought was realistic; it was, after all, the first time this kind of goal setting was done. We naturally set tougher usability goals for the focus applications than for the baseline applications. As a result, we had the following goals:

Applications   Increase in Overall Usability (scale 1 .. 7)   Increase in Average Efficiency
Focus          +2 points                                      +30 %
Baseline       +1 point                                       +10 %

Table 1. Improvement goals in contrast with a reference product. (Note: the figures are for illustration purposes only.)

The goals were to be measured against a reference product, which - again - was agreed together with the management.
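
As an illustration of how goals of this form can be checked once measurements are in, the sketch below encodes the illustrative figures of Table 1 and compares measured gains over the reference against them. All names and numbers here are our own illustration, not project data.

```python
# Hypothetical encoding of the illustrative goals in Table 1:
# required gains over the reference product, per application category.
GOALS = {
    # category: (min. overall-usability gain on the 1..7 scale,
    #            min. average-efficiency gain as a fraction)
    "focus": (2.0, 0.30),
    "baseline": (1.0, 0.10),
}

def goals_met(category: str, usability_gain: float,
              efficiency_gain: float) -> bool:
    """Check measured gains over the reference against a category's goals."""
    min_usability, min_efficiency = GOALS[category]
    return usability_gain >= min_usability and efficiency_gain >= min_efficiency

# Made-up measurement results:
print(goals_met("focus", usability_gain=2.3, efficiency_gain=0.35))  # True
```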

4. PERFORMING THE EVALUATION

After the goal setting, we carried out several design iterations. We produced different concepts and made qualitative usability evaluations, mainly through paper prototyping. We were able to experiment with different designs across the many different applications that would run on a cellular phone. We finally selected one of the several user interface concepts we had experimented with as the candidate for evaluation at the milestone. At this point, we did not have any quantitative data about the quality of the concept; the selection was based on the experience and vision we had gained by exploring the different alternatives.

We prepared a set of scenarios of how the user interface would behave in different situations and with different applications, and documented them in PowerPoint slides. In the short time we had, we did not even aim for a thorough specification and documentation of the user interface. However, we 'knew' in our minds the details of our design: how it would work in the different applications and situations. The evaluation had to be carried out with these limitations: we had only limited documentation, but the knowledge the team had was much broader. Our strategy was to use the team members as 'prototypes' for the evaluators, through presentations, answering questions, and so on.

4.1 Measuring Efficiency

To keep the calculations practical, we decided to measure efficiency using the following procedure (a sketch of the computation follows at the end of this subsection):
1. Define a representative set of efficiency-critical tasks.
2. For each task:
- if our concept is more efficient, the score is 1
- if equal, the score is 0
- if worse, the score is -1.
3. The total efficiency is (sum of scores / number of tasks) * 100%.

Following this procedure, we identified a representative set of efficiency-critical tasks for each of the two categories. This was done together with representatives of product marketing and application specialists. The main criteria for the selection of the efficiency-critical tasks were:
• tasks that are done frequently
• tasks that need to be done quickly.

As a result, we ended up with a set of tasks for each category, each containing 20 … 30 tasks. Using the procedure described above, measuring efficiency per task was rather straightforward: it was easy to determine which of the two user interfaces provided the more efficient way of accomplishing a task. No detailed keystroke-level analysis was required. The total effort of the measuring was a couple of days. A usability expert (not belonging to the project) did the comparison together with the key designers of the project. Where our design was not documented, the members of the design team explained how a specific task would be done with it; for the reference concept, detailed knowledge of its functionality already existed.

4.2 Measuring Overall Usability

We organised presentation sessions where our user interface and the reference concept were presented to the evaluators. We briefed the evaluators about our business case: what the drivers are, what the target user segment is, and what the key applications are. As we did not have a running prototype, we presented the behaviour through slides and by answering the evaluators' questions. We asked the evaluators to score the usability of both concepts on a subjective scale:
1: poor, would not recommend to anybody
…
7: excellent, would recommend to a colleague
The scoring was asked for separately for the focus applications and the baseline applications. In addition, we asked the evaluators to give qualitative comments. While one reason was to get feedback for further improving our design, a very essential reason was to gain confidence in the validity of the evaluation: if the comments were in line with each other, it would confirm the validity of the results.

5. RESULTS

Even though the drivers and goals of the project were well known to the project team, there was a bit of excitement in waiting for the results. The results, when they came out, indicated clearly that we had been successful: there was improvement in all of the areas. The efficiency measures were very clear: we got an exact score for how much better our design was compared with the reference. The evaluators also produced a table showing all the tasks that were gone through and the score for each task. The measures of overall usability were very much in line with each other, even though the evaluators did the work independently, without any mutual communication. An interesting point was that they used different usability evaluation methods: two teams used heuristic evaluation (Nielsen, 1994) while one team used SUS (Brooke, 1995). In addition, the qualitative feedback from the evaluators was very similar. This makes us believe that the results for 'overall usability' are rather valid.
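
As background for readers unfamiliar with SUS: the score is computed from ten five-point items using the published formula (Brooke, 1995). The sketch below applies that formula to made-up responses.

```python
# Standard SUS scoring (Brooke, 1995): ten items answered 1..5.
# Odd-numbered items contribute (response - 1); even-numbered items
# contribute (5 - response); the sum is multiplied by 2.5 for 0..100.
def sus_score(responses: list[int]) -> float:
    """responses: the ten answers, each 1..5, in questionnaire order."""
    assert len(responses) == 10
    total = sum(
        (r - 1) if i % 2 == 0 else (5 - r)  # 0-based i: even = odd-numbered item
        for i, r in enumerate(responses)
    )
    return 2.5 * total

# Made-up responses from one participant:
print(sus_score([4, 2, 5, 1, 4, 2, 4, 2, 5, 1]))  # -> 85.0
```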

6. DISCUSSION

This was our first time using quantitative measures in the design of the user interface style of a cellular phone. In that sense, it was experimental. We can find both positive and potentially problematic aspects in our approach.

A very positive aspect was simply that 'it could be done'. We did specify quantitative usability goals, we measured our design against these goals, and we got the results. We also found it very useful, project-wise, to have quantitative usability goals. Compared with some earlier user interface concept projects, we now had a clear and shared view of the user interface concept we were aiming at. Our approach was practical: the efficiency evaluation took just a couple of days, and while the expert evaluation required more effort, it was still done in less than one week.

The biggest potential problem is the validity of the usability results. The results indicate that our design was a success: we reached our goals. The problem, however, is that the evaluation was done on the one hand at a rather abstract level, and on the other hand with numeric calculations. There is a clear risk that these evaluations do not tell the whole truth. To do a reliable usability evaluation, one naturally needs to evaluate with real products and real users, which remains a future task. Still, we think that we did capture some essential usability in our measures: we covered quite a large spectrum of user tasks in two simple measures. At this stage, we 'feel' that we really achieved good usability in our design.

Another positive thing was that quantitative measures are a very good means of communication with the management (who are not usability professionals). For example, explaining the results of the efficiency evaluation by saying that 'n % of user tasks are quicker to do with our design than with the reference product' made sense. Telling the names and positions of the usability experts gave credibility to the results of the expert evaluation.

Naturally, there is a lot of room for improvement. A sounder means should be established for defining the efficiency-critical user tasks: in this project, the tasks were identified in a brainstorming session, whereas in the future they should be based more on true data from user studies. In the expert evaluations, we organised presentation sessions to introduce the concepts. The main feedback from the experts was that it would have been easier if we had given them some preliminary documentation a couple of days earlier.

Afterwards, we have also been thinking of other attributes that could have been used instead of (or in addition to) efficiency and overall usability. One such attribute could have been the number of major usability problems, the goal naturally being to decrease that number. This attribute might have been more laborious to measure, and there might have been contradictory opinions about what is a major usability problem and what is not. However, communication with this kind of attribute could probably be even more tangible to the management than with the attributes we used: the claim that we have been able to overcome such and such a number of major usability problems with the new design, compared with the old user interface, should make clear sense to anybody.

REFERENCES

Brooke, J. (1995). SUS: A quick and dirty usability scale. In: Jordan, P. (ed.), Usability Evaluation in Industry. London: Taylor & Francis.

Card, S.K., Moran, T.P., and Newell, A. (1983). The Psychology of Human-Computer Interaction. Hillsdale, NJ: Lawrence Erlbaum Associates.

ISO 13407 (1999). Human-centred design processes for interactive systems.

Nielsen, J. (1993). Usability Engineering. AP Professional.

Nielsen, J., and Mack, R. (1994). Usability Inspection Methods. New York: John Wiley & Sons.

Wesner, J., Hiatt, J., and Trimble, D. (1994). Winning with Quality: Applying Quality Principles in Product Development. Reading, MA: Addison-Wesley.