Empirical Study of Software Developers' Experiences

R. Kline and A. Seffah
Human-Centered Software Engineering Group
Psychology Department and Computer Science Department, Concordia University
1455 Maisonneuve Blvd. W., Montréal, Québec, H3G 1M8, Canada
E-mail: {r_kline, seffah}@cs.concordia.ca

Abstract

There is evidence that CASE tools do not completely meet the goals of their diverse users. Part of the problem appears to be a gap between how programs are represented and manipulated in the user interfaces of CASE tools and the experiences of software developers, including maintainers, testers, and programmers. The empirical study presented in this paper uses two different methods to measure the experiences of skilled and novice developers with the same CASE tool for C++: heuristic evaluation, conducted with the experienced developers, and psychometric evaluation, conducted with both groups. The results indicate that experienced and inexperienced developers reported similar kinds of problems, including poor program learnability, difficulties with the visibility and usefulness of program functionalities, and ambiguous error and help messages. These findings are discussed in relation to other empirical results about developers' experiences with CASE tools.

1. Introduction

Integrated CASE tools are intended to support all phases of the software development lifecycle, including coding, implementation, and maintenance. Unfortunately, there is evidence that these integrated development environments do not always fulfill these goals. For example, Iivari [1] found that the single best negative predictor of CASE tool use is perceived voluntariness. That is, if developers perceive that use of a CASE tool is voluntary, they tend not to use it. Results of other surveys by Seffah and Rilling [2] and Jarzabek and Huang [3] suggest that even highly experienced developers often use only a small number of a CASE tool's capabilities out of the total set available. Either developers are unaware that these functionalities exist or they do not know how to apply them in a different context. For example, a developer may know how to use a visual debugger correctly during development, but may not know how to use it during maintenance to understand the structure of the application from its execution graph.

Developers in the same surveys also described CASE tools as relatively difficult to learn, and they cited numerous problems with the user interfaces (UI) of CASE tools. Included among these problems are unclear error messages and non-intuitive organization of icons, menus, or toolbars (i.e., low affordance). They also reported that graphical representations intended to decrease memory load often have just the opposite effect. Findings similar to those summarized above have been interpreted as evidence for a cognitive gap between developers' experiences and how the user interfaces of CASE tools are organized [1-3]. Understanding developers' experiences with extant tools is an important step toward developing more human-centric CASE tools. The obvious question we address here is how developer experiences can be quantified and appreciated. In what follows, we present a measurement-oriented framework for understanding and quantifying developer experience. As an example, we consider the case of a widely used CASE tool, Microsoft Visual C++ 6.0.

2. Measuring Developer Experiences

There are several different methods to measure developer experiences with CASE tools, including ethnographic interviews, laboratory observation, remote testing, measurement of formal usability metrics, psychometric evaluation, and heuristic evaluation. Interview methods can be rather subjective, and laboratory or remote testing can be expensive or difficult to implement. Heuristic evaluation and psychometric evaluation both have the advantage of being relatively low cost. In heuristic evaluation, about 5-10 experienced users evaluate a program's user interface against a set of accepted principles, or heuristics [4]. Early lists of heuristics were lengthy and rather difficult to apply, but Nielsen [4] proposed a relatively small set that was specialized for use in this study (see Table 1).

Psychometric evaluation, in which developers complete objective questionnaires, offers a more systematic way to measure and compare experiences across different developers and CASE tools. The questionnaire selected for this study, the Software Usability Measurement Inventory (SUMI; [6]), facilitates these ends through its multi-factor format, generally excellent psychometric characteristics, and relatively large normative sample. The 50-item SUMI measures five usability areas: affect (whether users like the program), program helpfulness, learnability, efficiency, and control (whether users feel they know what the program is doing), plus overall satisfaction. Ratings of individual developers can be compared against those in the SUMI's normative sample in a metric where a score of 50 is average, the standard deviation is 10, and higher scores indicate greater satisfaction. The SUMI scoring program also identifies individual items where the respondents report statistically lower levels of satisfaction. Inspection of these critical items may identify specific kinds of usability problems.
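To make this metric concrete, the following minimal sketch (ours, not part of the SUMI package) shows how a raw scale score could be placed on the mean-50, SD-10 metric relative to a normative sample. The normative means and standard deviations below are hypothetical placeholders; SUMI's actual norms are proprietary.

# Sketch: place raw scale scores on a mean-50, SD-10 metric
# relative to a normative sample, as described for the SUMI.
# The normative values below are hypothetical placeholders.

NORMS = {
    # scale: (normative mean, normative standard deviation)
    "affect":       (30.0, 6.0),
    "efficiency":   (28.0, 5.5),
    "helpfulness":  (29.0, 6.2),
    "control":      (27.0, 5.8),
    "learnability": (26.0, 6.5),
}

def scaled_score(scale: str, raw: float) -> float:
    """Convert a raw scale score to the mean-50, SD-10 metric."""
    mean, sd = NORMS[scale]
    return 50.0 + 10.0 * (raw - mean) / sd

# A developer whose raw learnability score sits one normative SD
# below the norm lands at 40, i.e., below-average satisfaction.
print(scaled_score("learnability", 26.0 - 6.5))  # -> 40.0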

Heuristic: Definition
Visibility: Informs the user about system status.
Real World Match: Speaks the developer's language.
Control: The user feels in control of what they are developing (the program), not just of the CASE tool.
Consistency: The same words and actions mean the same things throughout.
Error Handling: Clear error messages and recovery.
Recognition, not Recall: Minimizes memory load.
Flexibility: Suits users of varying experience.
Minimalist Design: Shows only relevant information.
Relevant Help: Easy to search and find information.
Table 1. Definitions of heuristics (adapted from [5]).

Unlike heuristic evaluation, small samples are inadequate for psychometric methods, because results from small samples tend to be statistically unstable due to sampling error. As a rule of thumb, there should be at least one subject (user) for each item on a questionnaire. The SUMI has 50 items; thus, a sample of at least 50 respondents is needed to place much confidence in the stability of the results. Psychometric evaluation is also more suitable for novice developers. For this reason, we used both methods with a group of experienced developers and psychometric evaluation alone with a larger sample of novice developers.
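As an illustration of this rule of thumb (our sketch, not part of the original study): the standard error of a mean rating shrinks with the square root of the sample size, so the uncertainty around a seven-person mean is much wider than around a fifty-person mean.

import math

def standard_error(sd: float, n: int) -> float:
    """Standard error of a mean rating: sd / sqrt(n)."""
    return sd / math.sqrt(n)

# On the SUMI's SD-10 metric, a 7-person sample leaves roughly
# +/- 7.4 points of uncertainty (1.96 * SE) around a scale mean,
# while 50 respondents narrow that to about +/- 2.8 points.
for n in (7, 50):
    se = standard_error(10.0, n)
    print(f"n={n}: SE={se:.2f}, 95% CI half-width={1.96 * se:.2f}")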

3. Method

3.1 Participants

A total of seven experienced developers participated in the heuristic evaluation of a widely-used CASE tool for C++. These same participants also completed the SUMI questionnaire about this CASE tool. Most of these users had similar backgrounds and demographic characteristics. All but one were male, most were in their late twenties or early thirties, and they had on average about 10 years of general experience with computers and about 5 years of professional programming experience. All reported that they were very familiar with the C++ CASE tool they rated.

The inexperienced developers made up a comparison group. These 54 individuals were enrolled in an undergraduate computer science program (64% male, 36% female; average age = 27.3 years), were taking an advanced course in C++ programming, and used the same CASE tool rated by the experienced developers. A total of 75% of the students reported using computers for over 3 years, 18% for 1-3 years, and only about 6% for less than 6 months. The typical student reported using the C++ CASE tool 3-10 hours a week.

3.2 Procedure

The experienced developers completed a rating form with detailed definitions of the nine heuristics listed in Table 1. The rating form also solicited comments about program usability in each area and about usability problems not covered by any of the heuristics. The experienced users completed the SUMI at the same time. All questionnaires were completed anonymously. The inexperienced developers also completed the SUMI anonymously. Their participation in this research project was voluntary and had no bearing on their course standing.
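The paper does not reproduce the rating form itself. Purely as a hypothetical illustration, each evaluator's judgments could be captured as structured records like the following (all field and variable names are our own invention):

from dataclasses import dataclass, field

# The nine heuristics from Table 1.
HEURISTICS = [
    "Visibility", "Real World Match", "Control", "Consistency",
    "Error Handling", "Recognition, not Recall", "Flexibility",
    "Minimalist Design", "Relevant Help",
]

@dataclass
class HeuristicRating:
    """One evaluator's judgment on one heuristic."""
    heuristic: str
    problems_found: list[str] = field(default_factory=list)
    comment: str = ""

# Example entry mirroring the kinds of problems reported in Section 4.1.
rating = HeuristicRating(
    heuristic="Error Handling",
    problems_found=["ambiguous messages", "poor recovery information"],
    comment="Compiler dialogs do not explain how to recover.",
)
print(rating.heuristic, "->", rating.problems_found)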

4. Results

4.1 Heuristic evaluation


Review of the comments by the experienced developers generally indicated few usability problems for the control, flexibility, and minimalist design heuristics (Table 1). Other areas did not fare so well. A notable example is error handling, for which many problems were noted, such as ambiguous messages and poor representation of system errors. Software developers using the C++ CASE tool for testing also generally complained of insufficient information about recovering from errors. Concerns about inadequate feedback from the C++ CASE tool were also noted under the visibility heuristic. Other concerns were expressed about the CASE tool's real world match: some of its dialogs use specific technical terminology or are ambiguously worded.


4.2 Psychometric evaluation


4.2.1 Experienced developers. Presented in Figure 1 is the median SUMI satisfaction profile for the experienced developers. This profile suggests about average overall satisfaction with the C++ CASE tool. Specifically, the experienced developers' descriptions of the efficiency, helpfulness, and sense of control in using the tool are all about average compared to the SUMI's normative sample. However, they expressed below average satisfaction with the program's learnability; that is, they described it as difficult to learn. The content of critical SUMI items was generally consistent with problems identified in the heuristic evaluation by the same experienced developers.

Figure 1. Satisfaction profile for experienced developers.

4.2.2 Inexperienced developers. Presented in Figure 2 is the median SUMI satisfaction profile for the inexperienced developers. As a group, they expressed less satisfaction with the C++ CASE tool than the experienced developers. Specifically, the only median scores for the inexperienced developers that were about average are in the areas of affect and program helpfulness. However, their below average score of about 40 on the learnability scale is comparable to that of the experienced developers. Thus, both groups described the C++ CASE tool as relatively difficult to learn. The inexperienced developers also reported additional problems in the areas of global satisfaction, control, and efficiency. Results of nonparametric statistical tests paint a similar picture: the group medians do not differ statistically on the affect, helpfulness, and learnability scales at the .05 level, but they do on the rest of the scales. Specific usability problems identified by SUMI critical items among the inexperienced users include the lack of helpful information on the screen when it is needed, difficulties learning new functions, trouble moving data in and out of the program, and disruption of a preferred way of working.

Figure 2. Satisfaction profile for inexperienced developers.
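The paper does not name the specific nonparametric tests used. As a hedged illustration, one common choice for comparing two independent groups on an ordinal scale is the Mann-Whitney U test, sketched here with invented scores (the study's raw data are not published):

from scipy.stats import mannwhitneyu

# Hypothetical SUMI learnability scores for the two groups;
# these values are invented for illustration only.
experienced = [42, 38, 45, 40, 39, 41, 37]        # n = 7
inexperienced = [40, 35, 42, 38, 41, 36, 39, 43]  # illustrative subset

# Two-sided test of whether the groups' score distributions differ.
stat, p = mannwhitneyu(experienced, inexperienced, alternative="two-sided")
print(f"U = {stat:.1f}, p = {p:.3f}")  # p >= .05 -> no significant difference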

5. Discussion

It is not surprising that experienced developers reported greater overall satisfaction with the CASE tool for C++ than inexperienced developers. Of greater interest are the areas of agreement: both groups described the CASE tool as difficult to learn and mentioned concerns about a lack of understandable on-screen information. These points of agreement are also consistent with the results of other studies of programmers' experiences with CASE tools [1, 2, 3], which suggests that the findings of this study may not be specific to the particular C++ CASE tool evaluated here.

It is also of interest that two very different measurement methods, heuristic evaluation and psychometric assessment, identified quite similar kinds of user experiences. This suggests convergent validity: the usability of a particular CASE tool is robust enough to be evaluated in more than one way. We believe that these results provide additional evidence for a conceptual gap between the software engineers who develop CASE tools and the software engineers who use them. This study is also part of an ongoing effort to better appreciate developers' experiences with extant software engineering tools used in program development or maintenance.



6. References

[1] Iivari, J., "Why are CASE Tools not Used?," Communications of the ACM, 39 (10), 1996, 94-103.

[2] Seffah, A., and Rilling, J., "Investigating the Relationship between Usability and Conceptual Gaps for Human-Centric CASE Tools," IEEE Symposium on Human-Centric Computing Languages and Environments, Stresa, Italy, September 5-7, 2001.

[3] Jarzabek, S., and Huang, R., "The Case for User-Centered CASE Tools," Communications of the ACM, 41 (8), 1998, 93-99.

[4] Nielsen, J., "Heuristic Evaluation," in J. Nielsen and R. L. Mack (Eds.), Usability Inspection Methods, John Wiley, New York, 1994.

[6] Kirakowski, J., and Corbett, M., "SUMI: The Software Usability Measurement Inventory," British Journal of Educational Technology, 24 (5), 1993, 210-212.
