Development of a Measure to Assess the Quality of User-Developed Applications

Suzanne Rivard, École des HEC, Montréal, Qué.

Guylaine Poirier, Ministère de la Justice, Montréal, Qué.

Louis Raymond, UQTR, Trois-Rivières, Qué.

François Bergeron, Université Laval, Québec, Qué.

Abstract

Building on work in software engineering and in end-user computing, this study developed and assessed a measure of user-developed applications quality. The quality construct comprises eight dimensions: reliability, effectiveness, portability, economy, user-friendliness, understandability, verifiability, and maintainability. In turn, each quality dimension is composed of a set of criteria. Finally, each criterion is measured by a series of items. The instrument was tested by means of a survey involving 110 end-users. Confirmatory factor analysis, using the partial least squares technique, was conducted. The results indicate that the 56-item instrument is reliable and valid, and that it might be a useful tool for researchers and for practitioners alike.

Introduction

Today's organizations have fewer financial resources to devote to information technology than before. While the annual increase in information systems (IS) budgets was 14.6% during the 1975-1985 period, it slowed down to a low 2.2% in 1993 [18, 41]. As well, the number of workstations in the 100 computer premier organizations, that is, the organizations that make the most efficient use of their computing resources, decreased by 20.6% between 1990 and 1992, from a median of 55 workstations per 100 employees in 1990 to a median of 44 workstations per 100 employees in 1992 [13, 14, 15, 16, 17]. This reduced availability of slack resources for data processing has led organizations to exert more control over their spending on information systems activities, including end-user computing. For this purpose, robust evaluation tools are needed to improve the management of end-user computing and to make sure that the human, financial, and material resources devoted to it are used efficiently. Management of end-user computing is an important issue in the public and the private sectors that requires valid and reliable measures [12]. In a survey of 131 information systems directors of Canadian organizations conducted by Bergeron, Rivard and Raymond [7], as many as 96% of the respondents indicated that they would use an instrument for evaluating the quality of end-user computing activities if such an instrument was available.

Only four percent did not show interest in such an instrument. Indeed, the development and validation of various tools and techniques for IS measurement is one of the key issues for IS management in the 1990s [30].

End-users are employees, outside the information systems department, who use computer applications to accomplish their organizational tasks. Among them are user-developers, who develop applications for themselves or for their coworkers [27, 35]. While the technical abilities of user-developers may vary considerably, they are basically able to analyze, design and implement computer applications [40]. The purpose of the present study, which follows the preliminary work by Rivard, Lebrun and Talbot [37], was the development of an instrument to measure the quality of user-developed applications (UDA). The following section provides an overview of the concept of software quality, first from a software engineering perspective, and then from an end-user computing standpoint. The definition of software quality used in the study is then presented, followed by the methodology adopted to assess it. Finally, the results of the data analyses conducted to assess the measurement properties of the instrument are presented and discussed.

Software quality

For several years now, the definition and measurement of software quality has been a concern for researchers and practitioners in the area of software engineering [8, 9, 11, 20, 24, 33, 38]. The early efforts ([24], for instance) resulted in the identification of several dimensions of software quality: reliability, maintainability, availability, precision, tolerance to errors, accuracy, and efficiency. However, it was soon found that such a list of quality attributes was not readily usable, since different attributes appeared to share the same meaning, and a given term could bear different meanings. Following these early developments, which provided a nonorganized view of software quality, a consensus started to develop around the notion that quality was a multidimensional construct, and that its understanding required a hierarchical definition [8, 9].

A conceptual model for such a definition was first proposed by Cavano and McCall [9], who define quality as a four-layer construct. At the top level of the hierarchy is the construct of quality, which is constituted by second-level quality dimensions. In turn, each quality dimension is composed of third-level criteria. Finally, each criterion is measured by a set of metrics or items, located at the bottom level of the hierarchy.

From this generic model of software quality, several hierarchical definitions emerged [2, 20, 32], and efforts were made to operationalize them [33, 38]. A further examination of these operationalizations reveals that most dimensions of software quality are measured through metrics that are quite close to system code. For instance, Petrova and Veevers [33] review 26 studies proposing measures of software reliability. Most of the studies operationalized software reliability through code-related metrics such as number of knots and number of binary decisions. Similarly, Robillard, Coallier, and Coupal [38] propose metrics for operationalizing the testability and readability of a system. These include number of independent paths, number of loops, mean nesting level, and comments' volume ratio. As demonstrated by Robillard et al., such an approach is most appropriate from a software engineering perspective. However, while the conceptual definitions of software quality are immediately relevant to the measurement of user-developed applications quality, the highly technical stance adopted for their operationalization is not readily usable in the context of end-user computing, where the user perspective has to be taken into account [34]. This aspect is emphasized by Amoroso and Cheney [1], who define user-developed applications quality as "the degree to which an application of high grade accomplishes or attains its goal from the perspective of the end user" (p. 2).

Consequently, while the software metrics literature is most useful in providing a generic definition for application quality, efforts remain to be made in order to operationalize it for the end-user computing context. Two recent studies attempted to address this issue. Relying on the software metrics literature, Rivard, Lebrun and Talbot [37] proposed such a measure, which included ten quality dimensions: reliability, user-friendliness, integrity, extendibility, correctness, understandability, portability, economy, efficiency, and verifiability. The ten dimensions included a total of 24 criteria, measured via 93 items. The results of the pilot study undertaken to assess their instrument were encouraging. However, since their sample was quite small (28 respondents), these authors called for care in the interpretation of the results and for more assessment efforts.

Relying on system success literature, Amoroso and Cheney [1] focused on end-user information satisfaction and on application utilization as components of application quality. As a result of their study, they propose two instruments which measure intended and actual utilization patterns, as well as user information satisfaction. The instruments were found to have reasonably high levels of reliability and validity.

Keeping in mind the perspective adopted by these two studies, that is, that the end-user perspective is of paramount importance in measuring application quality, this study pursued further the efforts made by Rivard et al. Following Churchill's [10] recommendations on construct development and validation, the instrument proposed by Rivard et al. was reexamined in light of the results of the pilot test they had conducted, as well as of the software metrics and the IS literature. During this exercise, a document recently published by the International Organization for Standardization (ISO) with the objective of providing a standardized framework for the examination of software quality was found to be most useful [25]. This examination led to a refinement of the measure, which is presented in Figure 1. The proposed view of quality includes eight dimensions: reliability, effectiveness, portability, economy, user-friendliness, understandability, verifiability, and maintainability. A given dimension is measured via a number of criterion variables which, in turn, are measured through a series of items. Each dimension, along with the criteria that compose it, is presented below; the figure also indicates the number of items used for assessing each criterion.

Reliability. Software reliability relates to the ability of an application to perform its intended functions with appropriate precision [2, 9, 32]. According to the ISO, reliability refers to "a set of attributes that bear on the capability of software to maintain its level of performance under stated conditions for a stated period of time" [25]. From the literature, five criteria were found to contribute to the reliability of an application; they are: security, integrity, coherence, functionality, and absence of errors.

Effectiveness. Efficiency is appropriately defined by the ISO as "a set of attributes that bear on the relationship between the level of performance of the software and the amount of resources used, under stated conditions" (p. 4). This is consistent with the view shared by several authors [9, 24]. The ISO further breaks down efficiency into two components: time behavior and resource behavior. Time behavior is related to response time and throughput rates; resource behavior refers to the amount of resources required to perform the function.

Figure 1. User-developed application quality

Portability. While some authors refer to the degree of portability of an application as its ability to run on different computers [11], others talk about the effort required to transform a program from a given configuration to another [9]. In a more general manner, the ISO defines portability as the set of attributes "that bear on the ability of software to be transferred from one environment to another" (p. 4). Generalizability and adaptability are two key criteria for assessing portability [25].

Economy. The cost-benefit aspects of an application are particularly relevant to the context of end-user computing. While it is not often cited in the context of large software programs [39], the economy dimension is central to managerial concerns in the case of user-developed applications [7, 35]. Following Rivard, Lebrun and Talbot, it is defined here around a single criterion, that of profitability.

User-friendliness. User-friendliness is critical in the context of end-user computing. It is defined here as the ease of learning how to use a system, how to operate it, how to prepare the input data, to interpret the results, and to recover from errors [2, 3, 28, 39, 46]. Three criteria emerge from the literature; they are: accessibility, help features, and ease of use.

Understandability. Understandability is the extent to which one can understand what an application does, its structure, and its modules [37]. Understandability can be assessed through criteria such as modularity, conciseness, clarity, uniformity, structuredness, and informativeness.

Verifiability. Verifiability is defined as the ease of testing the application, to ensure it performs its intended function [2, 8, 9, 20, 32]. The main criterion to assess verifiability is that of testability.

Maintainability. Maintainability is "a set of attributes that bear on the effort needed to make specified modifications" (p. 4). The dimension of maintainability comprises three main criteria: modifiability, flexibility, and compatibility [2, 3, 8].
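To make the hierarchical structure of the construct easier to work with, the sketch below collects the eight dimensions and the criterion variables named above into a simple nested mapping. It is an illustrative reconstruction only: the names come from the text and from Table 2, the item counts shown in Figure 1 are not reproduced, and the identifier UDA_QUALITY is ours, not the paper's.

```python
# Illustrative sketch (not from the original paper): the UDA quality construct
# as a nested mapping of dimensions to the criterion variables named above.
# Item counts per criterion, given in Figure 1, are not reproduced here.
UDA_QUALITY = {
    "reliability": ["security", "integrity", "coherence",
                    "functionality", "absence of errors"],
    "effectiveness": ["performance", "efficiency"],
    "portability": ["generalizability", "adaptability"],
    "economy": ["profitability"],
    "user-friendliness": ["accessibility", "help features", "ease of use"],
    "understandability": ["modularity", "conciseness", "clarity",
                          "uniformity", "structuredness", "informativeness"],
    "verifiability": ["testability"],
    "maintainability": ["modifiability", "flexibility", "compatibility"],
}

# Four-layer model of Cavano and McCall [9]: construct -> dimensions -> criteria
# -> items. A criterion score is typically the average of its items, and a
# dimension is then measured through its criterion scores.
assert sum(len(c) for c in UDA_QUALITY.values()) == 23  # the 23 criterion variables
```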

Methodology

Using as a first pool of items those found in Rivard et al.'s questionnaire, an instrument was developed. While a further examination of the literature suggested the inclusion of new items, it also suggested discarding some of the items from the former questionnaire. As indicated in Figure 1, the resulting instrument includes 88 items. It was pre-tested with nine users, with various backgrounds and responsibilities, in three organizations. Results of the pre-test indicated that the items were clearly formulated, and that they were well understood by the respondents.

Assessment of the measurement properties of the instrument was performed via a field study of 22 Quebec organizations. This convenience sample was selected through personal contacts. The targeted respondents of the survey were of four types: user, user-developer, developer, or manager of end-users. The applications had to have been developed by end-user personnel, that is, personnel outside the IS department, without formal computer science education such as a college or university degree in computer science. Participants who had received basic training on commercial software only (e.g., dBase, Lotus), be it in-house courses or public seminars, were considered end-users. No selection criterion was applied to the type of applications, as long as their developer met the end-user selection criteria. Preliminary phone calls were made to the IS directors to explain the project, solicit their participation, and determine the number of end-users who would receive the questionnaire. They were asked to distribute the questionnaire to end-users working with a user-developed application, without discrimination with respect to the perceived overall quality of the application. While it is possible that only the most satisfied users completed the questionnaire, this was beyond the researchers' control; yet, it was estimated that this would not significantly affect the validity of the study.

Copies of the questionnaire were sent to the IS managers, with return envelopes and a cover letter explaining the purpose of the research. Respondents returned the questionnaire directly to the researchers. A follow-up (by mail) was made four weeks after the first mailing. A total of 292 questionnaires were distributed, and 117 were returned, out of which 110 were usable for data analysis purposes. The response rate was 100 percent for the organizations contacted, and 38 percent for the respondents. Respondents' experience with end-user computing differed widely; 25 percent had never developed an application, 55 percent had developed from one to ten applications, and 20 percent had developed more than ten.

The applications were developed with dBase (18% of all applications), Lotus (16%), SAS (14%), Dataease (9%), Image (4%), Foxbase (3%), DB2 (1%), and other end-user tools (35%). The types of applications were as follows: inventory management (16%), data entry (14%), scheduling (11%), accounting (11%), file management (10%), budgeting (9%), customer management (7%), statistical analysis (6%), printer management (6%), phone management (6%), and time management (4%).

Results and discussion

While the development of the measure was done in a top-down fashion, starting with the UDA quality factors, followed by the criterion variables for each factor and then by the items for each variable, the assessment of the measurement properties was performed bottom-up. A confirmatory factor analysis approach was chosen, using the partial least squares technique (PLS, [45]), as opposed to an exploratory approach using principal components analysis. This choice was based on the a priori development of the dimensions of user-developed application quality, and the need to test the hypothesized factor structure [5]. Compared to the other widely used "second generation" multivariate analysis technique, LISREL, PLS is more suited for causal-predictive analysis emphasizing theory development; LISREL is more recommended when the objective is to confirm that a theoretical model fits observed data [6]. Nonetheless, given the methodological nature of the research objectives, PLS was preferred here because of its greater robustness, that is, it does not require multivariate normally distributed data or large samples [21].

Internal Consistency. The first measurement property to be assessed is the internal consistency of operationalization. This involves two related issues, namely unidimensionality and reliability [43].

The multiple items that measure a criterion variable must necessarily be unidimensional if this variable is to be treated as a single value. The same can be said, at a higher level, for the multiple criteria that underlie a quality factor, that is, assessing that all criteria measure the underlying construct (UDA quality factor) to which they are theoretically related. Unidimensionality is assessed in PLS by examining the loadings of the measures on their associated construct, using 0.5 as a cut-off point, following Rivard and Huff [36]. Measures are reliable if they are free from random error. Reliability is usually assessed by Cronbach's alpha coefficient, with a value greater than 0.7 deemed acceptable [31]. However, this coefficient is calculated with the assumption that all measures weight equally on their corresponding construct. A less restrictive definition of reliability, the rho coefficient, is based on the ratio of construct variance to the sum of construct and error variance [44]. A rho value greater than 0.5 indicates that the construct variance accounts for at least 50% of the measurement variance. Another reliability criterion is the average extracted variance, which should be 0.5 or more, that is, measures should account for at least 50% of the variance in their corresponding construct [22]. Both the reliability coefficient and the average extracted variance are computed with the loadings obtained from PLS [23], as shown at the bottom of Table 1 and illustrated in the sketch below.

A PLS run was first done for each of the 23 criterion variables, using item scores. Items were selected for removal on the basis of their not loading at the 0.5 level on their corresponding variable, or not providing acceptable values for the alpha and rho coefficients and for the average extracted variance. As shown in Table 1, this resulted in the elimination of five criterion variables out of the original twenty-three, namely accessibility, conciseness, clarity, modifiability and flexibility, and in the deletion of a number of measurement items for the remaining variables. The results for the deleted variables might be explained by the fact that many applications are developed by end-users for strictly personal ends, to answer "ad hoc" needs which are very specific [42]. These applications are often seen by the user-developer as being only temporary or "throw-away", and thus not requiring that comments be inserted in the code, that the application be properly documented, or that other end-users and future modifications be taken into account.

Unidimensionality of the eight UDA quality factors was assessed by doing a PLS run with the scores of the eighteen remaining criterion variables, obtained by averaging the items remaining in each.
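As an illustration of the internal-consistency criteria described above, the following sketch computes Cronbach's alpha from raw item scores, and the rho coefficient and average extracted variance from PLS loadings, using the formulas given in the footnotes of Table 1. It is a minimal sketch with hypothetical data and loadings, not the study's analysis.

```python
import numpy as np

def cronbach_alpha(items: np.ndarray) -> float:
    """Cronbach's alpha for an (n_respondents x n_items) matrix of item scores."""
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1).sum()
    total_variance = items.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1 - item_variances / total_variance)

def rho_coefficient(loadings: np.ndarray) -> float:
    """Rho coefficient (Table 1, note b):
    (sum of loadings)^2 / ((sum of loadings)^2 + sum(1 - loading^2))."""
    s = loadings.sum() ** 2
    return s / (s + (1.0 - loadings ** 2).sum())

def average_extracted_variance(loadings: np.ndarray) -> float:
    """Average extracted variance (Table 1, note c): sum(loading^2) / n."""
    return (loadings ** 2).mean()

# Hypothetical example: one criterion variable measured by four items.
rng = np.random.default_rng(0)
item_scores = rng.normal(size=(110, 4)) + rng.normal(size=(110, 1))  # shared factor
loadings = np.array([0.80, 0.75, 0.70, 0.65])                        # hypothetical PLS loadings

print(cronbach_alpha(item_scores) > 0.7)             # alpha cut-off used in the study [31]
print(loadings >= 0.5)                               # unidimensionality cut-off [36]
print(rho_coefficient(loadings) > 0.5)               # rho cut-off [44]
print(average_extracted_variance(loadings) >= 0.5)   # AEV cut-off [22]
```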

The loading pattern matrix presented in Table 2 globally confirms the hypothesized factor structure, with each variable save one loading at 0.5 or more on its associated factor. The one exception is functionality, whose loading on the reliability dimension was only 0.23, whereas it loaded on the effectiveness dimension at the 0.44 level. As this criterion refers to the completeness of outputs (reports) and inputs (data base), a case could be made that this is a feature showing an application being effective rather than reliable. An alternative assessment of unidimensionality was made by correlating a criterion variable with its corresponding factor (whose modified score was computed by excluding the variable in question). The right-hand column of Table 2 shows acceptable levels for all item-total correlations (0.30 or more [26]), with the exception of functionality. Note that some variables loaded on more than one factor, especially user-friendliness and understandability, which might in fact constitute a single dimension. The two left-hand columns of Table 3 provide the reliability indicators for each factor. While the alpha value for portability and user-friendliness is low, the rho coefficient is satisfactory for all dimensions of UDA quality. Given that this last indicator is more appropriate in the context of confirmatory factor analysis, the eight dimensions of the hypothesized factor structure are judged here to be reliable.

Discriminant Validity. Given the hypothesized factor structure, discriminant validity is the degree to which the UDA quality factors are unique from each other, or measure distinct concepts. It is assessed by examining the correlations between pairs of factors. The shared variance between two factors (the squared correlation) should be inferior to the average variance extracted by the criterion variables that underlie each factor [22]; a compact sketch of this check is given below. As shown in the right-hand side of Table 3, all of the UDA quality factors satisfy the aforementioned test of discriminant validity. There is nonetheless some shared variance between reliability and verifiability (0.28), and between user-friendliness and understandability (0.27), as was alluded to in the preceding assessment of unidimensionality, but not enough to warrant questioning the fundamental discriminating power of the eight factors.

Convergent Validity. A construct is shown to have convergent validity if there is measurement consistency across multiple operationalizations. The results of measuring the same trait with different methods should thus correlate with one another [26]. For this purpose, three scales were added to the instrument, measuring the overall quality of the user-developed application, the quality of the application's components (menus, screens, ...).
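The discriminant validity criterion described above (the shared variance between two factors must stay below the average variance extracted by each factor's criterion variables [22]) can be checked with a few lines of code. The sketch below is illustrative only: the correlation of 0.53 reproduces the reported 0.28 shared variance between reliability and verifiability, while the AVE values are hypothetical.

```python
import numpy as np

def discriminant_validity_ok(factor_corr: np.ndarray, ave: np.ndarray) -> bool:
    """Check that, for every pair of factors, the squared correlation
    (shared variance) is lower than the AVE of both factors [22]."""
    shared = factor_corr ** 2
    n = factor_corr.shape[0]
    for i in range(n):
        for j in range(i + 1, n):
            if shared[i, j] >= min(ave[i], ave[j]):
                return False
    return True

# Hypothetical illustration for two of the eight factors: reliability and
# verifiability, whose shared variance was reported as 0.28 (0.53 squared).
corr = np.array([[1.00, 0.53],
                 [0.53, 1.00]])
ave = np.array([0.60, 0.65])                  # hypothetical average extracted variances
print(discriminant_validity_ok(corr, ave))    # True -> the two factors are distinct
```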

Table 1. Internal consistency assessments (PLS) of the criterion variables (N = 110)

[Table body not recoverable. Columns: criterion variable, initial number of items, final number of items, alpha (a), rho (b), AEV (c).]

a Cronbach's alpha coefficient.
b Reliability coefficient: rho = (Σ λi)² / [(Σ λi)² + Σ (1 − λi²)], where λi is the PLS loading of item i.
c Average extracted variance: AEV = (Σ λi²) / n, where n is the number of items.

Table 2. Unidimensionality assessment (PLS) of the quality factors (N = 110)

[Loading pattern matrix not recoverable. Rows: the criterion variables grouped under the eight quality factors (1. Reliability: security, integrity, coherence, functionality, error-free; 2. Effectiveness: performance, efficiency; 3. Portability: generalizability, adaptability; 4. Economy: profitability; 5. User-friendliness: ease of use, help features; 6. Understandability: modularity, uniformity, structuredness, informativeness; 7. Verifiability: testability; 8. Maintainability: compatibility). Columns: loadings on factors 1 through 8, plus the correlation of each criterion variable with its quality factor.]

a A dash indicates that the loading is inferior to .5.
b Correlation of the criterion variable with the quality factor (p ...).