Towards the Engineering of Commercial Web-Based Applications

Cornelia Boldyreff, Elizabeth Burd and Janet Lavery
R.I.S.E., Department of Computer Science, University of Durham, Durham, DH1 3LE, U.K.
+44 191 374 2638
{cornelia.boldyreff, liz.burd, janet.lavery}@durham.ac.uk

ABSTRACT
Since the mid 90s, we have been studying Web development and maintenance at Durham. One aspect of the research has been concerned with characterizing the evolution of Web sites over time. Another has been concerned with determining appropriate process models of Web development and maintenance. Overlaying this research has been work on Web metrics: firstly to evaluate web application quality, secondly to describe change, and thirdly to guide our research on process models for web engineering. Working with small to medium enterprises developing commercial web-based applications has given our research a firmly based practical emphasis and empirical grounding.

All of our research has been leading towards the establishment of web site engineering as a new discipline within Software Engineering. An overriding concern has been determining how to build the foundations of web site engineering on the lessons that can be learnt from traditional Software Engineering, to identify the new challenges introduced by taking an engineering approach to web site evolution, covering both development and maintenance, and to address the unique problems posed by the web and its associated technologies and innovative applications.

Keywords
Web site engineering, web maintenance, web evolution, web metrics

1 BACKGROUND
Using the web in the early 1990s was an exciting and challenging experience. In the AMES project, we were concerned to capture as much potentially useful information as possible about large software applications undergoing maintenance in order to facilitate their maintenance in the future [1]. Under the activities of application understanding and impact analysis, the information obtained from a combination of code analysis and text analysis was linked together and used to support both impact analysis prior to making changes and program comprehension during maintenance interventions. The graph structures obtained from these activities naturally lent themselves to representation as hyperlinked documents, and through the use of translators developed by the project, these graphs were rendered as a collection of web pages viewable through the Mosaic browser. Thus, using Mosaic, a maintainer could trace through an identified requirement to the relevant section in the design addressing that requirement, and from the design follow a link to the underlying code [2]. These links were established through semi-automated analysis that relied upon the use of requirement identifiers in the associated documentation and the convention of preserving design module names in the corresponding code modules. The text analysis supported linking of keywords and phrases throughout the documentation files and source code files.

One very interesting result of the research on the AMES project was the recognition that while creating these large webs to support application understanding was viable and useful, the webs themselves imposed an additional maintenance burden on the maintenance team. In general, this led to a growing recognition that development of web-based content itself and its associated structure posed a maintenance problem. In the same way that Lehman's first law states that software, if it is to remain useful, must be changed to reflect changes in its environment [3], it became clear that web-based content and web structure required a clear maintenance strategy in order to remain useful over time.

This problem formed the starting point for research into Web development and management at Durham, which began in the mid-nineties. Others were seeking to apply sound software engineering and software maintenance practices to their web sites as well; and through the Durham Maintenance Workshop, joint work in this field between the University of Durham and the UK CCTA became established. The web site development and management practices at the CCTA featured as a major case study in our determination of the support requirements for developing

our workbench [4]. From our studies of the CCTA web sites and other web sites then under development in the mid-nineties, a number of findings arose. Our initial studies found a variety of problems, as follows:

• uneven, largely poor, quality of authorship;
• badly structured web sites and associated pages;
• difficult to navigate sites and pages;
• large hypertext structures created through exhaustive linking of nodes; and
• a general lack of tools for the effective development and management of a web site.

These led to the requirements studies more fully described in [4] and the development of a workbench for web developers and managers. It was also found that web sites, like large software applications, were often distributed and being developed and maintained by several authors. In some cases, these distributed development and maintenance activities lacked any form of management control and were uncoordinated, leading to further maintenance problems. Because there was little coordination between the web server managers and the web page providers, errors recorded in the server logs were not being passed on. Through ignorance, many web page providers failed to provide users of their web pages with any mechanism for feedback, so complaints from users were not being sent. In addition, owners of web pages were not carrying out periodic checks of their links to ensure that they remained correct over time. There was no general consensus of opinion, or standards, on what constitutes web site or web page quality; no suitable metrics were either established or in use. This early research is fully described in [4] and the resulting workbench1 remains on-line.

Addressing the above problems has formed our research agenda in web site evolution over the past six years. Building on our earlier research within the AMES project, we have developed methods of analyzing and assessing web sites and pages with the goals of improving their overall quality and maintenance. Our pioneering work on web site evolution has sought to study the nature of change within both web sites and pages using metrics which we have developed from traditional software metrics in order to gain a better understanding of the web evolution process, so that it can be made more predictable and better supported by tools [5,6]. A key goal here has been to support better, more controlled web maintenance and management.

In what follows, our work on web site evolution is covered in Section 2. A deeper analysis of commercial web maintenance problems is presented in Section 3. An overview of our research on web management and design processes and the relevance of these, particularly usability and accessibility engineering, for commercial web site developers is explained in Section 4. The foundations of web site engineering are outlined in Section 5. The paper concludes with our vision of web site engineering in the future and an assessment of progress to date.

2 WEB SITE EVOLUTION AND METRICS
Hatzimanikatis and his colleagues published a key paper, Measuring the Readability and Maintainability of Hyperdocuments, in 1995 [7]. This paper inspired our research on web site metrics. As well as the research reported by Hatzimanikatis, earlier work by Brown [8] on maintaining long-life hypertext and works by others on hypertext quality [9, 10] were influential, although none of these groups were explicitly proposing metrics for web sites. We saw the possibility of developing the work on hypertext metrics, which in part derived from classical software metrics, and extending it as the basis for studying the WWW, and web pages and web sites in particular. Various content metrics can be used to gauge readability, such as sentence length, vocabulary analysis and font analysis, and content analyzers can be used to check spelling and grammar to give a measure of editorial quality control. In our early work, we chose to ignore these and to concentrate on structural metrics derived from software metrics. In our studies, we concentrated on the following [5]:

• per page metrics: LOC (lines of code), COM (lines of comments), NOM (number of modules, i.e. nodes), number of links (i.e. edges), counts of various HTML tags, MVG (McCabe's cyclomatic complexity within the page, for local links), fan-in, fan-out, and the Henry-Kafura/Shepperd measure; and

• per site metrics: number of pages, and totals and means of the above measures.

In order to study changes to web sites over time, we chose a small number of web sites in three categories: small, medium and large. The content of the web sites was captured at regular intervals of time, rather than studying consecutive releases of the software systems as in the classical software evolution studies performed by Lehman and his colleagues in the Feast project2. Our goal in these studies was to identify patterns of change within web pages and sites and to use these to develop models of change.
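To make the structural measures above concrete, the following minimal sketch (an illustration only, not the project's actual tooling; the file layout, the choice of tags counted, and the fan-in/fan-out definitions are all assumptions) shows how a few per-page and per-site counts could be gathered from a directory of static HTML files using only the Python standard library.

```python
# Sketch: collect simple per-page and per-site structural metrics from static HTML.
# Assumptions: pages are local .html files; only <a href> links are counted;
# fan-out = outgoing internal links, fan-in = incoming internal links.
import os
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urlparse

class PageScanner(HTMLParser):
    def __init__(self):
        super().__init__()
        self.tag_counts = Counter()
        self.links = []

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] += 1
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

def page_metrics(path):
    text = open(path, encoding="utf-8", errors="replace").read()
    scanner = PageScanner()
    scanner.feed(text)
    internal = [h for h in scanner.links if not urlparse(h).netloc]
    base = os.path.dirname(path)
    targets = [os.path.normpath(os.path.join(base, h.split("#")[0]))
               for h in internal if h.split("#")[0]]
    return {"loc": text.count("\n") + 1,              # physical lines of source
            "links": len(scanner.links),               # edges leaving this node
            "fan_out": len(internal),                  # internal links only
            "tags": sum(scanner.tag_counts.values()),
            "internal_targets": targets}

def site_metrics(root):
    pages = {}
    for dirpath, _, files in os.walk(root):
        for name in files:
            if name.endswith((".html", ".htm")):
                full = os.path.normpath(os.path.join(dirpath, name))
                pages[full] = page_metrics(full)
    fan_in = Counter()                                 # how many pages link to each page
    for m in pages.values():
        for target in m["internal_targets"]:
            fan_in[target] += 1
    for path, m in pages.items():
        m["fan_in"] = fan_in.get(path, 0)
    n = len(pages) or 1
    return {"pages": len(pages),
            "mean_loc": sum(m["loc"] for m in pages.values()) / n,
            "mean_links": sum(m["links"] for m in pages.values()) / n,
            "per_page": pages}

if __name__ == "__main__":
    print(site_metrics("./site"))
```

In the resulting per-page table, index documents of the kind discussed below would stand out through their unusually high fan-out.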

In reviewing the results of our studies, it is helpful to distinguish between the WWW as software-in-the-large and as software-in-the-small following DeRemer and Kron’s distinction made with respect to programming [11]. Taking the first view, the studies [5,6] showed that web software

1 www.dur.ac.uk/cornelia.boldyreff/workbench/
2 www.doc.ic.ac.uk/~mml/feast/

evolves and degrades like conventional software; all forms of changes, corrective, adaptive, perfective and preventative, occur. An additional form of maintenance, speculative or pre-emptive, is also needed; this would entail periodic checking of external and internal linking within a web site. The internal link checking is required as a form of regression testing, as the other forms of maintenance may introduce errors. It was also apparent that systematic web site maintenance required a large commitment of personnel in the case of any substantial web site.
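As a simple illustration of such pre-emptive link checking (a hypothetical script, not the workbench of [4]; it assumes the candidate URLs have already been extracted, for instance by a scanner like the one sketched earlier), broken links can be detected with HEAD requests and treated as regression faults:

```python
# Sketch: periodic (pre-emptive) link checking; failures are candidate regression faults.
import urllib.request
import urllib.error

def check_links(urls, timeout=10):
    broken = []
    for url in urls:
        req = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(req, timeout=timeout) as resp:
                if resp.status >= 400:
                    broken.append((url, resp.status))
        except (urllib.error.URLError, ValueError) as exc:
            broken.append((url, str(exc)))
    return broken

if __name__ == "__main__":
    for url, reason in check_links(["http://www.dur.ac.uk/", "http://example.org/missing"]):
        print("BROKEN:", url, reason)
```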

From the web-in-the-small viewpoint, we observed that just as conventional source code can be represented using a call graph, a web site can be represented as a graph of web pages (nodes) and links (edges). Web pages may themselves contain embedded software, e.g. JavaScript, Java, cgi, VRML, etc. There is a close resemblance of the web page to software in that the browser interprets the page content and users "execute" links to alter the flow of control. There are various kinds of links: internal/external, images, frames, etc. Links may be distinguished as either organisational or referential.

The results of our studies showed that simple web metrics can be collected easily. However, even with a small study of six sites, there were large quantities of data to analyze. Index documents can be easily identified as they are characterised by high fan-out (and fan-in) and thus exhibit very high coupling. It was observed that the overall rate of change on a site increased with the size of the site, although one of the sites studied, the smallest site, exhibited no changes for a very long period of time. Throughout, only growth and never shrinkage of web sites was observed.

Various recurrent structures, which we have termed web design patterns, emerged from studies of site graphs; indices, tours and picture galleries can all be identified. In future, further research on web design patterns is planned. One predictive indicator has been observed: high link density (links per page) is strongly related to the probability of additional links being added to the page per unit time [12]. From our studies, it seems likely that the web maintenance process itself evolves with time as the site grows [13], for example, from no changes, to minor corrections, to managed re-design, to a multi-developer site, and finally to a multiple database driven, dynamically generated site. It is also clear that web evolution is closely related to usability since, following Lehman's First Law, it can be anticipated that a large web application must undergo continuous change or it will become progressively less useful. In the future, we foresee the development of a unified measurement programme combining structural metrics with content metrics, particularly those concerned with all aspects of usability [13]. From our research, it is evident that existing web metrics require better interpretation in the form of mapping the low level measures to higher level quality factors. For example, the relationship between structural measures and quality factors such as maintainability and usability needs to be clarified, as does the relationship amongst the various factors such as usability, accessibility, and readability. So while measuring and modeling web pages and web sites has allowed us to study both structure and contents changing over time, and to determine quality factors operationally, there is still much work to be done [12,13]. In the following section, some problems associated with commercial web sites are discussed with a view to relating our theoretical studies to more practical industrial concerns.

3 MAINTENANCE PROBLEMS ASSOCIATED WITH COMMERCIAL WEB-BASED APPLICATIONS
Similarities can easily be drawn between large web sites and large-scale software systems [5, 14]. Large sites often contain many thousands of lines of 'code', HTML source, split into many 'modules', web pages, stored in many places, with large amounts of data that are often important to the owner of the site. Many commercial web sites incorporate legacy data from established company databases and other legacy applications, so that the web site contents cannot easily be distinguished from the associated software systems. Moreover, establishing sound maintenance practices in these instances is compounded by the problems of maintaining the established databases and legacy applications in a new context of usage [15].

In most traditional software systems, the code is segregated from the data by data abstraction, and many systems are data processors in which the program only interacts with the data when that data is provided to it. However, due to the design of HTML, in web sites the data is marked up with tags that format both its structure and its style of presentation, and may even generate further data. Thus, the data is an integral part of the 'code', and this complicates web maintenance [16]. Others have noted that the blurred distinction between data and software in the HTML model presents developers with some interesting and problematic consequences [17].

Traditional software engineering has been tackling the problems of legacy systems and maintenance for some time, and Durham has been at the forefront of this research, particularly with regard to legacy systems [18]. While the web presents new challenges, there is evidence of some known problems recurring under a new guise within web-based applications, as the following paragraph illustrates. The use of 4th Generation languages and tools resulted in much generated code which was often altered without reference to the original 4GL source and therefore became difficult to maintain. A similar phenomenon is occurring with web development tools. Due to the ease with which web pages can be generated using a variety of tools (MS FrontPage, Netscape Composer, Macromedia Dreamweaver

example, Evrsoft 1st Page 20004 incorporates the Tidy HTML checker as an option. However, for the most part, separate tools such as Bobby, to check web site accessibility, must be used independently. Other efforts such as the Campaign for Plain English have sought to improve the quality of web site text content, but they do not offer any automated means of checking a site [25]. In our research, we have shown how simple diction and style tools, developed as part of the UNIX Writer's Workbench in the 1970s and now ported to Linux and freely available, can be used for this purpose. However, these tools remain standalone, and interpreting their results is likely to pose a problem for the typical commercial web page developer. The Internet communities, and the standards bodies in particular, have started to make great efforts to counter the lack of standards problem. For instance, the consortium in charge of standardising the web, the W3C5, has introduced a number of standards relevant to both authors and browser developers alike. There is also work being carried out through the IEEE, which has resulted in IEEE Std 2001-1999, Recommended Practice for Internet Practices - Web Page Engineering - Intranet/Extranet Applications.
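As an illustration of how such stand-alone checkers might be wrapped for a web developer, the following sketch simply shells out to the GNU diction and style commands over text extracted from a page; this is an assumption-laden starting point (it presumes both tools are installed on the path), not the integrated support the paper argues is still missing.

```python
# Sketch: run the GNU 'diction' and 'style' checkers over text extracted from a page.
# Assumes both commands are installed (e.g. the GNU diction package on Linux).
import subprocess
import sys

def check_prose(text_file):
    for tool in ("diction", "style"):
        print(f"--- {tool} report for {text_file} ---")
        result = subprocess.run([tool, text_file], capture_output=True, text=True)
        print(result.stdout or result.stderr)

if __name__ == "__main__":
    check_prose(sys.argv[1])
```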

to name but a few), web sites can be created by people with little or no formal knowledge of software engineering and in very little time. Changes are made to the generated HTML with another tool or directly, and the result is frequently a web site with errors. Also, while the tools support web site creation, there is little support for the periodic maintenance required to keep a site up to date. For a small-scale personal web site, it is not important that the site is error-free and up-to-date; however, for a commercial web site, this is often key to the business's on-line success. The number and size of web sites has grown and is still growing rapidly, with the result that simple measurement studies show that modern sites are as large and complex as large-scale traditional software systems [19]. As an indication of commercial web site size, in 1996 the maintainers of the Microsoft web site3 estimated that they maintained over a million pages, with a predictable impact on maintenance [20]. The rapid growth of the WWW, combined with the necessity for businesses to quickly deploy their commercial web sites, results in sites being produced with little or no design and no concern for maintenance issues. This has resulted in a state of poorly maintained sites [6].

One of the main uses for databases on the WWW is for the storage of company data such as customer or product details. This information may be stored in specialised databases designed and used solely for the company web site, or it may utilise a connection to existing company databases to produce the same result [17]. However, this brings further maintenance problems. Obviously it is wasteful to maintain the same information in more than one database although for security reasons it may be sensible to make one database a partial or complete replica of another [16].

In research carried out into the state of web sites [6, 20, 21], four common problems have been identified; they are as follows:

• broken links,
• incorrect or out-of-date data,
• inconsistent information, and
• inconsistent style.

These problems are closely related, as surveys of web users [22] have shown that inconsistency and inaccuracy of data rank alongside broken links as major obstacles to the take-up of the web. Inconsistent style and poor navigation have also been cited as major problems that discourage users from continuing to use a site [23]. Research into these areas is developing alongside Human Computer Interaction and Usability research to try to develop guidelines and strategies that will help businesses to utilise the web more effectively [24].

Many companies and organisations holding information in existing databases have sought to integrate these data stores with their web sites, rather than incur the penalties of having to maintain separate data sources. Dynamic page generating systems and associated scripting languages make such integration possible, although considerable redevelopment of existing web pages may be necessary. In addition, co-ordination of the database maintenance with the web site maintenance is required. Database systems can also be used to store more complex components of web sites or references to these; examples are databases used to store links [26, 27]. These enable maintainers to easily verify and update links throughout a site. However, in the case of older web sites, some reverse engineering of the site is required to ensure that currently replicated elements within the web site are identified,

One major problem identified in our early studies that is of particular concern to commercial web sites is the lack of standards, especially quality standards [4]. While the lack of standards problem has been addressed by several bodies in recent years, through lack of awareness, education and training, their widespread usage amongst commercial web developers and maintainers remains patchy. Some web development tools incorporate standards checkers; for

3 www.microsoft.com
4 www.evrsoft.com/1stpage
5 www.w3c.org

abstracted, and, where appropriate, moved to a database or subsumed into a scripting language program. We have had some success at carrying out such reverse engineering of small web sites at Durham [16], but further research is needed, as the cited paper explains. Given the investment of companies in their existing databases and current web pages, this research has addressed the possibility of reverse engineering existing web pages to identify replicated contents that can usefully be stored in databases, where the databases will either already exist or be specifically created for this purpose. Through this research, the aim is to determine how to assist companies in developing a better basis for the future maintenance of their web sites and to overcome some of the problems associated with web site maintenance discussed earlier. So far, our research prototype only demonstrates how this could be done, but further work is needed to bring this approach to the point where it could be employed in practice by maintainers of commercial web sites. It seems likely that these problems and partial solutions discussed above will remain until commercial web site developers gain a better understanding of web engineering processes.
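For illustration, and not as a description of the prototype reported in [16], a first step in such reverse engineering might be to flag markup fragments that recur verbatim across many pages, since these are the natural candidates for moving into a database or shared template; the heuristics below (block-level splitting, a minimum fragment length, a minimum page count) are all assumptions.

```python
# Sketch: flag markup fragments that recur verbatim across many pages; such fragments
# are candidates for moving into a database or a shared template/script.
import hashlib
import os
import re
from collections import defaultdict

def fragments(html, min_len=120):
    # Very crude: split at block-level element boundaries and normalise whitespace.
    for chunk in re.split(r"(?=<(?:div|table|ul|p)\b)", html, flags=re.I):
        chunk = re.sub(r"\s+", " ", chunk).strip()
        if len(chunk) >= min_len:
            yield chunk

def replicated_fragments(root, min_pages=3):
    seen = defaultdict(set)   # fragment hash -> set of pages containing it
    sample = {}
    for dirpath, _, files in os.walk(root):
        for name in files:
            if name.endswith((".html", ".htm")):
                path = os.path.join(dirpath, name)
                html = open(path, encoding="utf-8", errors="replace").read()
                for frag in fragments(html):
                    key = hashlib.sha1(frag.encode()).hexdigest()
                    seen[key].add(path)
                    sample[key] = frag[:80]
    return [(sample[k], sorted(pages)) for k, pages in seen.items() if len(pages) >= min_pages]

if __name__ == "__main__":
    for preview, pages in replicated_fragments("./site"):
        print(f"{len(pages)} pages share: {preview}...")
```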

In recent years, a number of hypermedia and web development methods have been proposed. Although none of these is in widespread use, we have found it helpful to survey them, abstract out the common elements into a high level model of web development, and provide a classification of the various methods; see the two figures below, taken from [28]. These have been used to guide the development of a prototype CASE tool for web developers reported in [29].

4 WEB SITE PROCESSES AND THE ROLE OF QUALITY
Conventional software engineering has focused on making the software life cycle and its underlying processes explicit, defining them through modeling and subjecting them to measurement as the basis for achieving software process improvement. The processes of web development and maintenance have received very little corresponding attention. A model of web management processes resulting from our research in the 90s with the CCTA is shown in Figure 1, taken from [4].

Figure 1 CCTA Management Model

Figure 2 Hypermedia Development Stages

Figure 3 Method Relationship Hierarchy

However, this model addresses web development and maintenance at a high level and quite a sophisticated extension to the processes would be required to determine the impact of any change on all the other pages in the site to ensure that changes were made uniformly across a site.

More recently, hypermedia/web development process models derived from the traditional software engineering life cycle models, such as the waterfall, prototyping, spiral, and incremental development models, can be found in Lowe and Hall's book [30]. Their most fully developed model

includes project planning and project management with overall system architecture, system design and application partitioning (see page 246 in [30]). This model omits any consideration of quality determination other than conventional testing and totally ignores maintenance other than a reference to change requests during development.

success of the web site be measured against these business objectives [33]. The design of the actual web site should support the business' customers in accessing the content of the web site and successfully achieving their own underlying goals. The BIG team has popularised Nielsen's web usability engineering principles [23], which take the form of practical guidelines, as a means of supporting SMEs' need to attract repeat customers to their web site, as illustrated by Lavery's presentations7. This emphasis allows SMEs to better align their web development and maintenance with their core business, their customers, their objectives, and the business' evolution in a changing world.

Despite the acknowledged need for further detailed refinement, the work that has already been done in the area of web process modeling, methods and metrics provides a sound base from which work on web process improvement can start. Companies developing and maintaining commercial web sites are in need of guidance on how best to improve their existing web development and maintenance practices. By working with these companies to define basic web development and maintenance process models, we can start them on the path to improving their processes. The basic web process improvement models also provide a means of addressing various core web quality issues such as maintainability, and factors that specifically relate to the users/customers of a web site: usability, navigability, accessibility and readability.

Nielsen's recommendations are relevant to SMEs and credible because they are supported by usability studies. For example, one such study reported in [34] found that measurable usability was improved by 124% on a web site after it had been changed to support a scannable layout, a reduced word count, and objective language. In fact, simply using an objective style of language as opposed to a promotional style improved the usability of a web site by 27%.

Our research on the Business Informatics Guild (BIG) project has aimed to popularise web engineering to ensure its uptake by small to medium enterprises [31]. Smaller businesses do not have the financial and personnel resources of large companies, and yet their participation in the web is on the increase. The BIG project is a three-year ERDF (European Regional Development Fund) funded multidisciplinary project that began in September 1998. This project brings together the expertise of the University of Durham's Research Institute for Software Evolution (RISE), the Statistics and Mathematics Consultancy Unit, and the Foundation for Small and Medium Enterprise Development. The project's overall objective is to support small and medium enterprises (SMEs) in the North East of England in their use and understanding of information and communication technology (ICT). To date the project has provided support to over one hundred SMEs in the North East of England. The most successful project development, in terms of SME participation, has been the one that did not require a large investment of time by the SME owner/managers; this is the BIZ-KIT club and its associated web site6. The BIZ-KIT club has provided an excellent forum for the dissemination of software engineering best practice on the development and evolution of web sites to the SME owner/managers. One key message for this audience has been the importance of linking web site evolution to business objectives and changes.

As in traditional software engineering, small companies pushed for time omit testing of their web developments. Again, we found Nielsen's recommendation of usability testing with only five users [35] more in keeping with the local SMEs' needs. Regression testing following maintenance was little practiced among the SMEs, so we introduced this to local SMEs in a training seminar, along with the notion of periodic link checking to ensure links to external sites are still useful to the SME's site customers. Mechanisms such as network caches and dynamic IP addressing make it impossible to get an exact picture of how customers are using a web site. However, we demonstrated to SMEs how it is possible to get an indicative view that can be used to support business decisions by applying the Analog analysis tool8. Most Internet Service Providers can supply a log file containing data on web page accesses for an SME's web site. This data can be used as input to an analysis tool such as Analog. This tool extracts the information that is available, such as the number of accesses to a particular web page and where a customer was prior to accessing a particular web page. The latter is particularly useful if the SME's web site is paying for advertising on other businesses' web sites. Another method of collecting data to provide an indicative measure of use of the web site is the application of simple

It is necessary that the SME owner/managers set clearly defined and measurable business objectives prior to the development or maintenance of a web site [32] and that the

6 www.biz-kit.org
7 www.dur.ac.uk/janet.lavery/documents/WebUsability.ppt and www.dur.ac.uk/janet.lavery/documents/What Makes a Good Web Site.ppt
8 available from www.analog.org

freeware page counters. By applying these counters to all or key web pages, maintainers can find those web pages that are rarely accessed. Low access may indicate a web site navigation problem or perhaps that the product/service offered is inappropriate for sale on the web, or that it is not of interest to the current customers of the web site. Such findings can then be used to drive further development of the site.
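As a minimal illustration of the kind of indicative usage analysis described above (not the Analog tool itself, which is far more thorough), the following sketch counts successful page accesses and referrers from a combined-format server log, assuming the ISP can supply such a file.

```python
# Sketch: indicative usage counts from a combined-format access log
# (the kind of data a tool such as Analog reports far more thoroughly).
import re
import sys
from collections import Counter

LINE = re.compile(r'"(?:GET|POST) (?P<page>\S+)[^"]*" (?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)"')

def summarise(log_path):
    pages, referrers = Counter(), Counter()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            m = LINE.search(line)
            if m and m.group("status").startswith("2"):
                pages[m.group("page")] += 1
                referrers[m.group("referrer")] += 1
    return pages, referrers

if __name__ == "__main__":
    pages, referrers = summarise(sys.argv[1])
    print("Most requested pages:", pages.most_common(10))
    print("Most common referrers:", referrers.most_common(10))
```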

and maintenance of web sites. It is likely that further studies will show that the application of software management techniques such as change control and quality assurance techniques is relevant to web site management and development practices. Web metrics can help to describe, to assess and evaluate, and to develop new approaches to web engineering processes and products. Supporting the widespread development of commercial web sites by non-professional software engineers is a great challenge for web site authoring and management tool developers. Currently, few of the available and widely used tools incorporate any process or method support.

Other quality factors can also be measured by commercial site developers and addressed if necessary. For example, there is growing interest in accessibility as it becomes clear that there may be a legal requirement for companies to provide this in the future [25]. Thus, through a BIZ-KIT seminar, members were told of the work of CAST and its freely available accessibility checking service on the web9.
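As one small illustration of the kind of automated check such a service performs (this is not Bobby, and a single check is no substitute for it), the sketch below reports images that carry no alt text, one of the most common accessibility failures:

```python
# Sketch: a single, very small accessibility check of the kind services such as
# Bobby automate: report <img> elements that have no alt text.
import sys
from html.parser import HTMLParser

class AltChecker(HTMLParser):
    def __init__(self):
        super().__init__()
        self.missing = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attrs = dict(attrs)
            if not attrs.get("alt"):
                self.missing.append(attrs.get("src", "<no src>"))

def images_without_alt(path):
    checker = AltChecker()
    checker.feed(open(path, encoding="utf-8", errors="replace").read())
    return checker.missing

if __name__ == "__main__":
    for src in images_without_alt(sys.argv[1]):
        print("Image without alt text:", src)
```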

6 TOWARDS WEB SITE ENGINEERING
Although some progress has been made towards the modeling of web development and maintenance processes, further research is needed. Given the diverse nature of the web sites being developed, it is likely that a variety of such models will be found to be useful. In our research with the CACDP [25, 37], we are currently investigating lightweight processes suitable for a small commercial enterprise without a large team of web site engineers or managers. However, these lightweight processes are unlikely to be suitable for a large company maintaining millions of web pages across a number of distributed web servers.

5 FOUNDATIONS OF WEB SITE ENGINEERING
From the research presented so far and the associated discussion, it should be clear that in developing the discipline of web site engineering, we have drawn on a number of existing disciplines. The primary sources are Software Engineering and Distributed System Engineering, and within these the sub-disciplines of Software Metrics and Software Process Improvement (SPI), Human-Computer Interface Studies and Hypermedia Theory. Further specialisations within these, such as hypertext metrics, usability metrics, and hypermedia design, have also been identified. In the figure below (adapted from an earlier draft of Figure 1 in [36]), we have attempted to show how these various disciplines are interrelated and contribute to the development of web engineering.

While there has been progress in recent years in establishing a number of web related software standards, further work is required to ensure conformance to such standards; and further work on web site quality definition and determination is needed. Again, as with process, it is unlikely that work on quality will cover all forms of web sites, as fitness for purpose depends ultimately on the intended purposes of a web site, and these are extremely varied [13]. However, our work has shown that core quality factors such as maintainability, readability, usability and accessibility can be measured through the suitable employment of metrics, and that such employment is facilitated by both training and tools. Integration of such tools and checkers with one of the more popular web page development systems would provide the take-up by practitioners needed to form the basis for a more in-depth evaluation. We are considering this as a future development for some of the tools and checkers described in this paper. The modest results achieved in our web site reverse engineering research do clearly show that techniques from conventional reverse engineering can be applied to alleviate some of the maintenance problems found with existing web sites and the current storage systems employed for web sites. This work has shown that there is scope for further research on the reverse engineering of web sites and demonstrated that conventional reverse engineering principles and techniques can be applied to web sites [16]. As commercial web site developers may be forced in the future to ensure the accessibility of their sites by law, there

Figure 4 Research Taxonomy

The research so far has been accomplished largely through applying the established models, methods, metrics, tools and techniques of Software Engineering to the development

9 www.cast.org/bobby

is likely to be a major requirement for some form of automated assistance to allow existing web sites to be easily re-engineered to ensure that they meet the legally required level of accessibility. As reverse engineering is likely to precede any such re-engineering, there is clearly a need for further work in this area and a strong commercial as well as social motivation for such work. As in Software Engineering, the recognition that distributed development presents unique challenges to both our theory and practices is growing amongst those seeking to develop the discipline of web site engineering. We will be addressing the issues raised here within the GENESIS10 project, which will focus on distributed system engineering, including web-based applications, in a collaborative software engineering environment. There is potential for applying research from Computer Supported Collaborative Work studies to Web Site Engineering practice, particularly where distributed working is the norm [15]. Whether or not we will in the future have a comprehensive understanding of models of web evolution and their associated laws, as we do for software evolution, remains an open question and research challenge. Our results from the small scale studies we have undertaken are promising, but we have a long way to go in developing this research. Greater automation of the web development and maintenance processes, allied with improved training and awareness of the need for quality and how to achieve it, is required. A number of areas of further research have also been identified. Both improvements to practice and more basic research are needed; as these can drive each other forward, all can benefit.

10 GENESIS: IST-2000-29380, RTD project funded by the European Commission and project partners, 2001-2003.

REFERENCES
1. Boldyreff, C, Burd, EL, Hather, RM, Mortimer, RE, Munro, M, Younger, EJ, The AMES Approach to Application Understanding: a case study, Proceedings of the International Conference on Software Maintenance, IEEE Computer Press, 1995.
2. Boldyreff, C, Burd, EL, Hather, RM, Munro, M, Younger, EJ, Greater Understanding Through Maintainer Driven Traceability, Proceedings of the 4th Workshop on Program Comprehension, April 1996, pp. 100-106, IEEE Computer Press, 1996.
3. Lehman, MM, Belady, L, Program Evolution: Processes of Software Change, Academic Press, London, pp. 247-274, 1985.
4. Dalton, Susannah, A Workbench to Support Development and Maintenance of World-Wide Web Documents, MSc Thesis, University of Durham, Durham, U.K., 1996.
5. Warren, PJ, Boldyreff, C, Munro, M, The Evolution of Websites, International Workshop on Program Comprehension, IEEE Computer Society Press, 1999.
6. Warren, PJ, Boldyreff, C, Munro, M, Characterising Evolution in Web Sites, Proceedings of WSE'99, 1st Annual Workshop on Web Site Evolution, pp. 46-48, 1999.
7. Hatzimanikatis, A.E., Tsalidis, C.T., and Christodoulakis, D., Measuring the Readability and Maintainability of Hyperdocuments, Software Maintenance: Research and Practice, 7: pp. 77-90, 1995.
8. Brown, P.J., Assessing the Quality of Hypertext Documents, in European Conference on Hypertext, INRIA, France: Cambridge University Press, 1990.
9. Botafogo, R.A., Rivlin, E., and Shneiderman, B., Structural Analysis of Hypertexts: Identifying Hierarchies and Useful Metrics, ACM Transactions on Information Systems, 10(2): pp. 142-180, 1992.
10. Garzotto, F., Mainetti, L., and Paolini, P., Hypermedia Design Analysis and Evaluation Issues, Communications of the ACM, 38(8), 1995.
11. DeRemer, F. and Kron, H., Programming-in-the-Large versus Programming-in-the-Small, IEEE Transactions on Software Engineering, June 1976.
12. Boldyreff, Cornelia, Warren, Paul, Gaskell, Craig and Marshall, Angus, Web-SEM Project: Establishing Effective Web Site Evaluation Metrics, Proceedings of the Second International Workshop on Web Site Evolution, WSE'2000, pp. 17-20, Zuerich, Switzerland, 2000.
13. Boldyreff, Cornelia, Gaskell, Craig, Marshall, Angus and Warren, Paul, Establishing a Measurement Programme for the World Wide Web, Workshop on Internet Application Standards, IEEE SAINT'2001, San Diego, USA, January 2001.
14. Brereton, P, Budgen, D, Hamilton, G, Hypertext: The Next Maintenance Mountain, IEEE Computer, Vol. 31, No. 12, pp. 49-55, 1998.
15. Boldyreff, Cornelia, Net-Centric Computing: Now and in the Future – Some Fictional Insights, Proceedings of the 3rd International Workshop on Net-Centric Computing, May 14, 2001, Toronto, Canada.
16. Boldyreff, Cornelia and Kewish, Richard, Reverse Engineering to Achieve Maintainable WWW Sites, IEEE Working Conference on Reverse Engineering, October 2001. To appear.
17. Antoniol, G, Canfora, G, et al, Web Sites: Files, Programs or Databases?, Proceedings of WSE'99, 1st Annual Workshop on Web Site Evolution, pp. 6-8, 1999.
18. Bennett, K, Legacy Systems: Coping with Success, IEEE Software, pp. 19-23, Jan. 1995.
19. Bray, Tim, Measuring the Web, Proceedings of the Fifth International World Wide Web Conference, Paris, France, 1996.
20. Prevelakis, V, Managing Large WWW Sites, Internet Research: Electronic Networking Applications and Policy, Vol. 9, No. 1, pp. 41-48, 1999.
21. Aspden, P, Katz, J, Motivations for and Barriers to Internet Usage, Internet Research: Electronic Networking Applications and Policy, Vol. 7, No. 3, pp. 170-188, 1997.
22. White, MD, Abels, EG, Hahn, K, Identifying User-Based Criteria for Web Pages, Internet Research, Vol. 7, No. 4, pp. 252-262, 1997.
23. Nielsen, J, Designing Web Usability, New Riders Publishing, 2000.
24. White, MD, Abels, EG, Hahn, K, User-Based Design Process for Web Sites, Internet Research, Vol. 8, No. 1, pp. 39-50, 1998.
25. Boldyreff, Cornelia, Burd, Liz, Donkin, Joanna, Marshall, Sarah, The Case for Plain English to Increase Web Accessibility, submitted to 3rd International Workshop on Web Site Evolution, November 2001.
26. Arnold, SC, An Architecture for Maintaining Link Structure of a Website, Proceedings of WSE'99, 1st Annual Workshop on Web Site Evolution, pp. 9-11, 1999.
27. Hartman, JH, et al, Index-Based Hyperlinks, Computer Networks and ISDN Systems, Vol. 29, pp. 1129-1135, 1997.
28. Kyaw, Phyo and Boldyreff, Cornelia, Survey of Hypermedia Design Methods, Technical Report CS-398, Department of Computer Science, University of Durham, 1998.
29. Kyaw, Phyo, An Investigation of Web-Based Hypermedia Design Support Methods and Tools, MSc Thesis, University of Durham, Durham, U.K., 1999.
30. Lowe, David, and Hall, Wendy, Hypermedia & the Web: An Engineering Approach, John Wiley & Sons, 1999.
31. Lavery, Janet and Boldyreff, Cornelia, Considering Customers: The Key to Successful Commercial Web Sites, submitted to 3rd International Workshop on Web Site Evolution, November 2001.
32. Drobik, Alex, "e-business is not easy business", The Computer Bulletin, page 27, January 2000.
33. IBM, "Ten success factors for e-business", IBM Corp., IBM Software Division, Route 100, Building 1, Somers, NY 10589, http://www.jellysideup.com, June 1999.
34. Nielsen, Jakob, "How Users Read on the Web", useit.com Alertbox, www.useit.com/alertbox/, October 1st 1997.
35. Nielsen, Jakob, "Why You Only Need to Test With 5 Users", useit.com Alertbox, www.useit.com/alertbox/, March 19th 2000.
36. Warren, Paul, Gaskell, Craig and Boldyreff, Cornelia, A Foundation for Website Metrics Research, submitted to 3rd International Workshop on Web Site Evolution, November 2001.
37. Donkin, Joanna, Boldyreff, Cornelia, Burd, Liz, Marshall, Sarah, Supporting Sign Language Users of Web-Based Applications: A Feasibility Study, Proceedings of the 1st International Conference on "Universal Access in Human-Computer Interaction" (UAHCI), August 2001. To appear.