Advanced Engineering Informatics 26 (2012) 251–266


The retrieval of structured design rationale for the re-use of design knowledge with an integrated representation

Hongwei Wang a,b,⇑, Aylmer L. Johnson a, Rob H. Bracewell a

a Engineering Design Centre, Cambridge University Engineering Department, Cambridge CB2 1PZ, UK
b School of Engineering, University of Portsmouth, Anglesea Road, Portsmouth PO1 3DJ, UK

Article info

Article history: Received 1 February 2011; Received in revised form 8 February 2012; Accepted 9 February 2012; Available online 7 March 2012

Keywords: Knowledge management; Design rationale; Design knowledge re-use; Information retrieval

Abstract

Design knowledge can be acquired from various sources and generally requires an integrated representation for its effective and efficient re-use. Though knowledge about products and processes can illustrate the solutions created (know-what) and the courses of actions (know-how) involved in their creation, the reasoning process (know-why) underlying the solutions and actions is still needed for an integrated representation of design knowledge. Design rationale is an effective way of capturing that missing part, since it records the issues addressed, the options considered, and the arguments used when specific design solutions are created and evaluated. Apart from the need for an integrated representation, effective retrieval methods are also of great importance for the re-use of design knowledge, as the knowledge involved in designing complex products can be huge. Developing methods for the retrieval of design rationale is very useful as part of the effective management of design knowledge, for the following reasons. Firstly, design engineers tend to want to consider issues and solutions before looking at solid models or process specifications in detail. Secondly, design rationale is mainly described using text, which often embodies much relevant design knowledge. Last but not least, design rationale is generally captured by identifying elements and their dependencies, i.e. in a structured way which opens the opportunity for going beyond simple keyword-based searching. In this paper, the management of design rationale for the re-use of design knowledge is presented. The retrieval of design rationale records in particular is discussed in detail. As evidenced in the development and evaluation, the methods proposed are useful for the re-use of design knowledge and can be generalised to be used for the retrieval of other kinds of structured design knowledge. © 2012 Elsevier Ltd. All rights reserved.

⇑ Corresponding author. Present address: School of Engineering, University of Portsmouth, Anglesea Road, Portsmouth PO1 3DJ, UK. Tel.: +44 0 2392 842569; fax: +44 0 2392 842351. E-mail address: [email protected] (H. Wang).
1474-0346/$ - see front matter © 2012 Elsevier Ltd. All rights reserved. doi:10.1016/j.aei.2012.02.003

1. Introduction The design and development of complex products involves a complex process in which design engineers perform a large number of activities related to creating and evaluating solutions, solving problems raised, and making decisions. Most of these activities depend upon the knowledge and experience obtained in previous projects, and require the effective and efficient re-use of design knowledge. Therefore the need for capturing, representing, and retrieving knowledge is raised. The knowledge needs of engineering designers can vary considerably, depending on their different levels of experience and different contexts of working [1]. For instance, when a description in text is not enough, designers may need to refer to a CAD model to understand the link between the function of a component and the realisation of a specific function. Moreover, current product development is shifting towards an integrated and collaborative scheme [2]. As such, an integrated representation scheme of design knowledge needs to be developed to reflect its multi-faceted nature. Much research work has been done to study how to record product knowledge and process knowledge [3,8,9]. Though knowledge about products and processes can illustrate the solutions created (know-what) and the courses of actions (know-how) involved in their creation, the reasoning process (know-why) underlying the solutions and actions is still missing from an integrated design knowledge representation. Design rationale is an effective way of capturing this missing part of an integrated representation of design knowledge, and can be viewed as a valuable intellectual asset of an enterprise. It can be defined from a variety of viewpoints, e.g. “an explanation of why an artefact, or some part of an artefact, is designed the way it is”, or as something that “includes all the background knowledge such as deliberating, reasoning, trade-off, and decision-making in the design process of an artefact – information that can be valuable, even critical, to various people who deal with the artefact” [4]. It records designers’ knowledge of what issues should


be addressed, how specific solutions are generated, as well as judgements as to why a particular solution should (or might not) work. Design rationale can offer designers useful information about how previous designs evolved and the context in which such evolution happened. It is also deemed to be an essential part of the information needed for identifying the explicit linkage between the design record and the emergent outcomes as seen in service [5]. Research about how to capture, store, and re-use design rationale has been undertaken in a wide range of domains, including social science, software engineering [6], and engineering design [4]. Although designers can refer to various sources of knowledge with an integrated representation scheme, effective retrieval methods are also necessary for re-using design knowledge – especially for the design of complex products, which can involve a huge amount of knowledge. The retrieval of design information can actually be done in a number of ways, e.g. shape-based retrieval, design cases retrieval, and ontology-based retrieval [7]. For the integrated representation of design knowledge, retrieval of design rationale is very helpful, for several reasons. Firstly, design engineers tend to search for knowledge about specific issues and solutions before deciding to view detailed CAD models or process specifications. Secondly, design rationale is mainly described using text, which is effective in expressing and meeting designer’s knowledge needs. Last but not least, design rationale is generally captured by identifying elements and their dependencies, i.e. in a structured way, which conforms to the reasoning process of designers and is therefore potentially useful for improving retrieval performance. A retrieval method based on matching keywords is reasonably effective, as the information in design rationale records is mainly described using text. 
However, most keyword-based retrieval methods discard the implicit structures of these records, resulting either in poor precision of retrieval or in isolated pieces of information that are difficult to understand. This research aims to go beyond keyword matching by utilising the implicit structures in design rationale records so as to develop methods and tools to facilitate the provision of useful and re-usable design knowledge in new design projects. The remainder of this paper is organised as follows. In Section 2, we review related work on knowledge management, design rationale, and design information. In Section 3, we describe the utilisation of an integrated representation of design knowledge based on design rationale to fulfil designers’ knowledge needs. In Section 4, a framework for retrieving design rationale is introduced. In Section 5, detailed methods and algorithms for retrieving design rationale by utilising the implicit structures are introduced. In Section 6, the implementation of a prototype system and an evaluation of the methods developed are given. Finally, the discussions and conclusions are given in Section 7.

2. Literature review 2.1. Knowledge management for engineering design A characteristic that distinguishes engineering designers from other professionals is that they have the capability of applying technical knowledge, making decisions, and adopting courses of action, to solve design problems. This capability can also be termed as ‘knowledge’ which is the intellectual asset of, and should be well tended by, design organisations. Studies on knowledge management for engineering are mainly aimed at addressing issues such as what knowledge should be captured, and how to represent and retrieve it. The ultimate goal of any knowledge management research is to re-use the knowledge effectively and efficiently, without imposing too much burden on designers. According to Hicks et al., the re-use of knowledge in engineering design typically

aims to make available the experiences of individual knowledge or organisational knowledge from previous design activities in order to better inform and enable future design activities [8]. Generally, there are two well-accepted schemes for classifying design knowledge. The first is based on the contents of the knowledge whereas the second has a focus on the property of knowledge. For the first scheme, knowledge about artefacts (product knowledge) and knowledge about the problem-solving process (process knowledge) are distinguished. For the second one, design knowledge is divided into explicit knowledge and implicit knowledge, in terms of the extent to which a piece of knowledge can be articulated. Explicit knowledge refers to the pieces of knowledge that can be easily codified and effectively presented to designers using computing. Implicit knowledge, on the other hand, is about the experience which is hard to codify and can only be described by experienced designers. The distinction between explicit knowledge and implicit knowledge also reflects the different viewpoints, namely the codification view and the personalisation view, on design knowledge [9]. Specifically, the former has a focus on explicit knowledge and is aimed at codifying pieces of knowledge and performing tasks automatically using computing. The latter emphasises the capture and re-use of implicit knowledge, which is often done by assisting individual designers in recording information. These two views pertain to the two typical approaches to knowledge management for engineering design, namely Knowledge-Based Engineering (KBE) and Knowledge and Experience Management (KEM). KBE and KEM, though having different focuses, can be applied to manage both product and process knowledge. This research aims to develop computing methods for retrieving design rationale records and is actually related to both KBE and KEM. Firstly, design rationale is captured with a focus on KEM. 
Secondly, this work is also focused on developing computing methods used for knowledge retrieving. KBE was defined by Chapman and Pinfold as ‘‘an engineering method that represents a merging of object-oriented programming, artificial intelligence techniques and computer-aided design technologies, giving benefit to customised or variant design automation solutions’’ [10]. From this definition, a distinctive feature of KBE can be identified as its focus on the automatic generation of geometric information and its purpose of codifying knowledge in computer programs to streamline the design process. The types of knowledge that can be codified and utilised to automate the design process are still quite limited, though good progress has been made in this area. Some KBE applications have been researched and published. For instance, Susca et al. presented a KBE application which was used to perform an automatic calculation and evaluation of the mass properties of racing cars [11]. Marx et al. developed a KBE system integrated with numerical analysis codes to evaluate aircraft structural concept, material and process selections [12]. Skarka studied the automatic generation of design models based on the Methodology and tools Oriented to Knowledge-based engineering Applications (MOKA) framework [13]. KEM is mostly motivated by the fact that designers in a team rely very heavily on consulting experienced colleagues to acquire information and knowledge [14]. However, experts are not always readily available to consult as they may retire or transfer to other organisations. Therefore, there is a great need to study how to capture, store, and re-use experts’ experiences that are very difficult to codify in KBE systems. Design knowledge captured in KEM systems is generally descriptive and needs to be well understood by designers. A typical KEM application is the design rationale system, e.g. 
the Design Rationale editor (DRed) tool, which aims to capture the reasoning process when design solutions are created and evaluated [14]. Unlike KBE, KEM methods can capture a wide range of design knowledge useful for decision making in later projects; however, a prominent difficulty for KEM is that designers may be


reluctant to capture their knowledge, as this is not their main focus in a design project. No matter which knowledge management approach is utilised, the capture, representation, and retrieval of knowledge is of great importance. 2.2. Knowledge representation and design rationale The purpose or intended use of the captured knowledge will significantly affect the quantity and level of detail which must be captured in order to acquire a useful body of knowledge, and describe the limit of its applicability [8]. Once the purpose of re-using knowledge is determined, methods for capturing and representing that knowledge can be developed. The capture can be done in many ways, e.g. manual capturing as the design project proceeds [14], automatic capturing during the design process [15], and extracting knowledge from design components [16]. For most of the present knowledge management systems, two kinds of knowledge are commonly involved, namely product knowledge and process knowledge [3]. The former is concerned with the function, form, and behaviour of products whereas the latter mainly focuses on how solutions are created and implemented. Currently, an integrated and collaborative scheme has been proposed and studied for complex product design and development, which essentially requires a multi-disciplinary development team [3,17]. The design knowledge for such a scheme is multi-faceted and requires an integrated representation scheme. Apart from product knowledge and process knowledge, design rationale is also deemed as a necessary part for the integrated representation as it captures lots of ‘engineering know-why’. Some researchers have begun to develop such integrated knowledge representations. For instance, Bracewell et al. proposed the development of an integrated design information space by integrating design rationale with other kinds of design information [18]. 
According to Giess et al., such an integrated information space is also useful for identifying the explicit linkage between it and the emergent outcomes as seen in service [5]. Design rationale is an important part of the integrated knowledge representation as it is able to describe the complex reasoning processes used. As pointed out by Shipman and McCall, design rationale generally involves three points of view, namely argumentation, documentation, and communication [19]. These views have complementary advantages in facilitating either the capture or the retrieval of design rationale, and underline the usefulness of design rationale in practical design projects. A large body of research on design rationale has been undertaken and published elsewhere, e.g. the devising of rationale models [20,21], and the development of computer tools for capturing design rationale [22,23]. Arora et al. listed several benefits of design rationale: firstly, it will help in recording the design decisions and the reasoning behind the design, which can be later analysed to control the overall design and manage the complexity of the design process; secondly, it will also have an impact on maintenance, which has been observed to be very much dependent on the design; thirdly, capturing design rationale and providing a suitable representation scheme will also help in reverse engineering of the design and creating libraries of design-process artefacts for design re-use and design traceability; fourthly, the rationale can be used to reason about the changes made by the designer, based on the knowledge base generated from previous projects [24].

2.3. Design knowledge retrieval Although much work has been done on both KBE and KEM, research on design knowledge retrieval is still relatively sparse. This is partially ascribed to the fact that few of the developed systems are actively used in industry as yet, and therefore there are not many knowledge records for studying retrieval. Li et al. reviewed research work on design information retrieval and identified three prominent types of retrieval, namely shape-based retrieval, knowledge-based retrieval, and ontology-based retrieval [7]. Shape-based retrieval has domain-independent representation but there is a gap between low-level geometry features and high-level domain/user semantics. The other two kinds of retrieval use non-geometry and domain-dependent textual representations, such that users can search the product based on functional specifications, constraints, attributes of the product, or the concept graph which may be similar to the previous designs [7]. Liu et al. proposed a computational framework for the retrieval of document fragments [25]. Charlton systematically studied the retrieval of various types of mechanical design information, from textual information to the structured representation of a design by using data mining techniques and Natural Language Processing (NLP) [26]. To find useful pieces of integrated design knowledge which includes product knowledge, process knowledge, and design rationale, the retrieval of design rationale should be studied in the first instance for the following reasons. Firstly, design engineers tend to need knowledge about issues and solutions before going into details of these elements. Secondly, design rationale is mainly described using text that tends to embody a lot of ‘know-how’, which may fulfil a subsequent designer’s knowledge needs. Last but not least, design rationale is generally captured by identifying elements and their dependencies, i.e. in a structured way, which conforms to the reasoning process of designers and therefore is promising for improving retrieval performance.

There are mainly two different approaches for retrieving design rationale: a navigation approach (which permits designers to explore design rationale by traversing from one node to another via existing links) and a query-based approach (which matches keywords in response to designers’ queries). Kim et al. proposed two methods for the retrieval of design rationale captured using the DRed tool. The first approach is query-based, using NLP techniques to evaluate the similarity of rationale records [27]. The second approach analyses the task models of design re-use, and recommends relevant pieces of design rationale within a design process [28]. Two prototype systems were developed to demonstrate these methods. The first approach essentially involves establishing semantic annotation for design rationale records while the second approach tries to statistically anticipate the next design task likely to be performed by designers. In our earlier work, a keyword-based retrieval tool was developed for the design rationale captured using DRed [29]. A few specific methods were developed to improve the keyword-based retrieval, namely suggesting potential keywords to designers, quantifying the relevance of retrieved information, and automatically recommending relevant information. The keyword-based retrieval methods are reasonably effective as the information in design rationale records is mainly described using text. However, most of them discard the implicit structures of these records, which results either in poor precision of retrieval or in isolated pieces of information that are difficult to understand.

3. Integrated knowledge representation for effective knowledge re-use 3.1. Integrated knowledge representation

As discussed above, the design and development of complex products involves lots of tasks of solving problems and making decisions. These tasks are generally carried out by a multidisciplinary team and require a huge number of electronic files to be produced for different stages of the design process. Team members’


expertises, together with the materials produced, are very important intellectual properties of an organisation and will tremendously improve the efficiency of future design projects if properly re-used. There still exist difficulties in re-using the electronic materials though their management keeps improving due to the increasing maturity of Information Technology (IT) services in organisations. This is partially caused by the isolation of different materials and the low efficiency of accessing these materials. The deployment of software packages for Product Life-cycle Management (PLM) in design organisations can largely resolve this problem, and current PLM solutions offer very good functionality for managing product and process information. Nevertheless, previous research shows that designers in a team still rely very heavily on consulting experienced colleagues to acquire information and knowledge [14]. Computer records, by contrast, can provide very specific information but generally fail to offer high-level semantics. For instance, CAD models enable us to understand the layout of an assembly together with detailed dimensions for each component. However, this kind of information is only useful for designers who already have some knowledge about the design and would like to look at it in a more detailed way. Designers without knowledge about the design will find the models difficult to understand, as ‘reverse engineering’ is a task which requires a lot of prior knowledge. Although design reports can provide semantic descriptions, they are not always available for Intellectual Property (IP) protection reasons, and moreover a huge amount of information is contained in a single file, which can easily be overlooked in a Product Lifecycle Management (PLM) system with thousands of reports. 
Therefore, a method needs to be developed to provide the missing semantic information, and more importantly provide pieces of knowledge with fine granularity for engineering designers. For this purpose, we propose an integrated representation scheme that uses design rationale to organise and integrate the other necessary materials about the product and the design process. Design rationale not only has the functions of documentation, communication, and argumentation, as identified by Shipman and McCall [19], but is also able to offer important engineering ‘know-

whys’ that are missing in other electronic materials such as CAD models and requirement specifications. In this research, the design rationale records analysed and utilised are captured by using the Design Rationale editor (DRed) tool which is developed at the Engineering Design Centre (EDC) of Cambridge University and is owned and controlled by Rolls-Royce plc. The design rationale records created using DRed are termed DRed graphs throughout this paper as they are captured as graphs with dependencies. It is noteworthy the methods developed are based on DRed graphs but can be easily extended to other design rationale models. Detailed introduction to the DRed tool and how it can be utilised to capture design rationale is published elsewhere and is beyond the scope of this paper [14,29]. In the latest version of DRed, a template is provided for defining and structuring the various tasks involved in a complex product development process. The various sources of knowledge can be then structured as well via the template. An illustration of the representation of integrated knowledge used by DRed is shown in Fig. 1. There are four main parts in a project in which DRed is utilised to organise and integrate various pieces of design knowledge, namely project management, problem formulation, solution generation, and prototype manufacturing and development. Specifically, ‘project management’ is concerned with how the design project should be organised and carried out (network diagram, work breakdown, stakeholder analysis, etc.). ‘Problem formulation’ involves the analysis work on previous designs and the specific requirements for the current one. ‘Solution generation’ is about the design tasks and covers both the conceptual and the detail design stages. In ‘prototype manufacturing and development’, design rationale about the issues raised with regard to manufacturing will be captured. As shown in the figure, various files are created for different purposes (e.g. 
files for showing sketches or CAD model) and linked to DRed files via bi-directional links. Based on the template, the representation scheme not only includes knowledge about issues, answers, and arguments but can also provide detailed information about project management, requirement analysis, functional analysis, etc., with details provided by external documents such as sketches, material data sheets, and CAD systems.
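The bi-directional links between rationale nodes and external documents described above can be illustrated with a small sketch. It is not the DRed implementation: the class name `LinkRegistry`, the node identifiers, and the file paths are all hypothetical, chosen only to show how a link recorded once can be followed from either end.

```python
from collections import defaultdict


class LinkRegistry:
    """Toy registry of bi-directional links (hypothetical, not DRed's own)."""

    def __init__(self):
        self._files_by_node = defaultdict(set)
        self._nodes_by_file = defaultdict(set)

    def link(self, node_id, file_path):
        # Record the link in both directions so either end can be the entry point.
        self._files_by_node[node_id].add(file_path)
        self._nodes_by_file[file_path].add(node_id)

    def files_for(self, node_id):
        # From a rationale node to the documents that detail it.
        return sorted(self._files_by_node.get(node_id, ()))

    def nodes_for(self, file_path):
        # From a document back to the rationale that explains its context.
        return sorted(self._nodes_by_file.get(file_path, ()))


# Illustrative identifiers and paths only.
registry = LinkRegistry()
registry.link("answer-12", "sketches/bracket_concept.png")
registry.link("answer-12", "cad/bracket_v3.prt")
registry.link("issue-3", "cad/bracket_v3.prt")
```

With such a structure, a designer reading an answer node can open its CAD model via `files_for`, while a designer viewing the CAD model can reach the related rationale via `nodes_for`.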

Fig. 1. Integrated knowledge representation based on the template in DRed.


Table 1. Different knowledge needs fulfilled by the integrated knowledge representation.

| Knowledge needs | Illustration of the needs | How they are fulfilled |
| --- | --- | --- |
| Obtaining information | Requesting where specific information in the form of documents, numerical data, etc., could be obtained | Always. DR models are linked to external files, e.g. spreadsheet storing material data |
| Typical value | Requesting typical values, as well as maximum and minimum values | Sometimes. For example, the DR about how the dimension of a component is calculated |
| Terminology | Queries regarding what a particular term meant | Sometimes. For example, text node is sometimes used in DRed graphs for explaining terms |
| Trade-offs | Effects of one issue on another | Always. IBIS-based DR captures issues, answers, and arguments |
| How does it work | How a particular part of the product functioned | Always. The functional analysis diagram can record functions of components or systems |
| Why | Why a design is carried out in a particular way | Always. Capturing ‘know-why’ is the feature of DR |
| What issues to consider | Issues that should be considered during particular stages of the design process and also the importance of issues | Always. IBIS-based DR captures issues, answers, and arguments |
| When to consider issues | When issues should be considered | Always. IBIS-based DR captures issues, answers, and arguments |
| How to calculate | The methods used by a designer to achieve a task | Sometimes. For example, the DR about how the dimension of a component is calculated |
| Design process | Aspects of the design process including: the information provided during the design process; what is expected to be produced, etc. | Always. Diagrams for “project management” can be used to provide these pieces of information |
| Company process | The distribution of design work between departments; the relevant company procedures; information on relevant people; other aspects of company procedure fell into this category | Seldom. Diagrams for “project management” mainly concern the design project and company procedures are seldom covered |

3.2. Fulfilling designers’ knowledge needs In an integrated knowledge representation scheme, different pieces of knowledge complement each other so as to provide design knowledge that can be easily re-used by designers. For example, designers can open a CAD model when they would like to see more details about a solution described in a piece of design rationale. Meanwhile, they can also open the design rationale file when they are viewing a CAD model and would like to know the context in which this model was created. Design rationale captured using DRed is stored as a plain text file with a .dre postfix in the computer’s filing system, and is organised as a graph with dependencies. Fine granularity of information is achieved as the nodes in a graph can be referred to as basic elements of information. As well as the contexts and semantics, design rationale also offers detailed knowledge about why a particular solution was chosen and developed as it is. Actually, the knowledge needs of engineering designers can differ, depending on their levels of experience and contexts of working. Ahmed and Wallace did a comprehensive analysis of the discourse between novice designers and experienced designers and identified eleven main kinds of knowledge needs [1]. These needs are listed in Table 1, together with some illustrations and answers to whether design rationale can fulfil these needs. It is noteworthy that the answers were obtained by evaluating the capabilities of the design rationale captured using DRed. As shown in the table, design rationale can fulfil most of these needs. In particular, needs about trade-offs, what issues should be considered, and why a design is finally accepted, fit well with the concept of capturing design rationale. A few other needs such as terminology and how a part of a product works can also be fulfilled, as design rationale contains a lot of information about how technical knowledge is applied to realise functions. 
The remaining needs, such as how to carry out a specific calculation, design process, or company process can also potentially be fulfilled by design rationale. Even when they cannot be fulfilled by design rationale on its own, other files linked to a piece of design rationale record can also assist, given that the query somehow appears in the record. Since the integrated knowledge representation can fulfil the various knowledge needs identified and design rationale can fulfil most of them on its own, a need is raised immediately to address the issue of finding the relevant pieces of knowledge

from a large amount of records. DRed graphs contain lots of descriptive information and are linked to other materials created for other purposes. Moreover, as DRed is now being used and evaluated in industry, the number of DRed graphs is steadily increasing. A study on how to retrieve design rationale is thus timely and useful for the effective re-use of integrated design knowledge.
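To make the idea of node-level granularity concrete, the following minimal sketch indexes individual nodes from several rationale files so that a query returns nodes rather than whole files. It is an illustration only: the record layout, file names, and node texts are invented, the actual .dre file format is not modelled, and the simple all-terms-must-match rule is an assumption rather than the method used in this work.

```python
import re
from collections import defaultdict

# Hypothetical records: (file name, node id, node text). Not the real .dre format.
NODES = [
    ("fan_case.dre", "n1", "How to reduce combustion noise?"),
    ("fan_case.dre", "n2", "Add an acoustic liner to the casing"),
    ("bracket.dre", "n1", "Liner increases mass of the casing"),
]


def tokenize(text):
    # Lower-case and split on anything that is not a letter or digit.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(nodes):
    # Inverted index: term -> set of (file name, node id) pairs containing it.
    index = defaultdict(set)
    for file_name, node_id, text in nodes:
        for term in tokenize(text):
            index[term].add((file_name, node_id))
    return index


def search(index, query):
    """Return the nodes that contain every query term (assumed AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in terms))
    return sorted(hits)


index = build_index(NODES)
```

Because each hit identifies both a file and a node, results from many separate graphs can be merged into one list while still pointing the designer at a specific element of rationale.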

4. A framework for design rationale retrieval An integrated knowledge representation offers various materials created during the design process to fulfil the information needs of engineering designers. Design rationale is the key to such a representation scheme as it not only contains lots of engineering ‘know-whys’ but also plays the important role of organising all the other materials to form a piece of integrated knowledge. Apart from its key role in the representation, there are two further reasons for supporting the retrieval of integrated knowledge by searching design rationale records. Firstly, the text in design rationale records is much easier to retrieve compared with other kinds of information such as geometric and pictorial information. Secondly, engineering designers mainly re-use knowledge to find solutions for an issue or check whether a proposed solution even succeeded (or failed) in previous projects. So it is reasonable to find pieces of design rationale in the first instance and then look at other resources of knowledge for more details. The design rationale space for a complex design project tends to be very large and is generally divided into several smaller pieces. For example, in DRed, a large design rationale space is captured by creating a number of separate DRed files, each of which contains a separate part of the complete DRed graph. The nodes in the graph (each of which might define an issue, an answer, or an argument) form the basic elements of the design rationale captured using DRed, and can be treated as separate results in response to a query: firstly because designers usually know whether they are looking for issues, solutions, or arguments; and secondly, because the dependencies in the graphs are highly structured, and these implicit structures are very useful for designers. Moreover, the structure of the information in DRed graphs is also very useful for its retrieval as it not only indicates the features (e.g. 
type and status) and contexts of each DRed node but also reflects the dependencies between different nodes.

Based on the above considerations, a framework for retrieving design rationale is developed, as shown in Fig. 2. The framework consists of three layers, namely the resource layer, the methodology layer, and the Graphical User Interface (GUI) layer. Specifically, DRed files together with other files for storing indexes, word variations, etc., are involved in the resource layer. The GUI layer is concerned with the construction of interfaces from which users can submit queries, choose searching options, and view retrieval results. There are two parts in the methodology layer, namely keyword-based retrieval and improvements to it. The keyword-based retrieval system includes methods for suggesting potential keywords and processing queries, dealing with word variations, synonyms, etc., constructing indexes, and algorithms for measuring similarity and ranking results. The development of keyword-based retrieval provides the basis for further improvements such as utilising the implicit structures in DRed graphs, utilising users’ contexts of working, supporting collaborative work, and understanding the semantic information in DRed nodes. These potential improvements form the roadmap of our ongoing research which is aimed at developing effective and efficient retrieval methods for design knowledge. This paper will focus on the first aspect of the potential improvements, i.e. the retrieval of DRed graph fragments on the basis of the understanding of their structures.

5. The retrieval of structured design rationale

5.1. Structured information in DRed graphs

The structured information of a DRed graph is another source of semantics apart from the textual contents of the DRed nodes it contains. As introduced above, DRed nodes are the basic element

of information in DRed graphs. Generally, information about a node’s type, status, position in a graph, and connections with others, can be utilised by a retrieval system to infer the purpose of a given node, and to use these inferences to determine the degree to which a particular node matches the query submitted by a user. Specifically, a node’s position in a graph and its connections with other nodes can be utilised to infer its importance and complexity. For example, if a keyword (e.g. ‘combustion’) appears in the top node of a DRed graph, it is very likely that the whole rationale record is trying to resolve a problem about combustion. If many connections are created for a node, this probably means that this node is about a complex topic requiring many arguments or raising many further issues. Moreover, the connection between two nodes also indicates the dependency between them and offers some useful information for a retrieval system. The different kinds of dependencies together with the number of times they appear in the dataset utilised in this paper are shown in Table 2. The arrow symbol indicates the left node is derived from the right one. As shown in the table, the connection from an answer node to an issue node (meaning that an answer has been proposed for an issue) is the most common. An initial method for utilising this structured information is to classify DRed nodes in terms of their types. Such a classification can not only filter the retrieved results, but can also help users to submit queries that better reflect their information needs. Moreover, the type and status information of a DRed node can also make it stand out from the results list. 
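The dependency tallies and connection counts discussed above could be derived from an edge list along the following lines. This is only a minimal sketch: the edge tuples, node identifiers, and function names are hypothetical and are not taken from DRed or from the prototype; each edge follows the Table 2 convention that the first node is derived from the second.

```python
from collections import Counter

# Hypothetical edge list: (derived_id, derived_type, source_id, source_type).
# As in Table 2, the first node of each pair is derived from the second.
edges = [
    ("a1", "Answer", "i1", "Issue"),
    ("a2", "Answer", "i1", "Issue"),
    ("p1", "Pro-argument", "a1", "Answer"),
    ("c1", "Con-argument", "a1", "Answer"),
    ("i2", "Issue", "a1", "Answer"),
]


def dependency_counts(edge_list):
    """Count how often each (derived type, source type) dependency occurs."""
    return Counter((d_type, s_type) for _, d_type, _, s_type in edge_list)


def connection_counts(edge_list):
    """Count the connections touching each node (a rough proxy for complexity)."""
    degree = Counter()
    for derived, _, source, _ in edge_list:
        degree[derived] += 1
        degree[source] += 1
    return degree


counts = dependency_counts(edges)
degree = connection_counts(edges)
busiest_node = degree.most_common(1)[0][0]
```

In this toy graph the answer `a1` has the most connections, which under the reasoning above would mark it as a comparatively complex topic.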
In this research, we assume that there are two main reasons for design engineers to search DRed graphs: firstly, they have a design issue and would like to see whether similar issues have been identified and resolved in previous projects; and secondly, they have a potential solution, and would like to see whether similar solutions were used in previous projects. Therefore, issue nodes are given the highest priority, as it

Fig. 2. A framework for the retrieval of design rationale.


Table 2 Different kinds of dependencies in DRed graphs.

Dependency               Illustration of the dependency                           Numbers
Answer → Issue           An answer is proposed for an issue                       236
Pro-argument → Answer    A pro-argument for an answer is raised                   174
Con-argument → Answer    A con-argument against an answer is raised               172
Answer → Answer          An answer is divided into further sub-answers            81
Issue → Answer           Further issues are raised from a proposed answer         62
Issue → Issue            An issue is expanded into several sub-issues             39
Answer → Con-argument    Answers are developed to reply to a con-argument         26
Issue → Con-argument     A new issue is raised as the result of a con-argument    19
Issue → Pro-argument     A new issue is raised as the result of a pro-argument    11

is believed that engineers tend to work in a problem-oriented way. Answer nodes will be given medium priority as solutions are also things that engineers would like to find and analyse. Argument nodes will be given lowest priority, with pro-arguments and con-arguments being treated equally.

The statuses of DRed nodes can also be utilised in the classification discussed above. For example, issue nodes can be classified as groups of resolved issues or open issues, and be assigned different priorities. Although users cannot specify the priorities, they can use different options to filter the results (as shown later in Fig. 6), e.g. looking at only resolved issues. Another method of using the status information is to generate a summary for each node, in addition to its text. This summary might include the type of the node, its status, and any important nodes connected to it. In this way, users can quickly ignore a particular node group if they are only looking for solvable issues, whilst still being able to get further details if they would like to see why no effective solutions were found for this issue.

The levels of DRed nodes in a graph in essence reflect the different stages of a problem-solving process at which they were created. Higher levels mean earlier stages of the process whereas lower levels correspond to later stages. Nodes at high levels tend to describe a design problem with a high level of abstraction whereas those at low levels are usually aimed at describing solutions and arguments in more detail.

An important usage of the interconnections of a DRed node is to put the node in context, like putting a few sentences in a paragraph. A single DRed node does not contain much information, consisting of a sentence or two to introduce an issue, to describe a solution, or to make an argument. When users navigate a DRed graph, they look at a node, try to understand its contents, and quickly move to another one to reason using the information accumulated.
Therefore, a group of interrelated DRed nodes, i.e. a sub-graph in a DRed graph, can help put any single node in context and make the information easier to understand. In this way, a sub-graph can be used as the response to users’ query. If this can be implemented, users can even submit a piece of design rationale as a query to better express their information needs.
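The type priorities, status filtering, and node summaries described in this section might be sketched as follows. This is a minimal illustration under stated assumptions, not the prototype's implementation: the `Node` fields, the status labels, and the choice to sort by type priority first and keyword score second are all hypothetical, since the text does not say how priority and similarity are combined.

```python
from dataclasses import dataclass

# Priorities from the text: issues first, answers second, arguments last,
# with pro- and con-arguments treated equally. Lower number = higher priority.
TYPE_PRIORITY = {"issue": 0, "answer": 1, "pro-argument": 2, "con-argument": 2}


@dataclass
class Node:
    node_id: str
    node_type: str  # one of the keys of TYPE_PRIORITY
    status: str     # illustrative labels such as "resolved", "open", "rejected"
    text: str
    score: float    # similarity score from a prior keyword-based retrieval step


def rank(nodes):
    """Assumed ordering: type priority first, then descending keyword score."""
    return sorted(nodes, key=lambda n: (TYPE_PRIORITY[n.node_type], -n.score))


def filter_results(nodes, node_type=None, status=None):
    """Keep only nodes matching the requested type and/or status."""
    return [
        n for n in nodes
        if (node_type is None or n.node_type == node_type)
        and (status is None or n.status == status)
    ]


def summary(node):
    """One-line summary showing type and status ahead of the node text."""
    return f"[{node.node_type}, {node.status}] {node.text}"


results = [
    Node("n1", "answer", "accepted", "Use a coating on the component", 0.9),
    Node("n2", "issue", "resolved", "How to avoid contamination?", 0.7),
    Node("n3", "con-argument", "open", "Coating adds cost", 0.8),
    Node("n4", "issue", "open", "How to stop fillers becoming loose?", 0.4),
]
ranked_ids = [n.node_id for n in rank(results)]
resolved_issues = filter_results(results, node_type="issue", status="resolved")
```

Here the two issue nodes come first despite their lower keyword scores, and filtering for resolved issues leaves only `n2`.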

5.2. Finding groups of nodes as retrieval results

The reason for returning a group of interconnected nodes as a retrieval result is to better fulfil users’ information needs by offering them more complete information than that contained by a single node. A set of other DRed nodes associated with a DRed node can not only help users to understand its context but also provide more information for users to judge whether the nodes found are genuinely useful. The construction of a group of nodes involves starting from a node and checking whether other nodes directly or indirectly associated with it also match the query, until a reasonable number of nodes are found. A number of principles for finding groups of nodes are shown in Table 3, which not only guide the development of an effective and efficient algorithm but can help

rank the nodes groups retrieved. Specifically, the first column shows the principles developed; the second column gives illustration to each principle; and the third column describes how these principles are applied in the development of retrieval algorithms. A simple method is to search from the top node of each tree and go down to lower levels of the tree, which requires checking a large number of nodes and is not efficient. A better method is to utilise the results list obtained by performing keyword-based retrieval and construct a group of associated DRed nodes from the results list. This method can be developed by picking up a node from the top of the results list and searching for other nodes either directly or indirectly associated with it. Such a method can moderately improve the retrieval performance as it avoids searching from the top of each tree. Nevertheless, this method still involves many iterations of checking as to whether an associated node matches the query. A further improvement for this method is to

Table 3 Principles for finding groups of DRed nodes.

Principle 1: The contents of the nodes should be relevant to the query
  Illustration: This is to ensure that the contents of the nodes can fulfil the user’s information need
  Application: Nodes groups are constructed by using the results retrieved by keyword-based search

Principle 2: The dependencies between the nodes should be as strong as possible
  Illustration: The nodes in a group should all be connected together to describe a piece of design rationale
  Application: Any node to be added to the group should have connection with at least one of the nodes already in the group

Principle 3: The information carried by the nodes should be as complete as possible
  Illustration: The story told by the group of nodes should be complete so that it can be well understood by the user
  Application: If a node is deemed not too ‘far’ from a node already in the group but is not directly connected with it, the nodes connecting the two are also added to the group

Principle 4: The number of extra nodes added to the group should be reasonable
  Illustration: Extra nodes are useful to help the user understand the meanings of a nodes group. However, adding too many extra nodes may degrade the nodes group’s relevancy to the query
  Application: A limit is set to constrain the number of nodes in a nodes group. In addition, nodes more relevant to the query are checked first to determine whether they should be added to the group

Principle 5: The keywords in a query should appear in different nodes as far as possible
  Illustration: The nodes in a group should match the query as a whole and the case in which only a small number of nodes are relevant should be avoided
  Application: The algorithm can check the number of keywords contained in each node in the group, and then determine whether the keywords appear in those nodes evenly. This principle has not yet been applied to the algorithm implementation


Fig. 3. Tags attached to DRed nodes in a tree structure.

measure the ‘distances’ between two nodes in the results list and construct a group of nodes which have acceptable ‘distance’ between each other. These ‘distances’ can be measured quickly by attaching numerical tags to nodes and comparing the differences between tags. As shown in Fig. 3, the first part of a tag represents the unique name of a project so that DRed nodes created for different projects can be distinguished. The second part after symbol ‘@’ indicates a node’s position in a tree. In this example, the ‘distance’ between ‘Issue A’ and ‘Issue B’ is two as the difference between their tags is ‘2.2’ which indicates two levels deeper in the tree. In general, the ‘distances’ between any two nodes can be calculated based on these tags. An algorithm for constructing nodes groups by using these ‘distances’ is shown in Fig. 4. This algorithm can greatly improve the speed of finding nodes groups. Firstly, only DRed nodes in the results list need to be evaluated. Secondly, the attaching of tags is a once-off pre-processing task, and therefore does not affect the run-time efficiency.

5.3. Using complex queries

Users’ queries reflect their information needs and thus need to be correctly interpreted by a retrieval system. Using a set of keywords to form a query has been proved to be an effective method, as evidenced in current large commercial search engines. In the context of retrieving design knowledge, designers’ knowledge needs are actually more complex. The query given by a designer might be simply checking whether a piece of terminology has been used before, or might sometimes contain very complex information (some words to describe an issue with some potential answers) which is difficult for a retrieval system to ‘understand’. The use of complex queries aims to distinguish the keywords given and put them in context so that the retrieval system can be better informed. This is easier for enterprise-level retrieval systems such

as design rationale retrieval than it is for large search engines, whose users have hugely varied needs. As discussed above, engineering designers’ motivations for retrieving and re-using design rationale mainly include: firstly, finding previous solutions for a given issue; and secondly, finding out how similar solutions worked for similar issues in the past. Thus their queries should tend to focus on issues and solutions, as well as a combination of both. Retrieval performance can be greatly improved if the retrieval system can identify what is the issue and what are solutions from the query. However, it is very difficult to do this using simple keyword-based retrieval, where all the keywords are treated equally. Furthermore, the use of complex queries for design rationale retrieval can help designers to better express their information needs by enabling them to describe an issue of interest, a proposed solution, and even the arguments for and against the solution (i.e. the query can itself be expressed as a small piece of design rationale). Retrieval using complex queries in essence is also a version of finding nodes groups to match a query, the only difference being that the query is more informative. A simple method can be developed to strictly evaluate each node in a complex query and only those nodes groups having the same structure and contents as the query are deemed to be useful. This method has a critical drawback, though, because there are many cases in which solutions for an issue node do not appear as its immediate children in the tree structure. Therefore, all such cases will be discarded by such a method. A better method can be developed to resolve this problem, as shown in Fig. 5. In the figure, the solid lines indicate the connections in both queries and DRed graphs, while the dashed lines indicate the action of matching. In this method, a group of nodes that match the top node in the query are found and suppose one of these nodes is ‘Issue A’. 
After that, a group of nodes are found to match the answer described in the query, and only those that have some sort of dependency (and this can be done by comparing the tags of each node) with ‘Issue A’ are kept, as illustrated in Step 2. In Step 3, similar things are done for the two arguments in the query. Finally, a path (or several paths) will be returned as the result, as shown in Step 4.
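The tag comparison of Section 5.2 and the stepwise matching of complex queries described above can be sketched together, since both rely on the same tags. This is a hedged illustration rather than the algorithm of Fig. 4 or Fig. 5: it assumes tags of the form `project@1.2.2` where the dotted part is the path from the top node, treats the 'distance' as the number of tree edges between two nodes, and omits the step of adding intermediate connecting nodes to a group. All function names, thresholds, and example tags are hypothetical.

```python
def parse_tag(tag):
    """Split 'project@1.2.2' into ('project', (1, 2, 2))."""
    project, _, path = tag.partition("@")
    return project, tuple(int(part) for part in path.split("."))


def tag_distance(tag_a, tag_b):
    """Tree edges between two tagged nodes, or None for different projects."""
    project_a, path_a = parse_tag(tag_a)
    project_b, path_b = parse_tag(tag_b)
    if project_a != project_b:
        return None
    shared = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared += 1
    return (len(path_a) - shared) + (len(path_b) - shared)


def is_descendant(tag, ancestor_tag):
    """True if `tag` lies strictly below `ancestor_tag` in the same tree."""
    project, path = parse_tag(tag)
    anc_project, anc_path = parse_tag(ancestor_tag)
    return (project == anc_project and len(path) > len(anc_path)
            and path[:len(anc_path)] == anc_path)


def build_group(ranked_tags, max_distance=2, max_size=10):
    """Grow a group from the top-ranked hit, adding hits near any member."""
    group = [ranked_tags[0]]
    for tag in ranked_tags[1:]:
        if len(group) >= max_size:
            break
        distances = [tag_distance(tag, member) for member in group]
        if any(d is not None and d <= max_distance for d in distances):
            group.append(tag)
    return group


def match_complex_query(issue_hits, answer_hits, argument_hits):
    """Return (issue, answer, argument) paths whose tags form a chain."""
    paths = []
    for issue in issue_hits:
        for answer in answer_hits:
            if not is_descendant(answer, issue):
                continue
            for argument in argument_hits:
                if is_descendant(argument, answer):
                    paths.append((issue, answer, argument))
    return paths


group = build_group(["proj@1.2", "proj@1.2.1", "proj@1.3.4.1", "other@1"])
paths = match_complex_query(
    ["proj@1"], ["proj@1.2.2", "proj@3.1"], ["proj@1.2.2.1", "proj@3.1.1"]
)
```

Because dependency is checked by tag prefix rather than by direct connection, the answer `proj@1.2.2` is still matched to the issue `proj@1` even though it is not an immediate child, which is the case the strict structural method would discard.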

6. Development and evaluation

6.1. System implementation

To verify the proposed retrieval methods, a prototype system is being developed on the basis of an earlier keyword-based system, and some preliminary evaluation has now been performed. Detailed introduction to the keyword-based system is beyond the scope of this paper, and is published elsewhere [29]. The

Fig. 4. The process of constructing nodes groups by using the ‘distances’ between nodes.


Fig. 5. A method for finding results using complex queries.

Fig. 6. A snapshot of the prototype system.

prototype is implemented using Java and can run on multiple platforms. A snapshot of the GUI of this prototype system is shown in Fig. 6, with some annotations to some specific parts of the GUI. Though it is currently developed as a standalone application, it can be extended as a web-based collaborative tool and integrated with DRed once the methods are proved to be viable and useful. As shown in the figure, information about the different types of the retrieved results is shown on top of

the panel, which can assist users to decide whether to do further filtering on the results by using different retrieval options. The retrieval options enable users to choose the particular node types they would like to search for, e.g. what kind of DRed nodes (e.g. issues, answers, pro and con arguments) and what statuses these nodes should have (e.g. resolved issues or rejected answers). Currently, the summary generated simply includes the type and status information of a node, which enables users to


Fig. 7. Nodes groups shown on the GUI as retrieval results.

Fig. 8. A snapshot of the GUI for creating complex queries.

ignore some results at a first glance. Below the content of each retrieval result (i.e., a DRed node), there are two links. The first

can be clicked to open the DRed file where the node exists. The second link is used to automatically suggest other DRed nodes


Fig. 9. Two pieces of rationale from which the queries were formed.

that either have similar contents to, or are connected with, the node of interest to users. In addition to using type and status information to assist users, the prototype system also has two main features which make use of the rationale structure inherent in DRed graphs. The first feature is the search of nodes groups on the basis of the algorithm described in Section 5.2. Fig. 7 shows the GUI for displaying the retrieved nodes groups (in this case, a nodes group of eleven nodes is found for the query ‘‘stress defence’’ from a DRed graph with almost 50 nodes). To perform the search of nodes groups using the prototype system, users only need to select ‘‘Nodes groups’’ as the retrieval option and input a query in the field on the right. In each block, a DRed node’s type and status is shown on the top and its textual content is shown on the bottom. Keywords in the query used are highlighted to draw the user’s attention, and the whole content of the node is highlighted as the cursor is moved over a block. On the right of the GUI, the top panel provides a list of nodes groups found and allows users to click to view one of them, and the bottom panel shows an overview of the relevant nodes group to help the user navigate between different parts of

the graph. The second feature is the use of complex queries which not only allows users to better describe their information needs but also enables the system to better understand those needs. The GUI for creating a complex query is shown in Fig. 8 where a query is formed by specifying an issue, an answer, and two arguments. The prototype system allows users to create queries of arbitrary complexity, though in practice queries should be kept fairly simple to improve the chance of successfully finding matching results.

6.2. Evaluation

The proposed methods have been evaluated by comparing the results obtained from the prototype system and those obtained from a keyword-based retrieval system [29] for a number of test cases. These cases were formed on the basis of a dataset comprising 35 DRed graphs (some of which are separate while others are linked with each other) developed over 11 projects at a collaborating company. There are in total 892 DRed nodes connected using 865 edges. (In the examples shown, any commercially confidential

Fig. 10. Parts of the results obtained from keyword-based retrieval for the two cases.


Fig. 11. Nodes groups found in the two cases.

information has been blurred deliberately). The performance of a retrieval system can generally be evaluated in terms of its effectiveness and efficiency. Effectiveness is the focus of this research as the methods developed are mainly aimed at improving the quality of results. Effectiveness of retrieval is determined by two measures, namely recall and precision. Specifically, recall is defined as the fraction of the relevant documents successfully retrieved by a retrieval system in all the relevant documents in a collection whilst precision is defined as the fraction of the relevant documents in all the documents retrieved by a retrieval system [30]. Queries used in the test cases were formed by randomly opening a DRed graph and getting a few keywords from a piece of rationale that spreads over

more than one DRed node. For each test case, three sets of results were obtained, namely the one from keyword-based retrieval, the one from the search of nodes groups, and the one from the search using complex queries. Two test cases are described in detail to explain the comparison. The first one (the query ‘‘loose filler joining’’ is formed) is about using joining to resolve the problem that fillers became loose. The scenario is that the user has got a solution of using joining to resolve ‘‘loose filler’’ problem and would like to find out how such a solution worked in previous projects. The second one (query ‘‘avoid contamination coating’’ is used) is similar to the first one, which involves a scenario that the user has got a solution of using


Fig. 12. Results obtained from the search using complex queries.

coating to avoid contamination for a component and would like to see how coating worked in previous issues. These types of issue are actually very typical in the design process and the reasoning processes used in previous projects can provide useful knowledge and information for users to make effective and efficient decisions. The two pieces of rationale are taken from two separate DRed graphs, and are shown in Fig. 9. The second one involves an issue with an immediate solution attached to it while the first one has one further issue explaining the purpose of developing a solution for the “loose fillers” problem. The retrieval results can then be evaluated by finding out whether the nodes in the figure can be successfully found (i.e. recall of retrieval) and whether many non-relevant nodes would also be returned (i.e. precision of retrieval).

The results (only the top four are shown due to length limitations) obtained from the keyword-based retrieval are shown in Fig. 10. In test case one, a total of 25 results are retrieved and all the nodes in Fig. 9 are placed at the top of the list. In case two, a total of nine results are retrieved and only one of the two nodes in Fig. 9 is placed at the top, although both of them are found successfully. The retrieval results obtained from the search of nodes groups are shown in Fig. 11. In both of the cases, all the nodes in Fig. 9 are successfully found and the two nodes groups consist of 14 and 3 nodes for cases 1 and 2, respectively. The retrieval results obtained from the search that uses complex queries are

shown in Fig. 12 where the complex queries used are shown on the right and the nodes groups retrieved are shown on the left. In all the tests, the nodes in Fig. 9 are successfully found and included in the results list, and therefore the three methods all have 100% recall. The emphasis is thus on the comparison of precision. Taking test case 1 as an example, the total number of results is 25 (all of these nodes contain at least one of the keywords used in the query) while only three of them are relevant to the piece of rationale shown in Fig. 9. When the search of nodes groups is used, the number of results is decreased to 14, which implies an increase of precision. When the complex query shown in Fig. 12 is used, further improvement is achieved and the three nodes in the group are exactly those originally used in the rationale. The improvement of results quality is equally good for test case 2, so it will not be discussed for the sake of brevity. Furthermore, when nodes groups are used as retrieval results, all the non-relevant nodes are actually correlated with the useful ones and therefore are to some extent useful to users.

In addition to the two test cases described above, retrieval tests have also been done for four other cases. Table 4 describes the setting and results obtained for the three methods (method 1 is the keyword-based retrieval, method 2 is the search of nodes groups, and method 3 is the search using complex queries) in the six test cases. Based on the data obtained, a comparison of retrieval precision was undertaken and is shown


Table 4 Retrieval setting and results for the six test cases.

Test case   Query used                                                     No. of nodes       No. of results   No. of results   No. of results
                                                                           in the rationale   from method 1    from method 2    from method 3
Case 1      “rail station taxi”                                            2                  7                4                2
Case 2      “loose filler joining”                                         3                  25               13               3
Case 3      “avoid contamination coating”                                  2                  9                3                2
Case 4      “flange move away double screw number”                         3                  29               17               10
Case 5      “improve ignition capability fuel mixing”                      3                  60               13               3
Case 6      “stress defence front chamfer face manufacturing tolerance”    3                  26               12               5

Information needs:
Case 1: The user would like to see the pros and cons of taking a taxi to the rail station
Case 2: The user has a solution of using joining to resolve “loose filler”, and would like to see how it worked in previous projects
Case 3: The user has a solution of using coating to avoid contamination on a component, and would like to see how it worked in previous projects
Case 4: The user has a solution of doubling screw number to resolve the moving away of flange, and would like to see how it worked in previous projects
Case 5: The user would like to know whether mixing fuels can improve ignition capability
Case 6: The user would like to find out the manufacturing tolerance of components to resolve the stress defence issue for front chamfer face

in Fig. 13 where ‘keyword’ means method 1, ‘structure’ means method 2, and ‘complex’ means method 3. As shown in the figure, the search using complex queries has the best performance in terms of precision due to the strict matching, which can achieve a 100% precision in most cases. The search of nodes groups achieves much better precision than keyword-based retrieval and moreover even those non-relevant nodes found can also help users to understand the rationale. All the tests were done on an HP EliteBook 8440p laptop which has an Intel Core i5 CPU (2.40 GHz) and 3 GB memory. The prototype takes about one second to do the pre-processing whilst most of the time is spent

Fig. 13. Comparison of the retrieval precision of the three methods for six test cases.

on reading data from text files. For the test cases, all the three methods can complete the retrieval tasks in far less than one millisecond. This suggests that the prototype system could still work well even if the number of DRed nodes scaled up to more than a million. As the other two methods are based on the results of keyword-based retrieval, they inevitably take slightly more time. The extra time taken is, again, far less than one millisecond, if only the time taken by the searching algorithms is calculated. When the time taken for showing results on the GUI is also taken into account, the last two methods take about 15 ms more than the keyword-based method. Nevertheless, this difference can actually be omitted as neither of these two methods allows a very big graph (otherwise the users can simply open a DRed file) to be used as a result.

Apart from the improvements in retrieval performance, the prototype system also has some functions and features which are not possessed by traditional keyword-based methods and which can make the search of information easier for designers. Firstly, the type and status information is shown together with the contents of a node, which allows designers to quickly decide whether a node is genuinely useful. Secondly, this piece of information can also be used to filter the results for users who have already got a summary of the total number of results retrieved together with their types. Thirdly, the utilisation of nodes groups as retrieval results can tremendously improve users’ understanding of a retrieved result, as complete and in-context information is offered by nodes groups. With the nodes groups, designers do not need to open a DRed file with tens of nodes or more but can still understand the issues and solutions involved in the piece of design rationale. Fourthly, complex queries are supported by the prototype system, which enables design engineers to easily express their information needs by submitting meaningful queries.
Therefore, the retrieval method that uses the rationale structure can improve the performance of retrieval in terms of both precision and ease of use, and the algorithms developed for this method have been found to be both valid and effective.
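The recall and precision definitions used in this evaluation can be made concrete with the counts reported in Table 4. The sketch below is illustrative only: it reads the flattened table values row by row, and it assumes that every node of the original rationale is retrieved by every method in every case (100% recall), which the text states explicitly only for the two cases described in detail. The resulting figures are therefore not claimed to reproduce Fig. 13.

```python
# Values read from Table 4, per test case:
# (nodes in the rationale, results from method 1, method 2, method 3).
TABLE_4 = {
    1: (2, 7, 4, 2),
    2: (3, 25, 13, 3),
    3: (2, 9, 3, 2),
    4: (3, 29, 17, 10),
    5: (3, 60, 13, 3),
    6: (3, 26, 12, 5),
}


def precision(relevant_retrieved, retrieved):
    """Fraction of the retrieved items that are relevant."""
    return relevant_retrieved / retrieved


def recall(relevant_retrieved, relevant_total):
    """Fraction of all relevant items that were retrieved."""
    return relevant_retrieved / relevant_total


def precision_by_method(table):
    """Per-case precision for methods 1-3, ASSUMING 100% recall throughout."""
    return {
        case: tuple(precision(relevant, n) for n in results)
        for case, (relevant, *results) in table.items()
    }


precisions = precision_by_method(TABLE_4)
# Cases in which method 3 (complex queries) returns only the rationale nodes.
exact_cases = [case for case, p in precisions.items() if p[2] == 1.0]
```

Under that assumption, the complex-query method returns exactly the rationale nodes in four of the six cases, which is in line with the statement that it reaches 100% precision in most cases.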

6.3. Discussion

As a prototype, the system still has a few drawbacks, which opens the opportunity for further research in this area. Firstly, the current algorithm for constructing node groups assumes a tree structure in attaching unique tags to each node; in fact, graph


structures are common in design rationale records and must also be supported. In fact, this problem can be easily addressed as any graph structure can be expressed as a number of trees. When a node appears in more than one of those trees, it can be assigned more than one tag, with each tag being unique in a single tree. When determining the ‘distance’ between nodes, the retrieval algorithm can simply identify the tree within which the evaluation is going to be done, and thus select the right tag for the calculation. Secondly, the size of a node group will increase when a keyword appears in many nodes of a DRed graph, as observed in the test cases. This is mainly due to the fact that the retrieval algorithm does not consider the degree of matching of the nodes concerned, as it mainly focuses on their connections. This issue can be resolved by removing those nodes which only contain one popular keyword, when the size of the group is big. Thirdly, the precision for queries with many keywords is still not as good as that of the ones with fewer keywords, as shown in Fig. 13. This problem will be investigated in future work. Last but not least, the methods developed are more suitable for enterprise-level retrieval systems where the size of records is not very large and the information needs are very predictable. Overall, though, the retrieval methods have been found to be both viable and effective, and they open up a new field that goes beyond keyword-based searches for knowledge retrieval. Finally, although this research is based on the DRed tool, the methods proposed and the algorithms developed, are generalised and can be adapted to work within other issue-based information systems for knowledge representation. The comparison of the methods developed in this work with keyword-based search [27,28] shows that they can achieve better precision. 
This shows that the use of the features of the knowledge records can improve retrieval performance and thus it is possible to go beyond keyword-based search in design knowledge retrieval. This work is different from shape-based retrieval which uses both geometric and topological information in the search and requires shape models to be used as queries. As discussed in Section 3, design rationale is central to the knowledge representation, which supports the search of information even though no shape models are given. Moreover, the methods in this work enable the use of complex queries which can assist users to express complex information needs. This work is actually similar to design cases retrieval which also has an integrated search space involving various pieces of information. However, this work specifically deals with knowledge retrieval and the methods developed can be extended to other knowledge retrieval applications which also have structured knowledge representations. Semantic retrieval is a very interesting topic as the contents (about issues, answers, or arguments) within DRed nodes contain rich semantic information. This work is focused on the use of the structures of knowledge records to improve retrieval performance whilst semantic retrieval is aimed at identifying semantic relationships and contexts. Actually semantic retrieval and the use of structures in DRed graphs discussed in this paper have complementary advantages. An integration of the two methods can hopefully achieve better precision, and will be studied in our future work.
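The extension to graph structures suggested in this discussion, where a node shared by several trees carries one tag per tree, might look as follows. This is a sketch under assumptions not stated in the paper: the part of the tag before '@' is taken to identify a single tree (for example `proj/t2`), the dotted part is the path from that tree's top node, and the distance between two multi-tagged nodes is taken as the smallest distance over tag pairs from the same tree.

```python
def parse_tag(tag):
    """Split 'proj/t2@1.1.3' into ('proj/t2', (1, 1, 3))."""
    tree, _, path = tag.partition("@")
    return tree, tuple(int(part) for part in path.split("."))


def tree_distance(path_a, path_b):
    """Number of tree edges between two positions within the same tree."""
    shared = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared += 1
    return (len(path_a) - shared) + (len(path_b) - shared)


def multi_tag_distance(tags_a, tags_b):
    """Smallest distance over tag pairs from the same tree, else None."""
    best = None
    for tag_a in tags_a:
        tree_a, path_a = parse_tag(tag_a)
        for tag_b in tags_b:
            tree_b, path_b = parse_tag(tag_b)
            if tree_a != tree_b:
                continue  # tags from different trees are not comparable
            d = tree_distance(path_a, path_b)
            if best is None or d < best:
                best = d
    return best


# Hypothetical node that appears in two trees, compared with a node in one.
shared_node = ["proj/t1@1.2.1", "proj/t2@1.1"]
other_node = ["proj/t2@1.1.3"]
d_same_tree = multi_tag_distance(shared_node, other_node)
d_no_common_tree = multi_tag_distance(["proj/t1@1"], ["proj/t3@1"])
```

Selecting the tag pair by tree identifier corresponds to the step of identifying the tree within which the evaluation is done before comparing tags.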

7. Conclusions

An important advantage of using design rationale as a source of engineering knowledge is that it contains a lot of engineering ‘know-why’ which is captured as a by-product of the normal process of designing. An integrated knowledge representation (based on design rationale, product knowledge, and process knowledge) can help capture design knowledge from various sources in the design process, and can thus fulfil various knowledge needs of designers. The wide adoption of design rationale capture tools in
industry makes it very desirable to develop methods and tools for the retrieval and re-use of these records. A keyword-based retrieval method can be effective, as such records are mainly stored as plain text. However, such methods are not very efficient when large numbers of design rationale records have been captured and stored: the results of interest will be hard to find amongst the large quantity of results returned. Moreover, the contents of a single node are often not sufficient to allow users to understand the issues properly. As evidenced in the development and evaluation of a prototype system, the utilisation of the implicit structures in DRed graphs is useful and can help improve retrieval performance. In our current research, the structure of the information is utilised in four main ways. Firstly, retrieval results can be filtered by using the structure. Secondly, a summary can be generated for each DRed node to clearly inform users about its type, status, and other features. Thirdly, since the information in a single DRed node is usually inadequate, a group of nodes can be found and returned as the retrieval result. Fourthly, complex queries can be formed and used to better express users’ information needs, and thus find better results. The methods proposed are shown to be useful and the algorithms used are feasible. As any design knowledge captured is in essence structured, the methods developed here can also be used for the retrieval of other structured design knowledge. In our future work, we will study how to enable users to formulate complex queries and how to understand their working contexts, so that relevant design rationale records can be recommended automatically.

Acknowledgements

The authors acknowledge the all-round support of Rolls-Royce Plc. The support of Gareth Armstrong is particularly acknowledged for helping with the revision of the manuscript.

References

[1] S. Ahmed, K.M.
Wallace, Understanding the knowledge needs of novice designers in the aerospace industry, Design Studies 25 (2) (2004) 155–173.
[2] H. Wang, H. Zhang, A distributed and interactive system to integrated design and simulation for collaborative product development, Journal of Robotics and Computer-Integrated Manufacturing 26 (6) (2010) 778–789.
[3] S. Szykman, R.D. Sriram, W.C. Regli, The role of knowledge in next-generation product development systems, Journal of Computing and Information Science in Engineering 1 (1) (2001) 3–11.
[4] W.C. Regli, X. Hu, M. Atwood, W. Sun, A survey of design rationale systems: approaches, representations, capture and retrieval, Engineering with Computers 16 (3–4) (2000) 571–577.
[5] M.D. Giess, Y.M. Goh, L. Ding, C.A. McMahon, Improved product, process, and rationale representation and information organisation to support design learning, in: Proceedings of the International Conference on Engineering Design (ICED 07), August 2007, France, Paris, 2007.
[6] A.P.J. Jarczyk, P. Loffler, F.M. Shipman III, Design rationale for software engineering: a survey, in: Proceedings of the Twenty-Fifth Hawaii International Conference on System Sciences, January 1992, Kauai, HI, USA, 1992.
[7] Z. Li, M. Liu, K. Ramani, Review of product information retrieval: representation and indexing, in: Proceedings of ASME 2004 Computers and Information in Engineering Conference, October 2004, Salt Lake City, Utah, USA, 2004.
[8] B.J. Hicks, S.J. Culley, R.D. Allen, G. Mullineux, A framework for the requirements of capturing, storing and reusing information and knowledge in engineering design, International Journal of Information Management 22 (4) (2002) 263–280.
[9] McMahon, A. Lowe, S. Culley, Knowledge management in engineering design: personalization and codification, Journal of Engineering Design 15 (4) (2004) 307–325.
[10] C.B. Chapman, M. Pinfold, Design engineering - a need to rethink the solution using knowledge based engineering, Knowledge-Based Systems 12 (5–6) (1999) 257–267.
[11] L. Susca, F. Mandorli, C. Rizzi, U. Cugini, Racing car design using knowledge aided engineering, Journal of Artificial Intelligence for Engineering Design, Analysis and Manufacturing 14 (3) (2000) 235–249.
[12] W.J. Marx, D.N. Mavris, D.P. Schrage, A knowledge-based system integrated with numerical analysis tools for aircraft life-cycle design, Journal of Artificial Intelligence for Engineering Design, Analysis and Manufacturing 12 (3) (1998) 235–249.
[13] W. Skarka, Application of MOKA methodology in generative model creation using CATIA, Journal of Engineering Applications of Artificial Intelligence 20 (5) (2007) 677–690.
[14] R. Bracewell, K. Wallace, M. Moss, D. Knott, Capturing design rationale, Journal of Computer-aided Design 41 (3) (2009) 173–186.
[15] K.L. Myers, N.B. Zumel, P. Garcia, Acquiring design rationale automatically, Journal of Artificial Intelligence for Engineering Design, Analysis and Manufacturing 14 (2) (2000) 115–135.
[16] P.C. Matthews, L.T.M. Blessing, K.M. Wallace, The introduction of a design heuristics extraction method, Journal of Advanced Engineering Informatics 16 (1) (2002) 3–19.
[17] H. Wang, A. Johnson, H. Zhang, S. Liang, Towards a collaborative modeling and simulation platform on the Internet, Journal of Advanced Engineering Informatics 24 (2) (2010) 208–218.
[18] R.H. Bracewell, M. Gourtovaia, K.M. Wallace, P.J. Clarkson, Extending design rationale to capture an integrated design information space, in: Proceedings of the International Conference on Engineering Design (ICED 07), August 2007, France, Paris, 2007.
[19] F.M. Shipman, R.J. McCall, Integrating different perspectives on design rationale: supporting the emergence of design rationale from design communication, Journal of Artificial Intelligence for Engineering Design, Analysis and Manufacturing 11 (2) (1997) 141–154.
[20] J. Lee, Design rationale systems: understanding the issues, IEEE Expert 12 (3) (1997) 78–85.
[21] M. Klein, Capturing design rationale in concurrent engineering teams, IEEE Computer 26 (1) (1993) 39–47.
[22] K.C. Burgess, E.J. Conklin, M.A. Crisfield, Report on a development project use of an issue-based information system, in: Proceedings of the International Conference on Computer Supported Cooperative Work (CSCW 90), NY, New York, 1990.
[23] J.E. Burge, D.C. Brown, SEURAT: integrated rationale management, in: Proceedings of the International Conference on Software Engineering (ICSE 08), Leipzig, Germany, 2008.
[24] V. Arora, E.J. Greer, P. Tremblay, A framework for capturing design rationale using granularity hierarchies, in: Proceedings of the Fifth International Workshop on Computer-Aided Software Engineering, July 1992, Montreal, QUE., Canada, 1992.
[25] S. Liu, C.A. McMahon, M.J. Darlington, S.J. Culley, P.J. Wild, A computational framework for retrieval of document fragments based on decomposition schemes in engineering information management, Journal of Advanced Engineering Informatics 20 (4) (2006) 401–413.
[26] C.T. Charlton, The retrieval of mechanical design information, Ph.D. Thesis, Department of Engineering, University of Cambridge, UK, 1998.
[27] S. Kim, R.H. Bracewell, K.M. Wallace, A framework for design rationale retrieval, in: Proceedings of the International Conference on Engineering Design (ICED 05), August 2005, Melbourne, Australia, 2005.
[28] S. Kim, R.H. Bracewell, K.M. Wallace, Improving design reuse using context, in: Proceedings of the International Conference on Engineering Design (ICED 07), August 2007, France, Paris, 2007.
[29] H. Wang, A. Johnson, R. Bracewell, Supporting design rationale retrieval for design knowledge re-use, in: Proceedings of the International Conference on Engineering Design (ICED 09), August 2009, Stanford, California, USA, 2009.
[30] C.D. Manning, P. Raghavan, H. Schutze, Introduction to Information Retrieval, Cambridge University Press, 2008.