Dynamic Content Presentation

Nathalie Colineau, Cécile Paris, Stephen Wan
Intelligent Interactive Technology group, CSIRO Division of Mathematical and Information Sciences, Sydney, Australia
{nathalie.colineau, cecile.paris, stephen.wan}@cmis.csiro.au

Abstract
Documents generated for different devices must each take into account the display constraints of the device for which they are tailored. This is especially true of mobile devices, which typically have small screens. In this paper, we present a platform for dynamically generating documents, tailoring the presentation to the user’s choice of display device, be it web browser, handheld computer, or even paper. In addition, this platform automatically maintains coherence and navigation consistency across the different device-specific versions of the document, which is crucial in avoiding user confusion.

Keywords
Tailored delivery; Content coherence; Space constraints; Natural Language Generation; Document re-use

1 Introduction
As an increasing amount of information becomes available, a number of customised services that deliver information on a variety of media have arisen. At the same time, a variety of mobile devices have appeared on the market. Many of these devices can act as the medium through which users access a wide range of information services. However, enabling these newer mobile devices for the delivery of web information services can be an expensive task. Indeed, the information provided to users needs to be tailored and organised as a coherent whole. Furthermore, for effective information delivery, the presentation constraints of the new devices must be taken into account. In this article, we outline a platform we designed and implemented (cf. [1], [2]) that overcomes many of the costly barriers to delivering tailored information on mobile devices. In particular, we present how the space constraints of the device are taken into account. We also describe an application built for the corporate memory domain.

2 Research Issues
Generating a document with customised information in response to a user’s information requirements involves not only searching for relevant information, but also ensuring that the information is consistent and up-to-date. In addition, when users switch between interconnected devices, coherence and consistency across devices become vital to avoid confusing the user [3]. Thus, our emphasis is on creating applications that work equally well on different devices and that are consistent across devices. The coherence of a document must be maintained from one device to another to ease comprehension and to reduce confusion when users change device. Our Virtual Document Planner (VDP) addresses these issues.

3 Virtual Document Planner
Many current web-based information services that produce documents tailored for particular users do so using relatively simple techniques. Such systems often produce dynamic documents that are generated from templates containing a static outline of the document’s structure. As more delivery devices are supported or as the degree of information tailoring increases, the number of templates escalates rapidly, and the cost of maintaining a template-based system rises with it. These costs are often too high for such services to be economically viable. In contrast, the VDP avoids many of these ongoing costs. Our aim is to develop a platform to support customised information delivery from multiple data sources that takes into account user preferences (such as the choice of delivery media) and that is able to reduce document construction and maintenance costs.

Proceedings of the 4th Australasian Document Computing Symposium, Coffs Harbour, Australia, December 7, 1999.

3.1 System Architecture

The architecture currently used is a typical Natural Language Generation (NLG) architecture [4], in which the linguistic resources are kept separate from the NLG processes that produce the text. The resources include: discourse rules (or discourse strategies), which specify the organisation of content; presentation rules, which specify appropriate layout; and lexico-grammatical resources (i.e., lexicons and grammars), which specify the vocabulary and syntax of text in a particular domain, allowing simple or complex sentences as appropriate.

Our research hypothesis is that an appropriate model of the discourse can be exploited to provide a flexible way to deliver information to various delivery channels. We believe that such a model would ensure that the content organisation and document structure are kept constant, thus avoiding confusing users when they switch from one medium to another.

Design for Device Independence: In NLG, a key design feature is to separate the planning of the content (what is being presented) and structure (why it is being presented) of a document from its surface realisation (how it is expressed in language) (cf. [6], [7]). Correspondingly, our approach in the VDP is to decouple content (and structure) planning from presentation planning. The content planner structures the chosen content, producing a discourse tree which represents the structure of the generated document. This is done using a library of discourse plans, which indicate how to achieve the discourse goal corresponding to the user’s information need. In presentation planning, the discourse tree (marked with ‘C’ for content in Figure 2) is extended to construct a document presentation for the particular medium selected by the user.

Figure 1: VDP Architecture

Our generation architecture is shown in Figure 1. It combines a user model and a discourse model to deliver information from a set of data sources, in a coherent form, to a variety of media including paper, hand-held devices and the web. We employ a discourse generation approach based on a theory of document structure and coherence known as Rhetorical Structure Theory [5]. In this theory, a document is represented as a tree in which nodes and arcs represent the content of the document and the coherence relations between content parts, respectively. This ensures only relevant content is selected and assembled for the user, and further guarantees the coherence of the resulting document.
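As an illustration of the tree representation described above, the following is a minimal Python sketch of a discourse tree in which nodes hold content and arcs carry a coherence relation. It is not the VDP's own data structure: the class name `Span`, the helper `relations` and the example content are all assumptions made for this sketch, and only the relation names follow common RST usage.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Span:
    """One node of a discourse tree: a span of content plus its satellites."""
    content: str
    # Coherence relation on the arc to the parent span; None for the root.
    relation: Optional[str] = None
    satellites: List["Span"] = field(default_factory=list)


def relations(span: Span) -> List[str]:
    """Return the relation labels on all arcs below `span`, depth first."""
    labels: List[str] = []
    for satellite in span.satellites:
        labels.append(satellite.relation)
        labels.extend(relations(satellite))
    return labels


# Hypothetical example content, invented for this sketch.
tree = Span(
    "project descriptions",
    satellites=[
        Span("background on the organisation", relation="background"),
        Span(
            "contact details",
            relation="enablement",
            satellites=[Span("opening hours", relation="elaboration")],
        ),
    ],
)
```

Because the relations are stored explicitly on the arcs, a later planning stage can inspect them without needing to know how the content itself will be rendered.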

3.2 Enhancing Usability and Comprehension

As mentioned in the introduction, customising content to the user’s information needs is necessary but not sufficient for effectively addressing those needs. In addition, as Chincholle notes [3], there should be a focus on creating applications that work equally well on different devices and are consistent across devices. To maintain coherence across different device-specific presentations, the content and structure of a document must be kept constant, while the specific characteristics of each device must be taken into account to ensure usability. This is especially true for small or mobile devices.

Figure 2: Dynamic media delivery

The presentation planner makes inferences based on the coherence relationships within the tree to tailor the presentation to suit the medium. For example, the discourse structure might indicate that some text is not of primary importance, and the system can reason that it could be placed behind a hyperlink to save space on a small-screen device. Re-using the discourse tree across all media offers two advantages:
a) The content organisation and the structure of the generated document remain constant, and navigation, derived from this tree, is consistent across the different devices.
b) Only the presentation of the tree is re-planned when users change medium. Programming and execution costs are thus reduced.
Indeed, customising an application for a device can otherwise be expensive, and, unfortunately, the alternative, designing a single generic application, can result in loss of usability. The solution we propose strikes a good balance, allowing on-the-fly generation for various devices.

Provision of Alternative Formats for Different Devices: To address different media, we have looked at the different (but equivalent) ways to visualise content. Some data are not suitable for presentation on small or mobile devices. For example, large images or graphics should be replaced or removed, and tables, as proposed by Dreier [8], should be organised in lists, allowing users to drill down to the information they require. It is important to convey essential information in the most readable package, both to reduce download time and to improve the chance of it being read. Previous work on layout constraints (e.g., [9]) has shown that rhetorical text organisation can be used to facilitate text-formatting decisions: for example, a sequence relation can motivate the use of a bullet list. Similarly, our presentation planner uses a library of presentation plans, which indicate how the tree should be displayed. The presentation plans take into account the rhetorical structure, the chosen delivery medium and the type of data to be displayed. The presentation planner offers a selection of appropriate formats.
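The formatting decisions mentioned above can be pictured as a small rule table keyed on the rhetorical relation, the delivery medium and the data type. The Python sketch below is only an illustration under stated assumptions: the function `choose_format`, the medium and data-type labels, and the returned format names are all hypothetical, and the real presentation plans are richer than a handful of conditionals.

```python
# Media treated as small-screen devices in this sketch (an assumption).
SMALL_SCREEN_MEDIA = {"handheld"}


def choose_format(relation: str, medium: str, data_type: str) -> str:
    """Pick a display format from the relation, the medium and the data type."""
    small = medium in SMALL_SCREEN_MEDIA
    if small and data_type == "image":
        # Large images or graphics are replaced or removed on small devices.
        return "omit"
    if small and data_type == "table":
        # Tables become lists that the user can drill down into.
        return "drill-down list"
    if small and relation == "elaboration":
        # Secondary material is moved behind a hyperlink to save space.
        return "hyperlink"
    if relation == "sequence":
        # A sequence relation motivates a bullet list on any medium.
        return "bullet list"
    return "inline"
```

For example, `choose_format("sequence", "web", "text")` yields a bullet list, whereas the same call with an `"elaboration"` relation on a `"handheld"` medium yields a hyperlink. Keeping the rules in one function means the discourse tree itself never has to change when a new medium is added.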

3.3 Dynamic Content Presentation

Controlling the way information is displayed is sometimes not sufficient to meet the device constraints. In particular, if we want to provide concise information, we also need to control the level of detail provided. As we are dealing with dynamic delivery, we have to address another problem: the variability of the size of available data. Whether the information resides on websites or in databases, it can be dynamic if regularly updated. It is thus difficult to know in advance the quantity of information available at any particular point. For example, in our corporate memory application, the amount of information on specific topics could be excessive or sparse. Thus, depending on the users’ requirements (e.g., the domain they are interested in or the specific research topic they need information about), different amounts of information may be available for display. To deal with this variability, we have identified three factors that affect the generation process. These are integrated in the presentation plans. We now briefly describe each of them in turn.

Establishing the Level of Importance: One factor that must be taken into account is the relevance of the data. This can be obtained from two knowledge sources. On the one hand, the text structure tells us how important a piece of information is with respect to the overall presentation. For example, an elaboration relation indicates that the data introduced by this relation provides additional information. Since elaboration information may not be of primary importance, we can decide to include it or not. A system can thus reason about the relations to decide what to omit if space is an issue. On the other hand, a set of data may have already been ranked by importance by the retrieval engine. This can allow a system to decide which items to present (i.e., the most important ones). However, it is sometimes difficult to establish such a level of importance, in particular when all the items retrieved meet the requirements. Yet, a decision must be made if space is an issue. One solution is to inform users that there was too much data matching their requirements, and consequently that only the first N items have been displayed. This solution would also include a mechanism (e.g., a hypertext link, if that is possible) to access the remaining data. The user then knows that there is more data and can find it if required. Alternatively, one can provide different views over the content, views that match the space constraints, as explained below.

Providing Views over the Content: “Shorter is smarter” is generally the buzz phrase when talking about small or mobile devices. This, however, as also stated in Dreier [8], does not mean that everything must be a brief snippet. Instead, one might choose the level of granularity of information to deliver. This is the approach we propose here. To have such control requires the ability to provide different views on the data and to select the appropriate one in relation to other constraints such as the data relevance, the data importance and the delivery medium. The decision about the amount of detail to provide depends largely on the data itself. In some cases, there is little that can be excluded. In many cases, however, it is possible to just give an overview of the whole data, or to simply mention some of the information, omitting additional details. Consider, for example, providing accommodation details in a customised tourist guide. At one extreme, the guide could be too verbose, providing excessive information, including pictures of rooms, details on services, the web site of the hotels if they have one, and so forth.
Alternatively, the guide could provide a short description of the accommodation, or simply give the name, the location and the phone number. Users are then able to obtain more information by themselves if they wish to do so.

Fitting the Page Length: Finally, the criterion that is probably the most important is the space constraint. Whatever the device used, small, mobile or not, some documents need to fit on a limited number of pages. This page length constraint has different consequences depending on the content of a page. For example, top-level pages might contain navigational links outlining the structure of the document, which link to other pages with the actual information. This strategy provides a concise overview of the information available in answer to the query. In order to handle this size constraint without automatically removing part of the data, we must take into account the type of page being produced, while also considering the two previous factors: the relevance of the data and the different views that one can take on the data. In the next section, we present an approach to decide on the maximum possible level of detail to be presented for each answer, given a finite amount of space. This approach, used in the VDP, integrates the three factors just described.
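Two of the decisions discussed in this section, omitting elaboration material when space is tight and showing only the first N ranked items together with a notice, can be sketched in a few lines of Python. This is an illustrative sketch rather than the VDP's implementation: the function names, the `(relation, text)` pair representation and the wording of the notice are assumptions.

```python
from typing import List, Optional, Tuple


def drop_optional(spans: List[Tuple[str, str]], space_is_tight: bool) -> List[Tuple[str, str]]:
    """Drop spans introduced by an elaboration relation when space is an issue.

    Each span is a (relation, text) pair; this representation is hypothetical.
    """
    if not space_is_tight:
        return list(spans)
    return [(relation, text) for relation, text in spans if relation != "elaboration"]


def first_n(ranked: List[str], n: int) -> Tuple[List[str], List[str], Optional[str]]:
    """Split already-ranked items into those shown and those left behind a link.

    Returns the shown items, the remaining items, and a notice for the user
    (None when everything fits). Assumes n is not negative.
    """
    shown, rest = ranked[:n], ranked[n:]
    notice = None
    if rest:
        notice = f"Showing the first {len(shown)} of {len(ranked)} matching items."
    return shown, rest, notice
```

Returning the remaining items alongside the notice lets the caller attach them to a hypertext link where the medium supports one, so that the user can still reach the data that was not displayed.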

4 Sample Application
The Virtual Document Planner can be applied to a wide variety of scenarios. To demonstrate the technology, we have implemented two applications: Tiddler, which constructs customised travel guides, and Percy, which generates customised CSIRO brochures. In this article, we show how space constraints have been applied to the customised CSIRO brochures.

4.1 PERCY: Customised CSIRO Brochures

Percy is a prototype system that delivers a customised brochure about research currently underway at CSIRO. The brochure is generated dynamically in response to a user’s initial query, taking into account a user model. Information about CSIRO is extracted from existing web pages about CSIRO research. The scenario we envisaged was a deployment of stand-alone consoles in public settings such as in a technology museum, in conference exhibitions or in CSIRO laboratory foyers. In such environments, users are in a ‘browsing’ frame of mind but are likely to be inundated with information from a multitude of sources. Hence the need for a concise customised brochure. The user enters his or her profile details via a web form. This profile (see the example of the user model input) contains a list of research fields and application domains in which the user is interested. In addition, the profile also contains information about the user’s profession, and the delivery medium. The user model is used to select the relevant pieces of information to be included in the generated document.

Example of user model input: Name: Alex Medium: Paper Domains: Medical Research Topics: Image Analysis Profession: CEO

Example of discourse rule:
Effect: KnowAbout(?divisions, ?relevant, PROJECTS)
Constraint: none
Nucleus: About(?relevant, PROJECTS_CORE)
Satellites: RST_Background About(?divisions, CSIRO_BACKGROUND)
            RST_Enablement About(?divisions, CSIRO_CONTACT)

Each plan operator is composed of four parts: an effect, potential constraints, a nucleus and potential satellites. Here, the aim is to provide information about relevant projects for a particular division. The nucleus represents the main information and the satellites represent additional information. In this case, the rule provides background and contact information for the division. The type of relationship between the nucleus and a satellite indicates which role the information plays. The top levels of a discourse tree are shown in Figure 3.
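To make the four-part structure of a plan operator concrete, the rule above can be written down as a small Python record and expanded by substituting values for its variables. This is a sketch under assumptions: the class `PlanOperator`, the helpers `bind` and `expand`, and the binding values used in the usage note are hypothetical, and the real content planner does considerably more than string substitution.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class PlanOperator:
    effect: str
    constraints: List[str]
    nucleus: str
    satellites: List[Tuple[str, str]]  # (RST relation, sub-goal) pairs


# The discourse rule quoted in the text, transcribed as data.
KNOW_ABOUT_PROJECTS = PlanOperator(
    effect="KnowAbout(?divisions, ?relevant, PROJECTS)",
    constraints=[],
    nucleus="About(?relevant, PROJECTS_CORE)",
    satellites=[
        ("RST_Background", "About(?divisions, CSIRO_BACKGROUND)"),
        ("RST_Enablement", "About(?divisions, CSIRO_CONTACT)"),
    ],
)


def bind(template: str, bindings: Dict[str, str]) -> str:
    """Replace each ?variable in `template` by its bound value."""
    # Longest names first, so a short variable name cannot clobber a longer one.
    for name in sorted(bindings, key=len, reverse=True):
        template = template.replace(name, bindings[name])
    return template


def expand(op: PlanOperator, bindings: Dict[str, str]) -> dict:
    """Instantiate the nucleus and satellites of `op` under `bindings`."""
    return {
        "nucleus": bind(op.nucleus, bindings),
        "satellites": [(rel, bind(goal, bindings)) for rel, goal in op.satellites],
    }
```

Calling `expand(KNOW_ABOUT_PROJECTS, {"?divisions": "CMIS", "?relevant": "image analysis"})`, with made-up binding values, produces one nucleus goal and two satellite goals, each satellite still labelled with the relation that tells the presentation planner what role it plays.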

Figure 3: Discourse tree for Alex’s brochure. Nodes (horizontal lines) represent ‘spans’ of text. Directed arcs represent the RST relation from one span of text to another. Arc labels are printed in italics.

The content is retrieved from a database of XML data. The data is extracted from HTML pages. Queries to the database can return a paragraph, a sentence or a concept. Figure 4 shows a sample of the DTD used. It specifies how the data is organised and which information is available.

The process of generating the customised brochure is as follows. First, the content planner uses a library of discourse plan operators, which indicates how to achieve the discourse goal of answering the user's information need. The plan operators are constructed based on a study of existing brochures produced by CSIRO communications staff.

Figure 4: Extract of the application DTD

The discourse tree is then rearranged for presentation depending on the medium selected by the user. The presentation planner decides how best to express the discourse tree, determining the amount of information to be included in each page and the navigation interface (e.g., hyperlinks) needed. For this application, the presentation planner has the additional constraint of producing the paper version in two pages. Figures 5 and 6 show two different brochures produced: one for Alex as a Chief Executive Officer and another one for Alex as a student. Since the first brochure (Figure 5) has a business perspective, technical details are kept to a minimum, background information about the research groups has been omitted and business development information has been included. The second brochure (Figure 6) is designed for a student interested in the research aspects of the projects. Thus, the brochure contains descriptions of the research groups, publications about relevant research, additional educational resources and opportunities to study with CSIRO.

Figure 5: A Customised CSIRO Brochure for Alex as a CEO

Figure 6: A Customised CSIRO Brochure for Alex as a student

4.2 Space Optimisation in Percy

Our approach to space optimisation combines constraints of granularity, relevance and size, to provide content that satisfies both the users’ requirements and the characteristics of the device. This approach provides only an approximation, rather than an exact size allocation of the kind performed in the STOP natural language generation system [10]. Indeed, in our application, we do not have strict size constraints in terms of, for example, exact numbers of characters or lines, as in STOP. This is because PERCY constructs a brochure by re-using text from web sources. These may be written by a human author. It is therefore difficult for PERCY to have total control over text content or style, and document size becomes a real issue. PERCY uses a heuristic function that allows it to apply space constraints to a document as a whole or to parts of it. The heuristic function exploits different sources of information, which specify: (1) how many pages the generated document should be; (2) how long the section being planned should be; and (3) the different views, or levels of granularity, we can have. Each view has a length value, which is obtained by prior analysis of the knowledge base. This length value can be in any units (e.g., number of lines of text).

The function performs a top-down allocation of a level of granularity, taking into account how much space (measured in lines) is still available. It carries out the optimisation as follows. Given the space constraint for a section, the level of granularity for each answer is initialised to the lowest possible level, corresponding to the least amount of text (and space). This gives a lower bound on the space required to present all the data. If this does not fill the available space, the algorithm tries to provide the maximal amount of information (i.e., the highest level of granularity), one item at a time. This is repeated until the space allocated exceeds the size specified for that section. At this point, the algorithm tries again with the next possible level. This process continues until we end up again at the lowest level of detail.

By implementing our solution for the space constraint as an explicit function, we obtain a solution that can handle a large number of situations and constraints. This is in contrast to an implementation in terms of templates or even plan operators, which would have required an exponentially growing number of templates or plans to represent all the possibilities. For example, in a brochure customised for someone with a business perspective (cf. Figure 5), technical details are kept to a minimum. Since the brochure has a business perspective, background information about the research groups that perform the work has also been omitted. However, business development information has been included.
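The allocation procedure described in this section can be read in more than one way; the Python sketch below shows one plausible reading and should not be taken as PERCY's actual heuristic function. It assumes that each item lists the length in lines of each of its views from least to most detailed, that items are ordered from most to least relevant, and that an upgrade is rejected as soon as it would overflow the section. The name `allocate_levels` and these input conventions are hypothetical.

```python
from typing import List, Tuple


def allocate_levels(view_lengths: List[List[int]], budget: int) -> Tuple[List[int], int]:
    """Choose a level of granularity per item within a line budget.

    view_lengths[i][k] is the length in lines of item i at level k, where
    level 0 is the least detailed view. Returns the chosen level for each
    item and the total number of lines used.
    """
    if not view_lengths:
        return [], 0
    # Start every item at the lowest level: a lower bound on the space needed.
    levels = [0] * len(view_lengths)
    used = sum(views[0] for views in view_lengths)
    if used >= budget:
        return levels, used  # nothing left to hand out (or already over budget)
    highest = max(len(views) for views in view_lengths) - 1
    # Try the most detailed view first, then each less detailed one in turn.
    for level in range(highest, 0, -1):
        for i, views in enumerate(view_lengths):
            if levels[i] > 0 or level >= len(views):
                continue  # already upgraded, or the item has no such view
            extra = views[level] - views[0]
            if used + extra > budget:
                break  # would overflow: retry remaining items one level down
            levels[i] = level
            used += extra
    return levels, used
```

With three items whose views take 2, 5 and 12 lines and a budget of 24 lines, the first item receives its most detailed view and the other two receive the intermediate one, for 22 lines in total. Because the budget and the view lengths are ordinary arguments, the same function covers any section size without a separate template per combination.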

5 Conclusion
In this paper, we have presented a Virtual Document Planner which is able to construct dynamic documents for a variety of different media and devices, including paper, mobile devices, personal digital assistants and web pages. By representing the generated document with a discourse tree, we can manipulate this abstract representation of the document to decide upon a presentation that suits the constraints of the medium chosen. This planning stage also includes a top-down space optimisation to allocate more space in the document to information that is more relevant to the user’s information need. To demonstrate this technology, we outlined Percy, a system that produces personalised brochures about scientific research currently being undertaken at CSIRO.

Acknowledgements
This work has been undertaken by the authors in collaboration with Dr Ross Wilkinson, Dr AnneMarie Vercoustre, Dr MingFang Wu of CSIRO and Dr François Paradis while he was at CSIRO.

References
[1] Wilkinson, R., Lu, S., Paradis, F., Paris, C., Wan, S. and Wu, M. (2000). Generating Personal Travel Guides from Discourse Plans. In Proceedings of the International Conference on Adaptive Hypermedia and Adaptive Web-based Systems. Trento, Italy, August 2000.
[2] Paris, C., Wan, S., Wilkinson, R. and Wu, M. (2001). Generating Personal Travel Guides – and who wants them? In Proceedings of the International Conference on User Modelling (UM2001). Sonthofen, Germany, July 13-18, 2001.
[3] Chincholle, D. (2000). Designing effective mobile services on small communication devices: Tips and Techniques. Tutorial notes in the Proceedings of OzCHI 2000: Interfacing Reality in the New Millennium. December 4-8, 2000. Sydney.
[4] Moore, J.D. and Paris, C.L. (1993). Planning Text for Advisory Dialogues: Capturing Intentional and Rhetorical Information. In Computational Linguistics, Cambridge, MA. Vol 19(4), 651-694.
[5] Mann, W.C. and Thompson, S.A. (1988). Rhetorical Structure Theory: Toward a functional theory of text organisation. In Text 8(3): 243-281.
[6] Chisholm, W., Vanderheiden, G. and Jacobs, I. (eds.) (1999). Web Content Accessibility Guidelines 1.0. http://www.w3.org/TR/1999/WAI-WEBCONTENT-19990505/
[7] Web Design & Usability Guidelines. http://www.usability.gov/guidelines
[8] Dreier, T. (2000). Making Content Work. Intranet Journal: Wireless. http://www.intranetjournal.com/articles/200010/puw_10_04_00a.html
[9] Hovy, E. and Arens, Y. (1991). Automatic generation of formatted text. In the Proceedings of the 8th Conference of the American Association for Artificial Intelligence. 92-96. Anaheim, CA: AAAI.
[10] Reiter, E. (2000). Pipelines and Size Constraints. Computational Linguistics. 26: 251-259.