EFFICIENT RETRIEVAL OF COMPLEX OBJECTS: QUERY PROCESSING IN A HYBRID DB AND IR SYSTEM

Junzhong Gu, Ulrich Thiel, Jian Zhao GMD-IPSI (Integrated Publication and Information Systems Institute) Dolivostraße 15, D-6100 Darmstadt, FRG email:{gu,thiel,zhao}@darmstadt.gmd.de

Abstract

Information retrieval systems are currently being developed to represent and manipulate complex multimedia objects. Such objects contain two types of components: structured components (e.g. of the type integer, real, fixed-length string, etc.) and unstructured components (e.g. text, images, sounds). Relational database systems (RDBS) are often used to store structured data and retrieve it via exact matching, while unstructured data is organized into inverted files and accessed by Information storage and Retrieval Systems (IRS), utilizing indeterminate matching. The difficulty is how to bridge the gap between RDBs and inverted files, as well as the gap between RDBMSs and IRSs based on inverted files. We describe an approach to integrating the two types of systems and the two different types of data.

In: G. Knorz; J. Krause; C.Womser-Hacker (eds.): Proc. of the 1st German Natl. Conf. on Information Retrieval (Regensburg, Sept. 1993).

1 Introduction

A decade ago, a clear distinction between database (DB) systems and information retrieval (IR) systems could be based on the kinds of data stored in the systems as well as on the techniques employed to access them. While DB systems were appropriate for large amounts of structured factual data, IR systems were “normally used to handle bibliographic records and textual data”[SM, p. xi]. The document representations were unstructured strings (or semi-structured records). However, most systems based on the string matching approach show only a limited retrieval capability, especially when applied to full-text databases. As a consequence, alternative IR models employ not only different access methods, but also structured document representations, e.g. [CKT], thus approaching an integration of text and fact retrieval, e.g. [FH].

In the database field, a rising need to cope with unstructured data comes along with the development of multimedia information systems, where not all data can be structured directly, e.g. textual passages, images, sounds, etc. Complex documents, as well as multimedia documents, comprise both structured and unstructured data. Therefore, the structured and unstructured data unavoidably coexist in one information system. In this case, the appropriate DB/IR system must possess the capability to store, control and retrieve both types of data. It should provide efficient access methods based on exact

matching for structured factual data 1, as well as powerful facilities to handle the uncertainty inherent in searching for less structured data. The latter component should be able to rank the retrieved items according to a relevance function.

Information storage and Retrieval Systems (IRS), conventionally based on inverted files with an index mechanism, are widely used for processing unstructured (or semi-structured) bibliographic records and textual data. “Virtually all the commercially available systems are based on inverted file design” [SM]. Advanced techniques such as probabilistic models, document and term clustering, automatic indexing, etc., are employed in some prototypical IRSs. Such techniques are very useful in handling less structured data like text, but do not aim at handling structured data efficiently.

Database management systems (DBMS), especially relational database management systems (RDBMS), are widely used to store, control and retrieve structured data. However, most RDBMSs lack the ability to handle less structured data like text, images, etc. Long fields, textual fields and even binary fields are now featured in some RDBMSs to enhance the ability to handle less structured data. But the retrieval function is still based on exact matching, i.e. the performance in retrieving relevant items is not sufficient. In sum, RDBMSs provide mature techniques for handling structured data 2, while in the area of IRSs probabilistic approaches provide appropriate means for accessing textual parts of complex documents.

Since the problem of integrating DB and IR functions in one system is not only an academic topic, but is also considered important for applications, e.g. office systems, considerable efforts towards an integration have been made [for a discussion of the related work in the DB area cf. AH, HMS]. According to [HMS], five approaches to integrating IRS and RDBMS can be distinguished. The first approach is called the standard RDBMS approach.
It employs a standard RDBMS directly to manipulate both structured and unstructured data, even though most RDBMSs do not provide the appropriate functionality to represent and manipulate unstructured data [HMS]. The second, the extended RDBMS approach, extends a traditional RDBMS with new data types (e.g. text) that are suitable for representing unstructured data, together with processing functions on those new types, e.g. Sybase3, ORACLE/TEXT. The third approach is to extend an IRS to include the relational model for the representation and manipulation of structured data. The fourth is called the external integration approach: an IRS and a RDBMS are embedded in a common environment, e.g. by adding a common query interface. Finally, the full integration approach combines the functionalities of both RDBMS and IRS into a hybrid data management system.

The standard RDBMS approach does not really feature functions for manipulating unstructured data. The extended RDBMS or IRS approach requires the augmentation of an existing RDBMS or IRS, which in most cases is a standard commercial system. However, the efficiency and the power of the extension are restricted, since it cannot be as efficient and powerful as an independent facility. In general, there will be an imbalance in the ability to store and manipulate different types of data at the same time, i.e. either the system will be more powerful for structured data than for less structured data (in the case of an extended RDBMS) or vice versa (in the case of an extended IRS). The full integration approach, of course, is attractive for its complete integration. However, much effort is required to develop a hybrid data management system built according to this approach.

Therefore, in our current project we decided to utilize existing RDBMSs (e.g. Sybase) and IRSs that are already powerful enough to represent and manipulate structured and unstructured data, respectively. The external integration could meet our requirements well, but this approach considers a RDBMS and an IRS as two parallel components that are only loosely connected with respect to both data communication and control flow. By combining the full and external integration, we propose an alternative approach to building an integrated IR system, called embedded full integration. It embeds the functionality of an IRS into a RDBMS, integrating inverted files with a RDB.

In this paper, two major issues are addressed: (1) how to integrate a relational database and inverted files into an integrated IR database, and (2) how to integrate a RDBMS and a probabilistic IRS into an integrated IR system. Section 2 introduces a sample domain, Section 3 describes our approach to efficient query processing, and Section 4 discusses the implementation. In Section 5 the user interface and examples are presented. The last section is the conclusion.

––––––––––––––––––––––
1. Of course, other features of state-of-the-art DBMS, e.g. transaction management, are also highly desirable. However, in this paper, we will focus on aspects of query processing.
2. In the long run, object-oriented DBMSs will provide a better basis for an integration. However, since most OODBMSs pertain to SQL-style query languages, we decided to use a relational system in our first prototype.
3. Sybase is a relational database management system of Sybase Inc.

2 Searching Complex Objects: A Sample Domain

A typical example of mixing structured and unstructured data is academic information, e.g. the CORDIS data provided by ECHO (European Commission Host Organization) [ECHO]. CORDIS (Community Research and Development Information Service) provides information about European Community (EC) Research and Technological Development (RTD) programs and related matters for organizations and individuals.4 These data can be accessed via a conventional full-text retrieval interface. However, in this domain the users typically do not search for documents, but for relevant information objects, e.g. research programs or projects which fulfill certain conditions, together with factual data (e.g. addresses), numerical data (e.g. project duration), and textual passages (e.g. a project’s objectives). Given the functionality of a full-text retrieval system, the users must switch between the databases and combine data sets retrieved from different isolated databases in order to obtain the complete information. From the users’ perspective, however, it seems highly desirable to be able to formulate a complex query covering all relevant aspects of the information objects of interest, no matter in which particular RTD-DBs the data is stored. Of course, the query formulation has to be assisted, e.g. via a form-based interface [ZKM]. Another prerequisite for a system capable of processing such queries directly is a conceptual data model which defines the object classes of the domain of interest and their relationships.

The conceptual model can be verbalized as follows: The EC research activities are grouped by programs (such as IMPACT, ESPRIT 2, etc.); each program contains a group of projects as members. Different EC commission services are responsible for different programs. Programs and projects have their contractors, i.e. primary contractors and member contractors, called organizations, which are located in some countries and some cities. Some persons act as contact persons of a program, project, or commission service. The publications, including reports, articles, and conference papers, are issued by organizations, programs and projects, with some person(s) as author(s). The programs, projects and publications can be classified by subject.

––––––––––––––––––––––
4. The corresponding databases are provided as CORDIS databases, e.g. RTD-Programmes, RTD-Projects, RTD-Acronyms, RTD-Comdocuments, RTD-Publications, RTD-Results, RTD-Partners and so on.

Figure 1 provides a graphical overview of the conceptual model:

[Figure 1: diagram of object classes connected by named relationships. Object classes: Program, Project, Commision_service, Organization, Person, Country, City, Language, Contract_type, Subject, Keyword, Publication, Article, Report, Conference_paper. Relationships: member_of, responsible_for, member_contractor, primary_contractor, participate_in, contact_with, author_of, issued_by, locate_in, in, component_of, in_language_of, cotract-type_of, subject_of, keywords_of, is-a.]
Figure 1. CORDIS Conceptual Model

Using a RDBMS, some CORDIS data can be represented using strong types (e.g. integer, real, etc.), a facility not provided by the original CORDIS service. For instance, a project is described in the RTD-Projects DB by its acronym, title, objective, general description, start-date, end-date, duration, contact person, update-date of the data, etc. Advanced query processing requires that the start-date, end-date and update-date are of Date/Time type, that the duration of the project is an Integer value, that the contact person of the project is given by a person record, etc. But some data, e.g. the objective of a program, are always in the form of free text which cannot be automatically structured. Indeterminate (approximate and imperfect) searching of such nodes within a relational database system is not possible. It is preferable to store them in inverted files and apply an indexing mechanism. Therefore, we have to integrate both structured and unstructured (or semi-structured) data in a hybrid DB and IR system.

At GMD-IPSI, we have developed HYDRA, an experimental HYbrid DB and IR system suitable for Academic information (as well as multimedia information), using CORDIS data as an example. To combine the advantages of strongly typed data management systems (e.g. RDBMS), which process structured data efficiently, and IRSs, which are superior in their capability for handling textual data, we integrated a RDBMS and a probabilistic IRS, the INQUERY system [CCH], into our system.5
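As a rough illustration of why free-text attributes such as a project objective are kept in inverted files, the following minimal Python sketch builds a term-to-record index and answers a conjunctive term lookup. It is not the INQUERY file format; the function names, the tokenizer and the sample records are assumptions made only for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Crude tokenizer (assumption): lowercase alphanumeric runs."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_file(docs):
    """Map each term to the set of record keys whose text contains it."""
    index = defaultdict(set)
    for key, text in docs.items():
        for term in tokenize(text):
            index[term].add(key)
    return index


def lookup(index, query):
    """Return the keys of records containing every query term."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Hypothetical project objectives, keyed by a record key.
objectives = {
    "P1": "Hypertext tools for document databases",
    "P2": "Information retrieval in office systems",
    "P3": "Numerical simulation of turbines",
}
index = build_inverted_file(objectives)
print(lookup(index, "information retrieval"))
```

The record keys returned by such a lookup are what allows the textual part to be linked back to the typed attributes held in the relational database.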

3 Efficient Query Processing: the HYDRA approach

The search for relevant information in the CORDIS domain, as well as in similar applications, requires a complex man-machine dialogue, in which the user iteratively enters queries and browses the retrieved data items, in pursuit of certain strategies. Hence, our approach to IR is inherently interactive, which puts stress on the efficiency of the DB access. In this situation, recall is less important than response time, since the user will, in the normal course of interaction, submit a sequence of queries to the system, in which she may deliberately vary the constraints imposed on the factual attributes of the information objects in accordance with her strategy.7 As a consequence, we do not intend to achieve equally high recall performances for the textual as well as the factual data [cf. CKT, FH]. Instead, we aim at combining the abilities of both exact matching and probabilistic retrieval to enhance response times. For instance, a user can submit the following query:

Find some collaboration partners who have since 1989 participated in EC projects which address topics related to information retrieval, document databases, or hypertext; industrial partners or other non-university institutions are preferred. If a partner is found, retrieve information about the contact person.

Using the notation as in [CKT], the query can be described in relational calculus terms, {p|r(p)}, with p as a variable representing related objects and r(p) as a Boolean combination of propositions involving the attributes of a complex object class. Viewing the nodes in Figure 1 as object classes (in the following in bold face, e.g. person), the above query can be noted as:

    Find all p detail in person where
      p member_of org where
      org in organization where
        (org type_of ’industry’ or org type_of ’non-university institution’)
        and ∃ proj in project where
          (org member_contractor_of proj or org primary_contractor_of proj)
          and proj.start_date >= 1989
          and proj implies (’information retrieval’, ’document databases’ or ’hypertext’)

Here, p is the required contact person(s), and r(p) is a Boolean combination of terms, which are propositions, e.g. proj.start_date >= 1989, p member_of org, etc. The identifiers in italics (e.g. member_of) represent predicates with typed variables, e.g. p member_of org is true if p is a member of organization org.

––––––––––––––––––––––
5. The research reported here is part of a cooperation project between GMD-IPSI and the University of Massachusetts at Amherst.
6. Appropriate means to provide cooperative dialogue guidance in the knowledge-based interface MERIT, which provides access to the CORDIS-DB of HYDRA, have been proposed in [STT, BCST]. However, a more detailed discussion of this topic is beyond the scope of this paper.
7. In text retrieval, the justification for presenting “near misses” to users is based on the experience that they mostly result from differences in spelling etc. In advanced systems, semantic relations are also exploited. However, in fact retrieval the relevance assessment of “similar” data is often based on pragmatic decisions. For instance, whether a project that started in 1989 is as relevant as a 1991 project, when the user asked for projects launched in 1990, depends heavily on the user’s plan or task.
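To make the structure of {p|r(p)} concrete, here is a small Python sketch that evaluates the sample query over in-memory objects. The class and attribute names are hypothetical, and the `implies` predicate is only a plain phrase-containment stand-in for the relevance computation that the text assigns to the IRS.

```python
from dataclasses import dataclass, field


@dataclass
class Project:
    acronym: str
    start_year: int
    objective: str


@dataclass
class Organization:
    name: str
    org_type: str
    # Projects in which the organization is a member or primary contractor.
    projects: list = field(default_factory=list)


@dataclass
class Person:
    name: str
    org: Organization


TOPICS = ("information retrieval", "document databases", "hypertext")


def implies(project, topics):
    """Toy stand-in for the indeterminate predicate: phrase containment."""
    text = project.objective.lower()
    return any(topic in text for topic in topics)


def contact_persons(persons):
    """Evaluate r(p): exact predicates combined with the 'implies' predicate."""
    wanted = {"industry", "non-university institution"}
    return [
        p for p in persons
        if p.org.org_type in wanted
        and any(proj.start_year >= 1989 and implies(proj, TOPICS)
                for proj in p.org.projects)
    ]


acme = Organization("Acme", "industry",
                    [Project("HYP", 1990, "Hypertext authoring tools")])
uni = Organization("Uni X", "university",
                   [Project("IRX", 1991, "Information retrieval models")])
old = Organization("OldCo", "industry",
                   [Project("DOC", 1987, "Document databases")])
persons = [Person("A. Smith", acme), Person("B. Jones", uni), Person("C. Lee", old)]
result = [p.name for p in contact_persons(persons)]
print(result)
```

Only the first person qualifies: the second fails the exact `type_of` test and the third fails the exact date test, regardless of how well the project text matches.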

In the CORDIS domain we distinguish between structured data types which are accessed via exact matching (e.g. “p member_of org”; “org type_of ’industry’”; “org type_of ’non-university institution’”; “org member_contractor_of proj”; etc.), and the textual data type which requires the computation of a relevance value. For example, the part of the sample query “proj implies (’information retrieval’, ’document databases’ or ’hypertext’)” may result in a project description which doesn’t contain any of the phrases listed above, but contains a relevant phrase, e.g. “information management”. For complex objects mixing structured and unstructured data, such as CORDIS data, it is better to provide an IR system integrating the facilities of a RDBMS and an IRS. Therefore, compared with the IR system model presented in [BC], our approach is based on two different parts of a query, as illustrated in Figure 2.

[Figure 2: flow diagram. Labels: Data Producer; Hypertext; RDB; Inverted Files; Scheme; Indexing; Representation; Person with Goals, Tasks, Intentions, etc.; Information Needs or Anomalous State of Knowledge; Indeterminate Query; Determinate Query; Comparison (twice); Matched Information; Retrieved information (incl. full text); Use and/or Evaluation; Reformulation.]

Figure 2. Hybrid DB/IR System Model

The left side of the figure describes the data resources and database construction. In CORDIS, the sample domain described in Section 2, the data provided by the CORDIS Service in Luxembourg (CORDIS-L) is converted to a relational database (RDB) and some subordinate data collections based on inverted files. Since the conceptual schema (cf. Figure 1) induces relational links between textual (and factual) data items, we can regard them as being organized in a hypertext [cf. GT]. In the following, we will therefore refer to the DB containing the converted data as HT-CORDIS.

The right side describes the IR activities: a person with some goal and intention related to a work task finds that these goals cannot be attained because resources or knowledge are somehow inadequate. The user submits a query as a representation of the information need to the IR system. It is interpreted as two parts: a determinate and an indeterminate query. The indeterminate query involves textual data which is stored and managed by an IRS based on inverted files with an indexing mechanism. The comparison of the indeterminate query and the indexing leads to the selection of possibly relevant retrieved data, which is typed to records and sent to the relational database. The determinate query is related to the strongly typed data stored in the RDB. The comparison of the data in the RDB and the determinate query leads to the retrieved information, which is used by the user if it satisfies the user’s goal; otherwise, it is evaluated by the user, who makes modifications (to the query and/or the hypertext) and starts another query.

In HYDRA, we structure the CORDIS data as much as possible, such that most of it is strongly typed. The nodes in HT-CORDIS are classified in the types of integer, real, date/time, fixed-length string, variable-length string ([...], =, [...] 0.5 and publication.pga9=’ESPRIT 2’

In this query, detailed information is sought about publications which belong to the program ’ESPRIT 2’ and relate to the keyword ”information retrieval” with a relevance larger than 0.5. The result can also be ordered (in ascending or descending order) by the probabilities of relevance, such as:

    select * from publication
    where probability_rate(”information retrieval”, publication)>0.5
      and publication.pga=’ESPRIT 2’
    order by publication_inq.bvalue asc

––––––––––––––––––––––
9. publication.pga is an attribute of the relation publication; it represents the acronyms of the related EC programs, e.g. ”ESPRIT 2” here.
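The idea of calling a relevance function from within SQL can be imitated in a few lines with Python's standard `sqlite3` module, which lets a Python function be registered as an SQL function. This is only a sketch under stated assumptions: SQLite stands in for Sybase, the scoring function is a toy term-overlap measure rather than INQUERY's belief computation, the second argument is a text column (`abstract`, a hypothetical attribute) rather than a relation name as in the statement above, and the sample rows are invented.

```python
import sqlite3


def probability_rate(query, text):
    """Toy relevance score in [0, 1]: fraction of query terms found in the text."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(text.lower().split())
    return sum(term in words for term in terms) / len(terms)


conn = sqlite3.connect(":memory:")
# Make the Python function callable from SQL under the same name.
conn.create_function("probability_rate", 2, probability_rate)
conn.execute("create table publication (title text, abstract text, pga text)")
conn.executemany(
    "insert into publication values (?, ?, ?)",
    [
        ("T1", "Probabilistic information retrieval for office documents", "ESPRIT 2"),
        ("T2", "Retrieval of chemical structures", "ESPRIT 2"),
        ("T3", "Information retrieval and hypertext", "IMPACT"),
    ],
)
rows = conn.execute(
    """
    select title, probability_rate('information retrieval', abstract) as bvalue
    from publication
    where probability_rate('information retrieval', abstract) > 0.5
      and pga = 'ESPRIT 2'
    order by bvalue desc
    """
).fetchall()
print(rows)
```

In this toy data only T1 passes both the relevance threshold and the exact condition on the program acronym.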

Ranked search in ProcINQUERY takes relatively more time than searching strongly typed values (e.g. integer, real value, etc.) in Sybase. Therefore, after the execution of the related INQUERY query, the ranked data with belief values (larger than 0.5, here), together with the relevant key in the relation publication, are stored in Sybase as temporary data, ordered by relevance (>0.5, ≤1). The IR user can browse through the ranked data and formulate new queries.
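The materialization step described above can be sketched as follows: the ranking is run once, the (key, belief value) pairs above the threshold are written to a temporary relation, and subsequent exact-match queries join against it. SQLite again stands in for Sybase; the key column `pub_id`, the toy `rank` function and the sample data are assumptions for illustration only.

```python
import sqlite3


def rank(query, docs, threshold):
    """Toy stand-in for the IRS: (key, belief) pairs above threshold, best first."""
    terms = query.lower().split()
    if not terms:
        return []
    scored = []
    for key, text in docs.items():
        words = set(text.lower().split())
        belief = sum(term in words for term in terms) / len(terms)
        if belief > threshold:
            scored.append((key, belief))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


conn = sqlite3.connect(":memory:")
conn.execute("create table publication (pub_id text primary key, title text, pga text)")
conn.executemany(
    "insert into publication values (?, ?, ?)",
    [("p1", "Survey", "ESPRIT 2"), ("p2", "Report", "ESPRIT 2"), ("p3", "Paper", "IMPACT")],
)
texts = {
    "p1": "information retrieval in hybrid systems",
    "p2": "retrieval of images",
    "p3": "information retrieval evaluation",
}

# Step 1: run the (slow) ranking once and store keys with belief values.
conn.execute("create temp table publication_inq (pub_id text, bvalue real)")
conn.executemany(
    "insert into publication_inq values (?, ?)",
    rank("information retrieval", texts, 0.4),
)

# Step 2: exact matching on typed attributes, restricted to the ranked keys.
rows = conn.execute(
    """
    select p.pub_id, p.title, i.bvalue
    from publication p join publication_inq i on i.pub_id = p.pub_id
    where p.pga = 'ESPRIT 2'
    order by i.bvalue desc
    """
).fetchall()
print(rows)
```

Because the belief values are stored, the user can vary the exact conditions in step 2 repeatedly without re-running the ranking.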

5 User Interface and Examples

In this section, some aspects of the user interface to HYDRA are discussed. We combine a form-based user interface, often used for relational databases, with a free-text input field, such as is commonly employed for full-text and indeterminate queries. In fact, most user interface services of HYDRA are part of the TORI system [ZKM]. A query form consists of two parts: the INQUERY part and the RDB part. The INQUERY part contains a free-text input field and a relevance (belief value) setting field. The RDB part is a set of attributes of RDB relations. Query forms are generated automatically from a high-level form definition [ZKM]. Figure 7 shows a form definition which embeds INQUERY.

    define form project_form (
        project.program_acronym ,
        project.program_reference,
        project.project_acronym ,
        project.title ,
        project_objectives ,
        project_description ,
        project.startdate ,
        project.enddate ,
        project.duration ,
        project.status
    ) with INQUERY

Figure 7. Form definition

Those form definitions play two roles: they are part of the user interface description and are used later for the construction of queries from forms. The RDB part in a query form can be defined as part of a relation (view) or is composed of attributes that are joined from several relations (or views). The corresponding presentation of this form definition is shown in Figure 8.

Figure 8. A form-based query in HYDRA

As shown in Figure 8, a list of operators (#and, #or, #not, #sum, etc.) is presented as an option menu from which the user can select one of the items. The operators interpret the relationships between the key words in the INQUERY free-text field [CCH]. The system accomplishes the query that is specified in Figure 8 in two steps.

• An INQUERY query is composed from the query form:

      #max(’information retrieval’,’document database’), INQUERY_DB=INQUERY-project

  This query is sent to ProINQUERY, which stores the results into a temporary relation project_inq in Sybase.

• Then a Sybase SQL statement is built:

      select project.program_acronym, project.program_reference,
             project.project_acronym, project.title, project_inq.bvalue
      from project, project_inq
      where project.program_acronym = ”ESPRIT 2”
        and project_inq.bvalue > 0.6

In this SQL statement, the results of INQUERY which have been stored in project_inq are joined with the database part in the query form. In other words, the query on the database part is constrained by the result space of INQUERY. The number of final results is 26. They are displayed in Figure 9. By default, the results are ranked according to the relevance value calculated in INQUERY. The user can browse the result set in form or table format.

For entering free-text queries the user does not need deep knowledge about the retrieval domain and the information organization. However, whenever the user has such knowledge, structure-oriented querying (e.g. form-based) can facilitate the user’s search both in terms of efficiency and directness. Interweaving free-text and structure-oriented queries lets the user benefit from the advantages of both exact and inexact queries. In fact, iteration in information retrieval is particularly important in situations in which the user doesn’t know what is available or what differentiations to make. For example, after the user browses the result set obtained by combining INQUERY and database retrieval, she may understand the meanings of the database attributes in the query form. In order to get more information, the user can specify the next query only in the database part. Therefore, in each query form, there is a toggle button which allows the user to enable (show) or disable (hide) the INQUERY part. In Figure 9, the user can take a hit from the result set of the last query to the query form as an initial specification of the next query. Figure 10 shows the next query’s specification without INQUERY.

Figure 9. Browsing result according to relevance value

Figure 10. Reformulating a form-based query without INQUERY
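The two-step construction of queries from a filled-in form, including the toggle that disables the INQUERY part, might be sketched as below. This is not the TORI implementation: the `form` dictionary layout and the function name `build_queries` are hypothetical, and the generated SQL deliberately mirrors the statement shown in this section (a real schema would presumably also equate the key columns of project and project_inq, which are not named in the text).

```python
def build_queries(form):
    """Return (inquery_query, sql, params) for a filled-in query form.

    inquery_query is None when the INQUERY part of the form is disabled.
    """
    columns = [f"project.{name}" for name in form["show"]]
    conditions = [f"project.{name} = ?" for name in form["rdb"]]
    params = list(form["rdb"].values())
    tables = ["project"]
    inquery = None
    if form.get("inquery_enabled"):
        # Step 1: compose the indeterminate query from the free-text field.
        phrases = ",".join(f"'{phrase}'" for phrase in form["phrases"])
        inquery = f"#{form['operator']}({phrases})"
        # Step 2 additions: constrain the SQL by the stored belief values.
        tables.append("project_inq")
        columns.append("project_inq.bvalue")
        conditions.append("project_inq.bvalue > ?")
        params.append(form["min_belief"])
    sql = f"select {', '.join(columns)} from {', '.join(tables)}"
    if conditions:
        sql += " where " + " and ".join(conditions)
    return inquery, sql, params


form = {
    "show": ["project_acronym", "title"],
    "rdb": {"program_acronym": "ESPRIT 2"},
    "inquery_enabled": True,
    "operator": "max",
    "phrases": ["information retrieval", "document database"],
    "min_belief": 0.6,
}
inquery, sql, params = build_queries(form)
print(inquery)
print(sql, params)

# Reformulation with the INQUERY part switched off yields a pure RDB query.
plain_inquery, plain_sql, plain_params = build_queries(dict(form, inquery_enabled=False))
print(plain_sql, plain_params)
```

Attribute values are passed as parameters rather than spliced into the SQL text, a design choice of this sketch that avoids quoting problems in user-entered values.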


6 Conclusion

Compared to an IR system simply based on a relational database system (as well as the RDBS extended with a textual type), our system is more efficient in indeterminate full-text searching. Sybase, for example, provides a special data type, text, which stores up to 2^31 – 1 bytes, and some functions for string processing, such as: patindex(”%pattern%”, column_name), which returns the starting position of the first occurrence of the pattern in the specified column (i.e. attribute) of a relation, whereby % is a wildcard character; charindex(”char_expr”, expression), which returns the starting position of the specified char_expr, whereby expression is usually a column name; etc. [Sybase]. It can be viewed as a system implemented with the extended RDBMS approach. But it can only process exact pattern matching, such that, in Sybase, “information retrieval” and “Information Retrieval” are viewed as different phrases (i.e. different patterns), even though they are the same for IR users. It is not suitable for retrieving textual data, because text has special characteristics which do not allow it to be handled easily by conventional databases, and the structure of text cannot be easily captured by a record-based data model.

On the other hand, textual data streams are the only representation in most IRSs, and are usually organized as inverted files and indexed, even though some of the data may be structured. In such IRSs, inexact matching is applied to perform the query operation on such textual data, and has proven neither efficient nor suitable for retrieving the structured part. A typical example is retrieving numerical data with an inexact query, e.g. a query about EC projects restricted to a determined duration.

Using the embedded full integration approach, the DB/IR system supports an extension of SQL. The next version of the system, which is currently under development, will be based on an object-oriented database system, VODAK, developed at GMD-IPSI. It is suitable for multimedia document information retrieval and is based on an entirely object-oriented data model. Exact matching for strongly typed data and inexact matching for textual data will both be directly encapsulated in the corresponding objects as methods.
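The contrast between exact pattern matching and term-based matching drawn in this section can be shown with a short Python sketch. The function `exact_position` only imitates the behaviour described above (position of the first exact occurrence) and is not Sybase's patindex; `phrase_matches` is a hypothetical, much simplified stand-in for matching on normalized index terms.

```python
import re


def exact_position(pattern, text):
    """1-based position of the first exact occurrence, or 0 if absent."""
    return text.find(pattern) + 1


def normalized_terms(text):
    """Lowercased alphanumeric terms, as a crude form of index normalization."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_matches(phrase, text):
    """True if the normalized phrase terms occur consecutively in the text."""
    p, t = normalized_terms(phrase), normalized_terms(text)
    return bool(p) and any(t[i:i + len(p)] == p for i in range(len(t) - len(p) + 1))


title = "Advances in Information Retrieval"
print(exact_position("information retrieval", title))  # exact match fails on case
print(exact_position("Information Retrieval", title))
print(phrase_matches("information retrieval", title))
```

The first call finds nothing because the pattern differs from the stored text in capitalization, whereas the term-based comparison treats both spellings as the same phrase.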

Acknowledgements: Our most heartfelt thanks go to our colleagues A. Müller and H.-U. Hoppe. The discussions with them and their advice have been invaluable. The authors also want to thank Prof. B. Croft and his team from the University of Massachusetts for making the INQUERY system available.

References

[AH] Amstutz, H.; Holländer-Thönssen, B.: Elektronische Ablage und Archivierung auf der Basis eines Database Management Information Retrieval Systems: – Die Bedürfnisse – Das Angebot – Die Realität, Proc. GI/GMD-Workshop Information Retrieval (Fuhr, N., Ed.), Darmstadt, June 23-24, 1991, Berlin et al.: Springer, 1991, pp. 78-93.

[BC] Belkin, N. J.; Croft, W. B.: Information Filtering and Information Retrieval: Two Sides of the Same Coin? CACM, Vol. 35, No. 12, 1992, pp. 29-38.

[BCST] Belkin, N. J.; Cool, C.; Stein, A.; Thiel, U.: Scripts for Information Seeking Strategies, Proc. AAAI Spring Symposium ’93 on Case-Based Reasoning and Information Retrieval, Stanford Univ., March 1993.

[CCH] Callan, J. P.; Croft, B.; Harding, S. M.: The INQUERY Retrieval System, Proc. 3rd Conf. on Database and Expert Systems Applications, Sept. 1992.

[CKT] Croft, W. B.; Krovetz, R.; Turtle, H.: Interactive Retrieval of Complex Documents, Information Processing & Management, Vol. 26, No. 5, 1990, pp. 593-613.

[CN] Croft, N.; Neuhold, E.: Towards the Integration of Text and Database Systems, Internal Report, Univ. of Mass., Amherst, 1991.

[ECHO] Amt für Amtliche Veröffentlichungen der Europäischen Gemeinschaften: ECHO-Datenbasen und Dienste, CD-53-88-762-DE-C, Luxembourg, 6. 1990.

[FH] Fuhr, N.; Hoffmann, T.: A Prototype for Integrating Probabilistic Fact and Text Retrieval, Proc. ISI ’91, Wissensbasierte Informationssysteme und Informationsmanagement (Killenberg, H.; Kuhlen, R.; Manecke, H.-J., Ed.), Konstanz, Universitätsverlag, 1991, pp. 94-103.

[GT] Gu, J.; Thiel, U.: Automatically Converting Free-Text to Hypertext, Hypermedia – Proc. Hypermedia ’93 (Frei, H. P.; Schäuble, P., Ed.), Zurich, March 1993, Berlin et al.: Springer, 1993, pp. 220-231.

[HMS] Hoogeveen, M.; van der Meer, K.; Sol, H.: The Integration of Information Retrieval and Database Management Facilities in Support of Multimedia Information Work, Proc. ISI ’92, Mensch und Maschine – Informationelle Schnittstellen der Kommunikation (Zimmermann, H.; Luckhardt, H.-D.; Schulz, A., Ed.), Konstanz, Universitätsverlag, 1992, pp. 260-274.

[LS] Lynch, C. A.; Stonebraker, M.: Extended User-Defined Indexing with Application to Textual Databases, Proc. VLDB, 1988, pp. 306-317.

[Mo] Motro, A.: VAGUE: A User Interface to Relational Databases that Permits Vague Queries, ACM Transactions on Office Information Systems, Vol. 6, No. 3, July 1988, pp. 187-214.

[SM] Salton, G.; McGill, M. J.: Introduction to Modern Information Retrieval, New York et al.: McGraw-Hill, 1984.

[SS] Stonebraker, M. et al.: Document Processing in a Relational Database System, ACM Transactions on Office Information Systems, Vol. 1, No. 2, April 1983.

[STT] Stein, A.; Thiel, U.; Tißen, A.: Knowledge-Based Control of Visual Dialogues in Information Systems, AVI ’92, Proc. of the 1st International Workshop on Advanced Visual Interfaces, Rome, Italy, May 27-29, 1992, Singapore: World Scientific Press, 1992, pp. 138-155.

[Sybase] Sybase Inc.: Sybase Commands Reference, Release 4.0, Sybase Inc., 1989.

[ZKM] Zhao, J.; Kostka, B.; Müller, A.: An Integrated Approach to Task-Oriented Database Retrieval Interfaces, Proc. of the Workshop Interfaces to Database Systems 1992, Glasgow (UK), July 1-3, 1992.