Accessing Information from Globally Distributed Knowledge Repositories (Extended Abstract)

Alfred V. Aho
Department of Computer Science
Columbia University

[email protected].

Abstract

This paper discusses some of the major technical obstacles standing in the way of achieving cost-effective universal access to multimedia knowledge stored in globally distributed information repositories. Opportunities for contributions from the database research community are highlighted.

expanding

and users are influencing

alized communication

market

The

purpose

key

technical

Businesses are integrating

healthcare, merce

email,

are emerging

new

infrastructure.

entertainment, as important

Digital and

goal

areas.

com-

Un-

some

personal

and

issues

that

positioned

of the

if we

are

information the

to

database

attack.

The

from

and

- universal

access

The

remainder

in more

detail.

Scalability

The

most

sign

and

ture

architecture

that

but

in the price/performance of processors and computer memories, as well as exponential growth in the number of people and the amount of traffic carried on the

on a global

scale. striking to it

information

The terabytes

is becoming beginning of the

volume

Tens

information

and

to support

how

much

access

millions of

various

of

people on-line indicate

10 per

cent

Web

kinds

of

Internet.

is growing of

set

services

new

to the

information of

a rich

Estimates than

high

information

number

more

had

current

the

available. U.S.

gets and

information

of the

of 1996

of on-line

rate. of

infrastructure

rapidly

is growing

a signifi-

planet.

multimedia

how

de-

of effective to

connectivity

aspects

are

to

infrastruc-

creation

it was not engineered

infrastructure

staggering

on the

issues

is how

services

near-universal

interactive

population

the

telecommunications

of evolvable

the

problem

these

information

supports

people

reliability,

at the

discuss

communications

for

that

will

technical

of the

most

paper

a scalable

and

current

connected

of knowledge

evolution

browsing

challenging

marks

The

and

implement

fraction

The

integration

of this

3

cant

media

quality

- searching

and

Permission to make digital/hard copies of all or part of this material for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is given that copyright is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires specific permission and/or fee.

on

some

be solved universal

come

integration

arrival of the information age. In the past decade we have continued to see order of magnitude improvements

PODS ’96, Montreal, Quebec, Canada © 1996 ACM 0-89791-781-2/96/06 ...$3.50

has a

highlight

must

is well

of multiple

- information

high

the

focus

we discuss

- organization

resolved, however, is the question of who will pay for the new services and how. Rapid improvements in computer hardware, software, are accelerating

is to

that

of affordable

shall

that

- systems

on an

libraries,

electronic

application

technology

U.S. as

- scalability

services whereby

their operations

paper

community

information

communications

the

problems

for person-

homes, and remote locations with equal ease. An increasingly diverse set of new services needs to as a significant fraction of the populace be supported access to the

the

such

Issues

problems

We

research

the

international scale and people want to access their information environments and coworkers from their offices,

gains

in

or television.

of this

achieve

services.

people can get access to others and the information they need in the form and media they want, any time, anywhere.

New

- integration

is developing

and information

home

appliance

telephone,

The

2

of this infrastructure.

A rapidly

every

information

computer,

to

from business, technology,

Virtually of

are highlighted.

The goal of developing an information infrastructure is to provide affordable universal access to multimedia information services on a global scale. Strong forces evolution

form

dis-

Introduction

1

edu

Internet.

cles standing

Distributed

at

pages are

a

and

already

available

over

System by

NASA

data

the

This

from

and

a comparable

the

genome.

The

amount

frastructure applications mand,

grow

and

tain

constraints

time

work

the

protocols

of-service vices.

and

At

the

storage

times

level

capable

and

of

the

storage

devices

net-

However,

will

slower The

require

to ensure

information

such

in seconds,

also

to end

organization

from

disparate

The

integration

In

have

data-

are two

secondary

use

own

to

types

involving

dealing

with

needs

and

preferred data

asking

to

text.

These

new

types

model

and

that

access

the

database

for handling

tabular

extend develop

to the

implementation

interactive

textual

multimedia

appropriate tradeoffs

operators for

the

new

new

with

we

are

multiple

forced types.

becomes to

deal

We need

much

We

with a data

The

model

its

own

more

has its own jargon

medicine,

systems difficult

and

This

appropriate

classi-

When

we

disparate

knowl-

arise.

At

the

terms

should

or we

among

question

problem”.

of

have

and

knowledge

areas?

and

fields

music

works.

problems

aggregated

the

and

from

concepts

In

the

is some-

effect,

ontology

can

We

need

sources.

analyze

the

At

data

wide

complex

when

one

involving

2

like

for

we are

all

of

to

and

have

towards

the

that dis-

a single

creating

systems

process

tools

interconnected as using

different

approaches

sys-

interop-

so application

information

are

the

model

a “wrapper”

from

the

approach and

filter

of formats

Translators place,

convertors

to but will

for In

and

translates

notation

dif-

the

is to use agents information

level, needed another.

the for

the

informa-

integrating

the

media-

disparate

systems

use

for representing convert

Some

it is likely continue

to

ap-

source

called

from

existing

dealing one

each

of the

and conventions are

representation

explored sources.

is defined that

into

representational

variety

being

information

the source

to collect

is taking and

to work

gather

Another

types.

can integrate

like

among

with

from

tors

data

would

is fitted

multimedia

would

access from

as easy to perform

an integrating

does

and

we

heterogeneous

model.

has

level,

information

proach, tion

relational

community

that

even

domains.

information

composite

future

knowledge

an

For example,

“ontology

interfaces

with

varying

has

more

is

their

what

is an

systems

data. problem

of growing

domains

taxonomies,

the

user

Several

data

types. The

for

of

notations,

many

the

make

programs

those

widely

type

research

topics

organizing

heterogeneous

the

erable

video

methods.

for

level,

what

At

and

is a significant

have

Each

methods

understandable

Internet.

Media

and

with

to buffering

to combine

images,

characteristics. operators

developed not

audio,

and

an

are

endeavor

information

called

would

are starting

in

integration

distinctive

describe

times

challenge. documents

types

appropriate

Integration

literature,

schemes

ferent

Digital

of

browsers,

knowledge.

dance,

combined

or

these

of video

types

data appliances

manner

of human

ganizational

storage

containing

delivery

data

as

as a megabyte

creation

types

and

and

edge domains,

users.

of multimedia

data

problem.

interconnect

devices

devices

of Multiple

and objects

multimedia

interfaces, data

knowledge

field

fication

tem.

Integration

user

of organizing

parate

4

information

pleasing

The

chemistry,

need

of data. these

approaches

smooth

new

of

Organization

their

juke

systems

or as large

The

these

esthetically

way

be

as disk

than

new

the

we

thus

tools,

Every

ser-

such

and

algorithms

of

capabilities.

challenging

as electronic

form

multimedia

quality-

will

petabytes likely

systems

as disks.

caching

in cer-

information such

the

in our

Knowledge

video

New

privacy

of holding

of magnitude

such

and

5

within

diverse

like

describe

accommodate

of text

design

displaying

For example,

infrastructure

the most

silos.

devices

the

applications

measured

orders

audio

to meet

is tertiary-storage

or tape

deau-

the

need

we would

can

research.

on

natural.

multimedia

of security

in-

and

to be delivered

the

authoring

quality-of-service

it to appear

for

future,

access

about

video

with

We

byte

we

stored

efficiently

presentation

multimedia

movies

of data.

can

as a single

requires

healthcare.

devices

boxes

and

for

physical

take

three

has

sensitive

the immediate will

stringent

transmission

video

Guarantees for

repositories. that

The

at

video-intensive

Delivering

are needed

commerce

knowledge

and

which

of information

structures

a

information

with

be synchronized

requirements

necessary

by the

types

information with

of video.

in

centers

of information

carried

more

to

a movie,

stored

languages

small

with

are expected

teleconferencing,

puts

needs

of

producing

world

dramatically

on the

audio

and

genome

the

instruction.

information

requirements

The

amount

as

video

combined

of multimedia

for

such

and

be System

size.

of traffic

will

of information oceans

all forms associated different

is expected

and

countries

throughout

to produce

the

will

Information

universities

human

decade,

land,

and

Observing

to be launched

of a petabyte

of unprecedented

various

dio

of this

atmosphere,

sources

Data

database

end

information

other

EOS

Earth

of satellites

a third

about

earth.

the

the

about

annually

The

Internet.

a collection

towards

to generate the

the

(EOS),

need

data

from

standardization for

translators

foreseeable

future.

a

The

classification

cilitates

its

world,

storage

schemes approaches,

and

new

forms

classification

and

keeping

of knowledge.

scheme

in the have

widely

database been

schema

maintaining keeping

classes.

It

approaches

is well

to attack

the

this

in the

infacili-

scheme

trying

to generalize

these

arising

from

the

information

Systems

The

global

systems

Integration information

integration

To

facilitate

the

introduction

challenge

faced

creation

of

in the

components

Open

in-

are critical

for

system

Many

existing

tate

and

systems

tional of lines

but

tems. systems

nents

7

and

help

Information

With

the

with

data

of new

quality

is likely

database

problem data

can

plete.

The

corruption

invalid

systems. grated, source, An

sources

When users

the

can

for

the

open

of

with

integrity. lineage

or data data

problem the

approach

of information

that

can

retrieve

of multimedia

sources

databases

research

areas.

through tools

by content Some

queries of

Designing with

technical

as

Effective

a combination

graphics.

querying

such

values.

multimedia

interactive

maps, correlate

characteristics

answered

search

examine

repositories.

or approximate open

California?” hybrid

and then

tools

a variety

is a significant

Universal

a set

these

kinds

of

challenge.

are inteof’ which

– its

assurance

is to origins,

trace

Access is both a technical

On top of the basic providers could offer appeal to specialized increased willingness what goes into the

incomderived

assigned One

databases,

information infrastructure scope and scale.

with

of reasons,

new information

inconsistent

is

best

to

of

“What

and a societal

one:

everyone at an affordable price? At the technical level, this would involve asking what services could be provided at what cost. The database community has studied system and query optimization questions for years. However, the issues for the global

problem

plagued

when

area

are still

Our final question

to believe. research

from

in

need

searching

or

require

less

decline

cities?”

knowledge

for

shape,

and

may

much of

What facilities and services should be part of the basic information infrastructure that should be available to

compo-

existing

inconsistent,

the

for index-

use

quality

prescribed

interoperable

9

sys-

specifications

a variety

with

algorithms

combinations

causes

as these

user

various

data

be

capabilities

of all

system

already

spread

well

browsing

interoperability.

is used to populate

are confronted

if any,

For

incorrect,

systems

important

information record

be

are

data.”

Sct-

access to

The

tasks the

air

such

for searching

or concept

to get worse.

systems

of “dirty

legacy from

data,

the

process

methods

Quality

explosion

Existing the

these

system

facilitate

techniques

certain

and other

is a need

texture,

them.

definitions

tests

assure

among

color,

may

in

the

images,

imprecisely

existing

is the

organizations

unambiguous From

where

using

facili-

with

“What

for boolean

are

influencing

and

the

tech-

as “What

Computer

stores.

retrieval

questions

and

billions

data

“What

factors

There

interna-

of interoperable

conformance

would

out

the

and

many

interwork

international

are needed.

construct

that

to

aspects

precise,

existing

throw

how and

various

the

transportation

satellite

in-

that

or

as SQL

data

information

public

data

well-defined

contains

we cannot learn

interfaces

we can

Because

of national

Clear,

have

interfaces

infrastructure

must

are addressing

not

expose

integration.

of software,

number

do

do not

information

systems, A

systems

often

such of

effective

data.

efficient

questions

such

textual

concepts:

layer

interoperability

languages

we need

multimedia

number?”

record-based

Answering

the

levels

information

Foundations

has developed

strategies

the physical

telephone

Query

the

has developed

precise

textbook

precise are

evolvability.

terfaces

the

of

Many

biggest and

between

by

of keywords.

we need interoperability

interfaces

Smith’s

cost

sys-

the

services

from

is Jane

to find

community

ing and searching

by engineers.

new

infrastructure

applications.

is

used

accuracy

items.

of distributed

answering

community

Evolution

infrastructure

of new technology,

at all levels to the

and

be the

Browsing

ways

ocean

conventional

tems.

6

for

ence?”

hiertypes

problems

can

determine

information

and

database

niques

of individual

heterogeneous

to

of particular

growing

The

of ap-

view

complete

information

languages

Searching

schema

problem

with

This

query

We need effective of

versions

worth

of distributed

problem

A number

address

of the

different

8

as an important

evolution

versions

and

tegration

to

and

or reliability

applica-

performing

community.

taken

integrating

The

annotations.

conchang-

existing

recognized

and users

accom-

However,

when

been

to

business

can cause

working

fa-

changing

to evolve

changing

old programs

proaches

and

need

working.

problem

archy,

a rapidly

to stop

has

cluding

of information

In

procedures

evolution

ties,

access.

integrated

ditions,

tions

organization

classification

modate

ing the

and and

If

of

to enhanced

information

educational,

and

average

sources,

3

increased

infrastructure information service customized services that would communities with presumably an to pay. The tradeoff is between basic infrastructure and what is services.

is going economic,

individual,

are ones of vastly

then

to and

become health

information

essential well

being

services

for

the

of

the

need

to

be

universal

world

and

affordable.

of

haves

to

information.

access and

enhanced

and

is likely

and

Otherwise,

have-nets The

services,

we’ll

distinction

however,

to be a subject

have

differentiated well

a

their

between

is not

of much

by

basic

understood

debate

in the

near

future.

10

Conclusions

We have

taken

major

technical

of

nations

all

global there

on are

will

have

the

as

their

well.

respective

transcend

individual is

global this

kind

impact.

the

of

research

community

is well

for

property,

total

would

issues shaped

by

communities, various

of sets

investment

is likely over

have

in

that

examination for

of work

have

degree

The

new

positioned

we

to some

critical

of dollars

It is also the kind

must

safeguards

several

infrastructure

of billions

that

that

questions

tradeoffs

information

hundreds

have

out

are being

from

little

affordability/universality

the

include

of the

forces

Since

we

point

communities.

relatively

affordable

questions

communities

but

services,

should

investigated

technical

conflicting

basic

and

of intellectual

all

being

sometimes there

we

the

of expression.

level,

already

of

if people

While

These

of freedom

some

be met

universal

protection

technical

mentioned

of

must

nontechnical

privacy,

guarantees

At

a

issues,

substantial

answered

view

that

infrastructure.

technical

individual and

top-down

information

focused be

a

challenges

in

to

run

the next

enormous

of the

into

decade, economic

the database

research

to conduct.

References

[A90] Alfred V. Aho. Algorithms for finding patterns in strings. In Handbook of Theoretical Computer Science, J. van Leeuwen, Ed., Elsevier, 1990.

[SSU95] Avi Silberschatz, Mike Stonebraker, and Jeffrey D. Ullman, eds. Database research: achievements and opportunities into the 21st century. Report of an NSF Workshop on the Future of Database Systems Research, May 26-27, 1995.

[U82] Jeffrey D. Ullman. Principles of Database Systems, second edition. Computer Science Press, 1982.

[WMB94] Ian H. Witten, Alistair Moffat, and Timothy C. Bell. Managing Gigabytes. Van Nostrand Reinhold, New York, 1994.

[CSTB94] Realizing the Information Future. National Academy Press, 1994.

[NRC94] The Changing Nature of Telecommunications/Information Infrastructure. National Research Council, 1994.