Modern Trends in Databases

G64DBS Database Systems Lecture 17 Modern Trends in Databases

Modern Databases
• GOOD Modern Databases
  • Parallel Processing
  • Distributed DBs
  • Web-based DBs
  • Multimedia DBs
• Tomorrow: BAD Modern Databases…

Tim Brailsford

Other Sorts of Database
• We have looked mainly at relational databases
  • Relational model
  • SQL
  • Design techniques
  • Transactions
• Many of these topics relied on relational concepts
• There are several other flavours of DB in use today
  • Distributed DBs
  • Object DBs
  • Multimedia DBs
  • Temporal DBs
  • Logic DBs

1. Parallel Databases

Why Parallel Databases?
• More and More Data!
  • We have databases that hold a high amount of data, in the order of 10^12 bytes:
  • 1,000,000,000,000 bytes!
• Faster and Faster Access!
  • We have data applications that need to process data at very high speeds:
  • 10,000s of transactions per second!
• ONE processor just won't do it!

Benefits of Parallel Databases
• Improves Response Time. INTERQUERY PARALLELISM: It is possible to process a number of transactions in parallel with each other.
• Improves Throughput. INTRAQUERY PARALLELISM: It is possible to process 'sub-tasks' of a transaction in parallel with each other.

[Figure: SPEED-UP. Number of transactions/second against number of CPUs, comparing linear speed-up (ideal) with sub-linear speed-up. Axis labels: 1000/Sec, 1600/Sec, 2000/Sec; 5 CPUs, 10 CPUs, 16 CPUs.]
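The difference between linear and sub-linear speed-up comes down to simple arithmetic. The sketch below assumes the usual definition (throughput with N CPUs divided by throughput with one CPU). The function names, the single-CPU baseline and the pairing of throughputs with CPU counts are all hypothetical, chosen only for illustration; they are not read off the plot.

```python
def speed_up(throughput_n, throughput_1):
    """Speed-up = throughput with N CPUs / throughput with one CPU."""
    return throughput_n / throughput_1


def efficiency(throughput_n, throughput_1, cpus):
    """Fraction of the ideal (linear) speed-up that was actually achieved."""
    return speed_up(throughput_n, throughput_1) / cpus


# Hypothetical measurements in transactions/second (illustration only).
BASELINE = 200.0                     # assumed throughput of a single CPU
measured = {5: 1000.0, 10: 1600.0}   # CPUs -> assumed observed throughput

for cpus, tps in measured.items():
    ideal = BASELINE * cpus          # what linear speed-up would deliver
    kind = "linear" if tps >= ideal else "sub-linear"
    print(f"{cpus} CPUs: speed-up {speed_up(tps, BASELINE):.1f}, "
          f"efficiency {efficiency(tps, BASELINE, cpus):.0%} ({kind})")
```

With these made-up figures, five CPUs deliver the full five-fold gain, while ten CPUs deliver only an eight-fold gain: adding processors still helps, but each extra one contributes less.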

Some History
• Parallel processing is clearly useful. How has this affected things?
  • Mainframes
  • Client-Server Architecture
  • Parallel Systems
  • Distributed Systems

[Figure: mainframe architecture. Labels: MAINFRAME COMPUTER; DUMB TERMINALS; PRESENTATION LOGIC, BUSINESS LOGIC, DATA LOGIC.]

Client/Server Architecture
• The client/server architecture is a general model for systems where a service is provided by one system (the server) to another (the client)
• Server
  • Hosts the DBMS and database
  • Stores the data
• Client
  • User programs that use the database
  • Use the server for database access

[Figure: Client/Server Architecture. Labels: CLIENT 1, CLIENT 2, CLIENT 3; NETWORK CONNECTION; SERVER: DBMS, DB; PRESENTATION LOGIC, BUSINESS LOGIC; DATA LOGIC.]

2. Distributed Databases
• A distributed DB system consists of several sites
  • Sites are connected by a network
  • Each site can hold data and process it
  • It shouldn't matter where the data is - the system is a single entity
• Distributed database management system (DDBMS)
  • A DBMS (or set of them) to control the databases
  • Communication software to handle interaction between sites

Types of Distribution
• There are two basic options with which we will be concerned when it comes to distribution:
  • Distributed processing
  • Distributed data
• With one exception (distributed data, non-distributed processing), neither of these necessarily implies the other

[Figure: Distributed Processing. Groups of CLIENTs in New York, Moscow, London and Beeston joined by a WIDE AREA NETWORK, with a single DBMS.]

[Figure: Distributed Processing. Groups of CLIENTs in New York, Moscow, London and Beeston joined by a WIDE AREA NETWORK, with a DBMS at each site.]

What is a Distributed Database?
• A distributed database system is a collection of logically related databases that co-operate in a transparent manner.
• There should be 'location independence'
  • i.e. as the user is unaware of where the data is located, it is possible to move the data from one physical location to another without affecting the user.

Reasons for Distribution
• Reduced Communication Overhead: Most data access is local, less expensive and performs better.
• Improved Processing Power: Instead of one server handling the full database, we now have a collection of machines handling the same database.
• Removal of Reliance on a Central Site: If a server fails, then the only part of the system that is affected is the relevant local site. The rest of the system remains functional and available.
• Expandability: It is easier to accommodate increasing the size of the global (logical) database.
• Local autonomy: The database is brought nearer to its users. This can effect a cultural change as it allows potentially greater control over local data.
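The 'location independence' idea above can be sketched in a few lines of Python. The `Catalog` class, its method names and the sample row are hypothetical and only stand in for whatever bookkeeping a real DDBMS performs; the point is that the user's call never names a site.

```python
class Catalog:
    """Hypothetical global catalog: remembers which site holds each table."""

    def __init__(self, sites):
        self._sites = {site: {} for site in sites}   # site -> {table: rows}
        self._location = {}                          # table -> site

    def store(self, site, table, rows):
        self._sites[site][table] = rows
        self._location[table] = site

    def move(self, table, new_site):
        """Relocate a table to another site; users are not involved."""
        old_site = self._location[table]
        self._sites[new_site][table] = self._sites[old_site].pop(table)
        self._location[table] = new_site

    def select(self, table):
        """User-facing read: no site name appears in the call."""
        return self._sites[self._location[table]][table]


catalog = Catalog(["New York", "London"])
catalog.store("New York", "account", [{"account_number": "A-305", "balance": 500}])

before = catalog.select("account")
catalog.move("account", "London")   # the physical location changes...
after = catalog.select("account")   # ...but the user's call does not
print(before == after)
```

A real system also has to cope with network failures and concurrent access during the move, which this sketch ignores.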

Reasons Against Distribution
• Complexity (distributed database systems, especially, are considerably more complex than centralized or client/server ones)
• Security (more opportunities for protection failure or attack)
• Software & management costs
• Lack of standards
• Data integrity more difficult to maintain

Transparency
• To obtain the benefits of distributed data without incurring added operational complexity, distributed database systems should be transparent
• What is "transparency"?
  • A transparent distributed database system would look, to a user, just like a centralized database system.

Fragmentation
• When you split data up over separate locations you have to make a choice:
  • Do you split up the rows of a table, or the columns of a table?
• These are horizontal and vertical fragmentation respectively.

Horizontal Fragmentation

New York:
branch_name   account_number   balance
Hillside      A-305            500
Hillside      A-226            336
Hillside      A-155            62

Beeston:
branch_name   account_number   balance
Valleyview    A-177            205
Valleyview    A-402            10000
Valleyview    A-408            1123
Valleyview    A-639            750

Vertical Fragmentation

New York:
branch_name   customer_name   tuple_id
Hillside      Lowman          1
Hillside      Camp            2
Valleyview    Camp            3
Valleyview    Kahn            4

Beeston:
account_number   balance   tuple_id
A-305            500       1
A-226            336       2
A-177            205       3
A-402            10000     4
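The two kinds of fragment can be put back together in different ways. The sketch below loads the New York and Beeston fragments shown above into an in-memory SQLite database. The table names are hypothetical, and one local database stands in for the two sites. Horizontal fragments are recombined with a union, vertical fragments with a join on tuple_id.

```python
import sqlite3

con = sqlite3.connect(":memory:")

# Horizontal fragments: same columns, different rows at each site.
for name in ("account_ny", "account_beeston"):
    con.execute(f"CREATE TABLE {name} (branch_name TEXT, account_number TEXT, balance INTEGER)")
con.executemany("INSERT INTO account_ny VALUES (?, ?, ?)", [
    ("Hillside", "A-305", 500), ("Hillside", "A-226", 336), ("Hillside", "A-155", 62)])
con.executemany("INSERT INTO account_beeston VALUES (?, ?, ?)", [
    ("Valleyview", "A-177", 205), ("Valleyview", "A-402", 10000),
    ("Valleyview", "A-408", 1123), ("Valleyview", "A-639", 750)])

# Vertical fragments: different columns, linked by a shared tuple_id.
con.execute("CREATE TABLE deposit_ny (branch_name TEXT, customer_name TEXT, tuple_id INTEGER)")
con.execute("CREATE TABLE deposit_beeston (account_number TEXT, balance INTEGER, tuple_id INTEGER)")
con.executemany("INSERT INTO deposit_ny VALUES (?, ?, ?)", [
    ("Hillside", "Lowman", 1), ("Hillside", "Camp", 2),
    ("Valleyview", "Camp", 3), ("Valleyview", "Kahn", 4)])
con.executemany("INSERT INTO deposit_beeston VALUES (?, ?, ?)", [
    ("A-305", 500, 1), ("A-226", 336, 2), ("A-177", 205, 3), ("A-402", 10000, 4)])

# Horizontal reconstruction: stack the rows of the fragments.
horizontal = con.execute(
    "SELECT * FROM account_ny UNION ALL SELECT * FROM account_beeston").fetchall()

# Vertical reconstruction: match rows across fragments by tuple_id.
vertical = con.execute("""
    SELECT n.branch_name, n.customer_name, b.account_number, b.balance
    FROM deposit_ny AS n JOIN deposit_beeston AS b ON n.tuple_id = b.tuple_id
    ORDER BY n.tuple_id""").fetchall()

print(len(horizontal), "rows after union")
print(vertical[0])
```

The tuple_id column exists only so that the vertical fragments can be rejoined; without it there would be no way to tell which balance belongs to which customer.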

Transactions in Client/Server Systems
• Transactions in a single server environment are simple - the same as in a centralized system
• Transactions in a multi-server system are server-oriented.
  • That is, a single transaction cannot involve multiple servers because the servers operate completely independently of each other

Transactions in Distributed Database Systems
• Transactions in a distributed database system may be either global or local
• Support for global transactions is provided by the DDBMS (Distributed DBMS) in a true distributed database system.
  • This is not a simple task.

Distributed vs. centralized
• Both have pros and cons…
• In other words, if you choose a distributed database you are spreading out lots of small headaches rather than having one central migraine.
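To see why supporting global transactions is not a simple task, consider a transfer that touches accounts at two sites: either both sites must apply their change or neither may. The slides do not name a protocol, so the sketch below swaps in two-phase commit, a commonly used one, purely as an assumption. The `Site` class, the function names and the balances are hypothetical, and failure handling (crashes, lost messages) is left out entirely.

```python
class Site:
    """Hypothetical participant holding some local account balances."""

    def __init__(self, name, balances):
        self.name = name
        self.balances = balances
        self._pending = None          # change prepared but not yet committed

    def prepare(self, account, delta):
        """Phase 1: vote yes only if the change could be applied locally."""
        if account not in self.balances:
            return False
        new_balance = self.balances[account] + delta
        if new_balance < 0:
            return False
        self._pending = {account: new_balance}
        return True

    def commit(self):
        """Phase 2, everyone voted yes: make the pending change permanent."""
        if self._pending:
            self.balances.update(self._pending)
        self._pending = None

    def abort(self):
        """Phase 2, someone voted no: discard the pending change."""
        self._pending = None


def global_transfer(src, src_account, dst, dst_account, amount):
    """Coordinator: the transfer happens at both sites or at neither."""
    votes = [src.prepare(src_account, -amount), dst.prepare(dst_account, amount)]
    if all(votes):
        src.commit()
        dst.commit()
        return True
    src.abort()
    dst.abort()
    return False


new_york = Site("New York", {"A-305": 500})
beeston = Site("Beeston", {"A-177": 205})

ok = global_transfer(new_york, "A-305", beeston, "A-177", 100)
too_much = global_transfer(new_york, "A-305", beeston, "A-177", 1000)
print(ok, too_much, new_york.balances, beeston.balances)
```

Even in this toy form the coordinator has to collect votes before anything becomes permanent; a local transaction at a single site needs none of this.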

3. Web-based Databases
• Database access over the internet
  • Web-based clients
  • Web server
  • Database server(s)
• Web server serves pages to browsers (clients) and can access database(s)

Web-based Databases
• Typical operation:
  • Client sends a request for a page to the web server
  • Web server sends SQL to database
  • The web server uses results to create page
  • The page is returned to the client

[Figure: Web-based Databases. Labels: Client (Browser), HTTP request, Web Server, SQL query, Database Server, SQL result, HTML page.]

Web-based Databases
• Advantages:
  • World-wide access
  • Internet protocols (HTTP, SSL, etc) give uniform access and security
  • Database structure is hidden from clients
  • Uses a familiar interface
• Disadvantages:
  • Security can be a problem if you are not extremely careful
  • Interface is less flexible using standard browsers
  • Limited interactivity over slow connections

[Figure: Corporate Style (Microsoft). Labels: Internet Explorer, HTTP request, ASP .NET, MSQL query, MS SQL Server, SQL result, HTML page.]

[Figure: Corporate Style (Oracle). Labels: Internet Explorer, HTTP request, JSP, PL/SQL query, Oracle, SQL result, HTML page.]

[Figure: Open Source. Labels: Firefox, HTTP request, PHP, SQL query, MySQL, SQL result, HTML page.]

[Figure: Even more choice. Labels: Firefox, HTTP request; PHP, Perl, Ruby, Python; SQL query; MySQL, PostgreSQL; SQL result; HTML page.]

Web Based Approaches
• (Microsoft) MSQL + JSP .NET
• (Oracle) PL/SQL + ?
• (Open Source) MySQL + PHP
• (Open Source) PostgreSQL + ?
• The scripting language generates the query depending on what the web user requests. It then takes the results and formats them into HTML + Javascript.
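The request, query, result, page cycle can be sketched as a single function. Python stands in for the scripting language here and an in-memory SQLite database stands in for the database server; both are assumptions, as are the `account` table, its rows and the `handle_request` name. The request value is passed as a bound parameter and escaped on output, which is one way of being "extremely careful" about security.

```python
import html
import sqlite3

# Stand-in for the database server (assumption: SQLite, in memory).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE account (branch_name TEXT, account_number TEXT, balance INTEGER)")
db.executemany("INSERT INTO account VALUES (?, ?, ?)", [
    ("Hillside", "A-305", 500), ("Valleyview", "A-177", 205)])


def handle_request(params):
    """Hypothetical page handler: request parameters in, HTML page out."""
    branch = params.get("branch", "")
    # Generate the query from the request. The value travels as a bound
    # parameter rather than being pasted into the SQL text.
    rows = db.execute(
        "SELECT account_number, balance FROM account "
        "WHERE branch_name = ? ORDER BY account_number",
        (branch,),
    ).fetchall()
    # Format the SQL result as HTML, escaping anything that came from outside.
    items = "".join(f"<li>{html.escape(acct)}: {balance}</li>" for acct, balance in rows)
    return f"<h1>Accounts at {html.escape(branch)}</h1><ul>{items}</ul>"


page = handle_request({"branch": "Hillside"})
print(page)
```

The browser only ever sees the finished HTML, so the table and column names stay hidden from the client.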

4. Multimedia Databases
• Multimedia DBs can store complex information
  • Images
  • Music and audio
  • Video and animation
  • Full texts of books
  • Web pages
• They can be used in a wide range of application areas
  • Entertainment
  • Marketing
  • Medical imaging
  • Digital publishing
  • Geographic Information Systems

Querying Multimedia Databases
• Metadata searches
  • Information about the multimedia data (metadata) is stored
  • This can be kept in a standard relational database and queried normally
  • Limited by the amount of metadata available
• Content searches
  • The multimedia data is searched directly
  • Potential for much more flexible search
  • Depends on the type of data being used
  • Often difficult to determine what the 'correct' results are

Metadata Searches
• Example - indexing films we might store
  • Title
  • Year
  • Genre(s)
  • Actor(s)
  • Director(s)
  • Producer(s)
  • Keywords
• We can then search for things like
  • Films starring Kevin Spacey
  • Films directed by Peter Jackson
  • Dramas produced in 2000
• We don't actually search the films themselves.

Metadata Searches
• Advantages:
  • Metadata can be structured in a traditional DBMS
  • Metadata is generally concise and so efficient to store
  • Metadata enriches the content
• Disadvantages:
  • Metadata can't always be found automatically, and so requires data entry
  • It restricts the sorts of queries that can be made
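Because film metadata fits ordinary tables, the example searches become plain SQL. The sketch below uses an in-memory SQLite database with a hypothetical two-table schema (one genre and one director per film, to keep it short) and three sample rows typed in by hand. Nothing in it looks inside the films themselves.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE film (film_id INTEGER PRIMARY KEY, title TEXT, year INTEGER,
                       genre TEXT, director TEXT);
    CREATE TABLE film_actor (film_id INTEGER REFERENCES film(film_id), actor TEXT);
""")
# Hand-entered sample metadata (this is the 'data entry' cost).
con.executemany("INSERT INTO film VALUES (?, ?, ?, ?, ?)", [
    (1, "American Beauty", 1999, "Drama", "Sam Mendes"),
    (2, "Pay It Forward", 2000, "Drama", "Mimi Leder"),
    (3, "The Lord of the Rings: The Fellowship of the Ring", 2001, "Fantasy", "Peter Jackson"),
])
con.executemany("INSERT INTO film_actor VALUES (?, ?)", [
    (1, "Kevin Spacey"), (2, "Kevin Spacey"), (3, "Elijah Wood")])


def titles(sql, params):
    """Run a query that returns one title per row and flatten the result."""
    return [title for (title,) in con.execute(sql, params)]


starring = titles(
    "SELECT f.title FROM film f JOIN film_actor a ON a.film_id = f.film_id "
    "WHERE a.actor = ? ORDER BY f.title", ("Kevin Spacey",))
directed = titles("SELECT title FROM film WHERE director = ?", ("Peter Jackson",))
dramas_2000 = titles(
    "SELECT title FROM film WHERE genre = ? AND year = ?", ("Drama", 2000))

print(starring, directed, dramas_2000)
```

A question such as "films with a scene in the rain" cannot be answered this way unless someone has recorded that as a keyword, which is the restriction noted under Disadvantages.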



Content Searches
• An alternative to metadata is to search the content directly
  • Multimedia is less structured than metadata
  • It is a richer source of information but harder to process
• Examples
  • Find images similar to a given sample
  • Hum a tune and find out what it is
  • Search for features, such as cuts or transitions in films

Content-Based Retrieval
• QBIC™ (Query By Image Content) from IBM - searches for images having similar colour or layout
• http://wwwqbic.almaden.ibm.com/cgi-bin/stamps-demo

Content-Based Retrieval
• Image retrieval is hard.
  • It is often not clear when two images are 'similar'
  • Image interpretation is unsolved and expensive
  • Different people expect different things
• Example of content based retrieval
  • Do we look for?
    • Images of roses?
    • Images of red things?
    • Images of flowers?
    • Images of red flowers?
    • Images of red roses?

But it's more subtle than that…
• This is a common theme in all automated information management.
• But how on earth does a computer know what red means?
  • It cannot. It has no senses. It is impossible because it is not situated.
• This is why we should not be concerned with AI. We must leave the semantics of what predicates mean to humans.
  • i.e. Avoid Inference Engines like the plague.
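Searching for images "having similar colour" can be illustrated with a coarse colour histogram compared by histogram intersection. This is a generic technique chosen for the sketch and is not claimed to be how QBIC works; the tiny pixel lists and all names are hypothetical.

```python
from collections import Counter


def colour_histogram(pixels, levels=4):
    """Normalised histogram over levels**3 coarse colour bins."""
    step = 256 // levels
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = len(pixels)
    return {colour_bin: n / total for colour_bin, n in counts.items()}


def similarity(hist_a, hist_b):
    """Histogram intersection: 1.0 for identical distributions, 0.0 for disjoint ones."""
    return sum(min(share, hist_b.get(colour_bin, 0.0))
               for colour_bin, share in hist_a.items())


# Tiny hypothetical "images": flat lists of (red, green, blue) pixels.
red_rose = [(220, 20, 30)] * 6 + [(30, 120, 40)] * 2   # mostly red, some green
red_car = [(210, 10, 10)] * 7 + [(40, 40, 40)] * 1     # mostly red, some grey
blue_sky = [(40, 90, 230)] * 8                          # all blue

query = colour_histogram(red_rose)
library = {"red_car": red_car, "blue_sky": blue_sky}
ranked = sorted(
    library,
    key=lambda name: similarity(query, colour_histogram(library[name])),
    reverse=True,
)
print(ranked)
```

The red car ranks as the closest match to the rose. The program has matched numbers that we choose to call red, and has no notion of roses, which is the ambiguity the list of questions above is pointing at.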

5. Yet more databases!

Temporal Databases
• Storing data that changes over time
• Can ask about the history of the DB rather than just the current state
• System time vs real time
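Asking about history rather than the current state can be sketched with a pair of validity dates on each row. The table, column names, dates and balances below are hypothetical. The sketch tracks only when a value was true in the modelled world; recording when the database itself was changed would need further columns.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""
    CREATE TABLE balance_history (
        account_number TEXT,
        balance        INTEGER,
        valid_from     TEXT,   -- ISO date on which the value became true
        valid_to       TEXT    -- ISO date on which it stopped; NULL = still current
    )""")
con.executemany("INSERT INTO balance_history VALUES (?, ?, ?, ?)", [
    ("A-305", 400, "2024-01-01", "2024-03-01"),
    ("A-305", 500, "2024-03-01", None),
])


def balance_as_of(account, day):
    """Balance that was valid on the given ISO date, or None if unknown."""
    row = con.execute(
        """SELECT balance FROM balance_history
           WHERE account_number = ? AND valid_from <= ?
             AND (valid_to IS NULL OR ? < valid_to)""",
        (account, day, day),
    ).fetchone()
    return row[0] if row else None


print(balance_as_of("A-305", "2024-02-15"))   # a date in the past
print(balance_as_of("A-305", "2024-06-01"))   # a date covered by the current row
```

Old rows are closed off with a valid_to date instead of being overwritten, so earlier states remain queryable.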

Logic Databases
• A database is a set of facts and rules for manipulating them
• The DBMS maintains and controls these facts and rules
• A 'query' is made by applying the rules to the facts
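A toy version of facts, rules and a query is sketched below. The predicates, the names and the naive apply-until-nothing-changes strategy are all assumptions made for illustration; a real logic DBMS would use a rule language and a far better evaluation method.

```python
# Facts: (predicate, subject, object) triples. The names are hypothetical.
facts = {
    ("parent", "alice", "bob"),
    ("parent", "bob", "carol"),
}


def apply_rules(known):
    """One pass over two rules:
         ancestor(X, Y) if parent(X, Y)
         ancestor(X, Z) if parent(X, Y) and ancestor(Y, Z)
    """
    derived = set(known)
    for pred, x, y in known:
        if pred != "parent":
            continue
        derived.add(("ancestor", x, y))
        for pred2, y2, z in known:
            if pred2 == "ancestor" and y2 == y:
                derived.add(("ancestor", x, z))
    return derived


def closure(known):
    """Apply the rules repeatedly until no new facts appear."""
    while True:
        bigger = apply_rules(known)
        if bigger == known:
            return known
        known = bigger


def query(pred, subject):
    """Answer a query by applying the rules to the facts."""
    return sorted(obj for p, s, obj in closure(facts) if p == pred and s == subject)


print(query("ancestor", "alice"))
```

Only the two parent facts are stored; that alice is an ancestor of carol is derived by the rules when the query is asked.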