G64DBS Database Systems Lecture 17 Modern Trends in Databases
Modern Databases
• GOOD Modern Databases
  • Parallel Processing
  • Distributed DBs
  • Web-based DBs
  • Multimedia DBs
• Tomorrow BAD Modern Databases…
Tim Brailsford
Other Sorts of Database
• Topics covered so far
  • Relational model
  • SQL
  • Design techniques
  • Transactions
• Many of these topics relied on relational concepts
• There are several other flavours of DB in use today
  • Distributed DBs
  • Object DBs
  • Multimedia DBs
  • Temporal DBs
  • Logic DBs
Benefits of Parallel Databases
• Improves Throughput. INTERQUERY PARALLELISM: It is possible to process a number of transactions in parallel with each other.
• Improves Response Time. INTRAQUERY PARALLELISM: It is possible to process ‘sub-tasks’ of a transaction in parallel with each other.
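The two kinds of parallelism can be contrasted in a minimal Python sketch. The function names and the use of a thread pool are assumptions made for illustration only; a real parallel DBMS schedules work across processors itself, and Python threads show the structure rather than a true CPU speed-up.

```python
from concurrent.futures import ThreadPoolExecutor

def run_transaction(amounts):
    """Stand-in for one independent transaction: total a list of amounts."""
    return sum(amounts)

def interquery(transactions, workers=4):
    """Inter-query parallelism: whole transactions run side by side."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_transaction, transactions))

def intraquery(amounts, workers=4):
    """Intra-query parallelism: one query is split into sub-tasks."""
    size = max(1, -(-len(amounts) // workers))  # ceiling division
    chunks = [amounts[i:i + size] for i in range(0, len(amounts), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))  # each sub-task sums one chunk
    return sum(partial_sums)                        # combine the partial results
```

For example, `interquery([[1, 2], [3, 4], [5]])` runs three separate transactions at once, while `intraquery(list(range(101)))` splits a single summation into chunks and combines the partial sums.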
Why Parallel Databases?
• More and More Data! We have databases that hold a high amount of data, on the order of 10^12 bytes:
  • 1,000,000,000,000 bytes!
• Faster and Faster Access! We have data applications that need to process data at very high speeds:
  • 10,000s of transactions per second!
• ONE Processor just won’t do it!
• We have looked mainly at relational databases
1. Parallel Databases?
[Figure: SPEED-UP. Number of transactions/second against number of CPUs (5, 10 and 16 CPUs), contrasting linear speed-up (ideal) with sub-linear speed-up; rates marked at 1000/sec, 1600/sec and 2000/sec.]
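The difference between linear and sub-linear speed-up comes down to simple arithmetic, sketched below in Python. The pairing of rates with CPU counts (1000/sec at 5 CPUs, 1600/sec at 10 CPUs) is an assumption, since the figure's exact data points cannot be recovered; `speed_up` is a hypothetical helper name.

```python
def speed_up(base_cpus, base_tps, cpus, tps):
    """Return (actual, ideal, efficiency) relative to a baseline run."""
    actual = tps / base_tps      # how much faster the system really got
    ideal = cpus / base_cpus     # linear speed-up scales with the CPU count
    return actual, ideal, actual / ideal

# Assumed readings: 1000 transactions/sec on 5 CPUs, 1600/sec on 10 CPUs.
actual, ideal, efficiency = speed_up(5, 1000, 10, 1600)
print(f"actual x{actual:.1f}, ideal x{ideal:.1f}, efficiency {efficiency:.0%}")
# prints: actual x1.6, ideal x2.0, efficiency 80%
```

Under these assumed numbers, doubling the CPUs gives only 1.6 times the throughput, which is what sub-linear speed-up means.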
• Parallel processing is clearly useful. How has this affected things?
  • Mainframes
  • Client-Server Architecture
  • Parallel Systems
  • Distributed Systems
Some History
[Figure: a MAINFRAME COMPUTER (presentation logic, business logic, data logic) linked by a NETWORK CONNECTION to DUMB TERMINALS.]
Client/Server Architecture
• The client/server architecture is a general model for systems where a service is provided by one system (the server) to another (the client)
• Server
  • Hosts the DBMS and database
  • Stores the data
• Client
  • User programs that use the database
  • Use the server for database access
[Figure: CLIENT 1, CLIENT 2 and CLIENT 3 (presentation logic, business logic) connected to the SERVER, which holds the DBMS and DB (data logic).]
2. Distributed Databases
• A distributed DB system consists of several sites
  • Sites are connected by a network
  • Each site can hold data and process it
  • It shouldn’t matter where the data is - the system is a single entity
• Distributed database management system (DDBMS)
  • A DBMS (or set of them) to control the databases
  • Communication software to handle interaction between sites
Types of Distribution
• There are two basic options with which we will be concerned when it comes to distribution:
  • Distributed processing
  • Distributed data
• With one exception (distributed data, nondistributed processing), neither of these necessarily implies the other
What is a Distributed Database?
• A distributed database system is a collection of logically related databases that co-operate in a transparent manner.
• There should be ‘location independence’, i.e. as the user is unaware of where the data is located, it is possible to move the data from one physical location to another without affecting the user.
[Figure: Distributed Processing. Groups of clients in New York, Moscow, London and Beeston connected over a WIDE AREA NETWORK to a single DBMS.]
[Figure: Distributed Processing. Groups of clients in New York, Moscow, London and Beeston connected over a WIDE AREA NETWORK, with a DBMS at each site.]
Reasons for Distribution
• Reduced Communication Overhead: Most data access is local, less expensive and performs better.
• Improved Processing Power: Instead of one server handling the full database, we now have a collection of machines handling the same database.
• Removal of Reliance on a Central Site: If a server fails, then the only part of the system that is affected is the relevant local site. The rest of the system remains functional and available.
• Expandability: It is easier to accommodate increasing the size of the global (logical) database.
• Local autonomy: The database is brought nearer to its users. This can effect a cultural change as it allows potentially greater control over local data.
Reasons Against Distribution
• Complexity (distributed database systems, especially, are considerably more complex than centralized or client/server ones)
• Security (more opportunities for protection failure or attack)
• Software & management costs
• Lack of standards
• Data integrity more difficult to maintain
Transparency
• To obtain the benefits of distributed data without incurring added operational complexity, distributed database systems should be transparent
• What is “transparency”?
  • A transparent distributed database system would look, to a user, just like a centralized database system.
Fragmentation
• When you split data up over separate locations you have to make a choice:
  • Do you split up the rows of a table, or the columns of a table?
• These are horizontal and vertical fragmentation respectively.
Horizontal Fragmentation
New York:
branch_name   account_number   balance
Hillside      A-305            500
Hillside      A-226            336
Hillside      A-155            62
Beeston:
branch_name   account_number   balance
Valleyview    A-177            205
Valleyview    A-402            10000
Valleyview    A-408            1123
Valleyview    A-639            750
Vertical Fragmentation
New York:
branch_name   customer_name   tuple_id
Hillside      Lowman          1
Hillside      Camp            2
Valleyview    Camp            3
Valleyview    Kahn            4
Beeston:
account_number   balance   tuple_id
A-305            500       1
A-226            336       2
A-177            205       3
A-402            10000     4
Transactions in Client/Server Systems
• Transactions in a single server environment are simple: the same as in a centralized system
• Transactions in a multi-server system are server-oriented.
• That is, a single transaction cannot involve multiple servers because the servers operate completely independently of each other
Transactions in Distributed Database Systems
• Transactions in a distributed database system may be either global or local
• Support for global transactions is provided by the DDBMS (Distributed DBMS) in a true distributed database system.
• This is not a simple task.
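Part of what the DDBMS must hide from the user is how a table has been fragmented across sites. The Python sketch below illustrates horizontal and vertical fragmentation and how the full table is rebuilt from the fragments. It reuses a few account rows from the fragmentation tables; the helper names and the choice of which columns go in which fragment are assumptions for illustration, not a description of any particular DDBMS.

```python
ACCOUNT = [
    {"branch_name": "Hillside", "account_number": "A-305", "balance": 500},
    {"branch_name": "Hillside", "account_number": "A-226", "balance": 336},
    {"branch_name": "Valleyview", "account_number": "A-177", "balance": 205},
    {"branch_name": "Valleyview", "account_number": "A-402", "balance": 10000},
]

def horizontal(rows, predicate):
    """Horizontal fragment: a subset of the rows, all columns kept."""
    return [row for row in rows if predicate(row)]

def vertical(rows, columns):
    """Vertical fragment: a subset of the columns plus a tuple_id key."""
    return [{"tuple_id": i, **{c: row[c] for c in columns}}
            for i, row in enumerate(rows, start=1)]

def rejoin(left, right):
    """Rebuild full rows by joining two vertical fragments on tuple_id."""
    by_id = {row["tuple_id"]: row for row in right}
    merged = [{**a, **by_id[a["tuple_id"]]} for a in left]
    return [{k: v for k, v in row.items() if k != "tuple_id"} for row in merged]

# Horizontal: each site holds the rows for its own branch.
new_york = horizontal(ACCOUNT, lambda r: r["branch_name"] == "Hillside")
beeston = horizontal(ACCOUNT, lambda r: r["branch_name"] == "Valleyview")

# Vertical: each site holds some of the columns, linked by tuple_id.
names = vertical(ACCOUNT, ["branch_name"])
money = vertical(ACCOUNT, ["account_number", "balance"])
```

Horizontal fragments are recombined with a union (`new_york + beeston`), vertical fragments with a join on `tuple_id` (`rejoin(names, money)`). That is why the vertical fragments carry the extra `tuple_id` column.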
Distributed vs. centralized
• Both have pros and cons… In other words, if you choose a distributed database you are spreading out lots of small headaches rather than having one central migraine.
• anyone?
3. Web-based Databases
• Database access over the internet
  • Web-based clients
  • Web server
  • Database server(s)
• Web server serves pages to browsers (clients) and can access database(s)
• Typical operation:
  • Client sends a request for a page to the web server
  • Web server sends SQL to database
  • The web server uses results to create page
  • The page is returned to the client
Web-based Databases
• Advantages:
  • World-wide access
  • Internet protocols (HTTP, SSL, etc) give uniform access and security
  • Database structure is hidden from clients
  • Uses a familiar interface
• Disadvantages:
  • Security can be a problem if you are not extremely careful
  • Interface is less flexible using standard browsers
  • Limited interactivity over slow connections
Web-based Databases
[Figure: the Client (Browser) sends an HTTP request to the Web Server, which sends an SQL query to the Database Server; the SQL result comes back to the web server, which returns an HTML page to the client.]
Corporate Style:
[Figure: Microsoft. Internet Explorer sends an HTTP request to ASP .NET, which sends an MSQL query to MS SQL Server; the SQL result comes back and an HTML page is returned.]
[Figure: Oracle. Internet Explorer sends an HTTP request to JSP, which sends a PL/SQL query to Oracle; the SQL result comes back and an HTML page is returned.]
Open Source: / Even more choice
[Figure: two diagrams in which Firefox sends an HTTP request to a PHP script, which sends an SQL query to MySQL or PostgreSQL; the SQL result comes back and an HTML page is returned. Perl, Ruby and Python appear as further scripting-language choices.]
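The typical operation (request in, SQL to the database, page built from the result) can be sketched in a few lines of Python. This is an illustration under stated assumptions: an in-memory SQLite database stands in for the database server, and `make_db` and `handle_request` are hypothetical names, not part of any of the products named in these slides.

```python
import sqlite3
from html import escape

def make_db():
    """Hypothetical stand-in for the database server."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE account "
               "(account_number TEXT, branch_name TEXT, balance INTEGER)")
    db.executemany("INSERT INTO account VALUES (?, ?, ?)",
                   [("A-305", "Hillside", 500), ("A-177", "Valleyview", 205)])
    return db

def handle_request(db, branch_name):
    """Web-server step: send SQL to the database, build a page from the result."""
    rows = db.execute(
        "SELECT account_number, balance FROM account "
        "WHERE branch_name = ? ORDER BY account_number",
        (branch_name,),  # bound parameter, never pasted into the SQL text
    ).fetchall()
    items = "".join(f"<li>{escape(num)}: {bal}</li>" for num, bal in rows)
    return (f"<html><body><h1>{escape(branch_name)}</h1>"
            f"<ul>{items}</ul></body></html>")

page = handle_request(make_db(), "Hillside")
```

Binding the user's input as a query parameter and escaping it in the HTML are two of the places where being "extremely careful" about security matters in practice.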
Web Based Approaches
• (Microsoft) MSQL + JSP .NET
• (Open Source) MySQL + PHP
• (Oracle) PL/SQL + ?
• (Open Source) PostgreSQL + ?
• The scripting language generates the query depending on what the web user requests. It then takes the results and formats them into HTML + JavaScript.
4. Multimedia Databases
• Multimedia DBs can store complex information
  • Images
  • Music and audio
  • Video and animation
  • Full texts of books
  • Web pages
• They can be used in a wide range of application areas
  • Entertainment
  • Marketing
  • Medical imaging
  • Digital publishing
  • Geographic Information Systems
Querying Multimedia Databases
• Metadata searches
  • Information about the multimedia data (metadata) is stored
  • This can be kept in a standard relational database and queried normally
  • Limited by the amount of metadata available
• Content searches
  • The multimedia data is searched directly
  • Potential for much more flexible search
  • Depends on the type of data being used
  • Often difficult to determine what the ‘correct’ results are
Metadata Searches
• Example - indexing films we might store
  • Title
  • Year
  • Genre(s)
  • Actor(s)
  • Director(s)
  • Producer(s)
  • Keywords
• We can then search for things like
  • Films starring Kevin Spacey
  • Films directed by Peter Jackson
  • Dramas produced in 2000
• We don’t actually search the films themselves.
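Because the metadata is ordinary structured data, these searches are plain relational queries. The Python sketch below uses an in-memory SQLite database; the table layout, the helper names and the placeholder titles ("Film A" and so on) are all assumptions made for illustration, and the credits are invented rather than real filmographies.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE film (id INTEGER PRIMARY KEY, title TEXT, year INTEGER, genre TEXT);
    CREATE TABLE credit (film_id INTEGER, role TEXT, person TEXT);
""")
db.executemany("INSERT INTO film VALUES (?, ?, ?, ?)", [
    (1, "Film A", 2000, "Drama"),      # placeholder rows, not real films
    (2, "Film B", 2001, "Fantasy"),
    (3, "Film C", 2000, "Comedy"),
])
db.executemany("INSERT INTO credit VALUES (?, ?, ?)", [
    (1, "actor", "Kevin Spacey"),
    (2, "director", "Peter Jackson"),
    (3, "actor", "Kevin Spacey"),
])

def films_with(role, person):
    """Titles of films where `person` is credited in `role`."""
    rows = db.execute(
        "SELECT f.title FROM film f JOIN credit c ON c.film_id = f.id "
        "WHERE c.role = ? AND c.person = ? ORDER BY f.title",
        (role, person))
    return [title for (title,) in rows]

def films_by_genre_and_year(genre, year):
    """Titles of films with the given genre and year."""
    rows = db.execute(
        "SELECT title FROM film WHERE genre = ? AND year = ? ORDER BY title",
        (genre, year))
    return [title for (title,) in rows]
```

`films_with("actor", "Kevin Spacey")` and `films_by_genre_and_year("Drama", 2000)` correspond to the example searches; neither touches the film content itself, only the rows describing it.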
Metadata Searches
• Advantages:
  • Metadata can be structured in a traditional DBMS
  • Metadata is generally concise and so efficient to store
  • Metadata enriches the content
• Disadvantages:
  • Metadata can’t always be found automatically, and so requires data entry
  • It restricts the sorts of queries that can be made
Content Searches
• An alternative to metadata is to search the content directly
  • Multimedia is less structured than metadata
  • It is a richer source of information but harder to process
• Example of content based retrieval
  • Find images similar to a given sample
  • Hum a tune and find out what it is
  • Search for features, such as cuts or transitions in films
Content-Based Retrieval
• QBIC™ (Query By Image Content) from IBM - searches for images having similar colour or layout
• http://wwwqbic.almaden.ibm.com/cgi-bin/stamps-demo
Content-Based Retrieval
• Image retrieval is hard.
  • It is often not clear when two images are ‘similar’
  • Image interpretation is unsolved and expensive
  • Different people expect different things
• Do we look for?
  • Images of roses?
  • Images of red things?
  • Images of flowers?
  • Images of red flowers?
  • Images of red roses?
But it’s more subtle than that…
• This is a common theme in all automated information management. But how on earth does a computer know what red means?
• It cannot. It has no senses. It is impossible because it is not situated.
• This is why we should not be concerned with AI. We must leave the semantics of what predicates mean to humans.
  • i.e. Avoid Inference Engines like the plague.
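One simple way to search for images of "similar colour" is to compare colour histograms. The Python sketch below is an assumption-laden illustration, not QBIC's actual method: images are represented as plain lists of (r, g, b) pixels, and `colour_histogram` and `similarity` are hypothetical helper names.

```python
def colour_histogram(pixels, bins=4):
    """Normalised histogram over coarse (r, g, b) buckets."""
    step = 256 // bins
    counts = {}
    for r, g, b in pixels:
        key = (r // step, g // step, b // step)
        counts[key] = counts.get(key, 0) + 1
    total = len(pixels)
    return {key: n / total for key, n in counts.items()}

def similarity(pixels_a, pixels_b):
    """Histogram intersection: 1.0 for the same colour mix, 0 for no overlap."""
    ha, hb = colour_histogram(pixels_a), colour_histogram(pixels_b)
    return sum(min(ha[k], hb[k]) for k in ha.keys() & hb.keys())

# Tiny made-up "images": three reddish pixels plus one other pixel each.
red_rose = [(200, 10, 20)] * 3 + [(20, 120, 30)]
red_car = [(210, 5, 15)] * 3 + [(90, 90, 90)]
green_leaf = [(20, 120, 30)] * 4
```

With these made-up pixels the rose scores higher against the red car than against the leaf. The measure matches "red things", not "roses", which is the difficulty described on these slides.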
5. Yet more databases!
Temporal Databases
• Storing data that changes over time
• Can ask about the history of the DB rather than just the current state
• System time vs real time
Logic Databases
• A database is a set of facts and rules for manipulating them
• The DBMS maintains and controls these facts and rules
• A ‘query’ is made by applying the rules to the facts
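The idea of facts, rules and queries can be sketched in a few lines of Python. This is a toy under stated assumptions: facts are tuples, rules are plain functions, and the family relations and all names are invented for illustration; real logic databases use a dedicated language such as Datalog.

```python
def derive(facts, rules):
    """Apply the rules to the facts until nothing new appears."""
    known = set(facts)
    while True:
        new = {fact for rule in rules for fact in rule(known)} - known
        if not new:
            return known
        known |= new

# Facts: (relation, subject, object). Invented example data.
FACTS = {("parent", "ann", "bob"), ("parent", "bob", "cat")}

def parent_is_ancestor(known):
    """Rule: every parent is an ancestor."""
    return {("ancestor", a, b) for (rel, a, b) in known if rel == "parent"}

def ancestor_is_transitive(known):
    """Rule: an ancestor of a parent is an ancestor of the child."""
    return {("ancestor", a, c)
            for (r1, a, b) in known if r1 == "ancestor"
            for (r2, b2, c) in known if r2 == "parent" and b2 == b}

def query(relation, known):
    """A 'query' reads off everything the rules let us conclude."""
    return sorted((a, b) for (rel, a, b) in known if rel == relation)

DATABASE = derive(FACTS, [parent_is_ancestor, ancestor_is_transitive])
```

`query("ancestor", DATABASE)` includes the pair `("ann", "cat")`, which was never stored as a fact but follows from applying the rules.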