Databases
What is a database?
- A very large, integrated collection of data
- The amount of data is very large
- The data is structured and interrelated
- The data is integrated
- Models real-world enterprises or oraganizations
- Entities (e.g., students, courses)
- Relationships (e.g., Li is taking Database and Knowledge)
- Databases touch all aspects of our lives
Database Applications
- Banking: customer information, accounts, loans, and banking transactions
- Airlines: reservations, schedules
- Universities: students, registration, grades
- Sales: customers, products, purchases
- Manufacturing: production, inventory, orders, supply chain
- Human resources: employee records, salaries, tax deductions
Database Management System (DBMS)
Defination
A Database Management System (DBMS) is a software package designed to store and manage database
Functions
- Manages very large amount of data
- Supports efficent access to very large amount data
- Supports concurrent access to very large amount of Data
- Example: bank and its ATM machines, Internet application
- Supports secure, atomic access to very large amount of data
People working with DBMS
- End users: query/update databases through application user interfaces
- Database designers: design database “schema” to model aspects of the real world
- Database appliaction developers: build applications that interface with databases
- Database administrators(a.k.a DBA’s): load, back up, and restore data, fine-tune databases for performance
- DBMS implementors: develop the DBMS or specialized data management software, implement new techniques for query processing and optimization inside DBMS
Drawbacks of using file systems
- Data redundancy and inconsistency
- Difficulty in accessing data
- Data isolation - multiple files and formats
- Integrity problems
- Atomicity of updates
- Concurrent access by multiple users
- Security problems
Why use a DBMS?
- Data independence and efficient access
- Reduced application development time
- Data integrity and security
- Uniform data administration
- Concurrent access, recover from crashes
Database Systems
Defination
- A Database System (DBS) contains the following components
- Hardware platfom (PC/Workdstation/Cluster etc.)
- OS
- DBMS
- A (number of) database(s)
- DB APPs
Data Models
- A collection of tools for describing
- Data structures
- Data relationships
- Data semantics
- Data constraints
- Entity-Relationship data model (mainly for database design)
- Relational model
- Object-based data models (Object-oriented and Object-relational)
- Semi-structured data model (XML)
- Other older models:
- Network model (e.g. IDS[Integrated Data Store])
- Hierarchical model (e.g. IMS[Info. Management System])
Levels of Abstraction
- Physical level: describes how a record(e.g., employee) is stored.
- Logical level: describes data stored in database, and the relationships among the data.
- View level: application programs hide details of data types.
Schemas and Instances
- Similar to types and value of variables in programming languages
- Schema - the logical structure of the database
- Example: The database consists of information about a set of customers and accounts and the relationship between them
- Analogous to type information of a variable in a program
- Physical schema: database design at the physical level
- Logical schema: database design at the logical level
- Instance - the actual content of the database at a particular point of time
- Analogous to the value of a variable
- Physical Data Independence - the ability to modify the physical schema without changing the logical schema
- Applications depend on the logical schema
- In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others
Data Manipulation Language (DML)
- Language for accessing and manipulating the data organized by the appropriate data model
- DML also known as query language
- Two classes of languages
- Procedual - user specifies what data is required and how to get those data
- Declarative (nonprocedural) - user specifies what data is required without specifying how to get those data
- SQL is the most widely used query language
Data Definition Language (DLL)
- Specification notation for defining the database schema
- DDL compiler generates a set of tables stored in a data dictionary
- Data dictionary contains metadata (i.e., data about data)
- Database schema
- Data storage and definition language
- Specifies the storage structure and access methods used
- Integrity constraints
- Domain constraints
- Referential integrity (references constraint in SQL)
- Assertions
- Authorization
Database Design
The process of designing the general structure of the database:
- Logical Design - Deciding on the database schema. Database design requires that we find a “good” collection of relation schema.
- Business decision - What attributes should we record in the database?
- Computer Science decision - What relation schemas should we have and how should the attributes be distributed among the various relation schemas?
- Physical Design - Deciding on the physical layout of the database
The Entity-Relationship Model
- Models an enterprise as collection of entities and relationships
- Entity (rectangle): a “thing” or “object” in the enterprise that is distinguishable from other objects
- Described by a set of attributes (ellipse)
- Relationship (diamond): an association among several entities
- Entity (rectangle): a “thing” or “object” in the enterprise that is distinguishable from other objects
- Represented diagrammatically by an entity-relationship diagram
Database System Architecture
The architecture of a database systems is greatly influenced by the underlying computer system on which the database is running:
- Centralized
- Client-server / Browser-server
- Parallel (multi-processor)
- Distributed
Storage Management
- Storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.
- The storage manager is responsible to the following tasks:
- Interaction with the file manager
- Efficient storing, retrieving and updating of data
- Issues:
- Storage access
- File organization
- Indexing and hashing
Query Processing
- Parsing and translation
- Optimization
- Evaluation
- Alternative ways of evaluating a given query
- Equivalent expressions
- Different algorithms for each operation
- Cost difference between a good and a bad way of evaluating a query can be enormous
- Need to estimate the cost of operations
- Depends critically on statistical information about relations which the database must maintain
- Need to estimate statisics for intermediate results to compute cost of complex expressions
Transaction Management
- A transaction is a collection of operations that performs a sgingle logical function in a database appliacation
- Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashed) and transaction failures
- Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database
近期评论