database concept

Databases

What is a database?

  • A very large, integrated collection of data
    • The amount of data is very large
    • The data is structured and interrelated
    • The data is integrated
  • Models real-world enterprises or oraganizations
    • Entities (e.g., students, courses)
    • Relationships (e.g., Li is taking Database and Knowledge)
  • Databases touch all aspects of our lives

Database Applications

  • Banking: customer information, accounts, loans, and banking transactions
  • Airlines: reservations, schedules
  • Universities: students, registration, grades
  • Sales: customers, products, purchases
  • Manufacturing: production, inventory, orders, supply chain
  • Human resources: employee records, salaries, tax deductions

Database Management System (DBMS)

Defination

A Database Management System (DBMS) is a software package designed to store and manage database

Functions

  • Manages very large amount of data
  • Supports efficent access to very large amount data
  • Supports concurrent access to very large amount of Data
    • Example: bank and its ATM machines, Internet application
  • Supports secure, atomic access to very large amount of data

People working with DBMS

  • End users: query/update databases through application user interfaces
  • Database designers: design database “schema” to model aspects of the real world
  • Database appliaction developers: build applications that interface with databases
  • Database administrators(a.k.a DBA’s): load, back up, and restore data, fine-tune databases for performance
  • DBMS implementors: develop the DBMS or specialized data management software, implement new techniques for query processing and optimization inside DBMS

Drawbacks of using file systems

  • Data redundancy and inconsistency
  • Difficulty in accessing data
  • Data isolation - multiple files and formats
  • Integrity problems
  • Atomicity of updates
  • Concurrent access by multiple users
  • Security problems

Why use a DBMS?

  • Data independence and efficient access
  • Reduced application development time
  • Data integrity and security
  • Uniform data administration
  • Concurrent access, recover from crashes

Database Systems

Defination

  • A Database System (DBS) contains the following components
    • Hardware platfom (PC/Workdstation/Cluster etc.)
    • OS
    • DBMS
    • A (number of) database(s)
    • DB APPs

Data Models

  • A collection of tools for describing
    • Data structures
    • Data relationships
    • Data semantics
    • Data constraints
  • Entity-Relationship data model (mainly for database design)
  • Relational model
  • Object-based data models (Object-oriented and Object-relational)
  • Semi-structured data model (XML)
  • Other older models:
    • Network model (e.g. IDS[Integrated Data Store])
    • Hierarchical model (e.g. IMS[Info. Management System])

Levels of Abstraction

  • Physical level: describes how a record(e.g., employee) is stored.
  • Logical level: describes data stored in database, and the relationships among the data.
  • View level: application programs hide details of data types.

Schemas and Instances

  • Similar to types and value of variables in programming languages
  • Schema - the logical structure of the database
    • Example: The database consists of information about a set of customers and accounts and the relationship between them
    • Analogous to type information of a variable in a program
    • Physical schema: database design at the physical level
    • Logical schema: database design at the logical level
  • Instance - the actual content of the database at a particular point of time
    • Analogous to the value of a variable
  • Physical Data Independence - the ability to modify the physical schema without changing the logical schema
    • Applications depend on the logical schema
    • In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others

Data Manipulation Language (DML)

  • Language for accessing and manipulating the data organized by the appropriate data model
    • DML also known as query language
  • Two classes of languages
    • Procedual - user specifies what data is required and how to get those data
    • Declarative (nonprocedural) - user specifies what data is required without specifying how to get those data
  • SQL is the most widely used query language

Data Definition Language (DLL)

  • Specification notation for defining the database schema
  • DDL compiler generates a set of tables stored in a data dictionary
  • Data dictionary contains metadata (i.e., data about data)
    • Database schema
    • Data storage and definition language
      • Specifies the storage structure and access methods used
    • Integrity constraints
      • Domain constraints
      • Referential integrity (references constraint in SQL)
      • Assertions
    • Authorization

Database Design

The process of designing the general structure of the database:

  • Logical Design - Deciding on the database schema. Database design requires that we find a “good” collection of relation schema.
    • Business decision - What attributes should we record in the database?
    • Computer Science decision - What relation schemas should we have and how should the attributes be distributed among the various relation schemas?
  • Physical Design - Deciding on the physical layout of the database

The Entity-Relationship Model

  • Models an enterprise as collection of entities and relationships
    • Entity (rectangle): a “thing” or “object” in the enterprise that is distinguishable from other objects
      • Described by a set of attributes (ellipse)
    • Relationship (diamond): an association among several entities
  • Represented diagrammatically by an entity-relationship diagram

Database System Architecture

The architecture of a database systems is greatly influenced by the underlying computer system on which the database is running:

  • Centralized
  • Client-server / Browser-server
  • Parallel (multi-processor)
  • Distributed

Storage Management

  • Storage manager is a program module that provides the interface between the low-level data stored in the database and the application programs and queries submitted to the system.
  • The storage manager is responsible to the following tasks:
    • Interaction with the file manager
    • Efficient storing, retrieving and updating of data
  • Issues:
    • Storage access
    • File organization
    • Indexing and hashing

Query Processing

  1. Parsing and translation
  2. Optimization
  3. Evaluation
  • Alternative ways of evaluating a given query
    • Equivalent expressions
    • Different algorithms for each operation
  • Cost difference between a good and a bad way of evaluating a query can be enormous
  • Need to estimate the cost of operations
    • Depends critically on statistical information about relations which the database must maintain
    • Need to estimate statisics for intermediate results to compute cost of complex expressions

Transaction Management

  • A transaction is a collection of operations that performs a sgingle logical function in a database appliacation
  • Transaction-management component ensures that the database remains in a consistent (correct) state despite system failures (e.g., power failures and operating system crashed) and transaction failures
  • Concurrency-control manager controls the interaction among the concurrent transactions, to ensure the consistency of the database