An Introduction to Data Modeling
Shashi Shekhar
Computer Science Faculty
University of Minnesota, Minneapolis, MN 55455



Presented
to Post-doctoral researchers
at Institute of Mathematics and Its Applications
University of Minnesota, Minneapolis, MN 55455
January 25th, 2001.


S1: Introductions
  • Audience
    • Name
    • Current Group
    • Goals
    • Background
      • Data Modeling (e.g. ERDs, Table design)
      • Objects (e.g. C++, Java)
    • Instructor
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S2: Course Material
  • Course Notes
  • Butterfly book
    • Oracle8 Design Tips, D. Ensor & I. Stevenson,
      • O' Reilly Press, 1998 (ISBN 1-56592-361-8).
    • Reference Book
    • A First Course in Database Systems
      • J. Ullman, J. Widom, Prentice Hall, 1998.
      • ISBN 0-13-861337-0
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S3: Other Announcements

Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S4: Schedule
  • Introductions, Overview
  • A. Object Oriented Databases
  • B. Data Modeling; ODL interfaces, relationships
  • C. Classification
  • D. ODL Subclasses and Keys
  • Summary
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S5: What is a Database Management System?
  • 1. Manages very large amounts of data.
  • 2. Supports efficient access to very large amounts of data.
  • 3. Supports concurrent access to v.l.a.d.
    • Example: Bank and its ATM machines.
  • 4. Supports secure, atomic access to v.l.a.d.
    • Contrast two people editing the same UNIX file-
    • last to write "wins"- with the problem
    • if two people withdraw money from an account
      • via 2 ATM machines at the same time
    • new balance may be wrong
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S6: Relational Model
  • Based on tables, as:
    acct# name balance
    12345 Sally 1000.21
    34567 Sue 285.48
    ... ... ...
  • "Big three" (relational) DBMS companies
    • - Oracle, Informix, Sybase -
    • among largest software companies in the world.
    • Other Players: IBM DB2, MS SQL Server/Access
  • Relational companies also challenged by
    • startup "object- oriented DB" companies.
    • But countered with "object-relational" systems,
      • which retain the relational core
      • while allowing type extension as in OO systems.
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S7: Object Oriented Databases
  • SQL2 Column Type Limitations
    • Few types - numbers, strings, date, currency, ..
    • No user-defined types or methods
    • No type hierarchy (inheritance, part-of)
    • Column types are atomic
    • Ex. Employees has multiple private phones
      • = > create a separate phone table
    • Normalization = > Segmentation and Performance penalty
  • SQL2 Table type limitations
    • tables = sets of records (mostly)
    • Not bags, lists, arrays
    • Ex.: Bags allow Duplicates.
    • Which SQL operations treat duplicates correctly?
      • UNION, INTERSECTION, SELECT, ...
    • Inadequate for many applications
    • Business, CAD/CAM, GIS, CASE...
    • Impedance mismatch b/w DB and application languages
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S8: Two Approaches to Next Generation Databases
  • Extended Relational DBs (object-relational)
    • Ex. Informix UDB, Oracle8, DB2 UDB
    • Refine SQL
    • object = A field within a row of a table
    • User defined row/column types
    • User defined methods as columns
    • May support inheritance
  • Object Oriented DBs
    • Ex. ObjectStore, Versant, O2, Gemstone
    • Abandon SQL
    • Add persistent objects to C++/Java/Smalltalk
    • OQL to query sets of objects
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S9: Basic Object Data Model
  • Basic Object Model is similar to object programming model
  • Objects and object identifiers
    • Object has state (attributes) and behaviour (methods)
    • OID, independent of attribute-values, location, structure
  • Complex values and types
    • Attributes: primitive (integer), or non-primitive (list)
    • Constructors (e.g. set, tuple, list, ...)
  • Encapsulation
    • interferes with querying
    • may not be enforced
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S10: Basic Object Data Model - 2
  • Classes
    • Template for a set of objects
      • with same attributes and methods
    • Persistent vs. non-persistent classes
    • Meta-classes, An object can change its class
  • Inheritance:
    • subclass inherits attributes, methods of its superclass
    • subclass may add new attributes/methods
      • or redefine inherited ones.
    • Overloading, overriding, late binding a method name
    • Common name for different operations
      • inside a class or a set of classes.
    • Late binding: resolve ambiguity at run-time
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S11: Semantic Extensions for data modeling
  • Semantics for references to other objects
    • Weak (normal) vs. composite (part-of relationship)
    • Exclusive/Shared part
    • Dependent/Independent parts
      • Q? Delete part if whole is deleted?
    • Modeling relationships
    • Associations b/w entities
    • has a degree, cardinality
    • Q? Compare association with references
      • Bi-directional
    • Integrity Constraints:
    • assertions to be satisfied by database objects .
    • Uniqueness constraint: Identical vs. Equality
    • Often support referential integrity constraints
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S12: Object Data Model Vs. Relational Model
  • Attribute/Field values
    • User defined types
    • Multi-valued attributes violates 1NF
    • Constructors - list, set, tuple
  • Object/Classes Vs. Tuples/Entities
    • Methods to model behaviour
    • Object-id vs. Primary Key
      • globally unique for lifetime
    • Encapsulation
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S13: Object Vs. Relational Model (Contd.)
  • Relationships
    • Inheritance, Generalization/Specialization
    • Classification/instantiation
    • Aggregation (part-of hierarchy)
    • References vs. foreign keys
    • Association
  • Model Inherent Integrity Constraints
    • Uniqueness constraint - 2 meanings (identical vs. equal)
    • Lack of automatic enforcement
    • Relies on trigger/rule system
    • Referential IC is hard-coded in methods
  • Management of changes over time
    • Object versions
    • Schema evolution (e.g. Orion, Itasca)
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S14: Denormalization Example
  • Object paradigm keeps pieces together
    • even if it violates normal forms
  • Analogy : Parking a car in garage
    • However, determining the clustering unit is hard
    • Since different query may use different groups
  • Recommendation: Stick to 3rd normal form
    • Except for grouping weak entities with identifying ones.
    • Fixed length collection may be grouped
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S15: Normalize or not?
  • Consider time series data
  • Electricity industry in U.K.
    • Each day divided into 48 half-hour periods,
    • Demand and price depend on the time period
    • individual readings significant within a day
      CREATE TYPE hh_reading AS VARRAY (48) OF NUMBER (9,2);
      CREATE TABLE meter_days(1_date,
      meter_status, hh_reading,...);
      INSERT INTO meter_days VALUES (sysdate, 'OK',
      hh_reading(187263.23, 72653.98, ...), ...);
  • Beauty is in the eye of beholder
    • Monthly Billing: don't normalize
    • Load balancing: normalize to determine peak hour!
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S16: Physical data storage:
  • Issues:
    • complex values, methods, oid, Object reference,
    • complex object, inheritance
  • Relational DM (e.g. Oracle7)
    • Convert complex values to atomic one
    • Methods = stored procedures in PL/SQL
    • Oid = primary key
    • Object reference = foreign key
    • Complex object: Decompose into multiple tuples
      • Join relevant relations to compose an object
    • Ex. a car = 4 tires + 4 doors + 1 engine + ...
    • Inheritance = 3 table designs
  • Object DM (e.g. Oracle8)
    • One tuple per object
    • Methods = procedure-valued fields in SQL
    • Object-id generators
    • Index : oid - > location of object
    • Path Index
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S17: Query Languages/ Query Optimization in OODB
  • Query Language
    • User defined types/methods:
      • Ex. aStudent.address.distance(aDepartment.location)
    • Path expressions: aStudent.department.manager.name
    • Query across a class hierarchy: select-any* (Postgres)
    • Closer to programming languages
      • Ex. C++/Java API, ODMG standard
    • Query Optimization Issues
    • User defined methods
    • Class hierarchy based queries
    • Multi-valued attribute
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S18: Transactions, Locking, Recovery
  • Lock granularity and objects
    • lock on a complex object
    • lock on object parts
  • Design transaction
    • Long lived, many reads-write
    • Concurrency Control
      • check-in/check-out
      • versions (working, transient, released)
    • Ex. A designer edits a check design
      • edit watermark, print image, integrate
    • Other Issues
    • Schema evolution
    • Rule Systems
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S19: Three Aspects to Studying DBMS's
  • 1. Conceptual Modeling of data
    • Allows exploration of issues
    • before committing to an implementation.
  • 2. Logical Implementation and Programming:
    • Table design, normalization, integrity constraints
    • SQL queries and DB operations like update.
    • Other host languages, e.g. PL/SQL, C, Java, ...
  • 3. Physical DBMS implementation.
    • Storage allocation, Index selection
    • Database administration
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S20: Conceptual Modeling
  • Entity/Relationship Model
    • Entity like object, = "thing."
    • Entity set like class = set of "similar" entities/objects
    • Attribute = property of entities in an entity set,
      • similar to fields or "instance variables."
    • Diagrams to represent designs.
    • entity set = rectangle;
    • attribute = oval.
    • Relationsip = diamonds.
  • Relationships
    • Connect two or more entity sets.
  • Relationship Set
    • is a table with
    • One column for each of the connected entity sets.
    • One row for each list of entities,
      • one from each set, that are connected by the relationship.
    • Example
      Employee Project
      Sally S145
      Sally S244
      Joe S145
      ... ...
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S21: Multiway Relationships
  • Usually binary relationships suffice
  • However, sometimes three or more Entities
    • are connected by one relationship.
    • Consider: multi-party transactions
    • Ex. a customer buys a airline ticket from a travel agent
    • customer, < airline, flight > , travel agent, ...
  • Can we decompose it into binary relationships?
    • strategy: create a transaction entity
    • unique transaction identifier
    • binary relationship from transaction to other entities
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S22: ODL
  • Database modelling and implementation processes
    • Ideas -- ERD -- Relations -- Relational Database
    • Ideas -- ODL -- Relations -- Relational Database
    • Ideas -- ODL -- Object Database
  • Design language derived from the OO community:
    • Roots in ODMG
    • Used like E/R
    • conceptual design of a relational DB.
    • can be direct input to some OODBMS's.
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S23: ODL Class Declarations
  • Syntax
    interface < name > {
    elements = attributes, relationships,
    methods
    }
  • Syntax for Element Declarations
    attribute < type > < name > ;
    relationship < rangetype > < name > ;
  • Method Example
    float gpa (in: Student) raises (noGrades)
    • float = return type.
    • Colon(:) = > Student argument is read-only.
    • Other options: out, inout
    • noGrades is an exception
      • that can be raised by method gpa.
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S24: Books-Stores-Customers Example
  • One of the running example
    interface Books {
    attribute string name;
    attribute string author;
    relationship Set < Stores > soldAt
    inverse Stores::sells;
    relationship Set < Customers > fans
    inverse Customers::likes;
    }
  • Comments
    • Relationships have inverses
    • Element form another class is indicated by < class > ::
    • Form a set type with Set < type > .
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S25: Books-Stores-Customers Example
  • Another class
    interface Stores {
    attribute string name;
    attribute Stuct Addr
    {string street, string city, int zip}
    address;
    relationship Set < Customers > serves
    inverse Customers::frequents;
    relationship Set < Books > sells
    inverse Books::soldAt;
    }
  • Structured types
    • have names and bracketed lists of field-type pairs.
  • Enumerated types
    • have names and bracketed lists of values.
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S26: Books-Stores-Customers Example
  • Another class
    interface Customers {
    attribute string name;
    attribute Struct Stores::Addr
    address;
    relationship Set < Books > likes
    inverse Books::fans;
    relationship Set < Stores > frequents
    inverse Stores::serves;
    }
  • Note reuse of Addr type.
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S27: Books-Stores-Customers Example
  • Q? Compare ODL exmaple with Entity Relationship Model.
    • Identify the entities
    • Identify the relationships among entities
    • Identify the cardinality constraints on relationships
    • Draw an Entity Relationship Diagram
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S28: ODL Type System
  • Basic types:
    • int, real/float, string,
    • enumerated types,
    • classes.
  • Type constructors:
    • Struct for structures
    • four collection types:
      • Set
      • Bag - duplicates allowed
      • List - element have a position
      • Array - element have index
    • Limitation on Nesting
    • Consider the following collections
    • S1 = (1, 2), S2 = (1, 2, 1), S3 = (2, 1, 1)
    • set S1 = set S2 = set S3
    • bag S1 != bag S2 = bag S3
    • list S2 != list S3
    • lists and arrays differ in storage management
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S29: Cardinality constraints on Relationships
  • Cardinality constraints
    • One-One (Employee - preferred-email-address )
    • Many-One (Employee - Department)
    • Many-Many (Employee - Project)
  • Participation constraints
    • Optional
    • Mandatory
  • Combined constraints
    • Range of values: 0..1, 1, 0..N, N
    • Example: One to One
    • 0..N
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S30: Cardinality constraints
  • E.R. Diagram:
    • no arrow = > many (0..N)
    • arrow pointing to "one." (0..1)
    • Rounded arrow = "exactly one." (1)
  • ODL: don't use a collection type
    • for relationship in the "many" class.
    • Collection type remains in "one."
  • Example:
    • likes/fan, sells/soldAt, serves/frequents
    • many to many in E.R. diagram
    • one to one in ODL
  • One-One Relationships
    • E/R: arrows in both directions.
    • ODL: omit collection types in both directions.
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S31: ODL and Cardinality constraints
  • Implication of Collection types
  • Example: Customers Have Favorite Books
    interface Customers{
    attribute string name;
    attribute Struct Bars::Addr address;
    relationship Set < Books > likes inverse Books::fans;
    relationship Books favoriteBook inverse Books::realFans;
    relationship Set < Stores > frequents inverse Stores:: serves;
    }
  • Also add to Books:
    relationship Set < Customers > realFans
    inverse Customers::favoriteBook;
  • Q? Cardinality Constraint for "likes/realFans"
    • one to many in E.R.
    • one to one in ODL
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S32: Another Example
  • Consider the following ODL description:
    interface Movie{
    attribute string title;
    attribute integer year;
    attribute integer length;
    attribue enum Film (color, monochrome) filmType;
    relationship < R1 > ;
    relationship Studio ownedBy
    inverse Studio:owns;
    };
    interface Star{
    attribute string name;
    attribute Struct addr
    {string street, string city} address;
    relationship Set < Movie > starredIn
    inverse Movie::stars;
    }
    interface Studio{
    attribute string name;
    attribute string address;
    relationship < R2 > ;
    }
  • Ex. Fill out the description for relationships R1 and R2
  • Ex. Refine schema to model the "Sequel" relationship among Movies.
  • Ex. Draw an equivalent Entity Relationship diagram.
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S33: Classification
  • Chapter on classification from the
  • OO Concepts course.
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S34: ODL Subclasses
  • Subclass = special case
    • fewer entities/objects in each subclass
    • more properties
  • Example: Biographies are a kind of Books.
    • Has the attributes, relationships, methods of Books
    • Also has "Person" attribute for Biographies.
  • ODL Syntax
    • subclass-name colon superclass-name.
      interface Biographies:Books {
      attribute String Person;
      }
  • Semantics: Objects of the Biographies class
    • acquire all the attributes and
    • relationships of the Books class.
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S35: Subclasses in E.R. vs. ODL
  • E. R. notation
    • Syntax: triangles indicate the subclass relation.
    • Semantics: entity has "representation"
      • in all the subclasses to which it logically belongs.
    • Its properties are the union of
      • the properties of these classes.
    • Difference in Subclass Viewpoints
    • In ODL, an object is in exactly one class.
      • It inherits properties of its superclass(es).
    • The distinction matters,
      • when we convert ODL and E/R to relations.
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S36: Multiple Inheritance
  • Issue:
    • A class/E.S. could be a subclass of several classes.
    • Ex. Orthogonal classifications
  • Problems
    • (How should conflicts be resolved?)
  • Example:
    • manf means grower for wines,
      • bottler for beers.
    • What does manf mean for "grape beers"?
  • E/R mute on multiple inheritance
  • ODL leaves it to implementer.
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S37: Domain Constraints
  • Keys
    • = set of attributes
    • which uniquesly identifies an entity or object
  • E. R. Model
    • Syntax: underline all key attributes.
    • Each Entity Set must have a key.
    • Example: Suppose name is key for Books.
    • Book name is also key for Biographies.
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S38: Domain Constraints in ODL
  • Keys in ODL
    • Syntax: key( a list of attrbutes ) clause
      • following the class name,
    • Alternative keys: Several lists may be used
      • Parentheses group members of a key,
    • Example:
    • (key(a1, a2,...,an)) = "one key consisting of all n attributes."
    • (key a1, a2,...,an) = "each ai is a key by itself.
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S39: Domain Constraints in ODL
  • Example:
    interface Books
    (key name)
    {
    attribute string name...
  • Example of a Multiattribute Key
    interface Transaction
    (key (transaction-id) (airline, flight, date, customer))
    {
    ...
  • E/R requires exactly one key,
    • but ODL allows zero or more keys.
  • NOTE: In ODL, no key at all is necessary,
    • since "object identity" distinguishes objects.
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S40: Exercise
  • Revisit the Movie-Star-Studio schema
  • Identify keys for each class
    • modify the classes to identify the keys
  • Refine the schema to add the following information
    • Casting for a movie consists of a list
      • mapping Stars to the movie characters.
      • Ignore credits to non-stars.
    • Refine the schema to add the following information
    • Stars are classified into Actors and Actresses
    • Movies are classified into Comedy, Suspense, Drama.
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S41: Weak Entity Sets
  • Weak entity set
    • has no natural unique identifier.
    • Its key includes the keys of another entity set E2
  • Example:
    • Dependents of an employees
    • Flights of an airline
  • E. R Diagram
    • a double rectangle around the weak E.S.
    • a double diamond around identifying relationship
  • NOTE: no "weak class" in ODL.
    • Because object-identifiers are unique
    • No need to "borrow" keys from related classes.
    • Q? Is it consistent with business rules?
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S42: Examples: Logins (Email Addresses)
  • Login name = user name + host name,
    • e.g., ullman@shalmaneser.stanford.edu
  • A "login" entity
    • corresponds to a user name on a particular host,
    • but the password table doesn't record the host,
      • just the username, e.g., ullman.
    • Key for a login =
    • the user name at the host
      • unique for that host only
    • + the IP address fo the host
      • unique globally.
Copyright: S. Shekhar, C. S. Dept., University of Minnesota, Minneapolis, MN 55455. ,,.(home)