An Introduction to Data Modeling
Shashi Shekhar
Computer Science Faculty
University of Minnesota, Minneapolis, MN 55455
Presented
to Post-doctoral researchers
at Institute of Mathematics and Its Applications
University of Minnesota, Minneapolis, MN 55455
January 25th, 2001.
S1: Introductions
- Audience
- Name
- Current Group
- Goals
- Background
- Data Modeling (e.g. ERDs, Table design)
- Objects (e.g. C++, Java)
- Instructor
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S2: Course Material
- Course Notes
- Butterfly book
- Oracle8 Design Tips, D. Ensor & I. Stevenson,
- O' Reilly Press, 1998 (ISBN 1-56592-361-8).
- Reference Book
- A First Course in Database Systems
- J. Ullman, J. Widom, Prentice Hall, 1998.
- ISBN 0-13-861337-0
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S3: Other Announcements
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S4: Schedule
- Introductions, Overview
- A. Object Oriented Databases
- B. Data Modeling; ODL interfaces, relationships
- C. Classification
- D. ODL Subclasses and Keys
- Summary
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S5: What is a Database Management System?
- 1. Manages very large amounts of data.
- 2. Supports efficient access to very large amounts of data.
- 3. Supports concurrent access to v.l.a.d.
- Example: Bank and its ATM machines.
- 4. Supports secure, atomic access to v.l.a.d.
- Contrast two people editing the same UNIX file-
- last to write "wins"- with the problem
- if two people withdraw money from an account
- via 2 ATM machines at the same time
- new balance may be wrong
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S6: Relational Model
- Based on tables, as:
acct# name balance
12345 Sally 1000.21
34567 Sue 285.48
... ... ...
- "Big three" (relational) DBMS companies
- - Oracle, Informix, Sybase -
- among largest software companies in the world.
- Other Players: IBM DB2, MS SQL Server/Access
- Relational companies also challenged by
- startup "object- oriented DB" companies.
- But countered with "object-relational" systems,
- which retain the relational core
- while allowing type extension as in OO systems.
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S7: Object Oriented Databases
- SQL2 Column Type Limitations
- Few types - numbers, strings, date, currency, ..
- No user-defined types or methods
- No type hierarchy (inheritance, part-of)
- Column types are atomic
- Ex. Employees has multiple private phones
- = > create a separate phone table
- Normalization = > Segmentation and Performance penalty
- SQL2 Table type limitations
- tables = sets of records (mostly)
- Not bags, lists, arrays
- Ex.: Bags allow Duplicates.
- Which SQL operations treat duplicates correctly?
- UNION, INTERSECTION, SELECT, ...
- Inadequate for many applications
- Business, CAD/CAM, GIS, CASE...
- Impedance mismatch b/w DB and application languages
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S8: Two Approaches to Next Generation Databases
- Extended Relational DBs (object-relational)
- Ex. Informix UDB, Oracle8, DB2 UDB
- Refine SQL
- object = A field within a row of a table
- User defined row/column types
- User defined methods as columns
- May support inheritance
- Object Oriented DBs
- Ex. ObjectStore, Versant, O2, Gemstone
- Abandon SQL
- Add persistent objects to C++/Java/Smalltalk
- OQL to query sets of objects
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S9: Basic Object Data Model
- Basic Object Model is similar to object programming model
- Objects and object identifiers
- Object has state (attributes) and behaviour (methods)
- OID, independent of attribute-values, location, structure
- Complex values and types
- Attributes: primitive (integer), or non-primitive (list)
- Constructors (e.g. set, tuple, list, ...)
- Encapsulation
- interferes with querying
- may not be enforced
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S10: Basic Object Data Model - 2
- Classes
- Template for a set of objects
- with same attributes and methods
- Persistent vs. non-persistent classes
- Meta-classes, An object can change its class
- Inheritance:
- subclass inherits attributes, methods of its superclass
- subclass may add new attributes/methods
- or redefine inherited ones.
- Overloading, overriding, late binding a method name
- Common name for different operations
- inside a class or a set of classes.
- Late binding: resolve ambiguity at run-time
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S11: Semantic Extensions for data modeling
- Semantics for references to other objects
- Weak (normal) vs. composite (part-of relationship)
- Exclusive/Shared part
- Dependent/Independent parts
- Q? Delete part if whole is deleted?
- Modeling relationships
- Associations b/w entities
- has a degree, cardinality
- Q? Compare association with references
- Integrity Constraints:
- assertions to be satisfied by database objects .
- Uniqueness constraint: Identical vs. Equality
- Often support referential integrity constraints
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S12: Object Data Model Vs. Relational Model
- Attribute/Field values
- User defined types
- Multi-valued attributes violates 1NF
- Constructors - list, set, tuple
- Object/Classes Vs. Tuples/Entities
- Methods to model behaviour
- Object-id vs. Primary Key
- globally unique for lifetime
- Encapsulation
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S13: Object Vs. Relational Model (Contd.)
- Relationships
- Inheritance, Generalization/Specialization
- Classification/instantiation
- Aggregation (part-of hierarchy)
- References vs. foreign keys
- Association
- Model Inherent Integrity Constraints
- Uniqueness constraint - 2 meanings (identical vs. equal)
- Lack of automatic enforcement
- Relies on trigger/rule system
- Referential IC is hard-coded in methods
- Management of changes over time
- Object versions
- Schema evolution (e.g. Orion, Itasca)
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S14: Denormalization Example
- Object paradigm keeps pieces together
- even if it violates normal forms
- Analogy : Parking a car in garage
- However, determining the clustering unit is hard
- Since different query may use different groups
- Recommendation: Stick to 3rd normal form
- Except for grouping weak entities with identifying ones.
- Fixed length collection may be grouped
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S15: Normalize or not?
- Consider time series data
- Electricity industry in U.K.
- Each day divided into 48 half-hour periods,
- Demand and price depend on the time period
- individual readings significant within a day
CREATE TYPE hh_reading AS VARRAY (48) OF NUMBER (9,2);
CREATE TABLE meter_days(1_date,
meter_status, hh_reading,...);
INSERT INTO meter_days VALUES (sysdate, 'OK',
hh_reading(187263.23, 72653.98, ...), ...);
- Beauty is in the eye of beholder
- Monthly Billing: don't normalize
- Load balancing: normalize to determine peak hour!
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S16: Physical data storage:
- Issues:
- complex values, methods, oid, Object reference,
- complex object, inheritance
- Relational DM (e.g. Oracle7)
- Convert complex values to atomic one
- Methods = stored procedures in PL/SQL
- Oid = primary key
- Object reference = foreign key
- Complex object: Decompose into multiple tuples
- Join relevant relations to compose an object
- Ex. a car = 4 tires + 4 doors + 1 engine + ...
- Inheritance = 3 table designs
- Object DM (e.g. Oracle8)
- One tuple per object
- Methods = procedure-valued fields in SQL
- Object-id generators
- Index : oid - > location of object
- Path Index
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S17: Query Languages/ Query Optimization in OODB
- Query Language
- User defined types/methods:
- Ex. aStudent.address.distance(aDepartment.location)
- Path expressions: aStudent.department.manager.name
- Query across a class hierarchy: select-any* (Postgres)
- Closer to programming languages
- Ex. C++/Java API, ODMG standard
- Query Optimization Issues
- User defined methods
- Class hierarchy based queries
- Multi-valued attribute
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S18: Transactions, Locking, Recovery
- Lock granularity and objects
- lock on a complex object
- lock on object parts
- Design transaction
- Long lived, many reads-write
- Concurrency Control
- check-in/check-out
- versions (working, transient, released)
- Ex. A designer edits a check design
- edit watermark, print image, integrate
- Other Issues
- Schema evolution
- Rule Systems
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S19: Three Aspects to Studying DBMS's
- 1. Conceptual Modeling of data
- Allows exploration of issues
- before committing to an implementation.
- 2. Logical Implementation and Programming:
- Table design, normalization, integrity constraints
- SQL queries and DB operations like update.
- Other host languages, e.g. PL/SQL, C, Java, ...
- 3. Physical DBMS implementation.
- Storage allocation, Index selection
- Database administration
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S20: Conceptual Modeling
- Entity/Relationship Model
- Entity like object, = "thing."
- Entity set like class = set of "similar" entities/objects
- Attribute = property of entities in an entity set,
- similar to fields or "instance variables."
- Diagrams to represent designs.
- entity set = rectangle;
- attribute = oval.
- Relationsip = diamonds.
- Relationships
- Connect two or more entity sets.
- Relationship Set
- is a table with
- One column for each of the connected entity sets.
- One row for each list of entities,
- one from each set, that are connected by the relationship.
- Example
Employee Project
Sally S145
Sally S244
Joe S145
... ...
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S21: Multiway Relationships
- Usually binary relationships suffice
- However, sometimes three or more Entities
- are connected by one relationship.
- Consider: multi-party transactions
- Ex. a customer buys a airline ticket from a travel agent
- customer, < airline, flight > , travel agent, ...
- Can we decompose it into binary relationships?
- strategy: create a transaction entity
- unique transaction identifier
- binary relationship from transaction to other entities
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S22: ODL
- Database modelling and implementation processes
- Ideas -- ERD -- Relations -- Relational Database
- Ideas -- ODL -- Relations -- Relational Database
- Ideas -- ODL -- Object Database
- Design language derived from the OO community:
- Roots in ODMG
- Used like E/R
- conceptual design of a relational DB.
- can be direct input to some OODBMS's.
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S23: ODL Class Declarations
- Syntax
interface < name > {
elements = attributes, relationships,
methods
}
- Syntax for Element Declarations
attribute < type > < name > ;
relationship < rangetype > < name > ;
- Method Example
float gpa (in: Student) raises (noGrades)
- float = return type.
- Colon(:) = > Student argument is read-only.
- Other options: out, inout
- noGrades is an exception
- that can be raised by method gpa.
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S24: Books-Stores-Customers Example
- One of the running example
interface Books {
attribute string name;
attribute string author;
relationship Set < Stores > soldAt
inverse Stores::sells;
relationship Set < Customers > fans
inverse Customers::likes;
}
- Comments
- Relationships have inverses
- Element form another class is indicated by < class > ::
- Form a set type with Set < type > .
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S25: Books-Stores-Customers Example
- Another class
interface Stores {
attribute string name;
attribute Stuct Addr
{string street, string city, int zip}
address;
relationship Set < Customers > serves
inverse Customers::frequents;
relationship Set < Books > sells
inverse Books::soldAt;
}
- Structured types
- have names and bracketed lists of field-type pairs.
- Enumerated types
- have names and bracketed lists of values.
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S26: Books-Stores-Customers Example
- Another class
interface Customers {
attribute string name;
attribute Struct Stores::Addr
address;
relationship Set < Books > likes
inverse Books::fans;
relationship Set < Stores > frequents
inverse Stores::serves;
}
- Note reuse of Addr type.
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S27: Books-Stores-Customers Example
- Q? Compare ODL exmaple with Entity Relationship Model.
- Identify the entities
- Identify the relationships among entities
- Identify the cardinality constraints on relationships
- Draw an Entity Relationship Diagram
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S28: ODL Type System
- Basic types:
- int, real/float, string,
- enumerated types,
- classes.
- Type constructors:
- Struct for structures
- four collection types:
- Set
- Bag - duplicates allowed
- List - element have a position
- Array - element have index
- Limitation on Nesting
- Consider the following collections
- S1 = (1, 2), S2 = (1, 2, 1), S3 = (2, 1, 1)
- set S1 = set S2 = set S3
- bag S1 != bag S2 = bag S3
- list S2 != list S3
- lists and arrays differ in storage management
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S29: Cardinality constraints on Relationships
- Cardinality constraints
- One-One (Employee - preferred-email-address )
- Many-One (Employee - Department)
- Many-Many (Employee - Project)
- Participation constraints
- Combined constraints
- Range of values: 0..1, 1, 0..N, N
- Example: One to One
- 0..N
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S30: Cardinality constraints
- E.R. Diagram:
- no arrow = > many (0..N)
- arrow pointing to "one." (0..1)
- Rounded arrow = "exactly one." (1)
- ODL: don't use a collection type
- for relationship in the "many" class.
- Collection type remains in "one."
- Example:
- likes/fan, sells/soldAt, serves/frequents
- many to many in E.R. diagram
- one to one in ODL
- One-One Relationships
- E/R: arrows in both directions.
- ODL: omit collection types in both directions.
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S31: ODL and Cardinality constraints
- Implication of Collection types
- Example: Customers Have Favorite Books
interface Customers{
attribute string name;
attribute Struct Bars::Addr address;
relationship Set < Books > likes inverse Books::fans;
relationship Books favoriteBook inverse Books::realFans;
relationship Set < Stores > frequents inverse Stores:: serves;
}
- Also add to Books:
relationship Set < Customers > realFans
inverse Customers::favoriteBook;
- Q? Cardinality Constraint for "likes/realFans"
- one to many in E.R.
- one to one in ODL
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S32: Another Example
- Consider the following ODL description:
interface Movie{
attribute string title;
attribute integer year;
attribute integer length;
attribue enum Film (color, monochrome) filmType;
relationship < R1 > ;
relationship Studio ownedBy
inverse Studio:owns;
};
interface Star{
attribute string name;
attribute Struct addr
{string street, string city} address;
relationship Set < Movie > starredIn
inverse Movie::stars;
}
interface Studio{
attribute string name;
attribute string address;
relationship < R2 > ;
}
- Ex. Fill out the description for relationships R1 and R2
- Ex. Refine schema to model the "Sequel" relationship among Movies.
- Ex. Draw an equivalent Entity Relationship diagram.
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S33: Classification
- Chapter on classification from the
- OO Concepts course.
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S34: ODL Subclasses
- Subclass = special case
- fewer entities/objects in each subclass
- more properties
- Example: Biographies are a kind of Books.
- Has the attributes, relationships, methods of Books
- Also has "Person" attribute for Biographies.
- ODL Syntax
- subclass-name colon superclass-name.
interface Biographies:Books {
attribute String Person;
}
- Semantics: Objects of the Biographies class
- acquire all the attributes and
- relationships of the Books class.
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S35: Subclasses in E.R. vs. ODL
- E. R. notation
- Syntax: triangles indicate the subclass relation.
- Semantics: entity has "representation"
- in all the subclasses to which it logically belongs.
- Its properties are the union of
- the properties of these classes.
- Difference in Subclass Viewpoints
- In ODL, an object is in exactly one class.
- It inherits properties of its superclass(es).
- The distinction matters,
- when we convert ODL and E/R to relations.
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S36: Multiple Inheritance
- Issue:
- A class/E.S. could be a subclass of several classes.
- Ex. Orthogonal classifications
- Problems
- (How should conflicts be resolved?)
- Example:
- manf means grower for wines,
- What does manf mean for "grape beers"?
- E/R mute on multiple inheritance
- ODL leaves it to implementer.
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S37: Domain Constraints
- Keys
- = set of attributes
- which uniquesly identifies an entity or object
- E. R. Model
- Syntax: underline all key attributes.
- Each Entity Set must have a key.
- Example: Suppose name is key for Books.
- Book name is also key for Biographies.
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S38: Domain Constraints in ODL
- Keys in ODL
- Syntax: key( a list of attrbutes ) clause
- following the class name,
- Alternative keys: Several lists may be used
- Parentheses group members of a key,
- Example:
- (key(a1, a2,...,an)) = "one key consisting of all n attributes."
- (key a1, a2,...,an) = "each ai is a key by itself.
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S39: Domain Constraints in ODL
- Example:
interface Books
(key name)
{
attribute string name...
- Example of a Multiattribute Key
interface Transaction
(key (transaction-id) (airline, flight, date, customer))
{
...
- E/R requires exactly one key,
- but ODL allows zero or more keys.
- NOTE: In ODL, no key at all is necessary,
- since "object identity" distinguishes objects.
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S40: Exercise
- Revisit the Movie-Star-Studio schema
- Identify keys for each class
- modify the classes to identify the keys
- Refine the schema to add the following information
- Casting for a movie consists of a list
- mapping Stars to the movie characters.
- Ignore credits to non-stars.
- Refine the schema to add the following information
- Stars are classified into Actors and Actresses
- Movies are classified into Comedy, Suspense, Drama.
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S41: Weak Entity Sets
- Weak entity set
- has no natural unique identifier.
- Its key includes the keys of another entity set E2
- Example:
- Dependents of an employees
- Flights of an airline
- E. R Diagram
- a double rectangle around the weak E.S.
- a double diamond around identifying relationship
- NOTE: no "weak class" in ODL.
- Because object-identifiers are unique
- No need to "borrow" keys from related classes.
- Q? Is it consistent with business rules?
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)
S42: Examples: Logins (Email Addresses)
- Login name = user name + host name,
- e.g., ullman@shalmaneser.stanford.edu
- A "login" entity
- corresponds to a user name on a particular host,
- but the password table doesn't record the host,
- just the username, e.g., ullman.
- Key for a login =
- the user name at the host
- unique for that host only
- + the IP address fo the host
Copyright: S. Shekhar, C. S. Dept.,
University of Minnesota, Minneapolis, MN 55455. ,,.(home)