Database design

From Sinfronteras
Revision as of 13:02, 7 September 2024 by Adelo Vieira (talk | contribs)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search



Los siguietes archivos corresponden a las clase del Prof. Greg South at CCT. De estas clases se deriva gran parte del contenido de esta sección. Especialmente el contenido relacionado con la creación de Entity relationship diagrams:



Methodology

  • Gather Requirements
  • Conceptual design: Build a model independent of the choice of DBMS
  • Logical design: Create the Relational Model
  • Physical design: (How the database is stored in hardware) Produce MySQL create statements for each relation



Conceptual design


Entity relationship modelling

  • Entity relationship (ER) model is a popular high-level conceptual data model.
  • The main component of the model is the Entity relationship diagram.



Entity relationship diagram - ERD

La siguiente página proporciona una aplicación que permite crear ERD: https://www.draw.io/

The E-R diagram is a simple way of representing the data entities being modelled and the relationships between these data entities. It's Composed of entities, attributes and relationships.

Numerous ERD styles exist. The differences between the styles are only on how the entities and their relationships are illustrated. This module focuses on Chen's ERD style.

  • Entity: an entity is any object in the system that we want to model and store information about. They are objects or items of interest
    • In a University Database we might have: Entity for «Students», «Modules» and «Lecturers».
  • Attributes: an attribute is a property of an entity.
    • Students attributes such as: «StudentID», «Name» and «Course»
  • Relationship: links between entities.



Example 1

Design an ER diagram for such a bookstore:

  • The database must store book, author, publisher and warehouse information.
  • For every book you must capture the title, isbn, year and price information. The isbn value is unique for a book.
  • For every author you must store an id, name, address and the URL of their homepage. Each author can write many books, and each book can have many authors, for example.
  • For every publisher you must store an id, name, address, phone number and an URL of their website.
  • Books are stored at several warehouses, each of which has a code, address and phone number.
  • A book has only one publisher.
  • The warehouse stocks many different books. A book may be stocked at multiple warehouses.
  • The database records the number of copies of a book stocked at various warehouses.

Con la información proporcionada, debemos identificar the entities, attributes and the relationships between entities.

Entities and attributes:

  • book: b_isbn , b_title, b_year, b_price
  • author: au_id, au_name, au_address, au_url
  • publisher: pu_id, pu_name, pu_address, pu_phone, pu_url
  • warehouse: w_code, w_address, w_phone
  • num_copies (The database records the number of copies...) (este atributo no ha sido especificado como parte de un entidad en particular)

Relationships:

  • Each author can «write» many books, and each book can have many authors: El verbo que en este caso describe la relación entre author y book es «write»
  • A book has only one publisher: «publish»
  • Books are «stored» at several warehouses: «stores/stocke»
  • The warehouse «stocks» many different books. A book may be stocked at multiple warehouses: «stores/stocke»
  • The database records the number of copies of a book «stocked» at various warehouses. «stores/stocke»
Entity relationship diagram of the bookstore. File:Er_diagram-bookstore.xml es el correspondiente archivo .xml creado con la aplicación https://www.draw.io/ Se debe notar que las figuras seleccionadas (rectángulo, óvalo, diamante) son estándares usados en este tipo de diagramas y deben, por lo tanto, respetarse. También cabe destacar que las primary key (en este ejemplo b_isbn, au_id, pu_id, w_code) deben subrayarse también en el diagrama. Debido a un error en la exportación del archivo .xml, estos atributos no se encuentran subrayados en la imagen.



Example 2: the Company database example
  • The company is organized into departments. Each department has a unique name, a unique number, and a particular employee who manages the department. We keep track of the start date when that employee began managing the department. A department may have several locations.
  • A department controls a number of projects, each of which has a unique name, a unique number, and a single location.
  • The database will store each employee’s name, Social Security number, address, salary, sex (gender), and birth date. An employee is assigned to one department, but may work on several projects, which are not necessarily controlled by the same department. It is required to keep track of the cur- rent number of hours per week that an employee works on each project, as well as the direct supervisor of each employee (who is another employee).
  • The database will keep track of the dependents of each employee for insurance purposes, including each dependent’s first name, sex, birth date, and relationship to the employee.
The ER conceptual schema diagram for the company database



The Enhanced Entity-Relation Model

  • The basic E-R model, while appropriate for many situations, has its limitations.
  • A number of extensions to the E-R model allow for more advanced modeling:
    • Subclasses, Superclasses and Inheritance
    • Generalisation and specialisation
    • Aggregation
    • Modelling of UNION types using categories
  • These extensions can also be converted to a relational model.



Subclasses - Superclasses - Inheritance

  • Inheritance is when subclass entity shares all the attributes and relationships of its superclass
  • Single inheritance is where a subclass inherits from one and only one superclass. Some modeling tools are limited to single inheritance.
  • Mutiple inheritance is where a subclass inherits from more than one superclass.



Generalisation and specialisation


Generalisation

Generalization is a bottom-up approach in which two lower level entities combine to form a higher level entity. In generalization, the higher level entity can also combine with other lower level entities to make further higher level entity.

It's more like Superclass and Subclass system, but the only difference is the approach, which is bottom-up. Hence, entities are combined to form a more generalised entity, in other words, sub-classes are combined to form a super-class.

  • Example 1: a car (with attributes colour, engine size and price) and truck (with attributes colour, engine size, tonnage and price) can be generalised into vehicle (with attributes colour, engine size and price).
  • Example2: Saving and Current account types entities can be generalised and an entity with name Account can be created, which covers both.
Generalization



Specialisation

Specialization is opposite to Generalization. It is a top-down approach in which one higher level entity can be broken down into two lower level entity. In specialization, a higher level entity may not have any lower-level entity sets, it's possible.

  • Example: Employees. They have attributes in common but also distinct attributes:
    • Secretaries
    • Managers
    • Engineers
Specialization



Constraints on Specialisation and Generalisation


Disjointness Constraint

This specifies whether a supertype can be a member of two or more subtypes. It has two rules:

  • Disjoint Rule: an instance of a supertype may not simultaneously be a member of two (or more) subtypes.
    • This specifies that the subclasses of a specialisation must be disjointed. This means that an entity must be a member of at most one of the specialisation's subclasses.
    • The disjoint rule is specified by adding a 'd' at the junction between the superclass and the subclasses.
An EER diagram with disjoint subclasses. In a company, an employee can either be (for instance) a secretary, a programmer or a manager, but not more than one.


  • Overlap Rule: an instance of a supertype may simultaneously be a member of two (or more) subtypes.
    • If the specialisation is not disjointed, then it is overlapped.
    • An overlapped specialisation allows an entity to be a member of more than one subclass of the specialisation.
    • The overlapped constraint is specified by adding a 'o' at the junction between the superclass and the subclasses.
An EER diagram with overlapping subclasses. In a university, a person can be a student, a member of faculty. They can also be both, for instance, when a student is also a teaching assistant.



Completeness Constraint
  • A total completeness constraint specifies that every entity in the superclass must be a member of some subclass in the specialisation or generalisation.
  • A total completeness constraint is denoted in the EER diagram using a double line.
  • When an entity does not have to belong to any of the subclasses, this is known as partial completeness constraint.
  • A partial completeness constraint is shown in EER diagrams by a single line



Aggregration

Aggregration is a process when relation between two entities is treated as a single entity.

Aggregration

In the diagram above, the relationship between Center and Course together, is acting as an Entity, which is in relationship with another entity Visitor. Now in real world, if a Visitor or a Student visits a Coaching Center, he/she will never enquire about the center only or just about the course, rather he/she will ask enquire about both.



Union type

Partial union: Not every member of the sets of superclasses must participate. In the example on the previous slide, not every social fund or employer has to be a sponsor. This is shown as a single line between the union circle and the subclass (sponsor, in this example).

Total union: Every member of the sets of superclasses must participate. This is shown as a double line between the union circle and the subclass. In the example on the previous slide, if every employer and social fund had to be a sponsor, the line between the union circle and the sponsor entity would be doubled.


An EER diagram with a union subclass. A sponsor paying a student’s fees is either an employer, a government social fund, but not both.



Strong - Weak - Associative Entities

A strong entity can exist independent of any other entity in the schema. It has a primary key and does not have to have a total participation in a relationship. It is represented by a single rectangle.

A weak entity can not exist without a related strong entity. It is dependent on the strong entity for its existence and always has a total participation constraint. It cannot be uniquely identified by its attributes alone and must use a foreign key in conjunction with its attributes to create a primary key. It is represented by a double rectangle.

An associative entity is used to relate two entities as part of a many-to-many relationship, by referencing the primary key of each relation. It is denoted by a dotted rectangle.

Strong, Weak and Associative Entities - Left: A loan payment can not exist without a loan. Right: A student can take many courses. A course can have many students.



Idenfifying and Non-Idenfifying Relationships

In an identifying relationship, a row in a child table depends on a row in a parent table and can not exist without it. For example, a Person can have one or more phone numbers. In order to facilitate this, a second table named PhoneNumbers is needed. The primary key includes the PersonID referencing the primary key in the Person table. An identifying relationship is shown with a double diamond

Idenfifying Relationships


A non-identifying relationship is when the primary key of the parent does not form part of the primary key of the child. For example, in a lookup table for cities, with a foreign key on the Person entity referencing the primary key of the City lookup table. Person is a child table with respect to City. However, a row in Person can not be considered to be identified by its city attribute.

A non-identifying relationship can be either optional or mandatory, where the foreign key column in the child table permits NULLs or not. A non-identifying relationship is shown with a single diamond

Non-Idenfifying Relationships



Mapping an EERD to a Relational Schema

  • For each entity type without subclasses, create a relation and assign it a primary key.
  • For each weak entity type, create a relation. The primary key will be a combination of the identifier of the parent entity and the identifier of the weak entity. In other words, it will be a composite primary key.
  • For a single optional subclass, create a relation for the superclass and give it a primary key, then create a relation for the single subclass, giving it the same primary key as the superclass, and add any attributes specific to the subclass.
  • For multiple disjoint optional subclasses, create a relation for the superclass and give it a primary key. Create a relation for each subclass, giving each one the same primary key as the superclass. Then add any attributes specific to each subclass.
  • For multiple disjoint mandatory subclasses, create a relation for each of the subclasses using the same primary key you would have chosen if you had created a superclass, then add the all the attributes specific to each subclass.
  • For each overlapping optional subclass, create one single relation to represent both the superclass and all its subclasses. Give it a primary key and an additional attribute to specify class membership. This is not recommended if each subclass has many specific attributes, as it will result in numerous null values for specific subclass attributes.
  • For each overlapping mandatory subclass, create a single relation to represent the superclass and all its subclasses. Provide a primary key and logical attributes for each subclass. At least one of these logical attributes should be set to TRUE.



Exercises


Exercise 1

Translate the following EERD to a relational schema. Are there any elements of the design that do not appear in the schema?

Exercise1-Translate the following EERD to a relational schema.png



Exercise 2

Translate the following EERD to a relational schema. Are there any elements of the design that do not appear in the schema?

Exercise2-Translate the following EERD to a relational schema.png



Logical design


Relational Model

Transform the ER Model to Relational Model (logical data model).

Para explicar la tranformación del ER Model en el Relational Model nos vamos a basar en el Example 2: the Company database example

Seven step algorithm to convert the basic ER model constructs into relations:



Step 1: Mapping of regular entity types

  • For each regular (strong) entity type E in the ER schema, create a relation R that includes all the simple attributes of E.
  • Include only the simple component attributes of a composite attribute. E.g. see name attribute in Employee
  • Choose one of the key attributes of E as the primary key for R.

Remember:

  • Strong Entity Type - Entity type that is not dependent on some other entity type.
  • Weak Entity Type - Entity type that is dependent on some other entity type.
Employee
Fname Minit Lname Ssn Bdate Address Sex Salary
Department
Dname Dnumber
Project
Pname Pnumber Plocation

Note: The foreign key, relationship attributes and multivalued attributes, if any, are not included yet; they will be added during subsequent steps.



Step 2: Mapping of weak entity type

For each weak entity type:

  • Create a relation that includes all simple attributes.
  • Include the simple component attributes of any composite attributes.
  • Include as a foreign key, the primary key of the owner entity.
  • The primary key of the new relation will be the combination of this foreign key and the partial key of the weak entity.

Side note on simple/composite attributes (back on er design):

  • It is important to note whether an attribute is simple or composite
  • Composite attributes are made up of simple attributes
  • E.g, the address attribute can be simple and hold all the details of an address as a single value, such as, ’30 Westland St, Glasgow, G11 93Y’
  • However, the address attribute may also represent a composite attribute made up of simple attributes that hold the address details as separate values in the street (30 Westland St’), city (‘Glasgow’), and postcode (‘G11 93Y’)
  • The option to represent address details as a simple or composite attribute is determined by the users’ requirements:
    • If the user doesn’t need to access the separate components of an address, we represent the address attribute as a simple attribute
    • On the other hand if the user does need to access the individual components of an address, we represent the address attribute as being composite, made up of the required simple attributes


In our example, we create the weak entity type DEPENDENT:

Dependent
Essn Dependent_name Sex Bdate Relationship
  • We include the primary key Ssn of the EMPLOYEE relation -which corresponds to the owner entity type- as a foreign key attribute of DEPENDENT; we rename it Essn, although this is not necessary.
  • The primary key of the DEPENDENT relation is the combination {Essn, Dependent_name}, because Dependent_name (also renamed from Name in ER model) is the partial key of DEPENDENT.



Step 3: Mapping of binary 1:1 relationship types

For each binary 1:1 relationship type:

  • Choose one of the relations and include the other’s PK in it as a FK.
  • Include any simple attributes of the relationship (relationship attributes)
  • In our example, we map the 1:1 relationship type MANAGES from our ER diagram by choosing the participating entity type DEPARTMENT to serve in the role of foreign key because its participation in the MANAGES relationship type is total (every department has a manager).
    • The participation in the relationship is full on the Department side and partial on the Employee side. For this reason, IT IS BETTER to insert the PK of Employee as a FK in the Department Relation.
      • Note: EMPLOYEE participation is partial. DEPARTMENT participation is not clear from the requirements. We question the users, who say that a department must have a manager at all times, which implies total participation. So, every department has a manager
  • We include the primary key of the EMPLOYEE relation as foreign key in the DEPARTMENT relation and rename it to Mgr_ssn.
  • We also include the simple attribute Start_date of the MANAGES relationship type in the DEPARTMENT relation and rename it Mgr_start_date
Department
Dname Dnumber Mgr_ssn Mgr_start_date

FURTHER NOTE:

  • Note that it is possible to include the primary key of department as a foreign key in Employee table instead.
  • In our example, this amounts to having a foreign key attribute, say Department_managed in the EMPLOYEE relation, but it will have a NULL value for employee tuples who do not manage a department.
  • This would be a bad choice, because if only 2% of employees manage a department, then 98% of the foreign keys would be NULL in this case.



Step 4: Mapping of binary 1 to N relationship types

  • For each binary 1 to N relationship in the ER diagram, include, as a foreign key, in the entity on the N side the primary key of the 1 side.
  • Include simple attributes of 1:N relationship type as attributes of the relation on the N side.

In our example:

  • For WORKS_FOR we include the primary key Dnumber of the DEPARTMENT relation as foreign key in the EMPLOYEE relation and call it Dno
  • For SUPERVISION we include the primary key of the EMPLOYEE relation as foreign key in the EMPLOYEE relation itself -because the relationship is recursive- and call it Super_ssn.
Employee
Fname Minit Lname Ssn Bdate Address Sex Salary Super_ssn Dno


  • The CONTROLS relationship is mapped to the foreign key attribute Dnum of PROJECT, which references the primary key Dnumber of the DEPARTMENT relation.
Project
Pname Pnumber Plocation Dnum



Step 5: Mapping of binary m:n relationship types

  • For each binary M:N relationship type, create a new relation:
    • Include in the new relation as foreign keys, the primary keys of the two participating entities.
    • The primary key of the new relation will be the combination of these two foreign keys.
    • Include any simple attributes of M:N relationship type.
  • In our example, we map the M:N relationship type WORKS_ON from our ERD, by creating the relation WORKS_ON:
    • We include the primary keys of the PROJECT and EMPLOYEE relations as foreign keys in WORKS_ON and rename them Pno and Essn, respectively (renaming is not required; it is a design choice).
    • We also include an attribute Hours in WORKS_ON to represent the Hours attribute of the relationship type.
    • The primary key of the WORKS_ON relation is the combination of the foreign key attributes {Essn, Pno}.
Works_on
Essn Pno Hours



Step 6: Mapping of multivalued attributes

  • For each multivalued attribute A, create a new relation R:
    • This relation R will include an attribute corresponding to A, plus the primary key attribute K -as a foreign key in R- of the relation that represents the entity type OR relationship type that has A as a multivalued attribute.
    • The primary key of R is the combination of A and K. If the multivalued attribute is composite, we include its simple components.
  • In our example, we create a relation DEPT_LOCATIONS:
    • The attribute Dlocation represents the multivalued attribute LOCATIONS of DEPARTMENT, whereas Dnumber -as foreign key- represents the primary key of the DEPARTMENT relation.
    • The primary key of DEPT_LOCATIONS is the combination of {Dnumber, Dlocation}.
Dept_locations
Dnumber Dlocation



Step 7: Mapping of n-ary relationship types

  • For each n-ary relationship type R, where n > 2 (relationship que envuelven más de dos entities) create a new relationship relation S to represent R:
    • Include as foreign key attributes in S the primary keys of the relations that represent the participating entity types.
    • Also include any simple attributes of the n-ary relationship type
  • Este caso no se encuentra en nuestro ejemplo (Company ER). Considere la siguiente relación:
N-ary relationship type (n>2)
Offers
course_no instr_no semestre_no room_no



Resultado

Relationship mapping for the company example
Relational database schema for the company example



Discussion and summary

ER Model Relational model
Entity type Entity relation
1:1 or 1:N relationship type Foreign key (or relationship relation)
M:N relationship type Relationship relation and tow foreign keys
n-ary relationship type Relationship relation and n foreign keys
Simple attribute Attribute
Composite attribute Set of simple component attributes
Multivalued attribute Relation and foreign key
Value set Domain
Key attribute Primary (or secondary) key
  • One of the main points to note in a relational schema, in contrast to an ER schema, is that relationship types are not represented explicitly; instead, they are represented by having two attributes A and B, one a primary key and the other a foreign key (over the same domain) included in two relations S and T.
  • Two tuples in S and T are related when they have the same value for A and B.



Normalisation

Validate the Relational Model using Normalisation.

Normalization is a technique for producing a suitable set of relations that support the data requirements of an nterprise.



A suitable set of relations include

  • The minimal number of attributes necessary to support the data requirements of the enterprise;
  • Attributes with a close logical relationship are found in the same relation;
  • Minimal redundancy with each attribute represented only once with the important exception of attributes that form all or part of foreign keys. (these are essential for the joining of related relations)

Potential benefits of using a database that has a suitable set of relations is that the database will be:

  • Easier for the user to access and maintain the data:
    • If we minimize data redundancy:
      • Updates to the data stored in the database are achieved with a minimal number of operations thus reducing the opportunities for data inconsistencies.
      • Take up minimal storage space on the computer - Reduction in the file storage space required by the relations thus minimizing costs.



We highlight two main approaches for using normalization Approach

  • Bottom-up approach: Bottom-up standalone database design technique Approach
  • Top-down approach: Validation technique which may have been created using a top-down approach such as ER modeling

No matter which approach is used, the goal is the same – creating a set of well-designed relations that meet the data requirements of the enterprise

Bottom-up vs Top down:

  • Bottom-up design methodology (also called design by synthesis) considers the basic relationships among individual attributes as the starting point and uses those to construct relation schemas. This approach is not very popular in practice because it suffers from the problem of having to collect a large number of binary relationships among attributes as the starting point. For practical situations, it is next to impossible to capture binary relationships among all such pairs of attributes.
  • In contrast, a top-down design methodology (also called design by analysis) starts with a number of groupings of attributes into relations that exist together naturally, for example, on an invoice, a form, or a report. The relations are then analyzed individually and collectively, leading to further decomposition until all desirable properties are met.



Informal design guidelines for relation schemas

Before discussing the formal theory of relational database design, we discuss informal guidelines that may be used as measures to determine the quality of relation schema design:

  • Making sure that the semantics of the attributes is clear in the schema
  • Reducing the redundant information in tuples



Imparting clear semantics to attributes in relations
  • Guideline 1: Design a relation schema so that it is easy to explain its meaning.
    • Do not combine attributes from multiple entity types and relationship types into a single relation.


  • Whenever we group attributes to form a relation schema, we assume that attributes belonging to one relation have certain real-world meaning and a proper interpretation associated with them.
  • In general, the easier it is to explain the semantics of the relation -or in other words, what a relation exactly means and stands for- the better the relation schema design will be.
  • To illustrate this, consider Figure 1, a simplified version of the COMPANY relational database schema and Figure 2, which presents an example of populated relation states of this schema.
A simplified relational database schema for the company example
  • The meaning of the EMPLOYEE relation schema is simple: Each tuple represents an employee, with values for the employee’s name (Ename), Social Security number (Ssn),... and the number of the department that the employee works for (Dnumber).
  • The Dnumber attribute is a foreign key that represents an implicit relationship between EMPLOYEE and DEPARTMENT.

Examples of violating guideline 1

  • The tuple in the EMP_DEPT relation schema represents a single employee but includes, along with the Dnumber (the identifier for the department he/she works for), additional information-namely, the name (Dname) of the department for which the employee works and the Social Security number (Dmgr_ssn) of the department manager.
    • Esta tuple contiene información de dos entidades distintas (Employee and Department). It violates

Guideline 1 by mixing attributes from distinct real-world entities. Esto no es apropiado...

EMP_DEPT
Ename Ssn Bdate Address Dnumber Dname Dmgr_ssn



Data redundancy
  • Major aim of relational database design is to group attributes into relations to minimize data redundancy.

Grouping attributes into relation schemas has a significant effect on storage space.

For example, compare the space used by the two relations EMPLOYEE and DEPARTMENT with that for an EMP_DEPT base relation (which is the result of applying the NATURAL JOIN operation to EMPLOYEE and DEPARTMENT).

  • In EMP_DEPT, the attribute values pertaining to a particular department (Dnumber, Dname, Dmgr_ssn) are repeated for every employee who works for that department. In contrast, each department’s information appears only once in the DEPARTMENT relation in Figure 2. Only the department number (Dnumber) is repeated in the EMPLOYEE relation for each employee who works in that department as a foreign key.
Employee and department relations of the Company example
Relation schemas suffering from redundant anomalies

An anomaly is anything we try to do with a database that leads to unexpected and/or unpredictable results.

These anomalies can be classified into:

  • Insertion anomalies,
  • Deletion anomalies,
  • and Modification (Update) anomalies.

Insertion anomalies

  • To insert a new employee tuple into EMP_DEPT, we must include either the attribute values for the department that the employee works for, or NULLs (if the employee does not work for a department as yet).
  • For example, to insert a new tuple for an employee who works in department number 5, we must enter all the attribute values of department 5 correctly so that they are consistent with the corresponding values for department 5 in other tuples in EMP_DEPT.
  • In the EMPLOYEE and DEPARTMENT design, we do not have to worry about this consistency problem because we enter only the department number in the employee tuple; all other attribute values of department 5 are recorded only once in the database, as a single tuple in the DEPARTMENT relation


  • It is difficult to insert a new department that has no employees as yet in the EMP_DEPT relation. The only way to do this is to place NULL values in the attributes for employee. This violates the entity integrity for EMP_DEPT because its primary key Ssn cannot be null.
  • Moreover, when the first employee is assigned to that department, we do not need this tuple with NULL values anymore. This problem does not occur in the EMPLOYEE and DEPARTMENT design because a department is entered in the DEPARTMENT relation whether or not any employees work for it, and whenever an employee is assigned to that department, a corresponding tuple is inserted in EMPLOYEE.

Deletion anomalies

  • If we delete from EMP_DEPT an employee tuple that happens to represent the last employee working for a particular department, the information concerning that department is lost inadvertently from the database.
  • This problem does not occur in the EMPLOYEE and DEPARTMENT design because DEPARTMENT tuples are stored separately.

Modification (Update) anomalies

  • In EMP_DEPT, if we change the value of one of the attributes of a particular department -say, the manager of department 5- we must update the tuples of all employees who work in that department; otherwise, the database will become inconsistent.
  • If we fail to update some tuples, the same department will be shown to have two different values for manager in different employee tuples, which would be wrong


Storing natural joins of relations leads to an additional problem referred to as Update anomalies.



Normalisation

Ver esta página: https://www.1keydata.com/database-normalization/first-normal-form-1nf.php

pag. 8, 14

FUNCTIONAL DEPENDENCIES Previously, we have dealt with the informal measures of database design. We now introduce a formal tool for analysis of relational schemas that enables us to detect and describe some of the above-mentioned problems in precise terms

The single most important concept in relational schema design theory is that of a functional dependency. Functional dependency describes the relationship between attributes.

For example, if A and B are attributes of relation R, B is functionally dependent on A (denoted A --> B), if each value of A in R is associated with exactly one value of B in R.

An alternative way to describe the relationship between attributes A and B is to say that "A functionally determines B"

When we say: A --> B:

  • A and B may each consist of one or more attributes.
  • The determinant of a functional dependency refers to the attribute or group of attributes on the left-hand side of the arrow. (For example, A is the determinant of B)

Example functional dependency that holds: Consider the values shown in staffNo and sName attributes of the Staff relation. Based on sample data, the following functional dependencies appear to hold:

  • staffNo --> sName
  • sName --> staffNo

However, if the values shown in the Staff relation simply represent a set of values at a given moment in time, then we are not so interested in such relationships between attributes. The reason is that we want to identify functional dependencies that hold for all possible values for attributes of a relation as these represent the types of integrity constraints that we need to identify.

If we know the staff number(staffNo) of a member of staff, we can determine the name of the member of staff (sName). However, it is possible for the sName attribute to hold duplicate values for members of staff with the same name, then we would not be able to determine the staff number of some members of staff.

The only functional dependency that remains true for all possible values for the staffNo and sName attributes of the Staff relation is:

  • staffNo --> sName
  • staffNo functionally determines sName

Characteristics of functional dependencies:

  • Determinants should have the minimal number of attributes necessary to maintain the functional dependency with the attribute(s) on the right hand-side.
  • This requirement is called full functional dependency.
  • So that if A and B are attributes of a relation, B is functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A. [[[WHATTT!!!!]]]

Example partial and full functional dependency:

  • Consider the following functional dependency that exists in the Staff relation:
    • staffNo, sName --> branchNo

It is correct to say that each value of (staffNo, sName) is associated with a single value of branchNo. However, it is NOT a full functional dependency, because branchNo is also functionally dependent on as subset of (staffNo, sName), namely staffNo.

In other words the functional dependency shown is an example of partial dependency.

Type of functional dependency that we are interested in identifying is a full functional dependency:

  • staffNo→ branchNo


Transitive dependency: So far we have discussed functional dependencies that we are interested in for the purposes of normalization.

However, there is an additional type of functional dependency called a transitive dependency that we need to recognize because its existence in a relation can potentially cause update anomalies.

Transitive dependency describes a condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is transitively dependent on A via B (provided that A is not functionally dependent on B or C).

Example of transitive functional dependency:

  • Consider functional dependencies in the StaffBranch relation
    • staffNo --> sName, position, salary, branchNo, bAddress
    • branchNo --> bAddress
    • Transitive dependency, branchNo --> bAddress exists on staffNo via branchNo
      • In other words, the StaffNo attribute functionally determines the bAddress via the branchNo attribute and neither branchNo or bAddress functionally determines staffNo

Remember: In StaffBranch relation there is redundant data, the details of a branch are repeated for every member of staff located at a branch.

Normalization of relations:

  • The normalization process, as first proposed by Codd, takes a relation schema through a series of tests to certify whether it satisfies a certain normal form.
  • The process, which proceeds in a top-down fashion by evaluating each relation against the criteria for normal forms and decomposing relations as necessary, can thus be considered as relational design by analysis.
  • Initially, Codd proposed three normal forms, which he called first (1NF), second (2NF), and third normal form (3NF).

Why normalize: Normalization of data can be considered a process of analyzing the given relation schemas based on their functional dependencies and primary keys to achieve the desirable properties of:

  • Minimizing redundancy and
  • minimizing the insertion, deletion, and update anomalies discussed

Normalization is often executed as a series of steps. Each step corresponds to a specific normal form.

Normalization approach: 1st Normal Form:

  • Each table contains all atomic data items(no repeating values), no repeating groups, and a designated primary key (no duplicated rows)

2nd Normal Form (composite keys):

  • In 1NF
  • Includes no partial dependencies
    • No attribute dependent on a portion of primary key

3rd Normal Form:

  • In 2NF
  • Contains no transitive dependencies
    • Attributes are functionally dependent on a non-key attributes (no non-key values based on other non-key values)

For a relational data model, it is important to recognize that it is only First Normal Form (1NF) that is critical in creating relations; all subsequent normal forms are optional. However, to avoid the update anomalies discussed previously, it is generally recommended that we proceed to at least Third Normal Form (3NF)


1NF - FIRST NORMAL FORM: A database is in first normal form if it satisfies the following conditions:

  • Contains only atomic values:
    • An atomic value is a value that cannot be divided. For example, in the table shown below, the values in the [Color] column in the first row can be divided into "red" and "green", hence [TABLE_PRODUCT] is not in 1NF.
  • There are no repeating groups:
    • A repeating group means that a table contains two or more columns that are closely related. For example, a table that records data on a book and its author(s) with the following columns: [Book ID], [Author 1], [Author 2], [Author 3] is not in 1NF because [Author 1], [Author 2], and [Author 3] are all repeating the same attribute.

2NF – SECOND NORMAL FORM: