3.1 Importance of a good schema design | unit 3 database design

Unit-3

Database Design

3.1 Importance of a good schema design

A database includes bulk information deposited in a framework, making it easier to locate and explore relevant information. A well-designed database contains accurate and up-to-date information, allowing and data to be fetched easily whenever needed.

In this article, we’ll discuss the importance of a good database design and how it can benefit your data endeavors. We’ll also go through some of the most common database design techniques. Lastly, we’ll cover basic steps and best practices to help you design a good database for your organization.

What is Database Design?

Database Design is defined as a collection of steps that help with designing, creating, implementing, and maintaining a business’s data management systems. The main purpose of designing a database is to produce physical and logical models of designs for the proposed database system.

What is a Good Database Design?

A good database design process is governed by specific rules. The first rule dictates that redundant data must be avoided; as it wastes space and increases the probability of faults and discrepancies within the database. The next rule is that the accuracy and comprehensiveness of information is extremely imperative. If the database contains erroneous information, any documents that fetch data from such a database will also include inaccurate information. Consequently, any decisions based on those documents will be misleading.

So, how can you ensure that your database design is good? A well-designed database is the one that:

Distributes your data into tables based on specific subject areas to decrease data redundancy

Delivers the database the information needed to link the data in the tables

Provides support, and guarantees the precision and reliability of data

Caters to your information processing and reporting requirements

Functions interactively with the database operators as much as possible

ETL Your Enterprise Data Using MYSQL Connector

Move MySQL Data to a Destination of Your Choice Through a Codeless, Drag-and-Drop UI.

Importance of Database Design

Database design defines the database structure used for planning, storing, and managing information. Accuracy in data can only be accomplished if a database is designed to store only valuable and necessary information.

A well-designed database is imperative in guaranteeing information consistency, eliminating redundant data, efficiently executing queries, and improving the performance of the database. Meticulously designing a database saves you from wasting time and getting frustrated during the database development phase. A good database design also allows you to easily access and retrieve data whenever needed.

The reliability of data depends on the table structure; whereas creating primary and unique keys guarantees uniformity in the stored information. Data replication can be avoided by forming a table of probable values and using a key to denote the value. So, whenever the value changes, the alteration happens only once in the main table.

As the general performance of a database depends on its design, a good database design uses simple queries and faster implementation. It is easy to maintain and update; whereas fixing trivial interruptions in a poor database design may harm stored events, views, and utilities.

Database Development Life Cycle

There are various stages in database development. However, it is not necessary to follow each of the steps sequentially. The life cycle can be broadly divided into three steps: requirement analysis, database designing, and implementation.

1- Requirement Analysis

Requirement analysis requires two steps:

Planning: In this stage, the plan of the entire Database Development Life Cycle is decided. It also requires an analysis of the organization’s information systems strategy.

Defining the system: This stage explains and lays out the proposed database system’s scope.

2- Database designing

The actual database designing takes into account two key models:

Logical model: It is concerned with using the given requirements to create a database model. The complete design is laid out on paper at this stage, without considering any specific DBMS requirement or physically implementing it.

Physical model: This stage comes after the logical model and therefore involves physically implementing the logical model. It takes the DBMS and other physical implementation factors into consideration.

3- Implementation

The implementation stage of database development life cycle is concerned with:

Data conversion and loading: It comprises data importation and data conversion coming from the old system into the new database.

Testing: Finally, this stage identifies errors in the new system and make sure if all the database requirement specification are met.

Database Designing Techniques

The two most common techniques used to design a database include:

Normalization: Tables are organized in such a way that it decreases data redundancy and dependency. Larger tables are divided into smaller tables and are linked together using relationships.

Entity-Relationship (ER) Modeling: It’s a graphical database design approach that models entities, their attributes, and defines relationships among these entities to signify real-life objects. An entity is any real-world item that’s different or unique from the surroundings.

Implement Virtual Database

Implement Virtual Database Tools to Provide Business-Users the Accessibility and Flexibility They Requires to Analyze Data.

How to Design Database: Steps of Designing Database

Database designing generally starts with identifying the purpose of your database. The relevant data is then collected and organized into tables. Next, you specify the primary keys and analyze relationships between different tables. After refining the tables, the last step is to apply normalization rules for table standardization.

Let’s look at these steps of database design in detail:

Define the objective of your database

The first step is to determine the purpose of your database. For example, if you are a small home-based business, you could be designing a customer database that maintains a list of consumer info to generate emails and reports.

At the end of this step, you’ll have a strong mission statement that you can refer to throughout the database design process. It’ll help you concentrate on your objectives when making important decisions.

Locate and consolidate the necessary data

The next step is to collect all kinds of information that you might want to store in the database. Begin with the current information. Mull over the questions you want your database to answer, and it’ll help you decide which data needs to be recorded.

Distribute the data into tables

Once you’ve amassed all the necessary data items, the next step is to divide them into main entities or subject areas. For example, if you are a retailer, some of your main entities could be products, customers, suppliers, and orders. Each entity will then become a separate table.

Here’s an example of how you can divide data into different entities:

Change data items into columns

Data is segregated into tables, such that every data item becomes a field and is shown as a column. For instance, a Customer table might include fields like name, address, e-mail address, and city.

After determining the preliminary set of columns for every table, you can refine them. For instance, customer name can be recorded as two distinct columns: first name and last name. Likewise, you can store the address in five distinct columns based on address, town, state, zip code, and region. This will make it convenient for you to filter information.

Streamline Cross-Database Analysis

Access the MariaDB Database and Create a Holistic View of Your Data Through Codeless Integration.

Identify primary keys

The next step to improve your database design is to select a primary key for every table. This primary key is a column or a set of column that’s used to distinctively pinpoint each row. For instance, in your customer table, the primary key could be customer ID. This will allow you to uniquely identify each row based on the customer ID.

More than one primary key can also exist, called a composite key, including multiple columns. For example, in your Order Details table, primary keys could be order ID and product ID. The composite key can be made using fields with similar or varying data types.

Similarly, if you wish to get an idea of your product sales, you can identify the product ID from the Products table and the order number or ID from the Orders table.

Determine how tables are related

After dividing data into tables, information needs to be brought together in a meaningful manner. So, explore each table and determine how the data in one table is linked with the data in another table. If needed, you can add fields or form new tables to simplify the relationship based on the types of information.

Below is an example of different entity types and relation types.

All You Need to Know About Database Design

Fig 1 - Example of different entity types and relation types

In this step, you’ll create one-to-one, one-to-many, and/or many-to-many relationships between different table entries.

When only one item from a table is associated with an item from another table, it’s called a one-to-one (1:1) relationship. In a one-to-many (1:M) relationship, an item in one table is related to many items in the other table, such as one customer placing several orders. A many-to-many (M:N) relationship occurs if many items from one table are related to many items in other tables.

All You Need to Know About Database Design1

Fig 2 - Example

Enhance your database design

Now that you have all the required tables, fields, and relationships, the next step is to refine your database design by creating and populating your tables with mockup information. Experiment with the sample data by creating queries or adding new items. This will help you analyze your design for faults and you’ll be able to highlight possible errors. If needed, adjust your design to mitigate those problems.

Implement the normalization rules

The last step is to implement the normalization rules for your database design. It is a systematic approach that removes redundancy and unwanted characteristics, such as Insertion, Update, and Deletion irregularities.

This multi-step process stores data in a tabular form, eliminating redundant data from the relation tables.

Final Words

The database design process helps you simplify the design, development, execution, and maintenance of your corporate data management system.

A good database design can help save disk storage space by reducing data redundancy. Along with maintaining data precision and reliability, it allows you to access data in various ways. Moreover, a well-designed database is easier to use and maintain, making integration a breeze.

Key takeaway

3.2 Motivation for normal forms

Functional Dependency

Functional dependency (FD) is a set of constraints between two attributes in a relation. Functional dependency says that if two tuples have same values for attributes A1, A2,..., An, then those two tuples must have to have same values for attributes B1, B2, ..., Bn.

Functional dependency is represented by an arrow sign (→) that is, X→Y, where X functionally determines Y. The left-hand side attributes determine the values of attributes on the right-hand side.

Armstrong's Axioms

If F is a set of functional dependencies then the closure of F, denoted as F+, is the set of all functional dependencies logically implied by F. Armstrong's Axioms are a set of rules, that when applied repeatedly, generates a closure of functional dependencies.

Reflexive rule− If alpha is a set of attributes and beta is_subset_of alpha, then alpha holds beta.

Augmentation rule− If a → b holds and y is attribute set, then ay → by also holds. That is adding attributes in dependencies, does not change the basic dependencies.

Transitivity rule−Same as transitive rule in algebra, if a → b holds and b → c holds, then a → c also holds. a→ b is called as a functionally that determines b.

Trivial Functional Dependency

Trivial− If a functional dependency (FD) X → Y holds, where Y is a subset of X, then it is called a trivial FD. Trivial FDs always hold.

Non-trivial− If an FD X → Y holds, where Y is not a subset of X, then it is called a non-trivial FD.

Completely non-trivial− If an FD X → Y holds, where x intersect Y = Φ, it is said to be a completely non-trivial FD.

Normalization

If a database design is not perfect, it may contain anomalies, which are like a bad dream for any database administrator. Managing a database with anomalies is next to impossible.

Update anomalies− If data items are scattered and are not linked to each other properly, then it could lead to strange situations. For example, when we try to update one data item having its copies scattered over several places, a few instances get updated properly while a few others are left with old values. Such instances leave the database in an inconsistent state.

Deletion anomalies−We tried to delete a record, but parts of it was left undeleted because of unawareness, the data is also saved somewhere else.

Insert anomalies−We tried to insert data in a record that does not exist at all.

Normalization is a method to remove all these anomalies and bring the database to a consistent state.

First Normal Form

First Normal Form is defined in the definition of relations (tables) itself. This rule defines that all the attributes in a relation must have atomic domains. The values in an atomic domain are indivisible units.

unorganized relation

Fig 3 – 1NF

We re-arrange the relation (table) as below, to convert it to First Normal Form.

Relation in 1NF

Fig 4 - Re-arrange the relation 1NF

Each attribute must contain only a single value from its pre-defined domain.

Second Normal Form

Before we learn about the second normal form, we need to understand the following −

Prime attribute−An attribute, which is a part of the candidate-key, is known as a prime attribute.

Non-prime attribute−An attribute, which is not a part of the prime-key, is said to be a non-prime attribute.

If we follow second normal form, then every non-prime attribute should be fully functionally dependent on prime key attribute. That is, if X →A holds, then there should not be any proper subset Y of X, for which Y → A also holds true.

Relation not in 2NF

Fig 5 – 2 NF

We see here in Student_Project relation that the prime key attributes are Stu_ID and Proj_ID. According to the rule, non-key attributes, i.e. Stu_Name and Proj_Name must be dependent upon both and not on any of the prime key attribute individually. But we find that Stu_Name can be identified by Stu_ID and Proj_Name can be identified by Proj_ID independently. This is called partial dependency, which is not allowed in Second Normal Form.

Relation in 2NF

Fig 6 - Partial dependency

We broke the relation in two as depicted in the above picture. So there exists no partial dependency.

Third Normal Form

For a relation to be in Third Normal Form, it must be in Second Normal form and the following must satisfy −

No non-prime attribute is transitively dependent on prime key attribute.

For any non-trivial functional dependency, X → A, then either −

X is a superkey or,
A is prime attribute.

Relation not in 3NF

Fig 7 – 3 NF

We find that in the above Student_detail relation, Stu_ID is the key and only prime key attribute. We find that City can be identified by Stu_ID as well as Zip itself. Neither Zip is a superkey nor is City a prime attribute. Additionally, Stu_ID→ Zip → City, so there existstransitive dependency.

To bring this relation into third normal form, we break the relation into two relations as follows −

Relation in 3NF

Boyce-Codd Normal Form

Boyce-Codd Normal Form (BCNF) is an extension of Third Normal Form on strict terms. BCNF states that −

For any non-trivial functional dependency, X → A, X must be a super-key.

In the above image, Stu_ID is the super-key in the relation Student_Detail and Zip is the super-key in the relation ZipCodes. So,

Stu_ID→Stu_Name, Zip

and

Zip → City

Which confirms that both the relations are in BCNF.

Key takeaway

Functional dependency is represented by an arrow sign (→) that is, X→Y, where X functionally determines Y. The left-hand side attributes determine the values of attributes on the right-hand side.

3.3 Dependency theory - functional dependencies

What is Functional Dependency?

Functional Dependency (FD) is a constraint that determines the relation of one attribute to another attribute in a Database Management System (DBMS). Functional Dependency helps to maintain the quality of data in the database. It plays a vital role to find the difference between good and bad database design.

A functional dependency is denoted by an arrow "→". The functional dependency of X on Y is represented by X → Y. Let's understand Functional Dependency in DBMS with example.

Example:

Employee number	Employee Name	Salary	City
1	Dana	50000	San Francisco
2	Francis	38000	London
3	Andrew	25000	Tokyo

In this example, if we know the value of Employee number, we can obtain Employee Name, city, salary, etc. By this, we can say that the city, Employee Name, and salary are functionally depended on Employee number.

Key terms

Here, are some key terms for Functional Dependency in Database:

Key Terms	Description
Axiom	Axioms is a set of inference rules used to infer all the functional dependencies on a relational database.
Decomposition	It is a rule that suggests if you have a table that appears to contain two entities which are determined by the same primary key then you should consider breaking them up into two different tables.
Dependent	It is displayed on theright side of the functional dependency diagram.
Determinant	It is displayed on the left side of the functional dependency Diagram.
Union	It suggests that if two tables are separate, and the PK is the same, you should consider putting them. together

Rules of Functional Dependencies

Below are the Three most important rules for Functional Dependency in Database:

Reflexive rule –. If X is a set of attributes and Y is_subset_of X, then X holds a value of Y.

Augmentation rule: When x -> y holds, and c is attribute set, then ac ->bc also holds. That is adding attributes which do not change the basic dependencies.

Transitivity rule: This rule is very much similar to the transitive rule in algebra if x -> y holds and y -> z holds, then x -> z also holds. X -> y is called as functionally that determines y.

Types of Functional Dependencies in DBMS

There are mainly four types of Functional Dependency in DBMS. Following are the types of Functional Dependencies in DBMS:

Multivalued Dependency

Trivial Functional Dependency

Non-Trivial Functional Dependency

Transitive Dependency

Multivalued Dependency in DBMS

Multivalued dependency occurs in the situation where there are multiple independent multivalued attributes in a single table. A multivalued dependency is a complete constraint between two sets of attributes in a relation. It requires that certain tuples be present in a relation. Consider the following Multivalued Dependency Example to understand.

Example:

Car_model	Maf_year	Color
H001	2017	Metallic
H001	2017	Green
H005	2018	Metallic
H005	2018	Blue
H010	2015	Metallic
H033	2012	Gray

In this example, maf_year and color are independent of each other but dependent on car_model. In this example, these two columns are said to be multivalue dependent on car_model.

This dependence can be represented like this:

car_model ->maf_year

car_model-> colour

Trivial Functional Dependency in DBMS

The Trivial dependency is a set of attributes which are called a trivial if the set of attributes are included in that attribute.

So, X -> Y is a trivial functional dependency if Y is a subset of X. Let's understand with a Trivial Functional Dependency Example.

For example:

Emp_id	Emp_name
AS555	Harry
AS811	George
AS999	Kevin

Consider this table of with two columns Emp_id and Emp_name.

{Emp_id, Emp_name} ->Emp_id is a trivial functional dependency as Emp_id is a subset of {Emp_id,Emp_name}.

Non Trivial Functional Dependency in DBMS

Functional dependency which also known as a nontrivial dependency occurs when A->B holds true where B is not a subset of A. In a relationship, if attribute B is not a subset of attribute A, then it is considered as a non-trivial dependency.

Company	CEO	Age
Microsoft	Satya Nadella	51
Google	SundarPichai	46
Apple	Tim Cook	57

Example:

(Company} -> {CEO} (if we know the Company, we knows the CEO name)

But CEO is not a subset of Company, and hence it's non-trivial functional dependency.

Transitive Dependency in DBMS

A Transitive Dependency is a type of functional dependency which happens when t is indirectly formed by two functional dependencies. Let's understand with the following Transitive Dependency Example.

Example:

Company	CEO	Age
Microsoft	Satya Nadella	51
Google	SundarPichai	46
Alibaba	Jack Ma	54

{Company} -> {CEO} (if we know the compay, we know its CEO's name)

{CEO } -> {Age} If we know the CEO, we know the Age

Therefore according to the rule of rule of transitive dependency:

{ Company} -> {Age} should hold, that makes sense because if we know the company name, we can know his age.

Note: You need to remember that transitive dependency can only occur in a relation of three or more attributes.

What is Normalization?

Normalization is a method of organizing the data in the database which helps you to avoid data redundancy, insertion, update& deletion anomaly. It is a process of analyzing the relation schemas based on their different functional dependencies and primary key.

Normalization is inherent to relational database theory. It may have the effect of duplicating the same data within the database which may result in the creation of additional tables.

Advantages of Functional Dependency

Functional Dependency avoids data redundancy. Therefore same data do not repeat at multiple locations in that database

It helps you to maintain the quality of data in the database

It helps you to defined meanings and constraints of databases

It helps you to identify bad designs

It helps you to find the facts regarding the database design

Summary

Functional Dependency is when one attribute determines another attribute in a DBMS system.

Axiom, Decomposition, Dependent, Determinant, Union are key terms for functional dependency

Four types of functional dependency are 1) Multivalued 2) Trivial 3) Non-trivial 4) Transitive

Multivalued dependency occurs in the situation where there are multiple independent multivalued attributes in a single table

The Trivial dependency occurs when a set of attributes which are called a trivial if the set of attributes are included in that attribute

Nontrivial dependency occurs when A->B holds true where B is not a subset of A

A transitive is a type of functional dependency which happens when it is indirectly formed by two functional dependencies

Normalization is a method of organizing the data in the database which helps you to avoid data redundancy

Key takeaway

A functional dependency is denoted by an arrow "→". The functional dependency of X on Y is represented by X → Y. Let's understand Functional Dependency in DBMS with example.

3.4 Closure of a set of FD's

Closure of an Attribute: Closure of an Attribute can be defined as a set of attributes that can be functionally determined from it.

Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F

Closure of a set of attributes X concerning F is the set X+ of all attributes that are functionally determined by X

Pseudocode to find Closure of an Attribute?

Determine X+, the closure of X under functional dependency set F

X Closure : = will contain X itself;

Repeat the process as:

old X Closure : = X Closure;

for each functional dependency P → Q in FD set do

if X Closure is subset of P then X Closure := X Closure U Q ;

Repeat until ( X Closure = old X Closure);

Algorithm of Determining X+, the Closure of X under F

Input: A set F of FDs on a relation schema R, and a set of attributes X, which is a subset of R.

X+ := X;

repeat

oldX+ := X+ ;

for each functional dependency Y → Z in F do

if X+ ⊇ Y then X+ := X+ ∪ Z;

until (X+ = oldX+ );

QUESTIONS ON CLOSURE SET OF ATTRIBUTE:

1) Given relational schema R( P Q R S T U V) having following attribute P Q R S T U and V, also there is a set of functional dependency denoted by FD = { P->Q, QR->ST, PTV->V }.

Determine Closure of (QR)+ and (PR)+

a) QR+ = QR (as the closure of an attribute or set of attributes contain same).

Now as per algorithm look into a set of FD that complete the left side of any FD contains either Q, R, or QR since in FD QR→ST has complete QR.

Hence QR+ = QRST

Again, trace the remaining two FD that any left part of FD contains any Q, R, S, T.

Since no complete left side of the remaining two FD{P->Q, PTV->V} contain Q, R, S, T.

Therefore QR+ = QRST (Answer)

Note: In FD PTV→V, T is in QRST but that cannot be entertained, as complete PTV should be a subset of QRST

b) PR + = PR (as the closure of an attribute or set of attributes contain same)

Now as per algorithm look into a set of FD, and check that complete left side of any FD contains either P, R, or PR. Since in FD P→Q, P is a subset of PR, Hence PR+ = PRQ

Again, trace the remaining two FD that any left part of FD contains any P, R, Q, Since, in FD QR → ST has its complete left part QR in PQR

Hence PR+ = PRQST

Again trace the remaining one FD { PTV->V } that its complete left belongs to PRQST. Since complete PTV is not in PRQST, hence we ignore it.

Therefore PR+ = PRQST ( Answer)

2. Given relational schema R( P Q R S T) having following attributes P Q R S and T, also there is a set of functional dependency denoted by FD = { P->QR, RS->T, Q->S, T-> P }.

Determine Closure of ( T )+

T + = T (as the closure of an attribute or set of attributes contain same) Now as per algorithm look into a set of FD that complete the left side of any FD contains T since, in FD T → P, T is in T, Hence T+ = TP Again trace the remaining three FD that any left part of FD contain any TP, Since in FD P → QR has its complete left part P in TP, Hence T+ = TPQR Again trace the remaining two FD { RS->T, Q->S } that any of its Complete left belongs to TPQR, Since in FD Q → S has its complete left part Q in TPQR, Hence T+ = TPQRS Again trace the remaining one FD { RS->T } that its complete left belongs to TPQRS, Since in FD RS → T has its complete left part RS in TPQRS Hence T+ = TPQRS ( no changes, as T, is already in TPQRS) Therefore T+ = TPQRS ( Answer).

Key takeaway

Closure of an Attribute: Closure of an Attribute can be defined as a set of attributes that can be functionally determined from it.

Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F

Closure of a set of attributes X concerning F is the set X+ of all attributes that are functionally determined by X

3.5 Definitions of 1NF, 2NF, 3NF and BCNF

Normalization

Normalization is the process of organizing the data in the database.

Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to eliminate the undesirable characteristics like Insertion, Update and Deletion Anomalies.

Normalization divides the larger table into the smaller table and links them using relationship.

The normal form is used to reduce redundancy from the database table.

Types of Normal Forms

There are the four types of normal forms:

DBMS Normalization

Fig 8 – Normal Forms

Normal Form	Description
1NF	A relation is in 1NF if it contains an atomic value.
2NF	A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functional dependent on the primary key.
3NF	A relation will be in 3NF if it is in 2NF and no transition dependency exists.
4NF	A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.
5NF	A relation is in 5NF if it is in 4NF and not contains any join dependency and joining should be lossless.

First Normal Form (1NF)

A relation will be 1NF if it contains an atomic value.

It states that an attribute of a table cannot hold multiple values. It must hold only single-valued attribute.

First normal form disallows the multi-valued attribute, composite attribute, and their combinations.

Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute EMP_PHONE.

EMPLOYEE table:

EMP_ID	EMP_NAME	EMP_PHONE	EMP_STATE
14	John	7272826385, 9064738238	UP
20	Harry	8574783832	Bihar
12	Sam	7390372389, 8589830302	Punjab

The decomposition of the EMPLOYEE table into 1NF has been shown below:

EMP_ID	EMP_NAME	EMP_PHONE	EMP_STATE
14	John	7272826385	UP
14	John	9064738238	UP
20	Harry	8574783832	Bihar
12	Sam	7390372389	Punjab
12	Sam	8589830302	Punjab

Second Normal Form (2NF)

In the 2NF, relational must be in 1NF.

In the second normal form, all non-key attributes are fully functional dependent on the primary key

Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a school, a teacher can teach more than one subject.

TEACHER table

TEACHER_ID	SUBJECT	TEACHER_AGE
25	Chemistry	30
25	Biology	30
47	English	35
83	Math	38
83	Computer	38

In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which is a proper subset of a candidate key. That's why it violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_ID	TEACHER_AGE
25	30
47	35
83	38

TEACHER_SUBJECT table:

TEACHER_ID	SUBJECT
25	Chemistry
25	Biology
47	English
83	Math
83	Computer

Third Normal Form (3NF)

A relation will be in 3NF if it is in 2NF and not contain any transitive partial dependency.

3NF is used to reduce the data duplication. It is also used to achieve the data integrity.

If there is no transitive dependency for non-prime attributes, then the relation must be in third normal form.

A relation is in third normal form if it holds atleast one of the following conditions for every non-trivial function dependency X → Y.

X is a super key.

Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Example:

EMPLOYEE_DETAIL table:

EMP_ID	EMP_NAME	EMP_ZIP	EMP_STATE	EMP_CITY
222	Harry	201010	UP	Noida
333	Stephan	02228	US	Boston
444	Lan	60007	US	Chicago
555	Katharine	06389	UK	Norwich
666	John	462007	MP	Bhopal

Super key in the table above:

{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}....so on

Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.

Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP dependent on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) transitively dependent on super key(EMP_ID). It violates the rule of third normal form.

That's why we need to move the EMP_CITY and EMP_STATE to the new <EMPLOYEE_ZIP> table, with EMP_ZIP as a Primary key.

EMPLOYEE table:

EMP_ID	EMP_NAME	EMP_ZIP
222	Harry	201010
333	Stephan	02228
444	Lan	60007
555	Katharine	06389
666	John	462007

EMPLOYEE_ZIP table:

EMP_ZIP	EMP_STATE	EMP_CITY
201010	UP	Noida
02228	US	Boston
60007	US	Chicago
06389	UK	Norwich
462007	MP	Bhopal

Boyce Codd normal form (BCNF)

BCNF is the advance version of 3NF. It is stricter than 3NF.

A table is in BCNF if every functional dependency X → Y, X is the super key of the table.

For BCNF, the table should be in 3NF, and for every FD, LHS is super key.

Example: Let's assume there is a company where employees work in more than one department.

EMPLOYEE table:

EMP_ID	EMP_COUNTRY	EMP_DEPT	DEPT_TYPE	EMP_DEPT_NO
264	India	Designing	D394	283
264	India	Testing	D394	300
364	UK	Stores	D283	232
364	UK	Developing	D283	549

In the above table Functional dependencies are as follows:

EMP_ID → EMP_COUNTRY

EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP-ID, EMP-DEPT}

The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone are keys.

To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID	EMP_COUNTRY
264	India
264	India

EMP_DEPT table:

EMP_DEPT	DEPT_TYPE	EMP_DEPT_NO
Designing	D394	283
Testing	D394	300
Stores	D283	232
Developing	D283	549

EMP_DEPT_MAPPING table:

EMP_ID	EMP_DEPT
D394	283
D394	300
D283	232
D283	549

Functional dependencies:

EMP_ID → EMP_COUNTRY

EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:

For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}

Now, this is in BCNF because left side part of both the functional dependencies is a key.

Fourth normal form (4NF)

A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.

For a dependency A→ B, if for a single value of A, multiple values of B exists, then the relation will be a multi-valued dependency.

Example

STUDENT

STU_ID	COURSE	HOBBY
21	Computer	Dancing
21	Math	Singing
34	Chemistry	Dancing
74	Biology	Cricket
59	Physics	Hockey

The given STUDENT table is in 3NF, but the COURSE and HOBBY are two independent entity. Hence, there is no relationship between COURSE and HOBBY.

In the STUDENT relation, a student with STU_ID, 21 contains two courses, Computer and Math and two hobbies, Dancing and Singing. So there is a Multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.

So to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE

STU_ID	COURSE
21	Computer
21	Math
34	Chemistry
74	Biology
59	Physics

STUDENT_HOBBY

STU_ID	HOBBY
21	Dancing
21	Singing
34	Dancing
74	Cricket
59	Hockey

Fifth normal form (5NF)

A relation is in 5NF if it is in 4NF and not contains any join dependency and joining should be lossless.

5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy.

5NF is also known as Project-join normal form (PJ/NF).

Example

SUBJECT	LECTURER	SEMESTER
Computer	Anshika	Semester 1
Computer	John	Semester 1
Math	John	Semester 1
Math	Akash	Semester 2
Chemistry	Praveen	Semester 1

In the above table, John takes both Computer and Math class for Semester 1 but he doesn't take Math class for Semester 2. In this case, combination of all these fields required to identify a valid data.

Suppose we add a new Semester as Semester 3 but do not know about the subject and who will be taking that subject so we leave Lecturer and Subject as NULL. But all three columns together acts as a primary key, so we can't leave other two columns blank.

So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:

SEMESTER	SUBJECT
Semester 1	Computer
Semester 1	Math
Semester 1	Chemistry
Semester 2	Math

SUBJECT	LECTURER
Computer	Anshika
Computer	John
Math	John
Math	Akash
Chemistry	Praveen

SEMSTER	LECTURER
Semester 1	Anshika
Semester 1	John
Semester 1	John
Semester 2	Akash
Semester 1	Praveen

Key takeaway

Normalization is the process of organizing the data in the database.

Normalization is used to minimize the redundancy from a relation or set of relations. It is also used to eliminate the undesirable characteristics like Insertion, Update and Deletion Anomalies.

Normalization divides the larger table into the smaller table and links them using relationship.

The normal form is used to reduce redundancy from the database table.

3.6 Decompositions and desirable properties of them

Relational Decomposition

When a relation in the relational model is not in appropriate normal form then the decomposition of a relation is required.

In a database, it breaks the table into multiple tables.

If the relation has no proper decomposition, then it may lead to problems like loss of information.

Decomposition is used to eliminate some of the problems of bad design like anomalies, inconsistencies, and redundancy.

Types of Decomposition

DBMS Relational Decomposition

Fig 9 - Decomposition

Lossless Decomposition

If the information is not lost from the relation that is decomposed, then the decomposition will be lossless.

The lossless decomposition guarantees that the join of relations will result in the same relation as it was decomposed.

The relation is said to be lossless decomposition if natural joins of all the decomposition give the original relation.

Example:

EMPLOYEE_DEPARTMENT table:

EMP_ID	EMP_NAME	EMP_AGE	EMP_CITY	DEPT_ID	DEPT_NAME
22	Denim	28	Mumbai	827	Sales
33	Alina	25	Delhi	438	Marketing
46	Stephan	30	Bangalore	869	Finance
52	Katherine	36	Mumbai	575	Production
60	Jack	40	Noida	678	Testing

The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT

EMPLOYEE table:

EMP_ID	EMP_NAME	EMP_AGE	EMP_CITY
22	Denim	28	Mumbai
33	Alina	25	Delhi
46	Stephan	30	Bangalore
52	Katherine	36	Mumbai
60	Jack	40	Noida

DEPARTMENT table

DEPT_ID	EMP_ID	DEPT_NAME
827	22	Sales
438	33	Marketing
869	46	Finance
575	52	Production
678	60	Testing

Now, when these two relations are joined on the common column "EMP_ID", then the resultant relation will look like:

Employee ⋈ Department

EMP_ID	EMP_NAME	EMP_AGE	EMP_CITY	DEPT_ID	DEPT_NAME
22	Denim	28	Mumbai	827	Sales
33	Alina	25	Delhi	438	Marketing
46	Stephan	30	Bangalore	869	Finance
52	Katherine	36	Mumbai	575	Production
60	Jack	40	Noida	678	Testing

Hence, the decomposition is Lossless join decomposition.

Dependency Preserving

It is an important constraint of the database.

In the dependency preservation, at least one decomposed table must satisfy every dependency.

If a relation R is decomposed into relation R1 and R2, then the dependencies of R either must be a part of R1 or R2 or must be derivable from the combination of functional dependencies of R1 and R2.

For example, suppose there is a relation R (A, B, C, D) with functional dependency set (A->BC). The relational R is decomposed into R1(ABC) and R2(AD) which is dependency preserving because FD A->BC is a part of relation R1(ABC).

Key takeaway

When a relation in the relational model is not in appropriate normal form then the decomposition of a relation is required.

In a database, it breaks the table into multiple tables.

If the relation has no proper decomposition, then it may lead to problems like loss of information.

Decomposition is used to eliminate some of the problems of bad design like anomalies, inconsistencies, and redundancy.

3.7 Multi-valued dependencies and 4NF

Multivalued Dependency

Multivalued dependency occurs when two attributes in a table are independent of each other but, both depend on a third attribute.

A multivalued dependency consists of at least two attributes that are dependent on a third attribute that's why it always requires at least three attributes.

Example: Suppose there is a bike manufacturer company which produces two colors(white and black) of each model every year.

BIKE_MODEL	MANUF_YEAR	COLOR
M2011	2008	White
M2001	2008	Black
M3001	2013	White
M3001	2013	Black
M4006	2017	White
M4006	2017	Black

Here columns COLOR and MANUF_YEAR are dependent on BIKE_MODEL and independent of each other.

In this case, these two columns can be called as multivalued dependent on BIKE_MODEL. The representation of these dependencies is shown below:

BIKE_MODEL → → MANUF_YEAR

BIKE_MODEL → → COLOR

This can be read as "BIKE_MODEL multidetermined MANUF_YEAR" and "BIKE_MODEL multidetermined COLOR".

Fourth normal form (4NF)

A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.

For a dependency A→ B, if for a single value of A, multiple values of B exists, then the relation will be a multi-valued dependency.

Example

STUDENT

STU_ID	COURSE	HOBBY
21	Computer	Dancing
21	Math	Singing
34	Chemistry	Dancing
74	Biology	Cricket
59	Physics	Hockey

The given STUDENT table is in 3NF, but the COURSE and HOBBY are two independent entity. Hence, there is no relationship between COURSE and HOBBY.

So to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE

STU_ID	COURSE
21	Computer
21	Math
34	Chemistry
74	Biology
59	Physics

STUDENT_HOBBY

STU_ID	HOBBY
21	Dancing
21	Singing
34	Dancing
74	Cricket
59	Hockey

Key takeaway

Multivalued dependency occurs when two attributes in a table are independent of each other but, both depend on a third attribute.

A multivalued dependency consists of at least two attributes that are dependent on a third attribute that's why it always requires at least three attributes.

3.8 Join dependencies and definition of 5NF

Join Dependency

Join decomposition is a further generalization of Multivalued dependencies.

If the join of R1 and R2 over C is equal to relation R, then we can say that a join dependency (JD) exists.

Where R1 and R2 are the decompositions R1(A, B, C) and R2(C, D) of a given relations R (A, B, C, D).

Alternatively, R1 and R2 are a lossless decomposition of R.

A JD ⋈ {R1, R2,..., Rn} is said to hold over a relation R if R1, R2,....., Rn is a lossless-join decomposition.

The *(A, B, C, D), (C, D) will be a JD of R if the join of join's attribute is equal to the relation R.

Here, *(R1, R2, R3) is used to indicate that relation R1, R2, R3 and so on are a JD of R.

Fifth normal form (5NF)

A relation is in 5NF if it is in 4NF and not contains any join dependency and joining should be lossless.

5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy.

5NF is also known as Project-join normal form (PJ/NF).

Example

SUBJECT	LECTURER	SEMESTER
Computer	Anshika	Semester 1
Computer	John	Semester 1
Math	John	Semester 1
Math	Akash	Semester 2
Chemistry	Praveen	Semester 1

So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:

SEMESTER	SUBJECT
Semester 1	Computer
Semester 1	Math
Semester 1	Chemistry
Semester 2	Math

SUBJECT	LECTURER
Computer	Anshika
Computer	John
Math	John
Math	Akash
Chemistry	Praveen

SEMSTER	LECTURER
Semester 1	Anshika
Semester 1	John
Semester 1	John
Semester 2	Akash
Semester 1	Praveen

Key takeaway

Join decomposition is a further generalization of Multivalued dependencies.

If the join of R1 and R2 over C is equal to relation R, then we can say that a join dependency (JD) exists.

Where R1 and R2 are the decompositions R1(A, B, C) and R2(C, D) of a given relations R (A, B, C, D).

Alternatively, R1 and R2 are a lossless decomposition of R.

A JD ⋈ {R1, R2,..., Rn} is said to hold over a relation R if R1, R2,....., Rn is a lossless-join decomposition.

The *(A, B, C, D), (C, D) will be a JD of R if the join of join's attribute is equal to the relation R.

Here, *(R1, R2, R3) is used to indicate that relation R1, R2, R3 and so on are a JD of R.

Reference Books

1. “Database Management Systems”, Raghu Ramakrishnan and Johannes Gehrke, 2002, 3rd Edition.

2. “Fundamentals of Database Systems”, RamezElmasri and ShamkantNavathe, Benjamin Cummings, 1999, 3rd Edition.

3. “Database System Concepts”, Abraham Silberschatz, Henry F. Korth and S.Sudarshan, Mc Graw Hill, 2002, 4th Edition.

Sign Up

Index

Notes

Highlighted

Underlined

Browse by Topics

Notes

Highlighted

Underlined