Unit – 3 | unit 3 data base design normalization

DBMS

Unit – 3

Data Base Design & Normalization

3.1 Functional dependencies

Functional Dependency (FD) is a constraint in a database management system system that specifies the relationship of one attribute to another attribute (DBMS). Functional Dependency helps to maintain the database's data quality. Finding the difference between good and poor database design plays a critical role.

The arrow "→" signifies a functional dependence. X→ Y is defined by the functional dependence of X on Y.

Rules of functional dependencies

The three most important rules for Database Functional Dependence are below:

● Reflexive law: X holds a value of Y if X is a set of attributes and Y is a subset of X.

● Augmentation rule: If x -> y holds, and c is set as an attribute, then ac -> bc holds as well. That is to add attributes that do not modify the fundamental dependencies.

● Transitivity law: If x -> y holds and y -> z holds, this rule is very similar to the transitive rule in algebra, then x -> z also holds. X -> y is referred to as functionally evaluating y.

Types of functional dependency

Fig 1: Types of functional dependency

1. Trivial functional dependency

● A → B has trivial functional dependency if B is a subset of A.

● The following dependencies are also trivial like: A → A, B → B

Example

Consider a table with two columns Employee_Id and Employee_Name.

{Employee_id, Employee_Name} → Employee_Id is a trivial functional dependency as

Employee_Id is a subset of {Employee_Id, Employee_Name}.

Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies too.

2. Non - trivial functional dependencies

● A → B has a non-trivial functional dependency if B is not a subset of A.

● When A intersection B is NULL, then A → B is called as complete non-trivial.

Example:

ID → Name,

Name → DOB

Key takeaway:

The arrow "→" signifies a functional dependence.
Functional Dependency helps to maintain the database's data quality.

3.2 Normal forms, first, second, third normal forms, BCNF

Normalization is often executed as a series of different forms. Each normal form has its own properties. As normalization proceeds, the relations become progressively more restricted in format, and also less vulnerable to update anomalies. For the relational data model, it is important to bring the relation only in first normal form (1NF) that is critical in creating relations. All the remaining forms are optional.

A relation R is said to be normalized if it does not create any anomaly for three basic operations: insert, delete and update

Fig 2: Types of normal forms

● First Normal Form (1NF):

A relation r is in 1NF if and only if every tuple contains only atomic attributes means exactly one value for each attribute. As per the rule of first normal form, an attribute of a table cannot hold multiple values. It should hold only atomic values.

Example: suppose Company has created an employee table to store name, address and mobile number of employees.

Emp_id	Emp_name	Emp_address	Emp_mobile
101	Rachel	Mumbai	9817312390
102	John	Pune	7812324252, 9890012244
103	Kim	Chennai	9878381212
104	Mary	Bangalore	9895000123, 7723455987

In the above table, two employees John, Mary have two mobile numbers. So, the attribute emp_mobile is a Multivalued attribute.

So, this table is not in 1NF as the rule says, “Each attribute of a table must have atomic values”. But in the above table the attribute, emp_mobile, is not an atomic attribute as it contains multiple values. So, it violates the rule of 1 NF.

Following solution brings an employee table in 1NF.

Emp_id	Emp_name	Emp_address	Emp_mobile
101	Rachel	Mumbai	9817312390
102	John	Pune	7812324252
102	John	Pune	9890012244
103	Kim	Chennai	9878381212
104	Mary	Bangalore	9895000123
104	Mary	Bangalore	7723455987

● Second Normal Form (2NF)

A table is said to be in 2NF if both of the following conditions are satisfied:

● Table is in 1 NF.

● No non-prime attribute is dependent on the proper subset of any candidate key of table. It means a non-prime attribute should fully functionally depend on the whole candidate key of a table. It should not depend on part of the key.

An attribute that is not part of any candidate key is known as a non-prime attribute.

Example: Suppose a school wants to store data of teachers and the subjects they teach. Since a teacher can teach more than one subject, the table can have multiple rows for the same teacher.

Teacher_id	Subject	Teacher_age
111	DSF	28
111	DBMS	28
222	CNT	35
333	OOPL	38
333	FDS	38

For above table:

Candidate Keys: {Teacher_Id, Subject}

Non prime attribute: Teacher_Age

The table is in 1 NF because each attribute has atomic values. However, it is not in 2NF because non-prime attribute Teacher_Age is dependent on Teacher_Id alone which is a proper subset of candidate key. This violates the rule for 2NF as the rule says “no non-prime attribute is dependent on the proper subset of any candidate key of the table”.

To bring above table in 2NF we can break it in two tables (Teacher_Detalis and

Teacher_Subject) like this:

Teacher_Details table:

Teacher_id	Teacher_age
111	28
222	35
333	38

Teacher_Subject table:

Teacher_id	Subject
111	DSF
111	DBMS
222	CNT
333	OOPL
333	FDS

Now these two tables are in 2NF.

● Third Normal Form (3NF)

A table design is said to be in 3NF if both the following conditions hold:

● Table must be in 2NF.

● Transitive functional dependency from the relation must be removed.

So it can be stated that, a table is in 3NF if it is in 2NF and for each functional dependency

P->Q at least one of the following conditions hold:

● P is a super key of table

● Q is a prime attribute of table

An attribute that is a part of one of the candidate keys is known as a prime attribute.

Transitive functional dependency:

A functional dependency is said to be transitive if it is indirectly formed by two functional dependencies.

For example:

P->R is a transitive dependency if the following three functional dependencies hold true:

1) P->Q and

2) Q->R

Example: suppose a company wants to store information about employees. Then the table Employee_Details looks like this:

Emp_id	Emp_name	Manager_id	Mgr_Dept	Mgr_Name
E1	Hary	M1	IT	William
E2	John	M1	IT	William
E3	Nil	M2	SALES	Stephen
E4	Mery	M3	HR	Johnson
E5	Steve	M2	SALSE	Stephen

Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, Manager_id}

Candidate Key: {emp_id}

Non-prime attributes: All attributes except emp_id are non-prime as they are not subpart part of any candidate keys.

Here, Mgr_Dept, Mgr_Name depend on Manager_id. And, Manager_id is dependent on emp_id that makes non-prime attributes (Mgr_Dept, Mgr_Name) transitively dependent on super key (emp_id). This violates the rule of 3NF.

To bring this table in 3NF we have to break into two tables to remove transitive dependency.

Employee_Details table:

Emp_id	Emp_name	Manager_id
E1	Hary	M1
E2	John	M1
E3	Nil	M2
E4	Mery	M3
E5	Steve	M2

Manager_Details table:

Manager_id	Mgr_Dept	Mgr_Name
M1	IT	William
M2	SALSE	Stephen
M3	HR	Johnson

● Boyce Codd normal form (BCNF)

It is an advanced version of 3NF. BCNF is stricter than 3NF. A table complies with BCNF if it is in 3NF and for every functional dependency X->Y, X should be the super key of the table.

Example: Suppose there is a company wherein employees work in more than one

Department. They store the data like this:

Emp_id	Emp_nationality	Emp_dept	Dept_type	Dept_no_of_emp
101	Indian	Planning	D01	100
101	Indian	Accounting	D01	50
102	Japanese	Technical support	D14	300
102	Japanese	Sales	D14	100

Functional dependencies in the table above:

Emp_id ->emp_nationality

Emp_dept -> {dept_type, dept_no_of_emp}

Candidate key: {emp_id, emp_dept}

The table is not in BCNF as neither emp_id nor emp_dept alone are keys. To bring this table in BCNF we can break this table in three tables like:

Emp_nationality table:

Emp_id	Emp_nationality
101	Indian
102	Japanese

Emp_dept table:

Emp_dept	Dept_type	Dept_no_of_emp
Planning	D01	100
Accounting	D01	50
Technical support	D14	300
Sales	D14	100

Emp_dept_mapping table:

Emp_id	Emp_dept
101	Planning
101	Accounting
102	Technical support
102	Sales

Functional dependencies:

Emp_id ->; emp_nationality

Emp_dept -> {dept_type, dept_no_of_emp}

Candidate keys:

For first table: emp_id

For second table: emp_dept

For third table: {emp_id, emp_dept}

This is now in BCNF as in both functional dependencies the left side is a key.

Key takeaway:

A relation r is in 1NF if and only if every tuple contains only atomic attributes means exactly one value for each attribute.
A relation is in the 4NF if it is in BCNF and has no multi valued dependencies.

3.3 Inclusion dependence

● Although multivalued and join dependencies are less popular than functional dependencies, they can be used to direct database design.

● Dependencies on inclusion are very common. They usually have little say in how the database is designed.

● A statement in which certain columns of a relation are contained in other columns is known as an inclusion dependence.

● A foreign key is an example of inclusion dependence. The referencing relation is found in the primary main column(s) of the referenced relation in one relation.

● Let's say we have two relations R and S that were created by translating two entity sets so that any R entity is also a S entity.

● If projecting R on its key attributes yields a relation that is embedded in the relation obtained by projecting S on its key attributes, inclusion dependence exists.

● We should not break groups of attributes that participate in an inclusion dependence in inclusion dependency.

● In fact, the majority of inclusion dependencies are key-based, meaning that only keys are involved.

Key takeaway:

A statement in which certain columns of a relation are contained in other columns is known as an inclusion dependence.
We should not break groups of attributes that participate in an inclusion dependence in inclusion dependency.

3.4 Lossless join decompositions

● If the data is not lost from the decomposing relationship, then the decomposition is lossless.

● The lossless decomposition ensures that the union of relationships can lead to the decomposition of the same relationship.

● If natural joins of all the decomposition give the original relation, the relationship is said to be lossless decomposition.

Now let's see an instance:

Emp_ID	Emp_Name	Emp_Age	Emp_Location	Dept_ID	Dept_Name
E101	Jerry	29	Newyork	Dpt1	Operations
E102	Henry	32	Newyork	Dpt2	HR
E103	Tom	22	Texas	Dpt3	Finance

Decompose the table listed above into two tables:

Emp_ID	Emp_Name	Emp_Age	Emp_Location
E101	Jerry	29	Newyork
E102	Henry	32	Newyork
E103	Tom	22	Texas

Dept_ID	Emp_ID	Dept_Name
Dpt1	E101	Operations
Dpt2	E102	HR
Dpt3	E103	Finance

Now, Natural Join is added to the two tables above,

The outcome will be −

EmpDetails ⋈ DeptDetails

Emp_ID	Emp_Name	Emp_Age	Emp_Location	Dept_ID	Dept_Name
E101	Jerry	29	Newyork	Dpt1	Operations
E102	Henry	32	Newyork	Dpt2	HR
E103	Tom	22	Texas	Dpt3	Finance

Therefore, the above relationship had decomposition without loss, i.e. no data loss.

Key takeaway:

If the data is not lost from the decomposing relationship, then the decomposition is lossless.

3.5 Normalization using FD, MVD, and JDs, alternative approaches to database design

MVD

When two attributes in a table are independent of one another but both rely on a third attribute, this is known as multivalued dependency.

A multivalued dependency includes at least three attributes since it consists of at least two attributes that are based on a third attribute.

Example: Assume that a bicycle manufacturer produces two colors (white and black) of each model each year.

BIKE_MODEL	MANUF_YEAR	COLOR
M2011	2008	White
M2001	2008	Black
M3001	2013	White
M3001	2013	Black
M4006	2017	White
M4006	2017	Black

COLOR and MANUF YEAR are both based on BIKE MODEL and independent of each other in this case.

These two columns can be considered multivalued in this case since they are based on BIKE MODEL. The following diagram depicts these dependencies:

BIKE_MODEL → → MANUF_YEAR

BIKE_MODEL → → COLOR

"BIKE MODEL multi determined MANUF YEAR" and "BIKE MODEL multi determined COLOR" can be read as "BIKE MODEL multi determined MANUF YEAR" and "BIKE MODEL multi determined COLOR."

JDs

● Multivalued dependencies are further generalized by Join decomposition.

● We can assume that a join dependence (JD) exists if the join of R1 and R2 over C equals relation R.

● Where R1(A, B, C) and R2(C, D) are the decompositions of a given relation R. (A, B, C, D).

● R1 and R2, on the other hand, are lossless decompositions of R.

● A JD ⋈ {R1, R2,..., Rn} is said to hold over a relation R if R1, R2,....., Rn is a lossless-join decomposition.

● If the join of join's attribute equals the relation R, the *(A, B, C, D), (C, D) would be a JD of R.

● The symbol *(R1, R2, R3) denotes the relation R1, R2, R3, and so on are all JDs of R.

Key takeaway:

When two attributes in a table are independent of one another but both rely on a third attribute, this is known as multivalued dependency.
A multivalued dependency includes at least three attributes since it consists of at least two attributes that are based on a third attribute.
Multivalued dependencies are further generalized by Join decomposition.

References:

Korth, Silbertz, Sudarshan,” Database Concepts”, McGraw Hill

2. Date C J, “An Introduction to Database Systems”, Addision Wesley

3. Elmasri, Navathe, “ Fundamentals of Database Systems”, Addision Wesley

4. O’Neil, Databases, Elsevier Pub.

Unit – 3

Data Base Design & Normalization

3.1 Functional dependencies

The arrow "→" signifies a functional dependence. X→ Y is defined by the functional dependence of X on Y.

Rules of functional dependencies

The three most important rules for Database Functional Dependence are below:

● Reflexive law: X holds a value of Y if X is a set of attributes and Y is a subset of X.

● Augmentation rule: If x -> y holds, and c is set as an attribute, then ac -> bc holds as well. That is to add attributes that do not modify the fundamental dependencies.

● Transitivity law: If x -> y holds and y -> z holds, this rule is very similar to the transitive rule in algebra, then x -> z also holds. X -> y is referred to as functionally evaluating y.

Types of functional dependency

Fig 1: Types of functional dependency

1. Trivial functional dependency

● A → B has trivial functional dependency if B is a subset of A.

● The following dependencies are also trivial like: A → A, B → B

Example

Consider a table with two columns Employee_Id and Employee_Name.

{Employee_id, Employee_Name} → Employee_Id is a trivial functional dependency as

Employee_Id is a subset of {Employee_Id, Employee_Name}.

Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies too.

2. Non - trivial functional dependencies

● A → B has a non-trivial functional dependency if B is not a subset of A.

● When A intersection B is NULL, then A → B is called as complete non-trivial.

Example:

ID → Name,

Name → DOB

Key takeaway:

The arrow "→" signifies a functional dependence.
Functional Dependency helps to maintain the database's data quality.

3.2 Normal forms, first, second, third normal forms, BCNF

A relation R is said to be normalized if it does not create any anomaly for three basic operations: insert, delete and update

Fig 2: Types of normal forms

● First Normal Form (1NF):

Example: suppose Company has created an employee table to store name, address and mobile number of employees.

Emp_id	Emp_name	Emp_address	Emp_mobile
101	Rachel	Mumbai	9817312390
102	John	Pune	7812324252, 9890012244
103	Kim	Chennai	9878381212
104	Mary	Bangalore	9895000123, 7723455987

In the above table, two employees John, Mary have two mobile numbers. So, the attribute emp_mobile is a Multivalued attribute.

Following solution brings an employee table in 1NF.

Emp_id	Emp_name	Emp_address	Emp_mobile
101	Rachel	Mumbai	9817312390
102	John	Pune	7812324252
102	John	Pune	9890012244
103	Kim	Chennai	9878381212
104	Mary	Bangalore	9895000123
104	Mary	Bangalore	7723455987

● Second Normal Form (2NF)

A table is said to be in 2NF if both of the following conditions are satisfied:

● Table is in 1 NF.

An attribute that is not part of any candidate key is known as a non-prime attribute.

Example: Suppose a school wants to store data of teachers and the subjects they teach. Since a teacher can teach more than one subject, the table can have multiple rows for the same teacher.

Teacher_id	Subject	Teacher_age
111	DSF	28
111	DBMS	28
222	CNT	35
333	OOPL	38
333	FDS	38

For above table:

Candidate Keys: {Teacher_Id, Subject}

Non prime attribute: Teacher_Age

To bring above table in 2NF we can break it in two tables (Teacher_Detalis and

Teacher_Subject) like this:

Teacher_Details table:

Teacher_id	Teacher_age
111	28
222	35
333	38

Teacher_Subject table:

Teacher_id	Subject
111	DSF
111	DBMS
222	CNT
333	OOPL
333	FDS

Now these two tables are in 2NF.

● Third Normal Form (3NF)

A table design is said to be in 3NF if both the following conditions hold:

● Table must be in 2NF.

● Transitive functional dependency from the relation must be removed.

So it can be stated that, a table is in 3NF if it is in 2NF and for each functional dependency

P->Q at least one of the following conditions hold:

● P is a super key of table

● Q is a prime attribute of table

An attribute that is a part of one of the candidate keys is known as a prime attribute.

Transitive functional dependency:

A functional dependency is said to be transitive if it is indirectly formed by two functional dependencies.

For example:

P->R is a transitive dependency if the following three functional dependencies hold true:

1) P->Q and

2) Q->R

Example: suppose a company wants to store information about employees. Then the table Employee_Details looks like this:

Emp_id	Emp_name	Manager_id	Mgr_Dept	Mgr_Name
E1	Hary	M1	IT	William
E2	John	M1	IT	William
E3	Nil	M2	SALES	Stephen
E4	Mery	M3	HR	Johnson
E5	Steve	M2	SALSE	Stephen

Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, Manager_id}

Candidate Key: {emp_id}

Non-prime attributes: All attributes except emp_id are non-prime as they are not subpart part of any candidate keys.

To bring this table in 3NF we have to break into two tables to remove transitive dependency.

Employee_Details table:

Emp_id	Emp_name	Manager_id
E1	Hary	M1
E2	John	M1
E3	Nil	M2
E4	Mery	M3
E5	Steve	M2

Manager_Details table:

Manager_id	Mgr_Dept	Mgr_Name
M1	IT	William
M2	SALSE	Stephen
M3	HR	Johnson

● Boyce Codd normal form (BCNF)

It is an advanced version of 3NF. BCNF is stricter than 3NF. A table complies with BCNF if it is in 3NF and for every functional dependency X->Y, X should be the super key of the table.

Example: Suppose there is a company wherein employees work in more than one

Department. They store the data like this:

Emp_id	Emp_nationality	Emp_dept	Dept_type	Dept_no_of_emp
101	Indian	Planning	D01	100
101	Indian	Accounting	D01	50
102	Japanese	Technical support	D14	300
102	Japanese	Sales	D14	100

Functional dependencies in the table above:

Emp_id ->emp_nationality

Emp_dept -> {dept_type, dept_no_of_emp}

Candidate key: {emp_id, emp_dept}

The table is not in BCNF as neither emp_id nor emp_dept alone are keys. To bring this table in BCNF we can break this table in three tables like:

Emp_nationality table:

Emp_id	Emp_nationality
101	Indian
102	Japanese

Emp_dept table:

Emp_dept	Dept_type	Dept_no_of_emp
Planning	D01	100
Accounting	D01	50
Technical support	D14	300
Sales	D14	100

Emp_dept_mapping table:

Emp_id	Emp_dept
101	Planning
101	Accounting
102	Technical support
102	Sales

Functional dependencies:

Emp_id ->; emp_nationality

Emp_dept -> {dept_type, dept_no_of_emp}

Candidate keys:

For first table: emp_id

For second table: emp_dept

For third table: {emp_id, emp_dept}

This is now in BCNF as in both functional dependencies the left side is a key.

Key takeaway:

A relation r is in 1NF if and only if every tuple contains only atomic attributes means exactly one value for each attribute.
A relation is in the 4NF if it is in BCNF and has no multi valued dependencies.