Unit – 3
Data Base Design & Normalization
Functional Dependency (FD) is a constraint in a database management system system that specifies the relationship of one attribute to another attribute (DBMS). Functional Dependency helps to maintain the database's data quality. Finding the difference between good and poor database design plays a critical role.
The arrow "→" signifies a functional dependence. X→ Y is defined by the functional dependence of X on Y.
Rules of functional dependencies
The three most important rules for Database Functional Dependence are below:
● Reflexive law: X holds a value of Y if X is a set of attributes and Y is a subset of X.
● Augmentation rule: If x -> y holds, and c is set as an attribute, then ac -> bc holds as well. That is to add attributes that do not modify the fundamental dependencies.
● Transitivity law: If x -> y holds and y -> z holds, this rule is very similar to the transitive rule in algebra, then x -> z also holds. X -> y is referred to as functionally evaluating y.
Types of functional dependency
Fig 1: Types of functional dependency
1. Trivial functional dependency
● A → B has trivial functional dependency if B is a subset of A.
● The following dependencies are also trivial like: A → A, B → B
Example
Consider a table with two columns Employee_Id and Employee_Name.
{Employee_id, Employee_Name} → Employee_Id is a trivial functional dependency as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies too.
2. Non - trivial functional dependencies
● A → B has a non-trivial functional dependency if B is not a subset of A.
● When A intersection B is NULL, then A → B is called as complete non-trivial.
Example:
ID → Name,
Name → DOB
Key takeaway:
- The arrow "→" signifies a functional dependence.
- Functional Dependency helps to maintain the database's data quality.
Normalization is often executed as a series of different forms. Each normal form has its own properties. As normalization proceeds, the relations become progressively more restricted in format, and also less vulnerable to update anomalies. For the relational data model, it is important to bring the relation only in first normal form (1NF) that is critical in creating relations. All the remaining forms are optional.
A relation R is said to be normalized if it does not create any anomaly for three basic operations: insert, delete and update
Fig 2: Types of normal forms
● First Normal Form (1NF):
A relation r is in 1NF if and only if every tuple contains only atomic attributes means exactly one value for each attribute. As per the rule of first normal form, an attribute of a table cannot hold multiple values. It should hold only atomic values.
Example: suppose Company has created an employee table to store name, address and mobile number of employees.
Emp_id | Emp_name | Emp_address | Emp_mobile |
101 | Rachel | Mumbai | 9817312390 |
102 | John | Pune | 7812324252, 9890012244 |
103 | Kim | Chennai | 9878381212 |
104 | Mary | Bangalore | 9895000123, 7723455987 |
In the above table, two employees John, Mary have two mobile numbers. So, the attribute emp_mobile is a Multivalued attribute.
So, this table is not in 1NF as the rule says, “Each attribute of a table must have atomic values”. But in the above table the attribute, emp_mobile, is not an atomic attribute as it contains multiple values. So, it violates the rule of 1 NF.
Following solution brings an employee table in 1NF.
Emp_id | Emp_name | Emp_address | Emp_mobile |
101 | Rachel | Mumbai | 9817312390 |
102 | John | Pune | 7812324252 |
102 | John | Pune | 9890012244 |
103 | Kim | Chennai | 9878381212 |
104 | Mary | Bangalore | 9895000123 |
104 | Mary | Bangalore | 7723455987 |
● Second Normal Form (2NF)
A table is said to be in 2NF if both of the following conditions are satisfied:
● Table is in 1 NF.
● No non-prime attribute is dependent on the proper subset of any candidate key of table. It means a non-prime attribute should fully functionally depend on the whole candidate key of a table. It should not depend on part of the key.
An attribute that is not part of any candidate key is known as a non-prime attribute.
Example: Suppose a school wants to store data of teachers and the subjects they teach. Since a teacher can teach more than one subject, the table can have multiple rows for the same teacher.
Teacher_id | Subject | Teacher_age |
111 | DSF | 28 |
111 | DBMS | 28 |
222 | CNT | 35 |
333 | OOPL | 38 |
333 | FDS | 38 |
For above table:
Candidate Keys: {Teacher_Id, Subject}
Non prime attribute: Teacher_Age
The table is in 1 NF because each attribute has atomic values. However, it is not in 2NF because non-prime attribute Teacher_Age is dependent on Teacher_Id alone which is a proper subset of candidate key. This violates the rule for 2NF as the rule says “no non-prime attribute is dependent on the proper subset of any candidate key of the table”.
To bring above table in 2NF we can break it in two tables (Teacher_Detalis and
Teacher_Subject) like this:
Teacher_Details table:
Teacher_id | Teacher_age |
111 | 28 |
222 | 35 |
333 | 38 |
Teacher_Subject table:
Teacher_id | Subject |
111 | DSF |
111 | DBMS |
222 | CNT |
333 | OOPL |
333 | FDS |
Now these two tables are in 2NF.
● Third Normal Form (3NF)
A table design is said to be in 3NF if both the following conditions hold:
● Table must be in 2NF.
● Transitive functional dependency from the relation must be removed.
So it can be stated that, a table is in 3NF if it is in 2NF and for each functional dependency
P->Q at least one of the following conditions hold:
● P is a super key of table
● Q is a prime attribute of table
An attribute that is a part of one of the candidate keys is known as a prime attribute.
Transitive functional dependency:
A functional dependency is said to be transitive if it is indirectly formed by two functional dependencies.
For example:
P->R is a transitive dependency if the following three functional dependencies hold true:
1) P->Q and
2) Q->R
Example: suppose a company wants to store information about employees. Then the table Employee_Details looks like this:
Emp_id | Emp_name | Manager_id | Mgr_Dept | Mgr_Name |
E1 | Hary | M1 | IT | William |
E2 | John | M1 | IT | William |
E3 | Nil | M2 | SALES | Stephen |
E4 | Mery | M3 | HR | Johnson |
E5 | Steve | M2 | SALSE | Stephen |
Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, Manager_id}
Candidate Key: {emp_id}
Non-prime attributes: All attributes except emp_id are non-prime as they are not subpart part of any candidate keys.
Here, Mgr_Dept, Mgr_Name depend on Manager_id. And, Manager_id is dependent on emp_id that makes non-prime attributes (Mgr_Dept, Mgr_Name) transitively dependent on super key (emp_id). This violates the rule of 3NF.
To bring this table in 3NF we have to break into two tables to remove transitive dependency.
Employee_Details table:
Emp_id | Emp_name | Manager_id |
E1 | Hary | M1 |
E2 | John | M1 |
E3 | Nil | M2 |
E4 | Mery | M3 |
E5 | Steve | M2 |
Manager_Details table:
Manager_id | Mgr_Dept | Mgr_Name |
M1 | IT | William |
M2 | SALSE | Stephen |
M3 | HR | Johnson |
● Boyce Codd normal form (BCNF)
It is an advanced version of 3NF. BCNF is stricter than 3NF. A table complies with BCNF if it is in 3NF and for every functional dependency X->Y, X should be the super key of the table.
Example: Suppose there is a company wherein employees work in more than one
Department. They store the data like this:
Emp_id | Emp_nationality | Emp_dept | Dept_type | Dept_no_of_emp |
101 | Indian | Planning | D01 | 100 |
101 | Indian | Accounting | D01 | 50 |
102 | Japanese | Technical support | D14 | 300 |
102 | Japanese | Sales | D14 | 100 |
Functional dependencies in the table above:
Emp_id ->emp_nationality
Emp_dept -> {dept_type, dept_no_of_emp}
Candidate key: {emp_id, emp_dept}
The table is not in BCNF as neither emp_id nor emp_dept alone are keys. To bring this table in BCNF we can break this table in three tables like:
Emp_nationality table:
Emp_id | Emp_nationality |
101 | Indian |
102 | Japanese |
Emp_dept table:
Emp_dept | Dept_type | Dept_no_of_emp |
Planning | D01 | 100 |
Accounting | D01 | 50 |
Technical support | D14 | 300 |
Sales | D14 | 100 |
Emp_dept_mapping table:
Emp_id | Emp_dept |
101 | Planning |
101 | Accounting |
102 | Technical support |
102 | Sales |
Functional dependencies:
Emp_id ->; emp_nationality
Emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF as in both functional dependencies the left side is a key.
Key takeaway:
- A relation r is in 1NF if and only if every tuple contains only atomic attributes means exactly one value for each attribute.
- A relation is in the 4NF if it is in BCNF and has no multi valued dependencies.
● Although multivalued and join dependencies are less popular than functional dependencies, they can be used to direct database design.
● Dependencies on inclusion are very common. They usually have little say in how the database is designed.
● A statement in which certain columns of a relation are contained in other columns is known as an inclusion dependence.
● A foreign key is an example of inclusion dependence. The referencing relation is found in the primary main column(s) of the referenced relation in one relation.
● Let's say we have two relations R and S that were created by translating two entity sets so that any R entity is also a S entity.
● If projecting R on its key attributes yields a relation that is embedded in the relation obtained by projecting S on its key attributes, inclusion dependence exists.
● We should not break groups of attributes that participate in an inclusion dependence in inclusion dependency.
● In fact, the majority of inclusion dependencies are key-based, meaning that only keys are involved.
Key takeaway:
- A statement in which certain columns of a relation are contained in other columns is known as an inclusion dependence.
- We should not break groups of attributes that participate in an inclusion dependence in inclusion dependency.
● If the data is not lost from the decomposing relationship, then the decomposition is lossless.
● The lossless decomposition ensures that the union of relationships can lead to the decomposition of the same relationship.
● If natural joins of all the decomposition give the original relation, the relationship is said to be lossless decomposition.
Now let's see an instance:
<EmpInfo>
Emp_ID | Emp_Name | Emp_Age | Emp_Location | Dept_ID | Dept_Name |
E101 | Jerry | 29 | Newyork | Dpt1 | Operations |
E102 | Henry | 32 | Newyork | Dpt2 | HR |
E103 | Tom | 22 | Texas | Dpt3 | Finance |
Decompose the table listed above into two tables:
<EmpDetails>
Emp_ID | Emp_Name | Emp_Age | Emp_Location |
E101 | Jerry | 29 | Newyork |
E102 | Henry | 32 | Newyork |
E103 | Tom | 22 | Texas |
<DeptDetails>
Dept_ID | Emp_ID | Dept_Name |
Dpt1 | E101 | Operations |
Dpt2 | E102 | HR |
Dpt3 | E103 | Finance |
Now, Natural Join is added to the two tables above,
The outcome will be −
EmpDetails ⋈ DeptDetails
Emp_ID | Emp_Name | Emp_Age | Emp_Location | Dept_ID | Dept_Name |
E101 | Jerry | 29 | Newyork | Dpt1 | Operations |
E102 | Henry | 32 | Newyork | Dpt2 | HR |
E103 | Tom | 22 | Texas | Dpt3 | Finance |
Therefore, the above relationship had decomposition without loss, i.e. no data loss.
Key takeaway:
- If the data is not lost from the decomposing relationship, then the decomposition is lossless.
MVD
When two attributes in a table are independent of one another but both rely on a third attribute, this is known as multivalued dependency.
A multivalued dependency includes at least three attributes since it consists of at least two attributes that are based on a third attribute.
Example: Assume that a bicycle manufacturer produces two colors (white and black) of each model each year.
BIKE_MODEL | MANUF_YEAR | COLOR |
M2011 | 2008 | White |
M2001 | 2008 | Black |
M3001 | 2013 | White |
M3001 | 2013 | Black |
M4006 | 2017 | White |
M4006 | 2017 | Black |
COLOR and MANUF YEAR are both based on BIKE MODEL and independent of each other in this case.
These two columns can be considered multivalued in this case since they are based on BIKE MODEL. The following diagram depicts these dependencies:
BIKE_MODEL → → MANUF_YEAR
BIKE_MODEL → → COLOR
"BIKE MODEL multi determined MANUF YEAR" and "BIKE MODEL multi determined COLOR" can be read as "BIKE MODEL multi determined MANUF YEAR" and "BIKE MODEL multi determined COLOR."
JDs
● Multivalued dependencies are further generalized by Join decomposition.
● We can assume that a join dependence (JD) exists if the join of R1 and R2 over C equals relation R.
● Where R1(A, B, C) and R2(C, D) are the decompositions of a given relation R. (A, B, C, D).
● R1 and R2, on the other hand, are lossless decompositions of R.
● A JD ⋈ {R1, R2,..., Rn} is said to hold over a relation R if R1, R2,....., Rn is a lossless-join decomposition.
● If the join of join's attribute equals the relation R, the *(A, B, C, D), (C, D) would be a JD of R.
● The symbol *(R1, R2, R3) denotes the relation R1, R2, R3, and so on are all JDs of R.
Key takeaway:
- When two attributes in a table are independent of one another but both rely on a third attribute, this is known as multivalued dependency.
- A multivalued dependency includes at least three attributes since it consists of at least two attributes that are based on a third attribute.
- Multivalued dependencies are further generalized by Join decomposition.
References:
- Korth, Silbertz, Sudarshan,” Database Concepts”, McGraw Hill
2. Date C J, “An Introduction to Database Systems”, Addision Wesley
3. Elmasri, Navathe, “ Fundamentals of Database Systems”, Addision Wesley
4. O’Neil, Databases, Elsevier Pub.
Unit – 3
Data Base Design & Normalization
Functional Dependency (FD) is a constraint in a database management system system that specifies the relationship of one attribute to another attribute (DBMS). Functional Dependency helps to maintain the database's data quality. Finding the difference between good and poor database design plays a critical role.
The arrow "→" signifies a functional dependence. X→ Y is defined by the functional dependence of X on Y.
Rules of functional dependencies
The three most important rules for Database Functional Dependence are below:
● Reflexive law: X holds a value of Y if X is a set of attributes and Y is a subset of X.
● Augmentation rule: If x -> y holds, and c is set as an attribute, then ac -> bc holds as well. That is to add attributes that do not modify the fundamental dependencies.
● Transitivity law: If x -> y holds and y -> z holds, this rule is very similar to the transitive rule in algebra, then x -> z also holds. X -> y is referred to as functionally evaluating y.
Types of functional dependency
Fig 1: Types of functional dependency
1. Trivial functional dependency
● A → B has trivial functional dependency if B is a subset of A.
● The following dependencies are also trivial like: A → A, B → B
Example
Consider a table with two columns Employee_Id and Employee_Name.
{Employee_id, Employee_Name} → Employee_Id is a trivial functional dependency as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies too.
2. Non - trivial functional dependencies
● A → B has a non-trivial functional dependency if B is not a subset of A.
● When A intersection B is NULL, then A → B is called as complete non-trivial.
Example:
ID → Name,
Name → DOB
Key takeaway:
- The arrow "→" signifies a functional dependence.
- Functional Dependency helps to maintain the database's data quality.
Normalization is often executed as a series of different forms. Each normal form has its own properties. As normalization proceeds, the relations become progressively more restricted in format, and also less vulnerable to update anomalies. For the relational data model, it is important to bring the relation only in first normal form (1NF) that is critical in creating relations. All the remaining forms are optional.
A relation R is said to be normalized if it does not create any anomaly for three basic operations: insert, delete and update
Fig 2: Types of normal forms
● First Normal Form (1NF):
A relation r is in 1NF if and only if every tuple contains only atomic attributes means exactly one value for each attribute. As per the rule of first normal form, an attribute of a table cannot hold multiple values. It should hold only atomic values.
Example: suppose Company has created an employee table to store name, address and mobile number of employees.
Emp_id | Emp_name | Emp_address | Emp_mobile |
101 | Rachel | Mumbai | 9817312390 |
102 | John | Pune | 7812324252, 9890012244 |
103 | Kim | Chennai | 9878381212 |
104 | Mary | Bangalore | 9895000123, 7723455987 |
In the above table, two employees John, Mary have two mobile numbers. So, the attribute emp_mobile is a Multivalued attribute.
So, this table is not in 1NF as the rule says, “Each attribute of a table must have atomic values”. But in the above table the attribute, emp_mobile, is not an atomic attribute as it contains multiple values. So, it violates the rule of 1 NF.
Following solution brings an employee table in 1NF.
Emp_id | Emp_name | Emp_address | Emp_mobile |
101 | Rachel | Mumbai | 9817312390 |
102 | John | Pune | 7812324252 |
102 | John | Pune | 9890012244 |
103 | Kim | Chennai | 9878381212 |
104 | Mary | Bangalore | 9895000123 |
104 | Mary | Bangalore | 7723455987 |
● Second Normal Form (2NF)
A table is said to be in 2NF if both of the following conditions are satisfied:
● Table is in 1 NF.
● No non-prime attribute is dependent on the proper subset of any candidate key of table. It means a non-prime attribute should fully functionally depend on the whole candidate key of a table. It should not depend on part of the key.
An attribute that is not part of any candidate key is known as a non-prime attribute.
Example: Suppose a school wants to store data of teachers and the subjects they teach. Since a teacher can teach more than one subject, the table can have multiple rows for the same teacher.
Teacher_id | Subject | Teacher_age |
111 | DSF | 28 |
111 | DBMS | 28 |
222 | CNT | 35 |
333 | OOPL | 38 |
333 | FDS | 38 |
For above table:
Candidate Keys: {Teacher_Id, Subject}
Non prime attribute: Teacher_Age
The table is in 1 NF because each attribute has atomic values. However, it is not in 2NF because non-prime attribute Teacher_Age is dependent on Teacher_Id alone which is a proper subset of candidate key. This violates the rule for 2NF as the rule says “no non-prime attribute is dependent on the proper subset of any candidate key of the table”.
To bring above table in 2NF we can break it in two tables (Teacher_Detalis and
Teacher_Subject) like this:
Teacher_Details table:
Teacher_id | Teacher_age |
111 | 28 |
222 | 35 |
333 | 38 |
Teacher_Subject table:
Teacher_id | Subject |
111 | DSF |
111 | DBMS |
222 | CNT |
333 | OOPL |
333 | FDS |
Now these two tables are in 2NF.
● Third Normal Form (3NF)
A table design is said to be in 3NF if both the following conditions hold:
● Table must be in 2NF.
● Transitive functional dependency from the relation must be removed.
So it can be stated that, a table is in 3NF if it is in 2NF and for each functional dependency
P->Q at least one of the following conditions hold:
● P is a super key of table
● Q is a prime attribute of table
An attribute that is a part of one of the candidate keys is known as a prime attribute.
Transitive functional dependency:
A functional dependency is said to be transitive if it is indirectly formed by two functional dependencies.
For example:
P->R is a transitive dependency if the following three functional dependencies hold true:
1) P->Q and
2) Q->R
Example: suppose a company wants to store information about employees. Then the table Employee_Details looks like this:
Emp_id | Emp_name | Manager_id | Mgr_Dept | Mgr_Name |
E1 | Hary | M1 | IT | William |
E2 | John | M1 | IT | William |
E3 | Nil | M2 | SALES | Stephen |
E4 | Mery | M3 | HR | Johnson |
E5 | Steve | M2 | SALSE | Stephen |
Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, Manager_id}
Candidate Key: {emp_id}
Non-prime attributes: All attributes except emp_id are non-prime as they are not subpart part of any candidate keys.
Here, Mgr_Dept, Mgr_Name depend on Manager_id. And, Manager_id is dependent on emp_id that makes non-prime attributes (Mgr_Dept, Mgr_Name) transitively dependent on super key (emp_id). This violates the rule of 3NF.
To bring this table in 3NF we have to break into two tables to remove transitive dependency.
Employee_Details table:
Emp_id | Emp_name | Manager_id |
E1 | Hary | M1 |
E2 | John | M1 |
E3 | Nil | M2 |
E4 | Mery | M3 |
E5 | Steve | M2 |
Manager_Details table:
Manager_id | Mgr_Dept | Mgr_Name |
M1 | IT | William |
M2 | SALSE | Stephen |
M3 | HR | Johnson |
● Boyce Codd normal form (BCNF)
It is an advanced version of 3NF. BCNF is stricter than 3NF. A table complies with BCNF if it is in 3NF and for every functional dependency X->Y, X should be the super key of the table.
Example: Suppose there is a company wherein employees work in more than one
Department. They store the data like this:
Emp_id | Emp_nationality | Emp_dept | Dept_type | Dept_no_of_emp |
101 | Indian | Planning | D01 | 100 |
101 | Indian | Accounting | D01 | 50 |
102 | Japanese | Technical support | D14 | 300 |
102 | Japanese | Sales | D14 | 100 |
Functional dependencies in the table above:
Emp_id ->emp_nationality
Emp_dept -> {dept_type, dept_no_of_emp}
Candidate key: {emp_id, emp_dept}
The table is not in BCNF as neither emp_id nor emp_dept alone are keys. To bring this table in BCNF we can break this table in three tables like:
Emp_nationality table:
Emp_id | Emp_nationality |
101 | Indian |
102 | Japanese |
Emp_dept table:
Emp_dept | Dept_type | Dept_no_of_emp |
Planning | D01 | 100 |
Accounting | D01 | 50 |
Technical support | D14 | 300 |
Sales | D14 | 100 |
Emp_dept_mapping table:
Emp_id | Emp_dept |
101 | Planning |
101 | Accounting |
102 | Technical support |
102 | Sales |
Functional dependencies:
Emp_id ->; emp_nationality
Emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF as in both functional dependencies the left side is a key.
Key takeaway:
- A relation r is in 1NF if and only if every tuple contains only atomic attributes means exactly one value for each attribute.
- A relation is in the 4NF if it is in BCNF and has no multi valued dependencies.
● Although multivalued and join dependencies are less popular than functional dependencies, they can be used to direct database design.
● Dependencies on inclusion are very common. They usually have little say in how the database is designed.
● A statement in which certain columns of a relation are contained in other columns is known as an inclusion dependence.
● A foreign key is an example of inclusion dependence. The referencing relation is found in the primary main column(s) of the referenced relation in one relation.
● Let's say we have two relations R and S that were created by translating two entity sets so that any R entity is also a S entity.
● If projecting R on its key attributes yields a relation that is embedded in the relation obtained by projecting S on its key attributes, inclusion dependence exists.
● We should not break groups of attributes that participate in an inclusion dependence in inclusion dependency.
● In fact, the majority of inclusion dependencies are key-based, meaning that only keys are involved.
Key takeaway:
- A statement in which certain columns of a relation are contained in other columns is known as an inclusion dependence.
- We should not break groups of attributes that participate in an inclusion dependence in inclusion dependency.
● If the data is not lost from the decomposing relationship, then the decomposition is lossless.
● The lossless decomposition ensures that the union of relationships can lead to the decomposition of the same relationship.
● If natural joins of all the decomposition give the original relation, the relationship is said to be lossless decomposition.
Now let's see an instance:
<EmpInfo>
Emp_ID | Emp_Name | Emp_Age | Emp_Location | Dept_ID | Dept_Name |
E101 | Jerry | 29 | Newyork | Dpt1 | Operations |
E102 | Henry | 32 | Newyork | Dpt2 | HR |
E103 | Tom | 22 | Texas | Dpt3 | Finance |
Decompose the table listed above into two tables:
<EmpDetails>
Emp_ID | Emp_Name | Emp_Age | Emp_Location |
E101 | Jerry | 29 | Newyork |
E102 | Henry | 32 | Newyork |
E103 | Tom | 22 | Texas |
<DeptDetails>
Dept_ID | Emp_ID | Dept_Name |
Dpt1 | E101 | Operations |
Dpt2 | E102 | HR |
Dpt3 | E103 | Finance |
Now, Natural Join is added to the two tables above,
The outcome will be −
EmpDetails ⋈ DeptDetails
Emp_ID | Emp_Name | Emp_Age | Emp_Location | Dept_ID | Dept_Name |
E101 | Jerry | 29 | Newyork | Dpt1 | Operations |
E102 | Henry | 32 | Newyork | Dpt2 | HR |
E103 | Tom | 22 | Texas | Dpt3 | Finance |
Therefore, the above relationship had decomposition without loss, i.e. no data loss.
Key takeaway:
- If the data is not lost from the decomposing relationship, then the decomposition is lossless.
MVD
When two attributes in a table are independent of one another but both rely on a third attribute, this is known as multivalued dependency.
A multivalued dependency includes at least three attributes since it consists of at least two attributes that are based on a third attribute.
Example: Assume that a bicycle manufacturer produces two colors (white and black) of each model each year.
BIKE_MODEL | MANUF_YEAR | COLOR |
M2011 | 2008 | White |
M2001 | 2008 | Black |
M3001 | 2013 | White |
M3001 | 2013 | Black |
M4006 | 2017 | White |
M4006 | 2017 | Black |
COLOR and MANUF YEAR are both based on BIKE MODEL and independent of each other in this case.
These two columns can be considered multivalued in this case since they are based on BIKE MODEL. The following diagram depicts these dependencies:
BIKE_MODEL → → MANUF_YEAR
BIKE_MODEL → → COLOR
"BIKE MODEL multi determined MANUF YEAR" and "BIKE MODEL multi determined COLOR" can be read as "BIKE MODEL multi determined MANUF YEAR" and "BIKE MODEL multi determined COLOR."
JDs
● Multivalued dependencies are further generalized by Join decomposition.
● We can assume that a join dependence (JD) exists if the join of R1 and R2 over C equals relation R.
● Where R1(A, B, C) and R2(C, D) are the decompositions of a given relation R. (A, B, C, D).
● R1 and R2, on the other hand, are lossless decompositions of R.
● A JD ⋈ {R1, R2,..., Rn} is said to hold over a relation R if R1, R2,....., Rn is a lossless-join decomposition.
● If the join of join's attribute equals the relation R, the *(A, B, C, D), (C, D) would be a JD of R.
● The symbol *(R1, R2, R3) denotes the relation R1, R2, R3, and so on are all JDs of R.
Key takeaway:
- When two attributes in a table are independent of one another but both rely on a third attribute, this is known as multivalued dependency.
- A multivalued dependency includes at least three attributes since it consists of at least two attributes that are based on a third attribute.
- Multivalued dependencies are further generalized by Join decomposition.
References:
- Korth, Silbertz, Sudarshan,” Database Concepts”, McGraw Hill
2. Date C J, “An Introduction to Database Systems”, Addision Wesley
3. Elmasri, Navathe, “ Fundamentals of Database Systems”, Addision Wesley
4. O’Neil, Databases, Elsevier Pub.
Unit – 3
Data Base Design & Normalization
Functional Dependency (FD) is a constraint in a database management system system that specifies the relationship of one attribute to another attribute (DBMS). Functional Dependency helps to maintain the database's data quality. Finding the difference between good and poor database design plays a critical role.
The arrow "→" signifies a functional dependence. X→ Y is defined by the functional dependence of X on Y.
Rules of functional dependencies
The three most important rules for Database Functional Dependence are below:
● Reflexive law: X holds a value of Y if X is a set of attributes and Y is a subset of X.
● Augmentation rule: If x -> y holds, and c is set as an attribute, then ac -> bc holds as well. That is to add attributes that do not modify the fundamental dependencies.
● Transitivity law: If x -> y holds and y -> z holds, this rule is very similar to the transitive rule in algebra, then x -> z also holds. X -> y is referred to as functionally evaluating y.
Types of functional dependency
Fig 1: Types of functional dependency
1. Trivial functional dependency
● A → B has trivial functional dependency if B is a subset of A.
● The following dependencies are also trivial like: A → A, B → B
Example
Consider a table with two columns Employee_Id and Employee_Name.
{Employee_id, Employee_Name} → Employee_Id is a trivial functional dependency as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies too.
2. Non - trivial functional dependencies
● A → B has a non-trivial functional dependency if B is not a subset of A.
● When A intersection B is NULL, then A → B is called as complete non-trivial.
Example:
ID → Name,
Name → DOB
Key takeaway:
- The arrow "→" signifies a functional dependence.
- Functional Dependency helps to maintain the database's data quality.
Normalization is often executed as a series of different forms. Each normal form has its own properties. As normalization proceeds, the relations become progressively more restricted in format, and also less vulnerable to update anomalies. For the relational data model, it is important to bring the relation only in first normal form (1NF) that is critical in creating relations. All the remaining forms are optional.
A relation R is said to be normalized if it does not create any anomaly for three basic operations: insert, delete and update
Fig 2: Types of normal forms
● First Normal Form (1NF):
A relation r is in 1NF if and only if every tuple contains only atomic attributes means exactly one value for each attribute. As per the rule of first normal form, an attribute of a table cannot hold multiple values. It should hold only atomic values.
Example: suppose Company has created an employee table to store name, address and mobile number of employees.
Emp_id | Emp_name | Emp_address | Emp_mobile |
101 | Rachel | Mumbai | 9817312390 |
102 | John | Pune | 7812324252, 9890012244 |
103 | Kim | Chennai | 9878381212 |
104 | Mary | Bangalore | 9895000123, 7723455987 |
In the above table, two employees John, Mary have two mobile numbers. So, the attribute emp_mobile is a Multivalued attribute.
So, this table is not in 1NF as the rule says, “Each attribute of a table must have atomic values”. But in the above table the attribute, emp_mobile, is not an atomic attribute as it contains multiple values. So, it violates the rule of 1 NF.
Following solution brings an employee table in 1NF.
Emp_id | Emp_name | Emp_address | Emp_mobile |
101 | Rachel | Mumbai | 9817312390 |
102 | John | Pune | 7812324252 |
102 | John | Pune | 9890012244 |
103 | Kim | Chennai | 9878381212 |
104 | Mary | Bangalore | 9895000123 |
104 | Mary | Bangalore | 7723455987 |
● Second Normal Form (2NF)
A table is said to be in 2NF if both of the following conditions are satisfied:
● Table is in 1 NF.
● No non-prime attribute is dependent on the proper subset of any candidate key of table. It means a non-prime attribute should fully functionally depend on the whole candidate key of a table. It should not depend on part of the key.
An attribute that is not part of any candidate key is known as a non-prime attribute.
Example: Suppose a school wants to store data of teachers and the subjects they teach. Since a teacher can teach more than one subject, the table can have multiple rows for the same teacher.
Teacher_id | Subject | Teacher_age |
111 | DSF | 28 |
111 | DBMS | 28 |
222 | CNT | 35 |
333 | OOPL | 38 |
333 | FDS | 38 |
For above table:
Candidate Keys: {Teacher_Id, Subject}
Non prime attribute: Teacher_Age
The table is in 1 NF because each attribute has atomic values. However, it is not in 2NF because non-prime attribute Teacher_Age is dependent on Teacher_Id alone which is a proper subset of candidate key. This violates the rule for 2NF as the rule says “no non-prime attribute is dependent on the proper subset of any candidate key of the table”.
To bring above table in 2NF we can break it in two tables (Teacher_Detalis and
Teacher_Subject) like this:
Teacher_Details table:
Teacher_id | Teacher_age |
111 | 28 |
222 | 35 |
333 | 38 |
Teacher_Subject table:
Teacher_id | Subject |
111 | DSF |
111 | DBMS |
222 | CNT |
333 | OOPL |
333 | FDS |
Now these two tables are in 2NF.
● Third Normal Form (3NF)
A table design is said to be in 3NF if both the following conditions hold:
● Table must be in 2NF.
● Transitive functional dependency from the relation must be removed.
So it can be stated that, a table is in 3NF if it is in 2NF and for each functional dependency
P->Q at least one of the following conditions hold:
● P is a super key of table
● Q is a prime attribute of table
An attribute that is a part of one of the candidate keys is known as a prime attribute.
Transitive functional dependency:
A functional dependency is said to be transitive if it is indirectly formed by two functional dependencies.
For example:
P->R is a transitive dependency if the following three functional dependencies hold true:
1) P->Q and
2) Q->R
Example: suppose a company wants to store information about employees. Then the table Employee_Details looks like this:
Emp_id | Emp_name | Manager_id | Mgr_Dept | Mgr_Name |
E1 | Hary | M1 | IT | William |
E2 | John | M1 | IT | William |
E3 | Nil | M2 | SALES | Stephen |
E4 | Mery | M3 | HR | Johnson |
E5 | Steve | M2 | SALSE | Stephen |
Super keys: {emp_id}, {emp_id, emp_name}, {emp_id, emp_name, Manager_id}
Candidate Key: {emp_id}
Non-prime attributes: All attributes except emp_id are non-prime as they are not subpart part of any candidate keys.
Here, Mgr_Dept, Mgr_Name depend on Manager_id. And, Manager_id is dependent on emp_id that makes non-prime attributes (Mgr_Dept, Mgr_Name) transitively dependent on super key (emp_id). This violates the rule of 3NF.
To bring this table in 3NF we have to break into two tables to remove transitive dependency.
Employee_Details table:
Emp_id | Emp_name | Manager_id |
E1 | Hary | M1 |
E2 | John | M1 |
E3 | Nil | M2 |
E4 | Mery | M3 |
E5 | Steve | M2 |
Manager_Details table:
Manager_id | Mgr_Dept | Mgr_Name |
M1 | IT | William |
M2 | SALSE | Stephen |
M3 | HR | Johnson |
● Boyce Codd normal form (BCNF)
It is an advanced version of 3NF. BCNF is stricter than 3NF. A table complies with BCNF if it is in 3NF and for every functional dependency X->Y, X should be the super key of the table.
Example: Suppose there is a company wherein employees work in more than one
Department. They store the data like this:
Emp_id | Emp_nationality | Emp_dept | Dept_type | Dept_no_of_emp |
101 | Indian | Planning | D01 | 100 |
101 | Indian | Accounting | D01 | 50 |
102 | Japanese | Technical support | D14 | 300 |
102 | Japanese | Sales | D14 | 100 |
Functional dependencies in the table above:
Emp_id ->emp_nationality
Emp_dept -> {dept_type, dept_no_of_emp}
Candidate key: {emp_id, emp_dept}
The table is not in BCNF as neither emp_id nor emp_dept alone are keys. To bring this table in BCNF we can break this table in three tables like:
Emp_nationality table:
Emp_id | Emp_nationality |
101 | Indian |
102 | Japanese |
Emp_dept table:
Emp_dept | Dept_type | Dept_no_of_emp |
Planning | D01 | 100 |
Accounting | D01 | 50 |
Technical support | D14 | 300 |
Sales | D14 | 100 |
Emp_dept_mapping table:
Emp_id | Emp_dept |
101 | Planning |
101 | Accounting |
102 | Technical support |
102 | Sales |
Functional dependencies:
Emp_id ->; emp_nationality
Emp_dept -> {dept_type, dept_no_of_emp}
Candidate keys:
For first table: emp_id
For second table: emp_dept
For third table: {emp_id, emp_dept}
This is now in BCNF as in both functional dependencies the left side is a key.
Key takeaway:
- A relation r is in 1NF if and only if every tuple contains only atomic attributes means exactly one value for each attribute.
- A relation is in the 4NF if it is in BCNF and has no multi valued dependencies.
● Although multivalued and join dependencies are less popular than functional dependencies, they can be used to direct database design.
● Dependencies on inclusion are very common. They usually have little say in how the database is designed.
● A statement in which certain columns of a relation are contained in other columns is known as an inclusion dependence.
● A foreign key is an example of inclusion dependence. The referencing relation is found in the primary main column(s) of the referenced relation in one relation.
● Let's say we have two relations R and S that were created by translating two entity sets so that any R entity is also a S entity.
● If projecting R on its key attributes yields a relation that is embedded in the relation obtained by projecting S on its key attributes, inclusion dependence exists.
● We should not break groups of attributes that participate in an inclusion dependence in inclusion dependency.
● In fact, the majority of inclusion dependencies are key-based, meaning that only keys are involved.
Key takeaway:
- A statement in which certain columns of a relation are contained in other columns is known as an inclusion dependence.
- We should not break groups of attributes that participate in an inclusion dependence in inclusion dependency.
● If the data is not lost from the decomposing relationship, then the decomposition is lossless.
● The lossless decomposition ensures that the union of relationships can lead to the decomposition of the same relationship.
● If natural joins of all the decomposition give the original relation, the relationship is said to be lossless decomposition.
Now let's see an instance:
<EmpInfo>
Emp_ID | Emp_Name | Emp_Age | Emp_Location | Dept_ID | Dept_Name |
E101 | Jerry | 29 | Newyork | Dpt1 | Operations |
E102 | Henry | 32 | Newyork | Dpt2 | HR |
E103 | Tom | 22 | Texas | Dpt3 | Finance |
Decompose the table listed above into two tables:
<EmpDetails>
Emp_ID | Emp_Name | Emp_Age | Emp_Location |
E101 | Jerry | 29 | Newyork |
E102 | Henry | 32 | Newyork |
E103 | Tom | 22 | Texas |
<DeptDetails>
Dept_ID | Emp_ID | Dept_Name |
Dpt1 | E101 | Operations |
Dpt2 | E102 | HR |
Dpt3 | E103 | Finance |
Now, Natural Join is added to the two tables above,
The outcome will be −
EmpDetails ⋈ DeptDetails
Emp_ID | Emp_Name | Emp_Age | Emp_Location | Dept_ID | Dept_Name |
E101 | Jerry | 29 | Newyork | Dpt1 | Operations |
E102 | Henry | 32 | Newyork | Dpt2 | HR |
E103 | Tom | 22 | Texas | Dpt3 | Finance |
Therefore, the above relationship had decomposition without loss, i.e. no data loss.
Key takeaway:
- If the data is not lost from the decomposing relationship, then the decomposition is lossless.
MVD
When two attributes in a table are independent of one another but both rely on a third attribute, this is known as multivalued dependency.
A multivalued dependency includes at least three attributes since it consists of at least two attributes that are based on a third attribute.
Example: Assume that a bicycle manufacturer produces two colors (white and black) of each model each year.
BIKE_MODEL | MANUF_YEAR | COLOR |
M2011 | 2008 | White |
M2001 | 2008 | Black |
M3001 | 2013 | White |
M3001 | 2013 | Black |
M4006 | 2017 | White |
M4006 | 2017 | Black |
COLOR and MANUF YEAR are both based on BIKE MODEL and independent of each other in this case.
These two columns can be considered multivalued in this case since they are based on BIKE MODEL. The following diagram depicts these dependencies:
BIKE_MODEL → → MANUF_YEAR
BIKE_MODEL → → COLOR
"BIKE MODEL multi determined MANUF YEAR" and "BIKE MODEL multi determined COLOR" can be read as "BIKE MODEL multi determined MANUF YEAR" and "BIKE MODEL multi determined COLOR."
JDs
● Multivalued dependencies are further generalized by Join decomposition.
● We can assume that a join dependence (JD) exists if the join of R1 and R2 over C equals relation R.
● Where R1(A, B, C) and R2(C, D) are the decompositions of a given relation R. (A, B, C, D).
● R1 and R2, on the other hand, are lossless decompositions of R.
● A JD ⋈ {R1, R2,..., Rn} is said to hold over a relation R if R1, R2,....., Rn is a lossless-join decomposition.
● If the join of join's attribute equals the relation R, the *(A, B, C, D), (C, D) would be a JD of R.
● The symbol *(R1, R2, R3) denotes the relation R1, R2, R3, and so on are all JDs of R.
Key takeaway:
- When two attributes in a table are independent of one another but both rely on a third attribute, this is known as multivalued dependency.
- A multivalued dependency includes at least three attributes since it consists of at least two attributes that are based on a third attribute.
- Multivalued dependencies are further generalized by Join decomposition.
References:
- Korth, Silbertz, Sudarshan,” Database Concepts”, McGraw Hill
2. Date C J, “An Introduction to Database Systems”, Addision Wesley
3. Elmasri, Navathe, “ Fundamentals of Database Systems”, Addision Wesley
4. O’Neil, Databases, Elsevier Pub.