Unit - 3
Database Design
Q1) Write short notes on dependencies and normal forms?
A1) Dependency is a constraint that determines how attributes are related. It also refers to a connection in which knowing the value of one characteristic in a database is sufficient to determine the value of another property in the same table.
Dependencies in DBMS is a relation between two or more attributes. It has the following types in DBMS −
● Functional Dependency
● Fully-Functional Dependency
● Transitive Dependency
● Multivalued Dependency
● Partial Dependency
Normal forms
A database is a collection of data that is stored in a structured format. These records can be accessed and altered in a variety of ways throughout the database. Data redundancy occurs when a portion of the same data information is saved or replicated in multiple locations within the database due to the vast range of storage available.
When it comes to storage, consistency, and data processing, a database with redundancy is at a disadvantage. Normalization is one of the approaches used to alleviate data redundancy issues within a database.
In a database management system, normalization is the process of properly structuring data into many relational tables in order to reduce data redundancy.
It is crucial to comprehend functional dependency before understanding the standard forms.
A functional dependency is a relationship between two qualities, usually a prime (primary key) and non-prime attributes. The functional dependency for a table X comprising characteristics A and B, among which attribute A is a primary key, is defined as:
The term "database normalization" refers to a method of structuring data in a database. Normalization is a method of systematically dissecting tables in order to remove data redundancy (repetition) and undesired characteristics such as Insertion, Update, and Deletion Anomalies. It's a multi-step procedure for converting data into tabular format and deleting duplicate data from relation tables.
Normalization is done for two main reasons.
● Data that is redundant (or unnecessary) is removed.
● Assuring that data dependencies make sense, i.e. that data is stored properly.
Types of Normal forms
● 1NF (First Normal Form)
● 2NF (Second Normal Form)
● 3NF (Third Normal Form)
● BCNF (Boyce-Codd Normal Form)
● 4NF (Fourth Normal Form)
● 5NF (Fifth Normal Form)
Q2) What is functional dependencies?
A2) Functional Dependency (FD) is a constraint in a database management system that specifies the relationship of one attribute to another attribute (DBMS). Functional Dependency helps to maintain the database's data quality. Finding the difference between good and poor database design plays a critical role.
The arrow "→" signifies a functional dependence. X→ Y is defined by the functional dependence of X on Y.
Rules of functional dependencies
The three most important rules for Database Functional Dependence are below:
● Reflexive law: X holds a value of Y if X is a set of attributes and Y is a subset of X.
● Augmentation rule: If x -> y holds, and c is set as an attribute, then ac -> bc holds as well. That is to add attributes that do not modify the fundamental dependencies.
● Transitivity law: If x -> y holds and y -> z holds, this rule is very similar to the transitive rule in algebra, then x -> z also holds. X -> y is referred to as functionally evaluating y.
Types of functional dependency
Fig: Types of functional dependency
1. Trivial functional dependency
● A → B has trivial functional dependency if B is a subset of A.
● The following dependencies are also trivial like: A → A, B → B
Example
Consider a table with two columns Employee_Id and Employee_Name.
{Employee_id, Employee_Name} → Employee_Id is a trivial functional dependency as
Employee_Id is a subset of {Employee_Id, Employee_Name}.
Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies too.
2. Non - trivial functional dependencies
● A → B has a non-trivial functional dependency if B is not a subset of A.
● When A intersection B is NULL, then A → B is called as complete non-trivial.
Example:
ID → Name,
Name → DOB
Q3) What are the Armstrong’s axioms for FD?
A3) The Axioms of Armstrong are a series of rules. It provides a straightforward method for deducing functional dependencies. It was created in 1974 by William W. Armstrong. It's used to deduce all of a relational database's functional dependencies.
The phrase Armstrong axioms refers to William W.Armstrong's sound and full collection of inference rules or axioms for testing the logical implication of functional dependencies. If F is a collection of functional dependencies, then F+ is the collection of all functional dependencies logically suggested by F. Armstrong's Axioms are a set of rules that, when regularly followed, result in the closure of functional dependencies.
Various Axioms Rules
- Primary Rules
Rule 1- Reflexivity
If A is a set of attributes and B is a subset of A, then A holds B. { A → B }.
Rule 2 - Augmentation
If A hold B and C is a set of attributes, then AC holds BC. {AC → BC}.
It means that adding an attribute to a dependency does not affect the basic dependencies.
Rule 3 - Transitivity
If A is holding B and B is holding C, then A is holding C.
If {A → B} and {B → C}, then {A → C}
A holds B {A → B} means that A functionally determines B.
- Secondary Rules
Rule 1- Union
A holds BC if A holds B and A holds C.
If{A → B} and {A → C}, then {A → BC}
Rule 2 - Decomposition
A holds C if BC is held by A and B is held by A.
If{A → BC} and {A → B}, then {A → C}
Rule 3 - Pseudo Transitivity
If A is in possession of B and BC is in possession of D, then AC is in possession of D.
If{A → B} and {BC → D}, then {AC → D}
Q4) Write closure set of FD?
A4) Closure of an Attribute: Closure of an Attribute can be defined as a set of attributes that can be functionally determined from it.
OR
Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F
Closure of a set of attributes X concerning F is the set X+ of all attributes that are functionally determined by X
Pseudocode to find Closure of an Attribute
Determine X+, the closure of X under functional dependency set F
X Closure: = will contain X itself;
Repeat the process as:
Old X Closure: = X Closure;
For each functional dependency P → Q in FD set do
If X Closure is subset of P then X Closure: = X Closure U Q ;
Repeat until (X Closure = old X Closure);
Algorithm of Determining X+, the Closure of X under F
Input: A set F of FDs on a relation schema R, and a set of attributes X, which is a
Subset of R.
- X+: = X;
- Repeat
- OldX+: = X+;
- For each functional dependency Y → Z in F do
- If X+ ⊇ Y then X+: = X+ ∪ Z;
- Until (X+ = oldX+);
QUESTIONS ON CLOSURE SET OF ATTRIBUTE:
1) Given relational schema R( P Q R S T U V) having following attribute P Q R S T U and V, also there is a set of functional dependency denoted by FD = { P->Q, QR->ST, PTV->V }.
Determine Closure of (QR)+ and (PR)+
a) QR+ = QR (as the closure of an attribute or set of attributes contain same).
Now as per algorithm look into a set of FD that complete the left side of any FD contains either Q, R, or QR since in FD QR→ST has complete QR.
Hence QR+ = QRST
Again, trace the remaining two FD that any left part of FD contains any Q, R, S, T.
Since no complete left side of the remaining two FD{P->Q, PTV->V} contain Q, R, S, T.
Therefore QR+ = QRST (Answer)
Note: In FD PTV→V, T is in QRST but that cannot be entertained, as complete PTV should be a subset of QRST
b) PR + = PR (as the closure of an attribute or set of attributes contain same)
Now as per algorithm look into a set of FD, and check that complete left side of any FD contains either P, R, or PR. Since in FD P→Q, P is a subset of PR, Hence PR+ = PRQ
Again, trace the remaining two FD that any left part of FD contains any P, R, Q, Since, in FD QR → ST has its complete left part QR in PQR
Hence PR+ = PRQST
Again trace the remaining one FD {PTV->V } that its complete left belongs to PRQST. Since complete PTV is not in PRQST, hence we ignore it.
Therefore PR+ = PRQST (Answer)
2. Given relational schema R(P Q R S T) having following attributes P Q R S and T, also there is a set of functional dependency denoted by FD = { P->QR, RS->T, Q->S, T-> P }.
Determine Closure of (T)+
T + = T (as the closure of an attribute or set of attributes contain same) Now as per algorithm look into a set of FD that complete the left side of any FD contains T since, in FD T → P, T is in T, Hence T+ = TP Again trace the remaining three FD that any left part of FD contain any TP, Since in FD P → QR has its complete left part P in TP, Hence T+ = TPQR Again trace the remaining two FD { RS->T, Q->S } that any of its Complete left belongs to TPQR,
Since in FD Q → S has its complete left part Q in TPQR, Hence T+ = TPQRS Again trace the remaining one FD {RS->T } that its complete left belongs to TPQRS, Since in FD RS → T has its complete left part RS in TPQRS Hence T+ = TPQRS ( no changes, as T, is already in TPQRS) Therefore T+ = TPQRS ( Answer).
Q5) Explain minimal cover?
A5) A basic protection for a group of functional dependencies (FD) E is a collection of minimum dependencies F that is equivalent to E.
The formal definition is as follows: A set of FD F is minimal if it meets the following conditions:
● Every right-hand side dependence in F has a single attribute.
● We can't substitute any X->A dependency in F with a Y->A dependency, where Y is a suitable subset of X, and still have a collection of dependencies comparable to F.
● We can't take away any of F's dependencies and still have a set of dependencies that are comparable to F.
Canonical cover is referred to as minimal cover, and the minimum set of FD’s is referred to as the minimum set of FDs. If each FD in FC is a, a set of FD FC is termed canonical cover of F.
● Simple FD.
● Left reduced FD.
● Non-redundant FD.
Simple FD - If Y is a single attribute, X->Y is a simple FD.
Left reduced FD - If X has no unnecessary attributes, X->Y is a left reduced FD. {Extraneous attributes: If XA->Y, then A is an extraneous attribute.}
Non-reduced FD - If X->Y cannot be deduced from F- {X->y}, it is a non-redundant FD.
Example - Consider the following scenario for locating the canonical cover of F.
The following are the functional dependencies:
A -> BC
B -> C
A -> B
AB -> C
Minimal cover: The set of FD’s that are comparable to the supplied FD’s is known as the minimal cover.
Canonical cover: The LHS (Left Hand Side) of a canonical cover must be distinct.
The minimum cover will be found first, followed by the canonical cover.
First step - Toggle between RHS and singleton attributes.
A -> B
A -> C
B -> C
A -> B
AB -> C
Second step - Remove the LHS attribute that isn't needed.
Locate A's closure.
A+ = {A, B, C}
So, AB -> C can be converted into A -> C
A -> B
A -> C
B -> C
A -> B
A -> C
Third step - Remove the FD’s that are no longer needed.
A -> B
B -> C
The aforementioned set of FD’s will now be converted into canonical cover.
The following will be the canonical cover for the above set of FD’s:
A -> BC
B -> C
Q6) What is first normal forms(1NF)?
A6) A relation will be 1NF if it contains an atomic value.
● It states that an attribute of a table cannot hold multiple values. It must hold only single-valued attribute.
● First normal form disallows the multi-valued attribute, composite attribute, and their combinations.
Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute EMP_PHONE.
EMPLOYEE table:
EMP_ID | EMP_NAME | EMP_PHONE | EMP_STATE |
14 | John | 7272826385, 9064738238 | UP |
20 | Harry | 8574783832 | Bihar |
12 | Sam | 7390372389, 8589830302 | Punjab |
The decomposition of the EMPLOYEE table into 1NF has been shown below:
EMP_ID | EMP_NAME | EMP_PHONE | EMP_STATE |
14 | John | 7272826385 | UP |
14 | John | 9064738238 | UP |
20 | Harry | 8574783832 | Bihar |
12 | Sam | 7390372389 | Punjab |
12 | Sam | 8589830302 | Punjab |
Q7) Define second normal forms (2NF)?
A7) In the 2NF, relational must be in 1NF.
In the second normal form, all non-key attributes are fully functional dependent on the primary key
Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a school, a teacher can teach more than one subject.
TEACHER table
TEACHER_ID | SUBJECT | TEACHER_AGE |
25 | Chemistry | 30 |
25 | Biology | 30 |
47 | English | 35 |
83 | Math | 38 |
83 | Computer | 38 |
In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which is a proper subset of a candidate key. That's why it violates the rule for 2NF.
To convert the given table into 2NF, we decompose it into two tables:
TEACHER_DETAIL table:
TEACHER_ID | TEACHER_AGE |
25 | 30 |
47 | 35 |
83 | 38 |
TEACHER_SUBJECT table:
TEACHER_ID | SUBJECT |
25 | Chemistry |
25 | Biology |
47 | English |
83 | Math |
83 | Computer |
Q8) Write about third normal forms (3NF)?
A8) A relation will be in 3NF if it is in 2NF and not contain any transitive partial dependency.
● 3NF is used to reduce the data duplication. It is also used to achieve the data integrity.
● If there is no transitive dependency for non-prime attributes, then the relation must be in third normal form.
A relation is in third normal form if it holds atleast one of the following conditions for every non-trivial function dependency X → Y.
- X is a super key.
- Y is a prime attribute, i.e., each element of Y is part of some candidate key.
Example:
EMPLOYEE_DETAIL table:
EMP_ID | EMP_NAME | EMP_ZIP | EMP_STATE | EMP_CITY |
222 | Harry | 201010 | UP | Noida |
333 | Stephan | 02228 | US | Boston |
444 | Lan | 60007 | US | Chicago |
555 | Katharine | 06389 | UK | Norwich |
666 | John | 462007 | MP | Bhopal |
Super key in the table above:
- {EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}....so on
Candidate key: {EMP_ID}
Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.
Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP dependent on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) transitively dependent on super key(EMP_ID). It violates the rule of third normal form.
That's why we need to move the EMP_CITY and EMP_STATE to the new <EMPLOYEE_ZIP> table, with EMP_ZIP as a Primary key.
EMPLOYEE table:
EMP_ID | EMP_NAME | EMP_ZIP |
222 | Harry | 201010 |
333 | Stephan | 02228 |
444 | Lan | 60007 |
555 | Katharine | 06389 |
666 | John | 462007 |
EMPLOYEE_ZIP table:
EMP_ZIP | EMP_STATE | EMP_CITY |
201010 | UP | Noida |
02228 | US | Boston |
60007 | US | Chicago |
06389 | UK | Norwich |
462007 | MP | Bhopal |
Q9) What is Boyce code normal forms (BCNF)?
A9) BCNF is the advance version of 3NF. It is stricter than 3NF.
● A table is in BCNF if every functional dependency X → Y, X is the super key of the table.
● For BCNF, the table should be in 3NF, and for every FD, LHS is super key.
Example: Let's assume there is a company where employees work in more than one department.
EMPLOYEE table:
EMP_ID | EMP_COUNTRY | EMP_DEPT | DEPT_TYPE | EMP_DEPT_NO |
264 | India | Designing | D394 | 283 |
264 | India | Testing | D394 | 300 |
364 | UK | Stores | D283 | 232 |
364 | UK | Developing | D283 | 549 |
In the above table Functional dependencies are as follows:
- EMP_ID → EMP_COUNTRY
- EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate key: {EMP-ID, EMP-DEPT}
The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone are keys.
To convert the given table into BCNF, we decompose it into three tables:
EMP_COUNTRY table:
EMP_ID | EMP_COUNTRY |
264 | India |
264 | India |
EMP_DEPT table:
EMP_DEPT | DEPT_TYPE | EMP_DEPT_NO |
Designing | D394 | 283 |
Testing | D394 | 300 |
Stores | D283 | 232 |
Developing | D283 | 549 |
EMP_DEPT_MAPPING table:
EMP_ID | EMP_DEPT |
D394 | 283 |
D394 | 300 |
D283 | 232 |
D283 | 549 |
Functional dependencies:
- EMP_ID → EMP_COUNTRY
- EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}
Candidate keys:
For the first table: EMP_ID
For the second table: EMP_DEPT
For the third table: {EMP_ID, EMP_DEPT}
Now, this is in BCNF because left side part of both the functional dependencies is a key.
Q10) Explain fourth normal forms (4NF)?
A10) A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.
● For a dependency A→ B, if for a single value of A, multiple values of B exists, then the relation will be a multi-valued dependency.
Example
STUDENT
STU_ID | COURSE | HOBBY |
21 | Computer | Dancing |
21 | Math | Singing |
34 | Chemistry | Dancing |
74 | Biology | Cricket |
59 | Physics | Hockey |
The given STUDENT table is in 3NF, but the COURSE and HOBBY are two independent entity. Hence, there is no relationship between COURSE and HOBBY.
In the STUDENT relation, a student with STU_ID, 21 contains two courses, Computer and Math and two hobbies, Dancing and Singing. So there is a Multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.
So to make the above table into 4NF, we can decompose it into two tables:
STUDENT_COURSE
STU_ID | COURSE |
21 | Computer |
21 | Math |
34 | Chemistry |
74 | Biology |
59 | Physics |
STUDENT_HOBBY
STU_ID | HOBBY |
21 | Dancing |
21 | Singing |
34 | Dancing |
74 | Cricket |
59 | Hockey |
Q11) Define fifth normal forms (5NF)?
A11) A relation is in 5NF if it is in 4NF and not contains any join dependency and joining should be lossless.
● 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy.
● 5NF is also known as Project-join normal form (PJ/NF).
Example
SUBJECT | LECTURER | SEMESTER |
Computer | Anshika | Semester 1 |
Computer | John | Semester 1 |
Math | John | Semester 1 |
Math | Akash | Semester 2 |
Chemistry | Praveen | Semester 1 |
In the above table, John takes both Computer and Math class for Semester 1 but he doesn't take Math class for Semester 2. In this case, combination of all these fields required to identify a valid data.
Suppose we add a new Semester as Semester 3 but do not know about the subject and who will be taking that subject so we leave Lecturer and Subject as NULL. But all three columns together acts as a primary key, so we can't leave other two columns blank.
So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:
P1
SEMESTER | SUBJECT |
Semester 1 | Computer |
Semester 1 | Math |
Semester 1 | Chemistry |
Semester 2 | Math |
P2
SUBJECT | LECTURER |
Computer | Anshika |
Computer | John |
Math | John |
Math | Akash |
Chemistry | Praveen |
P3
SEMSTER | LECTURER |
Semester 1 | Anshika |
Semester 1 | John |
Semester 1 | John |
Semester 2 | Akash |
Semester 1 | Praveen |
Q12) What is decompositions?
A12) Relational Decomposition
● When a relation in the relational model is not in appropriate normal form then the decomposition of a relation is required.
● In a database, it breaks the table into multiple tables.
● If the relation has no proper decomposition, then it may lead to problems like loss of information.
● Decomposition is used to eliminate some of the problems of bad design like anomalies, inconsistencies, and redundancy.
Types of Decomposition
Fig - Decomposition
Lossless Decomposition
● If the information is not lost from the relation that is decomposed, then the decomposition will be lossless.
● The lossless decomposition guarantees that the join of relations will result in the same relation as it was decomposed.
● The relation is said to be lossless decomposition if natural joins of all the decomposition give the original relation.
Example:
EMPLOYEE_DEPARTMENT table:
EMP_ID | EMP_NAME | EMP_AGE | EMP_CITY | DEPT_ID | DEPT_NAME |
22 | Denim | 28 | Mumbai | 827 | Sales |
33 | Alina | 25 | Delhi | 438 | Marketing |
46 | Stephan | 30 | Bangalore | 869 | Finance |
52 | Katherine | 36 | Mumbai | 575 | Production |
60 | Jack | 40 | Noida | 678 | Testing |
The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT
EMPLOYEE table:
EMP_ID | EMP_NAME | EMP_AGE | EMP_CITY |
22 | Denim | 28 | Mumbai |
33 | Alina | 25 | Delhi |
46 | Stephan | 30 | Bangalore |
52 | Katherine | 36 | Mumbai |
60 | Jack | 40 | Noida |
DEPARTMENT table
DEPT_ID | EMP_ID | DEPT_NAME |
827 | 22 | Sales |
438 | 33 | Marketing |
869 | 46 | Finance |
575 | 52 | Production |
678 | 60 | Testing |
Now, when these two relations are joined on the common column "EMP_ID", then the resultant relation will look like:
Employee ⋈ Department
EMP_ID | EMP_NAME | EMP_AGE | EMP_CITY | DEPT_ID | DEPT_NAME |
22 | Denim | 28 | Mumbai | 827 | Sales |
33 | Alina | 25 | Delhi | 438 | Marketing |
46 | Stephan | 30 | Bangalore | 869 | Finance |
52 | Katherine | 36 | Mumbai | 575 | Production |
60 | Jack | 40 | Noida | 678 | Testing |
Hence, the decomposition is Lossless join decomposition.
Q13) What are the decomposition algorithm for 3NF?
A13) By explicitly generating a schema for each dependency in the canonical cover, the decomposition procedure for 3NF ensures that dependencies are preserved. It assures that at least one schema has a candidate key for the one being decomposed, ensuring that the decomposition created is a lossless decomposition.
Decomposition Algorithm
Let Fc be a canonical cover for F;
i=0;
For each functional dependency α->β in Fc
i = i+1;
R = αβ;
If none of the schemas Rj, j=1,2,…I holds a candidate key for R
Then
i = i+1;
Ri= any candidate key for R;
/* Optionally, remove the repetitive relations*/
Repeat
If any schema Rj is contained in another schema Rk
Then
/* Delete Rj */
Rj = Ri;
i = i-1;
Until no more Rjs can be deleted
Return (R1, R2, . . . ,Ri)
The supplied relation is R, and the given collection of functional dependencies is F, for which Fc maintains the canonical cover. The decomposed portions of the given relation R are R1, R2,..., Ri. As a result, this technique preserves the dependency while also generating a lossless decomposition of R.
A 3NF synthesis algorithm is another name for a 3NF algorithm. It's called so because the regular form works with a dependency set and adds one schema at a time, rather than repeatedly dissecting the basic schema.
Q14) What are the decomposition algorithm for BCNF?
A14) It is important to check if the given relation is in Boyce-Codd Normal Form before applying the BCNF decomposition technique on it. If it is discovered that the supplied relation is not in BCNF after the test, we can decompose it further to produce BCNF relations.
The following situations necessitate determining whether the supplied relation schema R follows the BCNF rule:
Case 1: Evaluate and compute α+, i.e., the attribute closure of to see if a nontrivial dependency α -> β violates the BCNF rule. Check that + contains all of the attributes of the supplied relation R. It should, therefore, be the super key of relation R.
Case 2: It is not necessary to test all of the dependencies in F+ if the specified relation R is in BCNF. For the BCNF test, all that is required is detecting and checking the dependencies in the specified dependency list F. It's because if no dependent in F violates BCNF, then none of the F+ dependencies will as well.
Decomposition Algorithm
If the supplied relation R is deconstructed into numerous relations R1, R2,..., Rn since it was not found in the BCNF, this algorithm is employed. Thus,
We must validate that α+ (an attribute closure of under F) either includes all the attributes of the relation Ri or no attribute of Ri-α for each subset of attributes in the relation Ri.
Result={R};
Done=false;
Compute F+;
While (not done) do
If (there is a schema Ri in result that is not in BCNF)
Then begin
Let α->β be a nontrivial functional dependency that holds
On Ri such that α->Ri is not in F+, and α ꓵ β= ø;
Result=(result-Ri) U (Ri-β) U (α,β);
End
Else done=true;
This procedure is used to break down a given relation R into its decomposers. This approach does the breakdown of the relation R using dependencies that demonstrate the violation of BCNF. As a result, such an algorithm not only generates relation R decomposers in BCNF, but it is also a lossless decomposition. It signifies that no data is lost when the specified relation R is decomposed into R1, R2, and so on...
The time it takes for the BCNF decomposition procedure to complete is proportional to the size of the original relation schema R. As a result, one disadvantage of this technique is that it may excessively breakdown the given relation R, i.e., over-normalize it.
The decomposition methods for BCNF and 4NF are nearly identical, with one exception. The fourth normal form is concerned with multivalued dependencies, while BCNF is concerned with functional dependencies. The multivalued dependencies aid in reducing data repetition, which is difficult to comprehend in terms of functional relationships.