Unit 3 | unit 3 database design

DBMS

Unit - 3

Database Design

Q1) Write short notes on dependencies and normal forms?

A1) Dependency is a constraint that determines how attributes are related. It also refers to a connection in which knowing the value of one characteristic in a database is sufficient to determine the value of another property in the same table.

Dependencies in DBMS is a relation between two or more attributes. It has the following types in DBMS −

● Functional Dependency

● Fully-Functional Dependency

● Transitive Dependency

● Multivalued Dependency

● Partial Dependency

Normal forms

A database is a collection of data that is stored in a structured format. These records can be accessed and altered in a variety of ways throughout the database. Data redundancy occurs when a portion of the same data information is saved or replicated in multiple locations within the database due to the vast range of storage available.

When it comes to storage, consistency, and data processing, a database with redundancy is at a disadvantage. Normalization is one of the approaches used to alleviate data redundancy issues within a database.

In a database management system, normalization is the process of properly structuring data into many relational tables in order to reduce data redundancy.

It is crucial to comprehend functional dependency before understanding the standard forms.

A functional dependency is a relationship between two qualities, usually a prime (primary key) and non-prime attributes. The functional dependency for a table X comprising characteristics A and B, among which attribute A is a primary key, is defined as:

The term "database normalization" refers to a method of structuring data in a database. Normalization is a method of systematically dissecting tables in order to remove data redundancy (repetition) and undesired characteristics such as Insertion, Update, and Deletion Anomalies. It's a multi-step procedure for converting data into tabular format and deleting duplicate data from relation tables.

Normalization is done for two main reasons.

● Data that is redundant (or unnecessary) is removed.

● Assuring that data dependencies make sense, i.e. that data is stored properly.

Types of Normal forms

● 1NF (First Normal Form)

● 2NF (Second Normal Form)

● 3NF (Third Normal Form)

● BCNF (Boyce-Codd Normal Form)

● 4NF (Fourth Normal Form)

● 5NF (Fifth Normal Form)

Q2) What is functional dependencies?

A2) Functional Dependency (FD) is a constraint in a database management system that specifies the relationship of one attribute to another attribute (DBMS). Functional Dependency helps to maintain the database's data quality. Finding the difference between good and poor database design plays a critical role.

The arrow "→" signifies a functional dependence. X→ Y is defined by the functional dependence of X on Y.

Rules of functional dependencies

The three most important rules for Database Functional Dependence are below:

● Reflexive law: X holds a value of Y if X is a set of attributes and Y is a subset of X.

● Augmentation rule: If x -> y holds, and c is set as an attribute, then ac -> bc holds as well. That is to add attributes that do not modify the fundamental dependencies.

● Transitivity law: If x -> y holds and y -> z holds, this rule is very similar to the transitive rule in algebra, then x -> z also holds. X -> y is referred to as functionally evaluating y.

Types of functional dependency

Fig: Types of functional dependency

1. Trivial functional dependency

● A → B has trivial functional dependency if B is a subset of A.

● The following dependencies are also trivial like: A → A, B → B

Example

Consider a table with two columns Employee_Id and Employee_Name.

{Employee_id, Employee_Name} → Employee_Id is a trivial functional dependency as

Employee_Id is a subset of {Employee_Id, Employee_Name}.

Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies too.

2. Non - trivial functional dependencies

● A → B has a non-trivial functional dependency if B is not a subset of A.

● When A intersection B is NULL, then A → B is called as complete non-trivial.

Example:

ID → Name,

Name → DOB

Q3) What are the Armstrong’s axioms for FD?

A3) The Axioms of Armstrong are a series of rules. It provides a straightforward method for deducing functional dependencies. It was created in 1974 by William W. Armstrong. It's used to deduce all of a relational database's functional dependencies.

The phrase Armstrong axioms refers to William W.Armstrong's sound and full collection of inference rules or axioms for testing the logical implication of functional dependencies. If F is a collection of functional dependencies, then F+ is the collection of all functional dependencies logically suggested by F. Armstrong's Axioms are a set of rules that, when regularly followed, result in the closure of functional dependencies.

Various Axioms Rules

Primary Rules

Rule 1- Reflexivity

If A is a set of attributes and B is a subset of A, then A holds B. { A → B }.

Rule 2 - Augmentation

If A hold B and C is a set of attributes, then AC holds BC. {AC → BC}.

It means that adding an attribute to a dependency does not affect the basic dependencies.

Rule 3 - Transitivity

If A is holding B and B is holding C, then A is holding C.

If {A → B} and {B → C}, then {A → C}

A holds B {A → B} means that A functionally determines B.

Secondary Rules

Rule 1- Union

A holds BC if A holds B and A holds C.

If{A → B} and {A → C}, then {A → BC}

Rule 2 - Decomposition

A holds C if BC is held by A and B is held by A.

If{A → BC} and {A → B}, then {A → C}

Rule 3 - Pseudo Transitivity

If A is in possession of B and BC is in possession of D, then AC is in possession of D.

If{A → B} and {BC → D}, then {AC → D}

Q4) Write closure set of FD?

A4) Closure of an Attribute: Closure of an Attribute can be defined as a set of attributes that can be functionally determined from it.

Closure of a set F of FDs is the set F+ of all FDs that can be inferred from F

Closure of a set of attributes X concerning F is the set X+ of all attributes that are functionally determined by X

Pseudocode to find Closure of an Attribute

Determine X+, the closure of X under functional dependency set F

X Closure: = will contain X itself;

Repeat the process as:

Old X Closure: = X Closure;

For each functional dependency P → Q in FD set do

If X Closure is subset of P then X Closure: = X Closure U Q ;

Repeat until (X Closure = old X Closure);

Algorithm of Determining X+, the Closure of X under F

Input: A set F of FDs on a relation schema R, and a set of attributes X, which is a

Subset of R.

X+: = X;
Repeat
OldX+: = X+;
For each functional dependency Y → Z in F do
If X+ ⊇ Y then X+: = X+ ∪ Z;
Until (X+ = oldX+);

QUESTIONS ON CLOSURE SET OF ATTRIBUTE:

1) Given relational schema R( P Q R S T U V) having following attribute P Q R S T U and V, also there is a set of functional dependency denoted by FD = { P->Q, QR->ST, PTV->V }.

Determine Closure of (QR)+ and (PR)+

a) QR+ = QR (as the closure of an attribute or set of attributes contain same).

Now as per algorithm look into a set of FD that complete the left side of any FD contains either Q, R, or QR since in FD QR→ST has complete QR.

Hence QR+ = QRST

Again, trace the remaining two FD that any left part of FD contains any Q, R, S, T.

Since no complete left side of the remaining two FD{P->Q, PTV->V} contain Q, R, S, T.

Therefore QR+ = QRST (Answer)

Note: In FD PTV→V, T is in QRST but that cannot be entertained, as complete PTV should be a subset of QRST

b) PR + = PR (as the closure of an attribute or set of attributes contain same)

Now as per algorithm look into a set of FD, and check that complete left side of any FD contains either P, R, or PR. Since in FD P→Q, P is a subset of PR, Hence PR+ = PRQ

Again, trace the remaining two FD that any left part of FD contains any P, R, Q, Since, in FD QR → ST has its complete left part QR in PQR

Hence PR+ = PRQST

Again trace the remaining one FD {PTV->V } that its complete left belongs to PRQST. Since complete PTV is not in PRQST, hence we ignore it.

Therefore PR+ = PRQST (Answer)

2. Given relational schema R(P Q R S T) having following attributes P Q R S and T, also there is a set of functional dependency denoted by FD = { P->QR, RS->T, Q->S, T-> P }.

Determine Closure of (T)+

T + = T (as the closure of an attribute or set of attributes contain same) Now as per algorithm look into a set of FD that complete the left side of any FD contains T since, in FD T → P, T is in T, Hence T+ = TP Again trace the remaining three FD that any left part of FD contain any TP, Since in FD P → QR has its complete left part P in TP, Hence T+ = TPQR Again trace the remaining two FD { RS->T, Q->S } that any of its Complete left belongs to TPQR,

Since in FD Q → S has its complete left part Q in TPQR, Hence T+ = TPQRS Again trace the remaining one FD {RS->T } that its complete left belongs to TPQRS, Since in FD RS → T has its complete left part RS in TPQRS Hence T+ = TPQRS ( no changes, as T, is already in TPQRS) Therefore T+ = TPQRS ( Answer).

Q5) Explain minimal cover?

A5) A basic protection for a group of functional dependencies (FD) E is a collection of minimum dependencies F that is equivalent to E.

The formal definition is as follows: A set of FD F is minimal if it meets the following conditions:

● Every right-hand side dependence in F has a single attribute.

● We can't substitute any X->A dependency in F with a Y->A dependency, where Y is a suitable subset of X, and still have a collection of dependencies comparable to F.

● We can't take away any of F's dependencies and still have a set of dependencies that are comparable to F.

Canonical cover is referred to as minimal cover, and the minimum set of FD’s is referred to as the minimum set of FDs. If each FD in FC is a, a set of FD FC is termed canonical cover of F.

● Simple FD.

● Left reduced FD.

● Non-redundant FD.

Simple FD - If Y is a single attribute, X->Y is a simple FD.

Left reduced FD - If X has no unnecessary attributes, X->Y is a left reduced FD. {Extraneous attributes: If XA->Y, then A is an extraneous attribute.}

Non-reduced FD - If X->Y cannot be deduced from F- {X->y}, it is a non-redundant FD.

Example - Consider the following scenario for locating the canonical cover of F.

The following are the functional dependencies:

A -> BC

B -> C

A -> B

AB -> C

Minimal cover: The set of FD’s that are comparable to the supplied FD’s is known as the minimal cover.

Canonical cover: The LHS (Left Hand Side) of a canonical cover must be distinct.

The minimum cover will be found first, followed by the canonical cover.

First step - Toggle between RHS and singleton attributes.

A -> B

A -> C

B -> C

A -> B

AB -> C

Second step - Remove the LHS attribute that isn't needed.

Locate A's closure.

A+ = {A, B, C}

So, AB -> C can be converted into A -> C

A -> B

A -> C

B -> C

A -> B

A -> C

Third step - Remove the FD’s that are no longer needed.

A -> B

B -> C

The aforementioned set of FD’s will now be converted into canonical cover.

The following will be the canonical cover for the above set of FD’s:

A -> BC

B -> C

Q6) What is first normal forms(1NF)?

A6) A relation will be 1NF if it contains an atomic value.

● It states that an attribute of a table cannot hold multiple values. It must hold only single-valued attribute.

● First normal form disallows the multi-valued attribute, composite attribute, and their combinations.

Example: Relation EMPLOYEE is not in 1NF because of multi-valued attribute EMP_PHONE.

EMPLOYEE table:

EMP_ID	EMP_NAME	EMP_PHONE	EMP_STATE
14	John	7272826385, 9064738238	UP
20	Harry	8574783832	Bihar
12	Sam	7390372389, 8589830302	Punjab

The decomposition of the EMPLOYEE table into 1NF has been shown below:

EMP_ID	EMP_NAME	EMP_PHONE	EMP_STATE
14	John	7272826385	UP
14	John	9064738238	UP
20	Harry	8574783832	Bihar
12	Sam	7390372389	Punjab
12	Sam	8589830302	Punjab

Q7) Define second normal forms (2NF)?

A7) In the 2NF, relational must be in 1NF.

In the second normal form, all non-key attributes are fully functional dependent on the primary key

Example: Let's assume, a school can store the data of teachers and the subjects they teach. In a school, a teacher can teach more than one subject.

TEACHER table

TEACHER_ID	SUBJECT	TEACHER_AGE
25	Chemistry	30
25	Biology	30
47	English	35
83	Math	38
83	Computer	38

In the given table, non-prime attribute TEACHER_AGE is dependent on TEACHER_ID which is a proper subset of a candidate key. That's why it violates the rule for 2NF.

To convert the given table into 2NF, we decompose it into two tables:

TEACHER_DETAIL table:

TEACHER_ID	TEACHER_AGE
25	30
47	35
83	38

TEACHER_SUBJECT table:

TEACHER_ID	SUBJECT
25	Chemistry
25	Biology
47	English
83	Math
83	Computer

Q8) Write about third normal forms (3NF)?

A8) A relation will be in 3NF if it is in 2NF and not contain any transitive partial dependency.

● 3NF is used to reduce the data duplication. It is also used to achieve the data integrity.

● If there is no transitive dependency for non-prime attributes, then the relation must be in third normal form.

A relation is in third normal form if it holds atleast one of the following conditions for every non-trivial function dependency X → Y.

X is a super key.
Y is a prime attribute, i.e., each element of Y is part of some candidate key.

Example:

EMPLOYEE_DETAIL table:

EMP_ID	EMP_NAME	EMP_ZIP	EMP_STATE	EMP_CITY
222	Harry	201010	UP	Noida
333	Stephan	02228	US	Boston
444	Lan	60007	US	Chicago
555	Katharine	06389	UK	Norwich
666	John	462007	MP	Bhopal

Super key in the table above:

{EMP_ID}, {EMP_ID, EMP_NAME}, {EMP_ID, EMP_NAME, EMP_ZIP}....so on

Candidate key: {EMP_ID}

Non-prime attributes: In the given table, all attributes except EMP_ID are non-prime.

Here, EMP_STATE & EMP_CITY dependent on EMP_ZIP and EMP_ZIP dependent on EMP_ID. The non-prime attributes (EMP_STATE, EMP_CITY) transitively dependent on super key(EMP_ID). It violates the rule of third normal form.

That's why we need to move the EMP_CITY and EMP_STATE to the new <EMPLOYEE_ZIP> table, with EMP_ZIP as a Primary key.

EMPLOYEE table:

EMP_ID	EMP_NAME	EMP_ZIP
222	Harry	201010
333	Stephan	02228
444	Lan	60007
555	Katharine	06389
666	John	462007

EMPLOYEE_ZIP table:

EMP_ZIP	EMP_STATE	EMP_CITY
201010	UP	Noida
02228	US	Boston
60007	US	Chicago
06389	UK	Norwich
462007	MP	Bhopal

Q9) What is Boyce code normal forms (BCNF)?

A9) BCNF is the advance version of 3NF. It is stricter than 3NF.

● A table is in BCNF if every functional dependency X → Y, X is the super key of the table.

● For BCNF, the table should be in 3NF, and for every FD, LHS is super key.

Example: Let's assume there is a company where employees work in more than one department.

EMPLOYEE table:

EMP_ID	EMP_COUNTRY	EMP_DEPT	DEPT_TYPE	EMP_DEPT_NO
264	India	Designing	D394	283
264	India	Testing	D394	300
364	UK	Stores	D283	232
364	UK	Developing	D283	549

In the above table Functional dependencies are as follows:

EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate key: {EMP-ID, EMP-DEPT}

The table is not in BCNF because neither EMP_DEPT nor EMP_ID alone are keys.

To convert the given table into BCNF, we decompose it into three tables:

EMP_COUNTRY table:

EMP_ID	EMP_COUNTRY
264	India
264	India

EMP_DEPT table:

EMP_DEPT	DEPT_TYPE	EMP_DEPT_NO
Designing	D394	283
Testing	D394	300
Stores	D283	232
Developing	D283	549

EMP_DEPT_MAPPING table:

EMP_ID	EMP_DEPT
D394	283
D394	300
D283	232
D283	549

Functional dependencies:

EMP_ID → EMP_COUNTRY
EMP_DEPT → {DEPT_TYPE, EMP_DEPT_NO}

Candidate keys:

For the first table: EMP_ID

For the second table: EMP_DEPT

For the third table: {EMP_ID, EMP_DEPT}

Now, this is in BCNF because left side part of both the functional dependencies is a key.

Q10) Explain fourth normal forms (4NF)?

A10) A relation will be in 4NF if it is in Boyce Codd normal form and has no multi-valued dependency.

● For a dependency A→ B, if for a single value of A, multiple values of B exists, then the relation will be a multi-valued dependency.

Example

STUDENT

STU_ID	COURSE	HOBBY
21	Computer	Dancing
21	Math	Singing
34	Chemistry	Dancing
74	Biology	Cricket
59	Physics	Hockey

The given STUDENT table is in 3NF, but the COURSE and HOBBY are two independent entity. Hence, there is no relationship between COURSE and HOBBY.

In the STUDENT relation, a student with STU_ID, 21 contains two courses, Computer and Math and two hobbies, Dancing and Singing. So there is a Multi-valued dependency on STU_ID, which leads to unnecessary repetition of data.

So to make the above table into 4NF, we can decompose it into two tables:

STUDENT_COURSE

STU_ID	COURSE
21	Computer
21	Math
34	Chemistry
74	Biology
59	Physics

STUDENT_HOBBY

STU_ID	HOBBY
21	Dancing
21	Singing
34	Dancing
74	Cricket
59	Hockey

Q11) Define fifth normal forms (5NF)?

A11) A relation is in 5NF if it is in 4NF and not contains any join dependency and joining should be lossless.

● 5NF is satisfied when all the tables are broken into as many tables as possible in order to avoid redundancy.

● 5NF is also known as Project-join normal form (PJ/NF).

Example

SUBJECT	LECTURER	SEMESTER
Computer	Anshika	Semester 1
Computer	John	Semester 1
Math	John	Semester 1
Math	Akash	Semester 2
Chemistry	Praveen	Semester 1

In the above table, John takes both Computer and Math class for Semester 1 but he doesn't take Math class for Semester 2. In this case, combination of all these fields required to identify a valid data.

Suppose we add a new Semester as Semester 3 but do not know about the subject and who will be taking that subject so we leave Lecturer and Subject as NULL. But all three columns together acts as a primary key, so we can't leave other two columns blank.

So to make the above table into 5NF, we can decompose it into three relations P1, P2 & P3:

SEMESTER	SUBJECT
Semester 1	Computer
Semester 1	Math
Semester 1	Chemistry
Semester 2	Math

SUBJECT	LECTURER
Computer	Anshika
Computer	John
Math	John
Math	Akash
Chemistry	Praveen

SEMSTER	LECTURER
Semester 1	Anshika
Semester 1	John
Semester 1	John
Semester 2	Akash
Semester 1	Praveen

Q12) What is decompositions?

A12) Relational Decomposition

● When a relation in the relational model is not in appropriate normal form then the decomposition of a relation is required.

● In a database, it breaks the table into multiple tables.

● If the relation has no proper decomposition, then it may lead to problems like loss of information.

● Decomposition is used to eliminate some of the problems of bad design like anomalies, inconsistencies, and redundancy.

Types of Decomposition

Fig - Decomposition

Lossless Decomposition

● If the information is not lost from the relation that is decomposed, then the decomposition will be lossless.

● The lossless decomposition guarantees that the join of relations will result in the same relation as it was decomposed.

● The relation is said to be lossless decomposition if natural joins of all the decomposition give the original relation.

Example:

EMPLOYEE_DEPARTMENT table:

EMP_ID	EMP_NAME	EMP_AGE	EMP_CITY	DEPT_ID	DEPT_NAME
22	Denim	28	Mumbai	827	Sales
33	Alina	25	Delhi	438	Marketing
46	Stephan	30	Bangalore	869	Finance
52	Katherine	36	Mumbai	575	Production
60	Jack	40	Noida	678	Testing

The above relation is decomposed into two relations EMPLOYEE and DEPARTMENT

EMPLOYEE table:

EMP_ID	EMP_NAME	EMP_AGE	EMP_CITY
22	Denim	28	Mumbai
33	Alina	25	Delhi
46	Stephan	30	Bangalore
52	Katherine	36	Mumbai
60	Jack	40	Noida

DEPARTMENT table

DEPT_ID	EMP_ID	DEPT_NAME
827	22	Sales
438	33	Marketing
869	46	Finance
575	52	Production
678	60	Testing

Now, when these two relations are joined on the common column "EMP_ID", then the resultant relation will look like:

Employee ⋈ Department

EMP_ID	EMP_NAME	EMP_AGE	EMP_CITY	DEPT_ID	DEPT_NAME
22	Denim	28	Mumbai	827	Sales
33	Alina	25	Delhi	438	Marketing
46	Stephan	30	Bangalore	869	Finance
52	Katherine	36	Mumbai	575	Production
60	Jack	40	Noida	678	Testing

Hence, the decomposition is Lossless join decomposition.

Q13) What are the decomposition algorithm for 3NF?

A13) By explicitly generating a schema for each dependency in the canonical cover, the decomposition procedure for 3NF ensures that dependencies are preserved. It assures that at least one schema has a candidate key for the one being decomposed, ensuring that the decomposition created is a lossless decomposition.

Decomposition Algorithm

Let Fc be a canonical cover for F;

i=0;

For each functional dependency α->β in Fc

i = i+1;

R = αβ;

If none of the schemas Rj, j=1,2,…I holds a candidate key for R

Then

i = i+1;

Ri= any candidate key for R;

/* Optionally, remove the repetitive relations*/

Repeat

If any schema Rj is contained in another schema Rk

Then

/* Delete Rj */

Rj = Ri;

i = i-1;

Until no more Rjs can be deleted

Return (R1, R2, . . . ,Ri)

The supplied relation is R, and the given collection of functional dependencies is F, for which Fc maintains the canonical cover. The decomposed portions of the given relation R are R1, R2,..., Ri. As a result, this technique preserves the dependency while also generating a lossless decomposition of R.

A 3NF synthesis algorithm is another name for a 3NF algorithm. It's called so because the regular form works with a dependency set and adds one schema at a time, rather than repeatedly dissecting the basic schema.

Q14) What are the decomposition algorithm for BCNF?

A14) It is important to check if the given relation is in Boyce-Codd Normal Form before applying the BCNF decomposition technique on it. If it is discovered that the supplied relation is not in BCNF after the test, we can decompose it further to produce BCNF relations.

The following situations necessitate determining whether the supplied relation schema R follows the BCNF rule:

Case 1: Evaluate and compute α+, i.e., the attribute closure of to see if a nontrivial dependency α -> β violates the BCNF rule. Check that + contains all of the attributes of the supplied relation R. It should, therefore, be the super key of relation R.

Case 2: It is not necessary to test all of the dependencies in F+ if the specified relation R is in BCNF. For the BCNF test, all that is required is detecting and checking the dependencies in the specified dependency list F. It's because if no dependent in F violates BCNF, then none of the F+ dependencies will as well.

Decomposition Algorithm

If the supplied relation R is deconstructed into numerous relations R1, R2,..., Rn since it was not found in the BCNF, this algorithm is employed. Thus,

We must validate that α+ (an attribute closure of under F) either includes all the attributes of the relation Ri or no attribute of Ri-α for each subset of attributes in the relation Ri.

Result={R};

Done=false;

Compute F+;

While (not done) do

If (there is a schema Ri in result that is not in BCNF)

Then begin

Let α->β be a nontrivial functional dependency that holds

On Ri such that α->Ri is not in F+, and α ꓵ β= ø;

Result=(result-Ri) U (Ri-β) U (α,β);

End

Else done=true;

This procedure is used to break down a given relation R into its decomposers. This approach does the breakdown of the relation R using dependencies that demonstrate the violation of BCNF. As a result, such an algorithm not only generates relation R decomposers in BCNF, but it is also a lossless decomposition. It signifies that no data is lost when the specified relation R is decomposed into R1, R2, and so on...

The time it takes for the BCNF decomposition procedure to complete is proportional to the size of the original relation schema R. As a result, one disadvantage of this technique is that it may excessively breakdown the given relation R, i.e., over-normalize it.

The decomposition methods for BCNF and 4NF are nearly identical, with one exception. The fourth normal form is concerned with multivalued dependencies, while BCNF is concerned with functional dependencies. The multivalued dependencies aid in reducing data repetition, which is difficult to comprehend in terms of functional relationships.

Sign Up