Unit - 5
Recovery System and Advanced Databases
Q1) What is Failure classification in DBMS?
A1) To identify where the problem has occurred, we generalize failures into the following categories:
- Transaction failure
- System crash
- Disk failure
1. Transaction failure
A transaction failure occurs when a transaction fails to execute or reaches a point from where it cannot proceed any further. When only a few transactions or processes are affected, the failure is called a transaction failure.
Reasons for a transaction failure could be -
- Logical errors: If a transaction cannot complete due to some code error or an internal error condition, a logical error occurs.
- System errors: These occur when the DBMS itself terminates an active transaction because the database system is unable to execute it. For example, the system aborts an active transaction in case of deadlock or resource unavailability.
2. System Crash
- System failure can occur due to power failure or other hardware or software failure. Example: Operating system error.
- Fail-stop assumption: In the system crash, non-volatile storage is assumed not to be corrupted.
3. Disk Failure
● Disk failure occurs when hard-disk drives or storage drives fail. This was a common problem in the early days of technology evolution.
● Disk failure can result from the formation of bad sectors, a disk head crash, unreachability of the disk, or any other failure that destroys all or part of disk storage.
Q2) What is Storage in DBMS? What are its different types and what is storage hierarchy?
A2) A database system presents a unified view of the stored data; underneath, however, the data is stored as bits and bytes across different storage devices.
In this section, we will take an overview of various types of storage devices that are used for accessing and storing data.
Types of Data Storage
For storing the data, there are different types of storage options available. These storage types differ from one another as per the speed and accessibility. There are the following types of storage devices used for storing the data:
● Primary Storage
● Secondary Storage
● Tertiary Storage
Fig 1 - Types of Data Storage
Primary Storage
It is the primary area that offers quick access to the stored data. We also know primary storage as volatile storage, because this type of memory does not store data permanently. As soon as the system suffers a power cut or a crash, the data is lost. Main memory and cache are the types of primary storage.
● Main Memory: It holds the data and instructions that the machine is currently operating on. The main memory handles each instruction of a computer machine. This type of memory can store gigabytes of data on a system but is generally too small to hold an entire database. Finally, main memory loses its whole content if the system shuts down because of power failure or other reasons.
● Cache: It is one of the most expensive storage media; on the other hand, it is the fastest. A cache is a tiny storage medium that is usually maintained by the computer hardware. While designing algorithms and query processors for the data structures, designers take cache effects into account.
Secondary Storage
Secondary storage is also called online storage. It is the storage area that allows the user to save and store data permanently. This type of memory does not lose data due to a power failure or system crash; that is why we also call it non-volatile storage.
There are some commonly described secondary storage media which are available in almost every type of computer system:
● Flash Memory: A flash memory stores data in USB (Universal Serial Bus) keys, which are plugged into the USB slots of a computer system. These USB keys help transfer data to a computer system, but they vary in size limits. Unlike main memory, flash memory retains its stored data through a power cut or other failures. This type of storage is commonly used in server systems for caching frequently used data; this leads the systems towards high performance while storing larger amounts of data than main memory can hold.
● Magnetic Disk Storage: This type of storage medium is also known as online storage. A magnetic disk is used for storing data for a long time and is capable of storing an entire database. It is the responsibility of the computer system to make the data available from disk to main memory for further access. Also, if the system performs any operation over the data, the modified data must be written back to the disk. A great strength of a magnetic disk is that its data survives a system crash or failure, but a disk failure itself can easily ruin or destroy the stored data.
Tertiary Storage
It is a type of storage that is external to the computer system. It has the slowest speed but is capable of storing a large amount of data. It is also known as offline storage. Tertiary storage is generally used for data backup. The following tertiary storage devices are available:
● Optical Storage: An optical storage medium can store megabytes or gigabytes of data. A Compact Disk (CD) can store 700 megabytes of data, with a playtime of around 80 minutes. A Digital Video Disk (DVD), on the other hand, can store 4.7 gigabytes per side (single layer) or 8.5 gigabytes per side (dual layer).
● Tape Storage: It is a cheaper storage medium than disks. Generally, tapes are used for archiving or backing up data. Tape provides slow access, since data must be read sequentially from the start; thus, tape storage is also known as sequential-access storage. Disk storage is known as direct-access storage because we can directly access data at any location on the disk.
Storage Hierarchy
Besides the above, various other storage devices exist in a computer system. These storage media can be organized by data access speed, by the cost per unit of data of the medium, and by the medium's reliability. Thus, we can arrange storage media in a hierarchy based on cost and speed.
Arranging the storage media described above in such a hierarchy according to speed and cost yields the image below:
Fig 2 – Storage device hierarchy
In the image, the higher levels are expensive but fast. Moving down the hierarchy, the cost per bit decreases while the access time increases. Also, the storage media from main memory upward are volatile, while everything below main memory is non-volatile.
Q3) What is Stable-Storage Implementation in DBMS?
A3) To achieve stable storage, we need to replicate the required information on multiple storage devices with independent failure modes. The writing of an update should be coordinated in such a way that it cannot delete all copies of the state, and so that, when we are recovering from a failure, we can force all copies to a consistent and correct value even if another failure occurs during the recovery. Here we discuss how to meet these needs.
The disk write operation results in one of the following outcomes:
Figure 3 – Outcomes of Disk
1. Successful completion –
The data is written correctly on disk.
2. Partial failure –
A failure occurred in the middle of the data transfer, so only some sectors were written with the new data, and the sector being written at the time of the failure may have been corrupted.
3. Total failure –
The failure occurred before the disk write started, so the previous data values on the disk remain intact.
If a failure occurs while a block is being written, the system's first task is to detect the failure and then invoke a recovery process to restore a consistent state. To do that, the system must maintain two physical blocks for each logical block.
An output operation is executed as follows:
Figure 4 – Process of execution of output operation
- Write the information onto the first physical block.
- When the first write completes successfully, write the same information onto the second physical block.
- When both writes have completed successfully, declare the operation complete.
During recovery from a failure, each pair of physical blocks is examined. If both are the same and no detectable error exists, then no further action is necessary. If one block contains a detectable error, we replace its content with the value of the other block. If neither block contains a detectable error but the blocks differ in content, we replace the content of the first block with the content of the second. This recovery procedure ensures that a write to stable storage either succeeds completely or results in no change.
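To make the two-block scheme concrete, here is a minimal sketch in Python. It simulates the two physical blocks with two ordinary files and models a detectable error simply as a missing file; the names write_stable and recover_stable are invented for this illustration and are not part of any real DBMS.

```python
import os

BLOCK_A, BLOCK_B = "block_a.bin", "block_b.bin"  # the two physical blocks

def write_block(path, data):
    # Write one physical block and force it to the medium before returning.
    with open(path, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())

def write_stable(data):
    write_block(BLOCK_A, data)  # 1. write the first physical block
    write_block(BLOCK_B, data)  # 2. only after it completes, write the second
    # 3. once both writes complete, the output is declared complete

def read_block(path):
    # A missing file stands in for a block with a detectable error.
    if not os.path.exists(path):
        return None
    with open(path, "rb") as f:
        return f.read()

def recover_stable():
    a, b = read_block(BLOCK_A), read_block(BLOCK_B)
    if a == b:
        return a                  # identical copies: no further action
    if a is None:                 # first copy has a detectable error
        write_block(BLOCK_A, b)
        return b
    if b is None:                 # second copy has a detectable error
        write_block(BLOCK_B, a)
        return a
    # No detectable error, but the copies differ: replace the content of
    # the first block with that of the second, so the interrupted write
    # results in no change, as described above.
    write_block(BLOCK_A, b)
    return b
```

Whatever point the crash interrupts write_stable at, recover_stable leaves both blocks holding either the old value or the new value, never a mix.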
This procedure can be extended to an arbitrarily large number of copies of each block of stable storage. With a larger number of copies, the chance of failure is further reduced, but it is usually reasonable to simulate stable storage with only two copies. The data in stable storage is guaranteed to be safe unless a failure destroys all of the copies.
Because waiting for disk writes to complete is time consuming, many storage arrays add NVRAM as a cache. Since this memory is non-volatile, it can be trusted to store the data en route to the disks, and it is therefore considered part of the stable storage. Writing to stable storage this way is much faster than writing to disk, so performance is greatly improved.
Q4) What is Crash Recovery in DBMS explain in detail?
A4) A DBMS is a highly complex system with hundreds of transactions being executed every second. The durability and robustness of a DBMS depend on its complex architecture and its underlying hardware and system software. If it fails or crashes amid transactions, it is expected that the system follows some sort of algorithm or technique to recover lost data.
Failure Classification
To see where the problem has occurred, we generalize a failure into various categories, as follows −
Transaction failure
A transaction has to abort when it fails to execute or when it reaches a point from where it can't go any further. This is called transaction failure, where only a few transactions or processes are affected.
Reasons for a transaction failure could be −
● Logical errors − Where a transaction cannot complete because it has some code error or any internal error condition.
● System errors − Where the database system itself terminates an active transaction because the DBMS is not able to execute it, or it has to stop because of some system condition. For example, in case of deadlock or resource unavailability, the system aborts an active transaction.
System Crash
There are problems external to the system that may cause the system to stop abruptly and crash. For example, interruptions in the power supply may cause the failure of underlying hardware or software.
Examples may include operating system errors.
Disk Failure
In early days of technology evolution, it was a common problem where hard-disk drives or storage drives used to fail frequently.
Disk failures include formation of bad sectors, unreachability to the disk, disk head crash or any other failure, which destroys all or a part of disk storage.
Storage Structure
We have already described the storage system. In brief, the storage structure can be divided into two categories −
● Volatile storage − As the name suggests, volatile storage cannot survive system crashes. Volatile storage devices are placed very close to the CPU; normally they are embedded on the chipset itself. Main memory and cache memory are examples of volatile storage. They are fast but can store only a small amount of information.
● Non-volatile storage − These memories are made to survive system crashes. They are huge in data storage capacity, but slower in accessibility. Examples may include hard-disks, magnetic tapes, flash memory, and non-volatile (battery backed up) RAM.
Q5) What is Recovery and Atomicity in DBMS?
A5) When a system crashes, it may have several transactions being executed and various files opened for them to modify the data items. Transactions are made of various operations, which are atomic in nature. But according to ACID properties of DBMS, atomicity of transactions as a whole must be maintained, that is, either all the operations are executed or none.
When a DBMS recovers from a crash, it should maintain the following −
● It should check the states of all the transactions, which were being executed.
● A transaction may be in the middle of some operation; the DBMS must ensure the atomicity of the transaction in this case.
● It should check whether the transaction can be completed now or it needs to be rolled back.
● No transactions would be allowed to leave the DBMS in an inconsistent state.
There are two techniques that can help a DBMS recover while maintaining the atomicity of a transaction −
● Maintaining the log of each transaction, and writing the log records onto some stable storage before actually modifying the database.
● Maintaining shadow paging, where the changes are made on a shadow copy in volatile memory, and later the actual database is updated in one atomic step (a minimal sketch follows).
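As a rough illustration of the second technique, here is a minimal shadow-paging sketch in Python. It keeps everything in memory for simplicity; the names pages, shadow_table, and current_table are invented for the example, and a real implementation would keep the page tables and pages on disk.

```python
# Shadow paging sketch: the shadow page table keeps pointing at the
# unmodified pages for the whole transaction, while updates go to fresh
# page copies referenced only by the current page table.

pages = {0: ["row1", "row2"], 1: ["row3"]}  # page id -> page contents
shadow_table = dict(pages)                  # frozen during the transaction
current_table = dict(shadow_table)          # working page table

def write(page_id, new_contents):
    # Copy-on-write: never modify a page the shadow table points to.
    current_table[page_id] = list(new_contents)

def commit():
    global shadow_table
    # The single atomic step: the current table becomes the shadow table.
    shadow_table = dict(current_table)

def abort():
    global current_table
    # Recovery after a crash or abort: simply discard the current table.
    current_table = dict(shadow_table)
```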
Q6) What is Log-based Recovery?
A6) The log is a sequence of records, which maintains a record of the actions performed by a transaction. It is important that the logs are written prior to the actual modification and stored on stable storage media, which are failsafe.
Log-based recovery works as follows −
● The log file is kept on a stable storage media.
● When a transaction enters the system and starts execution, it writes a log record for it.
<Tn, Start>
● When the transaction modifies an item X, it writes a log record as follows −
<Tn, X, V1, V2>
This record reads as: Tn has changed the value of X from V1 to V2.
● When the transaction finishes, it logs −
<Tn, commit>
The database can be modified using two approaches −
● Deferred database modification − All logs are written on to the stable storage and the database is updated when a transaction commits.
● Immediate database modification − Each log record is immediately followed by the actual database modification; that is, the database is modified right after every operation. Both approaches are illustrated in the sketch below.
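To make the record formats concrete, here is a minimal sketch in Python. A plain list stands in for the log on stable storage, and db, start, write, and commit are names invented for the illustration, not a real DBMS API.

```python
db = {"X": "V1"}  # a sample data item
log = []          # stands in for the log file on stable storage

def start(tn):
    log.append((tn, "start"))                     # <Tn, Start>

def write(tn, item, new_value):
    old_value = db[item]
    log.append((tn, item, old_value, new_value))  # <Tn, X, V1, V2>
    db[item] = new_value  # immediate modification: update right away
    # (Under deferred modification, this update would instead be
    #  postponed and replayed from the log when the transaction commits.)

def commit(tn):
    log.append((tn, "commit"))                    # <Tn, Commit>

start("T1"); write("T1", "X", "V2"); commit("T1")
```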
Q7) Explain Recovery with Concurrent Transactions in DBMS
A7) When more than one transaction is being executed in parallel, the logs are interleaved. At the time of recovery, it would become hard for the recovery system to backtrack through all the logs and then start recovering. To ease this situation, most modern DBMSs use the concept of 'checkpoints'.
Checkpoint
Keeping and maintaining logs in real time and in a real environment may fill up all the memory space available in the system. As time passes, the log file may grow too big to be handled at all. Checkpointing is a mechanism where all the previous logs are removed from the system and stored permanently on a storage disk. A checkpoint declares a point before which the DBMS was in a consistent state and all the transactions were committed.
Recovery
When a system with concurrent transactions crashes and recovers, it behaves in the following manner −
Fig 5 - Recovery
● The recovery system reads the logs backwards from the end to the last checkpoint.
● It maintains two lists, an undo-list and a redo-list.
● If the recovery system sees a log with <Tn, Start> and <Tn, Commit> or just <Tn, Commit>, it puts the transaction in the redo-list.
● If the recovery system sees a log with <Tn, Start> but no commit or abort log is found, it puts the transaction in the undo-list.
All the transactions in the undo-list are then undone and their logs are removed. All the transactions in the redo-list are redone by re-applying their logged updates. A sketch of building these two lists follows.
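Here is a sketch of building the two lists, reusing the illustrative tuple log format from the earlier sketch and assuming a ("CKPT", "checkpoint") record marks the last checkpoint; everything here is an assumption of the example, not a standard API.

```python
def classify(log):
    # Scan the log backwards from the end to the last checkpoint.
    redo_list, undo_list, committed = [], [], set()
    for record in reversed(log):
        if len(record) != 2:           # skip <Tn, X, V1, V2> update records
            continue
        tn, kind = record
        if kind == "checkpoint":
            break                      # reached the last checkpoint
        if kind == "commit":
            committed.add(tn)
        elif kind == "start":
            if tn in committed:
                redo_list.append(tn)   # saw both <Tn,Start> and <Tn,Commit>
            else:
                undo_list.append(tn)   # started but never committed
    # A transaction that started before the checkpoint shows only its
    # <Tn, Commit> in the scanned region; it also belongs on the redo-list.
    for tn in committed:
        if tn not in redo_list:
            redo_list.append(tn)
    return redo_list, undo_list
```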
Q8) Explain in detail log based recovery in DBMS?
A8) Log based recovery
● The log is a sequence of records. The log of each transaction is maintained in some stable storage so that, if any failure occurs, the database can be recovered from it.
● If any operation is performed on the database, then it will be recorded in the log.
● But the process of storing the logs should be done before the actual transaction is applied in the database.
Let's assume there is a transaction to modify the City of a student. The following logs are written for this transaction.
● When the transaction is initiated, then it writes 'start' log.
- <Tn, Start>
● When the transaction modifies the City from 'Noida' to 'Bangalore', then another log is written to the file.
- <Tn, City, 'Noida', 'Bangalore' >
● When the transaction is finished, then it writes another log to indicate the end of the transaction.
- <Tn, Commit>
There are two approaches to modify the database:
1. Deferred database modification:
● The deferred modification technique means that the transaction does not modify the database until it has committed.
● In this method, all the logs are created and stored in the stable storage, and the database is updated when a transaction commits.
2. Immediate database modification:
● The immediate modification technique applies database modifications while the transaction is still active.
● In this technique, the database is modified immediately after every operation; each log record is followed by the corresponding database modification.
Recovery using Log records
When the system crashes, it consults the log to find which transactions need to be undone and which need to be redone.
- If the log contains both the records <Ti, Start> and <Ti, Commit>, then transaction Ti needs to be redone.
- If the log contains the record <Ti, Start> but contains neither <Ti, Commit> nor <Ti, Abort>, then transaction Ti needs to be undone, as in the sketch below.
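A sketch of carrying out these decisions, again using the illustrative tuple log format from the earlier sketches (update records look like (Ti, item, old_value, new_value)); this is a toy under those assumptions, not a production algorithm.

```python
def recover(db, log, redo_list, undo_list):
    # Undo: scan the log backwards, restoring the old value (V1) of every
    # update made by a transaction that must be rolled back.
    for record in reversed(log):
        if len(record) == 4 and record[0] in undo_list:
            _, item, old_value, _ = record
            db[item] = old_value
    # Redo: scan the log forwards, re-applying the new value (V2) of every
    # update made by a transaction that must be redone.
    for record in log:
        if len(record) == 4 and record[0] in redo_list:
            _, item, _, new_value = record
            db[item] = new_value
```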
Q9) Explain Checkpoints in DBMS?
A9) Checkpoints
● The checkpoint is a type of mechanism where all the previous logs are removed from the system and permanently stored in the storage disk.
● The checkpoint is like a bookmark. During the execution of a transaction, such checkpoints are marked, and as the transaction executes, log records are created for each of its steps.
● When a checkpoint is reached, the updates recorded in the log up to that point are written into the database, and the log up to that point is removed from the file. The log file is then updated with the steps of the transaction until the next checkpoint, and so on.
● The checkpoint is used to declare a point before which the DBMS was in a consistent state and all transactions were committed. A minimal checkpointing sketch follows.
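A minimal checkpointing sketch in the same illustrative style. It assumes, for simplicity, that no transaction is active when the checkpoint is taken; stable_db is an invented stand-in for the database on disk.

```python
def take_checkpoint(db, log, stable_db):
    # Assumes no transaction is active at this moment (simplest scheme).
    stable_db.update(db)               # force all modified data to disk
    log[:] = [("CKPT", "checkpoint")]  # earlier log records can be dropped
```

After take_checkpoint runs, recovery never needs to look at log records older than the ("CKPT", "checkpoint") marker.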
Recovery using Checkpoint
In the following manner, a recovery system recovers the database from this failure:
Fig 6 – Recovery using checkpoint
● The recovery system reads log files from the end to start. It reads log files from T4 to T1.
● Recovery system maintains two lists, a redo-list, and an undo-list.
● The transaction is put into the redo-list if the recovery system sees a log with <Tn, Start> and <Tn, Commit>, or just <Tn, Commit>. All the transactions in the redo-list are redone by re-applying their logged updates.
● For example: In the log file, transactions T2 and T3 will have both <Tn, Start> and <Tn, Commit>. The T1 transaction will have only <Tn, Commit> in the scanned part of the log, because T1 started before the checkpoint and committed after it. Hence the recovery system puts T1, T2, and T3 into the redo-list.
● The transaction is put into the undo-list if the recovery system sees a log with <Tn, Start> but no commit or abort log is found. All the transactions in the undo-list are undone and their logs are removed.
● For example: Transaction T4 will have only <Tn, Start>. So T4 is put into the undo-list, since the transaction was not yet complete and failed midway.
Q10) Explain NoSQL database?
A10) A NoSQL database, which stands for "non SQL" or "non relational," is a database that allows for data storage and retrieval. This information is represented in ways other than the tabular relationships found in relational databases. Such databases first appeared in the late 1960s, but it wasn't until the early twenty-first century that they were given the name NoSQL. NoSQL databases are increasingly being employed in real-time web applications and big data analytics. "Not only SQL" is a term used to stress the fact that NoSQL systems may support SQL-like query languages.
Design simplicity, horizontal scaling to clusters of machines, and tighter control over availability are all advantages of a NoSQL database. The data structures used by NoSQL databases differ from those used by relational databases, which allows NoSQL to perform some operations faster. The suitability of a given NoSQL database depends on the problem it is meant to solve. NoSQL databases' data structures are also often seen as more flexible than relational database tables.
Many NoSQL databases make trade-offs among consistency, availability, performance, and partition tolerance. The usage of low-level query languages, a lack of standardized interfaces, and large prior investments in relational databases are all barriers to wider adoption of NoSQL storage. Although most NoSQL databases lack true ACID transactions (atomicity, consistency, isolation, and durability), a few databases, including MarkLogic, Aerospike, FairCom c-treeACE, Google Spanner (though technically a NewSQL database), Symas LMDB, and OrientDB, have made them a central part of their designs.
Most NoSQL databases support eventual consistency, which means that database updates are propagated to all nodes over time. As a result, queries for data may not return updated data right away, or may result in reading data that is inaccurate, a problem known as stale reads. Some NoSQL systems may also experience lost writes and other data loss. To avoid data loss, certain NoSQL systems offer features like write-ahead logging. Data consistency is even more difficult to achieve when doing distributed transaction processing across many databases. Both NoSQL and relational databases struggle with this.
Advantages
● It is possible to use it as a primary or analytic data source.
● Capacity for Big Data.
● There isn't a single point of failure.
● Replication is simple.
● There is no requirement for a separate caching layer.
● It offers quick performance as well as horizontal scalability.
● Can effectively handle structured, semi-structured, and unstructured data.
● An object-oriented style of programming that is easy to use and flexible.
● A dedicated high-performance server isn't required for NoSQL databases.
● Key Developer Languages and Platforms are supported.
● It is simpler to implement than an RDBMS.
● It has the potential to be the key data source for web-based applications.
● Manages big data, including the velocity, variety, volume, and complexity of the data.
Disadvantages
● There is a lack of standardization.
● Querying capabilities are limited.
● Databases and tools for Relational Database Management Systems (RDBMS) are comparatively more mature.
● It lacks standard database features such as consistency when several transactions are carried out at the same time.
● As the volume of data grows, it becomes difficult to maintain unique keys, since keys become harder to remember.
● With relational data, it doesn't work as well.
● For novice developers, the learning curve is steep.
● For businesses, open source choices are less common.
Need
● When you need to store and retrieve a large volume of data.
● The relationships between the pieces of data you store are not especially important.
● The data is unstructured and changes over time.
● At the database level, support for constraints and joins is not necessary.
● The data is always growing, and you'll need to scale the database on a regular basis to keep up with it.
Q11) Write the features of Nosql database?
A11) Features of NoSQL database
Non - relational
● NoSQL databases never follow the relational model.
● They never provide tables with flat fixed-column records.
● They work with self-contained aggregates or BLOBs.
● Object-relational mapping and data normalization are not required.
● There are no complex features such as query languages, query planners, referential integrity, joins, or ACID.
Schema -free
● NoSQL databases are either schema-free or have relaxed schemas.
● There is no requirement for the data schema to be defined.
● Provides data structures that are heterogeneous within the same domain.
Fig 7: NoSQL is schema free
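As a small illustration of schema freedom, the sketch below stores two heterogeneous records in the same collection, something a fixed-column relational table would reject. Plain Python dicts stand in for JSON documents; no particular NoSQL product's API is implied.

```python
import json

# Two documents in the same "collection" with different fields.
students = [
    {"_id": 1, "name": "Asha", "city": "Noida"},
    {"_id": 2, "name": "Ravi", "marks": [78, 91], "hosteller": True},
]

# Query without any declared schema: missing fields are simply absent.
in_noida = [s for s in students if s.get("city") == "Noida"]
print(json.dumps(in_noida))  # [{"_id": 1, "name": "Asha", "city": "Noida"}]
```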
Simple API
● Provides simple user interfaces for storing and querying data.
● APIs make it possible to manipulate and choose data at a low level.
● HTTP REST with JSON is typically used with text-based protocols.
● The majority of queries are written in a NoSQL query language that is not based on any standard.
● Databases that are web-enabled and run as internet-facing services.
Distributed
● A distributed execution of many NoSQL databases is possible.
● Auto-scaling and fail-over capabilities are included.
● The ACID principle is frequently sacrificed in favor of scalability and throughput.
● There is mostly no synchronous replication between distributed nodes; instead, replication is asynchronous, e.g., HDFS replication, asynchronous multi-master replication, or peer-to-peer replication.
● Only eventual consistency is guaranteed.
● Shared-nothing architecture, which allows less coordination and higher distribution.
Fig 8: NoSQL is Shared Nothing
Q12) Describe database security?
A12) The term "database security" refers to a set of tools, policies, and procedures for ensuring and maintaining database confidentiality, integrity, and availability. Because confidentiality is the factor violated in the majority of data breaches, this answer concentrates on it.
The following must be addressed and protected by database security:
● The data in the database
● The database management system (DBMS)
● Any associated applications
● The physical database server and/or the virtual database server and the underlying hardware
● The computing and/or network infrastructure used to access the database
Database security is a complicated and difficult undertaking that encompasses all areas of information security technology and procedure. It is also naturally at odds with database usability: the more accessible and usable the database, the more exposed it is to security threats; the more invulnerable the database is to attacks, the more difficult it is to access and use.
Database security refers to the safeguarding of sensitive information and the prevention of data loss. The Database Administrator (DBA) is in charge of the database's security.
Why is it important
A data breach is defined as a failure to keep the confidentiality of data in a database. The amount of damage a data breach causes your company is determined by a number of repercussions or factors:
● Compromised intellectual property: Your intellectual property—trade secrets, inventions, and private practices—could be crucial to maintaining a competitive advantage in your industry. If your intellectual property is stolen or revealed, it may be difficult or impossible to preserve or restore your competitive edge.
● Damage to brand reputation: Customers or partners may be hesitant to buy your products or services (or do business with you) if they don't trust you to protect their data or yours.
● Business continuity (or lack thereof): Some businesses are unable to continue operating until the breach has been fixed.
● Fines or penalties for non-compliance: Failure to comply with global regulations such as the Sarbanes-Oxley Act (SOX) or the Payment Card Industry Data Security Standard (PCI DSS), industry-specific data privacy regulations like HIPAA, or regional data privacy regulations like Europe's General Data Protection Regulation (GDPR) can have devastating financial consequences, with fines exceeding several million dollars per violation in the worst cases.
● Costs of repairing breaches and notifying customers: A breached organisation must pay for forensic and investigative efforts, crisis management, triage, system repair, and more, in addition to the cost of disclosing the breach to customers.
Common threats and challenges
Any database system must have data security as a top priority. Because of the enormous number of users, fragmented and replicated data, different sites, and dispersed control, it is especially important in distributed systems.
Threats in a Database
● Availability loss − The term "availability loss" refers to the inability of legitimate users to access database items.
● Integrity loss − When unacceptable activities are done on the database, either mistakenly or purposefully, integrity is lost. This can happen when you're putting data in, updating it, or deleting it. It leads to tainted data, which leads to erroneous decisions.
● Confidentiality loss − Unauthorized or inadvertent disclosure of confidential information causes confidentiality loss. It could lead to unlawful activities, security threats, and a loss of public trust.
Q13) Explain data mining?
A13) Data mining, in simple terms, is a method used to extract useful data from a larger collection of raw data. It means analysing large batches of data for patterns using one or more tools. Data mining has applications in many areas, including science and analysis.
As a data mining application, organisations can learn more about their clients and develop more efficient strategies for different business functions, and in turn exploit resources in a more optimal and informative way. This helps organisations get closer to their targets and make better decisions. Data mining builds on successful data collection and warehousing as well as computer processing.
Data mining uses advanced mathematical algorithms to segment the data and determine the likelihood of future events. Data mining is also known as Knowledge Discovery in Data (KDD).
Main data mining features:
- Automatic forecasts of patterns based on study of trends and behaviours.
- Prediction on the basis of probable results.
- Creation of knowledge that is decision-oriented.
- Focus on broad data sets and research databases.
- Clustering based on finding, and visually documenting, groups of facts not previously known.
Fig 9: Data mining
Data mining technique:
Fig 10: Data mining technique
● Classification: This technique is used to retrieve important and relevant information about data and metadata. It helps to classify data into different classes.
● Clustering: Clustering analysis is a data mining technique used to group data that are similar to each other. The differences and similarities between the data help to characterise the groups.
● Regression: Regression analysis is the data mining method of identifying and analysing the relationship between variables. It is used to estimate the likelihood of a particular variable, given the presence of other variables.
● Association rules: This data mining technique helps to find the association between two or more items. It discovers hidden patterns in the data set (a toy sketch follows after this list).
● Outlier detection: This technique refers to the observation of data items in the data set that do not conform to an expected pattern or behaviour. It can be used in a number of domains, such as intrusion detection, fraud detection, or fault detection. Outlier detection is also referred to as outlier analysis or outlier mining.
● Sequential patterns: This data mining technique helps to discover or identify similar patterns or trends in transaction data over a certain period.
● Prediction: Prediction uses a combination of the other data mining techniques, such as trends, sequential patterns, and clustering. It analyses past events or instances in the right sequence to predict a future event.
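The toy sketch below illustrates the association-rules idea from the list above: count how often pairs of items occur together across transactions and keep the frequent pairs. The data and the support threshold are invented for the example.

```python
from collections import Counter
from itertools import combinations

transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]

# Count co-occurrences of every item pair across all transactions.
pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

min_support = 2  # hypothetical minimum support threshold
frequent = {pair: n for pair, n in pair_counts.items() if n >= min_support}
print(frequent)  # the "hidden patterns": pairs frequently bought together
```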
Types of data mining
It is possible to conduct data mining on the following data types:
- Relational database
- Data warehouse
- Data repositories
- Object relational database
- Transactional database
Fig 11: Application of data mining
Q14) Define data warehousing?
A14) Data Warehousing (DW) is a process for collecting and managing data from varied sources in order to provide useful business insights. A data warehouse is typically used to connect and analyse heterogeneous sources of business data. The data warehouse is the cornerstone of the data collection and reporting framework developed for the BI system.
It is a blend of technologies and components that promotes the strategic use of information. Rather than transaction processing, it is the electronic storage of a vast volume of information by an organisation, designed for query and analysis. It is a process of converting data into information and making it accessible to users in a timely manner so that it can make a difference.
A data warehouse can be described as a data system with the following attributes:
- It is a database developed for investigative tasks, using data from different applications.
- It serves a relatively small number of users who have relatively long interactions.
- It holds current and historical data in order to provide a historical view of the information.
- Its application is read-intensive.
- It involves a few large tables.
"Data Warehouse is a subject-oriented, integrated, and time-variant store of information in support of management's decisions."
Types
The three main types of data warehouse are as follows:
- Enterprise Data Warehouse (EDW)
- Operational data store
- Data Mart
Characteristics of data warehouse
❏ Subject oriented
❏ Integrated
❏ Time variant
❏ Non volatile
Components of data warehouse
Four components of DW
- Load manager
- Warehouse manager
- Query manager
- End user access tool
Fig 12: Data warehouse
Q15) Describe a distributed database?
A15) A distributed database is a database that is not restricted to a single system and is dispersed across numerous places, such as multiple computers or a network of computers. A distributed database system is made up of multiple sites with no physical components in common. This may be necessary if a database needs to be viewed by a large number of people all over the world. It must be administered in such a way that it appears to users as a single database.
A distributed database system (DDBS) is a database that does not have all of its storage devices attached to a common CPU. It might be stored on multiple computers in the same physical location, or it could be spread across a network of interconnected computers. Simply said, it is a logically centralized yet physically dispersed database system: a database system and a computer network rolled into one. Although this has long been a major topic in database architecture, storage and querying in distributed database systems remain among the most serious challenges in today's database systems.
Fig 13: Distributed database
A centralized software system that manages a distributed database as if it were all kept in a single location is known as a distributed database management system (DDBMS).
Features
● It's used to create, retrieve, update, and destroy databases that are distributed.
● It synchronizes the database on a regular basis and provides access mechanisms, making the distribution transparent to the users.
● It ensures that data modified on any site is updated across the board.
● It's utilized in applications where a lot of data is processed and accessible by a lot of people at the same time.
● It's made to work with a variety of database platforms.
● It protects the databases' confidentiality and data integrity.
Advantages of Distributed Databases
The advantages of distributed databases versus centralized databases are as follows.
- Modular Development − In centralized database systems, if the system needs to be expanded to additional locations or units, the activity requires substantial effort and disruption of current operations. In distributed databases, on the other hand, the work simply involves adding new computers and local data at the new site and connecting them to the distributed system, with no interruption to current operations.
- More Reliable − When a database fails, the entire centralized database system comes to a halt. When a component fails in a distributed system, however, the system may continue to function but at a lower level of performance.
- Better Response − If data is distributed efficiently, user requests can be fulfilled from local data, resulting in a faster response. In centralized systems, by contrast, all queries must pass through the central computer for processing, which lengthens the response time.
- Lower Communication Cost − When data is stored locally where it is most frequently utilized in distributed database systems, communication costs for data manipulation can be reduced. In centralized systems, this is not possible.