Unit - 4
Transactions
A transaction is a logical unit of data processing that consists of a collection of database operations. One or more operations, such as insert, delete, update, or retrieve, are executed as a single transaction. A transaction is atomic: it is either performed in its entirety or not performed at all. A read-only transaction is one that involves only data retrieval and no data updating.
A transaction is a group of logically related operations. It consists of a number of tasks: an action, or a series of actions, carried out by a single user or application program to perform operations on the contents of the database.
Each high-level operation can be broken down into several lower-level jobs or operations. A data update operation, for example, can be separated into three tasks:
Read_item() - retrieves a data item from storage and stores it in the main memory.
Modify_item() - Change the item's value in the main memory.
Write_item() - writes the changed value from main memory back to storage.
Only the read_item() and write_item() operations access the database directly. Read and write are therefore the fundamental database operations of every transaction.
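To make the decomposition concrete, here is a minimal Python sketch of these three lower-level tasks; the dictionaries standing in for storage and main memory are illustrative assumptions, not a real DBMS buffer manager.

```python
# A sketch of how a high-level update decomposes into the low-level
# operations named above. "storage" and "buffer" are stand-ins for
# stable storage and main memory, respectively.

storage = {"X": 100}   # stable storage (e.g., disk)
buffer = {}            # main memory

def read_item(name):
    """Retrieve a data item from storage into main memory."""
    buffer[name] = storage[name]

def modify_item(name, delta):
    """Change the item's value in main memory."""
    buffer[name] += delta

def write_item(name):
    """Write the changed value from main memory back to storage."""
    storage[name] = buffer[name]

# The update "X = X + 10" as three lower-level tasks:
read_item("X")
modify_item("X", 10)
write_item("X")
print(storage["X"])  # 110
```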
Key takeaway
- A transaction is a logical unit of data processing that consists of a collection of database actions.
- One or more database operations, such as insert, remove, update, or retrieve data, are executed in a transaction.
A transaction in a database system must maintain Atomicity, Consistency, Isolation, and Durability − commonly known as ACID properties − in order to ensure accuracy, completeness, and data integrity.
- Atomicity
- Consistency
- Isolation
- Durability.
Atomicity: -
- A transaction is an atomic unit of processing.
- It is either performed in its entirety or not performed at all.
- It means either all operations are reflected in the database or none are.
- It is the responsibility of the transaction recovery subsystem to ensure atomicity.
- If a transaction fails to complete for some reason, the recovery technique must undo any effects of the transaction on the database.
Consistency: -
- A transaction is consistency preserving if its complete execution takes the database from one consistent state to another.
- It is the responsibility of the programmer who writes the database programs.
- Database programs should be written in such a way that, if the database is in a consistent state before a transaction executes, it is in a consistent state after the transaction executes.
E.g., the transaction: transfer $50 from account A to account B.
If account A initially holds $100 and account B holds $200, then after the transaction executes, A must hold $50 and B must hold $250.
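A hedged sketch of this transfer in Python, assuming an in-memory accounts dictionary and a simple copy-based rollback, shows how atomicity preserves consistency: the total across both accounts stays at $300 whether the transaction commits or is undone.

```python
# Illustrative sketch only: the "accounts" dict and snapshot-based undo
# stand in for a real database and its recovery subsystem.

accounts = {"A": 100, "B": 200}

def transfer(src, dst, amount):
    snapshot = dict(accounts)          # remember the consistent state
    try:
        accounts[src] -= amount        # debit
        if accounts[src] < 0:
            raise ValueError("insufficient funds")
        accounts[dst] += amount        # credit
    except Exception:
        accounts.clear()
        accounts.update(snapshot)      # undo: restore the consistent state
        raise

transfer("A", "B", 50)
print(accounts)  # {'A': 50, 'B': 250} -- the total is still 300
```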
Isolation: -
- If multiple transactions execute concurrently, they should produce the same result as if they had executed serially.
- It is enforced by concurrency control subsystem.
- To avoid the problems of concurrent execution, transactions should be executed in isolation.
Durability: -
- The changes applied to the database, by a committed transaction must persist in a database.
- The changes must not be lost because of any failure.
- It is the responsibility of the recovery subsystem of the DBMS.
Key takeaway
- A transaction is an atomic unit of processing.
- To avoid the problems of concurrent execution, transactions should be executed in isolation.
- A transaction is consistency preserving if its complete execution takes the database from one consistent state to another.
What is serializability?
● Serializability is a concurrency scheme in which the outcome of executing concurrent transactions is equivalent to executing the transactions serially.
● A schedule is an ordered list of the operations of a set of transactions.
● In a serial schedule, each transaction executes to completion before the next one starts, without interference from other transactions.
● In a non-serial schedule, the operations of a group of concurrent transactions are interleaved.
● In a non-serial schedule, if the interleaving is not controlled, problems can arise such as lost updates, uncommitted dependency, and inconsistent analysis.
● The main objective of serializability is to find non-serial schedules that allow transactions to execute concurrently without interference and produce a database state that could be produced by a serial execution.
1. Conflict Serializability
● Conflict serializability considers pairs of instructions from two different transactions that access the same data item to perform a read/write operation.
● It deals with detecting the instructions that conflict and specifying the order in which they must execute when a conflict exists.
● A conflict arises when at least one of the two instructions is a write operation.
The following rules are important in Conflict Serializability,
1. If both operations are reads, they do not conflict and may be swapped.
2. If one transaction performs a read and the other performs a write on the same data item, they conflict and cannot be swapped.
3. If both transactions write the same data item, they also conflict and cannot be swapped, because the final value of the item depends on which write executes last.
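These rules can be condensed into a small predicate. The following Python sketch assumes operations are represented as (transaction, action, item) tuples; that representation is an illustration, not a standard.

```python
# Two operations conflict only when they come from different transactions,
# touch the same data item, and at least one of them is a write.

def conflicts(op1, op2):
    t1, a1, x1 = op1
    t2, a2, x2 = op2
    return t1 != t2 and x1 == x2 and ("W" in (a1, a2))

print(conflicts(("T1", "R", "A"), ("T2", "R", "A")))  # False: read-read
print(conflicts(("T1", "R", "A"), ("T2", "W", "A")))  # True: read-write
print(conflicts(("T1", "W", "A"), ("T2", "W", "A")))  # True: write-write
```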
2. View Serializability
● View serializability is another type of serializability.
● Two schedules over the same set of transactions are compared; they are view equivalent if the three conditions below hold.
Example: Let us assume two transactions T1 and T2 that are serialized to create two different schedules SH1 and SH2, where T1 and T2 access the same data item. There are three conditions:
1. If in SH1, T1 reads the initial value of the data item, then in SH2, T1 should read the initial value of that same data item.
2. If in SH1, T1 writes a value in the data item which is read by T2, then in SH2, T1 should write the value in the data item before T2 reads it.
3. If in SH1, T1 performs the final write operation on that data item, then in SH2, T1 should perform the final write operation on that data item.
If a concurrent schedule is view equivalent to a serial schedule of the same transactions, then it is said to be view serializable.
Testing of Serializability
A serialization graph is used to test the serializability of a schedule.
Assume a schedule S. For S, we construct a graph known as a precedence graph, G = (V, E), where V consists of a set of vertices and E consists of a set of edges. The set of vertices contains all the transactions participating in the schedule. The set of edges contains all edges Ti → Tj for which one of the three conditions holds:
- Create an edge Ti → Tj if Ti executes write(Q) before Tj executes read(Q).
- Create an edge Ti → Tj if Ti executes read(Q) before Tj executes write(Q).
- Create an edge Ti → Tj if Ti executes write(Q) before Tj executes write(Q).
Fig 1: Precedence Graph
- If a precedence graph contains a single edge Ti→Tj, then all the instructions of Ti are executed before the first instruction of Tj is executed.
- If a precedence graph for schedule S contains a cycle, then S is non-serializable. If the precedence graph has no cycle, then S is known as serializable.
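The test itself is short to implement. The following Python sketch builds the precedence graph from a schedule and checks it for a cycle; the tuple-based schedule representation is an assumption made for illustration.

```python
# Build precedence-graph edges from conflicting operation pairs, then
# report whether any transaction can reach itself (a cycle).

def precedence_graph(schedule):
    edges = set()
    for i, (ti, ai, xi) in enumerate(schedule):
        for tj, aj, xj in schedule[i + 1:]:
            # conflicting pair: same item, different txns, at least one write
            if ti != tj and xi == xj and ("W" in (ai, aj)):
                edges.add((ti, tj))
    return edges

def has_cycle(edges):
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
    def reachable(start, target, seen):
        for nxt in graph.get(start, ()):
            if nxt == target or (nxt not in seen and
                                 reachable(nxt, target, seen | {nxt})):
                return True
        return False
    return any(reachable(u, u, set()) for u in graph)

# T1 and T2 conflict on A in both directions -> cycle -> not serializable
s = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
print(precedence_graph(s))             # both edges T1->T2 and T2->T1
print(has_cycle(precedence_graph(s)))  # True
```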
For example:
Explanation:
Read(A): In T1, no subsequent writes to A, so no new edges
Read(B): In T2, no subsequent writes to B, so no new edges
Read(C): In T3, no subsequent writes to C, so no new edges
Write(B): B is subsequently read by T3, so add edge T2 → T3
Write(C): C is subsequently read by T1, so add edge T3 → T1
Write(A): A is subsequently read by T2, so add edge T1 → T2
Write(A): In T2, no subsequent reads of A, so no new edges
Write(C): In T1, no subsequent reads of C, so no new edges
Write(B): In T3, no subsequent reads of B, so no new edges
Precedence graph for schedule S1:
Fig 2: Precedence graph
The precedence graph for schedule S1 contains a cycle, so schedule S1 is non-serializable.
Explanation:
Read(A): In T4, no subsequent writes to A, so no new edges
Read(C): In T4, no subsequent writes to C, so no new edges
Write(A): A is subsequently read by T5, so add edge T4 → T5
Read(B): In T5, no subsequent writes to B, so no new edges
Write(C): C is subsequently read by T6, so add edge T4 → T6
Write(B): B is subsequently read by T6, so add edge T5 → T6
Write(C): In T6, no subsequent reads of C, so no new edges
Write(A): In T5, no subsequent reads of A, so no new edges
Write(B): In T6, no subsequent reads of B, so no new edges
Precedence graph for schedule S2:
Fig 3: Precedence graph
The precedence graph for schedule S2 contains no cycle, so schedule S2 is serializable.
Key takeaway
- Serializability is a concurrency scheme in which the outcome of executing concurrent transactions is equivalent to executing the transactions serially.
- A schedule is an ordered list of the operations of a set of transactions.
- In a serial schedule, each transaction executes to completion before the next one starts, without interference from other transactions.
- In a non-serial schedule, the operations of a group of concurrent transactions are interleaved.
- In non-serial schedules, if the interleaving is not controlled, problems can arise such as lost updates, uncommitted dependency, and inconsistent analysis.
- The main objective of serializability is to find non-serial schedules that allow transactions to execute concurrently without interference and produce a database state that could be produced by a serial execution.
When more than one transaction is processed at the same time, their log records are interleaved. During recovery, it would be difficult for the recovery system to backtrack through all the logs and then start recovering.
Concurrency control allows multiple transactions to execute simultaneously, which results in interleaved logs. Because the results may depend on the execution order, the order in which the operations were executed must be preserved in the log.
Recovery with concurrent transactions can be accomplished in one of four ways:
- Interaction with concurrency control
- Transaction rollback
- Checkpoints
- Restart recovery
Interaction with concurrency control
The recovery mechanism in this scheme depends heavily on the concurrency control scheme that is used. To undo the updates made by a failed transaction, we must roll back the transaction.
Transaction rollback
In this scheme, we use the log to roll back a failed transaction. The system scans the log backwards for the failed transaction and restores the data item recorded in each log entry it finds.
Checkpoints
● Checkpoints are employed in this approach to limit the number of log records the system must scan when it recovers from a crash.
● In a concurrent transaction processing system, we require that the checkpoint log record be of the form <checkpoint L>, where 'L' is a list of the transactions active at the moment of the checkpoint.
● A fuzzy checkpoint is a checkpoint that allows transactions to make modifications while buffer blocks are being written out.
Restart recovery
● The system creates two lists when it recovers from a crash.
● The undo-list contains transactions that need to be undone, whereas the redo-list contains transactions that need to be redone.
- The system generates the two lists as follows: both are empty at first. The system examines each log record backwards until it reaches the first <checkpoint> record.
Key takeaway
- Multiple transactions can be completed at the same time with concurrency control, resulting in interleaved logs.
- It would be quite difficult for the recovery system to backtrack all of the logs and then begin the recovery process.
Two phase commit
Distributed two-phase commit reduces the vulnerability of one-phase commit protocols. The steps performed in the two phases are as follows −
Phase 1: Prepare Phase
- Each slave sends a "DONE" notification to the controlling site once it has completed its transaction locally. When all slaves have sent a "DONE" message to the controlling site, it sends a "Prepare" message to the slaves.
- The slaves vote on whether they still want to commit or not. If a slave wants to commit, it sends a “Ready” message.
- A slave who refuses to commit sends the message "Not Ready." When the slave has conflicting concurrent transactions or there is a timeout, this can happen.
Phase 2: Commit/Abort Phase
After all of the slaves have sent "Ready" messages to the controlling site,
- The slaves receive a "Global Commit" message from the controlling site.
- The slaves complete the transaction and send the controlling site a "Commit ACK" message.
- The transaction is considered committed when the controlling site receives "Commit ACK" messages from all slaves.
After any slave sends the first "Not Ready" notification to the controlling site,
- The slaves receive a "Global Abort" notification from the controlling site.
- The slaves abort the transaction and send the controlling site an "Abort ACK" message.
- The transaction is considered aborted when the controlling site receives "Abort ACK" messages from all slaves.
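A minimal sketch of the controlling site's logic follows, assuming a hypothetical Participant class whose vote()/commit()/abort() methods stand in for messages exchanged with slave sites over a network.

```python
# Phase 1 gathers votes; phase 2 applies a single global decision.
# All names here are illustrative assumptions, not a real protocol library.

class Participant:
    def __init__(self, name, ready=True):
        self.name, self.ready = name, ready
    def vote(self):
        return "Ready" if self.ready else "Not Ready"   # phase 1 reply
    def commit(self):
        print(f"{self.name}: commit, send Commit ACK")
    def abort(self):
        print(f"{self.name}: abort, send Abort ACK")

def two_phase_commit(slaves):
    # Phase 1 (prepare): collect votes after all slaves have reported DONE
    votes = [s.vote() for s in slaves]
    # Phase 2 (commit/abort): the decision is global and all-or-nothing
    if all(v == "Ready" for v in votes):
        for s in slaves:
            s.commit()
        return "committed"
    for s in slaves:
        s.abort()
    return "aborted"

print(two_phase_commit([Participant("S1"), Participant("S2")]))          # committed
print(two_phase_commit([Participant("S1"), Participant("S2", False)]))   # aborted
```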
Crash Recovery
DBMS is a highly complex system with hundreds of transactions being executed every second. The durability and robustness of a DBMS depends on its complex architecture and its underlying hardware and system software. If it fails or crashes amid transactions, it is expected that the system would follow some sort of algorithm or techniques to recover lost data.
Failure Classification
To see where the problem has occurred, we generalize a failure into various categories, as follows −
Transaction failure
A transaction has to abort when it fails to execute or when it reaches a point from which it cannot proceed any further. This is called transaction failure, where only a few transactions or processes are affected.
Reasons for a transaction failure could be −
- Logical errors − Where a transaction cannot complete because it has some code error or any internal error condition.
- System errors − Where the database system itself terminates an active transaction because the DBMS is not able to execute it, or it has to stop because of some system condition. For example, in case of deadlock or resource unavailability, the system aborts an active transaction.
System Crash
There are problems − external to the system − that may cause the system to stop abruptly and cause the system to crash. For example, interruptions in power supply may cause the failure of underlying hardware or software failure.
Examples may include operating system errors.
Disk Failure
In early days of technology evolution, it was a common problem where hard-disk drives or storage drives used to fail frequently.
Disk failures include formation of bad sectors, unreachability to the disk, disk head crash or any other failure, which destroys all or a part of disk storage.
Storage Structure
We have already described the storage system. In brief, the storage structure can be divided into two categories −
- Volatile storage − As the name suggests, volatile storage cannot survive system crashes. Volatile storage devices are placed very close to the CPU; normally they are embedded onto the chipset itself. For example, main memory and cache memory are examples of volatile storage. They are fast but can store only a small amount of information.
- Non-volatile storage − These memories are made to survive system crashes. They are huge in data storage capacity, but slower in accessibility. Examples may include hard-disks, magnetic tapes, flash memory, and non-volatile (battery backed up) RAM.
Recovery and Atomicity
When a system crashes, it may have several transactions being executed and various files opened for them to modify the data items. Transactions are made of various operations, which are atomic in nature. But according to ACID properties of DBMS, atomicity of transactions as a whole must be maintained, that is, either all the operations are executed or none.
When a DBMS recovers from a crash, it should maintain the following −
- It should check the states of all the transactions, which were being executed.
- A transaction may be in the middle of some operation; the DBMS must ensure the atomicity of the transaction in this case.
- It should check whether the transaction can be completed now or it needs to be rolled back.
- No transactions would be allowed to leave the DBMS in an inconsistent state.
There are two types of techniques, which can help a DBMS in recovering as well as maintaining the atomicity of a transaction −
- Maintaining the logs of each transaction, and writing them onto some stable storage before actually modifying the database.
- Maintaining shadow paging, where the changes are done on a volatile memory, and later, the actual database is updated.
Log-based Recovery
A log is a sequence of records that maintains a record of the actions performed by transactions. It is important that the log records are written prior to the actual modification and stored on a stable storage medium, which is failsafe.
Log-based recovery works as follows −
- The log file is kept on a stable storage media.
- When a transaction enters the system and starts execution, it writes a log about it.
<Tn, Start>
- When the transaction modifies an item X, it writes a log record as follows −
<Tn, X, V1, V2>
This reads as: Tn has changed the value of X from V1 to V2.
- When the transaction finishes, it logs −
<Tn, commit>
The database can be modified using two approaches −
- Deferred database modification − All logs are written to stable storage, and the database is updated only when a transaction commits.
- Immediate database modification − Each log record is followed by the corresponding database modification. That is, the database is modified immediately after every operation.
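The following Python sketch, using a plain list as the log and a dictionary as the database (both assumptions made for illustration), shows the record formats above being appended before each change, in the style of immediate modification.

```python
# Write-ahead logging sketch: the log entry is appended *before* the
# database item is touched, so a crash can always be undone or redone.

log, database = [], {"X": 10}

def tx_update(tn, item, new_value):
    log.append((tn, "start"))
    old = database[item]
    log.append((tn, item, old, new_value))  # <Tn, X, V1, V2> before modifying
    database[item] = new_value              # immediate modification
    log.append((tn, "commit"))

tx_update("T1", "X", 25)
print(log)       # [('T1', 'start'), ('T1', 'X', 10, 25), ('T1', 'commit')]
print(database)  # {'X': 25}
```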
Recovery with Concurrent Transactions
When more than one transaction are being executed in parallel, the logs are interleaved. At the time of recovery, it would become hard for the recovery system to backtrack all logs, and then start recovering. To ease this situation, most modern DBMS use the concept of 'checkpoints'.
Checkpoint
Keeping and maintaining logs in real time and in real environment may fill out all the memory space available in the system. As time passes, the log file may grow too big to be handled at all. Checkpoint is a mechanism where all the previous logs are removed from the system and stored permanently in a storage disk. Checkpoint declares a point before which the DBMS was in consistent state, and all the transactions were committed.
Recovery
When a system with concurrent transactions crashes and recovers, it behaves in the following manner −
Fig 4: Recovery
- The recovery system reads the logs backwards from the end to the last checkpoint.
- It maintains two lists, an undo-list and a redo-list.
- If the recovery system sees a log with <Tn, Start> and <Tn, Commit> or just <Tn, Commit>, it puts the transaction in the redo-list.
- If the recovery system sees a log with <Tn, Start> but no commit or abort log found, it puts the transaction in undo-list.
All the transactions in the undo-list are then undone and their log records are removed. All the transactions in the redo-list are then redone by replaying their log records.
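A sketch of that backward scan in Python, reusing the tuple record format from the earlier logging sketch (an assumed format, not a standard one):

```python
# Scan the log backwards: a commit record puts the transaction on the
# redo-list; a start record with no commit seen puts it on the undo-list.

def classify(log):
    redo, undo = set(), set()
    for record in reversed(log):
        tn, tag = record[0], record[1]
        if tag == "commit":
            redo.add(tn)
        elif tag == "start" and tn not in redo:
            undo.add(tn)                # started, but never committed
    return redo, undo

crash_log = [("T1", "start"), ("T1", "X", 10, 25), ("T1", "commit"),
             ("T2", "start"), ("T2", "Y", 5, 7)]   # T2 never committed
print(classify(crash_log))  # ({'T1'}, {'T2'})
```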
Key takeaway
DBMS is a highly complex system with hundreds of transactions being executed every second. The durability and robustness of a DBMS depends on its complex architecture and its underlying hardware and system software. If it fails or crashes amid transactions, it is expected that the system would follow some sort of algorithm or techniques to recover lost data.
The log is made up of a series of records. Each transaction is logged and stored in a secure location so that it may be recovered in the event of a failure.
Any operation that is performed on the database is documented in the log.
However, the log records must be saved before the actual changes are applied to the database.
Suppose there is a transaction to change a student's city. This transaction generates the following logs.
- When the transaction is started, the 'start' log is written.
<Tn, Start>
- Another log is written to the file when the transaction changes the City from 'Noida' to 'Bangalore.'
<Tn, City, 'Noida', 'Bangalore' >
- When the transaction is completed, it writes another log to mark the transaction's completion.
<Tn, Commit>
The database can be modified in two ways:
1. Deferred database modification:
If a transaction does not modify the database until it has committed, it is said to use the deferred modification technique. All logs are created and stored in stable storage first, and the database is updated when the transaction commits.
2. Immediate database modification:
If the database is modified while the transaction is still active, the immediate modification technique is used.
In this technique, the database is modified immediately after each operation of the transaction, once the corresponding log record has been written.
Recovery using Log records
When a system crashes, the system reads the log to determine which transactions must be undone and which must be redone.
- The transaction Ti must be redone if the log contains both the records <Ti, Start> and <Ti, Commit>.
- If the log contains the record <Ti, Start> but neither <Ti, Commit> nor <Ti, Abort>, then the transaction Ti must be undone.
Key takeaway
- The log is made up of a series of records. Each transaction is logged and stored in a secure location so that it may be recovered in the event of a failure.
- Any operation that is performed on the database is documented in the log.
A system normally permits many transactions to be processed at the same time during the transaction process. This is referred to as concurrent execution.
Advantages of concurrent execution of a transaction
● Reduced waiting time and turnaround time for transactions.
● Improved response time.
● Increased throughput and resource utilization.
Concurrency problems
When concurrent transactions are executed in an uncontrolled manner, they can cause a variety of issues known as concurrency problems.
Several distinct types of problems or conflicts can arise as a result of concurrent transaction execution:
Lost update problem (Write – Write conflict)
This problem happens when two transactions access the same data item and their operations are interleaved in a way that leaves the value of some database item incorrect.
If two transactions T1 and T2 both read the same data item value and then update it, the second write overwrites the first.
Example: Let’s take the value of A is 100
Time | Transaction T1 | Transaction T2
t1   | Read(A)        |
t2   | A = A - 50     |
t3   |                | Read(A)
t4   |                | A = A + 50
t5   | Write(A)       |
t6   |                | Write(A)
Here,
● At time t1, transaction T1 reads the value of A, which is 100.
● At time t2, T1 deducts 50 from A, giving 50.
● At time t3, transaction T2 reads the value of A, which is still 100.
● At time t4, T2 adds 50 to A, giving 150.
● At time t5, T1 writes the value of A based on the value computed at t2, which is 50.
● At time t6, T2 writes the value of A based on the value computed at t4, which is 150.
● As a result, the change made by transaction T1 is lost at time t6, since transaction T2 overwrites the value of A without checking its current value.
● This type of issue is called a lost update.
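The interleaving in the table can be replayed deterministically in Python; the local variables standing in for each transaction's private workspace are an illustrative assumption.

```python
# Deterministic replay of the schedule above, showing how T2's unchecked
# write at t6 destroys T1's update.

A = 100                 # shared database item

a_t1 = A                # t1: T1 reads A (100)
a_t1 -= 50              # t2: T1 computes A - 50 (50)
a_t2 = A                # t3: T2 reads A (still 100)
a_t2 += 50              # t4: T2 computes A + 50 (150)
A = a_t1                # t5: T1 writes 50
A = a_t2                # t6: T2 writes 150, T1's update is lost

print(A)  # 150 -- the deduction made by T1 has vanished
```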
Dirty read problem (W-R conflict)
This situation happens when one transaction T1 updates a data item and then fails for some reason, but the uncommitted change has already been read by another transaction.
Example: Let’s take the value of A is 100
Time | Transaction T1 | Transaction T2
t1   | Read(A)        |
t2   | A = A + 20     |
t3   | Write(A)       |
t4   |                | Read(A)
t5   |                | A = A + 30
t6   |                | Write(A)
t7   | Write(B)       |
Here,
● At time t1, transaction T1 reads the value of A, which is 100.
● At time t2, T1 increases the value of A by 20, giving 120.
● At time t3, T1 saves the value of A (120) in the database buffer.
● At time t4, transaction T2 reads the value of A, which is 120.
● At time t5, T2 increases the value of A by 30, giving 150.
● At time t6, T2 saves the value of A (150) in the database buffer.
● If transaction T1 fails at time t7, owing to a power outage for example, it is rolled back according to the atomicity property of the transaction (either all or none).
● As a result, at time t4 transaction T2 read a value that was never committed to the database. Such a read is known as a dirty read.
Unrepeatable read (R-W Conflict)
Example: Let’s take the value of A is 100
Time | Transaction T1 | Transaction T2
t1   | Read(A)        |
t2   |                | Read(A)
t3   |                | A = A + 30
t4   |                | Write(A)
t5   | Read(A)        |
Here,
● At time t1, transaction T1 reads the value of A, which is 100.
● At time t2, transaction T2 reads the value of A, which is 100.
● At time t3, T2 increases the value of A by 30, giving 130.
● At time t4, T2 writes the value of A (130) in the database.
● The value of A has been updated by transaction T2. As a result, when transaction T1 performs another read at t5, it reads the new value of A, which was modified by T2. This type of conflict is called an R-W conflict.
One method of concurrency control is the locking mechanism. When several transactions execute concurrently, access to data items must be coordinated through locks, so we need a technique for handling lock requests and preventing the database from becoming inconsistent. The lock manager and the transactions communicate in this way to lock and unlock data items. A lock table is the usual data structure for implementing the locking protocol.
Consider the following scenario to gain a better understanding:
In the figure, a downward arrow appears beneath the transactions that are requesting a lock; the locked data items are 5, 47, 167, and 15. The colour of a node indicates whether the lock has been granted or is still waiting to be granted.
The lock table is empty at first, and none of the data elements are locked.
When the lock manager receives a request to lock a specific data item from a Transaction, such as T(i) on data item P(i), one of two scenarios may occur.
i) If P(i) has already been granted a lock, a new node is appended to the end of the linked list, recording the type of request submitted.
ii) If P(i) has not been locked yet, a linked list is created and the lock is granted.
T(i) will now acquire the lock if the desired lock mode is compatible with the existing transaction lock mode, and the status will be "Granted" otherwise "Waiting."
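A minimal lock-table sketch in Python follows, assuming shared ('S') and exclusive ('X') modes and the usual compatibility rule (shared locks coexist; an exclusive lock coexists with nothing):

```python
# Lock table: a hash map from data item to a queue (linked list) of
# requests, each marked Granted or Waiting.

from collections import defaultdict, deque

lock_table = defaultdict(deque)   # item -> queue of (txn, mode, status)

def compatible(mode, granted_modes):
    # two shared locks coexist; an exclusive lock coexists with nothing
    return mode == "S" and all(m == "S" for m in granted_modes)

def request_lock(txn, item, mode):
    queue = lock_table[item]
    granted = [m for (_, m, s) in queue if s == "Granted"]
    status = "Granted" if (not queue or compatible(mode, granted)) else "Waiting"
    queue.append((txn, mode, status))   # append at the end of the list
    return status

print(request_lock("T1", "P5", "S"))   # Granted
print(request_lock("T2", "P5", "S"))   # Granted (shared with T1)
print(request_lock("T3", "P5", "X"))   # Waiting (conflicts with the S locks)
```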
Databases use locking techniques to serialize access to data without forcing transactions to run one at a time. Locks provide a way of safeguarding the data being used, preventing anomalies such as lost updates or reads of data that was modified by a transaction that is later rolled back.
Problems that Locks Solve:
● Lost update problem - A lost update occurs when two different transactions try to update the same column on the same row in the same database record at the same time.
● Temporary Update Problem - A temporary update problem, also known as a dirty read problem, happens when one transaction updates an item and then fails, but the updated value has already been read by another transaction before the change is rolled back.
● Incorrect Summary Problem - When one transaction takes a summary over the value of all instances of a repeating data-item, and a second transaction updates only a few instances of that exact data-item, an incorrect summary issue develops.
● Phantom Reads - A phantom read occurs when a transaction re-executes a query and finds rows that were inserted (or misses rows that were deleted) by another concurrent transaction since the first read.
Different Types of Locks:
In a database, there are three different types of locks that can be utilised.
● Read Locks: Data can only be read when using these types of locks. Depending on the database system and read lock constraints, this can range from allowing only one user to read the data to enabling any user to read the data but not edit it. A read lock can be used on a single row, a group of rows, or the entire database. This can also be influenced by the database system in use, which may restrict the amount of data that can be locked for reading.
● Write Locks: This sort of lock is utilised for database system upgrades. This lock prevents any other transactions from affecting the records that are being accessed when it is activated. This allows transactions to read data before it is changed and the modifications become permanent.
● Exclusive Write Locks: A write lock is similar to this sort of lock. The only difference is that with an exclusive write lock, the original transaction is the only one who can look at or edit the data. While this lock is in place, no other transaction can read the data.
● There are also locks that indicate an intention to acquire a different sort of lock. These are called multi-level locks. They are important for showing other transactions what kinds of locks the transaction holds, or intends to hold, at different levels of the data hierarchy.
Multi-Level-Locks
● Update Lock - Indicates that an exclusive lock will be requested in the near future.
● IS Lock - Indicates that a shared (read) lock will be requested in the near future.
● IX Lock - Indicates that an exclusive (write) lock will be requested in the near future.
Multiple transactions running concurrently in an uncontrolled or unconstrained way can cause a variety of issues.
These are known as concurrency problems.
The concurrency problems are as follows:
Dirty Read Problem
Such a read is called a "dirty read" because −
● There's always the possibility that the uncommitted transaction will be reversed later.
● As a result, uncommitted transactions may cause other transactions to read values that do not exist.
● As a result, the database becomes inconsistent.
A dirty read does not always lead to inconsistency.
It becomes a problem only when the uncommitted transaction fails and has to be rolled back later for some reason.
Example
Here,
● T1 reads the value of A.
● T1 updates the value of A in the buffer.
● T2 reads the value of A from the buffer.
● T2 writes the updated value of A.
● T2 commits.
● T1 fails in later stages and rolls back.
In this example,
The dirty value of A written by the uncommitted transaction T1 is read by T2.
Later, T1 fails and is rolled back.
As a result, the value read by T2 is now incorrect.
As a result, the database becomes inconsistent.
Unrepeatable Read Problem
This problem happens when a transaction reads the same variable more than once and gets different values on different reads, even though it has not itself modified the variable in between.
Here,
● T1 reads the value of X (= 10 say).
● T2 reads the value of X (= 10).
● T1 updates the value of X (from 10 to 15 say) in the buffer.
● T2 again reads the value of X (but = 15).
In this example,
In its second read, T2 gets a different value of X.
T2 cannot tell how the value of X changed, because it is supposed to be running in isolation.
Lost Update Problem-
This problem arises when numerous transactions are running at the same time and updates from one or more of them are lost.
Here,
● T1 reads the value of A (= 10 say).
● T1 updates the value of A (to 15 say) in the buffer.
● T2 does a blind write A = 25 (a write without a read) in the buffer.
● T2 commits.
● When T1 commits, it writes A = 25 to the database.
In this example,
T1 writes the overwritten value of A to the database.
As a result, T1's update is lost.
This problem develops when there is a write-write conflict.
In a write-write conflict, two transactions write the same data item, with no read of that item in between the writes.
Phantom Read Problem-
This problem happens when a transaction reads a variable and then, when it tries to read the same variable again later, discovers that the variable no longer exists.
Example
Here,
● T1 reads X.
● T2 reads X.
● T1 deletes X.
● T2 tries reading X but does not find it.
In this example,
When T2 tries to read X again, it discovers that no such variable exists.
T2 cannot tell who deleted the variable X, because it is supposed to be running in isolation.
Avoiding Concurrency Problems-
● To ensure database consistency, it is critical to prevent the problems described above.
● Concurrency control protocols help prevent these problems and preserve the consistency of the database.
A deadlock occurs when two or more transactions wait indefinitely for one another to release locks. Deadlock is regarded as one of the most dreaded complications in a DBMS because no work is ever completed; the transactions remain in a waiting state forever.
For example, transaction T1 has a lock on some entries in the student table and needs to update some rows in the grade table. At the same time, transaction T2 is holding locks on some rows in the grade table and needs to update the rows in the Student table that Transaction T1 is holding.
The primary issue now arises. T1 is currently waiting for T2 to release its lock, and T2 is currently waiting for T1 to release its lock. All actions come to a complete halt and remain in this state. It will remain at a halt until the database management system (DBMS) recognizes the deadlock and aborts one of the transactions.
Fig 5: Deadlock
Deadlock Avoidance
When a database can get stuck in a deadlock, it is preferable to avoid the deadlock rather than abort and restart transactions, which wastes both time and resources.
In a deadlock avoidance strategy, any potential deadlock condition is detected in advance. A method such as the "wait-for graph" is employed for detecting deadlocks, but it is only practical for smaller databases; for larger databases, a deadlock prevention strategy must be used.
Deadlock Detection
When a transaction in a database waits indefinitely for a lock, the DBMS should determine whether the transaction is involved in a deadlock. The lock manager maintains a wait-for graph to detect deadlock cycles in the database.
Wait for Graph
This is a suitable method for detecting deadlock. The method generates a graph based on the transactions and their locks: there is a deadlock if the constructed graph contains a cycle or closed loop.
The system maintains the wait-for graph, with an edge for each transaction that is waiting for a data item held by another transaction, and periodically checks the graph for cycles.
Fig 6: Wait for a graph
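Cycle detection over a wait-for graph is straightforward; this Python sketch encodes the student/grade scenario above, with illustrative transaction names:

```python
# Add an edge Ti -> Tj whenever Ti waits for a lock held by Tj,
# then look for a cycle.

wait_for = {"T1": {"T2"},   # T1 waits for a row locked by T2
            "T2": {"T1"}}   # T2 waits for a row locked by T1

def deadlocked(graph):
    def reaches(start, target, seen):
        for nxt in graph.get(start, ()):
            if nxt == target or (nxt not in seen and
                                 reaches(nxt, target, seen | {nxt})):
                return True
        return False
    return any(reaches(t, t, set()) for t in graph)

print(deadlocked(wait_for))  # True -- the DBMS must abort one transaction
```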
Deadlock Prevention
A deadlock prevention strategy suits large databases. Deadlock can be prevented if resources are allocated in such a way that a deadlock can never arise.
The database management system examines the operations of a transaction to see whether they are likely to cause a deadlock. If they are, the DBMS never allows that transaction to proceed.
Time stamping Methods
- Time stamp is a unique identifier created by the DBMS to identify a transaction.
- These values are assigned in the order in which the transactions are submitted to the system, so a timestamp can be thought of as the transaction start time.
- It is denoted as Ts(T).
- Note: Concurrency control techniques based on timestamp ordering do not use locks; hence, deadlock cannot occur.
Generation of timestamp:
Timestamps can be generated in several ways like:
- Use a counter that is incremented each time its value is assigned to a transaction. Transaction timestamps are numbered 1, 2, 3, ... in this scheme. The system must periodically reset the counter to zero when no transactions are executing for some short period of time.
- Another way is to use the current date/time of the system clock.
- Two timestamp values are associated with each database item X: -
1) Read-Ts(X): -
i. Read timestamp of X.
ii. Read-Ts(X) = Ts(T), where T is the youngest transaction that has read X successfully.
2) Write-Ts(X): -
i. Write timestamp of X.
ii. Write-Ts(X) = Ts(T), where T is the youngest transaction that has written X successfully.
These timestamps are updated whenever a new read(X) or write(X) instruction is executed.
Timestamp Ordering Algorithm:
a. Whenever some transaction T issues a Read(X) or Write(X) operation, the algorithm compares Ts(T) with Read-Ts(X) & Write-Ts(X) to ensure that the timestamp order of execution is not violated.
b. If the order is violated, the transaction is aborted and resubmitted to the system as a new transaction with a new timestamp.
c. If T is aborted & rolled back, any transaction T1 that may have used a value written by T must also be rolled back.
d. Similarly, any transaction T2 that may have used a value written by T1 must also be rolled back & so on.
So, a cascading rollback problem is associated with this algorithm.
Algorithm: -
1) If transaction T issues a write(X) operation: -
a) If Read-Ts(X) > Ts(T) OR Write-Ts(X) > Ts(T), then abort & roll back T & reject the operation.
This is done because some younger transaction with a timestamp greater than Ts(T) has already read or written the value of X before T had a chance to write(X), which would violate timestamp ordering.
b) If the condition in (a) does not occur then execute write (X) operation of T & set write-Ts(X) to Ts(T).
2) If a transaction T issues a read(X) operation.
a. If write-Ts(X) > Ts(T), then abort & Rollback T & reject the operation.
This should be done because some younger transaction with timestamp greater than Ts(T) has already written the value of X before T had a chance to read X.
b. If Write-Ts(X) ≤ Ts(T), then execute the read(X) operation of T & set read-Ts(X) to the larger of Ts(T) & current read-Ts(X).
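The two checks can be sketched as follows; raising an exception stands in for "abort, roll back, and resubmit with a new timestamp", a simplification made for illustration.

```python
# read_ts/write_ts hold Read-Ts(X) and Write-Ts(X) for each item X.

read_ts, write_ts = {}, {}

def ts_write(ts_t, x):
    if read_ts.get(x, 0) > ts_t or write_ts.get(x, 0) > ts_t:
        raise Exception(f"abort: a younger txn already read/wrote {x}")
    write_ts[x] = ts_t

def ts_read(ts_t, x):
    if write_ts.get(x, 0) > ts_t:
        raise Exception(f"abort: a younger txn already wrote {x}")
    read_ts[x] = max(ts_t, read_ts.get(x, 0))

ts_read(2, "X")          # T with Ts=2 reads X; Read-Ts(X) becomes 2
try:
    ts_write(1, "X")     # T with Ts=1 tries to write -> violates ordering
except Exception as e:
    print(e)             # abort: a younger txn already read/wrote X
```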
Advantages: -
1. The timestamp ordering protocol ensures conflict serializability, because conflicting operations are processed in timestamp order.
2. The protocol ensures freedom from deadlock, since no transaction ever waits.
Disadvantages: -
1. There is a possibility of starvation of long transactions if a sequence of conflicting short transactions causes repeated restarting of the long transaction.
2. The protocol can generate schedules that are not recoverable.
Key takeaway
- A deadlock occurs when two or more transactions are stuck waiting for one another to release locks indefinitely.
- Deadlock is regarded to be one of the most dreaded DBMS difficulties because no work is ever completed and remains in a waiting state indefinitely.
Two-Phase Locking Protocol: -
- Each transaction in the system should follow a set of rules, called locking protocol.
- A transaction is said to follow the two-phase locking protocol if all locking operations (shared or exclusive) precede the first unlock operation.
- Such a transaction can be divided into two phases: Growing & shrinking phase.
- During the expanding/growing phase, new locks on items can be acquired but none can be released.
- During a shrinking phase, existing locks can be released but no new locks can be acquired.
- If every transaction in a schedule follows the two-phase locking protocol, the schedule is guaranteed to be serializable, obviating the need to test for serializability.
- Consider the following schedules S11 and S12. S11 does not follow two-phase locking, and S12 follows the two-phase locking protocol.
Schedule S11 (transaction T1):
Lock_S(Y); R(Y); Unlock(Y); Lock_X(Q); R(Q); Q := Q + Y; W(Q); Unlock(Q)
This schedule does not follow two-phase locking because the Lock_X(Q) instruction appears after the Unlock(Y) instruction.
Schedule S12 (transaction T1):
Lock_S(Y); R(Y); Lock_X(Q); R(Q); Q := Q + Y; W(Q); Unlock(Q); Unlock(Y)
This schedule follows two-phase locking because all lock instructions appear before the first unlock instruction.
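The two-phase rule itself is easy to enforce mechanically. This Python sketch, with an illustrative Transaction class, rejects any lock request made after the first unlock, accepting S12-style behaviour and refusing S11-style behaviour.

```python
# Once any lock has been released, the shrinking phase has begun and no
# further lock may be acquired.

class Transaction:
    def __init__(self):
        self.locks, self.shrinking = set(), False

    def lock(self, item):
        if self.shrinking:
            raise Exception("2PL violation: lock after first unlock")
        self.locks.add(item)              # growing phase

    def unlock(self, item):
        self.shrinking = True             # shrinking phase begins
        self.locks.discard(item)

t = Transaction()
t.lock("Y"); t.lock("Q")                  # like S12: all locks first
t.unlock("Q"); t.unlock("Y")              # then all unlocks

t2 = Transaction()
t2.lock("Y"); t2.unlock("Y")
try:
    t2.lock("Q")                          # like S11: lock after an unlock
except Exception as e:
    print(e)                              # 2PL violation: lock after first unlock
```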
As we all know, a database must adhere to the ACID properties in order to maintain consistency. Among these four properties (atomicity, consistency, isolation, and durability), isolation determines how transaction integrity is visible to other users and systems. It indicates that a transaction should execute as if it were the only transaction accessing the database system's resources.
The degree to which a transaction must be isolated from data updates performed by any other transaction in the database system is defined by isolation levels.
Isolation influences how transaction integrity is visible to other users and systems in database systems.
Lowering the isolation level allows more users to access the same data at the same time, but it also increases the number of concurrent effects (such as dirty reads or lost updates) that users may experience. A higher isolation level, on the other hand, reduces the types of concurrency effects that users may encounter while also requiring more system resources and increasing the likelihood that one transaction would block another.
Isolation is usually defined at the database level as a property that specifies how and when the changes produced by one operation become visible to other operations. On older systems it may be implemented ad hoc, for example through the use of temporary tables.
To ensure isolation in two-tier systems, a transaction processing (TP) manager is required. To commit the booking and transmit confirmation to the consumer in n-tier systems (such as many websites attempting to reserve the final ticket on a flight), a combination of stored procedures and transaction management is necessary.
Along with atomicity, consistency, and durability, isolation is one of the four ACID qualities.
It determines the visibility of transactions to other users and systems. At a lower isolation level, every user can access the same data at the same time, so there is a considerable risk to data privacy and system security. Higher isolation levels, on the other hand, reduce the concurrency effects over the data, but they demand more resources and are slower than lower isolation levels.
Isolation methods help protect data from unauthorised transactions. They ensure data integrity by defining how and when changes made by one process become visible to others.
Levels of isolation
Isolations are classified into four levels, which are described below.
● Read Uncommitted − This is the lowest isolation level. Dirty reads are allowed at this level, meaning one transaction may read uncommitted changes made by another.
● Read Committed − Dirty reads are not allowed; a transaction may only read data that has been committed.
● Repeatable Read − This is the most restrictive level short of serializable. The transaction holds read locks on all the rows it references, as well as write locks on the rows it updates, inserts, or deletes. As a result, there is no risk of non-repeatable reads.
● Serializable − The highest isolation level. It specifies that concurrent transactions must appear to execute in some sequential order.
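From application code, the isolation level is usually chosen per transaction with a SET TRANSACTION statement. A hedged Python sketch follows, assuming a PostgreSQL server, the psycopg2 driver, and a hypothetical student table in a database named mydb.

```python
# Illustrative only: "mydb" and the "student" table are assumptions.
# SET TRANSACTION ISOLATION LEVEL is standard SQL and must be the first
# statement of the transaction.

import psycopg2  # assumes psycopg2 is installed and a server is running

conn = psycopg2.connect("dbname=mydb")
with conn:                       # commits on success, rolls back on error
    with conn.cursor() as cur:
        # pick one of: READ UNCOMMITTED, READ COMMITTED,
        #              REPEATABLE READ, SERIALIZABLE
        cur.execute("SET TRANSACTION ISOLATION LEVEL SERIALIZABLE")
        cur.execute("SELECT city FROM student WHERE id = %s", (1,))
        print(cur.fetchone())
```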