Unit - 4
Transaction processing
Concurrency control in a database management system is a procedure for handling simultaneous operations without letting them interfere with each other. It ensures that concurrent transactions execute correctly and produce the right results without compromising the integrity of the database.
If all users are only reading data, concurrent access is very fast, since readers cannot interfere with each other. Any realistic workload, however, mixes READ and WRITE operations, so contention becomes a challenge.
To address the conflicts that arise in a multi-user system, DBMS concurrency control is used. Concurrency control is therefore essential to the proper functioning of a database management system whenever two or more transactions that access the same data execute concurrently.
Most of these techniques ensure serializability of schedules by using protocols that guarantee it.
Different types of protocols/schemes used to control concurrent execution of transactions are:
- Lock based protocol
- Time stamp-based protocol
- Validation based protocol (optimistic)
- Multiple granularity
- Multiversion schemes
Purpose of using concurrency control methods in a DBMS:
● To enforce isolation through mutual exclusion between conflicting transactions
● To resolve read-write and write-write conflicts
● To preserve database consistency throughout concurrent execution
● To regulate the interaction between concurrent transactions; this control is accomplished using concurrency-control mechanisms
● Controlling conflicts helps to ensure serializability
Key takeaway:
- Concurrency control techniques are used to ensure the noninterference or isolation property of concurrently executing transactions.
- If all users are only reading data, concurrent access is very fast.
- Database Management System Concurrency Control is a procedure for handling simultaneous operations without interfering with each other.
A transaction in a database system must maintain Atomicity, Consistency, Isolation, and Durability − commonly known as the ACID properties − in order to ensure accuracy, completeness, and data integrity.
● Atomicity
● Consistency
● Isolation
● Durability
Atomicity: -
❏ A transaction is an atomic unit of processing.
❏ It is either performed in its entirety or not performed at all.
❏ It means either all operations are reflected in the database or none are.
❏ It is the responsibility of the transaction recovery subsystem to ensure atomicity.
❏ If a transaction fails to complete for some reason, the recovery technique must undo any effects of the transaction on the database.
Consistency: -
❏ A transaction is consistency preserving if its complete execution takes the database from one consistent state to another.
❏ It is the responsibility of the programmer who writes database programs.
❏ Database programs should be written in such a way that, if the database is in a consistent state before a transaction executes, it will be in a consistent state after the transaction executes.
Eg. Transaction: transfer $50 from account A to account B.
If account A initially holds $100 and account B holds $200, then after the transaction executes, A must have $50 and B must have $250.
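The transfer example above can be sketched as a small program. This is a minimal illustration, assuming a plain in-memory accounts dictionary rather than a real DBMS; the invariant check stands in for the consistency requirement.

```python
# Hypothetical sketch: a consistency-preserving transfer over an
# in-memory "accounts" store (an assumption for illustration).
accounts = {"A": 100, "B": 200}

def transfer(src, dst, amount):
    """Move `amount` from src to dst; total money is the invariant."""
    total_before = accounts[src] + accounts[dst]
    accounts[src] -= amount
    accounts[dst] += amount
    # Consistency: the invariant must hold after complete execution.
    assert accounts[src] + accounts[dst] == total_before

transfer("A", "B", 50)
print(accounts)  # {'A': 50, 'B': 250}
```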
Isolation: -
❏ If multiple transactions execute concurrently, they should produce the same result as if they were executed serially.
❏ It is enforced by concurrency control subsystem.
❏ To avoid the problem of concurrent execution, transaction should be executed in isolation.
Durability: -
❏ The changes applied to the database by a committed transaction must persist in the database.
❏ The changes must not be lost because of any failure.
❏ It is the responsibility of the recovery subsystem of the DBMS.
Key takeaway:
- A transaction is an atomic unit of processing.
- To avoid the problem of concurrent execution, transaction should be executed in isolation.
- A transaction is consistency preserving if its complete execution takes the database from one consistent state to another.
A serializable schedule always leaves the database in a consistent state. A serial schedule is trivially serializable, since in a serial schedule a transaction begins only after the previous transaction has completed execution. A non-serial schedule, however, must be tested for serializability.
A non-serial schedule of n transactions is serializable if it is equivalent to some serial schedule of the same n transactions. In a serial schedule, only one transaction executes at a time, and the next begins only when the running transaction completes; a serial schedule allows no interleaving.
A serializable schedule helps to maximize both resource utilization and CPU throughput.
Types of serializability: - There are 2 types of serializability:
- Conflict serializability
- View serializability
Conflict serializability
A schedule is called conflict serializable if it can be converted into a serial schedule by exchanging its non-conflicting operations. Two operations are said to conflict if all of the following conditions hold:
● They belong to different transactions.
● They operate on the same data item.
● At least one of them is a write operation.
To understand this, let us see some examples:
Example
T1          T2
-----       -----
R(A)
            R(B)
            R(A)
R(B)
            W(B)
W(A)
We would need to swap the R(A) operation of transaction T2 with the W(A) operation of transaction T1 to turn this schedule into a serial schedule. However, since these are conflicting operations, we cannot swap them, so we conclude that the given schedule is not conflict serializable.
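The standard precedence-graph test for conflict serializability can be sketched as follows. The assignment of operations to T1 and T2 mirrors the example above (an assumption, since the original column layout is ambiguous), and the helper names are illustrative:

```python
# Sketch: build a precedence graph from a schedule and detect a cycle.
# A cycle means the schedule is NOT conflict serializable.
from collections import defaultdict

schedule = [  # (transaction, operation, data item), in execution order
    ("T1", "R", "A"), ("T2", "R", "B"), ("T2", "R", "A"),
    ("T1", "R", "B"), ("T2", "W", "B"), ("T1", "W", "A"),
]

def precedence_graph(sched):
    edges = defaultdict(set)
    for i, (ti, op_i, x) in enumerate(sched):
        for tj, op_j, y in sched[i + 1:]:
            # Conflict: different transactions, same item, >= one write.
            if ti != tj and x == y and "W" in (op_i, op_j):
                edges[ti].add(tj)
    return edges

def has_cycle(edges):
    seen, stack = set(), set()
    def dfs(n):
        if n in stack:
            return True       # back edge found: cycle
        if n in seen:
            return False
        seen.add(n); stack.add(n)
        if any(dfs(m) for m in edges[n]):
            return True
        stack.discard(n)
        return False
    return any(dfs(n) for n in list(edges))

g = precedence_graph(schedule)
print(dict(g))       # {'T2': {'T1'}, 'T1': {'T2'}}
print(has_cycle(g))  # True -> not conflict serializable
```

The cycle T1 → T2 → T1 confirms the conclusion reached by inspection above.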
View serializability
A schedule is called view serializable if it is view equivalent to a serial schedule (one with no overlapping transactions). Every conflict-serializable schedule is also view serializable, but a view-serializable schedule that contains blind writes need not be conflict serializable.
View equivalent
Two schedules S1 and S2 are said to be view equivalent if they satisfy the following conditions:
1. Initial Read
The initial read of the two schedules must be the same. Suppose there are two schedules S1 and S2. If a transaction T1 reads data item A in schedule S1, then in S2 transaction T1 must also read A.
Fig 1: two schedules (initial read)
The two schedules above are considered equivalent since T1 performs the initial read operation in S1 and T1 also performs it in S2.
2. Updated Read
If Ti reads A after it is updated by Tj in schedule S1, then in S2 as well, Ti must read A as updated by Tj.
Fig 2: two schedules(updated read)
The two schedules above are not considered equivalent since, in S1, T3 reads A updated by T2 and in S2, T3 reads A updated by T1.
3. Final Write
The final write must be the same in both schedules. If a transaction T1 performs the final write on A in schedule S1, then T1 must also perform the final write on A in S2.
Fig 3: two schedules (final write)
The above two schedules are view equivalent since T3 performs the final write operation in S1 and T3 also performs the final write operation in S2.
Key takeaway:
- A schedule is called conflict serializable if we can turn it into a serial schedule by swapping its non-conflicting operations.
- View serializability is a way to assess whether or not a given schedule is view serializable.
Lock - based method
i. A lock is a variable associated with a data item which describes the status of the item with respect to possible operations that can be applied to it.
ii. Generally, there is one lock for each data item in the database.
iii. Locks are used as a means of synchronizing the access by concurrent transactions to the database items.
Locking Models: -
There are two models in which a data item may be locked: -
a. Shared (Read) Lock Mode &
b. Exclusive (write) Lock Mode
a. Shared-Lock Mode: -
● If a transaction Ti has obtained a shared mode lock on item Q, then Ti can read Q but cannot modify/write Q.
● Shared lock is also called as a Read-lock because other transactions are allowed to read the item.
● It is denoted by S.
Eg.   T1              T2
      Lock_S(A);
                      Lock_S(B);
T1 has locked A in shared mode & T2 has also locked B in shared mode.
b. Exclusive-Lock Mode: -
● If a transaction Ti has obtained an exclusive-mode lock on item Q, then Ti can both read & write Q.
● An exclusive lock is also called a write-lock because a single transaction exclusively holds the lock on the item.
● It is denoted by X.
      T1              T2
      Lock_X(A);
                      Lock_X(B);
● T1 has locked A in exclusive mode & T2 has locked B in exclusive mode.
Lock-Compatibility Matrix: -
      |  S  |  X
  S   |  ✓  |  ✗
  X   |  ✗  |  ✗
● If a transaction Ti has locked Q in shared mode, then another transaction Tj can lock Q in shared mode only. This is called a compatible mode.
● If a transaction Ti has locked Q in S mode, another transaction Tj can’t lock Q in X mode. This is called incompatible mode.
● Similarly, if a transaction Ti has locked Q in X mode, then other transaction Tj can’t lock Q in either S or X mode.
● To access a data item, transaction Ti must first lock that item.
● If the data item is already locked by another transaction in an incompatible mode, the concurrency-control manager will not grant the lock until that lock has been released.
● Thus, Ti is made to wait until all incompatible locks have been released.
● A transaction can unlock a data item with the unlock(Q) instruction.
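The compatibility matrix above can be sketched as a small lookup table. The function and table names are illustrative assumptions, not any particular DBMS's lock manager:

```python
# Sketch of the S/X lock-compatibility matrix.
COMPATIBLE = {
    ("S", "S"): True,   # two shared (read) locks can coexist
    ("S", "X"): False,  # a held shared lock blocks an exclusive request
    ("X", "S"): False,  # a held exclusive lock blocks a shared request
    ("X", "X"): False,  # a held exclusive lock blocks another exclusive
}

def can_grant(held_modes, requested):
    """Grant only if the request is compatible with every lock held."""
    return all(COMPATIBLE[(h, requested)] for h in held_modes)

print(can_grant(["S", "S"], "S"))  # True  - another reader may join
print(can_grant(["S"], "X"))       # False - the writer must wait
```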
Eg.1. Transaction T1 displays total amount of money in accounts A & B.
Without-locking modes: -
T1
R(A)
R(B)
Display(A+B)
With-locking modes: -
T1
Lock_S(A)
R(A); Unlock(A)
Lock_S(B)
R(B); Unlock(B)
Display(A+B)
T1 locks A & B in shared mode because T1 uses A & B only for reading purpose.
Two-Phase Locking Protocol: -
● Each transaction in the system should follow a set of rules, called a locking protocol.
● A transaction is said to follow the two-phase locking protocol if all locking operations (shared or exclusive) precede the first unlock operation.
● Such a transaction can be divided into two phases: a growing phase & a shrinking phase.
● During the expanding/growing phase, new locks on items can be acquired but none can be released.
● During the shrinking phase, existing locks can be released but no new locks can be acquired.
● If every transaction in a schedule follows the two-phase locking protocol, the schedule is guaranteed to be serializable, obviating the need to test for serializability.
● Consider the following schedules S11 and S12. S11 does not follow two-phase locking, while S12 does.
T1 (Schedule S11)
Lock_S(Y); R(Y); Unlock(Y);
Lock_X(Q); R(Q);
Q := Q + Y; W(Q); Unlock(Q);

This schedule does not follow two-phase locking because the Lock_X(Q) instruction appears after the Unlock(Y) instruction.
T1 (Schedule S12)
Lock_S(Y); R(Y);
Lock_X(Q); R(Q);
Q := Q + Y; W(Q);
Unlock(Q); Unlock(Y);

This schedule follows two-phase locking because all lock instructions appear before the first unlock instruction.
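The two-phase rule can be checked mechanically. This is a minimal sketch, assuming a simple string encoding of each transaction's lock/unlock sequence; it tests the two schedules above:

```python
# Sketch: check whether one transaction's lock/unlock sequence obeys
# two-phase locking (every lock must precede the first unlock).
def follows_2pl(ops):
    unlocked = False          # True once the shrinking phase has started
    for op in ops:
        if op.startswith("unlock"):
            unlocked = True
        elif op.startswith("lock") and unlocked:
            return False      # a lock after an unlock violates 2PL
    return True

# Schedule S11 above: Lock_X(Q) comes after Unlock(Y).
s11 = ["lock_S(Y)", "unlock(Y)", "lock_X(Q)", "unlock(Q)"]
# Schedule S12 above: all locks precede the first unlock.
s12 = ["lock_S(Y)", "lock_X(Q)", "unlock(Q)", "unlock(Y)"]

print(follows_2pl(s11))  # False
print(follows_2pl(s12))  # True
```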
Time stamping Methods
● Time stamp is a unique identifier created by the DBMS to identify a transaction.
● These values are assigned in the order in which the transactions are submitted to the system, so a timestamp can be thought of as the transaction start time.
● It is denoted as Ts(T).
● Note: - Concurrency control techniques based on timestamp ordering do not use locks, so deadlock cannot occur.
Generation of timestamp: -
Timestamps can be generated in several ways like: -
● Use a counter that is incremented each time its value is assigned to a transaction. Transaction timestamps are numbered 1, 2, 3, ... in this scheme. The system must periodically reset the counter to zero during some short period when no transactions are executing.
● Another way is to use the current date/time of the system clock.
- Two timestamp values are associated with each database item X: -
1) Read-Ts(X): -
i. Read timestamp of X.
ii. Read-Ts(X) = Ts(T), where T is the youngest transaction that has read X successfully.
2) Write-Ts(X): -
i. Write timestamp of X.
ii. Write-Ts(X) = Ts(T), where T is the youngest transaction that has written X successfully.
These timestamps are updated whenever a new read(X) or write(X) instruction is executed.
Timestamp Ordering Algorithm: -
a. Whenever some transaction T issues a Read(X) or Write(X) operation, the algorithm compares Ts(T) with Read-Ts(X) & Write-Ts(X) to ensure that the timestamp order of execution is not violated.
b. If the order is violated, the transaction is aborted & resubmitted to the system as a new transaction with a new timestamp.
c. If T is aborted & rolled back, any transaction T1 that may have used a value written by T must also be rolled back.
d. Similarly, any transaction T2 that may have used a value written by T1 must also be rolled back & so on.
So, the cascading rollback problem is associated with this algorithm.
Algorithm: -
1) If transaction T issues a write(X) operation: -
a) If Read-Ts(X) > Ts(T)
OR
Write-Ts(X) > Ts(T)
Then abort & rollback T & reject the operation.
This is done because some younger transaction with a timestamp greater than Ts(T) has already read/written the value of X before T had a chance to write(X), thus violating timestamp ordering.
b) If the condition in (a) does not occur, then execute the write(X) operation of T & set Write-Ts(X) to Ts(T).
2) If a transaction T issues a read(X) operation.
a. If write-Ts(X) > Ts(T), then abort & Rollback T & reject the operation.
This should be done because some younger transaction with timestamp greater than Ts(T) has already written the value of X before T had a chance to read X.
b. If Write-Ts(X) ≤ Ts(T), then execute the read(X) operation of T & set read-Ts(X) to the larger of Ts(T) & current read-Ts(X).
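The read and write rules above can be sketched in a few lines. The per-item bookkeeping and the Abort exception are illustrative assumptions:

```python
# Sketch of the basic timestamp-ordering rules.
class Abort(Exception):
    """Raised when an operation would violate timestamp order."""

items = {"X": {"read_ts": 0, "write_ts": 0}}

def write(ts, name):
    it = items[name]
    if it["read_ts"] > ts or it["write_ts"] > ts:
        raise Abort  # a younger transaction already read/wrote the item
    it["write_ts"] = ts

def read(ts, name):
    it = items[name]
    if it["write_ts"] > ts:
        raise Abort  # a younger transaction already wrote the item
    it["read_ts"] = max(it["read_ts"], ts)

read(2, "X")       # T with Ts=2 reads X: Read-Ts(X) becomes 2
try:
    write(1, "X")  # older T with Ts=1 now tries to write X
except Abort:
    print("T with Ts=1 aborted")  # timestamp order would be violated
```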
Advantages: -
1. The timestamp ordering protocol ensures conflict serializability, because conflicting operations are processed in timestamp order.
2. The protocol ensures freedom from deadlock, since no transaction ever waits.
Disadvantages: -
1. There is a possibility of starvation of long transactions if a sequence of conflicting short transactions causes repeated restarting of the long transaction.
2. The protocol can generate schedules that are not recoverable.
Key takeaway:
- Concurrency control techniques are used to ensure the noninterference or isolation property of concurrently executing transactions.
- Locks are used as a means of synchronizing the access by concurrent transactions to the database items.
- Time stamp is a unique identifier created by the DBMS to identify a transaction.
Multiversion Concurrency Control (MVCC) is a way to control the consistency of data accessed simultaneously by multiple users. MVCC enforces the snapshot isolation guarantee, which ensures that each transaction always sees a consistent snapshot of the data.
Each transaction obtains a consistent data snapshot when it begins and can only access and change data in this snapshot. When a transaction updates an entry, Ignite checks that other transactions have not modified the entry and generates a new version of the entry. The new version becomes available to other transactions only when and if this transaction commits successfully. If the entry has been modified by another transaction, the current transaction fails with an exception.
Snapshots are not physical snapshots but logical snapshots generated by the MVCC coordinator: a cluster node that coordinates the cluster's transactional activity. The coordinator keeps track of all active transactions and is notified when each transaction completes. All operations on an MVCC-enabled cache request a data snapshot from the coordinator.
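A toy sketch of multiversion reads under snapshot isolation follows. The data structures are illustrative assumptions and are not Apache Ignite's actual implementation:

```python
# Sketch: each write appends a new version; a transaction reads the
# newest version committed at or before its snapshot timestamp.
versions = {"X": [(0, "initial")]}  # item -> list of (commit_ts, value)

def write_version(item, commit_ts, value):
    versions[item].append((commit_ts, value))

def read_snapshot(item, snapshot_ts):
    # Versions are appended in commit order, so the last visible wins.
    visible = [v for ts, v in versions[item] if ts <= snapshot_ts]
    return visible[-1]

write_version("X", 5, "updated")
print(read_snapshot("X", 3))   # 'initial' - snapshot predates commit 5
print(read_snapshot("X", 10))  # 'updated'
```

The key point is that a reader never blocks a writer: the old version remains visible to transactions whose snapshot predates the new commit.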
Optimistic Techniques(validation)
The validation-based protocol in a DBMS, also known as the optimistic concurrency control technique, is a method for avoiding conflicts between transactions. In this protocol, local copies of the transaction's data are updated rather than the data itself, resulting in less interference while the transaction is executed.
The Validation-based Protocol is carried out in three stages:
- Read phase
- Validation phase
- Write phase
Read phase
In the read phase, a transaction may read data values from the database, but write operations and updates are applied only to local copies of the data, not to the database itself.
Validation phase
In the validation phase, the data is checked to ensure that applying the transaction's updates to the database does not violate serializability.
Write phase
In the write phase, if validation succeeds, the updates are applied to the database; otherwise, the updates are discarded and the transaction is rolled back.
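The three phases can be sketched as follows. The read_set/committed_writes bookkeeping is an assumed simplification of a real validation protocol:

```python
# Sketch of validation-based (optimistic) concurrency control.
db = {"A": 100}

def validate_and_write(read_set, local_writes, committed_writes):
    # Validation phase: fail if an item this transaction read was
    # overwritten by a concurrently committed transaction.
    if read_set & committed_writes:
        return False  # roll back: serializability could be violated
    # Write phase: apply the local copies to the database.
    db.update(local_writes)
    return True

# T read A and computed A+10 locally; no one else wrote A -> commit.
print(validate_and_write({"A"}, {"A": 110}, committed_writes=set()))  # True
# Another transaction committed a write to A meanwhile -> roll back.
print(validate_and_write({"A"}, {"A": 120}, committed_writes={"A"}))  # False
```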
Key takeaway:
- The optimistic (validation-based) technique is a method for avoiding conflicts between transactions.
- Multiversion Concurrency Control is a way to control the consistency of data accessed simultaneously by multiple users.
Database recovery is the method of restoring the database to its correct state in the event of a failure during a transaction or after the completion of a procedure. The idea of database recovery was introduced earlier as a service that every DBMS should offer to ensure that the database is stable and stays in a consistent state in the presence of failures. Reliability in this sense refers both to the DBMS's resilience to different forms of failure and to its ability to recover from those failures.
You will first learn about the need for recovery and its types of failure, which usually happens in a database environment, to gain a better understanding of the possible problems you may encounter in providing a consistent system.
The recovery of databases can be divided into two parts;
- Rolling forward applies redo records to the corresponding data blocks.
- Rolling back applies rollback segments to the data files; rollback information for transactions is stored in tables.
Using Log–Based Recovery, we can restore the database.
Log based recovery
● Logs are a sequence of records that hold the actions performed by a transaction.
● In log-based recovery, the log of each transaction is stored in some stable storage, from which it can be retrieved to restore the database if any failure occurs.
● The log includes information about the transaction being executed, changed values, and the status of the transaction.
● All this data will be processed in the execution order.
Example
Assume that a transaction modifies an employee's address. For this transaction, the logs below are written,
Log 1: The transaction begins, and a 'START' log record is written.
Log: <Tn, START>
Log 2: The transaction modifies the address from 'Mumbai' to 'Raipur'.
Log: <Tn, Address, 'Mumbai', 'Raipur'>
Log 3: The transaction finishes, and the log records its end.
Log: <Tn, COMMIT>
There are two ways to build the log files and to update the database :
1. Deferred Database Modification
All log records for the transaction are first created and stored in a stable storage system. In the example above, the three log records are generated and stored first, and only then do these steps update the database.
2. Immediate Database Modification
The database is updated immediately after each log record is generated. In the above example, the database is updated at each log-entry step: after the first log entry, the transaction fetches the record from the database; then the second log entry is written and the employee's address is updated; then the third log entry is written and the changes are committed to the database.
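The example logs can be replayed in a sketch of recovery. The record layout follows the logs shown above, while the recover helper and its redo/undo policy are illustrative assumptions:

```python
# Sketch: replay a log after a crash. Updates of committed transactions
# are redone (new value kept); uncommitted updates are undone (old value
# restored).
log = [
    ("T1", "START"),
    ("T1", "Address", "Mumbai", "Raipur"),  # (txn, item, old, new)
    ("T1", "COMMIT"),
]

def recover(log):
    db = {"Address": "Mumbai"}
    committed = {t for t, *rest in log if rest == ["COMMIT"]}
    for rec in log:
        if len(rec) == 4:  # an update record
            t, item, old, new = rec
            db[item] = new if t in committed else old
    return db

print(recover(log))  # {'Address': 'Raipur'}
```

If the <Tn, COMMIT> record were missing from the log, the same replay would restore 'Mumbai', undoing the incomplete transaction.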
Recovery with Concurrent Transaction
● If two transactions are executed in parallel, their logs are interleaved, and the recovery system would find it difficult to trace all logs back to an earlier point and then start recovering.
● A 'checkpoint' is used to solve this problem.
Key takeaway:
- Like every other computer system, database systems are prone to failure, but the data contained in them must be accessible as and when necessary.
- The process of restoring the database and the data to a consistent state is Database Recovery. This can involve the retrieval of missing information up to the point of the incident (e.g. System crash).