Database Management Systems: December 2019

Tuesday, 24 December 2019

Authorization

Authorization

Authorization is the process of giving someone permission to do or have Something.

Authorization is the function of specifying access rights/privileges to resources, which is related to information security and computer security in general and to access control in particular.

A user may assign a user several forms of authorizations on parts of the database.

Authorizations on data include:

• Authorization to read data.

• Authorization to insert new data.

• Authorization to update data.

• Authorization to delete data.

Each of these types of authorizations is called a privilege.

The user can be authorized with all, none, or a combination of these types of privileges on specified parts of a database, such as a relation or a view.

In addition to authorizations on data, users may also be granted authorizations on the database schema, allowing them, for example, to create, modify, or drop relations. A user who has some form of authorization may be allowed to pass on (grant) this authorization to other users, or to withdraw (revoke) an authorization that was granted earlier.

Granting and Revoking of Privileges

The SQL standard includes the privileges select, insert, update, and delete. The privilege all privileges can be used as a short form for all the allowable privileges.

The SQL data-definition language includes commands to grant and revoke privileges. The grant statement is used to confer authorization. The basic form of this statement is:

grant <privilege list>

on <relation name or view name>

to <user/role list>;

Example:

grant select on department to Ashraf, Suresh;

grant update (budget) on department to Ashraf, Suresh;

To revoke an authorization, we use the revoke statement. It takes a form almost identical to that of grant:

revoke <privilege list>

on <relation name or view name>

from <user/role list>;

Database Security

Database Security

Database security is the technique that protects and secures the database against intentional or accidental threats.

Security concerns will be relevant not only to the data resides in an organization's database: the breaking of security may harm other parts of the system which may ultimately affect the database structure

Threats in a Database

Availability loss − Availability loss refers to non-availability of database objects by legitimate users.
Integrity loss − Integrity loss occurs when unacceptable operations are performed upon the database either accidentally or maliciously. This may happen while creating, inserting, updating or deleting data. It results in corrupted data leading to incorrect decisions.
Confidentiality loss − Confidentiality loss occurs due to unauthorized or unintentional disclosure of confidential information. It may result in illegal actions, security threats and loss in public confidence.

Measures of Control

The measures of control can be broadly divided into the following categories −

Access Control − Access control includes security mechanisms in a database management system to protect against unauthorized access. A user can gain access to the database after clearing the login process through only valid user accounts. Each user account is password protected.
Flow Control − Distributed systems encompass a lot of data flow from one site to another and also within a site. Flow control prevents data from being transferred in such a way that it can be accessed by unauthorized agents. A flow policy lists out the channels through which information can flow. It also defines security classes for data as well as transactions.
Data Encryption − Data encryption refers to coding data when sensitive data is to be communicated over public channels. Even if an unauthorized agent gains access of the data, he cannot understand it since it is in an incomprehensible format.

Database Recovery - Kinds of failures - Failure Control Methods - Recovery

Database Recovery

Database recovery is the method of restoring the database to its correct state in the event of a failure at the time of the transaction or after the end of a process. Database systems, like any other computer system, are subject to failures but the data stored in it must be available as and when required. When a database fails it must possess the facilities for fast recovery. It must also have atomicity i.e. either transactions are completed successfully and committed (the effect is recorded permanently in the database) or the transaction should have no effect on the database.

Kinds of failures

1. Transaction failure: A transaction needs to abort once it fails to execute or once it reaches to any further extent from wherever it can’t go to any extent further. This is often known as transaction failure wherever solely many transactions or processes are hurt. The reasons for transaction failure are:

· Logical errors

· System errors

Logical errors: Where a transaction cannot complete as a result of its code error or an internal error condition.

System errors: Wherever the information system itself terminates a transaction as a result of the DBMS isn’t able to execute it, due to some system condition.

2. System crash: There are issues − external to the system − that will cause the system to prevent abruptly and cause the system to crash. For instance, interruptions in power supply might cause the failure of underlying hardware or software package failure. Examples might include OS errors.

3. Disk failure: In early days of technology evolution, it had been a typical drawback wherever hard-disk drives or storage drives accustomed to failing oftentimes. Disk failures include the formation of dangerous sectors, unreachability to the disk, disk crash or the other failure, that destroys all or a section of disk storage.

The sources of failure are:

Due to hardware or software errors, the system crashes which ultimately resulting in loss of main memory.
Failures of media, such as head crashes or unreadable media that results in the loss of portions of secondary storage.
There can be application software errors, such as logical errors which are accessing the database that can cause one or more transactions to abort or fail.
Natural physical disasters can also occur such as fires, floods, earthquakes, or power failures.
Carelessness or unintentional destruction of data or directories by operators or users.
Damage or intentional corruption or hampering of data (using malicious software or files) hardware or software facilities.

Whatever the reasons of the failure are, there are two principal things that have to be considered:

Failure of main memory including that database buffers.
Failure of the disk copy of that database.

Failure Control Methods

There are 2 forms of techniques, which may facilitate a database management system in recovering as well as maintaining the atomicity of a transaction:

Maintaining the logs of every transaction, and writing them onto some stable storage before truly modifying the info.
Maintaining shadow paging, wherever the changes are done on a volatile memory, and later, and the particular info is updated.

Log-based recovery Or (Manual Recovery):

Log could be a sequence of records, which maintains the records of actions performed by dealing. It’s necessary that the logs area unit written before the particular modification and hold on a stable storage media, that is failsafe. Log-based recovery works as follows:

The log file is unbroken on a stable storage media.
When a transaction enters the system and starts execution, it writes a log regarding it.

Recovery with concurrent transactions (Automated Recovery):

When more than one transaction are being executed in parallel, the logs are interleaved. At the time of recovery, it would become hard for the recovery system to backtrack all logs, and then start recovering. To ease this situation, most modern DBMS use the concept of 'checkpoints'.

Checkpoint: Keeping and maintaining logs in real time and in real environment may fill out all the memory space available in the system. As time passes, the log file may grow too big to be handled at all. Checkpoint is a mechanism where all the previous logs are removed from the system and stored permanently in a storage disk. Checkpoint declares a point before which the DBMS was in consistent state, and all the transactions were committed.

Recovery

When a system with concurrent transactions crashes and recovers, it behaves in the following manner

The recovery system reads the logs backwards from the end to the last checkpoint.
It maintains two lists, an undo-list and a redo-list.
If the recovery system sees a log with <T_n, Start> and <T_n, Commit> or just <T_n, Commit>, it puts the transaction in the redo-list.
If the recovery system sees a log with <T_n, Start> but no commit or abort log found, it puts the transaction in undo-list.

All the transactions in the undo-list are then undone and their logs are removed. All the transactions in the redo-list and their previous logs are removed and then redone before saving their logs.

Sunday, 22 December 2019

Optimistic Concurrency Control

Optimistic Concurrency Control

The optimistic method of concurrency control is based on the assumption that conflicts of database operations are rare and that it is better to let transactions run to completion and only check for conflicts before they commit.

An optimistic concurrency control method is also known as validation or certification methods. No checking is done while the transaction is executing. The optimistic method does not require locking or time stamping techniques. Instead, a transaction is executed without restrictions until it is committed.

In optimistic methods, each transaction moves through the following phases:

Read phase.
Validation or certification phase.
Write phase.

Read phase: In this phase, the transaction T is read and executed. It is used to read the value of various data items and stores them in temporary local variables. It can perform all the write operations on temporary variables without an update to the actual database.

Validation phase: In this phase, the temporary variable value will be validated against the actual data to see if it violates the serializability.

Three rules to enforce serializability in validation phase −

Rule 1 − Given two transactions T_i and T_j, if T_i is reading the data item which T_j is writing, then T_i’s execution phase cannot overlap with T_j’s commit phase. T_j can commit only after T_i has finished execution.

Rule 2 − Given two transactions T_i and T_j, if T_i is writing the data item that T_j is reading, then T_i’s commit phase cannot overlap with T_j’s execution phase. T_j can start executing only after T_i has already committed.

Rule 3 − Given two transactions T_i and T_j, if T_i is writing the data item which T_j is also writing, then T_i’s commit phase cannot overlap with T_j’s commit phase. T_j can start to commit only after T_i has already committed.

Write phase: If the validation of the transaction is validated, then the temporary results are written to the database or system otherwise the transaction is rolled back.

Deadlock in DBMS and Prevention

Deadlock in DBMS

A deadlock is a condition where two or more transactions are waiting indefinitely for one another to give up locks. Deadlocks are not healthy for a system. In case a system is stuck in a deadlock, the transactions involved in the deadlock are either rolled back or restarted.

For example, transaction T1 holds a lock on some rows in students table and needs to update some rows in the grade table. Simultaneously, transaction T2 holds locks on some rows in the grade table and needs to update the rows in the Student table held by Transaction T1. Now the problem is Transaction T1 is waiting for T2 to release its lock and similarly, transaction T2 is waiting for T1 to release its lock. All activities come to a halt state and no work is done. It will remain in a standstill until the DBMS detects the deadlock and aborts one of the transactions.

Deadlock Avoidance –

It is always better to avoid the deadlock rather than restarting or aborting the database.

Deadlock avoidance method is suitable for smaller database whereas deadlock prevention method is suitable for larger database.
One method of avoiding deadlock is using application consistent logic.

In the above given example, Transactions that access Students and Grades should always access the tables in the same order. In this way, Transaction T1 simply waits for transaction T2 to release the lock on Grades before it begins. When transaction T2 releases the lock, Transaction T1 can proceed freely.
Another method for avoiding deadlock is to apply both row level locking mechanism and READ COMMITTED isolation level. However, It does not guarantee to remove deadlocks completely.

Deadlock Detection

When a transaction waits indefinitely to obtain a lock, the database managememt system should detect whether the transaction is involved in a deadlock or not.

One of the methods to detect deadlock situation is Wait-for-graph. This method is suitable for smaller database. In this method a graph is drawn based on the transaction and their lock on the resource. If the graph created has a closed loop or a cycle, then there is a deadlock.

Example of the Wait-For graph is drawn below

When a deadlock has been detected the system recover from the deadlock. The most common solution is to rollback one or more transactions to break the deadlock (victim selection). The victim has to be selected such that the rollback of those transaction incur minimum cost.

- the transaction which has fewest locks

- transaction tahat has done least work

- transaction that is farthest from completion.

Deadlock prevention

Deadlock prevention method is suitable for large database. A deadlock can be prevented if the resources are allocated in such a way that deadlock never occur. The DBMS analyzes the operations whether they can create deadlock situation or not, If they create a deadlock situation, then that transaction is never allowed to be executed.

Deadlock prevention mechanism proposes two schemes:

Wait-Die Scheme
In this scheme, if a transaction request for a resource that is locked by other transaction, then the DBMS simply checks the timestamp of both transactions and allows the older transaction to wait until the resource is available for execution.
Suppose, there are two transactions T1 and T2 and let timestamp of any transaction T be TS (T). Now, if there is a lock on T2 by some other transaction and T1 is requesting for resources held by T2, then DBMS performs following actions:

Checks if TS (T1) < TS (T2) – if T1 is the older transaction and T2 has held some resource, then it allows T1 to wait until resource is available for execution. If T1 is older transaction and has held some resource with it and if T2 is waiting for it, then T2 is killed and restarted latter with random delay but with the same timestamp.
This scheme allows the older transaction to wait but kills the younger one.

Wound Wait Scheme
In this scheme, if an older transaction requests for a resource held by younger transaction, then older transaction forces younger transaction to kill the transaction and release the resource. The younger transaction is restarted with minute delay but with same timestamp. If the younger transaction is requesting a resource which is held by older one, then younger transaction is asked to wait till older releases it.

Saturday, 21 December 2019

Two Phase Locking (2PL)

Two Phase Locking (2PL)

Two-Phase Locking (2PL) is a concurrency control method. It ensures conflict serializable schedules. 2PL divides the execution phase of a transaction into three parts.

In the first phase, when the transaction begins to execute, it requires permission for the locks it needs.
The second part the transaction acquires all the locks. When a transaction releases its first lock, the third phase starts.
In this third phase, the transaction cannot demand any new locks. Instead, it only releases the acquired locks.

The Two-Phase Locking protocol allows each transaction to make a lock or unlock request in two steps:

Growing Phase: In this phase transaction may obtain locks but may not release any locks.
Shrinking Phase: In this phase, a transaction may release locks but not obtain any new lock

2PL protocol offers serializability. The 2PL algorithm offers Strict Serializability, which is the golden standard when it comes to data integrity. But it does not avoid cascading rollback and deadlocks.

Consider a schedule:

Where LP is lock point, Read(A) in T₂ and T₃ denotes Dirty Read because of Write(A) in T₁. Because of Dirty Read in T₂ and T₃ in lines 8 and 12 respectively, when T₁ failed we have to rollback others also. Hence Cascading Rollbacks are possible in 2-PL.

Locking Protocol and locks

Locking Protocol

Locking protocols are used in database management systems as a means of concurrency control. Multiple transactions may request a lock on a data item simultaneously. Hence, a mechanism is required to manage the locking requests made by transactions. Such a mechanism is called as Lock Manager. It relies on the process of message passing where transactions and lock manager exchange messages to handle the locking and unlocking of data items.

Data structure used in Lock Manager –
The data structure required for implementation of locking is called as Lock table.

It is a hash table where name of data items are used as hashing index.
Each locked data item has a linked list associated with it.
Every node in the linked list represents the transaction which requested for lock, mode of lock requested (mutual/exclusive) and current status of the request (granted/waiting).
Every new lock request for the data item will be added in the end of linked list as a new node.
Collisions in hash table are handled by technique of separate chaining.

Working of Lock Manager –

Initially the lock table is table empty as no data item is locked.
Whenever lock manger receives a lock request from a transaction T_i on a particular data item Q_i following cases may arise:

If Q_i is not already locked, a linked list will be created and lock will be granted to the requesting transaction T_i.
If the data item is already locked, a new node will be added at the end of its linked list containing the information about request made by T_i.

If the lock mode requested by T_i is compatible with lock mode of transaction currently having the lock, T_i will acquire the lock too and status will be changed to ‘granted’. Else, status of T_i’s lock will be ‘waiting’.
If a transaction T_i wants to unlock the data item it is currently holding, it will send an unlock request to the lock manager. The lock manger will delete T_i’s node from this linked list. Lock will be granted to the next transaction in the list.
Sometimes transaction T_i may have to be aborted. In such a case all the waiting request made by T_i will be deleted from the linked lists present in lock table. Once abortion is complete, locks held by T_i will also be released.

Locks

A lock is a data variable which is associated with a data item. This lock signifies that operations that can be performed on the data item. Locks help synchronize access to the database items by concurrent transactions.

All lock requests are made to the concurrency-control manager. Transactions proceed only once the lock request is granted.

Binary Locks: A Binary lock on a data item can either locked or unlocked states.

Shared/exclusive: This type of locking mechanism separates the locks based on their uses. If a lock is acquired on a data item to perform a write operation, it is called an exclusive lock.

1. Shared Lock (S): A shared lock is also called a Read-only lock. With the shared lock, the data item can be shared between transactions. This is because you will never have permission to update data on the data item.

For example, consider a case where two transactions are reading the account balance of a person. The database will let them read by placing a shared lock. However, if another transaction wants to update that account's balance, shared lock prevent it until the reading process is over.

2. Exclusive Lock (X): With an Exclusive Lock, a data item can be read as well as written. This is exclusive and can't be held concurrently on the same data item. X-lock is requested using lock-x instruction. Transactions may unlock the data item after finishing the 'write' operation.

Wednesday, 18 December 2019

DBMS Serializability

DBMS Serializability

Serializability is the classical concurrency scheme. It ensures that a schedule for executing concurrent transactions is equivalent to one that executes the transactions serially in some order. When multiple transactions are running concurrently then there is a possibility that the database may be left in an inconsistent state. Serializability is a concept that helps us to check which schedules are serializable.

Serializable schedule

A serializable schedule always leaves the database in consistent state. A serial schedule is always a serializable schedule because in serial schedule, a transaction only starts when the other transaction finished execution. However a non-serial schedule needs to be checked for Serializability. A non-serial schedule of n number of transactions is said to be serializable schedule, if it is equivalent to the serial schedule of those n transactions.

There are two types of Serializability:

1. Conflict Serializability

2. View Serializability

1. Conflict Serializability:

Conflict Serializability is one of the type of Serializability, which can be used to check whether a non-serial schedule is conflict serializable or not.

A schedule is called conflict serializable if we can convert it into a serial schedule after swapping its non-conflicting operations.

Conflicting operations

Two operations are said to be in conflict, if they satisfy all the following three conditions:

1. Both the operations should belong to different transactions.
2. Both the operations are working on same data item.
3. At least one of the operation is a write operation.

Example : Operation W(X) of transaction T1 and operation R(X) of transaction T2 are conflicting operations, because they satisfy all the three conditions mentioned above. They belong to different transactions, they are working on same data item X, one of the operation in write operation.

Two schedules are said to be conflict Equivalent if one schedule can be converted into other schedule after swapping non-conflicting operations. If a schedule is conflict Equivalent to its serial schedule then it is called Conflict Serializable schedule.

Example of Conflict Serializability

Consider this schedule:

To convert this schedule into a serial schedule we must have to swap the R(A) operation of transaction T2 with the W(A) operation of transaction T1. However we cannot swap these two operations because they are conflicting operations, thus we can say that this given schedule is not Conflict Serializable.

Another Example:

Swapping non-conflicting operations:

1. swapping R(A) of T1 and R(A) of T2

2. swapping R(A) of T1 and R(B) of T2

3. swapping R(A) of T1 and W(B) of T2.

After doing above swappings, the schedule is

This is a serial schedule after swapping all the non-conflicting operations so the given schedule is Conflict Serializable.

2. View Serializability

View Serializability is a process to find out that a given schedule is view serializable or not. A given schedule is view serializable, if the given schedule is View Equivalent to its serial schedule.

View Equivalent

Two schedules T1 and T2 are said to be view equivalent, if they satisfy all the following conditions:

1. Initial Read: Initial read of each data item in transactions must match in both schedules. For example, if transaction T1 reads a data item X before transaction T2 in schedule S1 then in schedule S2, T1 should read X before T2.

2. Final Write: Final write operations on each data item must match in both the schedules. For example, a data item X is last written by Transaction T1 in schedule S1 then in S2, the last write operation on X should be performed by the transaction T1.

3. Update Read: If in schedule S1, the transaction T1 is reading a data item updated by T2 then in schedule S2, T1 should read the value after the write operation of T2 on same data item. For example, In schedule S1, T1 performs a read operation on X after the write operation on X by T2 then in S2, T1 should read the X after T2 performs write on X.

View Serializable

If a schedule is view equivalent to its serial schedule then the given schedule is said to be View Serializable.

Example:

Now check the three conditions of view serializability:

Initial Read

In schedule S1, transaction T1 first reads the data item X. In S2 also transaction T1 first reads the data item X.

Check for Y in schedule S1, transaction T1 first reads the data item Y. In S2 also the first read operation on Y is performed by T1.

For both data items X & Y, the initial read condition is satisfied in S1 & S2.

Final Write

In schedule S1, the final write operation on X is done by transaction T2. In S2 also transaction T2 performs the final write on X.

Checking for Y in schedule S1, the final write operation on Y is done by transaction T2. In schedule S2, final write on Y is done by T2.

For both data items X & Y, the final write condition is satisfied in S1 & S2.

Update Read

In S1, transaction T2 reads the value of X, written by T1. In S2, the same transaction T2 reads the X after it is written by T1.

In S1, transaction T2 reads the value of Y, written by T1. In S2, the same transaction T2 reads the value of Y after it is updated by T1.

The update read condition is also satisfied for both the schedules.

Since all the three conditions that checks whether the two schedules are view equivalent are satisfied in this example, which means S1 and S2 are view equivalent.

Posts