Database Management Systems: File Organization and its types

File Organization and its types

File organization in DBMS is the technique of arranging records in a file so they can be accessed efficiently.

In DBMS (Database Management System), file organization refers to the way data is stored in files on disk. The choice of file organization affects the efficiency of data access, storage space, and performance of various database operations like search, insert, delete, and update.

Types of File Organization in DBMS:

1. Heap (Unordered) File Organization

2. Sequential (Ordered) File Organization

3. Hashed File Organization

4. Clustered File Organization

5. Indexed File Organization

Objectives of file organization are:

It contains an optimal selection of records, i.e., records can be selected as fast as possible.
To perform insert, delete or update transaction on the records should be quick and easy.
The duplicate records cannot be induced as a result of insert, update or delete.
For the minimal cost of storage, records should be stored efficiently.

Heap file organization

Heap file organization (also known as unordered file organization) is the simplest method of storing records in a DBMS. In this type of file organization, records are inserted wherever space is available, without any particular order.

Records are inserted wherever space is available. There is no specific order based any attribute. If a record is to be added, it will go into the next available empty slot.

To search, update or delete the data in heap file organization, then we need to traverse the data from staring of the file till we get the requested record. If the database is very large then searching, updating or deleting of record will be time-consuming because there is no sorting or ordering of records.

Advantages:

Very simple to implement.
Fast inserts — no need to maintain order.
Efficient for applications with frequent inserts and minimal reads.

Disadvantages:

Slow search and retrieval (especially for large datasets).
No support for range queries.

Sequential File Organization

Sequential File Organization is a method of storing records in a file one after another in a sorted order based on a key field (like employee ID or name). Sequential file organization is a way of storing data records in a sorted sequence, which makes reading records in order or within a range faster and easier.

The main characteristic of sequential file organization is that records are stored in a sorted order. This sorting allows for efficient operations like binary search and range queries. Maintaining this order means that insertion and deletion can be more complex because the file must stay sorted.

Advantages

Efficient for Range Queries: Since records are stored in sorted order, retrieving records within a range (e.g., all employees with IDs between 1000 and 2000) is fast and straightforward.
Fast Sequential Access: Reading records one after another is efficient, making it ideal for batch processing and report generation.
Supports Binary Search: Because of sorting, binary search can be used to quickly find specific records (faster than linear search).
Simple Structure: The organization is straightforward, easy to understand and implement for read-heavy applications.

Disadvantages

Slow Insertions and Deletions: Inserting or deleting a record requires shifting many records to maintain sorted order, which is time-consuming.
Not Suitable for Frequent Updates: Systems with lots of inserts, deletes, or updates can suffer performance degradation.
Inefficient for Random Access: While binary search helps, accessing random records still involves more overhead compared to hashed files.

Hash file organization

Hash file organization stores records based on a hash function that computes the address (or bucket) where each record will be placed. This allows direct access to records using their key. Instead of scanning sequentially, the hash function determines exactly where to find the record on disk.

Advantages:

Constant time access (O(1)) for exact key lookups.
Efficient for large datasets.
Minimal overhead compared to sequential search.

Disadvantages:

Collisions can occur (different keys hash to same address).
Collision handling methods add complexity (e.g., chaining, open addressing).
Not suitable for range queries since records aren’t stored in sorted order.
Performance depends on quality of hash function and load factor.

Hash File Organization

Hash File Organization uses the computation of hash function on some fields of the records. The hash function's output determines the location of disk block where the records are to be placed.

If a record has to be received using the hash key, then the address is generated, and the whole record is retrieved using that address. If a new record has to be inserted, then the address is generated using the hash key and record is directly inserted. The same process is applied in the case of delete and update.

In this method, there is no effort for searching and sorting the entire file. In this method, each record will be stored randomly in the memory.

Clustered file organization

Clustered file organization stores related records from one or more tables physically close together on disk, typically grouped by a common attribute or key. The goal is to improve the efficiency of queries that access related records, especially joins or grouped data retrieval.

Suppose you have two tables: Departments and Employees. Employees are clustered by Department_ID. Records of employees belonging to the same department are stored close together on disk. Queries like "Get all employees in Department 10" are very fast since related records are physically near each other.

Advantages:

Improves join and group-by query performance by storing related data close.
Reduces the number of disk accesses.

Disadvantages:

Complex to maintain and update.
Not suitable for all data types or workloads.

Indexed File Organization

Indexed File Organization is a method where a separate data structure called an index is maintained to quickly locate records in the data file, based on a key field (like ID, name, etc.). The index stores pointers (or addresses) to the actual records, allowing fast data retrieval without scanning the whole file.

Types of Indexes:

Primary Index:
Based on the primary key of the table; one index entry per record.
Secondary Index:
Based on non-primary (non-unique) fields.
Clustered Index:
Records are stored in the same order as the index.
Non-Clustered Index:
Index points to data stored in a different order.

Advantages:

Very fast searching (especially for large files).
Efficient for range queries, sorting, and joining.
Supports both exact-match and partial-match lookups.

Disadvantages:

Extra storage space needed for index files.
Index must be updated on every insert/delete/update.

B+ Tree is a type of Indexed File Organization in DBMS. A B+ tree is a balanced binary search tree that follows a multi-level index format. The leaf nodes of a B+ tree denote actual data pointers. B+ tree ensures that all leaf nodes remain at the same height, thus balanced. Additionally, the leaf nodes are linked using a link list; therefore, a B+ tree can support random access as well as sequential access.

B+ Trees are a standard implementation of indexed file organization in DBMS — especially for primary and secondary indexes in large databases like those used in MySQL, Oracle, and PostgreSQL.

Multikey file organization is a method where a single data file is organized using more than one key (attribute), allowing efficient access based on multiple search criteria (e.g., by ID, Name, Date, etc.). Unlike single-key file organization, multikey files use multiple indexes or inverted indexes to support searching on several fields.

Advantages:

Allows fast searches on multiple attributes, not just the primary key.
Good for applications where users search by various fields.

Disadvantages:

Requires extra storage for multiple indexes.
Updates are expensive — changing a record means updating all associated indexes.

Two approaches for providing additional access paths into a file of data records.

1. Multilist file organisation

2. Inverted file organization

1. Multi-list file organization

Multi-list file organization is a method where a file is organized using multiple linked lists, each based on a different key. This structure allows efficient access to records using more than one field. Each list connects records using pointers according to a particular attribute. A record can appear in multiple lists (i.e., be linked through more than one key). It is useful when you need to search or process data using multiple logical paths.

Advantages:

Efficient for group-based queries, like "List all employees in HR" or "All working on Project Alpha".

Avoids duplication — one copy of the record, used in multiple lists.

Improves access flexibility by supporting multiple logical views of the same data.

Disadvantages:

Complex structure — managing multiple pointers per record.

Insertion/deletion requires updating multiple linked lists.

Not suitable for all applications, especially those that don’t require multi-key access.

2. Inverted file organization is a data storage method in which a separate index (inverted index) is created for each field (attribute) of the records in a file. These indexes map attribute values to the locations (addresses or record IDs) of the records containing those values.

Inverted file organization stores the actual data separately and maintains indexes for one or more fields, allowing quick access to records based on non-primary attributes.

Advantages:

Very fast searches on any indexed field.

Supports multi-attribute queries (e.g., "Find all Alices in CS").

Ideal for text-based or keyword searches (e.g., search engines).

Disadvantages:

Requires more storage space for multiple indexes.

Insert, delete, update operations are slower, because all relevant indexes must be updated.

Complexity increases with the number of indexed attributes.

Posts

Sunday, 15 December 2019

File Organization and its types

No comments:

Post a Comment

Updates