Database Indexing

Database is divided into logical blocks and each block contains data.

CPU works along with RAM.

One by one Block loads into the RAM and CPU reads the block if the record is not found then another block will load into the memory.

If we need to store 10000 records and 1 block can store 100 records.

No. of blocks required = 10000/100->100 blocks.

Indexing basically reduces the I/O cost means it reduces the number of blocks that get loaded in the RAM.

Block Size -> 1000 Bytes Record Size -> 250 Bytes

Total No of records -> 10000 Records in a block -> 1000/250-> 4 records

Total No of blocks required -> 10000/4 -> 2500

Suppose it is taking 1 ms to read 1 block.

In best case it will take -> 1 ms

In worst case, it will be the last record -> 2500 ms

Average Case -> 2500/2 = 1250 (N/2)

If the data is sorted then we can implement binary search. Time Complexity will be log N

Log2 2500 -> 12

The time complexity is very much reduced with sorted data.

But with indexing it will reduce more.

In case of indexing first block of index load into the RAM. It contains Key(ID, name, address, phone, etc.) and Pointer.

So it will reduce the time complexity.

Sparse Vs Dense Index

One quality that database indexes can have is that they can be dense or sparse. Each of these index qualities come with their own trade-offs. Let's look at how each index type would work using the following user data.

This is a pretty basic table with four columns. For this example, the table is spread across four pages that contain four rows each. We'll demonstrate an index on the first_name column. Let's look at the dense index first:

We can see that the index has an entry for every first name in the table. If we want to look up a user with the first name "Rachelle," then we perform a binary search on the index and read the location of the data. In contrast, a sparse index only has entries for some of the table rows.

We can see that our sparse index only has four entries (one for each page). Now, if we want to find the row for "Rachelle," we can perform a binary search on our index to find that it falls between "Loyd" and "Shepherd." After discovering those bounds, we go to the page starting with "Loyd" and begin scanning for Rachelle's row. Notice that the data is now sorted on the right side for this example. This sorting is a limitation of the sparse index. A sparse index requires ordered data; otherwise, the scanning step would be impossible.

Dense indexes require more maintenance than sparse indexes at write-time. Since every row must have an entry, the database must maintain the index on inserts, updates, and deletes. Having an entry for every row also means that dense indexes will require more memory. The benefit of a dense index is that values can be quickly found with just a binary search. Dense indexes also do not impose any ordering requirements on the data.

Sparse indexes require less maintenance than dense indexes at write-time since they only contain a subset of the values. This lighter maintenance burden means that inserts, updates, and deletes will be faster. Having fewer entries also means that the index will use less memory. Finding data is slower since a scan across the page typically follows the binary search. Sparse indexes are also only an option when working with ordered data.

Multicolumn Indexes

Multicolumn indexes (also known as composite indexes) are similar to standard indexes. They both store a sorted “table” of pointers to the main table. Multicolumn indexes however can store additional sorted pointers to other columns.

Standard indexes on a column can lead to substantial decreases in query execution times as shown in this article on optimizing queries. Multi-column indexes can achieve even greater decreases in query time due to its ability to move through the data quicker.

Syntax

CREATE INDEX [index name] ON [Table name]([column1, column2, column3,...]);

Multicolumn indexes are indexes that store data on up to 32 columns. When creating a multicolumn index, the column order is very important. This is due to the structure that multicolumn indexes possess. Multicolumn indexes are structured to have a hierarchical structure. Take for example this table:

A traditional index on this table would look like this:

The index points back to the table and is sorted by year. Adding a second column to the index looks like this:

Now the index has pointers to a secondary reference table that is sorted by make. Adding a third column to the index causes the index to look like this:

In a three column index we can see that the main index stores pointers to both the original table and the reference table on make, which in turn has pointers to the reference table on model.

When the multicolumn index is accessed, the main portion of the index (the index on the first column) is accessed first. Each entry in the main index has a reference to the row‘s location in the main table. The main index also has a pointer to the secondary index where the related make is stored. The secondary index in term has a pointer to the tertiary index. Because of this pointer ordering, in order to access the secondary index, it has to be done through the main index. This means that this multicolumn index can be used for queries that filter by just year, year and make, or year, make, and model. However, the multicolumn index cannot be used for queries just on the make or model of the car because the pointers are inaccessible.

Note

Column order is very important. The second column in a multicolumn index can never be accessed without accessing the first column as well.

Search This Blog

InterviewReady

Database Indexing

Comments

Post a Comment

Popular posts from this blog

Java 8 : Find the number starts with 1 from a list of integers

Important Linux Commands

Junit Mockito and Power Mockito