Database Indexes Explained

Have you wanted to improve the query performance? Or have you faced a situation where the indexes you added weren’t used? Read further to know about working with Postgres.

Database indexes are redundant data structures that help us to find required data faster. Although indexes can help us find data faster, it is not without cost. Whenever we make changes to any column present in the index we need to update the index. Index also requires additional space.

The main idea with index is that the search complexity of hash/tree - data structure used for index, is better than any linear structure. And as storage required for the whole index is many times smaller than the table, index reduces disk I/O even if the planner needs to scan the whole index to know the resultant row’s location.

Due to how Postgres implements concurrency control, a query needs to access the row to check it’s visibility - even if index contains all the required data.

Creating Indexes

Before we can create an index, we need to know about operator class and index type.

Postgres supports different types of indexes like B-Tree, GIN, GIST, BRIN, Hash, etc. Each index type is optimized for specific use cases.

B-Tree - default index type and it works for most of the cases.
GIN - used with subdividable data types like arrays to search for individual element.
Hash - for unordered columns with almost distinct values.

Operator class defines supported operations of index and how these can utilize index. Operator class are data type specific.

We can create an index by running the following command. We need to specify the index class and operator class for each column if we need to choose any non-default value.

CREATE INDEX ON table_name USING index_class (column1 operator_class1, column2 operator_class2);

We can query all the operation classes for each index and their supported operation using the below query.

SELECT am.amname AS index_method,
       opf.opfname AS opfamily_name,
       amop.amopopr::regoperator AS opfamily_operator
    FROM pg_am am, pg_opfamily opf, pg_amop amop
    WHERE opf.opfmethod = am.oid AND
          amop.amopfamily = opf.oid
    ORDER BY index_method, opfamily_name, opfamily_operator;

We can restrict the rows that an index covers by specifying the where clause while creating the index. This index will only be used when the condition in both index and query match. This makes sense to use this when we know that a particular condition will always be present in the query.

Multi column indexes

From the above create index statement, we know that we can define multiple columns on an index. Some index type like hash does not support multi-column indexes.

How can we decide when to use a single multi-column index or multiple single-column indexes? It depends on the data and queries. If most of the queries on the table have all the columns mentioned in the index, a single multi-column indexes would be more performant. Even though multiple single-column indexes can be used, they require additional operations like a filter or bitmap scan.

A primary key in index points to the page where the row is present. A page may contain multiple rows. Each bit in bitmap represents a page. Bitmap scan is used for query when multiple indexes are present which match parts of query’s condition. Bit is set if row(s) matching the condition is found in that page. Bitmap from multiple index scan are then combined to get a final bitmap, which will be used to identify all the heap pages which the planner needs to check to get the result.

When we have queries that filter on columns only defined in the index, multi-column index will be used directly.

explain select id from names where first_name = 'aaa' and last_name = 'bbb';
                                                    QUERY PLAN
----------------------------------------------------------------------------------------------------
 Index Scan using names_first_name_last_name_idx on names  (cost=0.42..8.44 rows=1 width=4)
   Index Cond: (((first_name)::text = 'aaa'::text) AND ((last_name)::text = 'bbb'::text))
(2 rows)

In case we only have a single column index, the planner will use that index and filter the data on the second column.

explain analyze select id from names where first_name = 'aaa' and last_name = 'bbb';
                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------
 Index Scan using names_last_name_idx on names  (cost=0.42..8.44 rows=1 width=4)
   Index Cond: ((last_name)::text = 'bbb'::text)
   Filter: ((first_name)::text = 'aaa'::text)
(3 rows)

For some index types like B-Tree and GIST, column order in the index matter. You can read more about it here.

Why is my index not used?

The planner only uses index only when it estimates that query will be faster using index. So you should verify if the estimates are accurate. You can check this using by running EXPLAIN ANALYZE query and see if there is no major difference in the estimates and actual rows fetched. If the estimates are off by a large margin, you need to run VACUUM ANALYZE table_name to update the statistics of the table. These statistics are used by the planner to estimates the rows for a query.

If there are no issues with estimates, it could be that we have created indexes incorrectly. Indexes can only be used with certain operations and data types. So how can we identify if an operation is supported by a given index? We can query the Postgres in built tables to get the information about the indexes which support the operations. We can query for index type and operator class which supports comparison operation < on the text at the left and right-hand side of the operator as below.

SELECT am.amname AS index_method,
       opf.opfname AS opfamily_name,
       amop.amopopr::regoperator AS opfamily_operator
  FROM pg_am am,
       pg_opfamily opf,
       pg_amop amop
WHERE opf.opfmethod = am.oid AND amop.amopfamily = opf.oid
    AND amop.amopopr = '<(text,text)'::regoperator;

 index_method |  opfamily_name  | opfamily_operator
--------------+-----------------+-------------------
 btree        | text_ops        | <(text,text)
 spgist       | text_ops        | <(text,text)
 brin         | text_minmax_ops | <(text,text)
(3 rows)

Another thing which decides if a index will be used is the collation. It should be same in both the query and index’s column. Collation is used to define how data (of columns having data type var, text, and char) is sorted and stored in a database - like whether lowercase and uppercase should be treated equally. Collation can be defined at the database, column, and expression level. You can read more about collation here

Data is stored in Postgres as a page and whenever we access the data in page, we need to read the whole page. If there is only one page of data in the table, the planner would not use the index to avoid 2 reads - one on the index and another on the page, because this is more optimal.

I hope that you were able to learn something new. Let me know your thoughts in the comment.

Database Indexes Explained

Creating Indexes

Multi column indexes

Why is my index not used?

Stay Updated