Misinterpretation of pg_index.indkey column results in overestimate of index bloat #8

bonesmoses · 2018-03-16T15:35:14Z

In index_item_sizes portion of the index bloat query, indexes with multiple columns do not retain all attributes from the btree_index_atts subquery. This is due to a misunderstanding of the contents of the pg_index.indkey column.

From the documentation:

This is an array of indnatts values that indicate which table columns this index indexes. For example a value of 1 3 would mean that the first and the third table columns make up the index key. A zero in this array indicates that the corresponding index attribute is an expression over the table columns, rather than a simple column reference.

Thus if columns 2 and 3 are indexed, indkey will reflect '2 3'. However, in pg_attribute, the index attributes will be 1 and 2. Thus joining the two tables will result in only one attribute result (2==2), and it will be incorrect (table col2 != index col2).

There are generally two ways to fix this:

Alter the JOIN to pg_attribute to use indrelid instead, and subsequently modify the JOIN to pg_stats similarly.
Replace the regexp_split_to_table call with generate_series bounded by pg_index.indnatts. If an index has 3 columns, the attributes in pg_attribute will be listed as 1, 2, and 3. This means the indkey split isn't strictly necessary.

The text was updated successfully, but these errors were encountered:

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Misinterpretation of pg_index.indkey column results in overestimate of index bloat #8

Misinterpretation of pg_index.indkey column results in overestimate of index bloat #8

bonesmoses commented Mar 16, 2018 •

edited

Loading

Misinterpretation of pg_index.indkey column results in overestimate of index bloat #8

Misinterpretation of pg_index.indkey column results in overestimate of index bloat #8

Comments

bonesmoses commented Mar 16, 2018 • edited Loading

bonesmoses commented Mar 16, 2018 •

edited

Loading