Misinterpretation of pg_index.indkey column results in overestimate of index bloat #8

Open
bonesmoses opened this issue Mar 16, 2018 · 0 comments

bonesmoses commented Mar 16, 2018

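In the index_item_sizes portion of the index bloat query, multicolumn indexes do not retain all of their attributes from the btree_index_atts subquery, because the query misreads the contents of the pg_index.indkey column. The dropped columns' widths are then missing from the estimated index tuple size, so the expected index size comes out too small and the bloat estimate too high.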

From the documentation:

This is an array of indnatts values that indicate which table columns this index indexes. For example a value of 1 3 would mean that the first and the third table columns make up the index key. A zero in this array indicates that the corresponding index attribute is an expression over the table columns, rather than a simple column reference.

Thus, if table columns 2 and 3 are indexed, indkey will be '2 3'. However, pg_attribute numbers the index's own attributes 1 and 2. Joining the split indkey values against the index's pg_attribute rows therefore yields only one match (2 = 2), and that match is wrong: index attribute 2 is table column 3, not table column 2.
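A minimal Python simulation of the mismatch, assuming a hypothetical table t(col1, col2, col3) with an index on (col2, col3). The dictionary stands in for the index's pg_attribute rows, and naive_join is a hypothetical helper that mimics splitting indkey and joining on the index's attribute numbers; it models the pattern described above, not the actual query text.

```python
# Hypothetical index on (col2, col3) of table t(col1, col2, col3).
# pg_index.indkey in its text form: table column numbers, space-separated.
indkey = "2 3"

# pg_attribute rows whose attrelid is the *index*: attnum counts the index's
# own columns from 1, and attname carries the underlying table column name.
index_attrs = {1: "col2", 2: "col3"}


def naive_join(indkey_text, index_attrs):
    """Mimic splitting indkey into rows and joining them to the index's
    pg_attribute rows on attnum (the pattern the issue describes)."""
    matches = []
    for key in indkey_text.split():  # one row per indkey entry
        attnum = int(key)
        if attnum in index_attrs:    # inner join: unmatched entries vanish
            matches.append((attnum, index_attrs[attnum]))
    return matches


print(naive_join(indkey, index_attrs))  # [(2, 'col3')]
```

Only one of the two indexed columns survives, and it is the wrong one. The naive join only happens to work when an index covers the table's leading columns in order (indkey '1 2', '1 2 3', and so on), which may be why the problem is easy to miss.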

There are generally two ways to fix this:

  1. Change the JOIN to pg_attribute to match on indrelid (the table) instead of the index, so that the indkey values are resolved against the table's columns, and adjust the JOIN to pg_stats accordingly.
  2. Replace the regexp_split_to_table call with generate_series(1, indnatts). An index with 3 columns has its attributes numbered 1, 2, and 3 in pg_attribute, so splitting indkey isn't strictly necessary.
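Both fixes can be sketched the same way, again over the hypothetical table t(col1, col2, col3) with an index on (col2, col3). fix1_join_table and fix2_generate_series are illustrative names, not part of the bloat query; they only model which pg_attribute rows each version of the JOIN would reach.

```python
# Same hypothetical index on (col2, col3) of table t(col1, col2, col3).
indkey = [2, 3]  # pg_index.indkey: table column numbers (0 = expression)
indnatts = 2     # pg_index.indnatts: number of index columns

# pg_attribute rows, keyed by attnum, for the table (attrelid = indrelid)
# and for the index itself (attrelid = indexrelid).
table_attrs = {1: "col1", 2: "col2", 3: "col3"}
index_attrs = {1: "col2", 2: "col3"}


def fix1_join_table(indkey, table_attrs):
    """Option 1: resolve each indkey entry against the table's attributes."""
    # An expression entry (0) has no pg_attribute row, so it is dropped here.
    return [table_attrs[k] for k in indkey if k in table_attrs]


def fix2_generate_series(indnatts, index_attrs):
    """Option 2: walk index attnums 1..indnatts, like generate_series(1, indnatts)."""
    return [index_attrs[n] for n in range(1, indnatts + 1)]


print(fix1_join_table(indkey, table_attrs))          # ['col2', 'col3']
print(fix2_generate_series(indnatts, index_attrs))   # ['col2', 'col3']
```

Both recover the two indexed columns. One difference that may matter in the real query: an expression column (indkey entry 0) has no matching table attribute, so option 1 silently drops it, while option 2 still returns the index's own attribute row for it.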