索引维护

来自 PostgreSQL wiki
跳转到导航跳转到搜索


总有一天,您可能需要处理例行重新索引数据库,特别是如果您没有足够积极地使用 VACUUM。此方面的特别方便的命令是CLUSTER,它可以帮助进行其他类型的清理。

在 8.4 及更早版本中,避免使用VACUUM FULL

索引摘要

以下是一个示例查询,用于提取每个表的行数、索引数以及有关这些索引的一些信息。

性能片段

索引摘要

适用于 PostgreSQL

>=9.4

SQL

取决于

SELECT
    pg_class.relname,
    pg_size_pretty(pg_class.reltuples::bigint)            AS rows_in_bytes,
    pg_class.reltuples                                    AS num_rows,
    COUNT(*)                                              AS total_indexes,
    COUNT(*) FILTER ( WHERE indisunique)                  AS unique_indexes,
    COUNT(*) FILTER ( WHERE indnatts = 1 )                AS single_column_indexes,
    COUNT(*) FILTER ( WHERE indnatts IS DISTINCT FROM 1 ) AS multi_column_indexes
FROM
    pg_namespace
    LEFT JOIN pg_class ON pg_namespace.oid = pg_class.relnamespace
    LEFT JOIN pg_index ON pg_class.oid = pg_index.indrelid
WHERE
    pg_namespace.nspname = 'public' AND
    pg_class.relkind = 'r'
GROUP BY pg_class.relname, pg_class.reltuples
ORDER BY pg_class.reltuples DESC;

索引大小/使用统计信息

表和索引大小,以及哪些索引正在被扫描以及获取了多少元组。有关包括表和索引大小的另一个视图,请参阅磁盘使用情况

性能片段

索引统计信息

适用于 PostgreSQL

>=8.1

SQL

取决于

SELECT
    t.schemaname,
    t.tablename,
    c.reltuples::bigint                            AS num_rows,
    pg_size_pretty(pg_relation_size(c.oid))        AS table_size,
    psai.indexrelname                              AS index_name,
    pg_size_pretty(pg_relation_size(i.indexrelid)) AS index_size,
    CASE WHEN i.indisunique THEN 'Y' ELSE 'N' END  AS "unique",
    psai.idx_scan                                  AS number_of_scans,
    psai.idx_tup_read                              AS tuples_read,
    psai.idx_tup_fetch                             AS tuples_fetched
FROM
    pg_tables t
    LEFT JOIN pg_class c ON t.tablename = c.relname
    LEFT JOIN pg_index i ON c.oid = i.indrelid
    LEFT JOIN pg_stat_all_indexes psai ON i.indexrelid = psai.indexrelid
WHERE
    t.schemaname NOT IN ('pg_catalog', 'information_schema')
ORDER BY 1, 2;

重复索引

查找具有相同列集、相同操作类、表达式和谓词的多个索引 - 这使得它们等效。通常可以安全地删除其中一个,但我不能保证。:)

SELECT pg_size_pretty(sum(pg_relation_size(idx))::bigint) as size,
       (array_agg(idx))[1] as idx1, (array_agg(idx))[2] as idx2,
       (array_agg(idx))[3] as idx3, (array_agg(idx))[4] as idx4
FROM (
    SELECT indexrelid::regclass as idx, (indrelid::text ||E'\n'|| indclass::text ||E'\n'|| indkey::text ||E'\n'||
                                         coalesce(indexprs::text,'')||E'\n' || coalesce(indpred::text,'')) as key
    FROM pg_index) sub
GROUP BY key HAVING count(*)>1
ORDER BY sum(pg_relation_size(idx)) DESC;

索引膨胀

基于 check_postgres

REINDEX 的常见需求之一是当索引由于稀疏删除或使用 VACUUM FULL(使用 9.0 之前的版本)而膨胀时。check_postgres 脚本中已包含一个用于估计表中膨胀量的估算器,您可以直接调用它或将其纳入更大的监控系统。基于此代码和/或来自其他来源的概念的脚本包括

新查询

一个新的查询已经创建,以便对 Btree 索引有一个更好的膨胀估计。与来自 check_postgres 的查询不同,此查询仅关注 BTree 索引及其磁盘布局。

有关它的更多信息,请参阅文章

监控脚本check_pgactivity包含一个基于此工作的检查。

总结 B-Tree 索引的键空间

性能片段

显示数据库膨胀

适用于 PostgreSQL

>=9.3

SQL

取决于

contrib/pageinspect


以下查询使用 contrib/pageinspect 来快速总结 B-Tree 的键空间。这对于希望准确确定索引何时何地变得不平衡的专家很有用。它可视化了索引的键空间。

该查询输出每个页面的最高键,从根开始,按逻辑/键空间顺序向下工作。

如果查询执行时间过长,请考虑取消注释“/* and level > 0 */”以使其仅包含内部页面。

另请参阅:“PostgreSQL 索引膨胀显微镜”博客文章

注意:您需要将“pgbench_accounts_pkey”更改为要总结的索引的名称。

WITH RECURSIVE index_details AS (
  SELECT
    'pgbench_accounts_pkey'::text idx
),
size_in_pages_index AS (
  SELECT
    (pg_relation_size(idx::regclass) / (2^13))::int4 size_pages
  FROM
    index_details
),
page_stats AS (
  SELECT
    index_details.*,
    stats.*
  FROM
    index_details,
    size_in_pages_index,
    lateral (SELECT i FROM generate_series(1, size_pages - 1) i) series,
    lateral (SELECT * FROM bt_page_stats(idx, i)) stats
),
meta_stats AS (
  SELECT
    *
  FROM
    index_details s,
    lateral (SELECT * FROM bt_metap(s.idx)) meta
),
pages_raw AS (
  SELECT
    *
  FROM
    page_stats
  ORDER BY
    btpo DESC
),
/* XXX: Note ordering dependency within this CTE */
pages_walk(item, blk, level) AS (
  SELECT
    1,
    blkno,
    btpo
  FROM
    pages_raw
  WHERE
    btpo_prev = 0
    AND btpo = (SELECT level FROM meta_stats)
  UNION
  SELECT
    CASE WHEN level = btpo THEN w.item + 1 ELSE 1 END,
    blkno,
    btpo
  FROM
    pages_raw i,
    pages_walk w
  WHERE
    i.btpo_prev = w.blk OR (btpo_prev = 0 AND btpo = w.level - 1)
)
SELECT
  /* Uncomment if these details interesting */
  /*
  idx,
  btpo_prev,
  btpo_next,
  */

  /*
   * "level" is level of tree -- 0 is leaf.  First tuple returned is root.
   */
  btpo AS level,

  /*
   * Ordinal number of item on this level
   */
  item AS l_item,

  /*
   * Block number, and details of page
   */
  blkno,
  btpo_flags,
  TYPE,
  live_items,
  dead_items,
  avg_item_size,
  page_size,
  free_size,

  /*
   * distinct_real_item_keys is how many distinct "data" fields on page
   * (excludes highkey).
   *
   * If this is less than distinct_block_pointers on an internal page, that
   * means that there are so many duplicates in its children that there are
   * duplicate high keys in children, so the index is probably pretty bloated.
   *
   * Even unique indexes can have duplicates.  It's sometimes interesting to
   * watch out for how many distinct real items there are within leaf pages,
   * compared to the number of live items, or total number of items.  Ideally,
   * these will all be exactly the same for unique indexes.
   */
  distinct_real_item_keys,

  /*
   * Per pageinspect docs, first item on non-rightmost page on level is "high
   * key" item, which represents an upper bound on items on the page.
   * (Rightmost pages are sometimes considered to have a conceptual "positive
   * infinity" item, and are shown to have a high key that's NULL by this query)
   *
   * This can be used to visualize how finely or coarsely separated the
   * keyspace is.
   *
   * Note that below int4_from_page_data() function could produce more useful
   * visualization of split points.
   */
  CASE WHEN btpo_next != 0 THEN first_item END AS highkey,

  /*
   * distinct_block_pointers is table blocks that are pointed to by items on
   * the page (not including high key, which doesn't point anywhere).
   *
   * This is interesting on leaf pages, because it indicates how fragmented the
   * index is with respect to table accesses, which is important for range
   * queries.
   *
   * This should be redundant on internal levels, because all downlinks in internal
   * pages point to distinct blocks in level below.
   */
  distinct_block_pointers

FROM
  pages_walk w,
  pages_raw i,
  lateral (
    SELECT
    count(distinct (case when btpo_next = 0 or itemoffset > 1 then (data collate "C") end)) as distinct_real_item_keys,
    count(distinct (case when btpo_next = 0 or itemoffset > 1 then (ctid::text::point)[0]::bigint end)) as distinct_block_pointers,
    (array_agg(data))[1] as first_item
    FROM bt_page_items(idx, blkno)
  ) items
  where w.blk = i.blkno
  /* Uncomment to avoid showing leaf level (faster): */
  /* and level > 0*/
ORDER BY btpo DESC, item;

将 bt_page_items() 的“数据”字段解释为小端 int4 属性

性能片段

显示数据库膨胀

适用于 PostgreSQL

>=9.2

SQL

取决于

contrib/pageinspect


以下方便函数可用于将 bt_page_items() 中的“数据”字段显示为其本机类型,至少对于其 pg_attribute 条目包含单个 int4/integer 属性的索引。这包括 SERIAL 主键索引。它可以用于使上面的“总结 B-Tree 索引的键空间”查询使用本机类型表示显示键空间分割点。

注意:字节交换仅在小端(Intel)CPU 上需要。

--
-- Sources:
--
-- https://stackoverflow.com/questions/17208945/whats-the-easiest-way-to-represent-a-bytea-as-a-single-integer-in-postgresql
-- https://stackoverflow.com/questions/11142235/convert-bigint-to-bytea-but-swap-the-byte-order
--
create or replace function reverse_bytes_iter(bytes bytea, length int, midpoint int, index int)
returns bytea as
$$
  select case when index >= midpoint then bytes else
    reverse_bytes_iter(
      set_byte(
        set_byte(bytes, index, get_byte(bytes, length-index)),
        length-index, get_byte(bytes, index)
      ),
      length, midpoint, index + 1
    )
  end;
$$ language sql immutable strict;

create or replace function reverse_bytes(bytes bytea) returns bytea as
$$
select reverse_bytes_iter(bytes, octet_length(bytes)-1, octet_length(bytes)/2, 0)
$$
language sql immutable strict;

create or replace function int4_from_bytea(bytea) returns int4
as $$
select ('x' || right($1::text, 6))::bit(24)::int;
$$
language sql immutable strict;

create or replace function int4_from_page_data(text) returns int4
as $$
select int4_from_bytea(reverse_bytes(decode($1, 'hex')));
$$
language sql immutable strict;

--
-- Use:
--
--  postgres=# select *, int4_from_page_data(data) from bt_page_items('f', 1) limit 15;
--   itemoffset │    ctid    │ itemlen │ nulls │ vars │          data           │ int4_from_page_data
--  ────────────┼────────────┼─────────┼───────┼──────┼─────────────────────────┼─────────────────────
--            1 │ (17698,69) │      16 │ f     │ f    │ 5c 00 00 00 00 00 00 00 │                  92
--            2 │ (0,1)      │      16 │ f     │ f    │ 01 00 00 00 00 00 00 00 │                   1
--            3 │ (8849,126) │      16 │ f     │ f    │ 01 00 00 00 00 00 00 00 │                   1
--            4 │ (17699,25) │      16 │ f     │ f    │ 01 00 00 00 00 00 00 00 │                   1
--            5 │ (17699,26) │      16 │ f     │ f    │ 01 00 00 00 00 00 00 00 │                   1
--            6 │ (0,2)      │      16 │ f     │ f    │ 02 00 00 00 00 00 00 00 │                   2
--            7 │ (8849,125) │      16 │ f     │ f    │ 02 00 00 00 00 00 00 00 │                   2
--            8 │ (17699,23) │      16 │ f     │ f    │ 02 00 00 00 00 00 00 00 │                   2
--            9 │ (17699,24) │      16 │ f     │ f    │ 02 00 00 00 00 00 00 00 │                   2
--           10 │ (0,3)      │      16 │ f     │ f    │ 03 00 00 00 00 00 00 00 │                   3
--           11 │ (8849,124) │      16 │ f     │ f    │ 03 00 00 00 00 00 00 00 │                   3
--           12 │ (17699,21) │      16 │ f     │ f    │ 03 00 00 00 00 00 00 00 │                   3
--           13 │ (17699,22) │      16 │ f     │ f    │ 03 00 00 00 00 00 00 00 │                   3
--           14 │ (0,4)      │      16 │ f     │ f    │ 04 00 00 00 00 00 00 00 │                   4
--           15 │ (8849,123) │      16 │ f     │ f    │ 04 00 00 00 00 00 00 00 │                   4
--  (15 rows)

未使用的索引

由于索引会对任何表的更改操作增加显著的开销,因此如果它们没有用于查询或约束强制(例如确保值唯一),则应将其删除。如何找到此类索引

参考