查找最近活动

来自 PostgreSQL Wiki
跳转到导航跳转到搜索

性能片段

查找最近活动

适用于 PostgreSQL

>=8.4

SQL

依赖于


偶尔会看到一个请求:给定一个表示更改的大型表(无论是错误评论、股票行情价格,还是任何其他内容),返回最近活动的一些受影响对象及其最新的更改。

显然,如果您正在跟踪每个对象的最后更改时间,那么有一些方法可以快速完成此操作,但代价是每次在更改表上插入数据时都要对对象行进行非 HOT 更新。本页介绍了如何从更改表本身高效地(以毫秒为单位)获取最近的更改。

这个具体的例子是为以下需求而写的:“给定一个(大型)表 prices(stock,price,updated),找出最近更新的 10 个股票中每个股票的 3 个最新价格”。假设存在适当的索引(特别是 (updated) 和 (stock,updated))。

此版本适用于 9.3 及更高版本

with recursive
  t1 as ( (select stock, array[stock] as seen
             from prices
            order by updated desc limit 1)
          union all
          (select p.stock, t1.seen || p.stock
             from t1,
                  LATERAL (select p1.stock from prices p1
                            where p1.stock <> all (t1.seen)
                            order by p1.updated desc limit 1) as p
            where array_upper(t1.seen,1) < 10) )
select p.*
  from t1,
       LATERAL (select * from prices p2
                 where p2.stock=t1.stock
                 order by updated desc limit 3) p;

这是为 8.4 及更高版本编写的原始版本

with recursive
  t1 as ( (select *, array[stock] as seen
             from prices
            order by updated desc limit 1)
          union all 
          (select (p).*, s.seen || (p).stock
             from (select (select p from prices p
                            where p.stock <> all (t.seen)
                            order by p.updated desc limit 1) as p,
                          t1.seen
                     from t1
                    where array_upper(t1.seen,1) < 10 offset 0) s
          )
        ),
  t2 as ( select stock, price, updated, 1 as n
            from t1
          union all
          (select (p).stock, (p).price, (p).updated, s.n+1
             from (select (select p from prices p
                            where p.stock=t2.stock and p.updated < t2.updated
                            order by p.updated desc limit 1) as p,
                          t2.n
                     from t2
                    where t2.n < 3 offset 0) s
          )
        )
select * from t2;

这里的想法是利用递归以最有效的方式获取单行,然后在达到所需数量时停止。假设源表比获取的行数大得多,而且不同股票的数量也比获取的数量大得多,或者至少有 10 种不同股票在表最小的部分有更新。

相当丑陋的子查询用法被证明是必要的,以防止在每个递归分支的最后迭代中完全扫描 prices 表。

以下变体适用于“返回每个主题的 5 个最新帖子”查询(但在 9.3+ 中,有一个更简单的 LATERAL 解决方案,无需递归)

with recursive
  rp as (select (p).*, 1 as rcount
           from (select (select p from posts p
                          where p.topic_id=t.topic_id
                          order by p.post_ts desc, p.post_id desc limit 1) as p
                   from topics t offset 0) s
                  where (p).post_id is not null
         union all
         select (p).*, s.rcount + 1
           from (select (select p from posts p
                          where p.topic_id=rp.topic_id
                            and (p.post_ts,p.post_id) < (rp.post_ts,rp.post_id)
                          order by p.post_ts desc, p.post_id desc limit 1) as p,
                        rp.rcount
                   from rp
                  where rp.rcount < 5 offset 0) s
          where (p).post_id is not null)
select * from rp;