Problems with ORDER BY RAND() and big tables

Problem description

Hello, I asked a question this morning and then realized that the problem was not where I had been looking (see the original question).

I have this query to pick a random record from an address book.

SELECT * FROM address_book ab
WHERE
    ab.source = "PB" AND
    ab.city_id = :city_id AND
    pb_campaign_id = :pb_campaign_id AND
    ab.id NOT IN (
        SELECT address_book_id FROM calls
        WHERE calls.address_book_id = ab.id AND calls.status_id IN ("C","NO")
           OR (calls.status_id IN ("NR","OC") AND TIMESTAMPDIFF(MINUTE,calls.updated_at,NOW()) < 30)
    )
ORDER BY RAND()
LIMIT 1;

But I noticed that ORDER BY RAND() takes more than 50 s and uses 25-50% of the CPU on large tables (100k+ rows), so I looked for solutions here but did not find anything that worked. Note: the ids are not auto-incrementing, and there may be gaps.

Any ideas?

Recommended answer

I would suggest writing this as:

SELECT *
FROM address_book ab
WHERE ab.source = 'PB' AND
      ab.city_id = :city_id AND
      ab.pb_campaign_id = :pb_campaign_id AND
      NOT EXISTS (SELECT 1
                  FROM calls c
                  WHERE c.address_book_id = ab.id AND
                        ( c.status_id IN ('C', 'NO') OR
                          -- updated within the last 30 minutes; equivalent to
                          -- TIMESTAMPDIFF(MINUTE, c.updated_at, NOW()) < 30 in the question
                          (c.status_id IN ('NR', 'OC') AND c.updated_at > now() - interval 30 minute)
                        )
                 )
ORDER BY RAND()
LIMIT 1;

Note that this changes the logic in the correlated subquery so that c.address_book_id = ab.id always applies. In the original, AND binds more tightly than OR, so the OR branch was not restricted to the calls of the current address. I suspect that is the issue with performance.
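The grouping difference can be seen with a tiny standalone query. This is only an illustration with constant values, not part of the answer: the first 0 stands for "the call does not belong to this address", the second 0 for "status is not C/NO", and the 1 for "status is NR/OC and recently updated".

```sql
-- AND binds more tightly than OR, so "a AND b OR c" is read as "(a AND b) OR c".
SELECT (0 AND 0 OR 1)   AS without_parens,  -- grouped as (0 AND 0) OR 1, gives 1
       (0 AND (0 OR 1)) AS with_parens;     -- grouping used in the rewrite, gives 0
```

Without the parentheses, a call that belongs to a different address still satisfies the subquery's WHERE clause, so the subquery has to consider far more rows than intended for every candidate address.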

Then, create indexes on:

  • address_book(source, city_id, pb_campaign_id, id)
  • calls(address_book_id, status_id, updated_at)
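As a sketch, the two indexes could be created as follows. The index names are made up for this example, and the column names are taken from the queries above (pb_campaign_id and updated_at as used in the question).

```sql
-- Hypothetical index names; columns follow the order listed above.
CREATE INDEX idx_address_book_lookup
    ON address_book (source, city_id, pb_campaign_id, id);

CREATE INDEX idx_calls_lookup
    ON calls (address_book_id, status_id, updated_at);
```

Running EXPLAIN on the rewritten query afterwards is a reasonable way to check whether the optimizer actually uses them.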

I am guessing that this will be sufficient to improve performance. If a very large number of rows happen to match the conditions, then the ORDER BY RAND() itself might still be an issue.
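If the random sort does turn out to be the bottleneck, one commonly used variation is to shuffle only the ids and then fetch the single chosen row. This is not from the answer above; it is a sketch that assumes id is unique in address_book (for example, the primary key), and whether it helps depends on the row width and on the plan the optimizer chooses.

```sql
-- Sketch: sort only the narrow id column at random, then join back for the full row.
-- Assumes address_book.id is unique; "pick" and "a2" are aliases invented for this example.
SELECT ab.*
FROM address_book ab
JOIN (SELECT a2.id
      FROM address_book a2
      WHERE a2.source = 'PB' AND
            a2.city_id = :city_id AND
            a2.pb_campaign_id = :pb_campaign_id AND
            NOT EXISTS (SELECT 1
                        FROM calls c
                        WHERE c.address_book_id = a2.id AND
                              ( c.status_id IN ('C', 'NO') OR
                                (c.status_id IN ('NR', 'OC') AND c.updated_at > now() - interval 30 minute)
                              )
                       )
      ORDER BY RAND()
      LIMIT 1
     ) pick ON pick.id = ab.id;
```

The inner query can be answered largely from the address_book index suggested above, so the rows being sorted carry only an id instead of every column.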
