Problem description
I have some forum data of the form
post(author, thread_id, text)
For each author, I would like to select 10 distinct thread_ids associated with that author (there may be more than 10, and the number will vary by author).
I'm thinking of using GROUP BY to group on 'author', but I cannot understand how to express the LIMIT on each group, and how to expand each group back into 10 rows.
Recommended answer
Here's a solution to "top N per group" type queries.
Note that you have to decide which 10 threads you want for a given author. For this example, I'm assuming you want the most recent threads (and that thread_id is an auto-increment value), and that for ties you have a primary key, post.post_id.
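SELECT p1.*
FROM post p1 LEFT OUTER JOIN post p2
  ON (p1.author = p2.author AND (p1.thread_id < p2.thread_id
    OR p1.thread_id = p2.thread_id AND p1.post_id < p2.post_id))
-- group per post (not per author) so each p1 row gets its own count
GROUP BY p1.post_id
-- count only matched p2 rows; the LEFT JOIN yields a NULL p2 when none match
HAVING COUNT(p2.post_id) < 10;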
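To try the idea on a small data set, here is a minimal sketch that runs the self-join in an in-memory SQLite database from Python. SQLite stands in for MySQL, the sample rows are made up, and the limit is lowered to a hypothetical N = 3 so the effect is visible. It groups on p1.post_id so that every post gets its own count.

```python
import sqlite3

# Assumption: SQLite stands in for MySQL; the rows below are invented sample data.
N = 3  # per-author limit (the answer uses 10)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post ("
    " post_id INTEGER PRIMARY KEY,"
    " author TEXT, thread_id INTEGER, text TEXT)"
)
rows = [("alice", t, f"a{t}") for t in range(1, 6)]   # alice: threads 1..5
rows += [("bob", t, f"b{t}") for t in range(10, 12)]  # bob: threads 10..11
conn.executemany(
    "INSERT INTO post (author, thread_id, text) VALUES (?, ?, ?)", rows
)

query = """
SELECT p1.author, p1.thread_id
FROM post p1 LEFT OUTER JOIN post p2
  ON (p1.author = p2.author AND (p1.thread_id < p2.thread_id
    OR p1.thread_id = p2.thread_id AND p1.post_id < p2.post_id))
GROUP BY p1.post_id              -- one group per post
HAVING COUNT(p2.post_id) < ?     -- fewer than N posts rank above it
ORDER BY p1.author, p1.thread_id DESC
"""
result = conn.execute(query, (N,)).fetchall()
print(result)
# [('alice', 5), ('alice', 4), ('alice', 3), ('bob', 11), ('bob', 10)]
```

alice has five threads and keeps only the three with the highest thread_id; bob has two threads, fewer than the limit, so both are kept.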
Re your follow-up question in the comment, here's the explanation:
In the top 10 threads per author, we can say that for each of these, there are 9 or fewer other threads for that author that rank above it. So for each of an author's posts (p1), we count how many posts (p2) from the same author have a greater thread_id. If that count is less than 10, then that post (p1) belongs in the result.
I added a term to resolve ties with the post_id.
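The counting argument can be made concrete with a small sketch, again using an in-memory SQLite database as a stand-in for MySQL and invented rows. It prints, for each post, how many posts by the same author rank above it, first with the post_id tie-breaker and then without it. The column alias `newer` is only a name chosen for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE post (post_id INTEGER PRIMARY KEY,"
    " author TEXT, thread_id INTEGER, text TEXT)"
)
# Hypothetical rows: one author, with two posts in thread 7 (a tie on thread_id).
conn.executemany(
    "INSERT INTO post (post_id, author, thread_id, text) VALUES (?, ?, ?, ?)",
    [(1, "alice", 5, "x"), (2, "alice", 7, "y"),
     (3, "alice", 7, "z"), (4, "alice", 9, "w")],
)

# With the tie-breaker: every post gets a distinct count (a strict ranking).
with_tie_break = conn.execute("""
SELECT p1.post_id, COUNT(p2.post_id) AS newer
FROM post p1 LEFT OUTER JOIN post p2
  ON (p1.author = p2.author AND (p1.thread_id < p2.thread_id
    OR p1.thread_id = p2.thread_id AND p1.post_id < p2.post_id))
GROUP BY p1.post_id
ORDER BY p1.post_id
""").fetchall()

# Without the tie-breaker: posts 2 and 3 (same thread) share the same count.
without_tie_break = conn.execute("""
SELECT p1.post_id, COUNT(p2.post_id) AS newer
FROM post p1 LEFT OUTER JOIN post p2
  ON (p1.author = p2.author AND p1.thread_id < p2.thread_id)
GROUP BY p1.post_id
ORDER BY p1.post_id
""").fetchall()

print(with_tie_break)     # [(1, 3), (2, 2), (3, 1), (4, 0)]
print(without_tie_break)  # [(1, 3), (2, 1), (3, 1), (4, 0)]
```

With the tie-breaker the counts 0, 1, 2, 3 form a strict ranking, so a filter such as "count less than N" keeps exactly N posts when the author has at least that many. Note that the ranking is over posts: in this data the top three posts fall in threads 9, 7 and 7, so an author with several posts in one thread can end up with fewer than N distinct thread_ids.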