SQL - 使用 ID NOT IN 的高级重复删除

SQL - Advanced duplicates removal using ID NOT IN(SQL - 使用 ID NOT IN 的高级重复删除)
本文介绍了SQL - 使用 ID NOT IN 的高级重复删除的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我在我的表中发现了需要删除的重复项.该表包含:

I found duplicates in my table which needed to be removed. The table contains:

  1. ID - 表的唯一键
  2. STUDENT_ID - 学生的 ID
  3. SUBJECT_ID - 学生的科目
  4. CLASS_ID - 班级学生在
  5. XP_LVL - 专业水平

一个学生应该只有一个科目、班级和 XP_lvl 的记录.在这种情况下,删除重复项是基于删除所有但保留一个.

One student should have only one record of subject, class and XP_lvl. In this case the removal of duplicities is based on delete all but keep one.

在我的情况下,重复看起来像这样:

In my case duplicates looks like this:

<头>
IDSTUDENT_IDSUBJECT_IDCLASS_IDEXPERTISE_LVL
11AAA55FFECLASS8082
21AAA55FFECLASS8082
32AAB49BBCLASS8903
42AAB49BBCLASS8903
52AAB49BBCLASS8904
62AAB49BBCLASS8903

我通过创建 (CONCAT(STUDENT_ID, CONCAT(SUBJECT_ID, CLASS_ID))) 的唯一 ID,然后通过 count.. >1 识别出所有重复项> 工作正常.

I have identified all the duplicates by creating unique ID of (CONCAT(STUDENT_ID, CONCAT(SUBJECT_ID, CLASS_ID))) and then by having count.. >1 which works fine.

现在我需要识别所有 ID,以便我可以在从查询中删除时使用 ID NOT IN (SELECT...).

Now I need to identify all the ID so I can use ID NOT IN (SELECT...) in my delete from query.

所以我这样做了..

AND ID NOT IN (SELECT UID FROM (
SELECT
    min(ID) AS UID,
    STUDENT_ID,
    SUBJECT_ID,
    CLASS_ID    
FROM
    my_table 
GROUP BY
    STUDENT_ID,
    SUBJECT_ID,
    CLASS_ID    
HAVING
    count(CONCAT(STUDENT_ID, CONCAT(SUBJECT_ID, CLASS_ID))) > 1))

但是,我不能使用 min/max(ID) 来选择要保留的 ID,因为正如您所见,对于学生 2AAB,存在具有不同 XP_LVL 的重复项.

However I cannot use min/max(ID) to choose which ID to keep because as you can see for student 2AAB there are duplicities with different XP_LVL.

在这种情况下,我需要选择最高 XP_LVL 的 ID 来保留和删除所有其他的.

In this case I need to select ID of highest XP_LVL to keep and delete all other.

我尝试使用 RANK、ROWNUM 不同的排序和子选择的负载,但没有想要的结果.

I tried using RANK, ROWNUM different ordering and loaaads of subselects but without desired results.

有谁知道如何有效地做到这一点?我们正在谈论 6k 重复,所以我不能一一做.提前感谢您的帮助.

Does anyone have any idea how to do it effectively ? We are talking about 6k duplicities so I cant do it one by one. Thank you for your help in advance.

推荐答案

您可以使用 ROW_NUMBER 解析函数并使用 ROWID 伪列关联删除:

You can use ROW_NUMBER analytic function and correlate the deletion using the ROWID pseudo-column:

DELETE FROM my_table
WHERE ROWID IN (
  SELECT ROWID
  FROM   (
    SELECT ROW_NUMBER() OVER (
             PARTITION BY student_id, subject_id, class_id
             ORDER BY expertise_lvl DESC
           ) AS rn
    FROM   my_table
  )
  WHERE  rn > 1
)

db<>fiddle 这里

这篇关于SQL - 使用 ID NOT IN 的高级重复删除的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!

本站部分内容来源互联网,如果有图片或者内容侵犯您的权益请联系我们删除!

相关文档推荐

Execute complex raw SQL query in EF6(在EF6中执行复杂的原始SQL查询)
Hibernate reactive No Vert.x context active in aws rds(AWS RDS中的休眠反应性非Vert.x上下文处于活动状态)
Bulk insert with mysql2 and NodeJs throws 500(使用mysql2和NodeJS的大容量插入抛出500)
Flask + PyMySQL giving error no attribute #39;settimeout#39;(FlASK+PyMySQL给出错误,没有属性#39;setTimeout#39;)
auto_increment column for a group of rows?(一组行的AUTO_INCREMENT列?)
Sort by ID DESC(按ID代码排序)