问题描述
我对具有多个连接的大量大表(行和列)进行了查询,但是其中一个表有一些重复的数据行,导致我的查询出现问题.由于这是来自另一个部门的只读实时提要,我无法修复该数据,但我正在努力防止我的查询出现问题.
I have a query against a large number of big tables (rows and columns) with a number of joins, however one of tables has some duplicate rows of data causing issues for my query. Since this is a read only realtime feed from another department I can't fix that data, however I am trying to prevent issues in my query from it.
鉴于此,我需要将此垃圾数据作为左连接添加到我的良好查询中.数据集如下所示:
Given that, I need to add this crap data as a left join to my good query. The data set looks like:
IDNo FirstName LastName ...
-------------------------------------------
uqx bob smith
abc john willis
ABC john willis
aBc john willis
WTF jeff bridges
sss bill doe
ere sally abby
wtf jeff bridges
...
(大约 2 打列和 100K 行)
(about 2 dozen columns, and 100K rows)
我的第一直觉是执行一个给我大约 80K 行的不同操作:
My first instinct was to perform a distinct gave me about 80K rows:
SELECT DISTINCT P.IDNo
FROM people P
但是当我尝试以下操作时,我会返回所有行:
But when I try the following, I get all the rows back:
SELECT DISTINCT P.*
FROM people P
或
SELECT
DISTINCT(P.IDNo) AS IDNoUnq
,P.FirstName
,P.LastName
...etc.
FROM people P
然后我想我会在所有列上做一个 FIRST() 聚合函数,但这也感觉不对.我在语法上做错了吗?
I then thought I would do a FIRST() aggregate function on all the columns, however that feels wrong too. Syntactically am I doing something wrong here?
更新:只是想注意:这些记录是基于上面列出的 ID 的非键/非索引字段的重复项.ID 是一个文本字段,虽然具有相同的值,但与导致问题的其他数据的情况不同.
Update: Just wanted to note: These records are duplicates based on a non-key / non-indexed field of ID listed above. The ID is a text field which although has the same value, it is a different case than the other data causing the issue.
推荐答案
事实证明我做错了,我需要先对重要列执行嵌套选择,然后再执行不同的选择以防止垃圾列破坏我的好数据的独特"数据.以下似乎已经解决了这个问题……但我稍后会尝试使用完整的数据集.
Turns out I was doing it wrong, I needed to perform a nested select first of just the important columns, and do a distinct select off that to prevent trash columns of 'unique' data from corrupting my good data. The following appears to have resolved the issue... but I will try on the full dataset later.
SELECT DISTINCT P2.*
FROM (
SELECT
IDNo
, FirstName
, LastName
FROM people P
) P2
这里是一些请求的播放数据:http://sqlfiddle.com/#!3/050e0d/3
Here is some play data as requested: http://sqlfiddle.com/#!3/050e0d/3
CREATE TABLE people
(
[entry] int
, [IDNo] varchar(3)
, [FirstName] varchar(5)
, [LastName] varchar(7)
);
INSERT INTO people
(entry,[IDNo], [FirstName], [LastName])
VALUES
(1,'uqx', 'bob', 'smith'),
(2,'abc', 'john', 'willis'),
(3,'ABC', 'john', 'willis'),
(4,'aBc', 'john', 'willis'),
(5,'WTF', 'jeff', 'bridges'),
(6,'Sss', 'bill', 'doe'),
(7,'sSs', 'bill', 'doe'),
(8,'ssS', 'bill', 'doe'),
(9,'ere', 'sally', 'abby'),
(10,'wtf', 'jeff', 'bridges')
;
这篇关于SQL Left Join 仅第一次匹配的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!