Problem description
A query that loops through 17 million records to remove duplicates has now been running for about 16 hours. If I stop it right now, will it finalize the delete statements, or has it been deleting all along as it runs? In other words, if I stop it, are the deletes kept or rolled back?
I have found that when I run a
select count(*) from myTable
the count it returns (while this query is running) is only about 5 less than the starting row count. The server's resources are admittedly very poor, but does that mean this process has taken 16 hours to find 5 duplicates (when there are actually thousands), and that it could run for days?
This query took 6 seconds on 2,000 rows of test data and worked correctly on that set, so I figured the complete set would take about 15 hours.
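The 15-hour figure assumes the cost per row stays constant as the table grows. A rough sketch of why that may not hold: assuming (this is not stated in the question) that longitude, latitude, BusinessName and Phone are not indexed, each loop iteration's SELECT and DELETE must scan the whole table, so total work grows roughly with the square of the row count. The helper names below are hypothetical.

```python
# Back-of-the-envelope runtime extrapolation from the 2,000-row test run.
# Assumption: if the four match columns are not indexed, every iteration of
# the loop scans the whole table, so total work grows roughly with n**2.

def linear_estimate(test_seconds: float, test_rows: int, full_rows: int) -> float:
    """Seconds for the full table if cost per row stays constant."""
    return test_seconds * (full_rows / test_rows)


def quadratic_estimate(test_seconds: float, test_rows: int, full_rows: int) -> float:
    """Seconds for the full table if each row triggers a full-table scan."""
    return test_seconds * (full_rows / test_rows) ** 2


if __name__ == "__main__":
    t, small, big = 6.0, 2_000, 17_000_000
    print(f"linear:    {linear_estimate(t, small, big) / 3600:.1f} hours")
    print(f"quadratic: {quadratic_estimate(t, small, big) / 86400:.0f} days")
```

The linear case gives 51,000 seconds (about 14.2 hours, matching the estimate above); the quadratic case gives 433,500,000 seconds (about 5,017 days). The quadratic number overstates things, because on 2,000 rows most of the 6 seconds is per-iteration overhead rather than scanning, but it shows why extrapolating linearly from a small test is unsafe for this loop.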
Any ideas?
Here is the query:
--Declare the looping variable
DECLARE @LoopVar char(10)
DECLARE
    --Set private variables that will be used throughout.
    --A bare DECIMAL means DECIMAL(18,0), which rounds coordinates to whole
    --numbers, so rows with fractional coordinates never match the "="
    --comparisons below. DECIMAL(9,6) is an assumption: use the columns' real type.
    @long DECIMAL(9,6),
    @lat DECIMAL(9,6),
    @phoneNumber char(10),
    @businessname varchar(64),
    @winner char(10)

SET @LoopVar = (SELECT MIN(RecordID) FROM MyTable)

WHILE @LoopVar is not null
BEGIN
    --initialize the private variables (essentially this is a .ctor)
    SELECT
        @long = null,
        @lat = null,
        @businessname = null,
        @phoneNumber = null,
        @winner = null

    -- load data from the row declared when setting @LoopVar
    SELECT
        @long = longitude,
        @lat = latitude,
        @businessname = BusinessName,
        @phoneNumber = Phone
    FROM MyTable
    WHERE RecordID = @LoopVar

    --find the winning row with that data. The winning row is the one that sorts
    --first: a non-null webAddress wins, then caption1, then caption2, and the
    --lowest RecordID breaks ties.
    SELECT top 1 @winner = RecordID
    FROM MyTable
    WHERE @long = longitude
      AND @lat = latitude
      AND @businessname = BusinessName
      AND @phoneNumber = Phone
    ORDER BY
        CASE WHEN webAddress is not null THEN 1 ELSE 2 END,
        CASE WHEN caption1 is not null THEN 1 ELSE 2 END,
        CASE WHEN caption2 is not null THEN 1 ELSE 2 END,
        RecordID

    --delete any losers.
    DELETE FROM MyTable
    WHERE @long = longitude
      AND @lat = latitude
      AND @businessname = BusinessName
      AND @phoneNumber = Phone
      AND @winner != RecordID

    -- prep the next loop value to go ahead and perform the next duplicate query.
    SET @LoopVar = (SELECT MIN(RecordID)
                    FROM MyTable
                    WHERE @LoopVar < RecordID)
END
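The loop's intent can be stated compactly: group rows by (longitude, latitude, BusinessName, Phone), keep the row that sorts first under the ORDER BY, and delete the rest. A minimal in-memory Python sketch of that rule follows, assuming rows are plain dicts keyed by the column names used in the query; rank_key and find_losers are hypothetical helpers, useful mainly as a reference when checking a rewrite against a small sample. Rows with a NULL in any key column are skipped because the loop's "=" comparisons never match NULL.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def rank_key(row: Row) -> Tuple[int, int, int, Any]:
    # Mirrors the loop's ORDER BY: 1 sorts before 2, so a filled-in
    # webAddress wins first, then caption1, then caption2, then lowest RecordID.
    return (
        1 if row["webAddress"] is not None else 2,
        1 if row["caption1"] is not None else 2,
        1 if row["caption2"] is not None else 2,
        row["RecordID"],
    )


def find_losers(rows: List[Row]) -> List[Any]:
    """Return the RecordIDs the loop is meant to delete, in ascending order."""
    groups: Dict[Tuple[Any, ...], List[Row]] = {}
    for row in rows:
        key = (row["longitude"], row["latitude"], row["BusinessName"], row["Phone"])
        if any(v is None for v in key):
            continue  # NULL never equals NULL in SQL, so the loop leaves these rows alone
        groups.setdefault(key, []).append(row)

    losers: List[Any] = []
    for members in groups.values():
        winner = min(members, key=rank_key)
        losers.extend(r["RecordID"] for r in members if r is not winner)
    return sorted(losers)
```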
Recommended answer
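No. SQL Server will not roll back the deletes it has already performed if you stop the query. Outside an explicit transaction it runs in autocommit mode, so each DELETE is committed as soon as it finishes; only the statement executing at the moment you cancel is rolled back. Oracle requires DML statements to be committed explicitly, otherwise the changes are rolled back, but SQL Server does not.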
With SQL Server, work is rolled back only if you are explicitly running inside a transaction (BEGIN TRANSACTION, or with SET IMPLICIT_TRANSACTIONS ON) and that transaction is rolled back, or if the connection closes before the transaction is committed. I don't see a transaction in the query above.
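The difference between autocommit and an explicit transaction can be shown with a minimal sketch that uses Python's built-in sqlite3 module and an in-memory SQLite database as a stand-in for SQL Server. SQLite is not SQL Server, so treat this as an analogy: the deletes run in autocommit mode survive, while the delete inside the rolled-back transaction leaves no trace. In T-SQL the explicit version would use BEGIN TRANSACTION and ROLLBACK TRANSACTION. The function name is hypothetical.

```python
import sqlite3


def autocommit_vs_rollback() -> tuple:
    # isolation_level=None puts the Python driver in autocommit mode, so each
    # statement outside an explicit BEGIN commits as soon as it finishes
    # (comparable to SQL Server's default autocommit behaviour).
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE MyTable (RecordID INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO MyTable VALUES (?)", [(i,) for i in range(1, 11)])

    # Autocommit: these deletes are permanent the moment each statement ends,
    # like the earlier iterations of the loop.
    conn.execute("DELETE FROM MyTable WHERE RecordID = 1")
    conn.execute("DELETE FROM MyTable WHERE RecordID = 2")
    after_autocommit = conn.execute("SELECT COUNT(*) FROM MyTable").fetchone()[0]

    # Explicit transaction: everything done inside it is undone by ROLLBACK.
    conn.execute("BEGIN")
    conn.execute("DELETE FROM MyTable WHERE RecordID > 5")
    conn.execute("ROLLBACK")
    after_rollback = conn.execute("SELECT COUNT(*) FROM MyTable").fetchone()[0]

    conn.close()
    return after_autocommit, after_rollback


if __name__ == "__main__":
    print(autocommit_vs_rollback())
```

The function returns (8, 8): the two autocommitted deletes are gone for good, and the rolled-back DELETE of five rows changed nothing.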
You could also restructure the query to make the deletes more efficient, but essentially, if your server's hardware is not up to the job, you may be stuck waiting it out.
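One common way to restructure it is to replace the loop with a single set-based DELETE that ranks the rows in each duplicate group with ROW_NUMBER(), using the same ORDER BY as the loop, and deletes every row that does not rank first. The sketch below runs that statement against SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or later) so it can be tried on a small sample. The SQL text should also be valid T-SQL on SQL Server 2005 or later, but treat it as untested there. The table layout and the dedupe helper are assumptions based on the column names in the question.

```python
import sqlite3

# One set-based DELETE instead of one loop iteration per row. The inner query
# ranks the rows inside each duplicate group with the same ORDER BY the loop
# uses; every row ranked below first place is deleted.
DEDUP_SQL = """
DELETE FROM MyTable
WHERE RecordID IN (
    SELECT RecordID
    FROM (
        SELECT RecordID,
               ROW_NUMBER() OVER (
                   PARTITION BY longitude, latitude, BusinessName, Phone
                   ORDER BY
                       CASE WHEN webAddress IS NOT NULL THEN 1 ELSE 2 END,
                       CASE WHEN caption1 IS NOT NULL THEN 1 ELSE 2 END,
                       CASE WHEN caption2 IS NOT NULL THEN 1 ELSE 2 END,
                       RecordID
               ) AS rn
        FROM MyTable
        -- the loop's "=" comparisons never match NULLs, so skip those rows too
        WHERE longitude IS NOT NULL AND latitude IS NOT NULL
          AND BusinessName IS NOT NULL AND Phone IS NOT NULL
    ) AS ranked
    WHERE rn > 1
)
"""


def dedupe(conn: sqlite3.Connection) -> int:
    """Run the set-based delete, commit it, and return how many rows it removed."""
    cur = conn.execute(DEDUP_SQL)
    conn.commit()
    return cur.rowcount
```

The ranking needs one sort (or, with an index on those four columns, an ordered scan) rather than a table scan per row. On 17 million rows a single DELETE is also one large transaction, so on a weak server deleting in batches (in SQL Server, for example, DELETE TOP (n) repeated until no rows are affected) may be gentler on the log.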
Going forward, you should create a unique index on the columns that define a duplicate (here longitude, latitude, BusinessName and Phone) so you don't have to go through this again.
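A minimal sketch of the unique index, again with SQLite via Python's sqlite3 standing in for SQL Server; the index name UX_MyTable_Business and the helper functions are hypothetical. Once the index exists, inserting a second row with the same longitude, latitude, BusinessName and Phone fails with an integrity error. Two caveats for SQL Server: CREATE UNIQUE INDEX fails until the existing duplicates have been removed, and SQL Server treats NULLs as equal in a unique index (at most one row per key even when some key columns are NULL), whereas SQLite allows repeated NULLs.

```python
import sqlite3
from typing import Any, Tuple


def make_table(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE MyTable (RecordID INTEGER PRIMARY KEY, longitude REAL, "
        "latitude REAL, BusinessName TEXT, Phone TEXT)"
    )
    # Hypothetical index name; the columns are the ones the loop treats as the duplicate key.
    conn.execute(
        "CREATE UNIQUE INDEX UX_MyTable_Business "
        "ON MyTable (longitude, latitude, BusinessName, Phone)"
    )


def try_insert(conn: sqlite3.Connection, row: Tuple[Any, ...]) -> bool:
    """Insert a row; return False if the unique index rejects it as a duplicate."""
    try:
        conn.execute("INSERT INTO MyTable VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False
```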