问题描述
选择返回 23,000 行
除了将返回 60 到 200 行(而不是相同的行)
例外应该返回 0 因为它是选择一个除了选择一个
The select returns right at 23,000 rows
The except will return between 60 to 200 rows (and not the same rows)
The except should return 0 as it is select a except select a
PK: [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
PK: [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
[tf] 是一个浮点数,而我得到的浮点数不准确
但我天真地认为 avg(float) 是可重复的
Avg(float) 似乎是可重复的
[tf] is a float and and I get float is not exact
But I naively thought avg(float) would be repeatable
Avg(float) does appear to be repeatable
解决办法是什么?
TF 介于 0 和 1 之间,我只需要 5 个有效数字
我只需要 avg(TF) 是相同的运行数
Decimal(9,8) 给了我足够的精度,如果我转换为 decimal(9,8) ,except 正确返回 0
我可以将 [TF] 更改为十进制(9,8),但由于使用 [tf] 的某些测试需要花费一天的时间来运行,因此需要一些工作和大量回归测试
将 [TF] 更改为 decimal(9,8) 是最佳解决方案吗?
What is the solution?
TF is between 0 and 1 and I only need like 5 significant digits
I just need avg(TF) to be the same number run to run
Decimal(9,8) gives me enough precision and if I cast to decimal(9,8) the except properly returns 0
I can change [TF] to decimal(9,8) but it will be bit of work and lot of regression testing as some of the test that use [tf] take over a day to run
Is change [TF] to decimal(9,8) the best solution?
SELECT [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
, avg([FTSindexWordOnce].[tf]) AS [avgTFraw]
FROM [docSVenum1]
JOIN [docFieldLock]
ON [docFieldLock].[sID] = [docSVenum1].[sID]
AND [docFieldLock].[fieldID] = [docSVenum1].[enumID]
AND [docFieldLock].[lockID] IN (4, 5) /* secLvl docAdm */
JOIN [FTSindexWordOnce]
ON [FTSindexWordOnce].[sID] = [docSVenum1].[sID]
GROUP BY [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
except
SELECT [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
, avg([FTSindexWordOnce].[tf]) AS [avgTFraw]
FROM [docSVenum1]
JOIN [docFieldLock]
ON [docFieldLock].[sID] = [docSVenum1].[sID]
AND [docFieldLock].[fieldID] = [docSVenum1].[enumID]
AND [docFieldLock].[lockID] IN (4, 5) /* secLvl docAdm */
JOIN [FTSindexWordOnce]
ON [FTSindexWordOnce].[sID] = [docSVenum1].[sID]
GROUP BY [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
order by [docSVenum1].[enumID], [docSVenum1].[valueID], [FTSindexWordOnce].[wordID]
在这种情况下,tf 是 tf-idf
tf 归一化是主观的,不需要很高的精度
avg(tf) 需要从select到select保持一致否则结果不一致
在带有连接的单个选择中,我需要一致的 avg(tf)
使用十进制和低精度的 tf 得到一致的结果
In this case tf is term frequency of tf-idf
tf normalization is subjective and does not require much precision
Avg(tf) needs to be consistent from select to select or the results are not consistent
In a single select with joins I need a consistent avg(tf)
Going with decimal and a low precision for tf got consistent results
推荐答案
这非常类似于:SELECT SUM(...) 在添加列时是不确定的- 数据类型浮点值
.
问题在于,由于数据类型不准确(FLOAT/REAL
),浮点数算术运算的顺序很重要.来自连接的演示:
The problem is that with inaccurate datatype (FLOAT/REAL
) the order of of arithmetic operations on floating point matters. Demo from connect:
DECLARE @fl FLOAT = 100000000000000000000
DECLARE @i SMALLINT = 0
WHILE (@i < 100)
BEGIN
SET @fl = @fl + CONVERT(float, 5000)
SET @i = @i + 1
END
SET @fl = @fl - 100000000000000000000
SELECT CONVERT(NVARCHAR(40), @fl, 2)
-- 0.000000000000000e+000
DECLARE @fl FLOAT = 0
DECLARE @i SMALLINT = 0
WHILE (@i < 100)
BEGIN
SET @fl = @fl + CONVERT(float, 5000)
SET @i = @i + 1
END
SET @fl = @fl + 100000000000000000000
SET @fl = @fl - 100000000000000000000
SELECT @fl
-- 507904
LiveDemo
可能的解决方案:
CAST
将所有参数转换为准确的数据类型,例如DECIMAL/NUMERIC
- 改变表并将
FLOAT
改为DECIMAL
- 您可以尝试强制查询优化器以相同的顺序计算总和.
CAST
all arguments to accurate datatype likeDECIMAL/NUMERIC
- alter table and change
FLOAT
toDECIMAL
- you can try to force query optimizer to calculate the sum with the same order.
好消息是,当一个稳定的查询结果对您很重要时应用程序,您可以通过阻止来强制顺序相同与 OPTION (MAXDOP 1) 的并行性.
The good news is that when a stable query result matters to your application, you can force the order to be the same by preventing parallelism with OPTION (MAXDOP 1).
<小时>
看起来初始链接已失效.WebArchive
这篇关于浮动不一致的平均值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持编程学习网!