Problem description
I would like to retrieve a column showing how many letters differ between the values in each row. For instance:
If one row has the value "test" and another row has the value "testing ", then the difference between "test" and "testing " is 4 letters, so the column would contain the value 4.
I have thought about it, but I don't know where to begin.
id || value || category || differences
--------------------------------------------------
1 || test || 1 || 4
2 || testing || 1 || null
11 || candy || 2 || -3
12 || ca || 2 || null
In this scenario and context, there is no difference between "Test" and "rest".
Recommended answer
I think what you are looking for is a measure of edit difference, rather than just a count of prefix similarity, and there are a few common algorithms for that. Levenshtein's method is one that I've used before, and I've seen it implemented as TSQL functions. The answers to this SO question suggest a couple of TSQL implementations that you might be able to take and use as-is.
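(Do take the time to test the code and understand the method, though, rather than just copying and using it, so that you can make sense of the output if something goes wrong; otherwise you may build up some technical debt that you will have to pay back later.)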
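To make the method easier to reason about before adopting a TSQL version, here is a minimal sketch of the standard dynamic-programming form of Levenshtein distance. It is written in Python purely for illustration and is not one of the TSQL implementations referred to above; the function name `levenshtein` is my own choice.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    # previous[j] = distance between the processed prefix of a
    # and the first j characters of b (starts with the empty prefix).
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        # Turning the first i characters of a into "" takes i deletions.
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # keep a match or substitute
            ))
        previous = current
    return previous[-1]
```

Under this definition the distance from "test" to "testing" is 3 (three insertions), and it becomes 4 only if the second value really carries a trailing space, as it is written in the question. The comparison is also case-sensitive and never returns a negative number, so the -3 in the sample table would need a separate sign rule.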
Exactly which distance calculation you want depends on how you want to count certain things. For instance, do you count a substitution as one change, or as a delete plus an insert? And if your strings are long enough for it to matter, do you want to consider substring moves, and so forth?
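The effect of the substitution choice can be shown with a small sketch. This is an illustrative Python version, not a TSQL implementation, and both the function name `edit_distance` and the `substitution_cost` parameter are hypothetical names introduced here. It does not handle substring moves.

```python
def edit_distance(a: str, b: str, substitution_cost: int = 1) -> int:
    """Edit distance where insert and delete cost 1 each and a
    substitution costs substitution_cost (assumed parameter:
    1 = one change, 2 = counted as a delete plus an insert)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            swap = 0 if ch_a == ch_b else substitution_cost
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + swap,  # match or substitute
            ))
        previous = current
    return previous[-1]


# Same pair of strings, two counting conventions.
one_change = edit_distance("Test", "rest", substitution_cost=1)
delete_plus_insert = edit_distance("Test", "rest", substitution_cost=2)
```

With a substitution counted as one change, "Test" and "rest" are 1 apart; counted as a delete plus an insert, they are 2 apart. Pure insertions or deletions, such as "candy" to "ca", give the same result under both conventions.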