How to calculate distance similarity measure of given 2 strings?

Problem description

I need to calculate the similarity between 2 strings. So what exactly do I mean? Let me explain with an example:

  • Real word: hospital
  • Mistaken word: haspita

Now my aim is to determine how many characters I need to modify in the mistaken word to obtain the real word. In this example, I need to modify 2 letters. So what would the percentage be? I always take the length of the real word, so it becomes 2 / 8 = 25%, which makes the DSM (distance similarity measure) of these 2 strings 75%.
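The arithmetic above can be written as a formula. This is only a sketch of the calculation as described in the question; the symbols $r$ (the real word), $m$ (the mistaken word), and $d(r, m)$ (the number of characters that must be modified) are names introduced here for illustration.

```latex
\mathrm{DSM}(r, m) = 1 - \frac{d(r, m)}{|r|},
\qquad
\mathrm{DSM}(\text{hospital}, \text{haspita}) = 1 - \frac{2}{8} = 0.75 = 75\%
```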

How can I achieve this with performance being a key consideration?

Recommended answer

What you are looking for is called edit distance, or Levenshtein distance. The Wikipedia article explains how it is calculated and has a nice piece of pseudocode at the bottom to help you code this algorithm in C# very easily.
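For reference, the standard recurrence behind Levenshtein distance can be stated as follows. Here $a$ and $b$ are the two strings, and $D_{i,j}$ (a name chosen here for illustration) is the distance between the first $i$ characters of $a$ and the first $j$ characters of $b$, assuming each insertion, deletion, or substitution costs 1.

```latex
D_{i,0} = i, \qquad D_{0,j} = j, \\
D_{i,j} = \min
\begin{cases}
D_{i-1,j} + 1 & \text{(deletion)} \\
D_{i,j-1} + 1 & \text{(insertion)} \\
D_{i-1,j-1} + [a_i \neq b_j] & \text{(substitution or match)}
\end{cases}
```

The distance between the full strings is $D_{|a|,|b|}$, and $[a_i \neq b_j]$ is 1 when the characters differ and 0 when they match.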

Here's an implementation from the first site linked below:

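// Requires: using System;
// Computes the Levenshtein (edit) distance between a and b
// using the full dynamic-programming matrix.
private static int CalcLevenshteinDistance(string a, string b)
{
    // A null or empty string is as far from the other string as that string is long.
    if (String.IsNullOrEmpty(a) && String.IsNullOrEmpty(b)) {
        return 0;
    }
    if (String.IsNullOrEmpty(a)) {
        return b.Length;
    }
    if (String.IsNullOrEmpty(b)) {
        return a.Length;
    }
    int lengthA = a.Length;
    int lengthB = b.Length;
    // distances[i, j] = distance between the first i chars of a and the first j chars of b.
    var distances = new int[lengthA + 1, lengthB + 1];
    for (int i = 0; i <= lengthA; i++) distances[i, 0] = i;  // i deletions
    for (int j = 0; j <= lengthB; j++) distances[0, j] = j;  // j insertions

    for (int i = 1; i <= lengthA; i++)
    {
        for (int j = 1; j <= lengthB; j++)
        {
            int cost = b[j - 1] == a[i - 1] ? 0 : 1;
            distances[i, j] = Math.Min(
                Math.Min(distances[i - 1, j] + 1,    // deletion
                         distances[i, j - 1] + 1),   // insertion
                distances[i - 1, j - 1] + cost);     // substitution or match
        }
    }
    return distances[lengthA, lengthB];
}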
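Since the question stresses performance, here is a minimal sketch of one common variation: keeping only two rows of the matrix instead of the whole thing, which reduces memory use while computing the same distance. It also shows how the distance could be turned into the percentage described in the question. The names StringSimilarity, Distance, and Similarity are hypothetical and do not come from the original answer, and clamping the result at 0 is an assumption for the case where the mistaken word is much longer than the real one.

```csharp
using System;

public static class StringSimilarity
{
    // Levenshtein distance that keeps only the previous and current rows
    // of the dynamic-programming matrix.
    public static int Distance(string a, string b)
    {
        a = a ?? string.Empty;
        b = b ?? string.Empty;
        if (a.Length == 0) return b.Length;
        if (b.Length == 0) return a.Length;

        var previous = new int[b.Length + 1];
        var current = new int[b.Length + 1];

        // Row 0: turning an empty prefix of a into the first j chars of b takes j insertions.
        for (int j = 0; j <= b.Length; j++) previous[j] = j;

        for (int i = 1; i <= a.Length; i++)
        {
            current[0] = i; // i deletions to reach an empty prefix of b
            for (int j = 1; j <= b.Length; j++)
            {
                int cost = a[i - 1] == b[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(previous[j] + 1,      // deletion
                             current[j - 1] + 1),  // insertion
                    previous[j - 1] + cost);       // substitution or match
            }

            // The current row becomes the previous row for the next iteration.
            var swap = previous;
            previous = current;
            current = swap;
        }

        return previous[b.Length];
    }

    // Similarity as described in the question: 1 - distance / length of the real word.
    // Assumption: the result is clamped at 0 so it never goes negative.
    public static double Similarity(string realWord, string mistakenWord)
    {
        if (string.IsNullOrEmpty(realWord))
            throw new ArgumentException("The real word must not be empty.", nameof(realWord));

        double ratio = (double)Distance(realWord, mistakenWord) / realWord.Length;
        return Math.Max(0.0, 1.0 - ratio);
    }
}

public static class Program
{
    public static void Main()
    {
        // For the example in the question the distance is 2 and the similarity is 0.75 (75%).
        Console.WriteLine(StringSimilarity.Distance("hospital", "haspita"));
        Console.WriteLine(StringSimilarity.Similarity("hospital", "haspita"));
    }
}
```

The running time is still proportional to the product of the two string lengths; only the memory footprint changes, which may or may not matter for short words like the ones in the question.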
