C# - Fuzzy compare of two large string arrays

Problem description

I need to find all strings in B that "partly" exist in A.

B = [ "Hello World!", "Hello Stack Overflow!", "Foo Bar!", "Food is nice...", "Hej" ]
A = [ "World", "Foo" ]
C = B.FuzzyCompare(A) // C = [ "Hello World!", "Foo Bar!", "Food is nice..." ]

I've been looking into using the Levenshtein distance algorithm for the "fuzzy" part of the problem, and LINQ for the iteration. However, comparing every element of A against every element of B usually results in over 1.5 billion comparisons.

How should I go about this? Is there a way to quickly "almost compare" two lists of strings?

Recommended answer

Maybe it's sufficient to simply check whether each string in B contains one of the strings in A as a substring. That would be much more efficient than computing an edit distance for every pair:

var C = B.Where(s1 => A.Any(s2 => s1.IndexOf(s2, StringComparison.OrdinalIgnoreCase) >= 0)).ToList();
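To try the one-liner against the sample data from the question, it can be wrapped in a small helper. The following is only a minimal sketch, assuming C# 9 or later (top-level statements). The name FuzzyCompare is borrowed from the hypothetical call in the question, and skipping empty search terms is an added assumption, since an empty string is found at index 0 of every string and would otherwise match everything.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper named after the FuzzyCompare call in the question.
// Returns every candidate that contains at least one needle as a
// case-insensitive substring, preserving the order of the candidates.
static List<string> FuzzyCompare(IEnumerable<string> candidates, IEnumerable<string> needles)
{
    // Materialize the needles once and drop empty ones (assumption):
    // an empty needle would match every candidate.
    string[] needleArray = needles.Where(n => !string.IsNullOrEmpty(n)).ToArray();

    return candidates
        .Where(c => needleArray.Any(n => c.IndexOf(n, StringComparison.OrdinalIgnoreCase) >= 0))
        .ToList();
}

// Sample data from the question.
string[] B = { "Hello World!", "Hello Stack Overflow!", "Foo Bar!", "Food is nice...", "Hej" };
string[] A = { "World", "Foo" };

List<string> C = FuzzyCompare(B, A);
Console.WriteLine(string.Join(" | ", C));
// prints: Hello World! | Foo Bar! | Food is nice...
```

In the worst case this still performs one substring search per pair of elements, but Any stops at the first match for each string in B. Note that it only finds exact (case-insensitive) substrings, so a misspelling such as "Wrold" would not match "World".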
