Test whether two IEnumerable<T> have the same values with the same frequencies

Question

I have two multisets, both IEnumerables, and I want to compare them.

string[] names1 = { "tom", "dick", "harry" };
string[] names2 = { "tom", "dick", "harry", "harry"};
string[] names3 = { "tom", "dick", "harry", "sally" };
string[] names4 = { "dick", "harry", "tom" };

I want names1 == names4 to return true (and comparing a collection with itself obviously returns true), but all other combinations should return false.

What is the most efficient way? These can be large sets of complex objects.

I looked at doing:
var a = names1.OrderBy(v => v.Name);
var b = names4.OrderBy(v => v.Name);

return a == b;

Recommended answer

The most efficient way depends on the data types. A reasonably efficient and very short O(N) solution is the following:

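// Group each sequence by value; indexing a lookup with a missing key yields an empty group.
var list1Groups = list1.ToLookup(i => i);
var list2Groups = list2.ToLookup(i => i);
// Same number of distinct values, and each value occurs equally often in both.
return list1Groups.Count == list2Groups.Count
    && list1Groups.All(g => g.Count() == list2Groups[g.Key].Count());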

The items are required to have valid Equals and GetHashCode implementations.
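To illustrate that requirement, here is a minimal sketch of a complex item type with value equality, run through the same ToLookup comparison. The `Person` type, its members, and the `PersonDemo.SameMultiset` helper are hypothetical names made up for this example, and the tuple-based hash assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical example type; not part of the original question.
public sealed class Person : IEquatable<Person>
{
    public string Name { get; }
    public int Age { get; }

    public Person(string name, int age)
    {
        Name = name;
        Age = age;
    }

    // Two Person instances are equal when both properties match.
    public bool Equals(Person other) =>
        other != null && Name == other.Name && Age == other.Age;

    public override bool Equals(object obj) => Equals(obj as Person);

    // Equal objects must return equal hash codes, otherwise the lookup
    // places them in different groups. Assumes C# 7+ value tuples.
    public override int GetHashCode() => (Name, Age).GetHashCode();
}

public static class PersonDemo
{
    // Same ToLookup comparison as in the answer, applied to Person items.
    public static bool SameMultiset(IEnumerable<Person> list1, IEnumerable<Person> list2)
    {
        var list1Groups = list1.ToLookup(p => p);
        var list2Groups = list2.ToLookup(p => p);
        return list1Groups.Count == list2Groups.Count
            && list1Groups.All(g => g.Count() == list2Groups[g.Key].Count());
    }
}
```

Without the Equals and GetHashCode overrides, two separately constructed `Person` objects with identical data would be treated as different keys, and the comparison would return false for lists that look identical.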

If you want a faster solution, cdhowie's solution (in another answer) is comparably fast at 10,000 elements, and pulls ahead by a factor of 5 for large collections of simple objects, probably due to better memory efficiency.
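The referenced answer is not reproduced in this text. As an illustration of one lower-allocation alternative to ToLookup (an assumption about the general technique, not necessarily what that answer does), the sketch below counts occurrences in a single Dictionary and then decrements them. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class MultisetCounter
{
    // Hypothetical helper: counts items of the first sequence, then
    // subtracts items of the second. Null elements are not supported,
    // because Dictionary does not accept null keys.
    public static bool HasSameElementCounts<T>(
        IEnumerable<T> first,
        IEnumerable<T> second,
        IEqualityComparer<T> comparer = null)
    {
        var counts = new Dictionary<T, int>(comparer ?? EqualityComparer<T>.Default);

        foreach (var item in first)
        {
            counts.TryGetValue(item, out var n); // n is 0 when the key is absent
            counts[item] = n + 1;
        }

        foreach (var item in second)
        {
            // An item missing from the remaining counts means a mismatch.
            if (!counts.TryGetValue(item, out var n))
                return false;

            if (n == 1)
                counts.Remove(item);
            else
                counts[item] = n - 1;
        }

        // Anything left over occurred more often in the first sequence.
        return counts.Count == 0;
    }
}
```

This stores one integer per distinct value instead of a group holding every element, and it can stop as soon as the second sequence contains an unexpected item.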

Finally, if you're really interested in performance, I'd definitely try the Sort-then-SequenceEqual approach. Although it has worse complexity, that's just a log N factor, and such factors can easily be drowned out by differences in the constant for all practical data set sizes. You might also be able to sort in place, use arrays, or even sort incrementally (which can be linear). Even at 4 billion elements, the log-base-2 is just 32; that's a relevant performance difference, but the difference in constant factor could conceivably be larger. For example, if you're dealing with arrays of ints and don't mind modifying the collection order, the following is faster than either option above even for 10,000,000 items (at twice that size I get an OutOfMemoryException on 32-bit):

Array.Sort(list1);
Array.Sort(list2);
return list1.SequenceEqual(list2);
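Array.Sort reorders its argument, and it only applies to arrays. For arbitrary IEnumerable<T> inputs that must stay untouched, a sketch of the same idea is to copy, sort the copies, and compare element by element. Note that applying `==` to two OrderBy results compares object references, not contents, so an explicit element-wise comparison is needed. The helper name below is hypothetical, and elements are treated as equal when the comparer returns 0.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class MultisetSort
{
    // Hypothetical helper: sorts private copies so the callers' collections
    // keep their original order.
    public static bool HasSameElementsSorted<T>(
        IEnumerable<T> first,
        IEnumerable<T> second,
        IComparer<T> comparer = null)
    {
        comparer = comparer ?? Comparer<T>.Default;

        var a = first.ToArray();
        var b = second.ToArray();

        // Different lengths can never be the same multiset; skip the sort.
        if (a.Length != b.Length)
            return false;

        Array.Sort(a, comparer);
        Array.Sort(b, comparer);

        // After sorting, equal multisets line up position by position.
        for (int i = 0; i < a.Length; i++)
        {
            if (comparer.Compare(a[i], b[i]) != 0)
                return false;
        }
        return true;
    }
}
```

The copies cost extra memory, so for very large int arrays where reordering is acceptable, the in-place version above remains the leaner choice.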

YMMV depending on machine, data type, lunar cycle, and the other usual factors influencing microbenchmarks.
