Problem description
Update: I now think the root of my problem is not threading, because I observe the slowdown at every point of my program. I suspect that my program somehow executes slower when using 2 processors, probably because the two processors need to "communicate" with each other. I need to run some tests. I will try disabling one of the processors and see what happens.
======================================
I'm not sure whether this is a C# question; it is probably more about hardware, but I think C# fits best.
I was using a cheap DL120 server and decided to upgrade to a much more expensive 2-processor DL360p server. Unexpectedly, my C# program runs about 2 times slower on the new server, which is supposed to be several times faster.
I process FAST data for about 60 instruments. I create a separate Task for each instrument like this:
    BlockingCollection<OrderUpdate> updatesQuery;
    if (instrument2OrderUpdates.ContainsKey(instrument))
    {
        updatesQuery = instrument2OrderUpdates[instrument];
    } else
    {
        updatesQuery = new BlockingCollection<OrderUpdate>();
        instrument2OrderUpdates[instrument] = updatesQuery;
        ScheduleFastOrdersProcessing(updatesQuery);
    }
    orderUpdate.Checkpoint("updatesQuery.Add");
    updatesQuery.Add(orderUpdate);
}
private void ScheduleFastOrdersProcessing(BlockingCollection<OrderUpdate> updatesQuery)
{
    Task.Factory.StartNew(() =>
    {
        Instrument instrument = null;
        OrderBook orderBook = null;
        int lastRptSeqNum = -1;
        while (!updatesQuery.IsCompleted)
        {
            OrderUpdate orderUpdate;
            try
            {
                orderUpdate = updatesQuery.Take();
            } catch(InvalidOperationException e)
            {
                Log.Push(LogItemType.Error, e.Message);
                continue;
            }
            orderUpdate.Checkpoint("received from updatesQuery.Take()");
            ......................
            ...................... // long not interesting processing code
    }, TaskCreationOptions.LongRunning);
Because I have about 60 tasks that can run in parallel, I expected 2 * E5-2640 (24 virtual threads, 12 real threads) to perform much faster than 1 * E3-1220 (4 real threads). On the DL360p I see 95 threads in Task Manager; on the DL120 I see only 55.
But execution on the DL120G7 is 2 (!!) times faster! The E3-1220 has a somewhat higher clock rate than the E5-2640 (3.1 GHz vs 2.5 GHz), but I still expected my code to run faster on 2 * E5-2640 because it can be parallelized much better, and I absolutely did not expect it to run 2 times slower!
HP DL120G7 E3-1220
~50 threads in Task Manager; best = 24, average ~ 80 microseconds
calling market.UpdateFastOrder = 23 updatesQuery.Add = 25 received from updatesQuery.Take() = 67 in orderbook = 80
calling market.UpdateFastOrder = 30 updatesQuery.Add = 32 received from updatesQuery.Take() = 64 in orderbook = 73
calling market.UpdateFastOrder = 31 updatesQuery.Add = 32 received from updatesQuery.Take() = 195 in orderbook = 204
calling market.UpdateFastOrder = 31 updatesQuery.Add = 32 received from updatesQuery.Take() = 74 in orderbook = 86
calling market.UpdateFastOrder = 18 updatesQuery.Add = 21 received from updatesQuery.Take() = 65 in orderbook = 78
calling market.UpdateFastOrder = 29 updatesQuery.Add = 32 received from updatesQuery.Take() = 76 in orderbook = 88
calling market.UpdateFastOrder = 30 updatesQuery.Add = 32 received from updatesQuery.Take() = 80 in orderbook = 92
calling market.UpdateFastOrder = 20 updatesQuery.Add = 21 received from updatesQuery.Take() = 65 in orderbook = 78
calling market.UpdateFastOrder = 21 updatesQuery.Add = 24 received from updatesQuery.Take() = 68 in orderbook = 81
calling market.UpdateFastOrder = 12 updatesQuery.Add = 13 received from updatesQuery.Take() = 58 in orderbook = 72
calling market.UpdateFastOrder = 22 updatesQuery.Add = 23 received from updatesQuery.Take() = 51 in orderbook = 59
calling market.UpdateFastOrder = 16 updatesQuery.Add = 16 received from updatesQuery.Take() = 20 in orderbook = 24
calling market.UpdateFastOrder = 28 updatesQuery.Add = 31 received from updatesQuery.Take() = 82 in orderbook = 94
calling market.UpdateFastOrder = 18 updatesQuery.Add = 21 received from updatesQuery.Take() = 65 in orderbook = 77
calling market.UpdateFastOrder = 29 updatesQuery.Add = 29 received from updatesQuery.Take() = 259 in orderbook = 264
calling market.UpdateFastOrder = 49 updatesQuery.Add = 52 received from updatesQuery.Take() = 99 in orderbook = 113
calling market.UpdateFastOrder = 22 updatesQuery.Add = 23 received from updatesQuery.Take() = 50 in orderbook = 60
calling market.UpdateFastOrder = 29 updatesQuery.Add = 32 received from updatesQuery.Take() = 76 in orderbook = 88
calling market.UpdateFastOrder = 16 updatesQuery.Add = 19 received from updatesQuery.Take() = 63 in orderbook = 75
calling market.UpdateFastOrder = 27 updatesQuery.Add = 27 received from updatesQuery.Take() = 226 in orderbook = 231
calling market.UpdateFastOrder = 15 updatesQuery.Add = 16 received from updatesQuery.Take() = 35 in orderbook = 42
calling market.UpdateFastOrder = 18 updatesQuery.Add = 21 received from updatesQuery.Take() = 66 in orderbook = 78
HP DL360p G8 2 * E5-2640
~95 threads in Task Manager; best = 40, average ~ 150 microseconds
calling market.UpdateFastOrder = 62 updatesQuery.Add = 64 received from updatesQuery.Take() = 144 in orderbook = 205
calling market.UpdateFastOrder = 27 updatesQuery.Add = 32 received from updatesQuery.Take() = 101 in orderbook = 154
calling market.UpdateFastOrder = 45 updatesQuery.Add = 50 received from updatesQuery.Take() = 124 in orderbook = 187
calling market.UpdateFastOrder = 46 updatesQuery.Add = 51 received from updatesQuery.Take() = 127 in orderbook = 162
calling market.UpdateFastOrder = 63 updatesQuery.Add = 68 received from updatesQuery.Take() = 137 in orderbook = 174
calling market.UpdateFastOrder = 53 updatesQuery.Add = 55 received from updatesQuery.Take() = 133 in orderbook = 171
calling market.UpdateFastOrder = 44 updatesQuery.Add = 46 received from updatesQuery.Take() = 131 in orderbook = 158
calling market.UpdateFastOrder = 37 updatesQuery.Add = 39 received from updatesQuery.Take() = 102 in orderbook = 140
calling market.UpdateFastOrder = 45 updatesQuery.Add = 50 received from updatesQuery.Take() = 115 in orderbook = 154
calling market.UpdateFastOrder = 50 updatesQuery.Add = 55 received from updatesQuery.Take() = 133 in orderbook = 160
calling market.UpdateFastOrder = 26 updatesQuery.Add = 50 received from updatesQuery.Take() = 99 in orderbook = 111
calling market.UpdateFastOrder = 14 updatesQuery.Add = 30 received from updatesQuery.Take() = 36 in orderbook = 40 <-- best one I can find among thousands
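To see where the extra time goes, each log line can be split into per-stage durations. The sketch below assumes the four values on a line are cumulative microseconds for one update, which the steadily increasing numbers suggest but the post does not state. `CheckpointLog` and `StageDeltas` are illustrative names, not part of the original program.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CheckpointLog
{
    // Parses one log line such as
    // "calling market.UpdateFastOrder = 23 updatesQuery.Add = 25 received from updatesQuery.Take() = 67 in orderbook = 80"
    // and returns the time between consecutive checkpoints, in microseconds.
    // Assumption: the values are cumulative timestamps for a single update.
    public static (int CreateToAdd, int AddToTake, int TakeToBook) StageDeltas(string line)
    {
        // Every checkpoint value is the integer that follows "= ".
        MatchCollection m = Regex.Matches(line, @"= (\d+)");
        if (m.Count < 4)
            throw new FormatException("expected four checkpoint values");

        int created = int.Parse(m[0].Groups[1].Value);
        int added = int.Parse(m[1].Groups[1].Value);
        int taken = int.Parse(m[2].Groups[1].Value);
        int booked = int.Parse(m[3].Groups[1].Value);

        return (added - created, taken - added, booked - taken);
    }
}
```

Applied to the first DL120 line (23, 25, 67, 80) this gives 2, 42 and 13 microseconds; the first DL360p line (62, 64, 144, 205) gives 2, 80 and 61. On these two samples both the queue hand-off and the order book processing take longer on the DL360p, not just one of them.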
Can you see why my program runs 2 times slower on a server that is several times faster? Should I avoid creating ~60 Tasks? Should I tell .NET not to use 95 threads and limit it to 50 or even 24? Is this a 2-processor vs 1-processor configuration issue? Would simply disabling one of the processors on my DL360p Gen8 speed up the program significantly?
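One way to approximate "disabling one processor" without touching the BIOS is to restrict the process to the logical processors of a single socket via `Process.ProcessorAffinity`. The sketch below is only an experiment aid. It assumes that logical processors 0 to 11 belong to the first E5-2640, which depends on how the OS enumerates them and should be checked on the actual machine. `AffinityExperiment`, `MaskFor` and `PinCurrentProcess` are illustrative names.

```csharp
using System;
using System.Diagnostics;

public static class AffinityExperiment
{
    // Builds a bit mask that selects logical processors first .. first + count - 1.
    public static long MaskFor(int first, int count)
    {
        if (first < 0 || count < 1 || first + count > 64)
            throw new ArgumentOutOfRangeException();

        long mask = 0;
        for (int i = first; i < first + count; i++)
            mask |= 1L << i;
        return mask;
    }

    // Restricts all threads of the current process to the selected logical processors.
    // Assumption: the caller knows which logical processors share a socket.
    public static void PinCurrentProcess(int first, int count)
    {
        Process.GetCurrentProcess().ProcessorAffinity = (IntPtr)MaskFor(first, count);
    }
}
```

Calling `AffinityExperiment.PinCurrentProcess(0, 12)` at startup and comparing the checkpoint timings with an unpinned run would show whether cross-socket traffic plays a role. Pinning only controls where threads run, so a result in either direction is a hint rather than proof.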
Added
- calling market.UpdateFastOrder - the orderUpdate object is created
- updatesQuery.Add - orderUpdate is put into the BlockingCollection
- received from updatesQuery.Take() - orderUpdate is taken out of the BlockingCollection
- in orderbook - orderUpdate is parsed and applied to orderBook
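The post does not show how `Checkpoint` is implemented. A minimal sketch of such instrumentation, assuming each update records the microseconds elapsed since it was created, could look like the following. `TimedUpdate` is a hypothetical stand-in for the timing part of `OrderUpdate`, not the author's class.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Text;

// Hypothetical stand-in for the timing part of the question's OrderUpdate.
public sealed class TimedUpdate
{
    private readonly long _start = Stopwatch.GetTimestamp();
    private readonly List<KeyValuePair<string, long>> _marks =
        new List<KeyValuePair<string, long>>();

    // Records the time elapsed since construction, in microseconds, under the given label.
    public void Checkpoint(string label)
    {
        long ticks = Stopwatch.GetTimestamp() - _start;
        long micros = ticks * 1000000L / Stopwatch.Frequency;
        _marks.Add(new KeyValuePair<string, long>(label, micros));
    }

    public IReadOnlyList<KeyValuePair<string, long>> Marks => _marks;

    // Formats the marks as "label = value label = value ...", like the log lines in the question.
    public override string ToString()
    {
        var sb = new StringBuilder();
        foreach (var mark in _marks)
        {
            if (sb.Length > 0) sb.Append(' ');
            sb.Append(mark.Key).Append(" = ").Append(mark.Value);
        }
        return sb.ToString();
    }
}
```

The list is not synchronized. That is acceptable only because an update is touched by one thread at a time, with the hand-off going through the `BlockingCollection`.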
Recommended answer
Just because you have a system that can handle many more threads does not mean that all of them can be processed fully in parallel.
When I upgraded from a quad-core CPU to an i7 (8 virtual cores), I noticed that a setup using more threads than cores resulted in the threads blocking each other for some time, which led to an overall slowdown of the system.
The problem was that my algorithms were already capable of using the full processing time of the core their thread was running on, while the waiting threads only got about 5 to 10%. As a result the main threads finished, but some single threads still had to do all of their work (taking the same amount of time again).
The thread pool will only continue once all workers have finished, so the time until the last worker finishes is unused processor time for the other threads.
Maybe you just need to find the optimal number of threads.
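To experiment with the thread count, the roughly 60 per-instrument consumers can be replaced by a fixed, tunable number of workers, with each instrument always routed to the same worker so that its updates stay in order. The following is a minimal sketch of that idea, not the asker's code. `ShardedProcessor` and its members are illustrative names, and the best worker count has to be found by measurement.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;

// Routes items to a fixed number of worker threads while keeping per-key ordering.
public sealed class ShardedProcessor<TKey, TItem> : IDisposable
{
    private readonly BlockingCollection<TItem>[] _queues;
    private readonly Thread[] _workers;

    public ShardedProcessor(int workerCount, Action<TItem> handler)
    {
        if (workerCount < 1) throw new ArgumentOutOfRangeException(nameof(workerCount));
        if (handler == null) throw new ArgumentNullException(nameof(handler));

        _queues = new BlockingCollection<TItem>[workerCount];
        _workers = new Thread[workerCount];
        for (int i = 0; i < workerCount; i++)
        {
            var queue = new BlockingCollection<TItem>();
            _queues[i] = queue;
            // GetConsumingEnumerable blocks while the queue is empty
            // and ends once CompleteAdding has been called and the queue is drained.
            _workers[i] = new Thread(() =>
            {
                foreach (TItem item in queue.GetConsumingEnumerable())
                    handler(item);
            }) { IsBackground = true };
            _workers[i].Start();
        }
    }

    // Items with the same key always land on the same worker, so their order is preserved.
    public void Add(TKey key, TItem item)
    {
        int hash = EqualityComparer<TKey>.Default.GetHashCode(key) & int.MaxValue;
        _queues[hash % _queues.Length].Add(item);
    }

    // Stops accepting items and waits until the workers have drained their queues.
    public void Dispose()
    {
        foreach (var q in _queues) q.CompleteAdding();
        foreach (var t in _workers) t.Join();
        foreach (var q in _queues) q.Dispose();
    }
}
```

A reasonable starting point is `new ShardedProcessor<Instrument, OrderUpdate>(Environment.ProcessorCount, handler)`, then trying smaller values such as the number of physical cores and comparing the checkpoint timings. The trade-off is that instruments sharing a worker can delay each other, which the one-thread-per-instrument design avoids.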