Question
Why does commenting out the first two lines of this for loop and uncommenting the third result in a 42% speedup?
int count = 0;
for (uint i = 0; i < 1000000000; ++i) {
    var isMultipleOf16 = i % 16 == 0;
    count += isMultipleOf16 ? 1 : 0;
    //count += i % 16 == 0 ? 1 : 0;
}
Behind the timing difference is vastly different assembly code: 13 vs. 7 instructions in the loop. The platform is Windows 7 running .NET 4.0 x64. Code optimization is enabled, and the test app was run outside VS2010. [Update: Repro project, useful for verifying project settings.]
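For anyone who wants to reproduce the measurement, here is a minimal sketch of a timing harness. It is not the repro project mentioned above: the class and method names (`LoopVariants`, `CountWithLocal`, `CountInline`) are made up for illustration, and the loop bound is passed as a parameter so the two variants can be checked on small inputs. It assumes a Release build run outside the debugger, as described in the question.

```csharp
using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;

public static class LoopVariants
{
    // Variant A: the comparison result is stored in a local bool first
    // (the form the question reports as slower).
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int CountWithLocal(uint limit)
    {
        int count = 0;
        for (uint i = 0; i < limit; ++i)
        {
            var isMultipleOf16 = i % 16 == 0;
            count += isMultipleOf16 ? 1 : 0;
        }
        return count;
    }

    // Variant B: the comparison feeds the conditional directly.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int CountInline(uint limit)
    {
        int count = 0;
        for (uint i = 0; i < limit; ++i)
        {
            count += i % 16 == 0 ? 1 : 0;
        }
        return count;
    }

    public static void Main()
    {
        const uint Limit = 1000000000;

        // Call each method once so JIT compilation is not part of the timing.
        CountWithLocal(16);
        CountInline(16);

        var sw = Stopwatch.StartNew();
        int a = CountWithLocal(Limit);
        sw.Stop();
        Console.WriteLine("local bool: {0} ms (count = {1})", sw.ElapsedMilliseconds, a);

        sw.Restart();
        int b = CountInline(Limit);
        sw.Stop();
        Console.WriteLine("inline:     {0} ms (count = {1})", sw.ElapsedMilliseconds, b);
    }
}
```

Both methods return the same count, so any difference in elapsed time comes from the generated code rather than from the work done. Results will vary with the runtime version and JIT, so treat the 42% figure as specific to the setup described in the question.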
Eliminating the intermediate boolean is a fundamental optimization, one of the simplest in my 1980s-era Dragon Book. How did the optimization not get applied when generating the CIL or JITing the x64 machine code?
Is there a "Really compiler, I would like you to optimize this code, please" switch? While I sympathize with the sentiment that premature optimization is akin to the love of money, I could see the frustration in trying to profile a complex algorithm that had problems like this scattered throughout its routines. You'd work through the hotspots but have no hint of the broader warm region that could be vastly improved by hand-tweaking what we normally take for granted from the compiler. I sure hope I'm missing something here.
Update: Speed differences also occur for x86, but depend on the order in which methods are just-in-time compiled. See "Why does JIT order affect performance?"
Assembly code (as requested):
var isMultipleOf16 = i % 16 == 0;
00000037 mov eax,edx
00000039 and eax,0Fh
0000003c xor ecx,ecx
0000003e test eax,eax
00000040 sete cl
count += isMultipleOf16 ? 1 : 0;
00000043 movzx eax,cl
00000046 test eax,eax
00000048 jne 0000000000000050
0000004a xor eax,eax
0000004c jmp 0000000000000055
0000004e xchg ax,ax
00000050 mov eax,1
00000055 lea r8d,[rbx+rax]
count += i % 16 == 0 ? 1 : 0;
00000037 mov eax,ecx
00000039 and eax,0Fh
0000003c je 0000000000000042
0000003e xor eax,eax
00000040 jmp 0000000000000047
00000042 mov eax,1
00000047 lea edx,[rbx+rax]
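A note on reading the listings: both start with `and eax,0Fh`, which is how the JIT implements `i % 16` here. For an unsigned value, the remainder modulo 16 is just the low four bits, so the two versions differ only in what happens after that mask. A small sketch that checks the equivalence (the class and method names are hypothetical, chosen only for this example):

```csharp
using System;

public static class ModuloMask
{
    // The test as written in the question.
    public static bool IsMultipleOf16Modulo(uint i)
    {
        return i % 16 == 0;
    }

    // The same test expressed as the bit mask seen in the assembly (and eax,0Fh).
    public static bool IsMultipleOf16Mask(uint i)
    {
        return (i & 0xFu) == 0;
    }

    public static void Main()
    {
        // Compare the two forms over a range of values.
        for (uint i = 0; i < 100000; ++i)
        {
            if (IsMultipleOf16Modulo(i) != IsMultipleOf16Mask(i))
            {
                throw new InvalidOperationException("Mismatch at " + i);
            }
        }
        Console.WriteLine("modulo and mask agree");
    }
}
```

The equivalence holds for `uint` as used in the question; for signed integers the remainder can be negative, so the two expressions are not interchangeable in general.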
Recommended answer
It's a bug in the .NET Framework.
Well, really I'm just speculating, but I submitted a bug report on Microsoft Connect to see what they say. After Microsoft deleted that report, I resubmitted it to the roslyn project on GitHub.
Update: Microsoft has moved the issue to the coreclr project. From the comments on the issue, calling it a bug seems a bit strong; it's more of a missing optimization.