Is inline assembly language slower than native C++ code?


Question

I tried to compare the performance of inline assembly language and C++ code, so I wrote a function that adds two arrays of size 2000, 100000 times. Here's the code:

#define TIMES 100000
void calcuC(int *x,int *y,int length)
{
    for(int i = 0; i < TIMES; i++)
    {
        for(int j = 0; j < length; j++)
            x[j] += y[j];
    }
}


void calcuAsm(int *x,int *y,int lengthOfArray)
{
    __asm
    {
        mov edi,TIMES
        start:
        mov esi,0
        mov ecx,lengthOfArray
        label:
        mov edx,x
        push edx
        mov eax,DWORD PTR [edx + esi*4]
        mov edx,y
        mov ebx,DWORD PTR [edx + esi*4]
        add eax,ebx
        pop edx
        mov [edx + esi*4],eax
        inc esi
        loop label
        dec edi
        cmp edi,0
        jnz start
    };
}

Here's main():

#include <cstdio>
#include <ctime>
#include <iostream>
using namespace std;

int main() {
    bool errorOccured = false;
    setbuf(stdout,NULL);
    int *xC,*xAsm,*yC,*yAsm;
    xC = new int[2000];
    xAsm = new int[2000];
    yC = new int[2000];
    yAsm = new int[2000];
    for(int i = 0; i < 2000; i++)
    {
        xC[i] = 0;
        xAsm[i] = 0;
        yC[i] = i;
        yAsm[i] = i;
    }
    clock_t start = clock();
    calcuC(xC,yC,2000);

    //    calcuAsm(xAsm,yAsm,2000);
    //    for(int i = 0; i < 2000; i++)
    //    {
    //        if(xC[i] != xAsm[i])
    //        {
    //            cout<<"xC["<<i<<"]="<<xC[i]<<" "<<"xAsm["<<i<<"]="<<xAsm[i]<<endl;
    //            errorOccured = true;
    //            break;
    //        }
    //    }
    //    if(errorOccured)
    //        cout<<"Error occurs!"<<endl;
    //    else
    //        cout<<"Works fine!"<<endl;

    clock_t end = clock();

    //    cout<<"time = "<<(float)(end - start) / CLOCKS_PER_SEC<<"\n";

    cout<<"time = "<<end - start<<endl;
    return 0;
}

Then I ran the program five times to get the processor clock ticks (the value returned by clock()), which can be treated as time. Each run calls only one of the functions mentioned above.

Here are the results.

Assembly version:

Debug   Release
---------------
732        668
733        680
659        672
667        675
684        694
Average:   677

C++ version:

Debug     Release
-----------------
1068      168
 999      166
1072      231
1002      166
1114      183
Average:  182

The C++ code in release mode is almost 3.7 times faster than the assembly code. Why?

I guess that the assembly code I wrote is not as efficient as the code generated by GCC. It's hard for an ordinary programmer like me to write code that is faster than what a compiler generates. Does that mean I should not trust the performance of assembly language written by hand, and should focus on C++ and forget about assembly language?

Recommended answer

Yes, most of the time.

First of all, you start from the wrong assumption that a low-level language (assembly in this case) will always produce faster code than a high-level language (C++ and C in this case). That's not true. Is C code always faster than Java code? No, because there is another variable: the programmer. The way you write code and your knowledge of architecture details greatly influence performance (as you saw in this case).

You can always produce an example where handmade assembly code is better than compiled code, but usually it's a contrived example or a single routine, not a real program of 500,000+ lines of C++ code. I think compilers will produce better assembly code 95% of the time. Only occasionally, in rare cases, you may need to write assembly for a few short, heavily used, performance-critical routines, or when you have to access features your favorite high-level language does not expose. Do you want a taste of this complexity? Read this awesome answer here on SO.

Why is this?

First of all, because compilers can perform optimizations that we can't even imagine (see this short list), and they do them in seconds (when we might need days).

When you code in assembly you have to write well-defined functions with a well-defined call interface. Compilers, however, can take into account whole-program optimization and inter-procedural optimization such as register allocation, constant propagation, common subexpression elimination, instruction scheduling and other complex, non-obvious optimizations (the polytope model, for example). On RISC architectures people stopped worrying about this many years ago (instruction scheduling, for example, is very hard to tune by hand), and modern CISC CPUs have very long pipelines too.
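To make two of the optimizations named above concrete, here is a minimal sketch of constant propagation and common subexpression elimination applied by hand. The function names are made up for illustration, and what a real compiler actually emits depends on the compiler and its flags, so treat this as a picture of the idea and not as real compiler output.

```cpp
#include <cassert>

// Straightforward version, as a programmer might naturally write it.
int scaled_sum_naive(int a, int b) {
    const int scale = 4;                  // value known at compile time
    int first  = (a + b) * scale;         // (a + b) is computed here...
    int second = (a + b) * (scale + 1);   // ...and again here
    return first + second;
}

// Roughly what an optimizer may reduce the function above to
// (hypothetical hand-written equivalent, not actual compiler output).
int scaled_sum_optimized(int a, int b) {
    int sum = a + b;   // common subexpression computed only once
    return sum * 9;    // 4*sum + 5*sum after constant propagation and folding
}

// Checks that both versions agree for all pairs in [lo, hi].
void check_agreement(int lo, int hi) {
    for (int a = lo; a <= hi; ++a)
        for (int b = lo; b <= hi; ++b)
            assert(scaled_sum_naive(a, b) == scaled_sum_optimized(a, b));
}
```

Calling check_agreement(-50, 50) exercises both versions on small inputs. The point is that the compiler does this kind of rewriting across the whole function (and sometimes across functions) for free, while in assembly every such step is on you.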

For some complex microcontrollers even system libraries are written in C instead of assembly, because their compilers produce better (and easier to maintain) final code.

Compilers can sometimes use MMX/SIMDx instructions automatically, and if you don't use them you simply can't compare the two (other answers already reviewed your assembly code very well). Just for loops, this is a short list of the loop optimizations a compiler commonly checks for (do you think you could do that by yourself when your schedule has been decided for a C# program?). If you write something in assembly, I think you have to consider at least some simple optimizations. The textbook example for arrays is to unroll the loop (its size is known at compile time). Do it and run your test again.
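As a rough illustration of the unrolling idea, here is a sketch in C++ instead of assembly. The function names and the unroll factor of 4 are my own choices for the example, and an optimizing compiler may well perform this transformation by itself, so no speedup is claimed.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Reference version: one element per iteration, like the inner loop of calcuC.
void add_simple(std::vector<int>& x, const std::vector<int>& y) {
    assert(x.size() == y.size());
    for (std::size_t j = 0; j < x.size(); ++j)
        x[j] += y[j];
}

// Unrolled by 4: fewer loop-counter updates and branches per element added.
void add_unrolled4(std::vector<int>& x, const std::vector<int>& y) {
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    const std::size_t main_end = n - (n % 4);  // largest multiple of 4 <= n
    std::size_t j = 0;
    for (; j < main_end; j += 4) {
        x[j]     += y[j];
        x[j + 1] += y[j + 1];
        x[j + 2] += y[j + 2];
        x[j + 3] += y[j + 3];
    }
    for (; j < n; ++j)                         // leftover 0 to 3 elements
        x[j] += y[j];
}
```

The cleanup loop at the end matters when the length is not a multiple of 4. With a fixed size of 2000 it never runs, but leaving it out is a classic source of bugs when the size changes later.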

These days it's also really uncommon to need assembly language for another reason: the plethora of different CPUs. Do you want to support them all? Each has a specific microarchitecture and some specific instruction sets. They have different numbers of functional units, and assembly instructions should be arranged to keep them all busy. If you write in C you may use PGO, but in assembly you then need deep knowledge of that specific architecture (and you have to rethink and redo everything for another architecture). For small tasks the compiler usually does it better, and for complex tasks the work usually isn't repaid (and the compiler may do better anyway).

If you sit down and take a look at your code, you'll probably see that you'll gain more by redesigning your algorithm than by translating it to assembly (read this great post here on SO). There are high-level optimizations (and hints to the compiler) you can effectively apply before you need to resort to assembly language. It's probably worth mentioning that, often, by using intrinsics you will get the performance gain you're looking for, and the compiler will still be able to perform most of its optimizations.
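For the array addition in the question, an intrinsics version might look like the sketch below. It assumes an x86 target with SSE2 and a 32-bit int, and falls back to a plain loop elsewhere. The function name add_intrinsics and the SKETCH_HAVE_SSE2 macro are made up for this example. The intrinsics themselves (_mm_loadu_si128, _mm_add_epi32, _mm_storeu_si128) come from <emmintrin.h>.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>   // SSE2 intrinsics
#define SKETCH_HAVE_SSE2 1
#else
#define SKETCH_HAVE_SSE2 0
#endif

// Adds y into x element-wise, four ints per step where SSE2 is available.
void add_intrinsics(std::vector<int>& x, const std::vector<int>& y) {
    static_assert(sizeof(int) == 4, "sketch assumes 32-bit int");
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    std::size_t j = 0;
#if SKETCH_HAVE_SSE2
    for (; j + 4 <= n; j += 4) {
        // Unaligned loads, since std::vector gives no 16-byte alignment guarantee.
        __m128i vx = _mm_loadu_si128(reinterpret_cast<const __m128i*>(&x[j]));
        __m128i vy = _mm_loadu_si128(reinterpret_cast<const __m128i*>(&y[j]));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(&x[j]), _mm_add_epi32(vx, vy));
    }
#endif
    for (; j < n; ++j)   // scalar tail, or the whole array without SSE2
        x[j] += y[j];
}
```

Unlike an __asm block, this is still ordinary C++ to the compiler, so it can keep allocating registers, inlining and scheduling around these calls.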

All this said, even when you can produce assembly code that is 5 to 10 times faster, you should ask your customers whether they prefer to pay for one week of your time or to spend $50 on a faster CPU. More often than not (and especially in LOB applications), extreme optimization is simply not required of most of us.
