Problem description
The terms 'operator precedence' and 'order of evaluation' are very commonly used terms in programming and extremely important for a programmer to know. And, as far as I understand them, the two concepts are tightly bound; one cannot do without the other when talking about expressions.
Let us take a simple example:
int a=1; // Line 1
a = a++ + ++a; // Line 2
printf("%d",a); // Line 3
Now, it is evident that Line 2 leads to Undefined Behavior, since Sequence points in C and C++ include:

1. Between evaluation of the left and right operands of the && (logical AND), || (logical OR), and comma operators. For example, in the expression *p++ != 0 && *q++ != 0, all side effects of the sub-expression *p++ != 0 are completed before any attempt to access q.
2. Between the evaluation of the first operand of the ternary "question-mark" operator and the second or third operand. For example, in the expression a = (*p++) ? (*p++) : 0 there is a sequence point after the first *p++, meaning it has already been incremented by the time the second instance is executed.
3. At the end of a full expression. This category includes expression statements (such as the assignment a=b;), return statements, the controlling expressions of if, switch, while, or do-while statements, and all three expressions in a for statement.
4. Before a function is entered in a function call. The order in which the arguments are evaluated is not specified, but this sequence point means that all of their side effects are complete before the function is entered. In the expression f(i++) + g(j++) + h(k++), f is called with a parameter of the original value of i, but i is incremented before entering the body of f. Similarly, j and k are updated before entering g and h respectively. However, it is not specified in which order f(), g(), h() are executed, nor in which order i, j, k are incremented. The values of j and k in the body of f are therefore undefined. Note that a function call f(a,b,c) is not a use of the comma operator and the order of evaluation for a, b, and c is unspecified.
5. At a function return, after the return value is copied into the calling context. (This sequence point is only specified in the C++ standard; it is present only implicitly in C.)
6. At the end of an initializer; for example, after the evaluation of 5 in the declaration int a = 5;.
Thus, going by Point # 3:
At the end of a full expression. This category includes expression statements (such as the assignment a=b;), return statements, the controlling expressions of if, switch, while, or do-while statements, and all three expressions in a for statement.
Line 2 clearly leads to Undefined Behavior. This shows how Undefined Behavior is tightly coupled with Sequence Points.
Now let us take another example:
int x=10,y=1,z=2; // Line 4
int result = x<y<z; // Line 5
Now it's evident that Line 5 will make the variable result store 1.
Now the expression x<y<z in Line 5 can be evaluated as either x<(y<z) or (x<y)<z. In the first case the value of result will be 0, and in the second case result will be 1. But we know that when the operator precedence is equal, associativity comes into play; hence, the expression is evaluated as (x<y)<z.
This is what is said in this MSDN Article:
The precedence and associativity of C operators affect the grouping and evaluation of operands in expressions. An operator's precedence is meaningful only if other operators with higher or lower precedence are present. Expressions with higher-precedence operators are evaluated first. Precedence can also be described by the word "binding." Operators with a higher precedence are said to have tighter binding.
Now, about the above article:
It mentions "Expressions with higher-precedence operators are evaluated first."
It may sound incorrect. But I think the article is not saying something wrong if we consider that () is also an operator: x<y<z is the same as (x<y)<z. My reasoning is that if associativity does not come into play, then the evaluation of the complete expression would become ambiguous, since < is not a Sequence Point.
Also, another link I found says this on Operator Precedence and Associativity:
This page lists C operators in order of precedence (highest to lowest). Their associativity indicates in what order operators of equal precedence in an expression are applied.
So, taking the second example of int result=x<y<z, we can see that there are in all 3 expressions here, x, y and z, since the simplest form of an expression consists of a single literal constant or object. Hence the results of the expressions x, y, z would be their rvalues, i.e., 10, 1 and 2 respectively. Hence, now we may interpret x<y<z as 10<1<2.
Now, doesn't Associativity come into play, since now we have 2 expressions to be evaluated, either 10<1 or 1<2, and since the precedence of the operator is the same, they are evaluated from left to right?
Taking this last example as my argument:
int myval = ( printf("Operator\n"), printf("Precedence\n"), printf("vs\n"),
              printf("Order of Evaluation\n") );
Now in the above example, since the comma operators all have the same precedence, the expressions are evaluated left-to-right and the return value of the last printf() is stored in myval.
In ISO/IEC 9899:201x under J.1 Unspecified behavior it mentions:
The order in which subexpressions are evaluated and the order in which side effects take place, except as specified for the function-call (), &&, ||, ?:, and comma operators (6.5).
Now I would like to know, would it be wrong to say:
Order of Evaluation depends on the precedence of operators, leaving cases of Unspecified Behavior.
I would like to be corrected if any mistakes were made in something I said in my question. The reason I posted this question is because of the confusion created in my mind by the MSDN Article. Is it in Error or not?
Yes, the MSDN article is in error, at least with respect to standard C and C++ [1].
Having said that, let me start with a note about terminology: in the C++ standard, they (mostly--there are a few slip-ups) use "evaluation" to refer to evaluating an operand, and "value computation" to refer to carrying out an operation. So, when (for example) you do a + b, each of a and b is evaluated, then the value computation is carried out to determine the result.
It's clear that the order of value computations is (mostly) controlled by precedence and associativity--controlling value computations is basically the definition of what precedence and associativity are. The remainder of this answer uses "evaluation" to refer to evaluation of operands, not to value computations.
Now, as to evaluation order being determined by precedence: no, it's not! It's as simple as that. Just for example, let's consider your example of x<y<z. According to the associativity rules, this parses as (x<y)<z. Now, consider evaluating this expression on a stack machine. It's perfectly allowable for it to do something like this:
push(z); // Evaluates its argument and pushes value on stack
push(y);
push(x);
test_less(); // compares TOS to TOS(1), pushes result on stack
test_less();
This evaluates z before x or y, but still evaluates (x<y), then compares the result of that comparison to z, just as it's supposed to.
Summary: Order of evaluation is independent of associativity.
Precedence is the same way. We can change the expression to x*y+z, and still evaluate z before x or y:
push(z);
push(y);
push(x);
mul();
add();
Summary: Order of evaluation is independent of precedence.
When/if we add in side effects, this remains the same. I think it's educational to think of side effects as being carried out by a separate thread of execution, with a join at the next sequence point (e.g., the end of the expression). So something like a=b++ + ++c; could be executed something like this:
push(a);
push(b);
push(c+1);
side_effects_thread.queue(inc, b);
side_effects_thread.queue(inc, c);
add();
assign();
join(side_effects_thread);
This also shows why an apparent dependency doesn't necessarily affect order of evaluation either. Even though a is the target of the assignment, this still evaluates a before evaluating either b or c. Also note that although I've written it as "thread" above, this could also just as well be a pool of threads, all executing in parallel, so you don't get any guarantee about the order of one increment versus another either.
Unless the hardware had direct (and cheap) support for thread-safe queuing, this probably wouldn't be used in a real implementation (and even then it's not very likely). Putting something into a thread-safe queue will normally have quite a bit more overhead than doing a single increment, so it's hard to imagine anybody ever doing this in reality. Conceptually, however, the idea fits the requirements of the standard: when you use a pre/post increment/decrement operation, you're specifying an operation that will happen sometime after that part of the expression is evaluated, and will be complete at the next sequence point.
Edit: though it's not exactly threading, some architectures do allow such parallel execution. For a couple of examples, the Intel Itanium and VLIW processors such as some DSPs, allow a compiler to designate a number of instructions to be executed in parallel. Most VLIW machines have a specific instruction "packet" size that limits the number of instructions executed in parallel. The Itanium also uses packets of instructions, but designates a bit in an instruction packet to say that the instructions in the current packet can be executed in parallel with those in the next packet. Using mechanisms like this, you get instructions executing in parallel, just like if you used multiple threads on architectures with which most of us are more familiar.
Summary: Order of evaluation is independent of apparent dependencies
Any attempt at using the value before the next sequence point gives undefined behavior -- in particular, the "other thread" is (potentially) modifying that data during that time, and you have no way of synchronizing access with the other thread. Any attempt at using it leads to undefined behavior.
Just for a (admittedly, now rather far-fetched) example, think of your code running on a 64-bit virtual machine, but the real hardware is an 8-bit processor. When you increment a 64-bit variable, it executes a sequence something like:
load variable[0]
increment
store variable[0]
for (int i=1; i<8; i++) {
    load variable[i]
    add_with_carry 0
    store variable[i]
}
If you read the value somewhere in the middle of that sequence, you could get something with only some of the bytes modified, so what you get is neither the old value nor the new one.
This exact example may be pretty far-fetched, but a less extreme version (e.g., a 64-bit variable on a 32-bit machine) is actually fairly common.
Conclusion
Order of evaluation does not depend on precedence, associativity, or (necessarily) on apparent dependencies. Attempting to use a variable to which a pre/post increment/decrement has been applied in any other part of an expression really does give completely undefined behavior. While an actual crash is unlikely, you're definitely not guaranteed to get either the old value or the new one -- you could get something else entirely.
[1] I haven't checked this particular article, but quite a few MSDN articles talk about Microsoft's Managed C++ and/or C++/CLI (or are specific to their implementation of C++) but do little or nothing to point out that they don't apply to standard C or C++. This can give the false appearance that they're claiming the rules they have decided to apply to their own languages actually apply to the standard languages. In these cases, the articles aren't technically false -- they just don't have anything to do with standard C or C++. If you attempt to apply those statements to standard C or C++, the result is false.