Problem description
Suppose that I'm encoding my files with UTF-8.
Within a PHP script, a string will be compared:
$string = "ぁ";
$string = utf8_encode($string); // Do I need this step?
if (preg_match('/ぁ/u', $string)) {
    // Do something if it matches...
}
Is that string really UTF-8 without the utf8_encode() function? If you save your files as UTF-8, is this function unnecessary?
Recommended answer
If you read the manual entry for utf8_encode, you'll see that it converts an ISO-8859-1 encoded string to UTF-8. The function name is a horrible misnomer, as it suggests some sort of automagic encoding step that is necessary. That is not the case. If your source code is saved as UTF-8 and you assign "あ" to $string, then $string holds the character "あ" encoded in UTF-8. No further action is necessary. In fact, running utf8_encode() on a string that is already UTF-8 (that is, wrongly treating its bytes as ISO-8859-1 and converting them again) will garble it.
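To see the garbling concretely, here is a minimal PHP sketch. It assumes the script file itself is saved as UTF-8. Instead of calling utf8_encode() directly (which is deprecated as of PHP 8.2), it uses a hypothetical helper, latin1_to_utf8(), that performs the ISO-8859-1 to UTF-8 mapping the manual describes: each input byte is taken as a code point in U+0000 to U+00FF and re-encoded as UTF-8.

```php
<?php
// Sketch: assumes this .php file is itself saved as UTF-8.

// Hypothetical helper mimicking what the manual says utf8_encode() does:
// treat every input byte as an ISO-8859-1 character (code point U+0000..U+00FF)
// and re-encode that code point as UTF-8.
function latin1_to_utf8(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            $out .= chr($b);                   // ASCII byte: unchanged
        } else {
            $out .= chr(0xC0 | ($b >> 6))      // 2-byte UTF-8 lead byte
                  . chr(0x80 | ($b & 0x3F));   // continuation byte
        }
    }
    return $out;
}

$string = "あ";                                // already UTF-8: bytes e3 81 82
$double = latin1_to_utf8($string);            // "converted" a second time

echo bin2hex($string), "\n";                  // e38182
echo bin2hex($double), "\n";                  // c3a3c281c282
echo preg_match('/あ/u', $string), "\n";       // 1: matches as-is
echo preg_match('/あ/u', $double), "\n";       // 0: garbled, no longer matches
```

Applying the real utf8_encode() to the same string should produce the same c3a3c281c282 bytes. The three bytes of "あ" are reinterpreted as "ã" (U+00E3) followed by two C1 control characters, so the /u pattern no longer finds "あ".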
To elaborate a little more, your source code is read as a byte sequence. PHP interprets the parts that matter to it (all the keywords, operators and so on) as ASCII. UTF-8 is backward compatible with ASCII, which means every "normal" ASCII character is represented by the same single byte in both ASCII and UTF-8. So a double quote (") is interpreted as a double quote by PHP regardless of whether the file is saved in ASCII or UTF-8. Whatever lies between the quotes, PHP simply takes as a literal byte sequence. So PHP sees your "あ" as the three bytes "11100011 10000001 10000010". Apart from ASCII-level syntax such as escape sequences (and, in double-quoted strings, variables), it doesn't care what exactly is between the quotes; it just uses the bytes as-is.
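The byte-level view above can be checked with a short PHP sketch, again assuming the file is saved as UTF-8. It prints each byte of the literal in binary, then reassembles the code point from the payload bits of the three-byte UTF-8 pattern 1110xxxx 10yyyyyy 10zzzzzz.

```php
<?php
// Sketch: assumes this file is saved as UTF-8, so the literal below is
// stored as the raw bytes the editor wrote.
$s = "あ";

// Dump each byte of the string literal in binary, as PHP stores it.
$bits = [];
for ($i = 0, $n = strlen($s); $i < $n; $i++) {
    $bits[] = sprintf('%08b', ord($s[$i]));
}
echo implode(' ', $bits), "\n";   // 11100011 10000001 10000010
echo strlen($s), "\n";            // 3: strlen() counts bytes, not characters

// Decode the 3-byte UTF-8 sequence by hand: 1110xxxx 10yyyyyy 10zzzzzz
// carries the code point bits xxxx yyyyyy zzzzzz.
$cp = ((ord($s[0]) & 0x0F) << 12)
    | ((ord($s[1]) & 0x3F) << 6)
    |  (ord($s[2]) & 0x3F);
printf("U+%04X\n", $cp);          // U+3042

// The double quote that delimits the literal is the same single byte
// (0x22) in ASCII and in UTF-8.
printf("0x%02X\n", ord('"'));     // 0x22
```

Nothing here depends on PHP understanding Japanese: strlen() and ord() operate on bytes, which is exactly why the literal passes through untouched.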