Question
I'm looking for a little help in understanding how charsets work. This question is a continuation of "Anything wrong with using windows-1252 instead of UTF-8".
I have a test ColdFusion site using:
<CFHEADER NAME="Content-Type" value="text/html; charset=windows-1252">
<meta http-equiv="Content-Type" content="text/html; charset=windows-1252" />
and a test Oracle DB using:
NLS_CHARACTERSET: WE8MSWIN1252
NLS_NCHAR_CHARACTERSET: AL16UTF16
According to the windows-1252 charset, there is no square root symbol (Alt+251): √. But I can type it into a field on a webpage form, save it to the DB, query it, and show it on the screen again just fine. When it's in the DB, it's stored as: &#8730;. How can I enter, store, query, and show it if it's not even part of the charset? According to the charset, decimal 251 is this: Hex: FB | û | 00FB | LATIN SMALL LETTER U WITH CIRCUMFLEX
Recommended answer
You're not really using characters outside of the page's and database's charset.
Because the page is windows-1252 encoded, if you enter Alt+251 into a form field and then post the data, the browser effectively says:
"Hey, this character is not a part of windows-1252, and I can only send back data
that is in windows-1252, so I will do the best I can and send back the
HTML character code for it: &#8730;. Oh well, I wish I could send back
1 character; since I cannot, I will send back 7."
Note that &#8730; is 7 separate characters (&, #, 8, 7, 3, 0, and ;), all of which are in the windows-1252 charset.
Had the page been encoded with a multibyte charset such as UTF-8, the browser would have sent back something that is considered 1 character.
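The fallback described above can be reproduced outside a browser. The following is a minimal Python sketch, not what ColdFusion or any browser runs internally; it assumes Python's `cp1252` codec as a stand-in for windows-1252 and uses the `xmlcharrefreplace` error handler to mimic the browser's substitution. The helper name `can_encode` is made up for illustration.

```python
# Illustrative sketch: mimic a browser's fallback for a windows-1252 form post.
SQRT = "\u221a"  # the square root symbol


def can_encode(text: str, charset: str) -> bool:
    """Return True if every character of text exists in charset."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False


# The symbol itself has no slot in windows-1252 ...
sqrt_fits = can_encode(SQRT, "cp1252")

# ... so it is replaced by a numeric character reference made of ASCII characters.
posted = SQRT.encode("cp1252", errors="xmlcharrefreplace")

# With a charset that covers the symbol, it stays one character (three bytes in UTF-8).
utf8_bytes = SQRT.encode("utf-8")

print(sqrt_fits)    # False
print(posted)       # b'&#8730;'
print(len(posted))  # 7
print(utf8_bytes)   # b'\xe2\x88\x9a'
```

Every byte in `posted` is plain ASCII, which is why the value fits in a WE8MSWIN1252 column without complaint.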
So how do you query it?
select * from tab where field like '%&#8730;%'
What you have is the HTML character code (a numeric character reference) for the square root symbol: https://www.google.com/#q=html+character+codes
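To see why the stored text still displays as a square root, here is a small Python sketch using the standard `html` module. The variable names are illustrative, and `html.unescape` only approximates what an HTML renderer does with the reference.

```python
import html

stored = "&#8730;"             # what the database column actually holds
shown = html.unescape(stored)  # what an HTML renderer would display for it

print(shown)            # √
print(ord(shown))       # 8730
print(hex(ord(shown)))  # 0x221a
```

The decimal number inside the reference, 8730, is the same code point as hexadecimal 221A.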
Here is a very good article explaining what is happening: http://htmlpurifier.org/docs/enduser-utf8.html
"...once you start adding characters outside of your encoding...
[the browser might] replace the character with a character entity reference...."
Also, when you enter Alt+251 on a Windows machine, it inserts the square root symbol, which in Unicode is U+221A. Pressing Alt+251 works like a keyboard macro that inserts U+221A; it does not insert character 251 of windows-1252.
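As background that is not stated in the answer: Alt codes typed without a leading zero are commonly looked up in the legacy OEM code page rather than in windows-1252. The sketch below assumes that code page is 437 (the US default) and uses Python's built-in codecs to compare what byte 251 means in each.

```python
code = 251  # the number typed after Alt

as_oem = bytes([code]).decode("cp437")    # legacy OEM code page (assumed to be 437)
as_ansi = bytes([code]).decode("cp1252")  # windows-1252

print(as_oem, f"U+{ord(as_oem):04X}")     # √ U+221A
print(as_ansi, f"U+{ord(as_ansi):04X}")   # û U+00FB
```

Under that assumption, the same number 251 names two different characters depending on which table is consulted, which matches the mismatch the question describes.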