Question
I have a string in C# initialised as follows:
string strVal = "£2000";
However, whenever I write this string out, the following is written:
Â£2000
It does not do this with dollars.
An example bit of code I am using to write out the value:
System.IO.File.AppendAllText(HttpContext.Current.Server.MapPath("/logging.txt"), strVal);
I'm guessing it's something to do with localization, but if C# strings are just Unicode, surely this should just work?
CLARIFICATION: Just a bit more info: Jon Skeet's answer is correct, but I also get the issue when I URL-encode the string. Is there a way of preventing this?
So the URL encoded string looks like this:
"%c2%a32000"
%c2 = Â %a3 = £
If I encode as ASCII, the £ comes out as ?
Any other ideas?
Recommended answer
The default character set of URLs when used in HTML pages and in HTTP headers is called ISO-8859-1 or ISO Latin-1.
It's not the same as UTF-8, and it's not the same as ASCII, but it does fit into one-byte-per-character. The range 0 to 127 is a lot like ASCII, and the whole range 0 to 255 is the same as the range 0000-00FF of Unicode.
So you can generate it from a C# string by casting each character to a byte, or you can use Encoding.GetEncoding("iso-8859-1") to get an object to do the conversion for you.
(In this character set, the UK pound symbol is 163.)
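As a rough illustration of the two conversions just described, the sketch below prints the bytes that "£2000" produces under ISO-8859-1, UTF-8 and ASCII. The class name `PoundEncodingDemo` and the helper `Hex` are made up for this example; the pound sign is written as the escape `\u00A3` so that the encoding of the source file itself does not interfere.

```csharp
using System;
using System.Linq;
using System.Text;

public static class PoundEncodingDemo
{
    // Hypothetical helper: formats bytes as space-separated uppercase hex.
    public static string Hex(byte[] bytes) =>
        string.Join(" ", bytes.Select(b => b.ToString("X2")));

    public static void Main()
    {
        string strVal = "\u00A32000"; // "£2000"

        Encoding latin1 = Encoding.GetEncoding("iso-8859-1");

        // One byte per character; the pound sign is 0xA3 (163).
        byte[] latin1Bytes = latin1.GetBytes(strVal);

        // Casting each char gives the same bytes, but only because every
        // character here is at or below U+00FF.
        byte[] castBytes = strVal.Select(c => (byte)c).ToArray();

        // UTF-8 needs two bytes (0xC2 0xA3) for the pound sign.
        byte[] utf8Bytes = Encoding.UTF8.GetBytes(strVal);

        // ASCII has no pound sign, so it is replaced by '?' (0x3F).
        byte[] asciiBytes = Encoding.ASCII.GetBytes(strVal);

        Console.WriteLine("ISO-8859-1: " + Hex(latin1Bytes)); // A3 32 30 30 30
        Console.WriteLine("Cast:       " + Hex(castBytes));   // A3 32 30 30 30
        Console.WriteLine("UTF-8:      " + Hex(utf8Bytes));   // C2 A3 32 30 30 30
        Console.WriteLine("ASCII:      " + Hex(asciiBytes));  // 3F 32 30 30 30

        // Reading the UTF-8 bytes back as ISO-8859-1 yields the stray
        // U+00C2 character in front of the pound sign.
        string misread = latin1.GetString(utf8Bytes);
        Console.WriteLine(misread == "\u00C2\u00A32000"); // True
    }
}
```

The last line is the effect seen in the question: the two UTF-8 bytes for the pound sign, interpreted one byte per character, show up as Â followed by £.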
Background
The RFC says that unencoded text must be limited to the traditional 7-bit US ASCII range, and anything else (plus the special URL delimiter characters) must be encoded. But it leaves open the question of what character set to use for the upper half of the 8-bit range, making it dependent on the context in which the URL appears.
And that context is defined by two other standards, HTTP and HTML, which do specify the default character set, and which together create a practically irresistible force on implementers to assume that the address bar contains percent-encodings that refer to ISO-8859-1.
ISO-8859-1 is the character set of text-based content sent via HTTP except where otherwise specified. So by the time a URL string appears in the HTTP GET header, it ought to be in ISO-8859-1.
The other factor is that HTML also uses ISO-8859-1 as its default, and URLs typically originate as links in HTML pages. So when you craft a simple minimal HTML page in Notepad, the URLs you type into that file are in ISO-8859-1.
It's sometimes described as a "hole" in the standards, but it's not really; it's just that HTML/HTTP fill in the blank left by the RFC for URLs.
Hence, for example, the advice on this page:
URL encoding of a character consists of a "%" symbol, followed by the two-digit hexadecimal representation (case-insensitive) of the ISO-Latin code point for the character.
(ISO-Latin is another name for ISO-8859-1.)
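To make the quoted rule concrete, here is a minimal sketch of percent-encoding a string through a chosen byte encoding. `Latin1UrlEncoder` and `PercentEncode` are hypothetical names invented for this example, not framework APIs, and the set of characters left unencoded is a simplifying assumption. In ASP.NET, the `HttpUtility.UrlEncode(string, Encoding)` overload takes an encoding in a similar way and may be the more practical choice.

```csharp
using System;
using System.Text;

public static class Latin1UrlEncoder
{
    // Assumption for this sketch: only these characters are left as-is.
    private const string Unreserved =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-_.~";

    // Hypothetical helper: converts the string to bytes with the given
    // encoding, then writes each byte either literally or as %xx.
    public static string PercentEncode(string value, Encoding encoding)
    {
        var sb = new StringBuilder();
        foreach (byte b in encoding.GetBytes(value))
        {
            char c = (char)b;
            if (Unreserved.IndexOf(c) >= 0)
                sb.Append(c);
            else
                sb.Append('%').Append(b.ToString("x2"));
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string strVal = "\u00A32000"; // "£2000"

        // ISO-8859-1: the pound sign is the single byte 0xA3.
        Console.WriteLine(PercentEncode(strVal, Encoding.GetEncoding("iso-8859-1"))); // %a32000

        // UTF-8: the pound sign is two bytes, hence the extra %c2.
        Console.WriteLine(PercentEncode(strVal, Encoding.UTF8)); // %c2%a32000
    }
}
```

Note that characters outside the 0000-00FF range have no ISO-8859-1 byte; with the default encoder settings they come out as "?" (here "%3f"), so this approach only suits text that fits in that range.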
So much for the theory. Paste this into Notepad, save it as an .html file, and open it in a few browsers. Click the link and Google should search for the UK pound symbol.
<HTML>
<BODY>
<A href="http://www.google.com/search?q=%a3">Test</A>
</BODY>
</HTML>
It works in IE, Firefox, Apple Safari, Google Chrome - I don't have any others available right now.