Problem description
I am trying to convert a UTF-8 string to a Java Unicode string.
String question = request.getParameter("searchWord");
byte[] bytes = question.getBytes();
question = new String(bytes, "UTF-8");
The input consists of Chinese characters, and when I compare the hex code of each character, it matches the same Chinese character. So I'm pretty sure that the charset is UTF-8.
Where am I going wrong?
Recommended answer
There's no such thing as a "UTF-8 string" in Java. Every String is Unicode text (internally a sequence of UTF-16 code units); UTF-8 only comes into play when text is converted to or from bytes.
When you call String.getBytes() without specifying an encoding, it uses the platform default encoding, which is almost always a bad idea.
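As a minimal sketch of the alternative, the conversion below names the charset explicitly in both directions, so the result does not depend on the platform default. The class and method names (ExplicitCharsetDemo, roundTripUtf8, utf8Length) are made up for illustration and are not part of the original question.

```java
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {
    // Encodes the text as UTF-8 bytes and decodes it again,
    // naming the charset explicitly in both directions.
    public static String roundTripUtf8(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    // Number of bytes the text occupies when encoded as UTF-8.
    public static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        String text = "\u4E2D\u6587"; // two Chinese characters
        System.out.println(roundTripUtf8(text).equals(text)); // prints true
        System.out.println(utf8Length(text));                 // prints 6
    }
}
```

Using the StandardCharsets constants also avoids the checked UnsupportedEncodingException that the String-named overloads declare.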
You shouldn't have to do anything to get the right characters here; the request should be handling all of that for you. If it isn't, then chances are the data has already been lost.
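To illustrate what "lost data" can look like, here is a small sketch under the assumption that some upstream layer decoded UTF-8 bytes with the wrong charset. Whether that is what happens in the asker's setup is not known; the class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class WrongCharsetDemo {
    // Simulates a layer that decodes UTF-8 bytes as US-ASCII.
    // Bytes outside the ASCII range cannot be represented, so the
    // decoder substitutes U+FFFD and the original text is gone.
    public static String decodeAsAscii(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.US_ASCII);
    }

    // Simulates a layer that decodes UTF-8 bytes as ISO-8859-1.
    // The result looks garbled, but every byte maps to exactly one
    // char, so the original bytes can still be recovered.
    public static String decodeAsLatin1(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses the ISO-8859-1 mis-decoding by re-encoding to the
    // original bytes and decoding them as UTF-8.
    public static String repairLatin1(String garbled) {
        byte[] utf8 = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String text = "\u4E2D\u6587";
        System.out.println(decodeAsAscii(text).indexOf('\uFFFD') >= 0);          // prints true
        System.out.println(repairLatin1(decodeAsLatin1(text)).equals(text));    // prints true
    }
}
```

The repair step only works when the wrong charset happened to be lossless; once replacement characters appear, the fix has to be made where the bytes are first decoded.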
Could you give an example of what's actually going wrong? Specify the Unicode values of the characters in the string you're receiving (e.g. by using toCharArray() and then converting each char to an int), along with what you expected to receive.
To diagnose this, use something like this:
// Prints the index and decimal value of every char in the string.
public static void dumpString(String text) {
    for (int i = 0; i < text.length(); i++) {
        System.out.println(i + ": " + (int) text.charAt(i));
    }
}
Note that this prints the decimal value of each char in the string. If you have a handy hex method around (Integer.toHexString, for example), you may want to use it to print hex values instead. The main point is that it dumps the Unicode characters the string actually contains, so you can compare them with what you expected.
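A hex variant of the same diagnostic could look like the sketch below. It is not from the original answer: the class name HexDump and the method hexCodeUnits are illustrative, and the "U+XXXX" output format is just one reasonable choice.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class HexDump {
    // Returns one line per char in the form "index: U+XXXX",
    // where XXXX is the hex value of that UTF-16 code unit.
    public static List<String> hexCodeUnits(String text) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < text.length(); i++) {
            lines.add(String.format(Locale.ROOT, "%d: U+%04X", i, (int) text.charAt(i)));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Prints "0: U+4E2D" and "1: U+6587".
        for (String line : hexCodeUnits("\u4E2D\u6587")) {
            System.out.println(line);
        }
    }
}
```

Comparing this output for the received string against the expected code points shows directly whether the characters arrived intact or were already garbled before the conversion code ran.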