Problem description
I use the following code to save Chinese characters to a .txt file, but when I open the file in WordPad I can't read it.
    StringBuffer Shanghai_StrBuf = new StringBuffer("\u4E0A\u6D77");
    boolean Append = true;
    FileOutputStream fos;
    fos = new FileOutputStream(FileName, Append);
    for (int i = 0; i < Shanghai_StrBuf.length(); i++) {
        fos.write(Shanghai_StrBuf.charAt(i));
    }
    fos.close();
What can I do? I know that if I cut and paste Chinese characters into WordPad, I can save them to a .txt file. How do I do that in Java?
Recommended answer
There are several factors at work here:
- Text files have no intrinsic metadata describing their encoding (for all the talk of angle-bracket taxes, there are reasons XML is popular).
- The default encoding on Windows is still an 8-bit (or double-byte) "ANSI" character set with a limited range of values. Text files written in this format are not portable.
- To tell a Unicode file from an ANSI file, Windows apps rely on the presence of a byte order mark (BOM) at the start of the file (not strictly true; Raymond Chen explains). In theory, the BOM is there to tell you the endianness (byte order) of the data. For UTF-8, even though there is only one byte order, Windows apps rely on the marker bytes to figure out automatically that the file is Unicode (though you'll note that Notepad has an encoding option on its open/save dialogs).
- It is wrong to say that Java is broken because it does not write a UTF-8 BOM automatically. On Unix systems, for example, it would be an error to write a BOM to a script file, and many Unix systems use UTF-8 as their default encoding. There are times when you don't want a BOM on Windows either, such as when you're appending data to an existing file:
    fos = new FileOutputStream(FileName, Append);
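To make the marker bytes concrete, here is a small sketch that prints the BOM for each of the three Unicode encodings discussed in this answer. It works by encoding the single character U+FEFF in each charset. The class and method names (`BomBytes`, `bomFor`, `toHex`) are made up for illustration, and `StandardCharsets` assumes Java 7 or later.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class BomBytes {
    // Encoding U+FEFF in a charset yields that charset's byte order mark.
    static byte[] bomFor(Charset charset) {
        return "\uFEFF".getBytes(charset);
    }

    // Formats bytes as space-separated upper-case hex, e.g. "EF BB BF".
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Charset[] charsets = {
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16LE,
            StandardCharsets.UTF_16BE
        };
        for (Charset cs : charsets) {
            System.out.println(cs.name() + ": " + toHex(bomFor(cs)));
        }
        // Prints:
        // UTF-8: EF BB BF
        // UTF-16LE: FF FE
        // UTF-16BE: FE FF
    }
}
```

These are the byte sequences a Windows editor looks for at the start of a file when guessing that the content is Unicode.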
Here is a method of reliably appending UTF-8 data to a file:
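    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.io.OutputStream;
    import java.io.OutputStreamWriter;
    import java.io.Writer;
    import java.nio.charset.Charset;

    private static void writeUtf8ToFile(File file, boolean append, String data)
            throws IOException {
        // Only a new or empty file gets a BOM; appending one mid-file would corrupt the text.
        boolean skipBOM = append && file.isFile() && (file.length() > 0);
        Closer res = new Closer();
        try {
            OutputStream out = res.using(new FileOutputStream(file, append));
            Writer writer = res.using(new OutputStreamWriter(out,
                    Charset.forName("UTF-8")));
            if (!skipBOM) {
                writer.write('\uFEFF');
            }
            writer.write(data);
        } finally {
            res.close();
        }
    }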
Usage:
    public static void main(String[] args) throws IOException {
        String chinese = "\u4E0A\u6D77";
        boolean append = true;
        writeUtf8ToFile(new File("chinese.txt"), append, chinese);
    }
Note: if the file already exists, you choose to append, and the existing data isn't UTF-8 encoded, the only thing this code will create is a mess.
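One way to guard against that mess is to check whether the existing bytes decode as strict UTF-8 before appending. The sketch below is an assumption about how such a check could look, not part of the original method; `Utf8Check` and `isValidUtf8` are hypothetical names. Keep in mind that pure ASCII content passes the check in almost any encoding, so this is a heuristic rather than proof.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    // Returns true if the bytes form a well-formed UTF-8 sequence.
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u4E0A\u6D77".getBytes(StandardCharsets.UTF_8);
        // Illustrative bytes of the kind a legacy double-byte encoding produces;
        // 0xC9 starts a two-byte UTF-8 sequence but 0xCF is not a continuation byte.
        byte[] legacy = { (byte) 0xC9, (byte) 0xCF, (byte) 0xBA, (byte) 0xA3 };

        System.out.println(isValidUtf8(utf8));   // prints "true"
        System.out.println(isValidUtf8(legacy)); // prints "false"
    }
}
```

A caller could read the existing file's bytes, run them through `isValidUtf8`, and refuse to append (or convert first) when the check fails.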
Here is the Closer type used in this code:
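    import java.io.Closeable;
    import java.io.IOException;

    // Tracks only the most recently registered resource. That is enough here
    // because each stream wraps the previous one, so closing the outermost
    // wrapper closes the whole chain.
    public class Closer implements Closeable {
        private Closeable closeable;

        public <T extends Closeable> T using(T t) {
            closeable = t;
            return t;
        }

        @Override
        public void close() throws IOException {
            if (closeable != null) {
                closeable.close();
            }
        }
    }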
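On Java 7 or later, the same append logic can probably be written without the Closer helper by using try-with-resources. The following is a sketch under that assumption, not the original author's code; the class name `Utf8Appender` is invented for the example, and `StandardCharsets.UTF_8` stands in for `Charset.forName("UTF-8")`.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class Utf8Appender {
    static void writeUtf8ToFile(File file, boolean append, String data)
            throws IOException {
        // Write a BOM only when starting a new or empty file.
        boolean skipBom = append && file.isFile() && file.length() > 0;
        // Both resources are closed automatically, in reverse order of declaration.
        try (OutputStream out = new FileOutputStream(file, append);
             Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8)) {
            if (!skipBom) {
                writer.write('\uFEFF');
            }
            writer.write(data);
        }
    }

    public static void main(String[] args) throws IOException {
        // Same usage as the original example: append "Shanghai" to chinese.txt.
        writeUtf8ToFile(new File("chinese.txt"), true, "\u4E0A\u6D77");
    }
}
```

Declaring the stream and the writer as separate resources means the file handle is still released if constructing the writer fails.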
This code makes a Windows-style best guess about how to read the file based on byte order marks:
    import java.io.BufferedInputStream;
    import java.io.File;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.io.InputStreamReader;
    import java.io.Reader;
    import java.nio.charset.Charset;

    private static final Charset[] UTF_ENCODINGS = { Charset.forName("UTF-8"),
            Charset.forName("UTF-16LE"), Charset.forName("UTF-16BE") };

    // Requires a stream that supports mark/reset, such as BufferedInputStream.
    // On a match the stream is left positioned just after the BOM.
    private static Charset getEncoding(InputStream in) throws IOException {
        charsetLoop: for (Charset encoding : UTF_ENCODINGS) {
            byte[] bom = "\uFEFF".getBytes(encoding);
            in.mark(bom.length);
            for (byte b : bom) {
                if ((0xFF & b) != in.read()) {
                    in.reset();
                    continue charsetLoop;
                }
            }
            return encoding;
        }
        // No BOM found: fall back to the platform default ("ANSI" on Windows).
        return Charset.defaultCharset();
    }

    private static String readText(File file) throws IOException {
        Closer res = new Closer();
        try {
            InputStream in = res.using(new FileInputStream(file));
            InputStream bin = res.using(new BufferedInputStream(in));
            Reader reader = res.using(new InputStreamReader(bin, getEncoding(bin)));
            StringBuilder out = new StringBuilder();
            for (int ch = reader.read(); ch != -1; ch = reader.read())
                out.append((char) ch);
            return out.toString();
        } finally {
            res.close();
        }
    }
Usage:
    public static void main(String[] args) throws IOException {
        System.out.println(readText(new File("chinese.txt")));
    }
(System.out uses the default encoding, so whether it prints anything sensible depends on your platform and configuration.)
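If you want to control the output encoding rather than rely on the default, one option is to wrap the target stream in a `PrintStream` with an explicit charset. This is a sketch, with `Utf8Console` and `utf8Stream` as illustrative names; whether the characters then display correctly still depends on the terminal or console interpreting the bytes as UTF-8.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class Utf8Console {
    // Wraps a stream so that printed text is encoded as UTF-8, with auto-flush on.
    static PrintStream utf8Stream(OutputStream target) {
        try {
            return new PrintStream(target, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException("UTF-8 not supported", e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8Stream(System.out);
        out.println("\u4E0A\u6D77");
    }
}
```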