将带有错误编码数据的 MySQL 表转换为 UTF-8

本文介绍了将带有错误编码数据的 MySQL 表转换为 UTF-8的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个很大的 MySQL 5.1 数据库，出于各种愚蠢的原因，我相信，我一直在 UTF8 表中存储编码为 LATIN1 的 UTF8 字符.真奇怪.我想修复它.

I've got a big ol' MySQL 5.1 database, and for a variety of stupid reasons, I've been storing, I believe, UTF8 characters encoded as LATIN1 in a UTF8 table. It's... strange. And I'd like to fix it.

MySQL - 在 UTF8 上转换 latin1 字符table into UTF8 问题似乎有效——一次一列.但是我有 24 个表和几十个列要转换.我真的在寻找一种解决方案，可以一次转换至少一个表格.

The MySQL - Convert latin1 characters on a UTF8 table into UTF8 question seems to work -- a column at a time. But I have 24 tables and dozens of columns to convert. I'm really looking for a solution that'll convert at least a table at once.

作为参考，对我有用的单列解决方案是:

For reference, the single-column solution that works for me is:

UPDATE foo SET col1 = CONVERT(CAST(CONVERT(col1 USING latin1) AS binary) USING utf8);

对于表格，我可以:

ALTER TABLE foo CONVERT TO CHARACTER SET latin1;
ALTER TABLE foo CONVERT TO CHARACTER SET binary;
ALTER TABLE foo CHARACTER SET utf8  COLLATE utf8_unicode_ci;

这让我非常接近——但是，CONVERT TO CHARACTER SET binary 步骤一举将我所有的 VARCHAR 列转换为 VARBINARY，将我的 TEXT 列转换为 BLOB.我可以通过并将它们改回来，一切似乎都很好......但后来我又回到了让我们单独修改所有列"的世界——在这种情况下，我也可以

which gets me very close -- however, the CONVERT TO CHARACTER SET binary step turns all my VARCHAR columns into VARBINARY and my TEXT columns into BLOBs in one fell swoop. I can go through and change them back and all appears to be well... but then I'm back in the "let's modify all the columns individually" world -- in which case, I may just as well

我对这些 SQL 语句尝试了大约 50 种变体，但我找不到一种既将我的列保留为字符数据类型又正确编码数据的变体.

I've tried about 50 variations on those SQL statements, but I can't find one that both leaves my columns in character data types and encodes the data properly.

有什么建议吗?

更新:决定只修复列而不是等待数据库或表格解决方案，我想出了:

Update: Deciding to just fix the columns rather than waiting for a database- or table-wise solution, I came up with:

#!/usr/bin/env ruby
require 'rubygems'
require 'mysql2'

CONNECT_OPTS = {} # whatever you want
Mysql2::Client.default_query_options.merge!(:as => :array)
conn = Mysql2::Client.new(CONNECT_OPTS)

tables = conn.query("SHOW TABLES").map {|row| row[0] }

# See http://dev.mysql.com/doc/refman/5.0/en/charset-column.html
# One might want to include enum and set columns; I don't have them
TYPES_TO_CONVERT = %w(char varchar text)
tables.each do |table|
  puts "converting #{table}"
  # Get all the columns and we'll filter for the ones we want
  columns = conn.query("DESCRIBE #{table}")
  columns_to_convert = columns.find_all {|row|
    TYPES_TO_CONVERT.include? row[1].gsub(/(d+)/, '')
  }.map {|row| row[0]}
  next if columns_to_convert.empty?

  query = "UPDATE `#{table}` SET "
  query += columns_to_convert.map {|col|
    "`#{col}` = convert(cast(convert(`#{col}` using latin1) as binary) using utf8)"
  }.join ", "
  puts query
  conn.query query
end

... 完成工作.有趣的是，这在我的数据库上运行了 36 秒，而不是花费 13 分钟的 ALTER TABLE 路由(并且有 VARBINARY 问题)或 mysqldump 解决方案，假设我可以让它们运行，则需要超过 20 分钟.

... which gets the job done. Amusingly, this runs on my database in 36 seconds, rather than the ALTER TABLE route which took 13 minutes (and had the VARBINARY problem) or the mysqldump solutions which would take upwards of twenty assuming I could get them to run.

如果有人知道一种优雅的方法可以一步完成整个数据库或表的操作，我仍然会接受答案.

I'll still accept an answer if someone knows an elegant way to do this for a whole database or table in one step.

将带有错误编码数据的 MySQL 表转换为 UTF-8

问题描述

推荐答案

相关文档推荐