MySQL and character sets

September 5, 2011

Nowadays, it is commonplace to use UTF-8 when working with text data. Unfortunately, this is often not the case when you deal with legacy software or old configurations. MySQL supports many character sets, and there are two places where a developer has to choose one.

  • Every string column (CHAR, VARCHAR, TEXT and so on) has an associated character set, which means that the data in this column is stored in that character set.
  • For every connection, the client can declare the character set in which it wants to send and receive data. It does so by issuing a SET NAMES SQL statement.
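
As a rough illustration of these two places, here is a minimal Python sketch that only builds the SQL statements a client would send; it does not connect to a server. The table name `notes`, the column name `body`, and the helper functions are hypothetical and chosen just for this example.

```python
# Hypothetical names: the table "notes", the column "body" and the helper
# functions below are illustrative only; they are not part of MySQL.

def _check_charset(name):
    # Character set names such as latin1 or utf8 are assumed to be plain
    # alphanumerics; anything else is rejected before it is spliced into SQL.
    if not name.isalnum():
        raise ValueError(f"unexpected character set name: {name!r}")
    return name


def create_table_sql(column_charset):
    """Build a CREATE TABLE statement whose VARCHAR column uses the given character set."""
    charset = _check_charset(column_charset)
    return (
        "CREATE TABLE notes ("
        "id INT PRIMARY KEY, "
        f"body VARCHAR(100) CHARACTER SET {charset}"
        ")"
    )


def set_names_sql(connection_charset):
    """Build the SET NAMES statement that declares the connection character set."""
    return f"SET NAMES {_check_charset(connection_charset)}"


# The column and the connection are configured independently,
# so nothing stops them from disagreeing.
print(create_table_sql("latin1"))
print(set_names_sql("utf8"))
```

The point of keeping the two helpers separate is that the two settings really are independent: the column character set is fixed when the table is defined, while the connection character set is chosen by each client.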

MySQL converts between character sets when necessary. Things can get really messy if you do not use the same character set everywhere. To demonstrate this, I have written a Python utility that tries combinations of character sets and operations and produces a summary in HTML. You can find the summary here: MySQL – character sets comparison tables.
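
One typical way such a mismatch goes wrong can be reproduced without a server. The following is a minimal sketch, not the utility mentioned above: it uses Python's own codecs to imitate a client that sends UTF-8 bytes over a connection declared as latin1 while the column is stored as UTF-8. The function names are hypothetical, and Python's `latin-1` codec is only an approximation of MySQL's latin1.

```python
def simulate_insert(text, client_encoding, declared_charset, column_charset):
    """Imitate the server converting incoming bytes from the declared
    connection character set to the column character set."""
    sent_bytes = text.encode(client_encoding)          # what the client really sends
    interpreted = sent_bytes.decode(declared_charset)  # how the server reads those bytes
    return interpreted.encode(column_charset)          # bytes that end up in the column


def simulate_select(stored, column_charset, declared_charset, client_encoding):
    """Imitate reading back: convert from the column character set to the
    declared connection character set, then decode as the client expects."""
    text_in_column = stored.decode(column_charset)
    sent_bytes = text_in_column.encode(declared_charset)
    return sent_bytes.decode(client_encoding)


original = "\u00e9"  # the letter e with an acute accent

# The client sends UTF-8 but the connection is declared as latin1.
stored = simulate_insert(original, "utf-8", "latin-1", "utf-8")
print(stored)  # b'\xc3\x83\xc2\xa9', four bytes instead of two

# The same misconfigured client reads its data back unchanged...
same_client = simulate_select(stored, "utf-8", "latin-1", "utf-8")

# ...while a correctly configured UTF-8 client sees two garbled characters.
other_client = simulate_select(stored, "utf-8", "utf-8", "utf-8")
print(same_client == original, len(other_client))  # True 2
```

Under these assumptions the data is double-encoded in the column, yet the problem stays invisible as long as every client makes the same mistake. It only shows up when a correctly configured client reads the same rows.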