What COLLATE should I set to support all possible languages?

Question:

I have a column called username. I want users to be able to insert text in Japanese, Roman, Arabic, Korean, and every other possible script, including special characters [https://en.wiktionary.org/wiki/Index:All_languages]. What COLLATE should I set on my database and tables?

I'm using utf8_general_ci. I'm new to this, so I don't know whether it is the best COLLATE for my needs. I need to choose the right COLLATE to avoid SQL errors, because I will not use preg_replace or any other function to strip special characters; I will rely only on prepared statements to prevent SQL injection and protect the database.


Answer 1:

  • First choice (MySQL 8.0): utf8mb4_0900_ai_ci
  • Second choice (as of 5.6): utf8mb4_unicode_520_ci
  • Third choice (5.5+): utf8mb4_unicode_ci
  • Before 5.5, you can't handle all of Chinese, nor Emoji: utf8_unicode_ci

The numbers refer to versions of the Unicode Collation Algorithm: 9.0.0, 5.2.0, and (no number) 4.0.0.
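
As a rough sketch of how the first choice might be applied on MySQL 8.0: the table name `users`, the `id` column, and the `VARCHAR(64)` length are assumptions made for illustration, not details from the question. On an older server, substitute the matching collation from the list above.

```sql
-- List the utf8mb4 collations this server actually offers.
SHOW COLLATION WHERE Charset = 'utf8mb4';

-- Hypothetical table: set the character set and collation as table defaults.
CREATE TABLE users (
    id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    username VARCHAR(64) NOT NULL
) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

-- Alternatively, convert an existing table and its text columns.
ALTER TABLE users CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_0900_ai_ci;

-- The connection should use utf8mb4 as well, otherwise 4-byte
-- characters may be mangled before they reach the table.
SET NAMES utf8mb4;
```

Most client libraries let you set the connection character set in the connection options instead of issuing `SET NAMES` by hand.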

No collation is good for sorting all languages at the same time. Spanish, German, Turkish, etc., have quirks that are incompatible with one another. The collations above are the 'best' general-purpose ones available.
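
If one listing does need language-specific ordering, a collation can be overridden per query while the column keeps its general-purpose default. A minimal sketch, assuming a hypothetical `users` table whose `username` column is stored as utf8mb4:

```sql
-- Sort with Spanish rules for this query only.
SELECT username FROM users
ORDER BY username COLLATE utf8mb4_spanish_ci;

-- Sort with German phone-book rules for this query only.
SELECT username FROM users
ORDER BY username COLLATE utf8mb4_german2_ci;
```

An `ORDER BY` with an explicit `COLLATE` generally cannot use an index built under the column's own collation, so this suits occasional listings better than hot paths.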

utf8mb4 handles every character specified by Unicode so far (including Cherokee, Klingon, Cuneiform, Byzantine, etc.).

If Portuguese is the focus:

See https://pt.stackoverflow.com/ and MySQL collation for Portuguese.

Study this for 8.0 or this for pre-8.0 to see which utf8/utf8mb4 collation comes closest to sorting Portuguese 'correctly'. Perhaps utf8mb4_danish_ci or utf8mb4_de_pb_0900_ai_ci would be best.

(Else go with the 'choices' listed above.)


Answer 2:

If you are using MySQL 5.5.3 or higher, I would recommend the utf8mb4 character set with the utf8mb4_unicode_ci collation. AFAIK it supports most, if not all, languages and implements the Unicode standard for sorting and comparison. As a second choice, have a look at utf8mb4_general_ci, which may be faster but is also less accurate.
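
One way to see the accuracy difference is the German sharp s. According to the MySQL documentation, utf8mb4_unicode_ci treats 'ß' as equal to 'ss', while utf8mb4_general_ci compares character by character and treats it as equal to 's'. A small sketch you could run to check this on your own server, assuming the connection uses utf8mb4:

```sql
-- Make sure string literals are sent and interpreted as utf8mb4.
SET NAMES utf8mb4;

-- Unicode collation: compares 'ß' against 'ss' using expansion rules.
SELECT 'ß' COLLATE utf8mb4_unicode_ci = 'ss' AS unicode_ci_match;

-- General collation: same comparison, one character at a time.
SELECT 'ß' COLLATE utf8mb4_general_ci = 'ss' AS general_ci_match;
```

For a username column this matters mainly for uniqueness checks and lookups, since the collation decides which spellings count as the same name.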

See this excellent SO post for (many) more details, or check out the official MySQL doc.

Below 5.5.3, utf8_unicode_ci is your friend.

  • Posted on 2019-01-10 02:04