世界線航跡蔵

Yugui, a mad web programmer, brings you tech topics and bits of everyday life.

December 5, 2009

Don't use String#force_encoding

I see a lot of Ruby code that uses String#force_encoding, but most of it is wrong. You should not use this method.

Ruby 1.9 Era

This year the first release of the Ruby 1.9 series shipped, and I will soon release Ruby 1.9.1-p376. 2009 was the year of Ruby 1.9.

Next year, Ruby 1.9.2 will be released. It will be fully compatible with Rails 3. It will also pass RubySpec completely, and JRuby will soon become compatible with Ruby 1.9.2. You should start porting your code to Ruby 1.9 right now. Ruby 1.9 is a very good language.

Encoding

When you port your code to Ruby 1.9, the biggest problem is character encoding. Ruby 1.9 treats a string as a sequence of characters rather than a sequence of bytes. JEG2 explains this topic in his series of articles "Understanding M17N".

encode and force_encoding

There are three relevant methods: String#encode, String#encode! and String#force_encoding.

String#encode keeps the characters of a string but changes the encoding in which they are represented. #encode is non-destructive and returns a new string; #encode! is the destructive version. The byte representation of a character depends on the encoding, so #encode and #encode! generally change the byte representation of the string.
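As a minimal sketch of that behaviour, the snippet below converts a three-character Japanese string from UTF-8 to EUC-JP. The variable names are only for illustration, and the string is written with \u escapes so that it is UTF-8 no matter how the source file is saved.

```ruby
utf8  = "\u65E5\u672C\u8A9E"      # three kanji, held as a UTF-8 string
eucjp = utf8.encode("EUC-JP")     # non-destructive: utf8 itself is left untouched

puts utf8.encoding     # UTF-8
puts eucjp.encoding    # EUC-JP

# The characters are the same...
puts utf8.length       # 3
puts eucjp.length      # 3

# ...but the bytes that represent them are not.
puts utf8.bytesize     # 9
puts eucjp.bytesize    # 6

# encode! does the same conversion in place.
copy = utf8.dup
copy.encode!("EUC-JP")
puts copy.encoding     # EUC-JP

# Converting back yields the original characters again.
puts eucjp.encode("UTF-8") == utf8   # true
```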

#force_encoding, in contrast, keeps the byte representation but changes which characters those bytes are interpreted as. After #force_encoding, the string is sometimes no longer valid as a character sequence.
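A small sketch of both effects, using a Latin word with one accented letter. The variable names are made up for this example; the first half shows the characters changing while the bytes stay put, and the second half shows a string that becomes invalid.

```ruby
utf8 = "caf\u00E9"   # "cafe" with an acute accent on the e: 4 characters, 5 bytes in UTF-8

# force_encoding modifies its receiver, so work on a copy.
relabeled = utf8.dup.force_encoding("ISO-8859-1")

puts relabeled.bytes == utf8.bytes   # true: not a single byte changed
puts utf8.length                     # 4
puts relabeled.length                # 5: the two bytes C3 A9 now count as two characters

# Relabeling can also produce a string that is not a valid character sequence.
latin1 = utf8.encode("ISO-8859-1")            # a real conversion: 4 characters, 4 bytes
broken = latin1.dup.force_encoding("UTF-8")   # the same 4 bytes with the wrong label

puts latin1.valid_encoding?   # true
puts broken.valid_encoding?   # false: a lone 0xE9 byte is not valid UTF-8
```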

In other words, #encode treats a String object as a character sequence but #force_encoding treats it as a byte sequence.

Abuse of force_encoding

force_encoding is overused. I think this is because of example code. The following code is quoted from the rdoc of Regexp#fixed_encoding?.

r = /\u{6666}/
r.fixed_encoding?                               #=> true
r.encoding                                      #=> #<Encoding:UTF-8>
r =~ "\u{6666} a"                               #=> 0
r =~ "\xa1\xa2".force_encoding("euc-jp")        #=> ArgumentError
r =~ "abc".force_encoding("euc-jp")             #=> nil

M17N example code needs to create strings in various encodings in order to show how M17N works, so this kind of code tends to use force_encoding a lot. In addition, example code sometimes cannot assume its source encoding. In particular, when the code is printed in a book, a string literal cannot have any encoding, so the author sometimes needs force_encoding.

Ordinary application code, on the other hand, can declare its correct source encoding with a magic comment. In most cases the code should be concerned with characters rather than their byte representation. Ruby 1.9 was designed so that you can write M17N applications with only #encode and magic comments. You don't need force_encoding.
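A rough sketch of what that looks like in application code, assuming the file is saved as UTF-8. The variable names and the choice of Shift_JIS as the target are only for illustration.

```ruby
# encoding: utf-8
# The magic comment above has to be the first line of the file (or the second,
# right after a shebang line). It tells Ruby how the literals below are encoded.

greeting = "こんにちは"

puts __ENCODING__        # UTF-8
puts greeting.encoding   # UTF-8
puts greeting.length     # 5

# Convert only at the boundary, for a consumer that expects another encoding.
for_legacy_system = greeting.encode("Shift_JIS")
puts for_legacy_system.bytesize   # 10

# encode raises an error instead of silently producing garbage when the
# target encoding cannot represent a character.
begin
  greeting.encode("ISO-8859-1")
rescue Encoding::UndefinedConversionError => e
  puts "cannot convert: #{e.class}"
end
```

Because the literal already carries the right encoding, there is nothing left for force_encoding to do here.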

force_encoding is for middleware authors: for example, net/http, Rack, Rails or a PostgreSQL adapter. This kind of library must accept a byte sequence from an external stream and reinterpret it as a string with an encoding, so it needs force_encoding. It is also responsible for handing library users a string with the correct encoding. If you need force_encoding in your application, that must be a bug in the middleware.
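To make that division of labour concrete, here is a minimal sketch of what the middleware side might do. label_external_bytes is a hypothetical helper invented for this example, not an API of net/http, Rack, Rails or any database adapter, and the "wire" data is simulated rather than read from a real socket.

```ruby
# Hypothetical middleware helper: attach the declared charset to raw bytes and
# refuse to hand an invalid string to the application.
def label_external_bytes(raw_bytes, charset_name)
  text = raw_bytes.dup.force_encoding(charset_name)   # bytes unchanged, label attached
  unless text.valid_encoding?
    raise ArgumentError, "bytes are not valid #{charset_name}"
  end
  text
end

# Simulate bytes arriving from an external stream: binary data with no
# character meaning yet. Here they happen to be three kanji in EUC-JP.
raw = "\u65E5\u672C\u8A9E".encode("EUC-JP").force_encoding("ASCII-8BIT")
puts raw.encoding == Encoding::BINARY    # true

# Middleware side: the only place that needs force_encoding.
text = label_external_bytes(raw, "EUC-JP")
puts text.encoding == Encoding::EUC_JP   # true
puts text.length                         # 3

# Application side: the string is already correctly labeled, so encode is enough.
utf8 = text.encode("UTF-8")
puts utf8 == "\u65E5\u672C\u8A9E"        # true

# Bytes that do not match the declared charset are rejected at the boundary.
begin
  label_external_bytes([0xFF, 0xFE].pack("C*"), "UTF-8")
rescue ArgumentError => e
  puts e.message   # bytes are not valid UTF-8
end
```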

Conclusion

What you need is:

  • magic comments,
  • String#encode.

You don't need to use String#force_encoding in your application.
