classification
Title: cannot decode from or encode to big5 \xf9\xd8
Type: behavior Stage:
Components: Unicode Versions: Python 2.6
process
Status: open Resolution:
Dependencies: Superseder:
Assigned To: hyeshik.chang Nosy List: Xuefer.x, hyeshik.chang, loewis, rpetrov
Priority: normal Keywords:

Created on 2010-02-05 05:07 by Xuefer.x, last changed 2010-02-05 21:54 by rpetrov.

Messages (6)
msg98865 - Author: Xuefer x (Xuefer.x) Date: 2010-02-05 05:07
using iconv:
$ printf "\xf9\xd8" | iconv -f big5 -t utf-8 | xxd
0000000: e8a3 8f                                  ...
$ printf "\xe8\xa3\x8f" | iconv -f utf-8 -t big5 | xxd
0000000: f9d8                                     ..

using python
>>> print "\xf9\xd8".decode("big5")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'big5' codec can't decode bytes in position 0-1: illegal multibyte sequence
>>> print "\xe8\xa3\x8f".decode("utf-8").encode("big5")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'big5' codec can't encode character u'\u88cf' in position 0: illegal multibyte sequence
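
The same check can be sketched for Python 3, where byte strings are written with a `b` prefix. This is only an illustration: the helper name `probe` and the list of codecs tried are assumptions, not part of the report, and whether a given codec accepts `\xf9\xd8` depends on the mapping tables shipped with the Python version in use.

```python
def probe(data, codecs):
    """Try to decode `data` with each codec name; None marks a failure."""
    results = {}
    for name in codecs:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            # The codec has no mapping for this byte sequence.
            results[name] = None
    return results


if __name__ == "__main__":
    # big5hkscs and cp950 are Big5 variants that ship with CPython.
    for name, text in probe(b"\xf9\xd8", ["big5", "big5hkscs", "cp950"]).items():
        if text is None:
            print(name, "-> undecodable")
        else:
            print(name, "->", " ".join("U+%04X" % ord(ch) for ch in text))
```

Comparing several Big5 variants side by side shows whether the failure is specific to the plain `big5` table.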
msg98866 - Author: Martin v. Löwis (loewis) Date: 2010-02-05 05:16
That iconv supports it is not convincing, IMO. Do you have other sources (like tables on the web somewhere) that support your request?
msg98867 - Author: Martin v. Löwis (loewis) Date: 2010-02-05 05:41
In particular, the Unicode consortium mapping table, now at

http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT

doesn't map f9d8 to anything; the current version of that table (in unihan.zip) has these mappings for U+88CF:

U+88CF	kCCCII	232E61
U+88CF	kCNS1986	E-444E
U+88CF	kCNS1992	3-444E
U+88CF	kEACC	215763
U+88CF	kGB1	3279
U+88CF	kHKSCS	F9D8
U+88CF	kJis0	4602
U+88CF	kKPS0	D9E0
U+88CF	kKSC0	5574
U+88CF	kTaiwanTelegraph	5937
U+88CF	kXerox	241:102

As you can see, there is no kBigFive entry for it, only kHKSCS, so it isn't supported in big5.
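
A minimal sketch of that lookup, assuming tab-separated Unihan-style lines in the format shown in the listing above. The function name `mappings_for` is hypothetical, and the sample lines are a subset copied from that listing rather than read from the real Unihan file.

```python
def mappings_for(lines, codepoint):
    """Collect {field: value} for one code point from Unihan-style lines."""
    found = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split("\t")
        if len(parts) != 3:
            continue  # not a "code point, field, value" record
        cp, field, value = parts
        if cp == codepoint:
            found[field] = value
    return found


# Sample records copied from the listing in this message.
SAMPLE = [
    "U+88CF\tkCNS1992\t3-444E",
    "U+88CF\tkGB1\t3279",
    "U+88CF\tkHKSCS\tF9D8",
    "U+88CF\tkJis0\t4602",
]

if __name__ == "__main__":
    fields = mappings_for(SAMPLE, "U+88CF")
    print("kHKSCS:", fields.get("kHKSCS"))
    print("has kBigFive:", "kBigFive" in fields)
```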
msg98868 - Author: Xuefer x (Xuefer.x) Date: 2010-02-05 06:05
Sure. Your URL, which is marked OBSOLETE, pointed me in the right direction
(see: http://www.unicode.org/Public/MAPPINGS/EASTASIA/ReadMe.txt).
I found http://unicode.org/charts/unihan.html,
then http://www.unicode.org/Public/UNIDATA/,
then http://www.unicode.org/Public/UNIDATA/Unihan.zip.
Inside the zip, open Unihan_OtherMappings.txt.
Big5 includes
#	kBigFive
#	kHKSCS
which are both listed in Unihan_OtherMappings.txt.
HKSCS is one of the Big5 encodings,
and searching for F9D8 I got
U+88CF	kHKSCS	F9D8

You may also want to update the other encoding map tables to catch up with Unihan_OtherMappings.txt.

Thanks for your quick reply, btw.
msg98869 - Author: Martin v. Löwis (loewis) Date: 2010-02-05 06:23
perky, what do you think?
msg98911 - Author: Roumen Petrov (rpetrov) Date: 2010-02-05 21:54
> That iconv supports it is not convincing, ...

GNU libc is not convincing? What are you talking about?
History
Date                 User      Action  Args
2010-02-05 21:54:39  rpetrov   set     nosy: + rpetrov; messages: + msg98911
2010-02-05 06:23:02  loewis    set     assignee: hyeshik.chang; messages: + msg98869; nosy: + hyeshik.chang
2010-02-05 06:05:10  Xuefer.x  set     messages: + msg98868
2010-02-05 05:41:48  loewis    set     messages: + msg98867
2010-02-05 05:16:15  loewis    set     nosy: + loewis; messages: + msg98866
2010-02-05 05:07:48  Xuefer.x  create