Issue7856
Created on 2010-02-05 05:07 by Xuefer.x, last changed 2010-02-05 21:54 by rpetrov.
|
msg98865 - (view) |
Author: Xuefer x (Xuefer.x) |
Date: 2010-02-05 05:07 |
|
Using iconv:
$ printf "\xf9\xd8" | iconv -f big5 -t utf-8 | xxd
0000000: e8a3 8f ...
$ printf "\xe8\xa3\x8f" | iconv -f utf-8 -t big5 | xxd
0000000: f9d8 ..
Using Python:
>>> print "\xf9\xd8".decode("big5")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'big5' codec can't decode bytes in position 0-1: illegal multibyte sequence
>>> print "\xe8\xa3\x8f".decode("utf-8").encode("big5")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'big5' codec can't encode character u'\u88cf' in position 0: illegal multibyte sequence
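For readers on Python 3, the same check can be sketched roughly as follows. The helper name `probe_codecs` and the list of codec names tried are illustrative choices, not part of the original report. Which Big5-family codecs accept the bytes depends on the mapping tables shipped with the interpreter, so the sketch only reports what it finds instead of assuming a result.

```python
def probe_codecs(data, codecs):
    """Try to decode `data` with each codec; map codec name -> text or None."""
    results = {}
    for name in codecs:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            # This codec has no mapping for the byte sequence.
            results[name] = None
    return results


if __name__ == "__main__":
    # The bytes from the report, tried against several Big5-family codecs.
    # The codec list is an assumption made for illustration.
    found = probe_codecs(b"\xf9\xd8", ["big5", "big5hkscs", "cp950"])
    for name, text in found.items():
        print(name, repr(text))
```

A `None` entry means that codec rejected the sequence, which is the failure shown in the traceback above.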
|
|
msg98866 - (view) |
Author: Martin v. Löwis (loewis) |
Date: 2010-02-05 05:16 |
|
That iconv supports it is not convincing, IMO. Do you have other sources (like tables on the web somewhere) that support your request?
|
|
msg98867 - (view) |
Author: Martin v. Löwis (loewis) |
Date: 2010-02-05 05:41 |
|
In particular, the Unicode consortium mapping table, now at
http://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/OTHER/BIG5.TXT
doesn't map f9d8 to anything; the current version of that table (in unihan.zip) has these mappings for U+88CF:
U+88CF kCCCII 232E61
U+88CF kCNS1986 E-444E
U+88CF kCNS1992 3-444E
U+88CF kEACC 215763
U+88CF kGB1 3279
U+88CF kHKSCS F9D8
U+88CF kJis0 4602
U+88CF kKPS0 D9E0
U+88CF kKSC0 5574
U+88CF kTaiwanTelegraph 5937
U+88CF kXerox 241:102
As you can see, there is no kBigFive entry, so it isn't supported in big5.
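The check on the listing above can be sketched in a few lines of Python. The function name `parse_mappings` is hypothetical, and the sketch assumes every data line has the whitespace-separated form shown in the listing (`U+XXXX field value`).

```python
# The listing quoted above, copied verbatim.
SAMPLE = """\
U+88CF kCCCII 232E61
U+88CF kCNS1986 E-444E
U+88CF kCNS1992 3-444E
U+88CF kEACC 215763
U+88CF kGB1 3279
U+88CF kHKSCS F9D8
U+88CF kJis0 4602
U+88CF kKPS0 D9E0
U+88CF kKSC0 5574
U+88CF kTaiwanTelegraph 5937
U+88CF kXerox 241:102
"""


def parse_mappings(text):
    """Return {codepoint: {field: value}} for lines like 'U+88CF kHKSCS F9D8'."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)
        if len(parts) != 3 or not parts[0].startswith("U+"):
            continue  # skip anything that is not a mapping line
        codepoint = int(parts[0][2:], 16)
        table.setdefault(codepoint, {})[parts[1]] = parts[2]
    return table


fields = parse_mappings(SAMPLE)[0x88CF]
print("kBigFive" in fields)   # is there a plain Big5 mapping?
print(fields.get("kHKSCS"))   # the HKSCS code, if any
```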
|
|
msg98868 - (view) |
Author: Xuefer x (Xuefer.x) |
Date: 2010-02-05 06:05 |
|
Sure. Your URL, which is marked OBSOLETE, pointed me in the right direction.
See: http://www.unicode.org/Public/MAPPINGS/EASTASIA/ReadMe.txt
From there I found http://unicode.org/charts/unihan.html
then http://www.unicode.org/Public/UNIDATA/
then http://www.unicode.org/Public/UNIDATA/Unihan.zip
Inside the zip, open Unihan_OtherMappings.txt.
The Big5-related fields listed in Unihan_OtherMappings.txt are:
# kBigFive
# kHKSCS
HKSCS is one of the Big5 encodings.
Searching for F9D8, I got:
U+88CF kHKSCS F9D8
You may also want to update the other encoding map tables to catch up with Unihan_OtherMappings.txt.
Thanks for your quick reply, by the way.
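The search described above can be reproduced with a short script. This is only a sketch: the function name `find_codepoints` is hypothetical, and it assumes Unihan.zip has already been downloaded and that mapping lines have the form `U+XXXX`, field name, value, separated by whitespace.

```python
import io
import zipfile


def find_codepoints(zip_source, field, value, member="Unihan_OtherMappings.txt"):
    """Return the code points whose `field` entry in `member` contains `value`.

    `zip_source` may be a path to the archive or an open binary file object.
    """
    matches = []
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member) as raw:
            for line in io.TextIOWrapper(raw, encoding="utf-8"):
                parts = line.split()
                # Comment lines start with '#', so parts[1] never equals a field name there.
                if len(parts) >= 3 and parts[1] == field and value in parts[2:]:
                    matches.append(parts[0])
    return matches
```

With a local copy of the archive, `find_codepoints("Unihan.zip", "kHKSCS", "F9D8")` should list U+88CF if the file still contains the line quoted above.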
|
|
msg98869 - (view) |
Author: Martin v. Löwis (loewis) |
Date: 2010-02-05 06:23 |
|
perky, what do you think?
|
|
msg98911 - (view) |
Author: Roumen Petrov (rpetrov) |
Date: 2010-02-05 21:54 |
|
> That iconv supports it is not convincing, ...
GNU libc is not convincing? What are you talking about?
|
|
| Date | User | Action | Args |
| 2010-02-05 21:54:39 | rpetrov | set | nosy: + rpetrov; messages: + msg98911 |
| 2010-02-05 06:23:02 | loewis | set | assignee: hyeshik.chang; messages: + msg98869; nosy: + hyeshik.chang |
| 2010-02-05 06:05:10 | Xuefer.x | set | messages: + msg98868 |
| 2010-02-05 05:41:48 | loewis | set | messages: + msg98867 |
| 2010-02-05 05:16:15 | loewis | set | nosy: + loewis; messages: + msg98866 |
| 2010-02-05 05:07:48 | Xuefer.x | create | |