如月中天: Notes on using encode('raw_unicode_escape') to handle unicode strings that are not really Unicode-encoded

2011-06-05

http://groups.google.com/group/python-cn/browse_thread/thread/a601a5b202e7c65e Scraping a Chinese web page with Python returns content like [u'\xbe\xaf\xcc\xe8\xba\xab\xba\xae……

>>> s = u'\xbe\xaf\xcc\xe8'
>>> s.encode('raw_unicode_escape')
'\xbe\xaf\xcc\xe8'
>>> s.encode('raw_unicode_escape').decode('gbk')
u'\u8b66\u60d5'
>>> print s.encode('raw_unicode_escape').decode('gbk')
警惕
(I ran the above on pys60.)
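
Why this works: for code points up to U+00FF, raw_unicode_escape writes each one out as a single byte with the same value, so the original GBK bytes come back untouched and can then be decoded properly. Below is a rough Python 3 sketch of the same trick, not something from the original thread. The helper name repair_mislabeled_text and its real_encoding parameter are made up for illustration, and the guard against code points above U+00FF is my own addition, since those would be written as literal \uXXXX escapes rather than recovered bytes.

```python
# Python 3 sketch of the session above.
# repair_mislabeled_text is a hypothetical helper name, not a library function.
def repair_mislabeled_text(text, real_encoding="gbk"):
    """Recover text whose raw bytes were stored one byte per code point."""
    # Assumption: every character is really a single byte in disguise.
    # A code point above U+00FF would be emitted as a literal \uXXXX escape.
    if any(ord(ch) > 0xFF for ch in text):
        raise ValueError("string contains code points above U+00FF")
    # For code points <= U+00FF this yields the same bytes as latin-1.
    raw = text.encode("raw_unicode_escape")
    return raw.decode(real_encoding)


garbled = "\xbe\xaf\xcc\xe8"
repaired = repair_mislabeled_text(garbled)
# repaired == '\u8b66\u60d5', the same two characters as in the session above
```

If the page was not actually GBK, pass the right codec name as real_encoding; guessing wrong will either raise UnicodeDecodeError or produce different garbage.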