Strings and Unicode

Date: 2020-08-06 12:25:20

#coding=utf-8
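"""
In Python 3, the text string type (which stores Unicode data) is named str, and the byte string type is named bytes.
In Python 2, the type that Python 3 calls str is named unicode, and the type that Python 3 calls bytes is named str.
This means that str is a text string in Python 3 but a byte string in Python 2.

Unlike Python 3, Python 2 attempts implicit conversion between text strings and byte strings. It works like this:
when the interpreter meets an operation that mixes the two kinds of string, it first converts the byte string to a
text string and then operates on the text strings. The implicit conversion uses the default encoding
(normally 'ascii' in Python 2), which can be checked as follows:
import sys
print(sys.getdefaultencoding())

"""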
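test_str = u'\u03b1 is for alpha'

# Encoding the text string once yields a byte string (str in Python 2).
print test_str.encode('utf-8')

# Encoding that byte string again triggers an implicit decode first, which fails.
print test_str.encode('utf-8').encode('utf-8')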

# The implicit conversion in Python 2 fails with the following error:
"""
Traceback (most recent call last):
  File "D:/code/test/???????unicode.py", line 18, in <module>
    print test_str.encode('utf-8').encode('utf-8')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xce in position 0: ordinal not in range(128)
"""

# To the interpreter, the failing double-encode line above is equivalent to:
print test_str.encode('utf-8').decode('ascii').encode('utf-8')

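"""
If you are using Python 2.6 or later, you can use from __future__ import unicode_literals.
This is an import, not a function call: once it appears at the top of a module, every string literal
in that module that has no prefix is treated as unicode.

"""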
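
For comparison, here is a minimal sketch of how the same operations behave under Python 3, where nothing is converted implicitly. It is not part of the original script: it assumes a Python 3 interpreter, and the names `text`, `data`, `mixed_error`, `has_encode` and `decode_error` are illustrative only.

```python
# Python 3 sketch (assumption: run with a Python 3 interpreter, not Python 2).
text = '\u03b1 is for alpha'       # str: Unicode text
data = text.encode('utf-8')        # bytes: the UTF-8 encoded form

assert isinstance(text, str)
assert isinstance(data, bytes)
assert data.decode('utf-8') == text   # an explicit round trip works

# Mixing the two types is a TypeError; there is no silent conversion.
try:
    text + data
    mixed_error = None
except TypeError as exc:
    mixed_error = exc

# bytes has no encode() method, so a double encode cannot even be attempted.
has_encode = hasattr(data, 'encode')

# The decode step that Python 2 performed implicitly, written out explicitly.
try:
    data.decode('ascii')
    decode_error = None
except UnicodeDecodeError as exc:
    decode_error = exc

print(type(mixed_error).__name__, has_encode)
print(decode_error)
```

The explicit `decode('ascii')` fails on the same byte, 0xce, that the Python 2 traceback reports, because the UTF-8 encoding of U+03B1 starts with that byte.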


Original: https://www.cnblogs.com/xwyjh/p/13444866.html
