Python Strings Explained

Posted: 2020-04-18 16:42:07

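Strings
Defining and initializing strings
Accessing string elements - indexing
Concatenating strings with +
Concatenating strings with join
Splitting strings
partition (important)
String case
String layout
Modifying strings (important)
Searching strings (important)
Searching strings
String predicates (important)
String predicates: the is* family
String formatting (important)
String exercises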

Strings

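An ordered sequence made up of individual characters, i.e. a collection of characters
A sequence of characters enclosed in single, double, or triple quotes
Strings are immutable objects
Since Python 3, str is a Unicode type: a string is a sequence of Unicode code points. UTF-8 is an encoding used when converting to bytes (the default for str.encode()), not the string type itself.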

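Every "change" we make to a string actually produces a new string and hands that back to you. That is what being an immutable object means.
Tuples are immutable objects, and strings are the second immutable type we have met.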
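Here is a minimal sketch of what immutability means in practice (the variable names are only illustrative). A method such as upper() returns a new string and leaves the original alone, and assigning to an index raises TypeError.

```python
s = "magedu"
t = s.upper()          # upper() returns a brand-new string
print(s)               # magedu  (the original is unchanged)
print(t)               # MAGEDU

try:
    s[0] = "M"         # item assignment is not supported on str
except TypeError as e:
    print("TypeError:", e)

# To "change" a string, build a new one and rebind the name to it
s = "M" + s[1:]
print(s)               # Magedu
```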

Defining and initializing strings

Examples:

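s1 = 'string'
s2 = 'string2'
s3 = '''this's a "string"'''
s4 = 'hello \n magedu.com'
s5 = r'hello \n magedu.com' # with the r prefix the text is kept as-is: every character keeps its literal meaning
s6 = 'c:\windows\nt' # pitfall: \n here becomes a newline (and \w is an invalid escape that newer Pythons warn about)
s7 = R'c:\windows\nt' # raw strings are handy for directory paths
s8 = 'c:\\windows\\nt' # escape each backslash by doubling it
sql = """select * from user where name='tom'""" # triple quotes make SQL statements with quotes easy to write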

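"A string is a combination of characters"
is not quite accurate in Python.
It is more accurate to say that a string is a combination of strings,
because Python has no separate character type: check a single character with type() and you get str.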
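You can check this claim quickly. Indexing a string yields another str of length 1, and that one-character string can itself be indexed. The sample string is arbitrary.

```python
s = "python"
c = s[0]
print(type(c))               # <class 'str'>
print(len(c))                # 1
print(c[0][0][0])            # p  (a 1-character string indexes to itself)
print(type(c) is type(s))    # True: there is no separate char type
```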

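CPython and PyPy may implement garbage collection differently (CPython is built on reference counting, while PyPy uses a tracing collector),
since the two implementations make different performance trade-offs.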

Accessing string elements - indexing

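Strings support access by index
- sql = "select * from user where name = 'tom'"
- sql[4] # the string 'c'
- sql[4] = 'o' # not allowed: raises TypeError
An ordered collection of characters, i.e. a character sequence
for c in sql:
    print(c)
    print(type(c))
Iterable
- lst = list(sql)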

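list and tuple both have factory functions that take a single iterable argument, e.g. list(iterable).
Strings are iterable objects too, so they can be passed to these functions.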
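The sketch below puts the indexing and iteration rules above into runnable form, reusing the sql string from the text. It shows a positive and a negative index, a for loop over the characters, and the list() and tuple() factory functions.

```python
sql = "select * from user where name = 'tom'"

print(sql[4])      # c  (indexes start at 0)
print(sql[-2])     # m  (negative indexes count from the end)

# A string is iterable; every item is a one-character str
for ch in sql[:6]:
    print(ch, end=" ")
print()            # the loop printed: s e l e c t

lst = list(sql[:6])    # ['s', 'e', 'l', 'e', 'c', 't']
tup = tuple("abc")     # ('a', 'b', 'c')
print(lst, tup)
```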

Concatenating strings with +

+ -> str

Joins two strings together
Returns a new string
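Here is a short sketch of + concatenation, using a made-up host and port. Every operand has to be a str, so the number is converted with str() first. Leaving it as an int raises TypeError.

```python
host = "192.168.1.100"
port = 8888

addr = host + ":" + str(port)   # every operand must be a str
print(addr)                     # 192.168.1.100:8888

try:
    host + ":" + port           # mixing str and int is an error
except TypeError as e:
    print("TypeError:", e)
```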

Concatenating strings with join

"string".join(iterable) -> str

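Joins the elements of the iterable, using the string as the separator
The elements of the iterable must themselves be strings
Returns a new string
lst = ['1', '2', '3']
print("\"".join(lst)) # the separator is a double quote, escaped with a backslash: 1"2"3
print(" ".join(lst)) # 1 2 3
print("\n".join(lst)) # one element per line
lst = ["1", ['a', 'b'], '3']
print(" ".join(lst)) # lst contains a list, not a str: raises TypeError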

When using join, every element of the iterable must be a str. Any other type (a nested list, an int, ...) raises TypeError.
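One common way around this restriction is to convert each element to a string before joining, for example with map(str, ...) or a generator expression. The sketch below uses made-up lists.

```python
nums = [1, 2, 3]

try:
    ",".join(nums)                   # ints are not str: TypeError
except TypeError as e:
    print("TypeError:", e)

joined = ",".join(map(str, nums))    # convert every element first
print(joined)                        # 1,2,3

# Nested lists can be converted explicitly with str() as well
mixed = ["1", ["a", "b"], "3"]
print(" ".join(str(x) for x in mixed))   # 1 ['a', 'b'] 3
```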

Ordered sequences all support * for repetition:
s1 = 'abc'
s1 * 4 # -> 'abcabcabcabc'

Splitting strings

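String splitting methods fall into 2 families

split family
- splits the string at the separator into several substrings and returns a list
partition family
- splits the string at the separator into 2 parts and returns a tuple of those 2 parts plus the separator

String processing is very important, e.g. when processing logs.
split and partition are both used a lot.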

split(sep=None,maxsplit=-1) -> list of strings

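Works from left to right
sep specifies the separator string; by default, runs of whitespace act as the separator
maxsplit specifies the maximum number of splits; -1 means no limit (the whole string is split)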

s1 = "I'm \ta super student."
s1.split() # ["I'm", 'a', 'super', 'student.']
s1.split('s') # ["I'm \ta ", 'uper ', 'tudent.']
s1.split('super') # ["I'm \ta ", ' student.']
s1.split('super ') # ["I'm \ta ", 'student.']
s1.split(' ') # ["I'm", '\ta', 'super', 'student.']
s1.split(' ', maxsplit=2) # ["I'm", '\ta', 'super student.']
s1.split('\t', maxsplit=2) # ["I'm ", 'a super student.']
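Log processing is named above as the typical use case, so here is a small sketch that splits a hypothetical space-separated log line (the format is invented for illustration). maxsplit=3 stops after the first three fields, so the free-text message stays in one piece.

```python
# Hypothetical log line: date, time, level, then a free-text message
line = "2020-04-18 16:42:07 ERROR disk /dev/sda1 is almost full"

date, clock, level, message = line.split(maxsplit=3)
print(date)      # 2020-04-18
print(level)     # ERROR
print(message)   # disk /dev/sda1 is almost full

# Without maxsplit the message would be broken into single words
print(len(line.split()))   # 8
```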

rsplit(sep=None,maxsplit=-1) -> list of strings

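Works from right to left
sep specifies the separator string; by default, runs of whitespace act as the separator
maxsplit specifies the maximum number of splits; -1 means no limit (the whole string is split)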

s1 = "I'm \ta super student."
s1.rsplit() # ["I'm", 'a', 'super', 'student.']
s1.rsplit('s') # ["I'm \ta ", 'uper ', 'tudent.']
s1.rsplit('super') # ["I'm \ta ", ' student.']
s1.rsplit('super ') # ["I'm \ta ", 'student.']
s1.rsplit(' ') # ["I'm", '\ta', 'super', 'student.']
s1.rsplit(' ', maxsplit=2) # ["I'm \ta", 'super', 'student.']
s1.rsplit('\t', maxsplit=2) # ["I'm ", 'a super student.']

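This approach is the best fit for taking the file name out of a path:
start from the right and split on the slash (or the backslash on Windows), e.g. with maxsplit=1.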
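Here is a minimal sketch of that idea, using made-up paths. rsplit with maxsplit=1 cuts only at the last separator, which gives the directory and the file name. split with the same limit cuts at the first separator instead.

```python
path = "/var/log/nginx/access.log"

directory, filename = path.rsplit("/", maxsplit=1)
print(directory)   # /var/log/nginx
print(filename)    # access.log

print(path.split("/", maxsplit=1))   # ['', 'var/log/nginx/access.log']

# A Windows path is split on the backslash instead
win = r"c:\windows\nt\notepad.exe"
print(win.rsplit("\\", 1)[-1])       # notepad.exe
```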

splitlines([keepends]) -> list of strings

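Splits the string into lines
keepends controls whether the line separators are kept in the result
Line separators include \n, \r\n, \r, and others

'ab c\n\nde fg\rkl\r\n'.splitlines() # ['ab c', '', 'de fg', 'kl']
'ab c\n\nde fg\rkl\r\n'.splitlines(True) # ['ab c\n', '\n', 'de fg\r', 'kl\r\n']
s1 = """I'm a super student.
You're a super teacher."""
print(s1)
print(s1.splitlines()) # ["I'm a super student.", "You're a super teacher."]
print(s1.splitlines(True)) # ["I'm a super student.\n", "You're a super teacher."]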

partition (important)

partition(sep) -> (head,sep,tail)

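Scans from left to right. At the first occurrence of the separator, splits the string into two parts and returns a 3-tuple of head, separator, and tail.
If the separator is not found, returns a 3-tuple of the whole string (as the head) and 2 empty strings.
sep is the separator string and must be given.

s1 = "I'm a super student."
s1.partition('s') # ("I'm a ", 's', 'uper student.')
s1.partition('stu') # ("I'm a super ", 'stu', 'dent.')
s1.partition('') # an empty separator raises ValueError
s1.partition('abc') # ("I'm a super student.", '', '')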
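Because partition always returns exactly three items, its result can be unpacked safely even when the separator is missing, which split cannot guarantee. Here is a sketch that parses made-up key=value lines into a dict:

```python
config = {}
for line in ["host=127.0.0.1", "port=3306", "debug"]:
    key, sep, value = line.partition("=")   # always exactly 3 items
    config[key] = value if sep else None    # sep is '' when '=' is missing
print(config)   # {'host': '127.0.0.1', 'port': '3306', 'debug': None}

# Unpacking split the same way would fail on "debug":
# key, value = "debug".split("=")  ->  ValueError
```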

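rpartition(sep) -> (head, sep, tail)
- scans from right to left; at the first separator found, splits the string into two parts
- returns a 3-tuple of head, separator, and tail
- if the separator is not found, returns a 3-tuple of 2 empty strings and the tail (the whole string)

rpartition is a great fit for extracting the file name from a path.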
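Here is a minimal sketch of the rpartition approach. file_name is a hypothetical helper, not a library function, and the paths are made up. When there is no /, the whole string ends up in the tail, so taking the tail still gives the file name.

```python
def file_name(path):
    # Split at the LAST '/'; returns ('', '', path) if there is none
    head, sep, tail = path.rpartition("/")
    return tail

print(file_name("/etc/nginx/nginx.conf"))   # nginx.conf
print(file_name("readme.txt"))              # readme.txt
print("readme.txt".rpartition("/"))         # ('', '', 'readme.txt')
```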

String case

upper() - all uppercase
lower() - all lowercase
Case conversion is mostly used when making comparisons
swapcase() - swaps the case of every letter
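A typical use of case conversion in a check is to normalize the input with lower() before comparing, so that YES, Yes, and yes are all accepted. The sketch below uses a hypothetical is_yes helper.

```python
def is_yes(answer):
    # Normalize case (and surrounding spaces) before comparing
    return answer.strip().lower() in ("y", "yes")

print(is_yes("YES"))              # True
print(is_yes(" Yes "))            # True
print(is_yes("no"))               # False
print("Hello World".swapcase())   # hELLO wORLD
```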

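String layout

title()
- capitalizes the first letter of every word
capitalize()
- capitalizes the first letter of the string (the rest becomes lowercase)
center(width[, fillchar])
- width: the output width
- fillchar: the fill character
zfill(width)
- width: the output width; the text is right-aligned and padded with 0 on the left
ljust(width[, fillchar])
- left-aligned
rjust(width[, fillchar])
- right-aligned
These are rarely used with Chinese text; just be aware of them.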
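Here is a quick sketch of what each layout method produces. The sample strings are arbitrary.

```python
print("hello world".title())        # Hello World
print("hello WORLD".capitalize())   # Hello world
print("ab".center(6, "*"))          # **ab**
print("42".zfill(5))                # 00042
print("ab".ljust(5, "-"))           # ab---
print("ab".rjust(5, "-"))           # ---ab
```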

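Modifying strings (important)

replace(old, new[, count]) -> str
- finds the occurrences of old in the string, replaces them with new, and returns a new string
- count is the maximum number of replacements; if omitted, all occurrences are replaced

'www.magedu.com'.replace('w', 'p') # 'ppp.magedu.com'
'www.magedu.com'.replace('w', 'p', 2) # 'ppw.magedu.com'
'www.magedu.com'.replace('w', 'p', 3) # 'ppp.magedu.com'
'www.magedu.com'.replace('ww', 'p', 2) # 'pw.magedu.com'
'www.magedu.com'.replace('www', 'python', 2) # 'python.magedu.com'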

strip([chars])

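Removes from both ends of the string every character that appears in the character set chars
If chars is not given, removes whitespace from both ends

s = "\r\n\t Hello Python \n \t"
s.strip() # 'Hello Python'
s = "I am very very very sorry"
s.strip('ly') # 'I am very very very sorr'
s.strip('ly ') # 'I am very very very sorr'

lstrip([chars]) - strips from the left end only
rstrip([chars]) - strips from the right end only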
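The key point is that chars is treated as a set of characters, not as a prefix or suffix. Stripping continues until a character outside the set is reached. Here is a sketch with made-up strings, including a case where rstrip removes more than expected:

```python
url = "www.example.com"
print(url.strip("w.moc"))           # example  (w . m o c are removed from both ends)
print(url.lstrip("w."))             # example.com
print(url.rstrip(".com"))           # www.example
print("telecom.com".rstrip(".com")) # tele  (not 'telecom': chars is a set, not a suffix)
print(repr("  hi  ".lstrip()))      # 'hi  '
```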

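Searching strings (important)

Whenever you search, think about efficiency.

find(sub[, start[, end]]) -> int

Searches the range [start, end) from left to right for the substring sub. Returns its index if found, otherwise -1.

rfind(sub[, start[, end]]) -> int

Searches the range [start, end) from right to left for the substring sub. Returns its index if found, otherwise -1.

s = "I'm very very very sorry"
s.find('very') # 4
s.find('very', 5) # 9
s.find('very', 6, 13) # 9
s.find('very', 10) # 14
s.find('very', 10, 15) # -1
s.find('very', 10, -1) # 14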
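Because find accepts a start position, it can be called in a loop to collect every occurrence. Here is a minimal sketch; find_all is a hypothetical helper name, not part of str.

```python
def find_all(s, sub):
    """Return the start indexes of all non-overlapping occurrences of sub in s."""
    if not sub:
        raise ValueError("sub must not be empty")   # '' would match everywhere
    positions = []
    pos = s.find(sub)
    while pos != -1:
        positions.append(pos)
        pos = s.find(sub, pos + len(sub))   # resume searching after this match
    return positions

print(find_all("I'm very very very sorry", "very"))    # [4, 9, 14]
print(find_all("I'm very very very sorry", "happy"))   # []
```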

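index(sub[, start[, end]]) -> int

Searches the range [start, end) from left to right for the substring sub. Returns its index if found, otherwise raises ValueError.

rindex(sub[, start[, end]]) -> int

Searches the range [start, end) from right to left for the substring sub. Returns its index if found, otherwise raises ValueError.

s = "I am very very very sorry"
s.index('very') # 5
s.index('very', 5) # 5
s.index('very', 6, 13) # ValueError: substring not found
s.index('very', 10) # 10
s.index('very', 10, 15) # 10
s.index('very', -10, -1) # 15

Between index and find, prefer find, because index raises an exception.
An unhandled exception crashes the program,
and such exceptions are sometimes hard to catch in testing.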
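The sketch below contrasts the two on a substring that does not occur. Note that -1 is also a valid index in Python (it means the last character), so compare the result of find with -1 before using it.

```python
s = "I am very very very sorry"

pos = s.find("happy")
if pos == -1:                 # always check: -1 is also a valid index!
    print("not found")
else:
    print(s[pos:])

try:
    pos = s.index("happy")    # index raises instead of returning -1
except ValueError as e:
    print("ValueError:", e)   # ValueError: substring not found
```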

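Searching strings

Time complexity
- index and count are both O(n)
- they get slower as the string grows
len(string)
- returns the length of the string, i.e. the number of characters

count(sub[, start[, end]]) -> int

Counts the non-overlapping occurrences of the substring sub in the range [start, end), scanning from left to right

s = "I am very very very sorry"
s.count('very') # 3
s.count('very', 5) # 3
s.count('very', 10, 14) # 1

To count every word of a fixed text (say, a whole novel), use a dictionary. A single pass fills in all the counts, whereas count() rescans the whole text for each word.
If the text keeps arriving continuously, as a log does, use count.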
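Here is a minimal sketch of the dictionary approach on a made-up sentence. One pass over the words fills in every count, while count() would rescan the whole text for each word. Real text would also need punctuation and case handling, which is left out here.

```python
text = "the quick brown fox jumps over the lazy dog the end"

counts = {}
for word in text.split():                  # one O(n) pass over the text
    counts[word] = counts.get(word, 0) + 1

print(counts["the"])                       # 3
print(counts["fox"])                       # 1
print(text.count("the"))                   # 3, but every call rescans the text
```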

String predicates (important)

endswith(suffix[, start[, end]]) -> bool
- whether the string, within the range [start, end), ends with suffix
startswith(prefix[, start[, end]]) -> bool
- whether the string, within the range [start, end), starts with prefix

s = "I am very very very sorry"
s.startswith('very') # False
s.startswith('very', 5) # True
s.startswith('very', 5, 9) # True
s.endswith('very', 5, 9) # True
s.endswith('sorry', 5) # True
s.endswith('sorry', 5, -1) # False
s.endswith('sorry', 5, 100) # True
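startswith and endswith also accept a tuple of candidates and return True if any one of them matches. Here is a sketch that filters made-up file names by extension, lowercasing first so the check is case-insensitive:

```python
files = ["a.jpg", "b.txt", "c.PNG", "d.png"]

images = [f for f in files if f.lower().endswith((".jpg", ".png"))]
print(images)    # ['a.jpg', 'c.PNG', 'd.png']
```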

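String predicates: the is* family

isalnum() -> bool: whether it consists only of letters and digits
isalpha(): whether it consists only of letters # very useful
isdecimal(): whether it contains only decimal digits
isdigit(): whether all characters are digits (0-9, plus other Unicode digits such as superscripts)
isidentifier(): whether it starts with a letter or underscore and the rest are letters, digits, or underscores # very useful
islower(): whether all letters are lowercase
isupper(): whether all letters are uppercase
isspace(): whether it contains only whitespace

Once you know regular expressions, you will gradually stop using these much.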
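Here are a few sample calls (the inputs are arbitrary). They show where these methods differ, in particular isdigit versus isdecimal on a superscript digit, and how they treat the empty string:

```python
print("abc123".isalnum())     # True
print("abc".isalpha())        # True
print("123".isdecimal())      # True
print("\u00b2".isdigit())     # True  (superscript two counts as a digit)
print("\u00b2".isdecimal())   # False (but not as a decimal digit)
print("_var1".isidentifier()) # True
print("1var".isidentifier())  # False
print(" \t\n".isspace())      # True
print("".isalpha())           # False (empty strings are never "all letters")
```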

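String formatting (important)

String formatting is a way to build a string in a given output layout. It is more flexible and convenient than:
- join, which can only insert a separator and requires an iterable
- +, which is convenient enough, but every non-string value must first be converted to a string
Up to Python 2.5, printf-style formatting was the only option (str.format arrived in Python 2.6)
- printf-style formatting comes from the printf function of C
- format rules
  - placeholders: % followed by a format character, e.g. %s or %d
    - s calls str(), r calls repr()
    - every object can be converted by both of these
  - modifiers can be inserted into a placeholder, e.g. %03d prints 3 positions, padded with leading zeros
  - format % values: the format string and the values are separated by %
  - values must be a single object, a tuple with as many items as there are placeholders, or a dict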

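With +, every non-string operand has to be converted with str() by hand first, which is another limitation of that approach.

"I am %03d" % (20,) # I am 020
"I like %s" % 'Python' # I like Python
"I am %s%%" % 20 # I am 20%
"%03.2f%%,0x%x,0X%02X" % (89.7654, 10, 15) # 89.77%,0xa,0X0F: the width 3 is smaller than the length of '89.77', so there is no padding; a larger width would pad with zeros
"I am %-5d" % (20,) # numbers are right-aligned by default; - left-aligns: 'I am 20   '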

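Here are two more printf-style cases from the rules above, sketched with made-up values. The first uses a dict as the value, so the placeholders name its keys, as in %(name)s. The second compares %r with %s. A tuple that should be printed as a single value has to be wrapped in a one-element tuple.

```python
info = {"name": "tom", "age": 20}
print("%(name)s is %(age)d years old" % info)   # tom is 20 years old

print("%s vs %r" % ("abc", "abc"))             # abc vs 'abc'
print("%s" % ((1, 2),))                        # (1, 2)  (tuple value wrapped in a 1-tuple)
print("[%5.1f]" % 3.14159)                     # [  3.1]
```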
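format-method string formatting syntax (the style Python encourages)
- "{}{xxx}".format(*args, **kwargs) -> str
- args are the positional arguments, a tuple
- kwargs are the keyword arguments, a dict
- braces mark placeholders
- {} matches the positional arguments in order; {n} takes the positional argument at index n
- {xxx} looks up the keyword argument with the same name
- {{}} prints literal braces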

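Positional arguments

# Replace the placeholders of the format string with the positional arguments, in order
"{}:{}".format('192.168.1.100', 8888) # '192.168.1.100:8888'

Keyword (named) arguments

# Positional arguments match by index, keyword arguments match by name
"{server}{1}:{0}".format(8888, '192.168.1.100', server='Web Server Info: ') # 'Web Server Info: 192.168.1.100:8888'

Accessing elements

"{0[0]}.{0[1]}".format(('magedu', 'com')) # 'magedu.com'

Accessing object attributes

from collections import namedtuple
Point = namedtuple('Point', 'x y')
p = Point(4, 5)
"{{{0.x},{0.y}}}".format(p) # {4,5}

String concatenation creates a new string, and so does format.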

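Alignment

'{0}*{1}={2:<2}'.format(3, 2, 2*3) # '3*2=6 ': left-aligned in a field of width 2
'{0}*{1}={2:<02}'.format(3, 2, 2*3) # '3*2=60': the 0 becomes the fill character, padding on the right
'{0}*{1}={2:>02}'.format(3, 2, 2*3) # '3*2=06': right-aligned (the default for numbers), padded with 0
'{:^30}'.format("centered") # centered in a field of width 30
'{:*^30}'.format("centered") # centered, padded with *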
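The {0}*{1}={2:<2} pattern above is exactly what a multiplication table needs. Here is a minimal sketch that left-aligns each product in a 2-character field so the columns line up:

```python
lines = []
for i in range(1, 10):
    # One cell per j <= i; each product is left-aligned in width 2
    cells = ["{0}*{1}={2:<2}".format(j, i, i * j) for j in range(1, i + 1)]
    lines.append(" ".join(cells))
print("\n".join(lines))
```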

Number bases

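```python
"int:{0:d};hex:{0:x};oct:{0:o};bin:{0:b}".format(42) # 'int:42;hex:2a;oct:52;bin:101010'
"int:{0:d};hex:{0:#x};oct:{0:#o};bin:{0:#b}".format(42) # 'int:42;hex:0x2a;oct:0o52;bin:0b101010'
octets = [192, 168, 0, 1] # an IPv4 address in dotted-decimal notation
'{:02X}{:02X}{:02X}{:02X}'.format(*octets) # 'C0A80001'; *octets unpacks the list into separate arguments
```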

In Python, prefer the format method for formatting strings.

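String exercises

A user enters a number:
- determine how many digits it has
- print each digit and how many times it repeats, in the order ones, tens, hundreds, thousands, ten-thousands, ...
Enter 5 numbers, print the number of digits of each, then print the five numbers sorted in ascending order

num = input('>>>> ') # assumes a non-negative integer
print('This is a {}-digit number'.format(len(num)))

int_num = int(num)

count = [0] * 10 # count[d] = how many times digit d has appeared so far
for i in range(len(num)):
    a = int_num % 10 # the remainder is the current lowest digit
    count[a] += 1 # index by the digit itself; count[a-1] would send digit 0 to count[-1]
    print('Digit #{0} (from the ones place) is {1}; {1} has appeared {2} time(s) so far'.format(i + 1, a, count[a]))
    int_num = int_num // 10 # drop the lowest digit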
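The post does not include a solution for the second exercise, so here is a minimal sketch. It assumes non-negative integers and uses a hard-coded sample list instead of input() so it runs as-is; digit_report and describe are hypothetical helper names. digit_report also shows a string-based variant of the first exercise that uses count() instead of arithmetic.

```python
def digit_report(num):
    """Map each digit to its count, visiting digits from the ones place up."""
    counts = {}
    for d in reversed(num):           # ones, tens, hundreds, ...
        if d not in counts:
            counts[d] = num.count(d)  # count() scans the whole digit string
    return counts


def describe(numbers):
    """Print the digit count of each number and return them sorted ascending."""
    for n in numbers:
        print("{} has {} digit(s)".format(n, len(str(n))))
    return sorted(numbers)


for d, c in digit_report("1223").items():
    print("{} appears {} time(s)".format(d, c))

# Sample data; with real input use e.g. [int(input('>>> ')) for _ in range(5)]
print(describe([305, 7, 42, 1000, 99]))   # [7, 42, 99, 305, 1000]
```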

Original: https://www.cnblogs.com/gnuzsx/p/12726493.html
