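Regular Expressions in Python
The re module: match, search, findall, finditer, sub, compile
What regular expressions are for: 1. finding content, 2. replacing content (the content is a string, i.e. text)
The match and search methods return a Match object when the pattern matches, and None when it does not
The difference between match and search: match only matches at the beginning of the string, while search can match anywhere in it
To pull the required string out of a piece of text, you have to know how to write the pattern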
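To make the difference between match and search concrete, here is a minimal sketch. The sample string and the variable names are made up for illustration and do not come from the original examples.

```python
import re

line = "abc123"  # hypothetical sample string

# re.match only tries the pattern at the very start of the string.
at_start = re.match(r"abc", line)      # matches "abc"
not_at_start = re.match(r"123", line)  # None: "123" is not at the start

# re.search scans forward and returns the first match anywhere.
anywhere = re.search(r"123", line)     # matches "123" at positions 3 to 6

print(at_start)
print(not_at_start)
print(anywhere)
```

Writing `re.search(r"^123", line)` would behave like the failing match call here, because `^` anchors the pattern to the start of the string.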
```python
import re

# ^ anchors the pattern to the start of the string.
line = "huang123"
pattern = r'^hua'
match_res = re.search(pattern, line)
print(match_res)  # matches 'hua'

# With re.S (DOTALL), '.' also matches a newline.
line = 'h\n'
pattern = r'^h.'
match_res = re.search(pattern, line, re.S)
print(match_res)  # matches 'h\n'

# \d*? is lazy: it takes as few digits as possible, here none.
line = "h122222eeee123"
pattern = r'^h\d*?'
match_res = re.search(pattern, line, re.S)
print(match_res)  # matches only 'h'

# $ anchors the pattern to the end of the string.
pattern = r'.*3$'
match_res = re.search(pattern, line)
print(match_res)  # matches the whole string

# A single '.' stands for exactly one character.
line = 'hkd3'
pattern = r'^h.3$'
match_res = re.search(pattern, line)
print(match_res)  # None: there are two characters between 'h' and '3'

# '.*' stands for any number of characters.
pattern = r'^h.*3$'
match_res = re.search(pattern, line)
print(match_res)  # matches 'hkd3'
```
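```python
import re

# Groups are numbered by the position of their opening parenthesis.
line = 'ahuuhhaaahang123'
pattern = r'h(.*?h(.*))(h.*)h'
res = re.search(pattern, line)
print(res.group(1))       # 'uuh': lazy '.*?' up to the next 'h', plus group 2
print('2', res.group(2))  # '': the greedy '.*' has to give everything back
print('3', res.group(3))  # 'haaa'
```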
```python
import re

# '?' makes the preceding character optional (zero or one occurrence).
line = 'ahuuhhaaahhhhang123'
pattern = r'h?h'
res = re.search(pattern, line)
print(res)  # the first match is the single 'h' at index 1
```
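```python
import re

# [0-9] matches one digit and \d+ needs at least one more digit,
# so the pattern requires two or more digits in a row.
line = 'abc4'
pattern = r"[0-9]\d+"
res = re.search(pattern, line)
print(res)  # None: 'abc4' contains only one digit
```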
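The examples above print either a Match object or None. In real code the result is usually checked before it is used, since calling `.group()` on None raises an AttributeError. The following is a small sketch of that check; the helper name `describe` is hypothetical.

```python
import re

def describe(pattern, text):
    """Return (matched text, span) for the first match, or None."""
    res = re.search(pattern, text)
    if res is None:  # no match anywhere in text
        return None
    # group() is the matched substring, span() is (start, end).
    return res.group(), res.span()

print(describe(r"\d+", "huang123"))   # ('123', (5, 8))
print(describe(r"[0-9]\d+", "abc4"))  # None
```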
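Using findall: it finds every non-overlapping match and returns them as a list of strings (a list of tuples when the pattern has more than one capture group)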
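```python
import re

line = 'aaaabbbbccccaaaa'
pattern = r'.{4}'  # any four characters

# search stops at the first match.
res = re.search(pattern, line)
print(res)  # matches 'aaaa'

# findall returns every non-overlapping match.
res = re.findall(pattern, line)
print(res)  # ['aaaa', 'bbbb', 'cccc', 'aaaa']
```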
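What findall returns depends on how many capture groups the pattern has. The sketch below shows the three cases on a made-up sample string.

```python
import re

line = "a1 b2 c3"  # hypothetical sample string

# No group: a list of the whole matches.
no_group = re.findall(r"[a-z]\d", line)

# One group: a list of what that group captured.
one_group = re.findall(r"([a-z])\d", line)

# Several groups: a list of tuples, one tuple per match.
two_groups = re.findall(r"([a-z])(\d)", line)

print(no_group)    # ['a1', 'b2', 'c3']
print(one_group)   # ['a', 'b', 'c']
print(two_groups)  # [('a', '1'), ('b', '2'), ('c', '3')]
```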
Regular expression exercises
```python
import re

# text = "He was carefully disguised but captured quickly by police."
# pattern = '([a-zA-Z]*ly)[^a-zA-Z]'
# res = re.findall(pattern, text)
#
# print(res)

# One phone-book entry per line: "name: phone address".
text = '''Ross McFluff: 834.345.1254 155 Elm Street
Ronald Heathmore: 892.345.3428 436 Finley Avenue
Frank Burger: 925.541.7625 662 South Dogwood Way
Heather Albrecht: 548.326.4584 919 Park Place'''

# Three groups: name, phone number, address.
# '.' does not match a newline, so each match stays on one line.
pattern = r'(.*): (\d{3}\.\d{3}\.\d{4}) (.*)'
res = re.findall(pattern, text)
# print(res)

result_dict = {}
for item in res:
    name = item[0]
    phone_num = item[1]
    addr = item[2]
    result_dict[name] = {'phone_num': phone_num, 'address': addr}
print(result_dict)
```
Using finditer
```python
import re


def my_finditer(pattern, text):
    """A hand-written generator that imitates what finditer does."""
    pat_obj = re.compile(pattern)
    print('generator started')
    pos = 0
    while True:
        print('generator loop')
        # Search again, starting where the previous match ended.
        res = pat_obj.search(text, pos)
        if res is None:
            break
        # Note: a pattern that can match the empty string would never
        # advance pos and would loop forever here.
        pos = res.end()
        print('yielding a result:', res)
        yield res


if __name__ == '__main__':
    text = "He was carefully disguised but captured quickly by police."
    pattern = r'\w*ly'
    res = my_finditer(pattern, text)
    print(res)  # a generator object; its body has not run yet
    for item in res:
        print('received:', item)
```
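For comparison, here is a short sketch that uses the built-in `re.finditer` on the same sentence and pattern. It yields Match objects one at a time, much like the hand-written generator above; the variable names are chosen only for this illustration.

```python
import re

text = "He was carefully disguised but captured quickly by police."
pattern = r'\w*ly'

# Each item is a Match object, so positions are available too.
spans = []
for m in re.finditer(pattern, text):
    spans.append((m.start(), m.end(), m.group()))
    print(m.start(), m.end(), m.group())

# Collecting only the matched words gives the same list as findall.
words = [m.group() for m in re.finditer(pattern, text)]
print(words)  # ['carefully', 'quickly']
```

finditer is the better fit when the positions of the matches are needed or when the text is large, because it does not build the whole result list up front.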
Using sub
```python
import re

text = "He was carefully disguised but captured quickly by police."
pattern = r'\w*ly'

# Replace every word ending in "ly" with 'dog'.
res = re.sub(pattern, 'dog', text)
print(res)  # He was dog disguised but captured dog by police.
```
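Beyond a fixed replacement string, `re.sub` also accepts a `count` argument and a function as the replacement, and `re.subn` additionally reports how many replacements were made. The sketch below shows these variants on the same sentence; the variable names are only illustrative.

```python
import re

text = "He was carefully disguised but captured quickly by police."
pattern = r'\w*ly'

# count=1 replaces only the first match.
only_first = re.sub(pattern, 'dog', text, count=1)

# A function receives each Match object and returns the replacement.
upper = re.sub(pattern, lambda m: m.group().upper(), text)

# subn returns the new string together with the number of replacements.
replaced, n = re.subn(pattern, 'dog', text)

print(only_first)
print(upper)
print(replaced, n)
```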