Using Regular Expressions in Python (the re Module)

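Regular expressions do not belong to any one programming language. They are an independent, powerful tool for processing strings, with their own syntax and matching engine, so once you know the syntax you can use them in almost any language (although support for some less common constructs varies from language to language). In Python, regular expressions are handled mainly through the re module. This article focuses on how to use the re module; regular expression syntax itself is not the main topic.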

Syntax reference

An explanation of common Python regular expression syntax

About the re module

re is short for "regular expression".

Using regular expressions through the re module involves three main steps:

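1. Compile the regular expression to get a Pattern object.
2. Match text against the Pattern object to get a Match object.
3. Read the match information from the Match object.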
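
The three steps above can be sketched as follows. This is a minimal illustration rather than anything from the original article: the date pattern and the sample text are made up for the example.

```python
import re

# Step 1: compile the regular expression into a Pattern object.
pattern = re.compile(r'(\d{4})-(\d{2})-(\d{2})')

# Step 2: match text against the Pattern to get a Match object (or None).
match = pattern.search('released on 2015-03-14')

# Step 3: read the match information from the Match object.
if match:
    year, month, day = match.groups()
    print(year, month, day)  # 2015 03 14
```

Checking the result before using it matters, because a failed match returns None rather than raising an error.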

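re.compile(pattern[, flags]):

The first argument of re.compile is the regular expression string. The optional second argument sets the matching mode; several flags can be combined with the bitwise OR operator (for example re.I | re.M):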

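re.I (IGNORECASE): ignore case
re.M (MULTILINE): multi-line mode; ^ and $ also match at the start and end of each line
re.S (DOTALL): "dot matches all" mode; . also matches a newline
re.L (LOCALE): makes the predefined character classes \w \W \b \B \s \S depend on the current locale (in Python 3 this flag is only allowed with bytes patterns)
re.U (UNICODE): makes the predefined character classes \w \W \b \B \s \S \d \D follow the Unicode character properties (in Python 3 this is already the default for str patterns)
re.X (VERBOSE): verbose mode. The regular expression can span several lines, whitespace in it is ignored, and comments can be added.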

This function returns a Pattern object; in effect it is a factory method for the Pattern class.
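
To make the flags listed above more concrete, here is a small sketch. The sample text and variable names are illustrative assumptions, and the snippet is written for Python 3.

```python
import re

text = 'Hello\nworld'

# re.I: 'hello' matches 'Hello' regardless of case.
ignore_case = re.compile(r'hello', re.I).findall(text)

# re.M: ^ matches at the start of every line, not only at the start of the string.
line_starts = re.compile(r'^\w+', re.M).findall(text)

# re.S: '.' also matches the newline character.
dot_all = re.compile(r'Hello.world', re.S).search(text)
dot_default = re.compile(r'Hello.world').search(text)

# re.X: whitespace and comments inside the pattern are ignored.
# Flags are combined with the bitwise OR operator.
verbose = re.compile(r'''
    (\w+)   # first word
    \s+     # whitespace between the words (here, the newline)
    (\w+)   # second word
''', re.X | re.I)
words = verbose.match(text).groups()

print(ignore_case)          # ['Hello']
print(line_starts)          # ['Hello', 'world']
print(dot_all is not None)  # True
print(dot_default)          # None
print(words)                # ('Hello', 'world')
```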

Match objects

Every successful match against a pattern returns a Match object, which carries detailed information about that one match.

Its commonly used methods are:

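group([group1, …]): returns the substring matched by one or more groups. Each argument can be a group name or a group number; number n refers to the n-th group, and 0 refers to the whole matched substring. With several arguments the results are returned as a tuple, and with no argument the call is treated as group(0). A group that did not take part in the match yields None.

groups([default]): returns a tuple with the results of all groups, equivalent to group(1, 2, 3, …). Groups that did not take part in the match are replaced by default, which is None unless specified.

groupdict([default]): returns a dictionary of the named groups, with the group name as the key and the matched substring as the value. Groups that did not take part in the match are replaced by default.

start([group]): returns the index in string where the substring matched by the given group starts. group defaults to 0.

end([group]): returns the index in string where the substring matched by the given group ends (the index of its last character + 1). group defaults to 0.

span([group]): equivalent to (start(group), end(group)).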

import re

# One numbered group (the name) and two named groups (age, city).
# The final dot is escaped so that it only matches a literal '.'.
p = re.compile(r'My name is (\w+), I am (?P<age>\d+) years old , I come from (?P<city>\w+)\.')
m = p.match('My name is Felix, I am 10 years old , I come from Canton.')

print("group(0):", m.group(0))
print("group(1):", m.group(1))
print("groups():", m.groups())
print("groupdict():", m.groupdict())
print("start(1):", m.start(1))
print("end(1):", m.end(1))
print("span(1):", m.span(1))

## Output ##
# group(0): My name is Felix, I am 10 years old , I come from Canton.
# group(1): Felix
# groups(): ('Felix', '10', 'Canton')
# groupdict(): {'age': '10', 'city': 'Canton'}
# start(1): 11
# end(1): 16
# span(1): (11, 16)
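
The default argument and the multi-argument form of group() are easier to see with a group that does not take part in the match. The pattern below is a hypothetical one chosen for that purpose, not part of the original example.

```python
import re

# The host part is optional, so the 'host' group may not take part in the match.
p = re.compile(r'(?P<user>\w+)(?:@(?P<host>\w+))?')
m = p.match('felix')

print(m.group('user'))     # felix
print(m.group(1, 2))       # ('felix', None)
print(m.groups('n/a'))     # ('felix', 'n/a')
print(m.groupdict('n/a'))  # {'user': 'felix', 'host': 'n/a'}
print(m.span('host'))      # (-1, -1)
```

Note that start(), end() and span() report -1 for a group that exists in the pattern but did not contribute to the match.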

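Pattern objects

search(string[, pos[, endpos]]) or re.search(pattern, string[, flags]):

Scans string and returns a Match object for the first place where the pattern matches, or None if there is no match. pos and endpos set the start and end of the region to scan; the defaults are pos=0 and endpos=len(string), which means the whole string is scanned.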

import re

p = re.compile(r'\d+')
m = p.search('number 123 number 456')
print(m.group())

## Output (456 is not matched) ##
# 123

import re

p = re.compile(r'\d+')
# The second argument is pos: the search starts at index 10.
m = p.search('number 123 number 456', 10)
print(m.group())

## Output (note the second argument) ##
# 456
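
The endpos argument and the None result can be illustrated in the same way. This is a sketch with illustrative variable names; the index values were worked out for this particular sample string.

```python
import re

p = re.compile(r'\d+')
text = 'number 123 number 456'

# Only text[10:17] (' number') is scanned, and it contains no digits.
m_none = p.search(text, 10, 17)
print(m_none)  # None

# Scanning text[10:20] stops before the final '6'.
m_window = p.search(text, 10, 20)
print(m_window.group(), m_window.span())  # 45 (18, 20)
```

The span is still reported relative to the full string, not to the scanned window.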

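match(string[, pos[, endpos]]) or re.match(pattern, string[, flags]):

Tries to match the pattern starting exactly at position pos of string. If the whole pattern matches, a Match object is returned; if the pattern still has not been fully matched when endpos is reached, the match fails and None is returned. This is very different from search: search may start matching at any position inside the given range, whereas match must start at pos.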

import re

p = re.compile(r'hello')
m = p.match('hello world')
if m:
    print(m.group())
else:
    print('not match')

## Output ##
# hello

import re

p = re.compile(r'world')
# 'world' is not at the start of the string, so match fails.
m = p.match('hello world')
if m:
    print(m.group())
else:
    print('not match')

## Output ##
# not match
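
The difference between match and search, and the effect of pos, can be put side by side. The snippet below is a small sketch with illustrative variable names.

```python
import re

text = 'hello world'
p = re.compile(r'world')

no_match = p.match(text)     # None: 'world' is not at index 0
anchored = p.match(text, 6)  # matches: matching is anchored at pos=6
found = p.search(text)       # matches: search may start anywhere

# pos is not the same as slicing: '^' still refers to the real start of the string.
caret = re.compile(r'^world').match(text, 6)

print(no_match)          # None
print(anchored.group())  # world
print(found.span())      # (6, 11)
print(caret)             # None
```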

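split(string[, maxsplit]) or re.split(pattern, string[, maxsplit]):

Splits string at every matched substring and returns the pieces as a list. maxsplit sets the maximum number of splits; by default the string is split at every match.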

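import re

# Inside a character class '|' is a literal character, not alternation,
# so [#!] is enough to split on '#' or '!'.
p = re.compile(r'[#!]')
res = p.split('I#am!from!China')
print(res)

## Output ##
# ['I', 'am', 'from', 'China']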
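
As a further sketch, the snippet below shows maxsplit in action and what happens when the separator pattern contains a capturing group. The variable names are illustrative.

```python
import re

text = 'I#am!from!China'

# maxsplit=2: at most two splits are made; the remainder is the last item.
limited = re.compile(r'[#!]').split(text, 2)

# With a capturing group in the pattern, the separators are kept in the result.
with_separators = re.compile(r'([#!])').split(text)

print(limited)          # ['I', 'am', 'from!China']
print(with_separators)  # ['I', '#', 'am', '!', 'from', '!', 'China']
```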

findall(string[, pos[, endpos]]) or re.findall(pattern, string[, flags]):

Finds all matches and returns them as a list. pos and endpos have the same meaning as described above.

import re

p = re.compile(r'\d+')
res = p.findall('one 1 two 2 three 3')
print(res)

## Output ##
# ['1', '2', '3']
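
One detail worth knowing is that the shape of the findall result depends on the capturing groups in the pattern. The patterns below are illustrative variations on the example above.

```python
import re

text = 'one 1 two 2 three 3'

# No group: each item is the whole matched substring.
no_group = re.findall(r'\w+ \d', text)

# One group: each item is the substring matched by that group.
one_group = re.findall(r'(\w+) \d', text)

# Several groups: each item is a tuple with one entry per group.
two_groups = re.findall(r'(\w+) (\d)', text)

print(no_group)    # ['one 1', 'two 2', 'three 3']
print(one_group)   # ['one', 'two', 'three']
print(two_groups)  # [('one', '1'), ('two', '2'), ('three', '3')]
```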

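finditer(string[, pos[, endpos]]) or re.finditer(pattern, string[, flags]):

Finds all matches and returns an iterator that yields the Match objects in the order they occur in the string. pos and endpos have the same meaning as described above.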

import re

p = re.compile(r'\d+')
res = p.finditer('one 1 two 2 three 3')
for m in res:
    print(m.group())

## Output ##
# 1
# 2
# 3
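
Because finditer yields full Match objects, positions and named groups are available for every match, which findall does not provide. A short sketch, using hypothetical group names:

```python
import re

p = re.compile(r'(?P<word>[a-z]+) (?P<digit>\d)')

# Collect the named groups and the span of each match.
matches = [(m.group('word'), m.group('digit'), m.span())
           for m in p.finditer('one 1 two 2 three 3')]

for item in matches:
    print(item)
# ('one', '1', (0, 5))
# ('two', '2', (6, 11))
# ('three', '3', (12, 19))
```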

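sub(repl, string[, count]) or re.sub(pattern, repl, string[, count]):

When repl is a string, every match in string is replaced by repl. When repl is a function, each match is passed to it as a Match object and its return value is used as the replacement. count sets the maximum number of replacements; by default all matches are replaced.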

import re

p = re.compile(r'\d+')
res = p.sub('?', 'I am 18 years old, I have 2 brothers and 6 sisters')
print(res)

## Output (every number is replaced by a question mark) ##
# I am ? years old, I have ? brothers and ? sisters

import re

def str_capital(m):
    # m is the Match object for one word; return its upper-case form.
    return m.group().upper()

p = re.compile(r'\b\w+\b')
res = p.sub(str_capital, 'I am 18 years old, I have 2 brothers and 6 sisters')
print(res)

## Output (every word is converted to upper case) ##
# I AM 18 YEARS OLD, I HAVE 2 BROTHERS AND 6 SISTERS
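
Two more aspects of sub can be sketched briefly: limiting the number of replacements with count, and referring to captured groups inside a string repl. The date example and the variable names are assumptions made for illustration.

```python
import re

text = 'I am 18 years old, I have 2 brothers and 6 sisters'

# count=1: only the first match is replaced.
first_only = re.sub(r'\d+', '?', text, count=1)

# In a string repl, \g<name> (or \1, \2, ...) inserts the text of a group.
reordered = re.sub(r'(?P<y>\d{4})-(?P<m>\d{2})-(?P<d>\d{2})',
                   r'\g<d>/\g<m>/\g<y>', 'due 2015-03-14')

print(first_only)  # I am ? years old, I have 2 brothers and 6 sisters
print(reordered)   # due 14/03/2015
```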

FelixHo's Space
