Regular Expression Matching with Go's regexp Package

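When writing a web crawler, you often need to search the text you have scraped. Pattern matching with regular expressions is an elegant way to do this.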

Go's standard library provides regular expression support in the regexp package.

Basic regexp usage

To match a pattern directly, the most common options are the regexp.Match and regexp.MatchString functions, or the Regexp type and its methods.

`regexp.Match`

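Definition:

```go
func Match(pattern string, b []byte) (matched bool, err error)
```

This function reports whether the byte slice b contains any match of the regular expression pattern. It returns the match result and an error, which is non-nil if pattern cannot be compiled.

Example:

```go
matched, _ := regexp.Match("b", []byte("hello golang"))
fmt.Println(matched) // false
```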
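The example above discards the error with `_`. Because regexp.Match compiles pattern on every call, err is non-nil only when the pattern itself is invalid. The sketch below wraps the call in a hypothetical helper, matchBytes (an illustrative name, not part of the regexp package), that surfaces that error instead of ignoring it.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchBytes is a hypothetical helper (not part of regexp) that calls
// regexp.Match and reports a compilation error instead of ignoring it.
func matchBytes(pattern string, b []byte) (bool, error) {
	matched, err := regexp.Match(pattern, b)
	if err != nil {
		return false, fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}
	return matched, nil
}

func main() {
	ok, err := matchBytes("go+", []byte("hello golang"))
	fmt.Println(ok, err) // true <nil>

	// "(" opens a group that is never closed, so compilation fails.
	ok, err = matchBytes("(", []byte("hello golang"))
	fmt.Println(ok, err != nil) // false true
}
```

For a pattern used many times, compiling it once with Compile or MustCompile avoids recompiling on every call.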

`regexp.MatchString`

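Definition:

```go
func MatchString(pattern string, s string) (matched bool, err error)
```

This function reports whether the string s contains any match of the regular expression pattern. It returns the match result and an error, which is non-nil if pattern cannot be compiled.

Example:

```go
matched, _ := regexp.MatchString("b", "hello golang")
fmt.Println(matched) // false
```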

`Regexp` type

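Type definition:

```go
type Regexp struct { /* contains filtered or unexported fields */ }
```

A Regexp is a compiled regular expression. It is safe for concurrent use by multiple goroutines, except for configuration methods such as Longest.

The most common ways to obtain a Regexp are Compile and MustCompile.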

`Compile`

Compile parses a regular expression and, if it is valid, returns the compiled Regexp. Its definition is:

```go
func Compile(expr string) (*Regexp, error)
```

`MustCompile`

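MustCompile is like Compile, but instead of returning an error it panics if the expression cannot be compiled. Its definition is `func MustCompile(str string) *Regexp`.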
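MustCompile is mainly intended for expressions written as constants in the source, typically assigned to package-level variables, so that a typo makes the program fail at startup rather than later. The sketch below shows that usage and, through a hypothetical mustCompilePanics helper that uses recover, confirms that an invalid expression panics.

```go
package main

import (
	"fmt"
	"regexp"
)

// Typical use: a package-level variable compiled during initialization.
var word = regexp.MustCompile(`[a-z]+`)

// mustCompilePanics is a hypothetical helper that reports whether
// MustCompile panics for expr; the deferred recover stops the panic.
func mustCompilePanics(expr string) (panicked bool) {
	defer func() {
		if recover() != nil {
			panicked = true
		}
	}()
	regexp.MustCompile(expr)
	return false
}

func main() {
	fmt.Println(word.FindString("123 hello")) // hello
	fmt.Println(mustCompilePanics(`[a-z]+`))  // false
	fmt.Println(mustCompilePanics(`(`))       // true
}
```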

Common Regexp matching methods

Regexp provides many matching methods; the most commonly used ones are described below.

`Find`

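Find returns a slice holding the text of the leftmost match in b. A return value of nil indicates no match.

Definition:

```go
func (re *Regexp) Find(b []byte) []byte
```

Example:

```go
re := regexp.MustCompile("a")
match := re.Find([]byte("hello golang"))
fmt.Println(string(match)) // a
```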

`FindAll`

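FindAll returns a slice of all successive non-overlapping matches.

Definition:

```go
func (re *Regexp) FindAll(b []byte, n int) [][]byte
```

The parameter n limits the length of the returned slice:
- n < 0: all matches are returned;
- 0 <= n <= the total number of matches: n matches are returned (n == 0 returns nil);
- n > the total number of matches: all matches are returned.

Example:

```go
re := regexp.MustCompile("l[a-z]")
match := re.FindAll([]byte("hello world, hello golang"), -1)
for _, m := range match {
	fmt.Println(string(m))
}
// ll
// ld
// ll
// la
```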

`FindString`

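FindString returns the text of the leftmost match in s. If there is no match it returns an empty string; note that it also returns an empty string when the expression successfully matches the empty string.

Definition:

```go
func (re *Regexp) FindString(s string) string
```

Example:

```go
re := regexp.MustCompile("l[a-z]")
match := re.FindString("hello world, hello golang")
fmt.Println(match) // ll
```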

`FindAllString`

Definition:

```go
func (re *Regexp) FindAllString(s string, n int) []string
```

FindAllString returns up to n matches, with n interpreted as in FindAll. If there is no match, it returns nil.

Example:

```go
re := regexp.MustCompile("l[a-z]")
match := re.FindAllString("hello world, hello golang", -1)
for _, m := range match {
	fmt.Println(m)
}
// ll
// ld
// ll
// la
```

`FindIndex`

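FindIndex returns the location of the leftmost match in b.

Definition:

```go
func (re *Regexp) FindIndex(b []byte) (loc []int)
```

loc[0] is the start of the match and loc[1] is one past its end, so the matched text is b[loc[0]:loc[1]]. If there is no match, FindIndex returns nil.

Example:

```go
re := regexp.MustCompile("l[a-z]")
match := re.FindIndex([]byte("hello world, hello golang"))
fmt.Println(match) // [2 4]
```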

`FindAllIndex`

FindAllIndex is the "All" version of FindIndex; the parameter n determines how many results are returned.

Definition:

```go
func (re *Regexp) FindAllIndex(b []byte, n int) [][]int
```

See FindAll for how n is interpreted.

Example:

```go
re := regexp.MustCompile("l[a-z]")
match := re.FindAllIndex([]byte("hello world, hello golang"), -1)
for _, m := range match {
	fmt.Println(m)
}
// [2 4]
// [9 11]
// [15 17]
// [21 23]
```

`FindStringIndex`

FindStringIndex does the same as FindIndex, except that it takes a string argument.

Definition:

```go
func (re *Regexp) FindStringIndex(s string) (loc []int)
```

`FindAllStringIndex`

FindAllStringIndex is the "All" version of FindStringIndex.

Definition:

```go
func (re *Regexp) FindAllStringIndex(s string, n int) [][]int
```

`FindStringSubmatch`

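FindStringSubmatch returns a slice of strings holding the text of the leftmost match, followed by the text matched by each capturing group (parenthesized subexpression). It returns nil if there is no match.

Definition:

```go
func (re *Regexp) FindStringSubmatch(s string) []string
```

The description alone may be hard to follow, so here is an example:

```go
// ...
re := regexp.MustCompile(`(aaa)bb(c)`)
fmt.Printf("%q\n", re.FindStringSubmatch("aaabbc"))
```

The output is:

```
["aaabbc" "aaa" "c"]
```

Element 0 is the whole match; elements 1 and 2 are the texts of the groups (aaa) and (c).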
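A typical use of submatches is pulling structured fields out of text. The sketch below, assuming simple key=value input, uses a hypothetical parsePair helper; index 0 of the result is the whole match and indices 1 and 2 are the two groups.

```go
package main

import (
	"fmt"
	"regexp"
)

var kv = regexp.MustCompile(`(\w+)=(\w+)`)

// parsePair is a hypothetical helper: m[0] is the whole match,
// m[1] and m[2] are the texts of the two capturing groups.
func parsePair(s string) (key, value string, ok bool) {
	m := kv.FindStringSubmatch(s)
	if m == nil {
		return "", "", false // no match at all
	}
	return m[1], m[2], true
}

func main() {
	k, v, ok := parsePair("lang=golang")
	fmt.Println(k, v, ok) // lang golang true

	_, _, ok = parsePair("no pair here")
	fmt.Println(ok) // false
}
```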

`FindAllStringSubmatch`

FindAllStringSubmatch is the "All" version of FindStringSubmatch.

Definition:

```go
func (re *Regexp) FindAllStringSubmatch(s string, n int) [][]string
```

`Match`

Match reports whether the byte slice b contains any match of the regular expression re.

Definition:

```go
func (re *Regexp) Match(b []byte) bool
```

Example:

```go
re := regexp.MustCompile(`hello`)
match := re.Match([]byte("hello everyone"))
fmt.Println(match) // true
```

`MatchString`

MatchString reports whether the string s contains any match of the regular expression re.

Definition:

```go
func (re *Regexp) MatchString(s string) bool
```

Example:

```go
re := regexp.MustCompile(`hello`)
match := re.MatchString("hello everyone")
fmt.Println(match) // true
```

`ReplaceAll`

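Definition:

```go
func (re *Regexp) ReplaceAll(src, repl []byte) []byte
```

ReplaceAll returns a copy of src in which every match of re is replaced by repl. Inside repl, `$` signs are interpreted as in Expand, so for example `$1` stands for the text of the first capturing group.

Example:

```go
re := regexp.MustCompile(`hello`)
match := re.ReplaceAll([]byte("hello everyone"), []byte("hi!"))
fmt.Println(string(match)) // hi! everyone
```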

`ReplaceAllString`

Definition:

```go
func (re *Regexp) ReplaceAllString(src, repl string) string
```

ReplaceAllString returns a copy of src in which every match of re is replaced by repl.

Example:

```go
re := regexp.MustCompile(`hello`)
match := re.ReplaceAllString("hello everyone", "hi!")
fmt.Println(match) // hi! everyone
```
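Besides fixed text, the replacement string can refer to capturing groups: in ReplaceAllString (and ReplaceAll), `$1` or `${1}` expands to the text of the first group. The sketch below reorders an ISO-style date with a hypothetical toDMY helper; ReplaceAllLiteralString, also a Regexp method, inserts repl verbatim without any `$` expansion.

```go
package main

import (
	"fmt"
	"regexp"
)

var dateRe = regexp.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)

// toDMY is a hypothetical helper: ${3}, ${2}, ${1} expand to the day,
// month and year groups, reordering YYYY-MM-DD into DD/MM/YYYY.
func toDMY(s string) string {
	return dateRe.ReplaceAllString(s, "${3}/${2}/${1}")
}

func main() {
	fmt.Println(toDMY("released 2023-08-15")) // released 15/08/2023

	// The literal variant performs no $ expansion.
	fmt.Println(dateRe.ReplaceAllLiteralString("2023-08-15", "$1")) // $1
}
```

The braces are worth the extra characters: a group reference is read greedily, so `$1x` means a group named `1x`, not group 1 followed by `x`.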

`Split`

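Definition:

```go
func (re *Regexp) Split(s string, n int) []string
```

Split slices s into substrings separated by the matches of the expression and returns a slice of those substrings.

Example:

```go
re := regexp.MustCompile(`a`)
s := re.Split("abacadaeafff", -1)
fmt.Printf("%q\n", s) // ["" "b" "c" "d" "e" "fff"]
```

The parameter n controls the length of the returned slice:
- n > 0: at most n substrings are returned; the last one is the unsplit remainder;
- n == 0: nil is returned;
- n < 0: all substrings are returned.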

Regular expression syntax

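| Syntax | Description |
| --- | --- |
| `^` | Matches the beginning of the string |
| `$` | Matches the end of the string |
| `*` | Matches the preceding subexpression zero or more times |
| `+` | Matches the preceding subexpression one or more times |
| `?` | Matches the preceding subexpression zero or one time |
| `{n}` | Matches exactly n times |
| `{n,}` | Matches at least n times |
| `{n,m}` | Matches at least n and at most m times |
| `?` | When it follows `*`, `+`, `?`, `{n}`, `{n,}` or `{n,m}`, makes the match non-greedy |
| `.` | Matches any single character except `\n` |
| `x\|y` | Matches x or y |
| `[xyz]` | Matches any one of the enclosed characters |
| `[^xyz]` | Matches any character that is not enclosed |
| `[a-z]` | Matches any character in the range |
| `[^a-z]` | Matches any character outside the range |
| `\b` | Matches an ASCII word boundary |
| `\B` | Matches a position that is not a word boundary |
| `\d` | Matches a digit character |
| `\D` | Matches a non-digit character |
| `\f` | Matches a form feed |
| `\n` | Matches a newline |
| `\r` | Matches a carriage return |
| `\s` | Matches a whitespace character (in Go, `[\t\n\f\r ]`) |
| `\S` | Matches a non-whitespace character |
| `\t` | Matches a tab |
| `\v` | Matches a vertical tab |
| `\w` | Matches a word character, including underscore (in Go, `[0-9A-Za-z_]`) |
| `\W` | Matches a non-word character |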
