关于 php 截取中文的问题

众所周知,php 自带的 strlen 与 substr 函数没法处理中文字符,于是,我们会用 mb_ 系列函数替代。但是,没有 mbstring 库怎么办?这就需要我们自己写一个来替代了,废话不多说,先上代码

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
if ( !function_exists('mb_strlen') ) {
function ($text, $encode) {
if ($encode=='UTF-8') {
return preg_match_all('%(?:
[x09x0Ax0Dx20-x7E] # ASCII
| [xC2-xDF][x80-xBF] # non-overlong 2-byte
| xE0[xA0-xBF][x80-xBF] # excluding overlongs
| [xE1-xECxEExEF][x80-xBF]{2} # straight 3-byte
| xED[x80-x9F][x80-xBF] # excluding surrogates
| xF0[x90-xBF][x80-xBF]{2} # planes 1-3
| [xF1-xF3][x80-xBF]{3} # planes 4-15
| xF4[x80-x8F][x80-xBF]{2} # plane 16
)%xs',$text,$out);
}else{
return strlen($text);
}
}
}


if (!function_exists('mb_substr')) {
function mb_substr($str, $start, $len = '', $encoding="UTF-8"){
$limit = strlen($str);

for ($s = 0; $start > 0;--$start) {// found the real start
if ($s >= $limit)
break;

if ($str[$s] <= "x7F")
++$s;
else {
++$s; // skip length

while ($str[$s] >= "x80" && $str[$s] <= "xBF")
++$s;
}
}

if ($len == '')
return substr($str, $s);
else
for ($e = $s; $len > 0; --$len) {//found the real end
if ($e >= $limit)
break;

if ($str[$e] <= "x7F")
++$e;
else {
++$e;//skip length

while ($str[$e] >= "x80" && $str[$e] <= "xBF" && $e < $limit)
++$e;
}
}

return substr($str, $s, $e - $s);
}
}

以上代码摘自 wp-utf8-excerpt 插件,效果可以见本站首页,所有文章摘要都是该插件负责截取的。