PHP RegExp for nested Div tags

Question

I need a regex that I can use with PHP's preg_match_all() to match the content inside div tags. The divs look like this:

<div id="t1">Content</div>

So far I've come up with this regex, which matches every div with id="t[number]":

/<div id="t(\d+)">(.*?)<\/div>/

The problem is when the content contains more divs, nested like this:

<div id="t1">Content<div>more stuff</div></div>
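
The failure is easy to reproduce. The sketch below assumes the pattern has the shape shown above (an opening tag with a numeric "t" id followed by a lazy `(.*?)`); the lazy group stops at the first closing tag it meets, which belongs to the inner div:

```php
<?php
// Assumed form of the pattern from the question: lazy body, first </div> wins.
$pattern = '/<div id="t(\d+)">(.*?)<\/div>/';
$text    = '<div id="t1">Content<div>more stuff</div></div>';

preg_match_all($pattern, $text, $matches);

// $matches[2][0] is the captured body; it is cut off at the inner closing tag.
echo $matches[2][0], "\n";   // Content<div>more stuff
```

Making the group greedy does not help either: with several t-divs in the text it would run on to the last closing tag in the input.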

Any ideas on how to make my regex work with nested tags?

Thanks

Answer

Try a parser instead:

require_once "simple_html_dom.php";

$text = 'foo <div id="t1">Content <div>more stuff</div></div> bar <div>even more</div> baz <div id="t2">yes</div>';

$html = str_get_html($text);

foreach ($html->find('div') as $e) {
    // keep only divs whose id starts with "t" followed by digits
    if (isset($e->attr['id']) && preg_match('/^t\d++/', $e->attr['id'])) {
        echo $e->outertext . "\n";
    }
}

Output:

<div id="t1">Content <div>more stuff</div></div>
<div id="t2">yes</div>

Download the parser here: http://simplehtmldom.sourceforge.net/
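
If pulling in a third-party library is not an option, the same idea can probably be expressed with PHP's bundled DOM extension. This is only a sketch under that assumption: DOMDocument and DOMXPath are not part of the original answer, and `$results` is a name chosen for illustration.

```php
<?php
$text = 'foo <div id="t1">Content <div>more stuff</div></div> bar <div>even more</div> baz <div id="t2">yes</div>';

// Collect parse warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($text);
libxml_clear_errors();

$xpath   = new DOMXPath($doc);
$results = [];

// Visit every div that has an id attribute, then filter the id with a regex.
foreach ($xpath->query('//div[@id]') as $div) {
    if (preg_match('/^t\d+$/', $div->getAttribute('id'))) {
        // Serialize the div together with everything nested inside it.
        $results[] = $doc->saveHTML($div);
    }
}

print_r($results);
```

Because the parser tracks nesting itself, the only regex left is the trivial check on the id value. Note that a t-div nested inside another t-div would be returned twice, once on its own and once as part of its parent.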

More for my own amusement, I tried doing it with a regex. Here's what I came up with:

$text = 'foo <div id="t1">Content <div>more stuff</div></div> bar <div>even more</div> baz <div id="t2">yes <div>aaa<div>bbb<div>ccc</div>bbb</div>aaa</div> </div>';

if (preg_match_all('#<div\s+id="t\d+">[^<>]*(<div[^>]*>(?:[^<>]*|(?1))*</div>)[^<>]*</div>#si', $text, $matches)) {
    print_r($matches[0]);
}

Output:

Array
(
    [0] => <div id="t1">Content <div>more stuff</div></div>
    [1] => <div id="t2">yes <div>aaa<div>bbb<div>ccc</div>bbb</div>aaa</div> </div>
)

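And a small explanation:

<div\s+id="t\d+">   # match an opening 'div' with an id that starts with 't' and some digits
[^<>]*              # match zero or more chars other than '<' and '>'
(                   # open group 1
  <div[^>]*>        #   match an opening 'div'
  (?:               #   open a non-capturing group
    [^<>]*          #     match zero or more chars other than '<' and '>'
    |               #     OR
    (?1)            #     recursively match what is defined by group 1
  )*                #   close the non-capturing group and repeat it zero or more times
  </div>            #   match a closing 'div'
)                   # close group 1
[^<>]*              # match zero or more chars other than '<' and '>'
</div>              # match a closing 'div'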
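
As written, that pattern expects exactly one nested div directly inside the outer one, and no other tags in between. The following sketch is one possible generalization, not the answer's regex: it lets group 1 stand for the whole body of a div, so the body may hold any number of nested divs (including none) as well as other tags. The pattern and the sample text are assumptions made for illustration.

```php
<?php
// A more permissive recursive pattern; the x flag allows the inline comments.
$pattern = '~
    <div\s+id="t\d+">              # opening div whose id is t plus digits
    (                              # group 1: the body of a div
      (?:
          [^<]++                   # plain text up to the next tag
        | <(?!/?div\b)             # a "<" that starts some tag other than div
        | <div\b[^>]*> (?1) </div> # a nested div; its body recurses into group 1
      )*
    )
    </div>                         # the closing tag that balances the opening one
~ix';

$text = '<div id="t1">Content <div>more stuff</div></div> <div>skip</div> '
      . '<div id="t2">yes</div> '
      . '<div id="t3">a<div>b</div>c<div>d<div>e</div></div></div>';

preg_match_all($pattern, $text, $matches);

print_r($matches[0]);   // the full t-divs
print_r($matches[1]);   // their bodies
```

Like any regex over HTML, this still assumes balanced markup, an id that is the only attribute on the outer div, and no '>' characters inside attribute values; a parser remains the more robust choice.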
