Module lesson (3/4)
Greedy and Lazy
By default, quantifiers are greedy: they consume as much as possible
while still letting the rest of the pattern match. Adding ? after a
quantifier (*?, +?, ??, {n,m}?) gives you the lazy version: it consumes
as little as possible.
Sample: <b>uno</b> e <i>due</i>
Greedy pattern <.*> matches: <b>uno</b> e <i>due</i> (everything)
Lazy pattern <.*?> matches: <b>, </b>, <i>, </i> (4 matches with the g flag)
The same pattern with a single character of difference (? added to the
quantifier) gives totally different results.
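The comparison above can be reproduced with a minimal sketch, assuming Python's `re` module. Here `re.findall` plays the role of the g flag by returning every non-overlapping match; the variable names are illustrative only.

```python
import re

# The sample string from the lesson.
sample = "<b>uno</b> e <i>due</i>"

# Greedy: .* runs to the end of the string, then backs up to the last '>'.
greedy = re.findall(r"<.*>", sample)

# Lazy: .*? stops at the first '>' that lets the pattern match.
lazy = re.findall(r"<.*?>", sample)

print(greedy)  # ['<b>uno</b> e <i>due</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```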
When it matters
- To extract the content between delimiters (HTML tags, quotes, parentheses) the lazy version is almost always the right one.
- To match up to the end of the line you usually want greedy (.*).
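Both cases in the list above can be illustrated with a short sketch, assuming Python's `re` module. The strings `expr` and `log` are made-up inputs, not part of the lesson.

```python
import re

# Hypothetical inputs, chosen only for illustration.
expr = "f(a) + g(b)"
log = "ERROR: disk full\nINFO: ok"

# Between delimiters: lazy keeps each parenthesised group separate.
lazy_groups = re.findall(r"\(.*?\)", expr)
greedy_groups = re.findall(r"\(.*\)", expr)

# Up to end of line: greedy takes the rest of the line
# ('.' does not match a newline by default).
greedy_line = re.findall(r"ERROR: .*", log)
# Lazy with nothing after it is satisfied by zero characters.
lazy_line = re.findall(r"ERROR: .*?", log)

print(lazy_groups)    # ['(a)', '(b)']
print(greedy_groups)  # ['(a) + g(b)']
print(greedy_line)    # ['ERROR: disk full']
print(lazy_line)      # ['ERROR: ']
```

The last result shows why lazy is a poor fit for "rest of the line": with nothing following the quantifier, the minimum that satisfies the pattern is the empty string.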
Greedy vs Lazy strategies in the engine
A greedy quantifier consumes as much text as possible and backtracks, giving characters back one at a time, only if the rest of the pattern fails to match. With ? added (lazy), the engine consumes the minimum first and extends the match one character at a time until the rest of the pattern can match.
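To make the two strategies concrete, here is a hand-rolled simulation of `<.*>` and `<.*?>` anchored at one position. It is a simplified sketch under stated assumptions, not how a real regex engine is implemented: the function `match_angle` is hypothetical and models only this single pattern. The only difference between the two modes is the order in which candidate lengths for `.*` are tried.

```python
import re

def match_angle(text: str, start: int, lazy: bool):
    """Simulate <.*> (greedy) or <.*?> (lazy) anchored at `start`.

    Returns the matched substring, or None if there is no match.
    """
    if start >= len(text) or text[start] != "<":
        return None
    # Find how far '.' could run: up to a newline or the end of the text.
    limit = start + 1
    while limit < len(text) and text[limit] != "\n":
        limit += 1
    max_len = limit - (start + 1)
    # Greedy tries the longest run first and gives characters back;
    # lazy tries the shortest run first and extends one character at a time.
    lengths = range(0, max_len + 1) if lazy else range(max_len, -1, -1)
    for n in lengths:
        end = start + 1 + n
        if end < len(text) and text[end] == ">":
            return text[start:end + 1]
    return None

sample = "<b>uno</b> e <i>due</i>"
greedy_result = match_angle(sample, 0, lazy=False)
lazy_result = match_angle(sample, 0, lazy=True)

print(greedy_result)  # <b>uno</b> e <i>due</i>
print(lazy_result)    # <b>

# The simulation agrees with Python's own engine on this sample.
print(greedy_result == re.match(r"<.*>", sample).group())   # True
print(lazy_result == re.match(r"<.*?>", sample).group())    # True
```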
Try it
Extract every HTML tag from the sample (e.g. `<b>`, `</b>`, `<i>`, `</i>`) using the lazy version `.*?`.
Show hint
Greedy <.*> matches from the start of the first tag to the end of the last one. Lazy <.*?> stops at the first > it encounters.
Solution available after 3 attempts
Review exercise
Extract every string between double quotes in the text (e.g. "ciao", "mondo"). Use the lazy version of the quantifier to avoid jumping to later closings.
Show hint
".*?" stops at the first closing double quote.
Solution available after 3 attempts
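For the review exercise above, one possible way to check a pattern is the following sketch, assuming Python's `re` module. The exercise text is not reproduced here, so `quoted_text` is a made-up stand-in.

```python
import re

# Hypothetical stand-in for the exercise text.
quoted_text = 'Disse "ciao" e poi "mondo".'

# Lazy: each match ends at the first closing double quote.
lazy_quotes = re.findall(r'".*?"', quoted_text)

# Greedy, for contrast: jumps to the last closing double quote.
greedy_quotes = re.findall(r'".*"', quoted_text)

print(lazy_quotes)    # ['"ciao"', '"mondo"']
print(greedy_quotes)  # ['"ciao" e poi "mondo"']
```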
Additional challenge
Extract all text blocks enclosed in square brackets (e.g. `[text]`), including the brackets, using a lazy quantifier so as not to merge separate blocks.
Show hint
Use \[ for the opening bracket, then .*? and finally \] for the closing bracket.
Solution available after 3 attempts
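The additional challenge can be tried out with a sketch like the one below, assuming Python's `re` module. The string `bracket_text` is an invented sample, since the exercise text is not shown.

```python
import re

# Hypothetical sample with three separate bracketed blocks.
bracket_text = "[uno] e [due] poi [tre]"

# \[ and \] match literal brackets; .*? keeps each block separate.
lazy_blocks = re.findall(r"\[.*?\]", bracket_text)

# Greedy, for contrast: merges everything from the first '[' to the last ']'.
greedy_blocks = re.findall(r"\[.*\]", bracket_text)

print(lazy_blocks)    # ['[uno]', '[due]', '[tre]']
print(greedy_blocks)  # ['[uno] e [due] poi [tre]']
```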