Greedy vs lazy
By default, quantifiers are greedy: they consume as much text as possible
while still letting the rest of the pattern match. Adding ? after a
quantifier (*?, +?, ??, {n,m}?) gives you the lazy version, which consumes
as little as possible.
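Below is a minimal Python sketch that puts each quantifier next to its lazy form, using the standard `re` module. The test string `aaaa` is an arbitrary choice. `re.match` anchors at the start of the string, and nothing follows the quantifier, so each greedy form takes as many `a`s as it is allowed and each lazy form takes as few.

```python
import re

text = "aaaa"

# Each greedy quantifier next to its lazy counterpart (same quantifier + ?).
patterns = ["a*", "a*?", "a+", "a+?", "a?", "a??", "a{1,3}", "a{1,3}?"]

# Nothing follows the quantifier, so only its own min/max limits apply.
results = {p: re.match(p, text).group() for p in patterns}

for p, matched in results.items():
    print(f"{p:9} -> {matched!r}")
```

The lazy forms stop at their minimum: zero characters for `*?` and `??`, one for `+?` and `{1,3}?`.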
Sample: <b>uno</b> e <i>due</i>
Greedy pattern <.*> matches: <b>uno</b> e <i>due</i> (the whole string, as a single match)
Lazy pattern <.*?> matches: <b>, </b>, <i>, </i> (4 separate matches with the g flag)
The two patterns differ by a single character (the ? added to the quantifier), yet the results are completely different.
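The same comparison can be reproduced with Python's `re` module. Here `re.findall` returns every non-overlapping match, which plays the role of the g flag.

```python
import re

sample = "<b>uno</b> e <i>due</i>"

# Greedy: .* runs to the end of the string, then backtracks to the last '>'.
greedy = re.findall(r"<.*>", sample)

# Lazy: .*? stops at the first '>' that lets the pattern complete.
lazy = re.findall(r"<.*?>", sample)

print(greedy)  # ['<b>uno</b> e <i>due</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```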
When it matters
- To extract the content between delimiters (HTML tags, quotes, parentheses) the lazy version is almost always the right one.
- To match up to the end of the line you usually want greedy (.*).
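The sketch below shows why greedy is the usual choice for "up to end of line". The `key: value` text is a made-up example. Because `.` does not match a newline, greedy `.*` stops at the end of each line. With nothing after it, lazy `.*?` is already satisfied by zero characters.

```python
import re

config = "name: Ada Lovelace\nrole: analyst"

# re.MULTILINE makes ^ match at the start of every line.
# Greedy: captures the rest of each line.
greedy = re.findall(r"^\w+: (.*)", config, flags=re.MULTILINE)

# Lazy with nothing after it: the minimum (empty string) already succeeds.
lazy = re.findall(r"^\w+: (.*?)", config, flags=re.MULTILINE)

print(greedy)  # ['Ada Lovelace', 'analyst']
print(lazy)    # ['', '']
```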
Greedy vs lazy strategies in the engine
A greedy quantifier first consumes as much text as possible, then gives characters back (backtracks) only when the rest of the pattern fails to match. A lazy quantifier (with ? added) starts by consuming the minimum and takes one more character at a time, only while the rest of the pattern still fails to match.
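To make the two strategies concrete, here is a simplified Python model for the specific pattern <.*> and <.*?>. `match_tag` is a hypothetical helper written for illustration; it is not how any particular regex engine is implemented internally. It records how many characters the `.` run held on each try, in the order the tries happen.

```python
def match_tag(text, start, lazy):
    """Simplified model of a backtracking engine trying <.*> (greedy) or
    <.*?> (lazy) at position `start`.

    Returns (matched_text or None, attempts), where attempts lists how many
    characters the '.' run held on each try, in the order they were tried.
    """
    attempts = []
    if start >= len(text) or text[start] != "<":
        return None, attempts
    # '.' matches any character except a newline, so the run can extend
    # at most to the next newline or the end of the text.
    limit = start + 1
    while limit < len(text) and text[limit] != "\n":
        limit += 1
    max_len = limit - (start + 1)
    # Greedy: start from the longest run and give one character back per try.
    # Lazy: start from the empty run and take one more character per try.
    lengths = range(max_len + 1) if lazy else range(max_len, -1, -1)
    for n in lengths:
        end = start + 1 + n  # index just after the '.' run
        attempts.append(n)
        if end < len(text) and text[end] == ">":
            return text[start:end + 1], attempts
    return None, attempts


sample = "<b>uno</b> e <i>due</i>"
print(match_tag(sample, 0, lazy=False))  # ('<b>uno</b> e <i>due</i>', [22, 21])
print(match_tag(sample, 0, lazy=True))   # ('<b>', [0, 1])
```

On the sample, greedy runs to the end of the string and gives back one character before finding the final >. Lazy tries zero characters, then one, and stops at the first >.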
Try it
Extract every HTML tag from the sample (e.g. `<b>`, `</b>`, `<i>`, `</i>`) using the lazy version `.*?`.
Hint: greedy <.*> matches from the start of the first tag to the end of the last one, while lazy <.*?> stops at the first > it encounters.
Review exercise
Extract every string between double quotes in the text (e.g. "ciao", "mondo"). Use the lazy version of the quantifier so that each match ends at the first closing quote instead of running on to a later one.
Hint: ".*?" stops at the first closing double quote.
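One way to check the hint in Python is shown below. The input sentence is made up for illustration. The greedy form is included only for contrast: it produces a single match from the first quote to the last one.

```python
import re

text = 'Say "ciao" to the "mondo" and leave.'

# Lazy: each match ends at the first closing quote after its opening quote.
lazy = re.findall(r'".*?"', text)

# Greedy, for contrast: one match from the first quote to the last.
greedy = re.findall(r'".*"', text)

print(lazy)    # ['"ciao"', '"mondo"']
print(greedy)  # ['"ciao" to the "mondo"']
```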
Additional challenge
Extract all text blocks enclosed in square brackets (e.g. `[text]`), including the brackets, using a lazy quantifier so as not to merge separate blocks.
Hint: use \[ for the opening bracket, then .*? and finally \] for the closing bracket.
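Here is a short Python check of the bracket hint on a made-up input. The brackets are escaped because [ and ] otherwise start and end a character class. Without the ?, the greedy version merges all three blocks into one match.

```python
import re

text = "See [first] and [second block], not [third]."

# Lazy: each block ends at its own closing bracket.
blocks = re.findall(r"\[.*?\]", text)

# Greedy, for contrast: runs from the first '[' to the last ']'.
merged = re.findall(r"\[.*\]", text)

print(blocks)  # ['[first]', '[second block]', '[third]']
print(merged)  # ['[first] and [second block], not [third]']
```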