Direct naar de hoofdinhoud
eLearner.app
Module 2 · Les 4 van 48/32 in de cursus~12 min
Modulelessen (4/4)

Backtracking: een overzicht

When a greedy quantifier "overshoots", the engine performs backtracking: it steps back, tries another configuration, retries. On simple patterns it is invisible; on poorly written patterns it can become exponential and freeze the browser for seconds. It is the root of ReDoS (Regular expression Denial of Service).

Code
Pattern: .*\d
Sample:  abc123def
Steps:
  1.  .*  consumes everything: "abc123def"
  2.  \d  fails (end of string).
  3.  Backtrack: .* drops 'f' -> "abc123de", \d fails on 'f'.
  4.  Backtrack: .* drops 'ef' -> "abc123d", \d fails on 'e'.
  5.  ... and so on until \d finds a digit: match = "abc123" + "?"... actually
       the pattern matches "abc123" because .* leaves "def" and \d looks at '3'? No: the
       backtrack continues and .* ends up being "abc12", \d=3.  Final match: "abc123".

The engine always gets there, but if a greedy quantifier contains another greedy quantifier (e.g. (a+)+), the number of configurations to try can explode.

Catastrophic patterns

Code
Pattern: (a+)+b
Sample:  aaaaaaaaaaaaaaaaaaaaa

No trailing b → the pattern fails, but to get there the engine tries every possible partition of the as. Exponential growth: with 30 as you can wait for hours.

Perils of backtracking and ReDoS

When a pattern contains overlapping or nested quantifiers (e.g. (a+)+), the engine evaluates an exponential number of paths before failing. Replacing generic wildcards with excluding character classes dramatically reduces backtracking paths.

Try it

Oefening#regex.m2.l4.e1
Pogingen: 0Laden…

Extract the content between square brackets for every occurrence in the text. Use the lazy version of the quantifier to avoid jumping to later closings.

Editor laden…
Toon hint

\\[.*?\\] stops at the first closing ]: no heavy backtracking.

Oplossing beschikbaar na 3 pogingen

Review exercise

Oefening#regex.m2.l4.e2
Pogingen: 0Laden…

Find every `http://...` or `https://...` URL in the text, using the `?` quantifier for the optional `s` and `\\S+` for the non-space characters of the path.

Editor laden…
Toon hint

https? means 'http' followed by an optional 's'. Then //, then \\S+ (non-spaces).

Oplossing beschikbaar na 3 pogingen

Additional challenge

Oefening#regex.m2.l4.e3
Pogingen: 0Laden…

Match strings enclosed in double quotes, avoiding excessive backtracking by using the negated class `[^"]*` instead of `.*`.

Editor laden…
Toon hint

Replace the dot with the negated class [^"] to restrict the engine's backtracking paths.

Oplossing beschikbaar na 3 pogingen