Skip to main content
eLearner.app
Module 2 · Lesson 4 of 48/32 in the course~12 min
Module lessons (4/4)

Backtracking: an overview

When a greedy quantifier "overshoots", the engine performs backtracking: it steps back, tries another configuration, retries. On simple patterns it is invisible; on poorly written patterns it can become exponential and freeze the browser for seconds. It is the root of ReDoS (Regular expression Denial of Service).

Code
Pattern: .*\d
Sample:  abc123def
Steps:
  1.  .*  consumes everything: "abc123def"
  2.  \d  fails (end of string).
  3.  Backtrack: .* drops 'f' -> "abc123de", \d fails on 'f'.
  4.  Backtrack: .* drops 'ef' -> "abc123d", \d fails on 'e'.
  5.  ... and so on until \d finds a digit: match = "abc123" + "?"... actually
       the pattern matches "abc123" because .* leaves "def" and \d looks at '3'? No: the
       backtrack continues and .* ends up being "abc12", \d=3.  Final match: "abc123".

The engine always gets there, but if a greedy quantifier contains another greedy quantifier (e.g. (a+)+), the number of configurations to try can explode.

Catastrophic patterns

Code
Pattern: (a+)+b
Sample:  aaaaaaaaaaaaaaaaaaaaa

No trailing b → the pattern fails, but to get there the engine tries every possible partition of the as. Exponential growth: with 30 as you can wait for hours.

Perils of backtracking and ReDoS

When a pattern contains overlapping or nested quantifiers (e.g. (a+)+), the engine evaluates an exponential number of paths before failing. Replacing generic wildcards with excluding character classes dramatically reduces backtracking paths.

Try it

Exercise#regex.m2.l4.e1
Attempts: 0Loading…

Extract the content between square brackets for every occurrence in the text. Use the lazy version of the quantifier to avoid jumping to later closings.

Loading editor…
Show hint

\\[.*?\\] stops at the first closing ]: no heavy backtracking.

Solution available after 3 attempts

Review exercise

Exercise#regex.m2.l4.e2
Attempts: 0Loading…

Find every `http://...` or `https://...` URL in the text, using the `?` quantifier for the optional `s` and `\\S+` for the non-space characters of the path.

Loading editor…
Show hint

https? means 'http' followed by an optional 's'. Then //, then \\S+ (non-spaces).

Solution available after 3 attempts

Additional challenge

Exercise#regex.m2.l4.e3
Attempts: 0Loading…

Match strings enclosed in double quotes, avoiding excessive backtracking by using the negated class `[^"]*` instead of `.*`.

Loading editor…
Show hint

Replace the dot with the negated class [^"] to restrict the engine's backtracking paths.

Solution available after 3 attempts