முக்கிய உள்ளடக்கத்திற்குச் செல்லவும்
eLearner.app
தொகுதி 2 · பாடம் 4 இன் 4பாடத்திட்டத்தில் 8/32~12 min
தொகுதி பாடங்கள் (4/4)

பின்னோட்டம்: ஒரு கண்ணோட்டம்

When a greedy quantifier "overshoots", the engine performs backtracking: it steps back, tries another configuration, retries. On simple patterns it is invisible; on poorly written patterns it can become exponential and freeze the browser for seconds. It is the root of ReDoS (Regular expression Denial of Service).

Code
Pattern: .*\d
Sample:  abc123def
Steps:
  1.  .*  consumes everything: "abc123def"
  2.  \d  fails (end of string).
  3.  Backtrack: .* drops 'f' -> "abc123de", \d fails on 'f'.
  4.  Backtrack: .* drops 'ef' -> "abc123d", \d fails on 'e'.
  5.  ... and so on until \d finds a digit: match = "abc123" + "?"... actually
       the pattern matches "abc123" because .* leaves "def" and \d looks at '3'? No: the
       backtrack continues and .* ends up being "abc12", \d=3.  Final match: "abc123".

The engine always gets there, but if a greedy quantifier contains another greedy quantifier (e.g. (a+)+), the number of configurations to try can explode.

Catastrophic patterns

Code
Pattern: (a+)+b
Sample:  aaaaaaaaaaaaaaaaaaaaa

No trailing b → the pattern fails, but to get there the engine tries every possible partition of the as. Exponential growth: with 30 as you can wait for hours.

Perils of backtracking and ReDoS

When a pattern contains overlapping or nested quantifiers (e.g. (a+)+), the engine evaluates an exponential number of paths before failing. Replacing generic wildcards with excluding character classes dramatically reduces backtracking paths.

Try it

உடற்பயிற்சி#regex.m2.l4.e1
முயற்சிகள்: 0ஏற்றுகிறது…

Extract the content between square brackets for every occurrence in the text. Use the lazy version of the quantifier to avoid jumping to later closings.

எடிட்டரை ஏற்றுகிறது…
குறிப்பைக் காட்டு

\\[.*?\\] stops at the first closing ]: no heavy backtracking.

3 முயற்சிகளுக்குப் பிறகு தீர்வு கிடைக்கும்

Review exercise

உடற்பயிற்சி#regex.m2.l4.e2
முயற்சிகள்: 0ஏற்றுகிறது…

Find every `http://...` or `https://...` URL in the text, using the `?` quantifier for the optional `s` and `\\S+` for the non-space characters of the path.

எடிட்டரை ஏற்றுகிறது…
குறிப்பைக் காட்டு

https? means 'http' followed by an optional 's'. Then //, then \\S+ (non-spaces).

3 முயற்சிகளுக்குப் பிறகு தீர்வு கிடைக்கும்

Additional challenge

உடற்பயிற்சி#regex.m2.l4.e3
முயற்சிகள்: 0ஏற்றுகிறது…

Match strings enclosed in double quotes, avoiding excessive backtracking by using the negated class `[^"]*` instead of `.*`.

எடிட்டரை ஏற்றுகிறது…
குறிப்பைக் காட்டு

Replace the dot with the negated class [^"] to restrict the engine's backtracking paths.

3 முயற்சிகளுக்குப் பிறகு தீர்வு கிடைக்கும்