Validating an email
Validating an email address "correctly" is much harder than it seems: the grammar in RFC 5322 is a monster. In practice we pick a good-enough pattern that accepts the common cases and rejects the obviously wrong ones.
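Pattern: ^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$
- ^ and $ anchor the match to the whole string.
- [\w.+-]+ matches the local part: letters, digits, _, ., + and -.
- @ is the separator.
- [\w-]+(?:\.[\w-]+)+ matches the domain followed by one or more dot-separated labels, so at least one dot (a TLD) is required.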
What it accepts: mario.rossi@example.com, foo+bar@sub.example.co.uk, user_123@test-domain.io.
What it rejects: mario@, @example.com, mario@example (no TLD), and any address containing spaces.
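The following JavaScript sketch checks the pattern against the examples above. JavaScript is an assumption, based on the u flag mentioned later in this lesson. The address containing a space is an illustrative input and does not come from the lesson.

```javascript
// The lesson's pattern. In JavaScript, \w means [A-Za-z0-9_].
const emailPattern = /^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$/;

const accepted = [
  "mario.rossi@example.com",
  "foo+bar@sub.example.co.uk",
  "user_123@test-domain.io",
];
const rejected = [
  "mario@",                  // empty domain
  "@example.com",            // empty local part
  "mario@example",           // no TLD
  "mario rossi@example.com", // illustrative: contains a space
];

for (const addr of accepted) console.log(addr, emailPattern.test(addr)); // true for each
for (const addr of rejected) console.log(addr, emailPattern.test(addr)); // false for each
```

The regex deliberately has no g flag. With g, test() keeps lastIndex between calls, so reusing the same regex on several strings can give alternating results.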
Trade-offs
The pattern above does not accept:
- Quoted strings such as "strange (things)"@example.com (the RFC allows them).
- Addresses with Unicode characters (über@münchen.de).
- Internationalized TLDs, whose characters don't match \w.
If you need Unicode addresses, widen the character classes with \p{L} (any Unicode letter), which requires the u flag. For the full RFC grammar, delegate to a specialized library.
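Here is a sketch of the Unicode widening in JavaScript (an assumed language). It adds \p{L} to each character class and sets the u flag. The test strings use \u00fc escapes for ü, so the result does not depend on how the source file is normalized.

```javascript
// Original pattern: in JavaScript, \w does not match letters like ü, even with the u flag.
const asciiPattern = /^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$/;

// Widened sketch: \p{L} = any Unicode letter; \p{...} escapes require the u flag.
const unicodePattern = /^[\p{L}\w.+-]+@[\p{L}\w-]+(?:\.[\p{L}\w-]+)+$/u;

const umlaut = "\u00fcber@m\u00fcnchen.de"; // über@münchen.de

console.log(asciiPattern.test(umlaut));            // false
console.log(unicodePattern.test(umlaut));          // true
console.log(unicodePattern.test("mario@example")); // false: a dot is still required
```

This pattern still rejects quoted local parts such as "strange (things)"@example.com. It also fails on a letter written with a separate combining accent, because combining marks are \p{M} and not \p{L}. For cases like these, a library remains the safer choice.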
Trade-offs and best practices for email validation
No regex can guarantee that an email address actually exists. Overly complex patterns degrade performance and can still exclude unusual but valid domains. It is better to use a simple pattern that rejects obvious typos, then send a verification code to confirm the address.
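The "simple pattern, then verify" advice can be sketched as follows for Node.js (an assumed environment). The function startEmailVerification is hypothetical. Delivering the code (via SMTP, an email API, or similar) is left out. The only real library call is crypto.randomInt from Node.js.

```javascript
const { randomInt } = require("crypto");

// Good-enough format check from this lesson: it only catches obvious typos.
const simpleEmail = /^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$/;

// Hypothetical helper: validate the format, then create a one-time code.
// Sending the code to the user is out of scope for this sketch.
function startEmailVerification(input) {
  const email = input.trim();
  if (!simpleEmail.test(email)) {
    return null; // obvious typo: reject before sending anything
  }
  // randomInt(min, max) returns a cryptographically secure integer in [min, max).
  const code = String(randomInt(0, 1000000)).padStart(6, "0");
  return { email, code };
}

console.log(startEmailVerification("mario@example")); // null
console.log(startEmailVerification(" mario.rossi@example.com ")); // email plus a random 6-digit code
```

Only a correctly returned code shows that the address exists and is reachable. The regex merely avoids sending a message to an address that cannot possibly work.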
Try it
Recognize every email in the text. Use a simple pattern: letters/digits/dots before the @, domain with at least one dot.
Hint: widen the local part to [\w.+-]+ and the domain to [\w-]+\.[\w.-]+ so that domains with several dot-separated labels (e.g. sub.example.co.uk) are accepted.
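To find every address in a text in JavaScript (an assumed language), drop the ^/$ anchors and add the g flag. String.prototype.match then returns all matches. The sample text is illustrative.

```javascript
// Hint pattern, unanchored, with g to collect every occurrence.
const emailInText = /[\w.+-]+@[\w-]+\.[\w.-]+/g;

const text = "Write to mario.rossi@example.com or foo+bar@sub.example.co.uk today";
const found = text.match(emailInText) ?? []; // match() returns null when nothing is found

console.log(found); // ["mario.rossi@example.com", "foo+bar@sub.example.co.uk"]
```

The last class, [\w.-]+, also accepts dots. As a result, an address at the end of a sentence (mario@example.com.) is matched together with the final period, so strip trailing dots if that matters.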
Review exercise
Capture the domain (the part after @) of every email as a named group `dominio`.
Hint: wrap the domain part in a named group: (?<dominio>[\w.-]+).
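This JavaScript sketch (assumed language) reads the named group through matchAll, which requires the g flag. The local-part class is borrowed from the lesson's pattern, and the sample text is illustrative.

```javascript
// Local part as in the lesson; the domain is captured in the named group "dominio".
const withDomain = /[\w.+-]+@(?<dominio>[\w.-]+)/g;

const text = "Contacts: mario.rossi@example.com, user_123@test-domain.io";

// matchAll yields one match object per address; groups.dominio holds the domain.
const domains = [...text.matchAll(withDomain)].map((m) => m.groups.dominio);

console.log(domains); // ["example.com", "test-domain.io"]
```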
Additional challenge
Match the domain part of an email (everything after `@`) without including the `@` sign in the match, using a lookbehind.
Hint: use a positive lookbehind (?<=@) in front of the domain character class.
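A lookbehind checks the `@` without consuming it, so the match itself contains only the domain. Here is a JavaScript sketch (assumed language; the sample text is illustrative):

```javascript
// (?<=@) is zero-width: it requires an "@" just before the match without including it.
const domainOnly = /(?<=@)[\w.-]+/g;

const text = "Contacts: mario.rossi@example.com, user_123@test-domain.io";
console.log(text.match(domainOnly)); // ["example.com", "test-domain.io"]
```

This pattern does not check that a local part exists, so a bare @example.com also yields example.com. Lookbehind also requires a reasonably recent JavaScript engine.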