🔍 Finding All Regex Matches Has Always Been O(n²)

5 STAR | Source: iev.ee | Date: 2026-03-24

Summary

Search a document for a pattern and it takes a second. Search one a hundred times larger and it doesn't take a hundred seconds — it can take almost three hours. Every regex engine, in every language, has had this problem since the 1970s, and nobody fixed it.

The Problem

Every regex engine that advertises linear-time matching — RE2, Go's regexp, rust's regex crate, .NET's NonBacktracking mode — means linear time for a single match. The moment you call find_iter or FindAll, that guarantee is gone.

Classic Example

Pattern: .*a|b, Haystack: n b's

Why It Went Unnoticed

Backtracking Is Even Worse

With backtracking, a user-supplied pattern and a 50-character input can take longer than the heat death of the universe — exponential time. Thompson published the NFA construction that avoids this in 1968. That's nearly 60 years of a solved problem being actively unsolved at scale.

Existing Solutions

Aho-Corasick (1975)

Solved linear-time matching for fixed strings only. Builds a trie from patterns, adds failure links, scans input once. Cannot handle regex.

Hyperscan/Vectorscan

True linear-time all-matches regex engine using "earliest match" semantics. Changes results — reports match the moment DFA enters match state instead of longest.

RE#: Two Pass Solution

RE# is the first regex engine that can find all matches in two passes, regardless of pattern or input, without altering semantics:

  1. Reverse DFA marks where matches could start
  2. Forward DFA resolves the longest match at each marked position

Performance Results

Input SizeNormalHardenedSpeedup
1,0000.7ms28μs25x
10,00073ms303μs241x
50,0001.8s1.6ms1,125x

Key Insight

The quadratic blowup requires a pathological pattern AND a structured input that's long enough. Take a pattern like [A-Z][a-z]+: every match starts at an uppercase letter and ends when engine sees something that isn't lowercase. No ambiguity, no quadratic case possible.

Tags

Regex Algorithm Time Complexity RE# Performance

🔗 Original Article