
SubQ: The First Commercial Subquadratic LLM with a 12 Million Token Context Window

  • 6 days ago
  • 2 min read

On May 5, 2026, Miami-based startup Subquadratic launched SubQ 1M-Preview, which the company describes as the first commercially available large language model built on a fully subquadratic attention architecture. Backed by $29 million in seed funding and led by CEO Justin Dangel and CTO Alexander Whedon, formerly Head of GenAI at Meta, Subquadratic is targeting one of the hardest practical bottlenecks in LLM deployment: making very long context practical and affordable.


What Subquadratic Means

Standard transformer attention is quadratic: its compute and memory requirements scale with the square of the input length, so doubling the context length roughly quadruples the attention cost. This is why million-token context windows have existed in theory but remained impractical for most production workloads.
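The quadratic scaling can be checked with a few lines of arithmetic. The sketch below is a simplified cost model, not a measurement: it counts only the query-key pairs that full attention scores, and the function name attention_pairs is illustrative.

```python
def attention_pairs(n_tokens: int) -> int:
    """Number of query-key pairs scored by full (dense) attention."""
    return n_tokens * n_tokens


for n in (128_000, 256_000, 1_000_000):
    # Ratio of pair counts when the context length is doubled.
    ratio = attention_pairs(2 * n) / attention_pairs(n)
    print(f"{n:>9,} tokens: {attention_pairs(n):.2e} pairs; doubling costs {ratio:.0f}x")
```

Every row reports a 4x increase on doubling, regardless of the starting length. At 1 million tokens the model counts 10^12 pairs per attention layer and head.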


SubQ uses an architecture Subquadratic calls SSA, or Subquadratic Selective Attention. According to the company, SSA scales linearly with input length and spends compute only on the token relationships that actually matter, rather than computing full pairwise attention across all tokens. The claimed result is dramatically lower cost at long contexts.
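This article does not describe how SSA decides which token relationships matter, so the sketch below is not SSA. It is a generic illustration of sparse attention under one stated assumption: some selection step has already produced, for each query, a short list of key indices to attend to. All names (attend, sparse_attention, dot_products_used, selection) are hypothetical.

```python
import math


def attend(query, keys, values, selected):
    """Scaled dot-product attention for one query over the `selected` key indices only."""
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, keys[j])) for j in selected]
    peak = max(scores)  # subtract the max before exponentiating, for stability
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)
    return [
        sum(w * values[j][d] for w, j in zip(weights, selected)) / total
        for d in range(len(values[0]))
    ]


def sparse_attention(queries, keys, values, selection):
    """selection[i] lists the key indices that query i attends to (assumed given)."""
    return [attend(q, keys, values, selection[i]) for i, q in enumerate(queries)]


def dot_products_used(selection):
    """Work proxy: one query-key dot product per selected index."""
    return sum(len(indices) for indices in selection)


keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
values = [[1.0], [2.0], [3.0], [4.0]]
queries = keys

dense = [list(range(4))] * 4               # every query sees every key
sparse = [[0, 2], [1, 2], [2, 3], [3, 0]]  # hypothetical budget of 2 keys per query

print(dot_products_used(dense), dot_products_used(sparse))  # prints "16 8"
print(sparse_attention(queries, keys, values, sparse))
```

With a fixed budget of k keys per query, the work is n × k dot products, which grows linearly in n, versus n × n for dense attention. The hard part, and the part this sketch leaves out, is choosing the selection without itself scoring every pair.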


Performance Claims

Subquadratic reports that SubQ's sparse attention is 52 times faster than FlashAttention in an architecture-level comparison at 1 million tokens. On the RULER 128K long-context benchmark, the company reports 95.6% accuracy. Third-party coverage puts the cost of that run at $8, a figure that should be treated as company-reported until independently reproduced. By comparison, Subquadratic reports that reaching similar accuracy with Claude Opus cost approximately $2,600 on the same benchmark, a roughly 300-fold cost reduction.
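The 300-fold figure follows directly from the two reported dollar amounts. The short check below uses only the numbers quoted above; neither has been independently reproduced, and the variable names are illustrative.

```python
subq_cost_usd = 8        # reported via third-party coverage, not independently reproduced
opus_cost_usd = 2_600    # company-reported comparison figure

reduction = opus_cost_usd / subq_cost_usd
print(f"Cost reduction: {reduction:.0f}x")  # prints "Cost reduction: 325x"
```

The exact ratio is 325, which the claim rounds down to roughly 300.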


At the full 12-million-token context window, Subquadratic claims compute requirements are reduced by approximately 1,000 times compared with other frontier models. The model is entering private beta with an API, a coding agent called SubQ Code, and a long-context search tool called SubQ Search.


Why This Architecture Matters

The practical use cases for a 12-million-token context window are substantial: an entire large codebase in a single prompt, years of customer interaction history for a support agent, or full legal document sets for analysis. These have been technically possible but economically impractical with standard transformer architectures.


If SubQ's performance claims hold up under independent evaluation, it could change the economics of long-context applications significantly. The key question for engineers evaluating it is whether the selective attention mechanism preserves accuracy on tasks that require genuine cross-document reasoning, not just retrieval from a specific location.
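One way to probe that distinction is to build two kinds of synthetic prompts: one where the answer sits in a single sentence, and one where it requires combining two facts placed independently in the context. The sketch below is a minimal harness under stated assumptions. The probe wording, the function names, and the ask_model(context, question) callable are all hypothetical, and stub_model stands in for a real API call only so the example runs.

```python
import random
import re


def build_retrieval_probe(filler: str, n_fillers: int, seed: int = 0):
    """Single-hop probe: one fact hidden at a random position in the context."""
    rng = random.Random(seed)
    code = str(rng.randint(1000, 9999))
    docs = [filler] * n_fillers
    docs.insert(rng.randrange(len(docs) + 1), f"The access code for vault A is {code}.")
    return "\n".join(docs), "What is the access code for vault A?", code


def build_two_hop_probe(filler: str, n_fillers: int, seed: int = 0):
    """Two-hop probe: the answer needs two facts placed independently."""
    rng = random.Random(seed)
    code = str(rng.randint(1000, 9999))
    docs = [filler] * n_fillers
    for fact in ("Vault B is owned by Dana.", f"Dana's access code is {code}."):
        docs.insert(rng.randrange(len(docs) + 1), fact)
    return "\n".join(docs), "What is the access code of the owner of vault B?", code


def accuracy(ask_model, probes) -> float:
    """Fraction of probes whose expected answer appears in the model's reply."""
    hits = sum(expected in ask_model(context, question)
               for context, question, expected in probes)
    return hits / len(probes)


def stub_model(context: str, question: str) -> str:
    """Placeholder for a real API call: returns the first 4-digit number it sees."""
    match = re.search(r"\b\d{4}\b", context)
    return match.group(0) if match else ""


filler = "This paragraph is unrelated filler text."
probes = [build_two_hop_probe(filler, 50, seed=s) for s in range(5)]
print(accuracy(stub_model, probes))
```

Swapping stub_model for a function that calls the model under test, and comparing accuracy on the two probe types as n_fillers grows, gives a rough signal of whether multi-fact reasoning degrades faster than single-fact retrieval. Real workloads will need more varied facts and distractors than this toy setup.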


What to Watch

SubQ is entering private beta, not broad public release. Independent benchmarks on tasks other than RULER 128K will be the signal to watch. The startup's claim that its cost is roughly one-fifth that of frontier models such as Claude Opus or GPT-5.5 would represent a structural shift in long-context pricing if validated. Engineers interested in long-context applications should request API access and test against their own workloads.

