
Cursor Composer 2.5: A Kimi K2.5-Powered Coding Agent That Rivals Opus 4.7 at One-Tenth the Cost


On May 18, 2026, Cursor shipped Composer 2.5, an AI coding agent built on the Kimi K2.5 base model from Moonshot AI. With a score of 79.8% on SWE-Bench Multilingual and input pricing starting at $0.50 per million tokens, Cursor is positioning Composer 2.5 as a cost-efficient alternative to frontier coding models for long, complex coding sessions.


What Changed from Composer 2

Composer 2.5 keeps Kimi K2.5 as the base model but applies substantially heavier post-training. Cursor reports that 85% of the total compute budget went to its own reinforcement learning pipeline and post-training work. The model was trained on 25 times more synthetic tasks than its predecessor, using harder reinforcement learning environments alongside targeted textual feedback.


In March 2026, Cursor faced community criticism when users discovered that Composer 2 was built on Kimi K2.5, a fact Cursor had not clearly disclosed. Cursor addressed this directly: the Composer 2.5 announcement named Kimi K2.5 in its opening paragraph.


Benchmark Results

Composer 2.5 scores 79.8% on SWE-Bench Multilingual and 63.2% on CursorBench v3.1. Cursor reports that these results are competitive with Claude Opus 4.7 and GPT-5.5 on the same benchmarks, at approximately one-tenth the cost per token.


Standard pricing is $0.50 per million input tokens and $2.50 per million output tokens. This makes extended coding sessions with Composer 2.5 significantly cheaper than comparable sessions with frontier models at full Opus or GPT-5.5 pricing.
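To see what those per-token prices mean for a long session, the cost is just each token count divided by one million, multiplied by the matching price. The sketch below is a minimal illustration assuming the standard prices above; the session size (2M input tokens, 200k output tokens) is hypothetical, and the "10x" comparison simply applies Cursor's reported one-tenth ratio rather than any published Opus or GPT-5.5 price.

```python
# Standard Composer 2.5 prices as reported in the announcement (USD per 1M tokens).
COMPOSER_25_INPUT_PER_M = 0.50
COMPOSER_25_OUTPUT_PER_M = 2.50


def session_cost(input_tokens: int, output_tokens: int,
                 input_per_m: float = COMPOSER_25_INPUT_PER_M,
                 output_per_m: float = COMPOSER_25_OUTPUT_PER_M) -> float:
    """Estimate the USD cost of a session from token counts and per-million prices."""
    input_cost = (input_tokens / 1_000_000) * input_per_m
    output_cost = (output_tokens / 1_000_000) * output_per_m
    return input_cost + output_cost


if __name__ == "__main__":
    # Hypothetical long agent session: 2M input tokens, 200k output tokens.
    cost = session_cost(2_000_000, 200_000)
    print(f"Composer 2.5: ${cost:.2f}")  # Composer 2.5: $1.50
    # Assumption: a model costing roughly 10x per token, per the "one-tenth" claim,
    # using the same token counts.
    print(f"Hypothetical 10x-priced model: ${cost * 10:.2f}")  # Hypothetical 10x-priced model: $15.00
```

Input tokens usually dominate agent sessions because context is resent on every tool call, so the low input price matters as much as the output price here.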


What It Handles Better

Cursor specifically designed Composer 2.5 for longer coding jobs. The agent targets tasks that require sustained context over a long session: multi-file refactors, debugging sessions that span many tool calls, and implementation work that requires maintaining coherent state across dozens of steps.


For indie developers and smaller teams working in Cursor, the cost-performance profile is the main value proposition. Benchmark results that Cursor reports as competitive with more expensive frontier models, combined with pricing that makes extended sessions economically viable, make Composer 2.5 worth evaluating for production coding workflows.


Considerations for Engineering Teams

Benchmark scores and production performance can diverge. SWE-Bench Multilingual measures completion of a specific set of tasks, and your coding workflows may have different characteristics. Before adopting Composer 2.5 for production use, run a structured evaluation on representative tasks from your actual codebase.
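A structured evaluation can be as simple as recording, for each task, whether the agent's change passed your own test suite and how many tokens it consumed, then comparing pass rate and cost per solved task across models. The sketch below is a minimal illustration under those assumptions; `TaskResult` and `summarize` are hypothetical names, not part of any Cursor API, and the token counts come from however you log your sessions.

```python
from dataclasses import dataclass


@dataclass
class TaskResult:
    """One model's outcome on one task from your own codebase (recorded by you)."""
    task_id: str
    passed: bool        # e.g. did the project's test suite pass after the agent's change?
    input_tokens: int
    output_tokens: int


def summarize(results: list[TaskResult], input_per_m: float, output_per_m: float) -> dict:
    """Compute pass rate, total cost, and cost per solved task for one model."""
    solved = sum(r.passed for r in results)
    total_cost = sum(
        r.input_tokens / 1_000_000 * input_per_m
        + r.output_tokens / 1_000_000 * output_per_m
        for r in results
    )
    return {
        "tasks": len(results),
        "pass_rate": solved / len(results) if results else 0.0,
        "total_cost": total_cost,
        # Cost per solved task penalizes a cheap model that fails often.
        "cost_per_solved": total_cost / solved if solved else float("inf"),
    }


if __name__ == "__main__":
    # Placeholder results for illustration only; substitute your own logged runs.
    runs = [
        TaskResult("refactor-auth", True, 1_200_000, 80_000),
        TaskResult("fix-flaky-test", False, 600_000, 40_000),
        TaskResult("add-endpoint", True, 800_000, 120_000),
    ]
    print(summarize(runs, input_per_m=0.50, output_per_m=2.50))
```

Running the same task list through each candidate model and comparing `cost_per_solved` gives a single number that accounts for both price and reliability, which a per-token price alone does not.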


The Kimi K2.5 attribution situation also illustrates a broader point about AI coding tools: the base model behind a product can change, and the post-training work that differentiates it may be substantial or minimal. Treat benchmark claims as a starting point for your own evaluation, not a final answer.

