
Cursor "Included-Request Usage" on Enterprise Plans

At a Glance

Cursor's "included-request usage" is not a count of requests — it's a dollar amount of API token consumption included in your plan. Each team seat ($40/mo) includes $20/mo of usage charged at model API rates per million tokens. Enterprise plans offer pooled usage (shared across the whole team) instead of per-user allocation. The single biggest lever to minimize usage is model selection: Claude 4.6 Opus output costs 8.3x as much per token as Gemini 3 Flash.


Quotes

"Each plan includes usage charged at model inference API prices."

— Cursor Docs, Pricing [1]

"We work hard to grant additional bonus capacity beyond the guaranteed included usage. Since different models have different API costs, your model selection affects token output and how quickly your included usage is consumed."

— Cursor Docs, Pricing [1]

"All non-Auto agent requests include a $0.25 per million tokens fee. This covers: Semantic search, Custom model execution (Tab, Apply, etc.), Infrastructure and processing costs."

— Cursor Docs, Team Pricing [2]

"Max Mode uses token-based pricing at the model's API rate plus a 20% upcharge."

— Cursor Docs, Models [3]

"Member spend limits on Enterprise pooled usage accounts apply to total usage, not just on-demand usage."

— Cursor Docs, Spend Limits [4]

Sam's TL;DR

"Request usage" = dollars of tokens consumed, not number of requests. Your plan includes a dollar budget that gets eaten by every agent interaction at the model's per-token API rate. Expensive models (Opus) burn through it fast; cheap models (Gemini Flash, Grok Code) stretch it. Enterprise's pooled usage is the key differentiator — your team shares one big bucket instead of each person having a tiny $20 bucket. To minimize: pick cheaper models for routine work, avoid Max Mode, keep conversations short, and use Auto mode for balanced cost/quality.

Key Points

What "Included Usage" Actually Is

Included usage is a dollar budget, not a request counter: every Agent interaction draws it down at the model's per-token API rate, so two identical "requests" can cost very different amounts depending on the model selected and the context attached [1].
How Usage Is Calculated

Every interaction with the AI consumes tokens across four categories, each priced differently [3]:

| Token Type | Description | Relative Cost |
| --- | --- | --- |
| Input | Your prompt, attached files, context | Base rate |
| Cache Write | First-time context processing | 1.25x input rate |
| Cache Read | Re-used context from prior turns | ~10% of input rate |
| Output | Model's response (code, text) | 3-5x input rate |

On Teams/Enterprise, there's also a Cursor Token Fee of $0.25 per million tokens (all token types) on non-Auto requests, covering semantic search, Apply, and infrastructure [2].
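As a rough sketch, a single request's cost follows directly from these four token counts plus the token fee. The rates below are the Claude 4.6 Sonnet figures from the pricing table that follows (cache write at 1.25x the input rate), the 20% multiplier is the Max Mode upcharge discussed below, and all function and variable names are illustrative, not part of any Cursor API.

```python
# Sketch: dollar cost of one agent request from per-category token counts.
# Rates are per million tokens (Claude 4.6 Sonnet figures from the table
# below); names are illustrative.

RATES = {
    "input": 3.00,
    "cache_write": 3.00 * 1.25,  # 1.25x the input rate
    "cache_read": 0.30,          # ~10% of the input rate
    "output": 15.00,
}
CURSOR_TOKEN_FEE = 0.25  # $/M tokens, all token types, non-Auto requests

def request_cost(tokens: dict, max_mode: bool = False, auto: bool = False) -> float:
    """Cost in dollars of one request, given per-category token counts."""
    model_cost = sum(RATES[k] * n / 1_000_000 for k, n in tokens.items())
    if max_mode:
        model_cost *= 1.20  # Max Mode: 20% upcharge on all token costs
    fee = 0.0 if auto else CURSOR_TOKEN_FEE * sum(tokens.values()) / 1_000_000
    return model_cost + fee

# One turn: 10k fresh input, 40k cached context re-read, 2k output
turn = {"input": 10_000, "cache_write": 0, "cache_read": 40_000, "output": 2_000}
print(f"${request_cost(turn):.4f}")                 # $0.0850
print(f"${request_cost(turn, max_mode=True):.4f}")  # $0.0994
```

Note how the cache-read discount matters: 40k cached tokens cost $0.012 here, versus $0.12 if re-sent at the full input rate.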

Model Pricing Comparison (per 1M tokens)

| Model | Input | Cache Read | Output | Output Cost vs Flash |
| --- | --- | --- | --- | --- |
| Claude 4.6 Opus | $5.00 | $0.50 | $25.00 | 8.3x |
| Claude 4.6 Sonnet | $3.00 | $0.30 | $15.00 | 5x |
| Composer 1.5 | $3.50 | $0.35 | $17.50 | 5.8x |
| Gemini 3.1 Pro | $2.00 | $0.20 | $12.00 | 4x |
| GPT-5.2 / 5.3 Codex | $1.75 | $0.175 | $14.00 | 4.7x |
| Auto mode | $1.25 | $0.25 | $6.00 | 2x |
| Gemini 3 Flash | $0.50 | $0.05 | $3.00 | 1x (baseline) |
| Grok Code | $0.20 | $0.02 | $1.50 | 0.5x (cheapest) |

Enterprise-Specific Features

Enterprise replaces per-user allocations with pooled usage shared across the team, and adds per-member spend limits, billing groups for departmental chargebacks, model access restrictions, and programmatic control via the Admin API [5].
Typical Usage Levels [1]

| User Type | Typical Monthly Usage |
| --- | --- |
| Daily Tab users only | Under $20 |
| Limited Agent users | Often within $20 |
| Daily Agent users | $60–$100/mo |
| Power users (multi-agent/automation) | $200+/mo |
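To put the $20 included budget in terms of the pricing table, here is a back-of-the-envelope count of how many agent turns it buys per model. The 10k-input / 2k-output "typical turn" is a hypothetical assumption, and caching and the token fee are ignored for simplicity.

```python
# Sketch: how many "typical" agent turns $20 of included usage buys.
# Assumes a hypothetical 10k-input / 2k-output turn; ignores caching
# and the Cursor Token Fee. Rates are from the pricing table ($/M tokens).
PRICES = {  # model: (input rate, output rate)
    "Claude 4.6 Opus": (5.00, 25.00),
    "Claude 4.6 Sonnet": (3.00, 15.00),
    "Auto mode": (1.25, 6.00),
    "Gemini 3 Flash": (0.50, 3.00),
    "Grok Code": (0.20, 1.50),
}
BUDGET = 20.00  # included usage per seat per month

for model, (inp, out) in PRICES.items():
    per_turn = inp * 10_000 / 1e6 + out * 2_000 / 1e6
    print(f"{model:18s} ${per_turn:.4f}/turn -> ~{BUDGET / per_turn:,.0f} turns")
```

Under these assumptions, $20 covers roughly 200 Opus turns but around 1,800 Flash turns, which is why a daily Opus user lands in the $60-$100/mo band while a Flash user may stay inside the included budget.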

How to Minimize Request Usage

1. Choose Cheaper Models for Routine Work

The single biggest lever. Grok Code output costs $1.50/M tokens vs Claude Opus at $25/M tokens — a 16.7x difference. Use Opus only for complex architectural decisions; use Flash or Grok for routine edits, refactoring, and simple tasks [3].

2. Use Auto Mode

At $1.25/M input and $6/M output, Auto mode is cheaper than manually selecting most premium models. Cursor dynamically picks the best model for each task, optimizing for both quality and cost [3].

3. Avoid Max Mode Unless Necessary

Max Mode extends context to a model's maximum (up to 1M tokens for Claude) but adds a 20% upcharge on all token costs. Default 200k context is sufficient for most tasks [3].

4. Keep Conversations Short

Each message in a conversation carries forward the full context of all previous messages. Starting new conversations resets context and avoids ballooning input token costs. Long conversations compound rapidly because every new message re-sends the entire history.
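The compounding above can be made concrete: if every turn re-sends the full history, cumulative input tokens grow quadratically with turn count. The per-turn token figure below is illustrative.

```python
# Sketch: why long conversations compound. Each turn re-sends all prior
# turns, so cumulative input tokens grow quadratically with turn count.
# The 1,500 tokens of new history per turn is an illustrative assumption.

def cumulative_input_tokens(turns: int, tokens_per_turn: int = 1_500) -> int:
    """Total input tokens sent over a conversation of `turns` messages."""
    # Turn k sends its own message plus the (k - 1) turns before it.
    return sum(k * tokens_per_turn for k in range(1, turns + 1))

for n in (5, 20, 50):
    print(f"{n:3d} turns -> {cumulative_input_tokens(n):,} input tokens")
```

In practice cache reads (at ~10% of the input rate) soften the bill, but the quadratic growth in re-sent context is why starting a fresh conversation is cheaper than continuing a long one.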

5. Write Specific, Targeted Prompts

Vague prompts cause the model to explore broadly and generate more output tokens. Precise instructions ("change line 42 of app.js to use async/await") produce less output than "refactor this file to be more modern."

6. Leverage Tab Completions

Tab completions are unlimited on Pro and above and don't consume request usage in the same way as Agent requests [1]. For small, predictable edits, Tab is essentially free.

7. Use Cursor Rules and Context Wisely

Good .cursorrules and project rules mean the model needs fewer rounds of clarification, reducing total token consumption per task.
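A minimal project rules file along these lines might look like the following; the contents are illustrative, not taken from Cursor's docs.

```
# .cursorrules (illustrative example)
- Use TypeScript strict mode; prefer async/await over raw Promise chains.
- Follow the existing layout under src/; do not create new top-level dirs.
- When asked for a fix, change only the named file unless told otherwise.
- Keep responses short; output diffs rather than whole files for large edits.
```

Front-loading conventions like these means the model needs fewer clarification rounds and generates fewer exploratory output tokens per task.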

8. Set Enterprise Spend Controls

Per-member spend limits cap what any individual can consume; on Enterprise pooled usage accounts they apply to total usage, not just on-demand usage [4]. Pair them with model access restrictions so routine work cannot land on the most expensive models.
9. Monitor Via Dashboard and API

Review consumption in the Analytics Dashboard to spot heavy users and expensive-model habits, and pull the same data programmatically via the Admin API for alerting or departmental chargeback reporting [5].
Full Summary

Cursor has moved from a request-count model to a token-based, dollar-denominated usage system. Every AI interaction (Agent chat, Composer, Cloud Agents) consumes tokens — input tokens for your prompt and context, output tokens for the model's response — priced at each model's public API rate per million tokens.

For Teams plan ($40/user/mo): each user gets $20/mo of included usage. When that's exhausted, on-demand usage continues automatically at the same rates, billed monthly. Usage is isolated per user — one person's heavy usage doesn't affect another's allocation.

For Enterprise plan (custom pricing): the key differentiator is pooled usage. Instead of each user having an individual $20 bucket, the entire team shares a negotiated pool. This means a team of 50 where only 15 are heavy Agent users can effectively subsidize those 15 with the unused capacity of the other 35. Enterprise also adds per-member spend limits, billing groups for departmental chargebacks, model access restrictions to prevent users from selecting expensive models, and full programmatic control via the Admin API.
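The pooled-usage arithmetic in that 50-seat example can be sketched directly. The heavy/light spend figures come from the "typical usage" table; the 15/35 split is hypothetical.

```python
# Sketch: per-user buckets (Teams) vs one shared pool (Enterprise) for a
# 50-seat team. Spend figures follow the "typical usage" table; the 15/35
# split of the team is a hypothetical assumption.
SEATS = 50
INCLUDED_PER_SEAT = 20.00  # $/user/mo of included usage on Teams

heavy_users, heavy_spend = 15, 80.00  # daily Agent users (~$60-$100/mo)
light_users, light_spend = 35, 10.00  # Tab-only / limited Agent users

pool = SEATS * INCLUDED_PER_SEAT  # Enterprise: one shared bucket
demand = heavy_users * heavy_spend + light_users * light_spend

# Teams: each heavy user overflows their own $20 bucket individually,
# and light users' unused capacity helps no one.
teams_on_demand = heavy_users * max(0.0, heavy_spend - INCLUDED_PER_SEAT)

print(f"pooled:   ${pool:.0f} pool vs ${demand:.0f} demand "
      f"-> ${max(0.0, demand - pool):.0f} on-demand")
print(f"per-user: ${teams_on_demand:.0f} on-demand")
```

Here the light users' unused $350 of capacity absorbs part of the heavy users' overage, so the pooled account pays $550 of on-demand usage instead of $900.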

The cost math is straightforward: usage = tokens × price per model. A single Agent turn with Claude 4.6 Opus using 10k input tokens and 2k output tokens costs roughly $0.10. The same turn with Gemini 3 Flash costs about $0.01. Over hundreds of daily interactions across a team, this 10x difference is material.

Additional costs include the Cursor Token Fee ($0.25/M tokens on non-Auto requests) covering infrastructure like semantic search and code application, and the optional Max Mode upcharge (20% on all token costs) for extended 1M-token context windows.

The recommended strategy for enterprise teams is: default to Auto or Gemini Flash for routine work, allow premium models only for complex tasks, enforce per-member spend limits, and monitor usage patterns via the Analytics Dashboard to identify optimization opportunities.

References

[1] Cursor Docs — Pricing. https://cursor.com/docs/account/pricing
[2] Cursor Docs — Team Pricing. https://cursor.com/docs/account/teams/pricing
[3] Cursor Docs — Models. https://cursor.com/docs/models
[4] Cursor Docs — Spend Limits. https://cursor.com/docs/account/billing/spend-limits
[5] Cursor Docs — Enterprise. https://cursor.com/docs/enterprise
[6] Cursor Pricing Page. https://cursor.com/pricing