Rate Limits

How requests are counted, when the counter resets, what a client sees on the over-quota path, and what each subscription tier actually entitles you to. Open this page when you see a 429 in your logs or when you are sizing a plan against expected traffic.

What gets counted

Every successfully resolved MCP request — tool call or resource fetch — increments the owning server's monthly counter. The counter is keyed per-server and per-month, so an organisation running multiple servers has one independent bucket per server, not a shared total.

Per server, not per organisation

Each server has its own quota. If your plan entitles you to two servers and N requests per server, the practical capacity is 2N — but you cannot pool the two buckets, and a heavily used server will not borrow capacity from a quiet one.
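
The bucket behaviour can be illustrated with a toy model. This is a minimal sketch, not the platform's implementation: the class name `QuotaModel`, the `(server_id, month)` key layout, and the tiny limit of 3 are assumptions chosen for illustration.

```python
from collections import defaultdict


class QuotaModel:
    """Toy model of independent per-server, per-month buckets (hypothetical)."""

    def __init__(self, limit_per_server: int) -> None:
        self.limit = limit_per_server
        # Key: (server_id, "YYYY-MM"); value: requests counted so far.
        self.counters = defaultdict(int)

    def try_request(self, server_id: str, month: str) -> bool:
        """Return True if the request is served, False if it would get a 429."""
        key = (server_id, month)
        if self.counters[key] >= self.limit:
            return False
        self.counters[key] += 1
        return True


model = QuotaModel(limit_per_server=3)
# Server "a" drains its own bucket; server "b" has not been used yet.
results_a = [model.try_request("a", "2025-01") for _ in range(4)]
# Server "b" is still served: its bucket is separate and cannot be lent to "a".
result_b = model.try_request("b", "2025-01")
```

The fourth call to server "a" is refused even though server "b" has unused capacity, which is the "no pooling" rule in miniature.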

When it resets

Monthly. The counter rolls over at the start of each calendar month automatically. There is no manual reset; if you want capacity sooner, upgrade your plan.
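
To estimate how long a drained bucket stays drained, a client can compute the start of the next calendar month. The sketch below is illustrative only; this page does not state which timezone defines the month boundary, so the use of UTC in the example is an assumption, and `next_rollover` is a hypothetical helper name.

```python
from datetime import datetime, timezone


def next_rollover(now: datetime) -> datetime:
    """Return the first instant of the calendar month after `now`."""
    if now.month == 12:
        year, month = now.year + 1, 1
    else:
        year, month = now.year, now.month + 1
    # All fields are replaced at once, so day=1 is always valid for the new month.
    return now.replace(year=year, month=month, day=1,
                       hour=0, minute=0, second=0, microsecond=0)


# Assumption: the month boundary is evaluated in UTC.
now = datetime(2025, 1, 20, 14, 30, tzinfo=timezone.utc)
rollover = next_rollover(now)
wait = rollover - now
```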

What a client sees on 429

When the bucket is exhausted, the next request is rejected with HTTP 429 and a JSON-RPC error body. The MCP runtime surfaces it as a structured error, so well-behaved clients should display the message rather than crash. Subsequent requests within the same month continue to be rejected with 429 until rollover or until the plan is upgraded.
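
A client that talks HTTP directly could turn the rejection into a displayable message along these lines. This is a hedged sketch: the `error` object with `code` and `message` follows the JSON-RPC 2.0 specification, but the exact code and wording the service returns are not documented here, so the sample body, the fallback text, and the function name `quota_error_message` are all assumptions.

```python
import json
from typing import Optional

FALLBACK = "Monthly request quota exhausted for this server."


def quota_error_message(status: int, body: str) -> Optional[str]:
    """Return a message to show the user for a 429, or None for other statuses."""
    if status != 429:
        return None
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        # Never crash on an unexpected body; fall back to a generic message.
        return FALLBACK
    # JSON-RPC 2.0 reports failures under "error" with "code" and "message".
    error = payload.get("error") if isinstance(payload, dict) else None
    if isinstance(error, dict) and isinstance(error.get("message"), str):
        return error["message"]
    return FALLBACK


# Hypothetical body; the real error code and message may differ.
sample = '{"jsonrpc": "2.0", "id": 1, "error": {"code": -32000, "message": "Monthly quota exceeded"}}'
shown = quota_error_message(429, sample)
```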

Per-plan allowances

Plan-specific monthly allowances and private-server caps live on the Pricing page on this site, since they move with billing rather than with the deployment surface. The baseline allowance is 10,000 requests per server per month; treat that number as a floor, not as the published quota for any specific plan. For the current per-tier numbers, see the Pricing page.

Observing your usage

Each request is captured in two places: the per-server counter described above, and a row in the request log table recording server, version, tool, IP, and response status. The aggregated counter answers "how close am I to my quota"; the request log answers "which exact calls were rejected with 429". Both will be surfaced in the Observability group once those pages come online.
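
As an illustration of the question the request log answers, the sketch below groups rejected calls by tool for one server. The row shape mirrors the fields listed above, but the field names, the `RequestLogRow` type, the sample tool names, and the idea of having the rows in memory are assumptions; no export or query API is documented on this page.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class RequestLogRow:
    # Hypothetical field names for the columns named in the text.
    server: str
    version: str
    tool: str
    ip: str
    status: int


def rejected_by_tool(rows: Iterable[RequestLogRow], server: str) -> Counter:
    """Count 429 responses per tool for a single server."""
    return Counter(r.tool for r in rows if r.server == server and r.status == 429)


rows = [
    RequestLogRow("server-a", "v1", "search", "203.0.113.7", 200),
    RequestLogRow("server-a", "v1", "search", "203.0.113.7", 429),
    RequestLogRow("server-a", "v1", "summarise", "203.0.113.9", 429),
    RequestLogRow("server-a", "v1", "search", "203.0.113.7", 429),
    RequestLogRow("server-b", "v1", "search", "203.0.113.7", 429),
]
rejected = rejected_by_tool(rows, "server-a")
```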

Public and private use the same shape

The quota model is identical for public and private servers. Private servers also carry per-token data on each request, which is captured in the request log and could feed token-level rate limiting in future — today, the only enforced limit is the per-server monthly counter.

When you exceed the quota

  • Upgrade the organisation's plan. Capacity is restored immediately, within the current billing cycle.
  • Wait for the next monthly rollover. The counter resets automatically.
  • Disable non-critical tools to reduce request volume. Each tool invocation is one increment, so trimming chatty tools (or moving them to a separate, less-loaded server) buys headroom.

On the roadmap (not yet shipped)

  • Per-token rate limits for private servers — using the per-token data already captured in request logs.
  • Soft-limit alerting before the quota is fully exhausted, so you can react before clients see a 429.

A worked scenario

Server A is on a plan with a 10,000-request monthly quota. By day 20, traffic has hit 10,000 — the next call returns 429 and the bucket stays drained until the calendar rolls over. You either upgrade the plan (capacity restored immediately) or accept the gap until the first of next month, when the counter starts again at zero.
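
The same scenario can be turned into a rough sizing calculation. The sketch assumes traffic continues at the average daily rate seen so far and that the month has 30 days (June 2025 is used as an arbitrary example); both are simplifications, and `projected_monthly_requests` is a hypothetical helper.

```python
import calendar
import math


def projected_monthly_requests(used: int, day_of_month: int, year: int, month: int) -> int:
    """Extrapolate month-end volume from usage so far, assuming a steady daily rate."""
    days_in_month = calendar.monthrange(year, month)[1]
    return math.ceil(used * days_in_month / day_of_month)


quota = 10_000
# 10,000 requests used by day 20 of a 30-day month.
projected = projected_monthly_requests(used=10_000, day_of_month=20, year=2025, month=6)
shortfall = max(0, projected - quota)
```

At 500 requests per day, Server A would need roughly 15,000 requests for the month, so a 10,000-request quota leaves a shortfall of about 5,000.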

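No automatic backoff: Vectoralix does not throttle gradually as you approach the quota. Every request up to the limit is served at full speed, and every request after the limit is hard-rejected with 429. Build your MCP client's retry logic accordingly: a retry after a quota 429 cannot succeed until the monthly rollover or a plan upgrade, so back off rather than retrying in a tight loop.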
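
Because a drained bucket stays drained until rollover or upgrade, immediate retries only produce more 429s. One possible client-side shape is a gate that stops sending after the first 429 and resumes at a chosen time. This is a sketch under assumptions: `QuotaGate`, its methods, and the `send` callable returning an HTTP status code are hypothetical and not part of any Vectoralix SDK.

```python
from datetime import datetime, timezone
from typing import Callable, Optional


class QuotaGate:
    """Skips outbound calls after a 429 until a resume time passes (hypothetical)."""

    def __init__(self) -> None:
        self.blocked_until: Optional[datetime] = None

    def call(self, send: Callable[[], int], now: datetime,
             resume_at: datetime) -> Optional[int]:
        """Send unless blocked. Returns the HTTP status, or None if skipped locally."""
        if self.blocked_until is not None and now < self.blocked_until:
            return None
        status = send()
        # On a quota rejection, block until resume_at; otherwise clear any block.
        self.blocked_until = resume_at if status == 429 else None
        return status

    def reset(self) -> None:
        """Call after a plan upgrade, since capacity is restored immediately."""
        self.blocked_until = None


sent = []


def fake_send() -> int:
    # Stand-in for a real request; always reports an exhausted quota.
    sent.append(1)
    return 429


gate = QuotaGate()
t0 = datetime(2025, 1, 20, tzinfo=timezone.utc)
resume = datetime(2025, 2, 1, tzinfo=timezone.utc)
first = gate.call(fake_send, now=t0, resume_at=resume)   # sent, rejected
second = gate.call(fake_send, now=t0, resume_at=resume)  # skipped locally
```

The resume time would typically be the next monthly rollover; calling `reset()` covers the upgrade path.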