Status
Proposed on 2026-03-14 by Lars Solem.
Context
Dataverket exposes a tenant-facing control-plane API for infrastructure operations that can be expensive, slow, and operationally risky.
Without explicit rate limiting and resource protection, a single tenant, script bug, or abusive client could overload:
- the public API
- task orchestration
- backing databases
- NATS consumers
- domain services performing high-impact operations
Decision
Dataverket adopts a layered resource protection model for public and operator-facing APIs.
The API baseline must support:
- request rate limiting
- throttling of expensive operations
- concurrency limits for destructive or high-impact workflows
- operator-visible rejection or backpressure behavior
Protection layers
The platform should distinguish between:
- Request protection: generic per-client or per-token request rate limits.
- Workflow protection: limits on how many expensive tasks a caller may trigger concurrently.
- Domain protection: limits on sensitive resource classes such as provisioning, network changes, or failover actions.
Separating these layers keeps the platform from treating all API traffic as equal when the underlying impact differs.
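As a minimal sketch of how the three layers could be checked in order, assuming in-memory counters and hypothetical names (`LayeredGuard`, `admit`, `finish`, and all limit values are illustrative, not part of this ADR). Resetting the request window is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class LayeredGuard:
    """Checks a call against three independent layers, cheapest first."""

    request_budget: int            # requests per window, per token (assumed value)
    workflow_limit: int            # concurrent expensive tasks per caller (assumed)
    domain_limits: Dict[str, int]  # concurrent operations per resource class (assumed)
    requests_seen: Dict[str, int] = field(default_factory=dict)
    workflows_running: Dict[str, int] = field(default_factory=dict)
    domain_running: Dict[str, int] = field(default_factory=dict)

    def admit(self, token: str, resource_class: Optional[str] = None) -> str:
        """Return "ok", or the name of the layer that rejected the call."""
        # Layer 1: request protection applies to every call.
        if self.requests_seen.get(token, 0) >= self.request_budget:
            return "request"
        # Every call that passes layer 1 consumes request budget,
        # even if a later layer rejects it.
        self.requests_seen[token] = self.requests_seen.get(token, 0) + 1
        if resource_class is None:
            return "ok"  # ordinary call that starts no expensive work

        # Layer 2: workflow protection, counted per caller.
        if self.workflows_running.get(token, 0) >= self.workflow_limit:
            return "workflow"

        # Layer 3: domain protection, counted per resource class across callers.
        limit = self.domain_limits.get(resource_class)
        if limit is not None and self.domain_running.get(resource_class, 0) >= limit:
            return "domain"

        self.workflows_running[token] = self.workflows_running.get(token, 0) + 1
        self.domain_running[resource_class] = self.domain_running.get(resource_class, 0) + 1
        return "ok"

    def finish(self, token: str, resource_class: str) -> None:
        """Release the workflow and domain slots held by a completed task."""
        self.workflows_running[token] = max(0, self.workflows_running.get(token, 0) - 1)
        self.domain_running[resource_class] = max(0, self.domain_running.get(resource_class, 0) - 1)
```

Returning the name of the rejecting layer lets the caller and the operator see which kind of limit was hit, not only that one was hit.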
Tenant and operator scope
Protection rules should consider:
- tenant scope
- project scope
- actor identity
- operator versus tenant access class
Operator access may need broader limits than tenant access, but it must still be bounded.
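One way to express these scope dimensions is a small lookup keyed by access class. This is a sketch only: `Scope`, `limit_for`, `bucket_key`, and the per-minute ceilings are hypothetical, since this ADR does not fix rate values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    tenant: str
    project: str
    actor: str
    access_class: str  # "tenant" or "operator"


# Hypothetical ceilings. Operators get a broader limit, but still a finite one.
LIMITS_PER_MINUTE = {"tenant": 60, "operator": 600}


def limit_for(scope: Scope) -> int:
    """Look up the request ceiling for a caller's access class."""
    try:
        return LIMITS_PER_MINUTE[scope.access_class]
    except KeyError:
        # Unknown classes are refused instead of being treated as unlimited.
        raise ValueError(f"unknown access class: {scope.access_class!r}") from None


def bucket_key(scope: Scope) -> str:
    """Key under which usage is counted, so one actor cannot drain another's budget."""
    return f"{scope.access_class}:{scope.tenant}:{scope.project}:{scope.actor}"
```

Refusing unknown access classes is a deliberate choice in this sketch: a missing entry must never mean an unbounded caller.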
High-impact operations
The initial protection model must be especially strict for operations such as:
- bare-metal provisioning
- VM creation bursts
- network intent changes
- cluster creation or upgrade
- failover-triggering operations
These operations should not be governed by raw HTTP request rate alone.
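Because request rate alone does not bound how many of these operations run at once, a concurrency gate per operation class is one possible complement. The sketch below uses a non-blocking semaphore from the Python standard library. `OperationGate`, `Busy`, and the operation names are hypothetical, and a real control plane would need shared state across processes.

```python
import threading
from contextlib import contextmanager
from typing import Dict, Iterator


class Busy(Exception):
    """Raised when every slot for an operation class is already in use."""


class OperationGate:
    def __init__(self, limits: Dict[str, int]) -> None:
        # One bounded semaphore per operation class; limits are assumed values.
        self._slots = {name: threading.BoundedSemaphore(n) for name, n in limits.items()}

    @contextmanager
    def slot(self, operation: str) -> Iterator[None]:
        # Unknown operations raise KeyError, so an unlisted operation
        # is never silently treated as unlimited.
        sem = self._slots[operation]
        # Fail fast instead of queueing, so the caller gets an explicit rejection.
        if not sem.acquire(blocking=False):
            raise Busy(operation)
        try:
            yield
        finally:
            sem.release()  # always free the slot, even if the operation fails
```

Usage would look like `with gate.slot("cluster-upgrade"): ...`, with `Busy` translated into an explicit rejection at the API boundary.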
Failure behavior
When protection limits are hit, the platform should:
- reject or defer work explicitly
- return machine-readable errors
- preserve audit and operator visibility
- avoid silently dropping important requests
Backpressure is acceptable. Invisible loss is not.
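A rejection could, for example, be surfaced as an HTTP 429 (Too Many Requests) response with a `Retry-After` header and a JSON body. The field names and the `rate_limited` error code below are assumptions for illustration; the actual error contract is left to a later decision and must follow the public API style.

```python
import json
from typing import Dict, Tuple


def throttled_response(
    layer: str, resource_class: str, retry_after_s: int
) -> Tuple[int, Dict[str, str], str]:
    """Build a (status, headers, body) triple for an explicit rejection."""
    body = {
        "error": "rate_limited",  # hypothetical machine-readable code
        "layer": layer,           # which protection layer rejected the call
        "resource_class": resource_class,
        "retry_after_seconds": retry_after_s,
    }
    headers = {
        "Content-Type": "application/json",
        "Retry-After": str(retry_after_s),  # standard HTTP header, value in seconds
    }
    return 429, headers, json.dumps(body, sort_keys=True)
```

The same fields could also be written to the audit trail, so that a rejected request stays visible to operators.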
Relationship to quotas
Rate limiting is not the same as quotas.
- Rate limiting bounds how fast work arrives, which protects platform stability over time.
- Quotas bound how much a tenant may hold in total, which protects long-term resource allocation boundaries.
The platform needs both.
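The difference can be illustrated with two small classes: a rate limiter whose budget refills as time passes, and a quota that only frees capacity when resources are released. The token bucket is used purely for illustration, since this ADR leaves the algorithm open. `TokenBucket`, `Quota`, and all values are hypothetical.

```python
class TokenBucket:
    """Rate limit: capacity refills with time. The clock is passed in explicitly."""

    def __init__(self, rate_per_s: float, burst: int, now: float = 0.0) -> None:
        self.rate = rate_per_s
        self.burst = burst
        self.tokens = float(burst)
        self.last = now

    def allow(self, now: float) -> bool:
        if now > self.last:
            # Refill in proportion to elapsed time, capped at the burst size.
            self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
            self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class Quota:
    """Quota: capacity returns only when resources are released, never with time."""

    def __init__(self, max_units: int) -> None:
        self.max_units = max_units
        self.used = 0

    def reserve(self, units: int) -> bool:
        if self.used + units > self.max_units:
            return False
        self.used += units
        return True

    def release(self, units: int) -> None:
        self.used = max(0, self.used - units)
```

A caller can be well within its rate limit and still be refused by its quota, or the reverse, which is why the two checks are kept separate.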
Operator visibility
Operators must be able to see:
- which limits are being hit
- by which tenants or actors
- which resource classes are under pressure
- whether protection is preventing larger control-plane instability
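A minimal sketch of the bookkeeping behind such a view, assuming rejections are counted in memory along three dimensions. `RejectionLog` and its method names are hypothetical; a real deployment would more likely export these counts to its metrics system.

```python
from collections import Counter
from typing import Tuple


class RejectionLog:
    """Counts rejections by (limit, tenant, resource class)."""

    DIMENSIONS = ("limit", "tenant", "resource_class")

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def record(self, limit: str, tenant: str, resource_class: str) -> None:
        key: Tuple[str, str, str] = (limit, tenant, resource_class)
        self._counts[key] += 1

    def totals(self, dimension: str) -> Counter:
        """Aggregate rejection counts along one dimension."""
        index = self.DIMENSIONS.index(dimension)  # ValueError on an unknown dimension
        result: Counter = Counter()
        for key, n in self._counts.items():
            result[key[index]] += n
        return result
```

For example, `log.totals("tenant").most_common(5)` would list the tenants hitting limits most often, and `log.totals("resource_class")` would show which resource classes are under pressure.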
Explicit non-decisions for now
This ADR intentionally does not yet choose:
- exact rate values
- exact token-bucket or leaky-bucket implementation
- exact fairness model between tenants
- exact placement of protection logic between gateway and services
Those require later implementation decisions.
Consequences
- API safety becomes a first-class control-plane concern
- tenants cannot be assumed to behave perfectly
- high-impact operations get stronger protection than ordinary reads
Decision Outcome
Proposed. This ADR records the current preferred direction and still needs acceptance before it becomes binding.
Related Decisions
- Public API behavior must align with 008-public-api-style.md.
- Tenant scoping must align with 009-resource-inventory-and-tenancy-model.md.
- Workflow protection must align with 019-workflow-retry-dead-letter-and-reconciliation.md.
- Operator visibility must align with 022-operator-visibility-and-control-surface.md.
More Information
Topics that still need to be detailed:
- rate-limit policy by actor and resource class
- concurrency control model for high-impact workflows
- API error contract for throttling and backpressure
Audit
- 2026-03-14: ADR proposed.