Status
Proposed on 2026-03-14 by Lars Solem.
Context
Dataverket will use NATS for communication between platform components. The repository already points toward an event-based architecture, but it does not yet define the concrete contract that services must follow.
Without a shared NATS taxonomy, each service will invent its own subject naming, payload style, and reliability model. That would make orchestration, auditing, SDK integration, and cross-service debugging unnecessarily fragile.
Decision
Dataverket uses NATS as the internal command and event backbone with a shared subject taxonomy and a standard message envelope.
NATS is also the standard communication mechanism between Dataverket datacenters.
NATS is an internal control-plane transport between Dataverket services and workers. It is not the primary external integration surface for tenant or operator clients.
NATS is used for:
- commands between control plane services and workers
- domain events emitted by services
- task lifecycle events
- request/reply where synchronous service coordination is justified
- intra-datacenter control-plane communication
- inter-datacenter control-plane communication
Public clients integrate through the Sentral-owned HTTP API and task resources. NATS subjects and envelopes are internal platform contracts unless a later ADR explicitly promotes a specific stream or bridge to a supported external interface.
NATS is not the long-term system of record. Desired state remains in service databases, primarily PostgreSQL.
Inter-datacenter model
Dataverket supports two or more datacenters, and NATS is the standard communication path between those sites for platform coordination.
This means:
- each datacenter uses NATS as its standard internal control-plane transport
- site-local services publish and consume through NATS within their datacenter
- cross-site coordination also happens through NATS subjects rather than through ad hoc custom protocols
- datacenter identity must be explicit in topology, placement, and operational tracing
The inter-datacenter design should prefer site-local streams and explicit cross-site coordination flows over treating every subject as globally shared by default.
NATS is therefore both an intra-datacenter and inter-datacenter control-plane transport.
Subject taxonomy
All NATS subjects use the dv. prefix.
The standard subject families are:
- dv.<service>.cmd.<action>
- dv.<service>.evt.<entity>.<verb>
- dv.task.evt.<verb>
- dv.rpc.<service>.<operation>
Examples:
- dv.maskin.cmd.provision
- dv.maskin.evt.server.provisioned
- dv.nett.cmd.apply
- dv.plattform.evt.cluster.ready
- dv.task.evt.updated
- dv.rpc.identitet.introspect_token
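The four subject families could be built and classified with a few small helpers. This is a minimal sketch, not part of the ADR: the function names and the token rule (lowercase letters, digits, and underscores) are assumptions made for illustration.

```python
import re

# Assumed token rule: lowercase letters, digits, underscores; no dots or wildcards.
_TOKEN = re.compile(r"^[a-z][a-z0-9_]*$")


def _check(*tokens: str) -> None:
    for token in tokens:
        if not _TOKEN.match(token):
            raise ValueError(f"invalid subject token: {token!r}")


def command_subject(service: str, action: str) -> str:
    _check(service, action)
    return f"dv.{service}.cmd.{action}"


def event_subject(service: str, entity: str, verb: str) -> str:
    _check(service, entity, verb)
    return f"dv.{service}.evt.{entity}.{verb}"


def task_subject(verb: str) -> str:
    _check(verb)
    return f"dv.task.evt.{verb}"


def rpc_subject(service: str, operation: str) -> str:
    _check(service, operation)
    return f"dv.rpc.{service}.{operation}"


def classify(subject: str) -> str:
    """Return the subject family: 'command', 'event', 'task' or 'rpc'."""
    parts = subject.split(".")
    if parts[0] != "dv" or not all(_TOKEN.match(p) for p in parts[1:]):
        raise ValueError(f"not a Dataverket subject: {subject!r}")
    if len(parts) == 4 and parts[1] == "task" and parts[2] == "evt":
        return "task"
    if len(parts) == 4 and parts[1] == "rpc":
        return "rpc"
    if len(parts) == 4 and parts[2] == "cmd":
        return "command"
    if len(parts) == 5 and parts[2] == "evt":
        return "event"
    raise ValueError(f"unknown subject family: {subject!r}")
```

For example, `command_subject("maskin", "provision")` yields `dv.maskin.cmd.provision`, and `classify("dv.plattform.evt.cluster.ready")` returns `"event"`.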
Semantics by subject family
Commands
Commands express desired work. They are addressed to one service domain and handled by a responsible consumer group.
Commands:
- must be durable
- must be idempotent
- must carry correlation and actor metadata
- may produce zero or more follow-up events
Commands should normally be backed by JetStream.
Events
Events are facts about something that already happened inside a service domain.
Events:
- are immutable
- may be consumed by multiple downstream services
- must not be repurposed as hidden RPC responses
- should describe domain state transitions, not log spam
Important integration events should be published through JetStream so they can be replayed by consumers.
Task events
Long-running operations must expose task lifecycle through dv.task.evt.*.
The minimum task verbs are:
- created
- queued
- started
- progress
- succeeded
- failed
- cancelled
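The minimum verb set could be pinned down as an enumeration so that producers cannot emit misspelled task subjects. This is a sketch only: `TaskVerb`, `task_event_subject`, and `is_terminal` are hypothetical names, and treating succeeded, failed, and cancelled as terminal is an assumption, since the ADR lists the verbs without saying which ones end a task.

```python
from enum import Enum


class TaskVerb(str, Enum):
    """The minimum task lifecycle verbs; services may need more than these."""
    CREATED = "created"
    QUEUED = "queued"
    STARTED = "started"
    PROGRESS = "progress"
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    CANCELLED = "cancelled"


# Assumption: these verbs mark the end of a task's lifecycle.
TERMINAL_VERBS = frozenset(
    {TaskVerb.SUCCEEDED, TaskVerb.FAILED, TaskVerb.CANCELLED}
)


def task_event_subject(verb: TaskVerb) -> str:
    """Build the dv.task.evt.<verb> subject for a lifecycle verb."""
    return f"dv.task.evt.{verb.value}"


def is_terminal(verb: TaskVerb) -> bool:
    return verb in TERMINAL_VERBS
```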
RPC
dv.rpc.* exists for narrow synchronous interactions where request/reply is materially better than an asynchronous workflow.
RPC should be used sparingly. If the operation changes infrastructure state or may take more than a few seconds, it should be modeled as a command plus task instead.
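The rule of thumb above can be written as a small decision helper, for instance for use in design reviews or lint checks. This is a sketch under stated assumptions: the function name is hypothetical, and the 5-second threshold is only one possible reading of "a few seconds", not a value this ADR defines.

```python
# Assumed reading of "a few seconds"; the ADR does not fix a number.
RPC_MAX_SECONDS = 5.0


def interaction_style(changes_infrastructure_state: bool,
                      expected_seconds: float) -> str:
    """Return 'rpc' or 'command+task' for a proposed operation."""
    if changes_infrastructure_state or expected_seconds > RPC_MAX_SECONDS:
        return "command+task"
    return "rpc"
```

A token introspection that only reads state and answers quickly would come out as `"rpc"`, while provisioning a server would come out as `"command+task"` regardless of its expected duration.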
Standard message envelope
All commands and events must use a common envelope.
The envelope fields are:
- specversion: envelope version, initially 1.0
- id: unique message ID
- type: logical message type, such as maskin.server.provision.requested
- source: emitting service, such as sentral or maskin
- subject: resource identifier within the emitting domain
- time: UTC timestamp in RFC 3339 format
- datacontenttype: usually application/json
- tenant_id: tenant or organization identifier when applicable
- project_id: project identifier when applicable
- environment_id: environment identifier when applicable
- datacenter_id: datacenter or site identifier when applicable
- actor: identity that initiated the action
- correlation_id: stable ID shared across a workflow
- causation_id: parent message ID that triggered this message
- data: message payload
This is intentionally close to CloudEvents structure, but tailored to Dataverket’s control-plane needs.
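One way the envelope might look in code is a frozen dataclass that serializes to JSON. This sketch makes several assumptions the ADR does not state: `actor` is modeled as a plain string, `causation_id` is optional because a root message has no parent, the "when applicable" fields are omitted from the JSON when unset, and `id` is a random UUID.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, Optional


def _now_rfc3339() -> str:
    # UTC timestamp in RFC 3339 form, with "Z" instead of "+00:00".
    return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")


@dataclass(frozen=True)
class Envelope:
    type: str
    source: str
    subject: str
    actor: str                      # assumption: identity as a plain string
    correlation_id: str
    data: Dict[str, Any]
    causation_id: Optional[str] = None   # assumption: absent on root messages
    tenant_id: Optional[str] = None
    project_id: Optional[str] = None
    environment_id: Optional[str] = None
    datacenter_id: Optional[str] = None
    datacontenttype: str = "application/json"
    specversion: str = "1.0"
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    time: str = field(default_factory=_now_rfc3339)

    def to_json(self) -> bytes:
        # Drop unset optional fields rather than sending nulls (an assumption).
        body = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(body, separators=(",", ":")).encode("utf-8")
```

A command envelope could then be created with `Envelope(type="maskin.server.provision.requested", source="sentral", subject="server/123", actor="user:example", correlation_id="wf-1", data={"size": "small"})`, where the subject, actor, and payload values are made up for illustration.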
Payload rules
- Payloads must be JSON objects
- Payload schemas must be versioned
- Consumers must ignore unknown fields
- Producers must not silently change field meaning without a schema version bump
- Opaque binary blobs must not be embedded directly in normal message payloads
Large artifacts should be stored elsewhere and referenced by URI or object ID.
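A consumer that follows these rules might parse payloads as a tolerant reader: require a JSON object, check the schema version, and keep only the fields it knows. This is a sketch under assumptions: `ProvisionRequest` and its fields are hypothetical, and carrying the version in a `schema_version` field is one possible convention, not something this ADR prescribes.

```python
import json
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ProvisionRequest:
    """Hypothetical payload used only for illustration."""
    schema_version: int
    server_id: str
    size: str


# Versions this consumer understands (assumed convention).
SUPPORTED_SCHEMA_VERSIONS = (1,)


def parse_payload(raw: bytes) -> ProvisionRequest:
    payload = json.loads(raw)
    if not isinstance(payload, dict):
        raise ValueError("payload must be a JSON object")
    version = payload.get("schema_version")
    if version not in SUPPORTED_SCHEMA_VERSIONS:
        raise ValueError(f"unsupported schema version: {version!r}")
    # Ignore unknown fields so producers can add fields without breaking us.
    known = {f.name for f in fields(ProvisionRequest)}
    return ProvisionRequest(**{k: v for k, v in payload.items() if k in known})
```

A payload with an extra field parses to the same `ProvisionRequest` as one without it, while a JSON array or an unsupported version raises `ValueError`.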
Reliability model
The platform reliability model is:
- PostgreSQL stores desired state and authoritative resource state
- JetStream stores durable commands and important integration events
- consumers are responsible for idempotent handling
- at-least-once delivery is assumed
- workflow correlation is mandatory
- cross-datacenter links must be treated as failure-prone and partitionable
Cross-site NATS usage must therefore distinguish between:
- site-local workflow streams
- cross-site coordination subjects
- replicated or restorable durable state needed for recovery
No service may assume exactly-once processing.
Services must also not assume permanent low-latency connectivity between datacenters.
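Because delivery is at-least-once, a consumer has to make redelivery harmless. A minimal sketch of one common approach, deduplicating on the envelope `id`, is shown here. The class name is hypothetical, and the in-memory set stands in for durable storage; a real service would more likely record processed IDs in its own database, in the same transaction as the effect.

```python
from typing import Any, Callable, Dict, Set


class IdempotentHandler:
    """Apply each message at most once per message ID."""

    def __init__(self, apply: Callable[[Dict[str, Any]], None]) -> None:
        self._apply = apply
        # Stand-in for a durable store of processed message IDs.
        self._seen: Set[str] = set()

    def handle(self, envelope: Dict[str, Any]) -> bool:
        """Return True if the message was applied, False if it was a duplicate."""
        message_id = envelope["id"]
        if message_id in self._seen:
            return False
        # Record the ID only after the effect succeeds, so a failure
        # leaves the message eligible for redelivery.
        self._apply(envelope["data"])
        self._seen.add(message_id)
        return True
```

Delivering the same envelope twice applies its effect once; if the effect raises, the ID is not recorded and a later redelivery is processed normally.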
Ordering model
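Ordering is guaranteed only within a single stream, and only as far as consumer behavior preserves it; redelivery, for example, can hand messages to a consumer out of their original order. Services must therefore: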
- not rely on global ordering
- tolerate duplicate delivery
- validate current state before applying effects
- use correlation and causation IDs for workflow reconstruction
- tolerate delayed or temporarily partitioned cross-datacenter delivery
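Validating current state before applying effects can be as simple as checking a transition table. The sketch below is illustrative only: the states `requested`, `provisioning`, and `provisioned`, the `Outcome` names, and the transition table are assumptions, not a lifecycle defined by this ADR.

```python
from enum import Enum
from typing import Tuple


class Outcome(Enum):
    APPLIED = "applied"
    DUPLICATE = "duplicate"
    OUT_OF_ORDER = "out_of_order"


# Hypothetical allowed transitions for a server resource.
ALLOWED = {
    ("requested", "provisioning"),
    ("provisioning", "provisioned"),
}


def apply_transition(current: str, target: str) -> Tuple[str, Outcome]:
    """Return the new state and what happened, without trusting arrival order."""
    if current == target:
        # Duplicate delivery: the effect is already in place.
        return current, Outcome.DUPLICATE
    if (current, target) in ALLOWED:
        return target, Outcome.APPLIED
    # The message arrived before its predecessor (or is stale): change nothing.
    return current, Outcome.OUT_OF_ORDER
```

How an out-of-order message is handled afterwards (redelivery later, or dropping it as stale) is a per-service decision that this sketch leaves open.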
Failure-handling baseline
The first production implementation must include a baseline policy for retries, poison messages, and replay.
The minimum baseline is:
- transient failures may be retried with bounded backoff
- repeated permanent failures must transition work into a visible failed task state
- poison messages must be diverted to a dead-letter path after bounded retry exhaustion
- operators must be able to inspect dead-lettered work with correlation context intact
- replay must be an explicit operational action, not an accidental side effect of restart
This does not replace a later detailed ADR, but it establishes the minimum safety bar for building on NATS.
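The baseline could look roughly like the following: bounded exponential backoff for transient failures, no retry for permanent ones, and a dead-letter record that keeps the full envelope so correlation context survives. This is a sketch with assumed names and defaults (`TransientError`, three attempts, 0.5 s base delay); the list used as a dead-letter path stands in for whatever durable mechanism a later ADR selects.

```python
import time
from typing import Any, Callable, Dict, List


class TransientError(Exception):
    """Hypothetical marker for failures that are worth retrying."""


def process_with_retry(
    envelope: Dict[str, Any],
    handler: Callable[[Dict[str, Any]], None],
    dead_letter: List[Dict[str, Any]],
    max_attempts: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Return 'done' on success or 'dead_lettered' after giving up."""
    last_error = None
    attempt = 0
    for attempt in range(1, max_attempts + 1):
        try:
            handler(envelope)
            return "done"
        except TransientError as exc:
            last_error = exc
            if attempt < max_attempts:
                # Exponential backoff, capped so retries stay bounded.
                sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
        except Exception as exc:
            # Permanent failure: retrying would not help.
            last_error = exc
            break
    # Keep the whole envelope so operators see correlation_id and causation_id.
    dead_letter.append(
        {"envelope": envelope, "error": repr(last_error), "attempts": attempt}
    )
    return "dead_lettered"
```

A caller that receives `"dead_lettered"` would also be the natural place to publish the failed task event, so the failure becomes visible as task state.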
Naming guidance
Service segment names should use accepted Dataverket service identities:
- sentral
- identitet
- maskin
- plattform
- tjeneste
- objekt
- nett
Additional service names must be introduced by ADR before becoming part of the stable taxonomy.
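A publisher-side guard could reject subjects whose service segment is not in the accepted list. The sketch below assumes the list is hard-coded; the constant and function names are hypothetical, and a real implementation might instead load the list from shared configuration.

```python
# Accepted service identities, copied from the naming guidance above.
ACCEPTED_SERVICES = frozenset(
    {"sentral", "identitet", "maskin", "plattform", "tjeneste", "objekt", "nett"}
)


def require_accepted_service(name: str) -> str:
    """Return the name unchanged, or raise if it is not an accepted service."""
    if name not in ACCEPTED_SERVICES:
        raise ValueError(
            f"{name!r} is not an accepted service name; introduce it by ADR first"
        )
    return name
```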
Consequences
- every service now has a uniform NATS contract
- observability and auditing become easier because workflows can be correlated consistently
- JetStream becomes an infrastructure dependency for orchestration
- teams must design idempotent consumers from the start
- NATS remains the transport layer, not the authoritative data store
- cross-datacenter coordination now shares the same transport and message model as intra-site orchestration
Decision Outcome
Proposed. This ADR records the current preferred direction and still needs acceptance before it becomes binding.
More Information
- stream and consumer layout in JetStream
- schema registry or schema publication workflow
- retry and dead-letter policy
- inter-datacenter NATS topology and failure handling
Audit
- 2026-03-14: ADR proposed.