DATAVERKET 007: NATS subject taxonomy and event envelope

Tags: Proposed, Integration, Messaging, NATS, Events, JetStream

Defines the shared NATS subject taxonomy, message envelope, and reliability baseline for Dataverket control-plane communication.

Author
Lars Solem
Updated

Status

Proposed on 2026-03-14 by Lars Solem.

Context

Dataverket will use NATS for communication between platform components. The repository already points toward an event-based architecture but does not yet define the concrete contract that services must follow.

Without a shared NATS taxonomy, each service will invent its own subject naming, payload style, and reliability model. That would make orchestration, auditing, SDK integration, and cross-service debugging unnecessarily fragile.

Decision

Dataverket uses NATS as the internal command and event backbone with a shared subject taxonomy and a standard message envelope.

NATS is also the standard communication mechanism between Dataverket datacenters.

NATS is an internal control-plane transport between Dataverket services and workers. It is not the primary external integration surface for tenant or operator clients.

NATS is used for:

  • commands between control-plane services and workers
  • domain events emitted by services
  • task lifecycle events
  • request/reply where synchronous service coordination is justified
  • control-plane communication within each datacenter
  • control-plane communication between datacenters

Public clients integrate through the Sentral-owned HTTP API and task resources. NATS subjects and envelopes are internal platform contracts unless a later ADR explicitly promotes a specific stream or bridge to a supported external interface.

NATS is not the long-term system of record. Desired state remains in service databases, primarily PostgreSQL.

Inter-datacenter model

Dataverket supports two or more datacenters, and NATS is the standard communication path between those sites for platform coordination.

This means:

  • each datacenter uses NATS as its standard internal control-plane transport
  • site-local services publish and consume through NATS within their datacenter
  • cross-site coordination also happens through NATS subjects rather than through ad hoc custom protocols
  • datacenter identity must be explicit in topology, placement, and operational tracing

The inter-datacenter design should prefer site-local streams and explicit cross-site coordination flows over treating every subject as globally shared by default.

NATS is therefore both an intra-datacenter and inter-datacenter control-plane transport.

Subject taxonomy

All NATS subjects use the dv. prefix.

The standard subject families are:

  • dv.<service>.cmd.<action>
  • dv.<service>.evt.<entity>.<verb>
  • dv.task.evt.<verb>
  • dv.rpc.<service>.<operation>

Examples:

  • dv.maskin.cmd.provision
  • dv.maskin.evt.server.provisioned
  • dv.nett.cmd.apply
  • dv.plattform.evt.cluster.ready
  • dv.task.evt.updated
  • dv.rpc.identitet.introspect_token
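The subject families differ in their fixed tokens and in their token count, so subjects can be built and checked mechanically. The following is one possible helper in Python. It assumes each token consists of lowercase letters, digits, or underscores (as in introspect_token). The function names and the token pattern are illustrative assumptions and are not part of this contract.

```python
import re

# Accepted service identities from the naming guidance in this ADR.
SERVICES = {"sentral", "identitet", "maskin", "plattform", "tjeneste", "objekt", "nett"}

# Assumption: tokens are lowercase ASCII letters, digits, or underscores.
TOKEN = re.compile(r"[a-z][a-z0-9_]*")


def _token(tok: str) -> str:
    if not TOKEN.fullmatch(tok):
        raise ValueError(f"invalid subject token: {tok!r}")
    return tok


def _service(name: str) -> str:
    if name not in SERVICES:
        raise ValueError(f"unknown service segment: {name!r}")
    return name


def cmd_subject(service: str, action: str) -> str:
    """dv.<service>.cmd.<action>"""
    return f"dv.{_service(service)}.cmd.{_token(action)}"


def evt_subject(service: str, entity: str, verb: str) -> str:
    """dv.<service>.evt.<entity>.<verb>"""
    return f"dv.{_service(service)}.evt.{_token(entity)}.{_token(verb)}"


def task_subject(verb: str) -> str:
    """dv.task.evt.<verb>"""
    return f"dv.task.evt.{_token(verb)}"


def rpc_subject(service: str, operation: str) -> str:
    """dv.rpc.<service>.<operation>"""
    return f"dv.rpc.{_service(service)}.{_token(operation)}"


def classify(subject: str) -> str:
    """Return the family of a concrete (non-wildcard) subject, or raise ValueError."""
    t = subject.split(".")
    if t[0] != "dv" or not all(TOKEN.fullmatch(x) for x in t):
        raise ValueError(f"not a dv subject: {subject!r}")
    if len(t) == 4 and t[1:3] == ["task", "evt"]:
        return "task"
    if len(t) == 4 and t[1] == "rpc" and t[2] in SERVICES:
        return "rpc"
    if len(t) == 4 and t[1] in SERVICES and t[2] == "cmd":
        return "command"
    if len(t) == 5 and t[1] in SERVICES and t[2] == "evt":
        return "event"
    raise ValueError(f"subject matches no family: {subject!r}")
```

Rejecting unknown service segments at publish time is one way to enforce the rule that new service names require an ADR.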

Semantics by subject family

Commands

Commands express desired work. They are addressed to one service domain and handled by a responsible consumer group.

Commands:

  • must be durable
  • must be idempotent
  • must carry correlation and actor metadata
  • may produce zero or more follow-up events

Commands should normally be backed by JetStream.
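Commands may be delivered more than once, so a handler can record which envelope ids it has already applied and skip any repeats. This is a minimal sketch. It assumes an in-memory set stands in for durable storage, which in practice would be a PostgreSQL table written in the same transaction as the effect. IdempotentConsumer is a hypothetical name.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class IdempotentConsumer:
    """Applies each command at most once per envelope id.

    Assumption: processed ids are kept in memory here; a real service would
    persist them together with the state change so both commit atomically.
    """
    handler: Callable[[dict], None]
    processed: set = field(default_factory=set)

    def on_message(self, envelope: dict) -> bool:
        msg_id = envelope["id"]
        if msg_id in self.processed:
            return False  # duplicate delivery: acknowledge without re-applying
        self.handler(envelope)
        # Record only after the handler succeeds, so a crash mid-handler
        # leads to redelivery rather than a silently lost command.
        self.processed.add(msg_id)
        return True
```

JetStream can also drop duplicate publishes that carry a Nats-Msg-Id header within the stream's duplicate window. That protects only the publish side. Redelivery after a missed acknowledgement still reaches the consumer, so consumer-side idempotency remains necessary.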

Events

Events are facts about something that already happened inside a service domain.

Events:

  • are immutable
  • may be consumed by multiple downstream services
  • must not be rewritten as hidden RPC responses
  • should describe domain state transitions, not log spam

Important integration events should be published through JetStream so they can be replayed by consumers.

Task events

Long-running operations must expose their task lifecycle through dv.task.evt.*.

The minimum task verbs are:

  • created
  • queued
  • started
  • progress
  • succeeded
  • failed
  • cancelled
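The ADR lists the task verbs but does not specify which verb may follow which. The transition table in this sketch is an assumption made for illustration only. It shows how a consumer can tolerate duplicate lifecycle events while still rejecting impossible ones, such as started arriving after succeeded.

```python
# Hypothetical transition table: which verb may follow the current task state.
# The ADR defines only the verbs; this ordering is an illustrative assumption.
TRANSITIONS = {
    None: {"created"},
    "created": {"queued", "failed", "cancelled"},
    "queued": {"started", "failed", "cancelled"},
    "started": {"progress", "succeeded", "failed", "cancelled"},
    "progress": {"progress", "succeeded", "failed", "cancelled"},
    "succeeded": set(),  # terminal
    "failed": set(),     # terminal
    "cancelled": set(),  # terminal
}


def apply_task_event(state, verb):
    """Return the next task state, treating a repeated verb as a duplicate delivery."""
    if verb == state:
        return state  # at-least-once delivery: the same step seen again is a no-op
    if verb not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal task transition {state!r} -> {verb!r}")
    return verb


def fold_task_events(verbs):
    """Fold a sequence of task verbs into the final task state."""
    state = None
    for verb in verbs:
        state = apply_task_event(state, verb)
    return state
```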

RPC

dv.rpc.* exists for narrow synchronous interactions where request/reply is materially better than an asynchronous workflow.

RPC should be used sparingly. If the operation changes infrastructure state or may take more than a few seconds, it should be modeled as a command plus task instead.
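The RPC rule reduces to a small decision. This sketch assumes a hypothetical 5-second budget for "a few seconds", a figure the ADR does not pin down. Operation and choose_pattern are illustrative names.

```python
from dataclasses import dataclass

# Assumption: the ADR says "more than a few seconds"; 5 s is an illustrative cut-off.
RPC_MAX_SECONDS = 5.0


@dataclass(frozen=True)
class Operation:
    service: str
    name: str
    changes_infrastructure: bool
    expected_seconds: float


def choose_pattern(op: Operation) -> tuple[str, str]:
    """Return (pattern, subject) for an operation according to the RPC rule."""
    if op.changes_infrastructure or op.expected_seconds > RPC_MAX_SECONDS:
        # State-changing or slow work: durable command, progress via task events.
        return "command+task", f"dv.{op.service}.cmd.{op.name}"
    # Short, read-only coordination may use request/reply.
    return "rpc", f"dv.rpc.{op.service}.{op.name}"
```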

Standard message envelope

All commands and events must use a common envelope.

The envelope fields are:

  • specversion: envelope version, initially 1.0
  • id: unique message ID
  • type: logical message type, such as maskin.server.provision.requested
  • source: emitting service, such as sentral or maskin
  • subject: resource identifier within the emitting domain
  • time: UTC timestamp in RFC 3339 format
  • datacontenttype: usually application/json
  • tenant_id: tenant or organization identifier when applicable
  • project_id: project identifier when applicable
  • environment_id: environment identifier when applicable
  • datacenter_id: datacenter or site identifier when applicable
  • actor: identity that initiated the action
  • correlation_id: stable ID shared across a workflow
  • causation_id: parent message ID that triggered this message
  • data: message payload

This is intentionally close to CloudEvents structure, but tailored to Dataverket’s control-plane needs.
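The envelope can be sketched in Python as a frozen dataclass. This sketch assumes uuid4 identifiers and a hypothetical server/<id> form for subject. The helpers new_envelope and follow_up are illustrative. A new workflow gets a fresh correlation_id, and every follow-up message keeps that correlation_id while pointing causation_id at its parent's id.

```python
import json
import uuid
from dataclasses import asdict, dataclass, replace
from datetime import datetime, timezone
from typing import Any, Optional


@dataclass(frozen=True)  # frozen: messages are immutable once built
class Envelope:
    id: str
    type: str
    source: str
    subject: str
    time: str
    actor: str
    correlation_id: str
    data: dict[str, Any]
    causation_id: Optional[str] = None
    tenant_id: Optional[str] = None
    project_id: Optional[str] = None
    environment_id: Optional[str] = None
    datacenter_id: Optional[str] = None
    specversion: str = "1.0"
    datacontenttype: str = "application/json"

    def to_json(self) -> str:
        # Omit optional fields that do not apply to this message.
        return json.dumps({k: v for k, v in asdict(self).items() if v is not None})


def _now() -> str:
    # RFC 3339 timestamp in UTC, e.g. 2026-03-14T09:30:00Z
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def new_envelope(type_: str, source: str, subject: str, actor: str,
                 data: dict[str, Any], **scope: str) -> Envelope:
    """Start a new workflow with its own correlation_id."""
    return Envelope(id=str(uuid.uuid4()), type=type_, source=source, subject=subject,
                    time=_now(), actor=actor, correlation_id=str(uuid.uuid4()),
                    data=data, **scope)


def follow_up(parent: Envelope, type_: str, source: str, subject: str,
              data: dict[str, Any]) -> Envelope:
    """Derive a message caused by parent, keeping actor, correlation, and scope."""
    return replace(parent, id=str(uuid.uuid4()), type=type_, source=source,
                   subject=subject, time=_now(), data=data, causation_id=parent.id)
```

Omitting unset scoping fields keeps "when applicable" fields off the wire. Emitting them as null would be an equally workable convention.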

Payload rules

  • Payloads must be JSON objects
  • Payload schemas must be versioned
  • Consumers must ignore unknown fields
  • Producers must not silently change field meaning without a schema version bump
  • Opaque binary blobs must not be embedded directly in normal message payloads

Large artifacts should be stored elsewhere and referenced by URI or object ID.
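The payload rules can be enforced on both the producer and the consumer side. The ADR requires versioning and forbids embedded blobs but specifies neither mechanism. This sketch therefore assumes the schema version travels inside data as a schema_version field, and it uses a hypothetical 64 KiB size limit as a guard against embedded blobs. ServerProvisionedV1 is a hypothetical schema.

```python
import json
from dataclasses import dataclass, fields

# Assumption: payloads carry their schema version as "schema_version";
# the ADR requires versioned schemas but does not name the field.
MAX_PAYLOAD_BYTES = 64 * 1024  # hypothetical guard against embedded blobs


def check_payload(data) -> None:
    """Producer-side checks mirroring the payload rules."""
    if not isinstance(data, dict):
        raise ValueError("payload must be a JSON object")
    if "schema_version" not in data:
        raise ValueError("payload schema must be versioned")
    encoded = json.dumps(data)  # raises TypeError for bytes and other non-JSON values
    if len(encoded.encode("utf-8")) > MAX_PAYLOAD_BYTES:
        raise ValueError("payload too large; store the artifact elsewhere and reference it")


@dataclass(frozen=True)
class ServerProvisionedV1:
    """Hypothetical payload schema for a maskin server event."""
    schema_version: str
    server_id: str
    image_uri: str  # large artifacts are referenced by URI, not embedded


def parse_server_provisioned(data: dict) -> ServerProvisionedV1:
    # Keep only the fields this consumer knows; unknown fields are ignored.
    known = {f.name for f in fields(ServerProvisionedV1)}
    return ServerProvisionedV1(**{k: v for k, v in data.items() if k in known})
```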

Reliability model

The platform reliability model is:

  • PostgreSQL stores desired state and authoritative resource state
  • JetStream stores durable commands and important integration events
  • consumers are responsible for idempotent handling
  • at-least-once delivery is assumed
  • workflow correlation is mandatory
  • cross-datacenter links must be treated as failure-prone and partitionable

Cross-site NATS usage must therefore distinguish between:

  • site-local workflow streams
  • cross-site coordination subjects
  • replicated or restorable durable state needed for recovery

No service may assume exactly-once processing.

Services must also not assume permanent low-latency connectivity between datacenters.

Ordering model

Ordering is guaranteed only within a single stream and its consumer configuration, and redelivery or parallel processing can still reorder work. Services must therefore:

  • not rely on global ordering
  • tolerate duplicate delivery
  • validate current state before applying effects
  • use correlation and causation IDs for workflow reconstruction
  • tolerate delayed or temporarily partitioned cross-datacenter delivery
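Messages can arrive late, duplicated, or out of order, so workflow history is better rebuilt from causation links than from arrival order. This is a minimal sketch over envelope dicts. reconstruct is a hypothetical helper, and envelope times are used only to order siblings for display.

```python
from collections import defaultdict


def reconstruct(messages, correlation_id):
    """Return one workflow's causation tree as nested (id, type, children) tuples.

    Input order does not matter and duplicates are dropped by id. RFC 3339 UTC
    strings in a uniform format sort chronologically, so siblings are ordered
    by their time field.
    """
    by_id = {}
    for m in messages:
        if m["correlation_id"] == correlation_id:
            by_id.setdefault(m["id"], m)  # first copy wins; later duplicates ignored

    children = defaultdict(list)
    roots = []
    for m in by_id.values():
        parent = m.get("causation_id")
        if parent in by_id:
            children[parent].append(m)
        else:
            # No parent, or parent not (yet) observed, e.g. during a partition.
            roots.append(m)

    def build(m):
        kids = sorted(children[m["id"]], key=lambda c: c["time"])
        return (m["id"], m["type"], [build(k) for k in kids])

    return [build(r) for r in sorted(roots, key=lambda r: r["time"])]
```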

Failure-handling baseline

The first production implementation must include a baseline policy for retries, poison messages, and replay.

The minimum baseline is:

  • transient failures may be retried with bounded backoff
  • repeated permanent failures must transition work into a visible failed task state
  • poison messages must be diverted to a dead-letter path after bounded retry exhaustion
  • operators must be able to inspect dead-lettered work with correlation context intact
  • replay must be an explicit operational action, not an accidental side effect of restart

This does not replace a later detailed ADR, but it establishes the minimum safety bar for building on NATS.
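One way to meet this baseline is sketched below under several assumptions: a five-attempt retry budget, capped exponential backoff with full jitter, failures classified by exception type, and in-memory lists standing in for the dead-letter path and the task store. In a JetStream deployment these would likely map onto consumer redelivery limits and a dedicated dead-letter subject, which the follow-up ADR would define. All class and method names are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Callable


class TransientError(Exception):
    """Hypothetical: a failure worth retrying (timeout, dependency unavailable)."""


class PermanentError(Exception):
    """Hypothetical: a failure that retrying cannot fix (invalid request)."""


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Capped exponential backoff with full jitter, in seconds."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


@dataclass
class RetryPolicy:
    max_attempts: int = 5  # assumption: bounded retry budget
    dead_letters: list = field(default_factory=list)
    failed_tasks: list = field(default_factory=list)

    def process(self, envelope: dict, handler: Callable[[dict], None],
                sleep: Callable[[float], None]) -> str:
        for attempt in range(self.max_attempts):
            try:
                handler(envelope)
                return "ok"
            except TransientError:
                if attempt + 1 < self.max_attempts:
                    sleep(backoff_delay(attempt))
            except PermanentError as exc:
                # Permanent failure: surface it as a visible failed task.
                self.failed_tasks.append((envelope["correlation_id"], str(exc)))
                return "failed"
        # Retry budget exhausted: keep the whole envelope so correlation survives.
        self.dead_letters.append(envelope)
        self.failed_tasks.append((envelope["correlation_id"], "retries exhausted"))
        return "dead-lettered"

    def replay(self, index: int, handler: Callable[[dict], None],
               sleep: Callable[[float], None]) -> str:
        """Explicit operator action: re-run one dead-lettered message."""
        return self.process(self.dead_letters.pop(index), handler, sleep)
```

Passing sleep in as a parameter keeps the policy testable, and production code would pass time.sleep. The replay method is deliberately separate from process, so dead-lettered work is re-run only when an operator calls it.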

Naming guidance

Service segment names should use accepted Dataverket service identities:

  • sentral
  • identitet
  • maskin
  • plattform
  • tjeneste
  • objekt
  • nett

Additional service names must be introduced by ADR before becoming part of the stable taxonomy.

Consequences

  • every service now has a uniform NATS contract
  • observability and auditing become easier because workflows can be correlated consistently
  • JetStream becomes an infrastructure dependency for orchestration
  • teams must design idempotent consumers from the start
  • NATS remains the transport layer, not the authoritative data store
  • cross-datacenter coordination now shares the same transport and message model as intra-site orchestration

Decision Outcome

Proposed. This ADR records the current preferred direction and still needs acceptance before it becomes binding.

More Information

Open topics for follow-up decisions:

  • stream and consumer layout in JetStream
  • schema registry or schema publication workflow
  • retry and dead-letter policy
  • inter-datacenter NATS topology and failure handling

Audit

  • 2026-03-14: ADR proposed.