Status
Proposed on 2026-03-14 by Lars Solem.
Context
Dataverket needs a VM substrate for the Maskin service.
The platform already intends to manage bare metal directly and run Talos for Kubernetes-oriented hosts. What remains open is how virtual machines should be provisioned, scheduled, and controlled in a way that fits the rest of the control plane.
Decision
Dataverket keeps the v1 VM runtime and control surface open for now.
Maskin is responsible for:
- hypervisor inventory
- VM placement decisions
- VM lifecycle management through the selected runtime
- image preparation and attachment
- network attachment through Nett-provided constructs
The platform does not yet commit to VMware, Proxmox, OpenStack, KVM with libvirt, or another VM control substrate.
Why keep this open
The repository does not yet establish enough constraints to lock the VM runtime responsibly.
The VM substrate choice affects:
- host operating model
- storage integration
- live migration possibilities
- failover behavior
- operational tooling
- how much Dataverket must build itself versus integrate
Locking a runtime too early would create unnecessary architectural drag.
What the eventual VM runtime must provide
The first selected VM runtime must support:
- non-interactive automation
- predictable VM lifecycle control
- inventory integration
- datacenter-aware placement
- network attachment through Nett-managed constructs
- image lifecycle integration
- sufficient observability for task tracking and troubleshooting
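One way to build against these requirements before a runtime is selected is to express them as an abstract contract that each candidate adapter would have to satisfy. The sketch below is illustrative only and assumes Python 3.9 or later. `VMSpec`, `VMRuntime`, `InMemoryRuntime`, and every method name are hypothetical; none of them come from an existing runtime or from Dataverket code.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class VMSpec:
    """Hypothetical desired-state record for one VM."""
    name: str
    datacenter: str
    image_ref: str    # reference to an immutable image version
    network_ref: str  # identifier of a Nett-managed attachment
    vcpus: int
    memory_mib: int


class VMRuntime(ABC):
    """Hypothetical contract for a runtime adapter (all names are assumptions)."""

    @abstractmethod
    def create(self, host: str, spec: VMSpec) -> str:
        """Create a VM without operator interaction and return its runtime id."""

    @abstractmethod
    def start(self, vm_id: str) -> None:
        """Power the VM on."""

    @abstractmethod
    def stop(self, vm_id: str) -> None:
        """Power the VM off."""

    @abstractmethod
    def delete(self, vm_id: str) -> None:
        """Remove the VM from its host."""

    @abstractmethod
    def status(self, vm_id: str) -> str:
        """Return 'stopped' or 'running', for task tracking."""

    @abstractmethod
    def list_vms(self, host: str) -> list[str]:
        """Return the VM ids on a host, for inventory reconciliation."""


class InMemoryRuntime(VMRuntime):
    """Fake adapter that lets the control plane be exercised without a hypervisor."""

    def __init__(self) -> None:
        self._vms = {}
        self._counter = 0

    def create(self, host: str, spec: VMSpec) -> str:
        self._counter += 1
        vm_id = f"vm-{self._counter}"
        self._vms[vm_id] = {"host": host, "spec": spec, "state": "stopped"}
        return vm_id

    def start(self, vm_id: str) -> None:
        self._vms[vm_id]["state"] = "running"

    def stop(self, vm_id: str) -> None:
        self._vms[vm_id]["state"] = "stopped"

    def delete(self, vm_id: str) -> None:
        del self._vms[vm_id]

    def status(self, vm_id: str) -> str:
        return self._vms[vm_id]["state"]

    def list_vms(self, host: str) -> list[str]:
        return sorted(v for v, rec in self._vms.items() if rec["host"] == host)
```

A contract of this shape would let Maskin's placement and lifecycle logic be tested against the in-memory fake while the runtime evaluation is still in progress. Each candidate runtime could then be judged partly by how cleanly an adapter for it can be written.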
Hypervisor model
The v1 hypervisor model is:
- dedicated hypervisor hosts
- a runtime-specific management interface selected later
- host networking integrated with Nett-managed VLAN and bridge constructs where applicable
Each hypervisor host is part of operator-managed platform inventory, not a tenant-facing resource.
VM provisioning model
The standard VM provisioning flow is:
- Sentral persists desired VM state.
- Maskin selects a suitable hypervisor in the requested datacenter, within project constraints.
- Nett allocates or validates the required network attachment.
- Maskin creates or attaches the VM disk image.
- Maskin provisions the VM through the selected runtime.
- Maskin emits lifecycle events and task updates through NATS.
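The Maskin-side steps above can be sketched as a single ordered function. This is a minimal illustration, not the intended implementation. The step callables, the `ProvisionTask` type, and the `maskin.vm.<name>.<step>` subject format are all hypothetical; the real subjects must follow 007-nats-subject-and-event-envelope.md. A plain list stands in for NATS publishing, and desired state is assumed to have been persisted by Sentral before the function is called.

```python
from dataclasses import dataclass, field


@dataclass
class ProvisionTask:
    vm_name: str
    datacenter: str
    # Stand-in for NATS publishes: a list of (subject, payload) tuples.
    events: list = field(default_factory=list)

    def emit(self, step: str, detail: str) -> None:
        # Hypothetical subject format, used here only for illustration.
        self.events.append((f"maskin.vm.{self.vm_name}.{step}", detail))


def provision(task, select_host, attach_network, prepare_disk, create_vm):
    """Run placement, network, disk, and runtime steps in order.

    Each argument after `task` is a callable supplied by the caller, so the
    flow does not depend on any particular runtime. A failure in any step
    emits a 'failed' event and re-raises the original exception.
    """
    try:
        host = select_host(task.datacenter)
        task.emit("placed", host)

        net = attach_network(task.vm_name, host)
        task.emit("network_ready", net)

        disk = prepare_disk(task.vm_name, host)
        task.emit("disk_ready", disk)

        vm_id = create_vm(host, net, disk)
        task.emit("provisioned", vm_id)
        return vm_id
    except Exception as exc:
        task.emit("failed", str(exc))
        raise
```

Because every step reports through the same task object, an operator following the event stream can see how far a provisioning attempt got before it failed.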
Image model
The v1 image model should support:
- operator-managed base images
- immutable image versioning
- cloud-init or equivalent guest initialization where the guest OS requires it
- Talos images where Talos is used inside VMs
In the first version, Maskin should treat image definitions as platform-managed artifacts, not as arbitrary user-provided disks.
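Immutable image versioning can be illustrated with a small catalog that refuses to overwrite a published version. The sketch below is an assumption about one possible shape. `ImageVersion`, `ImageCatalog`, the field names, and the `init` values are hypothetical and do not describe an existing Dataverket catalog.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageVersion:
    """One published, immutable image version (hypothetical fields)."""
    name: str
    version: int
    sha256: str
    init: str  # guest initialization: "cloud-init", "talos", or "none"


class ImageCatalog:
    """Operator-managed catalog; a published (name, version) never changes."""

    def __init__(self) -> None:
        self._images = {}

    def publish(self, image: ImageVersion) -> None:
        key = (image.name, image.version)
        if key in self._images:
            # Changing an image means publishing a new version number.
            raise ValueError(f"{image.name}@{image.version} is already published")
        self._images[key] = image

    def resolve(self, name: str, version: int) -> ImageVersion:
        """Return the exact version requested; raises KeyError if unknown."""
        return self._images[(name, version)]

    def latest(self, name: str) -> ImageVersion:
        """Return the highest published version of an image."""
        versions = [v for (n, v) in self._images if n == name]
        if not versions:
            raise KeyError(name)
        return self._images[(name, max(versions))]
```

Pinning a VM to an exact `(name, version)` pair, rather than to "latest", keeps a rebuild reproducible even after newer versions of the base image are published.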
Scheduling model
VM placement must consider:
- datacenter
- hypervisor capacity
- network attachment availability
- anti-affinity or spread requirements where requested
- future failover intent when applicable
The initial scheduler can be simple and deterministic. It does not need to be a general-purpose cluster scheduler in v1.
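A simple deterministic scheduler of this kind could be a filter followed by a fixed ordering. The sketch below is one possible reading, not a chosen design. The `Hypervisor` fields, the `place` function, and the "most free memory first, then name" ordering are assumptions. Future failover intent is deliberately left out because its rules are not yet defined.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Hypervisor:
    """Inventory view of one hypervisor host (hypothetical fields)."""
    name: str
    datacenter: str
    free_vcpus: int
    free_memory_mib: int
    networks: frozenset               # Nett attachments available on this host
    groups: frozenset = frozenset()   # anti-affinity groups already placed here


def place(
    hosts: Iterable[Hypervisor],
    datacenter: str,
    vcpus: int,
    memory_mib: int,
    network: str,
    anti_affinity_group: Optional[str] = None,
) -> Optional[Hypervisor]:
    """Return the chosen host, or None if no host satisfies the request."""
    candidates = [
        h for h in hosts
        if h.datacenter == datacenter
        and h.free_vcpus >= vcpus
        and h.free_memory_mib >= memory_mib
        and network in h.networks
        and (anti_affinity_group is None or anti_affinity_group not in h.groups)
    ]
    if not candidates:
        return None
    # Deterministic ordering: most free memory, then most free vCPUs,
    # then host name as the final tie-breaker.
    return min(candidates, key=lambda h: (-h.free_memory_mib, -h.free_vcpus, h.name))
```

The same inventory and the same request always produce the same host, which makes placement decisions easy to reproduce when troubleshooting.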
Explicit non-decisions for now
The following remain open until a later ADR:
- exact VM runtime
- exact hypervisor host OS
- migration feature expectations
- storage coupling model
Explicit non-goals for v1
The following are out of scope for the first version:
- live migration as a hard requirement
- tenant-direct hypervisor APIs
These may be reconsidered later if operational demand justifies them.
Consequences
- Maskin gets a clear responsibility boundary, but not a locked runtime yet
- VM runtime selection now needs a follow-up evaluation ADR before deep implementation starts
- VM automation can still follow the same inventory and NATS patterns as bare metal
- the team avoids locking itself to a substrate before storage and failover assumptions are clearer
Decision Outcome
Proposed. This ADR records the current preferred direction and still needs acceptance before it becomes binding.
Related Decisions
- This ADR depends on the resource and ownership model in 009-resource-inventory-and-tenancy-model.md.
- VM lifecycle orchestration must follow the transport and envelope rules in 007-nats-subject-and-event-envelope.md.
- Any eventual VM runtime choice must fit the datacenter placement and failover direction in 012-inter-datacenter-topology-and-failover.md.
- Candidate platform selection should be justified through 014-vm-platform-selection-criteria.md.
More Information
- VM platform selection criteria
- guest image catalog model
- storage model for VM disks
- hypervisor host OS and hardening profile
- live migration and failover policy
Audit
- 2026-03-14: ADR proposed.