VM platform selection criteria (MASKIN 014)

Tags: Infrastructure, Compute, Vendor selection, Maskin, Virtualization, Criteria

Defines the lifecycle, networking, storage, and multi-datacenter criteria for selecting Maskin's first supported VM platform.

Author: Lars Solem

Status: Proposed on 2026-03-14 by Lars Solem.

Context

Dataverket has intentionally left the v1 VM runtime open.

That is the right decision for now, but it also means the eventual VM platform choice needs an explicit evaluation framework. Otherwise the choice may be driven by familiarity or convenience instead of the actual needs of Maskin, tenancy, automation, and inter-datacenter operations.

Decision

Dataverket will not select its first VM runtime until candidate platforms are evaluated against a shared capability checklist.

The first supported VM platform must satisfy the requirements in this document.

Mandatory platform requirements

The first supported VM platform must support:

  • non-interactive automation
  • predictable VM create, start, stop, reboot, and delete workflows
  • attachable disk and network lifecycle control
  • machine-readable or reliably scriptable state inspection
  • durable host and VM identity for inventory reconciliation
  • enough operational observability for task tracking and debugging

If the platform cannot be driven safely by automation, it is not suitable as the first Dataverket VM substrate.
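One way to make "predictable" lifecycle workflows testable is to write the expected transitions down as a table and reject everything else. The sketch below is illustrative only: the three states, the action names, and the rule that delete is only allowed from a stopped VM are assumptions made for the example, not part of this ADR or of any candidate platform's API.

```python
from enum import Enum


class VMState(Enum):
    # Hypothetical, simplified state set for illustration.
    ABSENT = "absent"
    STOPPED = "stopped"
    RUNNING = "running"


# (action, current state) -> next state. Anything not listed is rejected.
TRANSITIONS = {
    ("create", VMState.ABSENT): VMState.STOPPED,
    ("start", VMState.STOPPED): VMState.RUNNING,
    ("stop", VMState.RUNNING): VMState.STOPPED,
    ("reboot", VMState.RUNNING): VMState.RUNNING,
    ("delete", VMState.STOPPED): VMState.ABSENT,
}


def next_state(state: VMState, action: str) -> VMState:
    """Return the state after `action`, or raise if the transition is undefined."""
    try:
        return TRANSITIONS[(action, state)]
    except KeyError:
        raise ValueError(f"action {action!r} is not allowed in state {state.value!r}")


def run_workflow(actions, start: VMState = VMState.ABSENT) -> VMState:
    """Apply a sequence of actions in order and return the final state."""
    state = start
    for action in actions:
        state = next_state(state, action)
    return state
```

A candidate platform could be exercised against such a table in a lab: every listed transition should succeed non-interactively, and every unlisted one should fail in a way automation can detect.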

Inventory and tenancy requirements

The first supported VM platform must fit the Dataverket resource model.

That means it must support or allow:

  • clear separation between operator-managed hypervisor hosts and tenant-facing VMs
  • inventory reconciliation between runtime state and Sentral inventory
  • project and environment-aware placement metadata
  • quota and capacity accounting at a level useful for tenant-facing services

The platform must not force Dataverket into a resource model that conflicts with tenant -> project -> environment scoping.
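Inventory reconciliation can be sketched as a diff between two views keyed by a durable VM identity. The record fields, the function name `reconcile`, and the result keys below are hypothetical. The real Sentral inventory schema is not defined in this document.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scope:
    # Mirrors the tenant -> project -> environment scoping named above.
    tenant: str
    project: str
    environment: str


@dataclass(frozen=True)
class VMRecord:
    vm_id: str        # durable identity, assumed stable across restarts
    scope: Scope
    power_state: str  # e.g. "running" or "stopped" (illustrative values)


def reconcile(inventory: dict, runtime: dict) -> dict:
    """Compare inventory records with runtime records, both keyed by vm_id."""
    inv_ids, run_ids = set(inventory), set(runtime)
    return {
        # Recorded in inventory but not present on the platform.
        "missing_in_runtime": sorted(inv_ids - run_ids),
        # Present on the platform but unknown to inventory.
        "unknown_in_runtime": sorted(run_ids - inv_ids),
        # Present in both, but the records disagree.
        "drifted": sorted(i for i in inv_ids & run_ids if inventory[i] != runtime[i]),
    }
```

This only works if the platform exposes a stable identifier and machine-readable state for every VM, which is why both appear in the mandatory requirements.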

Networking requirements

The first supported VM platform must support:

  • attachment to Nett-managed networks
  • deterministic network interface assignment
  • VLAN-backed or equivalent isolated network attachment
  • enough host networking control to preserve Dataverket’s network intent model

If a candidate platform hides too much of the network path or makes network attachment opaque, it is a poor first choice.
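As one possible reading of "deterministic network interface assignment", the sketch below derives a stable MAC address from a VM identifier and an interface index. This is an assumption made for illustration: the ADR does not prescribe how interfaces are identified, and the function name and input format are hypothetical. A platform qualifies in this respect only if it lets automation set such values instead of generating them opaquely.

```python
import hashlib


def deterministic_mac(vm_id: str, nic_index: int) -> str:
    """Derive a stable, locally administered unicast MAC from VM id and NIC index."""
    digest = hashlib.sha256(f"{vm_id}/{nic_index}".encode("utf-8")).digest()
    octets = bytearray(digest[:6])
    # Set the locally-administered bit (0x02) and clear the multicast bit (0x01).
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{b:02x}" for b in octets)
```

The same inputs always produce the same address, so re-provisioning a VM from declared inputs does not change its network identity.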

Image and provisioning requirements

The first supported VM platform must support:

  • operator-managed base images
  • immutable image versioning or a close equivalent
  • automated guest initialization
  • Talos guest support where Talos-in-VM is required
  • reproducible provisioning from declared inputs

Ad hoc manual image handling should not be the primary operating model.
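Reproducible provisioning from declared inputs implies that two identical declarations should be recognizably identical. A minimal sketch, assuming the declared inputs can be expressed as a JSON-serializable dictionary (the field names in the test data are hypothetical):

```python
import hashlib
import json


def provisioning_fingerprint(declared: dict) -> str:
    """Hash a declaration in canonical form so key order does not matter."""
    canonical = json.dumps(declared, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Referencing the base image by an immutable version or content digest inside the declaration means that any change to the image changes the fingerprint, which gives automation a simple way to detect drift between what was declared and what was provisioned.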

Multi-datacenter requirements

Because Dataverket is explicitly multi-datacenter, candidate VM platforms must be evaluated for:

  • datacenter-aware placement
  • failure-domain-aware scheduling inputs
  • clear behavior during site failover
  • realistic recovery workflows for VMs in another datacenter

The first platform does not need full transparent active/active VM mobility, but it must not block the site-active, service-level active/passive direction.
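Datacenter-aware placement with failure-domain inputs can be illustrated with a small selection function. Everything here is an assumption made for the example: the `Host` fields, the use of free vCPUs as the only capacity measure, and the "most free capacity wins" rule. The point is only that the platform must expose enough host metadata for a decision like this to be made outside the platform.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Host:
    name: str
    datacenter: str
    failure_domain: str  # e.g. rack or power zone (illustrative)
    free_vcpus: int


def place(hosts, datacenter: str, vcpus: int, avoid_domains=frozenset()) -> Optional[Host]:
    """Pick a host in `datacenter` with enough capacity, outside `avoid_domains`."""
    candidates = [
        h for h in hosts
        if h.datacenter == datacenter
        and h.free_vcpus >= vcpus
        and h.failure_domain not in avoid_domains
    ]
    if not candidates:
        return None
    # Prefer the most free capacity; break ties by name for determinism.
    return min(candidates, key=lambda h: (-h.free_vcpus, h.name))
```

Passing the failure domains already used by a service's other VMs as `avoid_domains` spreads that service across domains. Recovery into another datacenter is then the same call with a different `datacenter` argument.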

Storage requirements

The first supported VM platform must be compatible with a storage model that can support:

  • predictable VM disk lifecycle
  • backup and restore workflows
  • future replication or recovery design
  • inventory visibility for attached storage

The platform does not need to solve all storage problems itself, but it must not make them unmanageable.

Operational requirements

The first platform should be operable by a small team.

That means preference should be given to platforms with:

  • understandable failure modes
  • clear day-2 operations
  • practical lab reproducibility
  • maintainable upgrade workflows
  • accessible telemetry and troubleshooting paths

Integration requirements

The first supported VM platform must fit the Dataverket control-plane model:

  • Sentral as system of record
  • Maskin as the VM lifecycle owner
  • NATS-based task and event orchestration
  • no tenant-direct dependence on the underlying runtime API

If the platform expects to be the dominant control plane itself, that should count against it.

Evaluation model

Candidate platforms should be scored in five dimensions:

  1. Lifecycle automation fit: VM lifecycle, observability, and scriptability.

  2. Network and storage fit: Ability to integrate cleanly with Nett and the future storage model.

  3. Multi-datacenter fit: Placement, recovery, and failover alignment with the site-active model.

  4. Operational fit: Day-2 workload, reliability, upgrades, and troubleshooting.

  5. Commercial and procurement fit: Cost, availability, support, and realistic deployment path.

No platform should be selected unless it is acceptable in all five dimensions.
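The rule above is a gate, not a weighted average: a strong result in four dimensions does not compensate for an unacceptable fifth. A minimal sketch of that rule follows, assuming a 1 to 5 score per dimension and a threshold of 3 for "acceptable". Both the scale and the threshold are illustrative choices, not defined by this ADR.

```python
DIMENSIONS = (
    "lifecycle_automation",
    "network_and_storage",
    "multi_datacenter",
    "operational",
    "commercial_and_procurement",
)


def is_selectable(scores: dict, threshold: int = 3) -> bool:
    """Return True only if every dimension is scored and meets the threshold."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"unscored dimensions: {missing}")
    return all(scores[d] >= threshold for d in DIMENSIONS)


# A high average does not rescue a failing dimension.
strong_but_blocked = {
    "lifecycle_automation": 5,
    "network_and_storage": 5,
    "multi_datacenter": 2,
    "operational": 5,
    "commercial_and_procurement": 5,
}
print(is_selectable(strong_but_blocked))  # prints False
```

Raising an error for an unscored dimension keeps an incomplete evaluation from being read as a pass.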

Consequences

  • Dataverket now has a disciplined basis for choosing its first VM platform
  • 009 can remain intentionally open without becoming vague
  • the eventual VM runtime decision can be defended against concrete platform requirements

Decision Outcome

Proposed. This ADR records the current preferred direction and still needs acceptance before it becomes binding.

Audit

  • 2026-03-14: ADR proposed.