Development and test environments DATAVERKET 003

Proposed Testing Development Environments

Defines the layered development and test environments used to build and validate Dataverket.

Author
Lars Solem
Updated

Status

Proposed on 2026-03-14 by Lars Solem.

Context

Dataverket spans bare metal, Talos, network automation, VMs, storage, and multi-datacenter orchestration. That scope is too broad to develop against effectively if every change requires access to a full physical environment.

The platform needs a deliberate development and test strategy that separates control-plane development, virtualization and cluster lifecycle testing, hardware-adjacent validation, and real fabric or provisioning validation.

Decision Drivers

  • Most control-plane work should be possible without access to production-like hardware.
  • VM, Talos, and cluster lifecycle logic still need realistic integration coverage.
  • Multi-datacenter claims require an environment that can represent partitions, failover, and recovery behavior honestly.
  • Some infrastructure behaviors cannot be trusted without hardware-adjacent validation.

Considered Options

One production-like environment for all development and testing

Require most meaningful development and validation to happen against a hardware-heavy environment that resembles production closely.

Mostly local software-only development and testing

Treat local containerized development as the main environment and defer richer infrastructure validation until much later.

Layered environment model

Use different environments for software iteration, virtualization integration, multi-site behavior, and hardware validation.

Decision

Dataverket adopts a layered development and test model with four main environments:

  1. Local software environment for most control-plane and workflow development.
  2. Single-machine virtualization lab for VM, Talos-in-VM, and platform integration testing on one host.
  3. Multi-site simulation environment for datacenter partition, failover, and recovery scenario testing.
  4. Small physical lab for hardware-facing validation that cannot be trusted in pure virtualization.

The platform must be developable in useful slices without a full production-like datacenter.
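As an illustration of how a change could be mapped to the environments it needs, the sketch below models the four environments and looks up which ones cover a set of required capabilities. The capability labels, the `CAPABILITIES` table, and the function name are hypothetical; they paraphrase the environment descriptions in this ADR and are not an agreed taxonomy. The cost ordering between the multi-site simulation and the physical lab is also an assumption.

```python
from __future__ import annotations

from enum import IntEnum


class Environment(IntEnum):
    """The four environments, in an assumed order from cheapest to most expensive."""
    LOCAL_SOFTWARE = 1
    VIRTUALIZATION_LAB = 2
    MULTI_SITE_SIMULATION = 3
    PHYSICAL_LAB = 4


# Hypothetical capability labels, paraphrased from this ADR for illustration only.
CAPABILITIES = {
    Environment.LOCAL_SOFTWARE: {"api", "workflow", "tenancy", "reconciliation"},
    Environment.VIRTUALIZATION_LAB: {"vm-lifecycle", "talos-in-vm", "cluster-bootstrap"},
    Environment.MULTI_SITE_SIMULATION: {"partition", "failover", "site-loss", "rejoin"},
    Environment.PHYSICAL_LAB: {"provisioning", "discovery", "hardware-failure"},
}


def environments_needed(required: set[str]) -> list[Environment]:
    """Return every environment that covers at least one required capability."""
    unknown = required - set().union(*CAPABILITIES.values())
    if unknown:
        raise ValueError(f"no environment covers: {sorted(unknown)}")
    # Iterating the enum yields members in definition order (cheapest first).
    return [env for env in Environment if CAPABILITIES[env] & required]
```

A change that touches only API and workflow code would resolve to the local software environment alone, while one that also alters failover behavior would add the multi-site simulation environment.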

Local software environment

The default developer environment should be assembled from local, containerized components such as:

  • Docker Compose or equivalent local orchestration
  • containerized NATS
  • containerized PostgreSQL
  • containerized control-plane services
  • mocked or simulated external dependencies where practical
  • persistent volumes for stateful local services

This environment should be sufficient for API development, task and workflow handling, tenancy and inventory logic, reconciliation logic, and most operator-surface work.

By default, these dependencies are not installed directly on the developer workstation. They run in containers with persistent local volumes, which keeps the environment reproducible while preserving state across restarts.
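A local stack of this kind usually needs a readiness check before tests start. The following is a minimal sketch, assuming the containers publish NATS and PostgreSQL on `localhost` at their default upstream ports (4222 and 5432). The host names, the `LOCAL_DEPENDENCIES` mapping, and both function names are hypothetical. A successful TCP connection only shows that the port is open, not that the service is fully ready.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection succeeds, or False when the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Assumed local endpoints; adjust to match the actual compose file.
LOCAL_DEPENDENCIES = {
    "nats": ("localhost", 4222),
    "postgresql": ("localhost", 5432),
}


def unready_dependencies(deps=LOCAL_DEPENDENCIES, timeout: float = 30.0) -> list:
    """Return the names of dependencies that did not accept a connection in time."""
    return [
        name
        for name, (host, port) in deps.items()
        if not wait_for_port(host, port, timeout)
    ]
```

A test runner could call `unready_dependencies()` once at startup and abort with a clear message if the returned list is not empty.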

Single-machine virtualization lab

A single-machine virtualization host, such as Proxmox or an equivalent environment, is a valid development and test target.

This environment should be used for:

  • VM lifecycle testing
  • Talos-in-VM testing
  • cluster bootstrap workflows
  • network attachment and placement experiments
  • break-glass and operator workflow validation in virtualized form

Multi-site simulation environment

Dataverket also needs an environment specifically for multi-datacenter behavior. Docker Compose, a single-host lab, and a small hardware lab are not sufficient on their own to validate site failover semantics.

This environment may be built from:

  • multiple virtual clusters or VM groups representing separate sites
  • isolated NATS and PostgreSQL instances or topologies representing site-local versus cross-site behavior
  • controllable inter-site links where latency, loss, and partition can be injected
  • automation to trigger failover, promotion, replay, and recovery scenarios repeatedly

This environment should be used for:

  • inter-site partition testing
  • failover workflow validation
  • site-loss and degraded-mode exercises
  • recovery and rejoin testing after partition or outage
  • validation of operator visibility and runbook behavior during cross-site incidents
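The controllable inter-site links described above can be prototyped in memory before any real network emulation is wired in. The sketch below is an assumption-laden model, not the simulation standard: `SimulatedLink`, its fields, and `run_partition_scenario` are hypothetical names, and latency is only reported rather than actually waited for.

```python
import random
from dataclasses import dataclass, field


@dataclass
class SimulatedLink:
    """In-memory stand-in for one inter-site link with injectable faults."""
    latency_ms: float = 0.0
    loss_rate: float = 0.0  # probability in [0, 1] that a message is dropped
    partitioned: bool = False
    # Seeded generator so that loss injection is repeatable between runs.
    rng: random.Random = field(default_factory=lambda: random.Random(0))

    def send(self, message: str):
        """Return (message, latency_ms) if delivered, or None if dropped."""
        if self.partitioned:
            return None
        if self.rng.random() < self.loss_rate:
            return None
        return (message, self.latency_ms)


def run_partition_scenario(link: SimulatedLink, messages: list) -> dict:
    """Send all messages before, during, and after a partition; count deliveries."""
    counts = {}
    for phase, partitioned in (("before", False), ("during", True), ("after", False)):
        link.partitioned = partitioned
        counts[phase] = sum(link.send(m) is not None for m in messages)
    return counts
```

Because the scenario is a plain function over a seeded link, it can be run repeatedly in CI, which matches the requirement that failover and recovery scenarios be triggered by automation rather than by hand.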

Small physical lab

Dataverket should also maintain a small physical lab for hardware-adjacent validation.

This may include:

  • a small number of low-cost servers
  • single-board-computer clusters where appropriate
  • test switches or network appliances
  • representative BMC-capable hardware where available

This environment is reserved for:

  • provisioning validation
  • discovery and inventory validation
  • network automation edge cases
  • failure testing that virtualization cannot represent honestly

Consequences

Positive

  • Most control-plane development can happen without access to production-like hardware.
  • A single-host virtualization layer becomes a practical bridge between local software work and hardware validation.
  • Multi-datacenter claims must be backed by a dedicated simulation layer rather than inferred from single-site tests.

Negative

  • The project now has to maintain several different validation environments instead of one simple setup.
  • Some scenarios may still require careful judgment about which environment is sufficient before changes are trusted.

Neutral

  • The test strategy should distinguish between unit tests, service integration tests, local workflow tests, virtualization-lab tests, multi-site simulation tests, and hardware-in-the-loop tests.
  • CI should prefer fast software-only validation first, then use gated or scheduled runs for more expensive multi-site and hardware-backed layers.
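One way to express this layering is a small policy table that maps CI triggers to the test layers they run. This is a sketch only: the trigger names (`pull_request`, `merge_gate`, `scheduled`) and the placement of the virtualization-lab layer are assumptions, since the CI test-layer policy is left to a follow-up ADR.

```python
from enum import Enum


class ValidationLayer(Enum):
    """Test layers named in this ADR, listed from fastest to most expensive."""
    UNIT = "unit"
    SERVICE_INTEGRATION = "service-integration"
    LOCAL_WORKFLOW = "local-workflow"
    VIRTUALIZATION_LAB = "virtualization-lab"
    MULTI_SITE_SIMULATION = "multi-site-simulation"
    HARDWARE_IN_THE_LOOP = "hardware-in-the-loop"


# Fast, software-only layers that run on every change.
FAST_LAYERS = [
    ValidationLayer.UNIT,
    ValidationLayer.SERVICE_INTEGRATION,
    ValidationLayer.LOCAL_WORKFLOW,
]

# Hypothetical policy: trigger names and layer placement are illustrative.
LAYERS_BY_TRIGGER = {
    "pull_request": FAST_LAYERS,
    "merge_gate": FAST_LAYERS + [ValidationLayer.VIRTUALIZATION_LAB],
    "scheduled": list(ValidationLayer),
}


def layers_for(trigger: str) -> list:
    """Return the ordered test layers to run for a CI trigger."""
    try:
        return list(LAYERS_BY_TRIGGER[trigger])
    except KeyError:
        raise ValueError(f"unknown CI trigger: {trigger!r}") from None
```

Keeping the policy as data makes it easy to review which expensive layers are gated and which run only on a schedule.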

Decision Outcome

Proposed. Dataverket should use a layered environment model rather than forcing all work through one environment class.

More Information

  • Follow-up ADRs are still needed for local stack composition and tooling, the virtualization lab standard, the multi-site simulation standard, the hardware lab minimum topology, and CI test-layer policy.

Audit

  • 2026-03-14: ADR proposed.