Skip to content

Architecture Overview

wf is a single static Go binary. The core execution model requires no background services and no network calls to external systems. Everything happens in a single process execution.

A trigger listener (wf listen) is planned as an optional additive component for event-driven scheduling — it does not affect direct wf run invocations. See Triggers for details.


Execution Pipeline

wf run my-workflow
┌─────────────────────────────────────────────────────────┐
│  CLI (cmd/run.go)                                       │
│  • Parse flags                                          │
│  • Validate --var values                                │
│  • Select executor                                      │
└────────────────────┬────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│  Parser (internal/dag/parser.go)                        │
│  • Validate workflow name (path traversal check)        │
│  • Read TOML file                                       │
│  • Validate env keys, working_dir, variable names       │
│  → WorkflowDefinition                                   │
└────────────────────┬────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│  Builder (internal/dag/builder.go)                      │
│  • Expand matrix tasks                                  │
│  • Resolve depends_on edges                             │
│  • Detect cycles (Kahn's algorithm)                     │
│  • Topological sort → Levels                            │
│  → DAG{Levels [][]*Node}                                │
└────────────────────┬────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│  Executor (internal/executor/)                          │
│  • Create Run record in SQLite                          │
│  • For each task:                                       │
│    - Evaluate `if` condition (ContextMap)               │
│    - Inject matrix vars                                 │
│    - Interpolate {{.var}} in cmd                        │
│    - Execute shell command (retry loop)                 │
│    - Capture stdout/stderr (10 MiB limit)               │
│    - Write log file (0600)                              │
│    - Capture last line → ContextMap.Set()               │
│    - Update TaskExecution in SQLite                     │
│    - Fire forensic handler on failure                   │
└────────────────────┬────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│  Storage (internal/storage/)                            │
│  • SQLite with WAL mode                                 │
│  • Append-only audit trail                              │
│  • Variable snapshots at task completion                │
└─────────────────────────────────────────────────────────┘

Key Packages

Package File(s) Responsibility
cmd/ run.go, resume.go, graph.go, … Cobra CLI commands; flag parsing; executor selection
internal/dag parser.go, builder.go, dag.go TOML → WorkflowDefinitionDAG; cycle detection; topological sort
internal/executor executor.go, sequential.go, parallel.go, work_stealing.go Task lifecycle; retry loop; forensic trap invocation; progress events
internal/storage store.go, schema.go, model.go SQLite persistence; versioned migrations; audit trail; context snapshots
internal/contextmap contextmap.go, template.go Thread-safe variable registry; {{.var}} interpolation; if evaluation
internal/security validate.go Path traversal prevention; env key deny-list; working_dir restrictions
internal/config config.go Viper-based configuration; config.Get() singleton
internal/logger logger.go log/slog wrapper; console + file output
internal/tty tty.go Terminal detection for ANSI colour suppression

Key Data Structures

WorkflowDefinition

Raw parsed TOML. One-to-one mapping with the TOML schema. Contains all task definitions before matrix expansion or dependency resolution.

DAG

The processed, validated execution graph. Contains:

  • Nodes map[string]*Node — all nodes (including matrix-expanded)
  • Levels [][]*Node — topologically sorted slices; level 0 has no dependencies
  • ForensicTasks map[string]*Node — forensic tasks excluded from normal levels

Node

A single executable unit. May be a matrix expansion. All mutable fields are protected by a sync.RWMutex and accessed through thread-safe methods:

node.MarkRunning(startTime time.Time)
node.MarkSuccess(endTime time.Time, output string, exitCode int)
node.MarkFailed(endTime time.Time, err error)
node.MarkSkipped(output string, conditionResult bool)
node.MarkEarlyFailed(err error, exitCode int, stackTrace string)
node.MarkConditionMet(result bool)
node.GetState() NodeState
node.Reset()

ContextMap

Thread-safe variable registry backed by sync.RWMutex. Key operations:

cm.Set(taskID, name, value)           // register a variable (owner = taskID)
cm.Get(name)                          // read a variable value
cm.SetMatrix(taskID, vars)            // register read-only matrix vars
cm.EvalCondition(expr)                // evaluate an `if` expression
cm.InterpolateCommand(taskID, cmd)    // substitute {{.var}} in a command
cm.Snapshot()                         // serialise to JSON for DB storage
cm.Restore(data)                      // deserialise from JSON on resume

Store (SQLite)

All database access goes through the Store interface. The implementation uses parameterized queries exclusively. Schema migrations are numbered and forward-only.


Executor Implementations

Three structs implement the Executor interface (Execute, Resume, GetStore):

Implementation File Strategy
SequentialExecutor sequential.go Level-by-level, one task at a time
ParallelExecutor parallel.go Level-by-level, tasks within a level run concurrently under a semaphore
WorkStealingExecutor work_stealing.go Dependency-driven; task enqueued the moment all deps complete

All three share the core task lifecycle logic in executor.go:

  • executeNode() — evaluates condition, interpolates command, runs subprocess, handles retry loop
  • runCommand() — subprocess setup, process group management, timeout via context, output capture
  • doResume() — shared resume logic; variable restoration; pre-marking succeeded tasks

Process Management

Task commands are executed as sh -c <cmd> (Unix) or cmd.exe /C <cmd> (Windows).

On Unix, SysProcAttr.Setpgid = true creates a new process group for each task. On timeout or context cancellation, SIGKILL is sent to the process group (negative PID) — this kills the shell and all its children, preventing orphaned subprocesses from keeping stdout/stderr pipes open.


Storage Schema

Tables (versioned migrations in internal/storage/schema.go):

Table Purpose
runs One row per workflow run. Stores status, mode, timeout, tags, task counts.
task_executions One row per task per run. Stores state, exit code, attempt, log path, matrix vars.
context_snapshots Variable values at task completion. Used for resume variable restoration.
forensic_logs Output from forensic tasks; crash dumps; timeout records.
task_dependencies Dependency edges for the DAG — used for re-building the graph on inspect/resume.
audit_trail Append-only chronological event log.
dag_cache Serialised DAG JSON for fast inspection without re-parsing.

Run ID Format

Run IDs use KSUID — K-Sortable Unique IDentifiers. They are:

  • Time-sortable (embedded timestamp) — wf runs default sort is chronological without an index scan
  • Globally unique without coordination — no central ID server needed
  • URL-safe (base62 encoded)

Configuration Architecture

Configuration follows the 12-factor app model:

CLI flags  →  config file  →  WF_* env vars  →  built-in defaults
(highest priority)                              (lowest priority)

internal/config/config.go wraps Viper. Access is through config.Get() which returns a pointer to the current config struct. There is no global config.C variable — callers must always call config.Get().


Design Decisions

Pure Go SQLitemodernc.org/sqlite is used instead of mattn/go-sqlite3. This avoids CGo, keeping the binary fully static and cross-compilable without a C toolchain.

No goroutine pools — the work-stealing executor uses one goroutine per task. Tasks are expected to be long-running (seconds to minutes), so goroutine overhead is negligible. This keeps the scheduling logic simple and avoids deadlocks from pool exhaustion.

log/slog instead of zap — the standard library logger was chosen to eliminate a dependency and align with the Go standard. slog provides structured logging with equivalent performance for the throughput required by wf.

Append-only audit trail — the audit table has no UPDATE or DELETE code paths. This is a deliberate constraint that provides a tamper-evident history within the database's integrity guarantees.

limitedBuffer instead of io.LimitReaderio.LimitReader returns EOF at the limit, which would terminate the subprocess's pipe and send SIGPIPE. limitedBuffer silently drops bytes beyond the limit and always returns (len(p), nil) — the subprocess continues running and completes normally; only the in-memory capture is truncated.