Skip to content

Workflow File Reference

Workflow files are TOML documents that live in the configured workflows directory. Each file defines exactly one workflow. The filename (without .toml) must match the name field.


File Naming

workflows/
├── my-pipeline.toml        # name = "my-pipeline"
├── nightly-etl.toml        # name = "nightly-etl"
└── deploy-staging.toml     # name = "deploy-staging"

Workflow-Level Fields

These fields appear at the top of the file, outside any [tasks.*] section.

Field Type Required Description
name string yes Workflow identifier. Must match filename without .toml.
description string no Human-readable description shown in wf list.
tags []string no Searchable labels. Used with wf runs --tag.
on_failure string no Task ID of the global forensic handler. Runs when any task fails.

Example

name        = "deploy-production"
description = "Full production deployment with smoke tests"
tags        = ["deploy", "production", "nightly"]
on_failure  = "alert-oncall"

Task Fields

Tasks are declared as TOML tables: [tasks.<id>]. The task ID is the key used in depends_on references.

Required

Field Type Description
cmd string Shell command to execute. Supports multi-line with """...""". Interpolated with {{.varname}} before execution.

Identity and Type

Field Type Default Description
name string task ID Display name shown in progress output and reports.
type string "normal" "normal" or "forensic". Forensic tasks only run on failure, wired via on_failure.

Dependencies

Field Type Default Description
depends_on []string [] List of task IDs that must complete successfully before this task runs.

Execution Environment

Field Type Default Description
working_dir string process CWD Directory in which cmd is executed. Validated against path traversal at parse time.
env map[string]string {} Additional environment variables injected into cmd. Keys must be valid POSIX names.
clean_env bool false If true, start with an empty environment instead of inheriting the parent process env. WF_RUN_ID, WF_WORKFLOW, and WF_TASK_ID are still injected.

Variables

Field Type Default Description
register string Variable name to store the last non-empty stdout line. Available in downstream tasks as {{.name}}.
if string Boolean expression. Task is skipped (not failed) if it evaluates to false. References registered variables by name.

Matrix Expansion

Field Type Default Description
matrix map[string][]string Parameter grid. Each key-value-array combination produces one expanded node.

Resilience

Field Type Default Description
ignore_failure bool false If true, non-zero exit codes are treated as success. Dependants continue normally.
retries int 0 Number of retry attempts after a non-zero exit. Total executions = retries + 1.
retry_delay duration 0s Delay between retry attempts. Supports Go duration syntax: 5s, 1m30s, 2h.
timeout duration Per-task execution timeout. The task is killed (SIGKILL to process group) if it exceeds this limit.

Failure Handling

Field Type Default Description
on_failure string Task ID of a forensic handler to run when this task fails. The referenced task must have type = "forensic".

cmd — Command Syntax

cmd is executed by /bin/sh -c (Unix) or cmd.exe /C (Windows). All standard shell syntax is available.

Multi-line commands

[tasks.setup]
cmd = """
set -euo pipefail
mkdir -p /tmp/workspace
cd /tmp/workspace
echo "Ready"
"""

Variable interpolation in cmd

[tasks.deploy]
cmd = "./deploy.sh --env={{.target_env}} --version={{.app_version}}"

Only {{.varname}} substitution is supported. Template logic ({{if}}, {{range}}, {{call}}) is emitted verbatim — it will cause the shell command to fail visibly.


env — Environment Variables

[tasks.build]
cmd = "go build ."
env = {GOOS = "linux", GOARCH = "amd64", CGO_ENABLED = "0"}

Keys must satisfy POSIX naming rules: letters, digits, underscore; must not start with a digit.

The following keys are blocked and will cause a parse error:

Blocked Key Reason
LD_PRELOAD Dynamic-linker preload override
LD_LIBRARY_PATH Dynamic-linker library path override
LD_AUDIT Dynamic-linker audit module
LD_DEBUG Dynamic-linker debug flag
LD_DEBUG_OUTPUT Dynamic-linker debug output redirect
DYLD_INSERT_LIBRARIES macOS dynamic-linker insert override
DYLD_LIBRARY_PATH macOS dynamic-linker library path
DYLD_FRAMEWORK_PATH macOS dynamic-linker framework path

if — Condition Expressions

[tasks.cleanup]
cmd = "rm -rf /tmp/cache"
if  = 'disk_pct > "85"'

The condition is evaluated after all depends_on tasks complete. If it evaluates to false, the task state is set to skipped and dependants continue as if it succeeded.

Operators

==   !=   >   <   >=   <=   &&   ||   !

Examples

if = 'status == "healthy"'
if = 'cert_days < "30"'
if = 'errors != "0"'
if = 'score >= "0.9" && model_type == "transformer"'
if = 'env == "production" || env == "staging"'

matrix — Expansion

[tasks.test]
cmd    = "go test ./{{.test.pkg}}/..."
matrix = {pkg = ["api", "storage", "executor", "contextmap"]}

Produces: test[pkg=api], test[pkg=storage], test[pkg=executor], test[pkg=contextmap].

Matrix variables are referenced in cmd as {{.taskid.paramname}}.

Multi-dimensional:

[tasks.build]
cmd    = "GOOS={{.build.os}} GOARCH={{.build.arch}} go build -o bin/app-{{.build.os}}-{{.build.arch}} ."
matrix = {os = ["linux", "darwin"], arch = ["amd64", "arm64"]}

Produces 4 nodes (2×2).


Forensic Tasks

[tasks.rollback-db]
type    = "forensic"
cmd     = "psql -c 'ROLLBACK TO SAVEPOINT pre_deploy;'"
timeout = "2m"
retries = 1

Forensic tasks:

  • Must have type = "forensic"
  • Are excluded from normal DAG execution (not counted in levels)
  • Run only when wired via on_failure
  • Support all standard task fields (timeout, retries, retry_delay, env, clean_env, ignore_failure)
  • Have access to {{.failed_task}} and {{.error_message}} injected variables

Duration Syntax

timeout and retry_delay use Go duration syntax:

Example Meaning
30s 30 seconds
5m 5 minutes
1m30s 1 minute 30 seconds
2h 2 hours
24h 24 hours

Complete Example

name        = "data-pipeline"
description = "Nightly data processing pipeline"
tags        = ["etl", "nightly", "data"]
on_failure  = "notify-failure"

# ── Setup ─────────────────────────────────────────────────

[tasks.init]
name    = "Initialise Run"
cmd     = """
DATE=$(date +%Y-%m-%d)
echo "Starting pipeline for $DATE"
echo "$DATE"
"""
register = "run_date"

# ── Parallel extraction ────────────────────────────────────

[tasks.extract-api]
name           = "Extract from API"
cmd            = "python extract_api.py --date={{.run_date}} && echo 12500"
register       = "api_rows"
depends_on     = ["init"]
ignore_failure = true
retries        = 2
retry_delay    = "30s"
timeout        = "15m"
env            = {API_TOKEN = "{{.run_date}}", MAX_RETRIES = "3"}

[tasks.extract-db]
name           = "Extract from Database"
cmd            = "python extract_db.py --date={{.run_date}} && echo 84200"
register       = "db_rows"
depends_on     = ["init"]
ignore_failure = true
timeout        = "20m"
clean_env      = true

# ── Transform ─────────────────────────────────────────────

[tasks.transform]
name       = "Transform and Merge"
cmd        = """
python transform.py \
  --api-rows={{.api_rows}} \
  --db-rows={{.db_rows}} \
  --date={{.run_date}}
echo 96700
"""
register   = "merged_rows"
depends_on = ["extract-api", "extract-db"]
timeout    = "30m"
working_dir = "/data/pipeline"

# ── Load ──────────────────────────────────────────────────

[tasks.load]
name       = "Load to Warehouse"
cmd        = "python load.py --rows={{.merged_rows}}"
depends_on = ["transform"]
retries    = 3
retry_delay = "1m"
timeout    = "1h"

# ── Validate ──────────────────────────────────────────────

[tasks.validate]
name       = "Validate Row Count"
cmd        = "echo 'Loaded {{.merged_rows}} rows successfully'"
depends_on = ["load"]
if         = 'merged_rows > "0"'

# ── Forensic handler ──────────────────────────────────────

[tasks.notify-failure]
type           = "forensic"
cmd            = """
echo "Pipeline failed at task: {{.failed_task}}"
echo "Error: {{.error_message}}"
curl -s $SLACK_WEBHOOK -d "{\"text\": \"ETL failed: {{.failed_task}}\"}"
"""
ignore_failure = true
timeout        = "30s"