Workflows

A workflow chains multiple nodes into a pipeline. Expanse automatically resolves dependencies from each node’s inputs.from fields and handles data transfer between stages, including across clusters.
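For example, a node can declare that one of its inputs comes from an upstream stage's output. The snippet below is an illustrative sketch only; the exact key layout may differ in your node.yaml reference:

```yaml
# nodes/processor/node.yaml (inputs section only; other fields omitted,
# and the field names here are illustrative)
inputs:
  - name: raw_data      # hypothetical input name
    from: generate      # upstream stage whose output feeds this input
```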

Workflow YAML Structure

name: my-pipeline
kind: workflow

stages:
  - name: generate
    ref: nodes/generator
    cluster: local

  - name: process
    ref: nodes/processor
    cluster: archer2

  - name: report
    ref: nodes/reporter
    cluster: local

Field Reference

Field      Required  Description
name       Yes       Logical name of the workflow
kind       Yes       Must be "workflow"
env        No        Environment variables available to all stages
preflight  No        Preflight validation settings
stages     Yes       Ordered list of steps to execute

Stage Fields

Field    Required  Description
name     Yes       Human-readable label for the stage (used in logs and the CLI progress display)
ref      Yes       Path to the node directory, relative to the project root
cluster  No        Overrides the default cluster for this stage only

Preflight Configuration

Preflight checks validate that clusters are reachable and configurations are valid before submitting any jobs.

preflight:
  enabled: true
  concurrency: 3
  fail_fast: true
  timeout: "30s"

Field        Type    Default  Description
enabled      bool    true     Run preflight checks before execution
concurrency  int     -        Maximum parallel preflight checks
fail_fast    bool    -        Stop on first preflight failure
timeout      string  -        Timeout per preflight check (e.g. "30s", "1m")

Examples

Simple Pipeline

Three local nodes executing in sequence:

name: simple-pipeline
kind: workflow

stages:
  - name: generate
    ref: nodes/generator
    cluster: local

  - name: process
    ref: nodes/processor
    cluster: local

  - name: summarise
    ref: nodes/reporter
    cluster: local

Cross-Cluster Workflow

Preprocess locally, run heavy compute on an HPC cluster, then collect results locally:

name: montecarlo-pipeline
kind: workflow

stages:
  - name: prep
    ref: nodes/prep
    cluster: local

  - name: simulate
    ref: nodes/simulate
    cluster: archer2

  - name: aggregate
    ref: nodes/aggregate
    cluster: local

When nodes target different clusters, Expanse transfers data between them automatically; no manual file management is required.

Multi-Language Pipeline

Nodes written in different languages can exchange data directly:

name: multilang-pipeline
kind: workflow

stages:
  - name: python-source
    ref: nodes/python_source
    cluster: local

  - name: fortran-scale
    ref: nodes/fortran_scale
    cluster: local

  - name: c-offset
    ref: nodes/c_offset
    cluster: local

  - name: python-final
    ref: nodes/python_final
    cluster: local

Workflow-Level Environment Variables

Set environment variables that apply to all stages:

name: training-pipeline
kind: workflow

env:
  EXPERIMENT_ID: "exp-2024-001"
  LOG_LEVEL: "info"

stages:
  - name: train
    ref: nodes/trainer
    cluster: gpu-cluster

  - name: evaluate
    ref: nodes/evaluator
    cluster: local

Running Workflows

Run by workflow name (resolved from the workflows/ directory):

expanse run simple-pipeline

Or by explicit file path:

expanse run workflows/pipeline.yaml

Run Flags

Flag               Description
--cluster, -c      Override the default cluster for all stages
--no-cache         Skip build cache and force fresh builds
--quiet, -q        Compact single-line progress output
--verbose, -v      Show detailed progress including file transfers
--no-preflight     Skip preflight builds; builds happen during execution instead
--skip-validation  Skip pre-run configuration validation

Examples

# Override cluster for all stages
expanse run simple-pipeline --cluster cirrus

# Force fresh builds
expanse run simple-pipeline --no-cache

# Verbose output with transfer details
expanse run simple-pipeline --verbose

Execution Flow

When you run a workflow, Expanse:

  1. Parses the workflow YAML and validates all stage references
  2. Loads each node’s node.yaml to understand inputs and outputs
  3. Builds a dependency graph from inputs.from declarations
  4. Validates configuration (unless --skip-validation is set)
  5. Runs preflight checks against target clusters
  6. Executes stages in dependency order. Each node goes through a four-step pipeline (setup, deps, build, run). The deps step installs dependencies when a manifest file (e.g. requirements.txt, conanfile.txt) is detected. Data is transferred between clusters as needed.
  7. Downloads outputs marked with local_copy to your local machine
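
The stage ordering in step 6 is ordinary topological sorting over the inputs.from graph. A minimal Python sketch (illustrative only, not Expanse's actual implementation), using the stage names from the cross-cluster example:

```python
from graphlib import TopologicalSorter

# Map each stage to the set of stages it consumes outputs from,
# as would be derived from the nodes' inputs.from declarations.
deps = {
    "prep": set(),
    "simulate": {"prep"},
    "aggregate": {"simulate"},
}

order = list(TopologicalSorter(deps).static_order())
print(order)  # ['prep', 'simulate', 'aggregate']
```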

Reusing Nodes Across Workflows

The same node directory can be referenced by multiple workflows. Define your nodes once, then compose different pipelines:

nodes/
├── preprocess/
├── solver/
├── postprocess/
└── visualise/

workflows/
├── full-pipeline.yaml        # All four nodes
├── quick-test.yaml           # preprocess + solver only
└── visualise-only.yaml       # postprocess + visualise
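
For instance, quick-test.yaml might look like the following sketch, composed in the same style as the earlier examples:

```yaml
# workflows/quick-test.yaml — preprocess + solver only
name: quick-test
kind: workflow

stages:
  - name: preprocess
    ref: nodes/preprocess
    cluster: local

  - name: solver
    ref: nodes/solver
    cluster: local
```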