# Workflows

A workflow chains multiple nodes into a pipeline. Expanse automatically resolves dependencies from each node's `inputs.from` fields and handles data transfer between stages, including across clusters.
## Workflow YAML Structure

```yaml
name: my-pipeline
kind: workflow
stages:
  - name: generate
    ref: nodes/generator
    cluster: local
  - name: process
    ref: nodes/processor
    cluster: archer2
  - name: report
    ref: nodes/reporter
    cluster: local
```

### Field Reference
| Field | Required | Description |
|---|---|---|
| `name` | Yes | Logical name of the workflow |
| `kind` | Yes | Must be `"workflow"` |
| `env` | No | Environment variables available to all stages |
| `preflight` | No | Preflight validation settings |
| `stages` | Yes | Ordered list of stages to execute |
### Stage Fields

| Field | Required | Description |
|---|---|---|
| `name` | Yes | Human-readable label for the stage (used in logs and the CLI progress display) |
| `ref` | Yes | Path to the node directory, relative to the project root |
| `cluster` | No | Overrides the default cluster for this stage only |
### Preflight Configuration

Preflight checks validate that clusters are reachable and configurations are valid before submitting any jobs.

```yaml
preflight:
  enabled: true
  concurrency: 3
  fail_fast: true
  timeout: "30s"
```

| Field | Type | Default | Description |
|---|---|---|---|
| `enabled` | bool | `true` | Run preflight checks before execution |
| `concurrency` | int | - | Maximum number of parallel preflight checks |
| `fail_fast` | bool | - | Stop on the first preflight failure |
| `timeout` | string | - | Timeout per preflight check (e.g. `"30s"`, `"1m"`) |
## Examples

### Simple Pipeline

Three local nodes executing in sequence:
```yaml
name: simple-pipeline
kind: workflow
stages:
  - name: generate
    ref: nodes/generator
    cluster: local
  - name: process
    ref: nodes/processor
    cluster: local
  - name: summarise
    ref: nodes/reporter
    cluster: local
```

### Cross-Cluster Workflow
Preprocess locally, run heavy compute on an HPC cluster, then collect results locally:
```yaml
name: montecarlo-pipeline
kind: workflow
stages:
  - name: prep
    ref: nodes/prep
    cluster: local
  - name: simulate
    ref: nodes/simulate
    cluster: archer2
  - name: aggregate
    ref: nodes/aggregate
    cluster: local
```

When stages target different clusters, Expanse transfers data between them automatically; no manual file management is required.
### Multi-Language Pipeline
Nodes written in different languages can exchange data directly:
```yaml
name: multilang-pipeline
kind: workflow
stages:
  - name: python-source
    ref: nodes/python_source
    cluster: local
  - name: fortran-scale
    ref: nodes/fortran_scale
    cluster: local
  - name: c-offset
    ref: nodes/c_offset
    cluster: local
  - name: python-final
    ref: nodes/python_final
    cluster: local
```

### Workflow-Level Environment Variables
Set environment variables that apply to all stages:
```yaml
name: training-pipeline
kind: workflow
env:
  EXPERIMENT_ID: "exp-2024-001"
  LOG_LEVEL: "info"
stages:
  - name: train
    ref: nodes/trainer
    cluster: gpu-cluster
  - name: evaluate
    ref: nodes/evaluator
    cluster: local
```

## Running Workflows
Run by workflow name (resolved from the `workflows/` directory):

```bash
expanse run simple-pipeline
```

Or by explicit file path:

```bash
expanse run workflows/pipeline.yaml
```

### Run Flags
| Flag | Description |
|---|---|
| `--cluster`, `-c` | Override the default cluster for all stages |
| `--no-cache` | Skip the build cache and force fresh builds |
| `--quiet`, `-q` | Compact single-line progress output |
| `--verbose`, `-v` | Show detailed progress, including file transfers |
| `--no-preflight` | Skip preflight builds; builds happen during execution instead |
| `--skip-validation` | Skip pre-run configuration validation |
### Examples

```bash
# Override the cluster for all stages
expanse run simple-pipeline --cluster cirrus

# Force fresh builds
expanse run simple-pipeline --no-cache

# Verbose output with transfer details
expanse run simple-pipeline --verbose
```

## Execution Flow
When you run a workflow, Expanse:

- Parses the workflow YAML and validates all stage references
- Loads each node's `node.yaml` to understand its inputs and outputs
- Builds a dependency graph from `inputs.from` declarations
- Validates configuration (unless `--skip-validation` is set)
- Runs preflight checks against the target clusters
- Executes stages in dependency order. Each node goes through a four-step pipeline (setup, deps, build, run). The deps step installs dependencies when a manifest file (e.g. `requirements.txt`, `conanfile.txt`) is detected. Data is transferred between clusters as needed.
- Downloads outputs marked with `local_copy` to your local machine
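
To make the dependency resolution concrete, here is a sketch of what a `node.yaml` for the `process` stage of the first example might declare. The node schema is not documented on this page, so every field name other than `inputs.from` and `local_copy` is an assumption for illustration only:

```yaml
# Hypothetical node.yaml — only inputs.from and local_copy appear in
# this page's documentation; the other field names are illustrative.
name: processor
inputs:
  - from: generate          # consume the outputs of the "generate" stage
outputs:
  - path: results/out.csv
    local_copy: true        # downloaded to your machine after the run
```

Because the `from` field names a stage, Expanse can infer that `process` must run after `generate` without the workflow YAML stating the ordering explicitly.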
## Reusing Nodes Across Workflows

The same node directory can be referenced by multiple workflows. Define your nodes once, then compose different pipelines:

```
nodes/
├── preprocess/
├── solver/
├── postprocess/
└── visualise/
workflows/
├── full-pipeline.yaml     # All four nodes
├── quick-test.yaml        # preprocess + solver only
└── visualise-only.yaml    # postprocess + visualise
```
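
For instance, `quick-test.yaml` from the layout above might compose just the first two nodes; the cluster assignments here are illustrative:

```yaml
name: quick-test
kind: workflow
stages:
  - name: preprocess
    ref: nodes/preprocess
    cluster: local
  - name: solver
    ref: nodes/solver
    cluster: local
```

The full pipeline and the quick test share the same `nodes/preprocess` and `nodes/solver` directories, so a fix to a node benefits every workflow that references it.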