Workflows

Workflows chain multiple nodes together into a pipeline. Expanse automatically resolves dependencies based on each node's inputs.from fields and handles data transfer between stages.

Workflow Folder Structure

Each project has a workflows/ directory containing one or more workflow definitions:

my-simulation/
├── expanse.yaml
├── project.yaml
├── nodes/
│   ├── preprocess/
│   ├── solver/
│   └── postprocess/
└── workflows/
    ├── full_pipeline.yaml   # Complete end-to-end workflow
    ├── training.yaml        # ML training workflow
    ├── evaluation.yaml      # Model evaluation only
    └── quick_test.yaml      # Subset for testing

You can define as many workflow YAMLs per project as needed. Each is an independent workflow that can be run separately. The same nodes can be reused across multiple workflows by pointing different workflow stages at the same nodes/... directory.

Workflow YAML Structure

A workflow YAML is small and declarative:

name: <workflow-name>
kind: workflow
stages:
  - name: <stage-name>
    ref: <path-to-node>
    cluster: <cluster-name>

Field              Description
name               Logical name of the workflow
kind               Must be workflow (distinguishes from nodes)
stages             Ordered list of steps to execute
stages[].name      Human-readable label for the stage (used in logs/UI)
stages[].ref       Path to node directory or node.yaml, relative to project root
stages[].cluster   Optional; overrides default cluster for this stage only
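
The field rules above can be checked mechanically. The following is a minimal sketch in Python, not Expanse's own validator: validate_workflow is a hypothetical helper, it works on a workflow that has already been parsed into a dictionary, and the requirement that stages be non-empty is an assumption rather than something the field descriptions state.

```python
def validate_workflow(doc):
    """Return a list of problems found in a parsed workflow mapping.

    Hypothetical helper for illustration; it only encodes the documented
    field rules (name, kind, stages, stages[].name/ref/cluster).
    """
    errors = []
    if not isinstance(doc.get("name"), str) or not doc.get("name"):
        errors.append("name: must be a non-empty string")
    if doc.get("kind") != "workflow":
        errors.append("kind: must be 'workflow'")

    stages = doc.get("stages")
    # Assumption: a workflow needs at least one stage.
    if not isinstance(stages, list) or not stages:
        errors.append("stages: must be a non-empty list")
        return errors

    for i, stage in enumerate(stages):
        if not isinstance(stage, dict):
            errors.append(f"stages[{i}]: must be a mapping")
            continue
        for key in ("name", "ref"):
            if not isinstance(stage.get(key), str) or not stage.get(key):
                errors.append(f"stages[{i}].{key}: must be a non-empty string")
        # cluster is optional, but must be a string when present.
        if "cluster" in stage and not isinstance(stage["cluster"], str):
            errors.append(f"stages[{i}].cluster: must be a string when set")
    return errors


workflow = {
    "name": "simple-pipeline",
    "kind": "workflow",
    "stages": [
        {"name": "generate", "ref": "nodes/generator", "cluster": "local"},
        {"name": "process", "ref": "nodes/processor"},  # cluster omitted
    ],
}
print(validate_workflow(workflow))  # an empty list means no problems found
```

Returning a list of messages instead of raising on the first problem lets a caller report every issue in one pass, which is roughly what a dry run is for.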

Workflow Examples

Simple Pipeline

# workflows/pipeline.yaml
name: simple-pipeline
kind: workflow
stages:
  - name: generate
    ref: nodes/generator
    cluster: local
  - name: process
    ref: nodes/processor
    cluster: local
  - name: summarise
    ref: nodes/reporter
    cluster: local

Multi-Language Zero-Copy Pipeline

# workflows/multilang.yaml
name: multilang-zero-copy
kind: workflow
stages:
  - name: python-source
    ref: nodes/python_source
    cluster: local
  - name: fortran-scale
    ref: nodes/fortran_scale
    cluster: local
  - name: c-offset
    ref: nodes/c_offset
    cluster: local
  - name: python-square
    ref: nodes/python_square
    cluster: local
  - name: fortran-finish
    ref: nodes/fortran_finish
    cluster: local

Cross-Cluster Workflow

# workflows/cross_cluster.yaml
name: montecarlo-pipeline
kind: workflow
stages:
  - name: prep
    ref: nodes/prep
    cluster: local      # Preprocess locally
  - name: simulate
    ref: nodes/simulate
    cluster: archer2    # Heavy compute on HPC
  - name: aggregate
    ref: nodes/aggregate
    cluster: local      # Collect results locally

Running Workflows

Run workflows by name or path:

# Run by workflow name (from project root)
expanse run full_pipeline

# Run by explicit path
expanse run workflows/training.yaml

# Run with cluster override
expanse run full_pipeline --cluster cirrus

# Dry run to validate without executing
expanse run full_pipeline --dry-run

Execution Flow

When you run a workflow, Expanse:

  1. Parses the workflow YAML and validates all stage references
  2. Loads each node's node.yaml to understand inputs/outputs
  3. Builds an execution graph from inputs.from dependencies
  4. Executes stages in order, handling cross-cluster data transfer automatically
  5. Collects telemetry for future failure prediction
  6. Copies every output that specifies a path: to results/

Data flow between stages is derived from each node's inputs.from fields; workflows are just lists of node references, and the wiring happens automatically based on the node definitions.
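
Step 3 above amounts to a topological sort. The sketch below shows the idea with Python's standard graphlib module; it is not Expanse's implementation. The node.yaml schema is not shown here, so the sketch assumes each stage's inputs.from entries have already been reduced to a list of upstream stage names, and execution_order is a hypothetical helper.

```python
from graphlib import TopologicalSorter


def execution_order(stage_names, upstream_of):
    """Order stages so each one runs after the stages it reads from.

    stage_names: stage names as listed in the workflow.
    upstream_of: maps a stage name to the stage names its inputs.from
                 entries point at (assumed, simplified shape).
    """
    sorter = TopologicalSorter()
    for name in stage_names:
        upstream = upstream_of.get(name, [])
        unknown = [u for u in upstream if u not in stage_names]
        if unknown:
            raise ValueError(f"{name} reads from unknown stage(s): {unknown}")
        # Register the stage together with its predecessors.
        sorter.add(name, *upstream)
    # static_order() raises graphlib.CycleError if the dependencies loop.
    return list(sorter.static_order())


stages = ["prep", "simulate", "aggregate"]
upstream_of = {"simulate": ["prep"], "aggregate": ["simulate"]}
print(execution_order(stages, upstream_of))
# ['prep', 'simulate', 'aggregate']
```

For a linear chain like this one the result matches the order in which the stages are listed; the graph only matters once a stage reads from more than one upstream stage or the list is written out of order.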

Campaigns (Coming Soon)

Campaigns enable parameter sweeps and ensemble runs; they execute the same workflow multiple times with varying inputs.

# workflows/parameter_sweep.yaml
name: hyperparameter-search
kind: campaign
base_workflow: workflows/training.yaml

parameters:
  learning_rate: [0.001, 0.01, 0.1]
  batch_size: [32, 64, 128]
  dropout: [0.1, 0.2, 0.3]

strategy: grid # grid, random, or bayesian
max_parallel: 10 # Concurrent runs

This would generate 27 workflow runs (3 × 3 × 3 grid), executing up to 10 in parallel.
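
The run count can be confirmed by enumerating the grid directly. This is a short Python sketch of the arithmetic only: campaigns are not implemented, and expand_grid is a hypothetical helper, not part of Expanse.

```python
from itertools import product

# Same values as the parameters block in the campaign example.
parameters = {
    "learning_rate": [0.001, 0.01, 0.1],
    "batch_size": [32, 64, 128],
    "dropout": [0.1, 0.2, 0.3],
}


def expand_grid(parameters):
    """Return one dict per combination of parameter values (full grid)."""
    names = list(parameters)
    value_lists = [parameters[name] for name in names]
    return [dict(zip(names, combo)) for combo in product(*value_lists)]


runs = expand_grid(parameters)
print(len(runs))  # 27
print(runs[0])    # {'learning_rate': 0.001, 'batch_size': 32, 'dropout': 0.1}
```

A grid grows multiplicatively with each added parameter, which is why the example also lists random and bayesian as alternative strategies.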

Use Cases

  • Hyperparameter tuning for ML models
  • Monte Carlo simulations with different seeds
  • Sensitivity analysis across parameter ranges
  • Ensemble forecasting
Note: Campaigns are not yet implemented. This section documents a feature planned for a future release.