
Node Registry

Nodes are the building blocks of Expanse workflows. Each node represents a computational step that can be executed on a cluster.

Node Types

Expanse supports two primary node patterns:

  • Source Code Nodes: A folder containing node.yaml plus source files. Used when you have code that needs to be built or run directly (Python scripts, C/Fortran programs with Makefiles).

  • Black Box Solver Nodes: A standalone node.yaml with a bash command pointing to an existing executable. Used for commercial solvers or pre-built binaries where you just need to wrap an existing tool.

At the YAML level, all nodes use kind: node with type: command. The distinction is in how the command is structured and whether a build step is required.

Project Structure

my-simulation/
├── expanse.yaml              # Cluster configuration
├── nodes/
│   ├── preprocess/           # Source code node (folder)
│   │   ├── node.yaml
│   │   ├── pyproject.toml    # Python dependency spec
│   │   └── main.py
│   ├── solver/               # Source code node (folder)
│   │   ├── node.yaml
│   │   ├── Makefile          # C/Fortran build
│   │   └── solver.f90
│   ├── postprocess/          # Source code node (folder)
│   │   ├── node.yaml
│   │   ├── Makefile
│   │   └── analyse.c
│   └── commercial_cfd.yaml   # Black box solver (single file)
└── workflows/
    └── full_pipeline.yaml
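
The layout above can be walked programmatically. The following is a minimal sketch, not part of Expanse itself: discover_nodes is a hypothetical helper, and it assumes that a folder containing node.yaml is a source code node while a standalone .yaml file directly under nodes/ is a black box solver node.

```python
from pathlib import Path


def discover_nodes(nodes_dir):
    """Classify entries under nodes/ by the two layouts shown in the tree.

    Assumption (not confirmed by Expanse): a folder containing node.yaml is a
    source code node, and a standalone *.yaml file is a black box solver node.
    Anything else is ignored.
    """
    found = {}
    for entry in sorted(Path(nodes_dir).iterdir()):
        if entry.is_dir() and (entry / "node.yaml").is_file():
            found[entry.name] = "source"
        elif entry.is_file() and entry.suffix == ".yaml":
            found[entry.stem] = "black_box"
    return found
```

Called on the nodes/ folder of the example project, this would report preprocess, solver and postprocess as source code nodes and commercial_cfd as a black box node.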

Node Folder Requirements

Each source code node folder requires specific files depending on the language runtime:

Language     Required Files                        Description
Python       pyproject.toml or requirements.txt    Dependency specification for pip install
C            Makefile                              Build rules producing executable in bin/
Fortran      Makefile                              Build rules producing executable in bin/
Shell/Bash   None                                  Direct command execution, no build step
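
As a rough illustration, the table can be turned into a pre-flight check. This is a sketch under assumptions: missing_files and the lowercase language keys are hypothetical, and node.yaml is not documented to carry a language field, so the caller has to supply the language.

```python
from pathlib import Path

# Required files per language, transcribed from the table above.
# Each inner tuple lists alternatives; any one of them satisfies the requirement.
# The lowercase keys are hypothetical labels chosen for this sketch.
REQUIRED_FILES = {
    "python": [("pyproject.toml", "requirements.txt")],
    "c": [("Makefile",)],
    "fortran": [("Makefile",)],
    "shell": [],
}


def missing_files(node_dir, language):
    """Return the unmet file requirements for a node folder (empty if none)."""
    node_dir = Path(node_dir)
    missing = []
    for alternatives in REQUIRED_FILES[language]:
        if not any((node_dir / name).is_file() for name in alternatives):
            missing.append(" or ".join(alternatives))
    return missing
```

An empty result means the folder satisfies the table for that language. It says nothing about whether the Makefile actually places an executable in bin/.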

Node Configuration (node.yaml)

Each node requires a node.yaml that defines how it runs and what data it consumes/produces.

Source Code Node Example (Fortran)

# nodes/solver/node.yaml
name: solver
kind: node
type: command
cluster: archer2 # Default cluster for this node

# Build step (optional) - runs once, cached by content hash
build:
  command: make -C nodes/solver

# Run step (required)
run:
  command: ./bin/solver
  args: []

# HPC modules to load before execution
modules:
  - cray-fftw
  - cray-hdf5

# Resource requirements
resources:
  nodes: 4
  cpus_per_node: 128
  walltime: "02:00:00"
  partition: standard

inputs:
  - name: mesh
    from: preprocess/mesh # Reference to previous node's output
    type: array[float64, 2]
    required: true # Optional, defaults to true
  - name: params
    from: data/simulation.json # Or reference project data/ folder
    type: json

outputs:
  - name: solution
    type: array[float64, 2]
    path: solution.bin # Optional user-visible filename for results/
  - name: residuals
    type: array[float64, 1]
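
The build comment in the example says the build step is cached by content hash. The scheme Expanse uses is not documented here, so the following is only a sketch of the general idea: folder_content_hash is a hypothetical helper that hashes the relative path and bytes of every file in a node folder, so that any source change yields a new key.

```python
import hashlib
from pathlib import Path


def folder_content_hash(node_dir):
    """Hash the relative path and bytes of every file under node_dir.

    Illustration only: this is not the hashing scheme Expanse uses, which
    is not described in this document.
    """
    root = Path(node_dir)
    digest = hashlib.sha256()
    # Sort so the result does not depend on directory listing order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()
```

A real cache key would presumably exclude build outputs such as bin/, since otherwise the first successful build would change the hash of its own inputs.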

Black Box Solver Node Example

# nodes/commercial_cfd.yaml
name: commercial_cfd
kind: node
type: command

run:
  command: sh -c "starccm+ -batch run.java -np $SLURM_NTASKS"
  args: []

modules:
  - starccm+/2024.1

resources:
  nodes: 8
  cpus_per_node: 64
  walltime: "12:00:00"

inputs:
  - name: mesh
    from: mesher/volume_mesh
    type: file

outputs:
  - name: results
    type: file
    path: output.csv

Input/Output Wiring

Inputs

Field      Description
name       Local name for this input inside the node
from       Either <producer_node>/<output_name> (e.g., preprocess/mesh) or data/... for project data folder
type       Logical type string (e.g., array[float64, 2], json, file)
required   Optional boolean, defaults to true
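
The two local forms of from can be told apart with a few lines of string handling. This is a minimal sketch, assuming exactly the two forms listed in the table. parse_from is a hypothetical helper, not an Expanse API, and it does not handle registry:// references.

```python
def parse_from(ref):
    """Split an input's `from` value into its parts.

    Covers only data/... and <producer_node>/<output_name>. A producer node
    that is itself named "data" would be ambiguous under this sketch.
    """
    if ref.startswith("data/"):
        return {"kind": "data", "path": ref}
    producer, sep, output = ref.partition("/")
    if not (sep and producer and output) or "/" in output:
        raise ValueError(f"unrecognised from reference: {ref!r}")
    return {"kind": "node", "producer": producer, "output": output}
```

For example, parse_from("preprocess/mesh") identifies preprocess as the producer and mesh as the output, while parse_from("data/simulation.json") is classified as project data.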

Outputs

Field   Description
name    Logical output name, becomes <node_name>/<output_name> in the artifact registry
type    Logical type string
path    Optional user-visible filename to copy into results/

From the perspective of the node's code:

  • Inputs are provided as Arrow IPC files at the location given by the EXPANSE_INPUTS environment variable
  • Outputs should be written, via the language runtime, to the directory given by EXPANSE_ARTIFACT_DIR
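
To make the two environment variables concrete, here is a sketch of how node code might resolve file locations. Everything beyond the variable names is an assumption: that EXPANSE_INPUTS holds a directory path, that files are named <name>.arrow, and the helpers input_path and output_path themselves. In practice the language runtime is expected to handle this for you.

```python
import os
from pathlib import Path


def input_path(name):
    """Location of input `name`.

    Assumptions: EXPANSE_INPUTS is a directory and inputs are stored as
    <name>.arrow inside it. Neither is confirmed by the documentation.
    """
    return Path(os.environ["EXPANSE_INPUTS"]) / f"{name}.arrow"


def output_path(name):
    """Location to write output `name` under EXPANSE_ARTIFACT_DIR.

    The <name>.arrow filename is an assumption made for this sketch.
    """
    artifact_dir = Path(os.environ["EXPANSE_ARTIFACT_DIR"])
    artifact_dir.mkdir(parents=True, exist_ok=True)
    return artifact_dir / f"{name}.arrow"
```

For the solver node above, input_path("mesh") and output_path("solution") would then be the files the node reads and writes.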

Registry Commands

Manage nodes locally and with the central registry:

Command                      Description
expanse nodes list           List all nodes in the current project
expanse nodes validate       Validate all node configurations
expanse nodes show <n>       Display details for a specific node
expanse nodes deps <n>       Show dependency graph for a node
expanse nodes push <n>       Publish a node to the central registry
expanse nodes pull <ref>     Fetch a node from registry into your project
expanse nodes publish <n>    Release a new version of a registered node
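
The dependency graph that expanse nodes deps reports follows from the from fields of each node's inputs. The sketch below is not how the CLI is implemented. It is a hypothetical helper, execution_order, that derives the same kind of ordering with Python's standard graphlib module, skipping data/... and registry:// references because they do not name a local producer node.

```python
from graphlib import TopologicalSorter


def execution_order(node_inputs):
    """Order nodes so that every producer comes before its consumers.

    node_inputs maps a node name to the list of its inputs' `from` strings.
    Raises graphlib.CycleError if the references form a cycle.
    """
    graph = {}
    for node, refs in node_inputs.items():
        producers = set()
        for ref in refs:
            # data/... and registry:// references have no local producer node.
            if ref.startswith("data/") or "://" in ref:
                continue
            producers.add(ref.split("/", 1)[0])
        graph[node] = producers
    return list(TopologicalSorter(graph).static_order())
```

Given the solver example, where solver consumes preprocess/mesh, preprocess is ordered before solver.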

Node Registry: References vs Copies

When working with the central node registry, you can choose between two modes:

Node Reference

# In workflow or node.yaml
inputs:
  - name: preprocessed
    from: registry://expanse/standard-preprocessor@v2.1/output

  • Points to a registry identifier (e.g., registry://org/node@version)
  • Node code lives in the registry, cached locally on first use
  • Receives updates when you bump the version in the reference
  • Best for: shared utilities, standard preprocessing steps, organisation-wide tools
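
The reference format can be pulled apart with a regular expression. This is a sketch that assumes the shape registry://org/node@version with an optional trailing /output, inferred from the two examples in this section. The exact grammar is not specified here, and parse_registry_ref is a hypothetical helper.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the examples in this section, not from a formal spec.
REGISTRY_REF = re.compile(
    r"^registry://(?P<org>[^/@]+)/(?P<node>[^/@]+)@(?P<version>[^/@]+)"
    r"(?:/(?P<output>[^/@]+))?$"
)


class RegistryRef(NamedTuple):
    org: str
    node: str
    version: str
    output: Optional[str]  # None when the reference names a whole node


def parse_registry_ref(ref):
    """Split a registry reference into org, node, version and optional output."""
    match = REGISTRY_REF.match(ref)
    if match is None:
        raise ValueError(f"not a registry reference: {ref!r}")
    return RegistryRef(**match.groupdict())
```

Parsing registry://expanse/standard-preprocessor@v2.1/output gives org expanse, node standard-preprocessor, version v2.1 and output output. A reference without the trailing segment leaves output as None.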

Node Copy (Vendored)

expanse nodes pull registry://expanse/cfd-preprocessor@v2.1

  • Creates a physical nodes/cfd-preprocessor/ folder in your project
  • Full copy of node.yaml plus all source code
  • From then on, it's a local node you control completely
  • Best for: nodes you need to modify, audit requirements, offline usage

Note: The node registry service is under development. Currently, all nodes are local files under nodes/. The artifact registry (which wires from: references to actual files during execution) is fully operational.