Node Registry
Nodes are the building blocks of Expanse workflows. Each node represents a computational step that can be executed on a cluster.
Node Types
Expanse supports two primary node patterns:
- Source Code Nodes: A folder containing node.yaml plus source files. Used when you have code that needs to be built or run directly (Python scripts, C/Fortran programs with Makefiles).
- Black Box Solver Nodes: A standalone node.yaml with a bash command pointing to an existing executable. Used for commercial solvers or pre-built binaries where you just need to wrap an existing tool.
At the YAML level, all nodes use kind: node with type: command. The distinction is in how the command is structured and whether a build step is required.
Project Structure
my-simulation/
├── expanse.yaml                # Cluster configuration
├── nodes/
│   ├── preprocess/             # Source code node (folder)
│   │   ├── node.yaml
│   │   ├── pyproject.toml      # Python dependency spec
│   │   └── main.py
│   ├── solver/                 # Source code node (folder)
│   │   ├── node.yaml
│   │   ├── Makefile            # C/Fortran build
│   │   └── solver.f90
│   ├── postprocess/            # Source code node (folder)
│   │   ├── node.yaml
│   │   ├── Makefile
│   │   └── analyse.c
│   └── commercial_cfd.yaml     # Black box solver (single file)
└── workflows/
    └── full_pipeline.yaml
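The layout above suggests a simple discovery rule: a folder under nodes/ that contains node.yaml is a source code node, and a standalone .yaml file is a black box solver node. The following is a minimal sketch of that rule, not Expanse's actual discovery logic; the function name `discover_nodes` and the labels `"source"` and `"black_box"` are hypothetical.

```python
from pathlib import Path
import tempfile


def discover_nodes(nodes_dir):
    """Classify entries under nodes/ by the two layouts described above."""
    found = {}
    for entry in sorted(Path(nodes_dir).iterdir()):
        if entry.is_dir() and (entry / "node.yaml").is_file():
            found[entry.name] = "source"      # folder containing node.yaml
        elif entry.is_file() and entry.suffix == ".yaml":
            found[entry.stem] = "black_box"   # standalone YAML file
    return found


def demo():
    # Recreate a small part of the example project in a temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        nodes = Path(tmp) / "nodes"
        (nodes / "solver").mkdir(parents=True)
        (nodes / "solver" / "node.yaml").write_text("name: solver\n")
        (nodes / "commercial_cfd.yaml").write_text("name: commercial_cfd\n")
        return discover_nodes(nodes)


if __name__ == "__main__":
    print(demo())
```

Folders without a node.yaml are ignored in this sketch rather than reported as errors.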
Node Folder Requirements
Each source code node folder requires specific files depending on the language runtime:
| Language | Required Files | Description |
|---|---|---|
| Python | pyproject.toml or requirements.txt | Dependency specification for pip install |
| C | Makefile | Build rules producing executable in bin/ |
| Fortran | Makefile | Build rules producing executable in bin/ |
| Shell/Bash | None | Direct command execution, no build step |
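The table can be read as a lookup from language to the files a node folder must contain. The sketch below encodes it that way, assuming that for Python either file alone is sufficient; `REQUIRED_FILES` and `missing_requirement` are illustrative names, not part of Expanse.

```python
from pathlib import Path
import tempfile

# Mirrors the table above: any one file in a tuple satisfies the requirement.
REQUIRED_FILES = {
    "python": ("pyproject.toml", "requirements.txt"),
    "c": ("Makefile",),
    "fortran": ("Makefile",),
    "shell": (),  # direct command execution, nothing required
    "bash": (),
}


def missing_requirement(node_dir, language):
    """Return the acceptable file names if none is present, else an empty tuple."""
    candidates = REQUIRED_FILES[language.lower()]
    if not candidates:
        return ()
    if any((Path(node_dir) / name).is_file() for name in candidates):
        return ()
    return candidates


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        before = missing_requirement(tmp, "python")   # nothing there yet
        (Path(tmp) / "requirements.txt").write_text("numpy\n")
        after = missing_requirement(tmp, "python")    # requirement now met
        return before, after


if __name__ == "__main__":
    print(demo())
```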
Node Configuration (node.yaml)
Each node requires a node.yaml that defines how it runs and what data it consumes/produces.
Source Code Node Example (Fortran)
# nodes/solver/node.yaml
name: solver
kind: node
type: command
cluster: archer2            # Default cluster for this node

# Build step (optional) - runs once, cached by content hash
build:
  command: make -C nodes/solver

# Run step (required)
run:
  command: ./bin/solver
  args: []

# HPC modules to load before execution
modules:
  - cray-fftw
  - cray-hdf5

# Resource requirements
resources:
  nodes: 4
  cpus_per_node: 128
  walltime: "02:00:00"
  partition: standard

inputs:
  - name: mesh
    from: preprocess/mesh        # Reference to previous node's output
    type: array[float64, 2]
    required: true               # Optional, defaults to true
  - name: params
    from: data/simulation.json   # Or reference project data/ folder
    type: json

outputs:
  - name: solution
    type: array[float64, 2]
    path: solution.bin           # Optional user-visible filename for results/
  - name: residuals
    type: array[float64, 1]
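The build step is described as "cached by content hash". The exact hashing scheme is not specified here, so the following is only one plausible way to compute such a hash: SHA-256 over the relative paths and bytes of every file in the node folder. The function name `folder_content_hash` is hypothetical.

```python
import hashlib
from pathlib import Path
import tempfile


def folder_content_hash(node_dir):
    """SHA-256 over relative paths and file bytes, visited in sorted order."""
    root = Path(node_dir)
    digest = hashlib.sha256()
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        digest.update(path.relative_to(root).as_posix().encode("utf-8"))
        digest.update(b"\0")  # separator between path and content
        digest.update(path.read_bytes())
        digest.update(b"\0")  # separator between files
    return digest.hexdigest()


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / "solver.f90"
        src.write_text("program solver\nend program solver\n")
        first = folder_content_hash(tmp)
        unchanged = folder_content_hash(tmp) == first   # same content, same hash
        src.write_text("program solver\nprint *, 1\nend program solver\n")
        changed = folder_content_hash(tmp) != first     # edit invalidates the cache
        return unchanged, changed


if __name__ == "__main__":
    print(demo())
```

A real cache key would probably need to exclude build products such as bin/, otherwise the first build would itself change the hash.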
Black Box Solver Node Example
# nodes/commercial_cfd.yaml
name: commercial_cfd
kind: node
type: command

run:
  command: sh -c "starccm+ -batch run.java -np $SLURM_NTASKS"
  args: []

modules:
  - starccm+/2024.1

resources:
  nodes: 8
  cpus_per_node: 64
  walltime: "12:00:00"

inputs:
  - name: mesh
    from: mesher/volume_mesh
    type: file

outputs:
  - name: results
    type: file
    path: output.csv
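Both examples share the same skeleton: kind: node, type: command, and a required run step. As an illustration of what a check such as expanse nodes validate might cover, the sketch below tests those fields on an already-parsed node.yaml mapping. The real validator's rules are not documented here, and `check_node` and `total_cpus` are hypothetical helpers.

```python
def check_node(node):
    """Return a list of problems found in a parsed node.yaml mapping."""
    problems = []
    if node.get("kind") != "node":
        problems.append("kind must be 'node'")
    if node.get("type") != "command":
        problems.append("type must be 'command'")
    run = node.get("run") or {}
    if not run.get("command"):
        problems.append("run.command is required")
    return problems


def total_cpus(node):
    """CPUs requested in total: nodes multiplied by cpus_per_node."""
    resources = node["resources"]
    return resources["nodes"] * resources["cpus_per_node"]


# The black box example above, written as the mapping a YAML parser would return.
commercial_cfd = {
    "name": "commercial_cfd",
    "kind": "node",
    "type": "command",
    "run": {
        "command": 'sh -c "starccm+ -batch run.java -np $SLURM_NTASKS"',
        "args": [],
    },
    "resources": {"nodes": 8, "cpus_per_node": 64, "walltime": "12:00:00"},
}

if __name__ == "__main__":
    print(check_node(commercial_cfd))   # no problems expected
    print(total_cpus(commercial_cfd))   # 8 nodes x 64 CPUs each
```

The Fortran solver example requests 4 x 128 CPUs, which comes to the same total of 512.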
Input/Output Wiring
Inputs
| Field | Description |
|---|---|
| name | Local name for this input inside the node |
| from | Either <producer_node>/<output_name> (e.g., preprocess/mesh) or data/... for project data folder |
| type | Logical type string (e.g., array[float64, 2], json, file) |
| required | Optional boolean, defaults to true |
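The two forms of the from field can be told apart by their prefix. The sketch below shows one way to split them, assuming that data/ always refers to the project data folder and that no node is itself named data; `parse_from` is a hypothetical helper. URI-style values such as registry:// references are deliberately rejected rather than guessed at.

```python
def parse_from(ref):
    """Split a `from` value into ("data", path) or ("node", producer, output)."""
    if "://" in ref:
        raise ValueError(f"URI-style reference not handled in this sketch: {ref!r}")
    if ref.startswith("data/"):
        return ("data", ref[len("data/"):])
    producer, sep, output = ref.partition("/")
    if not sep or not producer or not output:
        raise ValueError(f"expected <producer_node>/<output_name>, got {ref!r}")
    return ("node", producer, output)


if __name__ == "__main__":
    print(parse_from("preprocess/mesh"))
    print(parse_from("data/simulation.json"))
```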
Outputs
| Field | Description |
|---|---|
| name | Logical output name, becomes <node_name>/<output_name> in the artifact registry |
| type | Logical type string |
| path | Optional user-visible filename to copy into results/ |
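As a rough illustration of the naming and typing conventions in these tables, the sketch below builds the <node_name>/<output_name> key and splits a type string such as array[float64, 2] into its parts. Reading the second array parameter as the number of dimensions is an assumption, and `artifact_key` and `parse_type` are hypothetical names.

```python
import re

# Matches strings of the form array[<dtype>, <integer>].
_ARRAY_TYPE = re.compile(r"^array\[\s*(\w+)\s*,\s*(\d+)\s*\]$")


def artifact_key(node_name, output_name):
    """Key under which an output is registered: <node_name>/<output_name>."""
    return f"{node_name}/{output_name}"


def parse_type(type_str):
    """Split a logical type string into a small description dict."""
    text = type_str.strip()
    match = _ARRAY_TYPE.match(text)
    if match:
        # Assumption: the second parameter is the number of dimensions.
        return {"kind": "array", "dtype": match.group(1), "ndim": int(match.group(2))}
    return {"kind": text}  # e.g. json, file


if __name__ == "__main__":
    print(artifact_key("solver", "solution"))
    print(parse_type("array[float64, 2]"))
    print(parse_type("json"))
```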
From the node's code perspective:
- Inputs arrive under the EXPANSE_INPUTS environment variable (Arrow IPC files)
- Outputs should be written via the language runtimes to EXPANSE_ARTIFACT_DIR
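Reading and writing artifacts is meant to go through the language runtimes, so the sketch below does neither. It only shows how a node script might report the two environment variables for debugging, without assuming anything about what their values look like. `describe_io_env` is a hypothetical helper, not part of any Expanse runtime.

```python
import os


def describe_io_env(env=None):
    """Report the two Expanse I/O variables, marking any that are missing."""
    env = os.environ if env is None else env
    report = {}
    for var in ("EXPANSE_INPUTS", "EXPANSE_ARTIFACT_DIR"):
        value = env.get(var)
        report[var] = value if value else "<not set>"
    return report


if __name__ == "__main__":
    for name, value in describe_io_env().items():
        print(f"{name} = {value}")
```

Run outside a workflow, both variables would typically show as not set.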
Registry Commands
Manage nodes locally and with the central registry:
| Command | Description |
|---|---|
| expanse nodes list | List all nodes in the current project |
| expanse nodes validate | Validate all node configurations |
| expanse nodes show <n> | Display details for a specific node |
| expanse nodes deps <n> | Show dependency graph for a node |
| expanse nodes push <n> | Publish a node to the central registry |
| expanse nodes pull <ref> | Fetch a node from registry into your project |
| expanse nodes publish <n> | Release a new version of a registered node |
Node Registry: References vs Copies
When working with the central node registry, you can choose between two modes:
Node Reference
# In workflow or node.yaml
inputs:
  - name: preprocessed
    from: registry://expanse/standard-preprocessor@v2.1/output
- Points to a registry identifier (e.g., registry://org/node@version)
- Node code lives in the registry, cached locally on first use
- Picks up upstream changes when you bump the version in the reference
- Best for: shared utilities, standard preprocessing steps, organisation-wide tools
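A registry reference packs several parts into one string. The sketch below splits it with a regular expression, assuming the shape registry://org/node@version with an optional trailing /output, as in the example above. The precise grammar is not specified in this document, and `parse_registry_ref` is a hypothetical helper.

```python
import re

# Assumed shape: registry://<org>/<node>@<version>[/<output>]
_REGISTRY_REF = re.compile(
    r"^registry://(?P<org>[^/@]+)/(?P<node>[^/@]+)@(?P<version>[^/]+)"
    r"(?:/(?P<output>.+))?$"
)


def parse_registry_ref(ref):
    """Split a registry reference into org, node, version and optional output."""
    match = _REGISTRY_REF.match(ref)
    if match is None:
        raise ValueError(f"not a registry reference: {ref!r}")
    return match.groupdict()


if __name__ == "__main__":
    print(parse_registry_ref("registry://expanse/standard-preprocessor@v2.1/output"))
    print(parse_registry_ref("registry://expanse/cfd-preprocessor@v2.1"))
```

When the reference names a whole node rather than one of its outputs, the output field comes back as None.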
Node Copy (Vendored)
expanse nodes pull registry://expanse/cfd-preprocessor@v2.1
- Creates a physical nodes/cfd-preprocessor/ folder in your project
- Full copy of node.yaml plus all source code
- From then on, it's a local node you control completely
- Best for: nodes you need to modify, audit requirements, offline usage
The node registry service is under development. Currently, all nodes are local files under nodes/. The artifact registry (which wires from: references to actual files during execution) is fully operational.