Training an FNO on Darcy-Flow

We train a Fourier Neural Operator (FNO) on our small Darcy-Flow example.

This tutorial demonstrates the complete workflow of training a neural operator:

1. Loading and preprocessing the Darcy-Flow dataset
2. Creating an FNO model architecture
3. Setting up training components (optimizer, scheduler, losses)
4. Training the model
5. Evaluating predictions and zero-shot super-resolution

Note that this dataset is much smaller than one we would use in practice. The small Darcy-Flow example is built to be trained on a CPU in a few seconds, whereas normally we would train on one or more GPUs.

The FNO’s key advantage is its resolution invariance: it can make predictions at different resolutions without retraining, which we will demonstrate in the zero-shot super-resolution section.

Import dependencies

We import the necessary modules from neuralop for training a Fourier Neural Operator.

import torch
import matplotlib.pyplot as plt
import sys
from neuralop.models import FNO
from neuralop import Trainer
from neuralop.training import AdamW
from neuralop.data.datasets import load_darcy_flow_small
from neuralop.utils import count_model_params
from neuralop import LpLoss, H1Loss

device = "cpu"
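
This tutorial runs on CPU so the example works anywhere. If you have a GPU available, a common pattern (optional, not part of the original script) is to select it automatically:

# Pick a GPU automatically when one is available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"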

Loading the Darcy-Flow dataset

We load the small Darcy-Flow dataset with multiple resolutions for training and testing. The dataset contains permeability fields (input) and pressure fields (output).

train_loader, test_loaders, data_processor = load_darcy_flow_small(
    n_train=1000,
    batch_size=64,
    n_tests=[100, 50],
    test_resolutions=[16, 32],
    test_batch_sizes=[32, 32],
)
data_processor = data_processor.to(device)
Loading test db for resolution 16 with 100 samples
Loading test db for resolution 32 with 50 samples
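
Each sample from the loaders is a dictionary with an "x" key (the permeability field) and a "y" key (the pressure field), as the preprocessing code later in this tutorial assumes. A quick way to confirm the data layout:

# Peek at one training batch: both tensors have shape (batch, 1, 16, 16)
batch = next(iter(train_loader))
print(batch["x"].shape, batch["y"].shape)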

Creating the FNO model

model = FNO(
    n_modes=(8, 8),              # Fourier modes kept per spatial dimension
    in_channels=1,               # one input channel: the permeability field
    out_channels=1,              # one output channel: the pressure field
    hidden_channels=24,          # width of the hidden FNO layers
    projection_channel_ratio=2,  # projection width = ratio * hidden_channels
)
model = model.to(device)

# Count and display the number of parameters
n_params = count_model_params(model)
print(f"\nOur model has {n_params} parameters.")
sys.stdout.flush()
Our model has 191881 parameters.
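
As a quick sanity check (not part of the original script), we can pass a random input through the model and confirm that the output preserves the spatial shape:

# A random 16x16 single-channel input should yield a 16x16 single-channel output
x = torch.randn(1, 1, 16, 16)
print(model(x).shape)  # expected: torch.Size([1, 1, 16, 16])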

Creating the optimizer and scheduler

We use the AdamW optimizer with weight decay for regularization.

optimizer = AdamW(model.parameters(), lr=1e-2, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=30)
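
With eta_min left at its default of 0, CosineAnnealingLR anneals the learning rate as lr_t = lr_0 * (1 + cos(pi * t / T_max)) / 2. A minimal sketch of the resulting schedule (note that we train for only 15 epochs below, so the rate only decays about halfway):

import math

# Cosine annealing with lr_0=1e-2, T_max=30, eta_min=0 (the default)
lrs = [1e-2 * (1 + math.cos(math.pi * t / 30)) / 2 for t in range(31)]
print(f"epoch 0: {lrs[0]:.4f}, epoch 15: {lrs[15]:.4f}, epoch 30: {lrs[30]:.6f}")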

Setting up loss functions

We use the H1 loss for training and the L2 loss for evaluation. The H1 loss is particularly well suited to PDE problems because it penalizes errors in both function values and gradients.

l2loss = LpLoss(d=2, p=2)  # L2 loss for function values
h1loss = H1Loss(d=2)  # H1 loss includes gradient information

train_loss = h1loss
eval_losses = {"h1": h1loss, "l2": l2loss}
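
Both losses are relative errors by default in neuralop, so a prediction identical to the target gives zero. A quick sanity check on dummy tensors:

# Identical prediction and target => (near-)zero relative error for both losses
y = torch.randn(4, 1, 16, 16)
print(float(l2loss(y, y)), float(h1loss(y, y)))  # both ~0.0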

Inspecting the training configuration

We display the model, optimizer, scheduler, and losses before training.

print("\n### MODEL ###\n", model)
print("\n### OPTIMIZER ###\n", optimizer)
print("\n### SCHEDULER ###\n", scheduler)
print("\n### LOSSES ###")
print(f"\n * Train: {train_loss}")
print(f"\n * Test: {eval_losses}")
sys.stdout.flush()
### MODEL ###
 FNO(
  (positional_embedding): GridEmbeddingND()
  (fno_blocks): FNOBlocks(
    (convs): ModuleList(
      (0-3): 4 x SpectralConv(
        (weight): DenseTensor(shape=torch.Size([24, 24, 8, 5]), rank=None)
      )
    )
    (fno_skips): ModuleList(
      (0-3): 4 x Flattened1dConv(
        (conv): Conv1d(24, 24, kernel_size=(1,), stride=(1,), bias=False)
      )
    )
    (channel_mlp): ModuleList(
      (0-3): 4 x ChannelMLP(
        (fcs): ModuleList(
          (0): Conv1d(24, 12, kernel_size=(1,), stride=(1,))
          (1): Conv1d(12, 24, kernel_size=(1,), stride=(1,))
        )
      )
    )
    (channel_mlp_skips): ModuleList(
      (0-3): 4 x SoftGating()
    )
  )
  (lifting): ChannelMLP(
    (fcs): ModuleList(
      (0): Conv1d(3, 48, kernel_size=(1,), stride=(1,))
      (1): Conv1d(48, 24, kernel_size=(1,), stride=(1,))
    )
  )
  (projection): ChannelMLP(
    (fcs): ModuleList(
      (0): Conv1d(24, 48, kernel_size=(1,), stride=(1,))
      (1): Conv1d(48, 1, kernel_size=(1,), stride=(1,))
    )
  )
)

### OPTIMIZER ###
 AdamW (
Parameter Group 0
    betas: (0.9, 0.999)
    correct_bias: True
    eps: 1e-06
    initial_lr: 0.01
    lr: 0.01
    weight_decay: 0.0001
)

### SCHEDULER ###
 <torch.optim.lr_scheduler.CosineAnnealingLR object at 0x7fb5e77fe710>

### LOSSES ###

 * Train: <neuralop.losses.data_losses.H1Loss object at 0x7fb5e77fe0d0>

 * Test: {'h1': <neuralop.losses.data_losses.H1Loss object at 0x7fb5e77fe0d0>, 'l2': <neuralop.losses.data_losses.LpLoss object at 0x7fb5e77ff9d0>}

Creating the trainer

We create a Trainer object that handles the training loop, evaluation, and logging

trainer = Trainer(
    model=model,
    n_epochs=15,
    device=device,
    data_processor=data_processor,
    wandb_log=False,  # Disable Weights & Biases logging for this tutorial
    eval_interval=5,  # Evaluate every 5 epochs
    use_distributed=False,  # Single GPU/CPU training
    verbose=True,  # Print training progress
)

Training the model

We train the model on our Darcy-Flow dataset. The trainer will:

1. Run the forward pass through the FNO
2. Compute the H1 loss
3. Backpropagate and update the weights
4. Evaluate on the test data every 5 epochs (the eval_interval set above)

trainer.train(
    train_loader=train_loader,
    test_loaders=test_loaders,
    optimizer=optimizer,
    scheduler=scheduler,
    regularizer=False,
    training_loss=train_loss,
    eval_losses=eval_losses,
)
Training on 1000 samples
Testing on [50, 50] samples         on resolutions [16, 32].
/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/site-packages/torch/utils/data/dataloader.py:668: UserWarning: 'pin_memory' argument is set as true but no accelerator is found, then device pinned memory won't be used.
  warnings.warn(warn_msg)
/opt/hostedtoolcache/Python/3.13.7/x64/lib/python3.13/site-packages/torch/nn/modules/module.py:1786: UserWarning: FNO.forward() received unexpected keyword arguments: ['y']. These arguments will be ignored.
  return forward_call(*args, **kwargs)
Raw outputs of shape torch.Size([64, 1, 16, 16])
/home/runner/work/neuraloperator/neuraloperator/neuralop/training/trainer.py:536: UserWarning: H1Loss.__call__() received unexpected keyword arguments: ['x']. These arguments will be ignored.
  loss += training_loss(out, **sample)
[0] time=1.03, avg_loss=0.7617, train_err=47.6062
/home/runner/work/neuraloperator/neuraloperator/neuralop/training/trainer.py:581: UserWarning: LpLoss.__call__() received unexpected keyword arguments: ['x']. These arguments will be ignored.
  val_loss = loss(out, **sample)
Eval: 16_h1=0.5218, 16_l2=0.3231, 32_h1=0.6226, 32_l2=0.3236
[5] time=1.00, avg_loss=0.2588, train_err=16.1757
Eval: 16_h1=0.2487, 16_l2=0.1704, 32_h1=0.3766, 32_l2=0.1918
[10] time=1.00, avg_loss=0.1991, train_err=12.4459
Eval: 16_h1=0.2050, 16_l2=0.1331, 32_h1=0.3422, 32_l2=0.1616

{'train_err': 11.168464809656143, 'avg_loss': 0.1786954369544983, 'avg_lasso_loss': None, 'epoch_train_time': 1.0058393670000214}

Visualizing predictions

Let’s take a look at what our model’s predicted outputs look like. We will compare the inputs, ground-truth outputs, and model predictions side by side.

Note that in this example, we train on a very small resolution for a very small number of epochs. In practice, we would train at a larger resolution on many more samples.

test_samples = test_loaders[16].dataset

fig = plt.figure(figsize=(7, 7))
for index in range(3):
    data = test_samples[index]
    data = data_processor.preprocess(data, batched=False)

    # Input
    x = data["x"]
    # Ground-truth output
    y = data["y"]
    # Model prediction
    out = model(x.unsqueeze(0))

    # Plot input
    ax = fig.add_subplot(3, 3, index * 3 + 1)
    ax.imshow(x[0], cmap="gray")
    if index == 0:
        ax.set_title("Input x")
    plt.xticks([], [])
    plt.yticks([], [])

    # Plot ground-truth output
    ax = fig.add_subplot(3, 3, index * 3 + 2)
    ax.imshow(y.squeeze())
    if index == 0:
        ax.set_title("Ground-truth output")
    plt.xticks([], [])
    plt.yticks([], [])

    # Plot model prediction
    ax = fig.add_subplot(3, 3, index * 3 + 3)
    ax.imshow(out.squeeze().detach().numpy())
    if index == 0:
        ax.set_title("Model prediction")
    plt.xticks([], [])
    plt.yticks([], [])

fig.suptitle("FNO predictions on 16x16 Darcy-Flow data", y=0.98)
plt.tight_layout()
fig.show()
[Figure: FNO predictions on 16x16 Darcy-Flow data; columns show the input x, the ground-truth output, and the model prediction]

Zero-shot super-resolution evaluation

One of the key advantages of neural operators is their resolution invariance. The FNO’s invariance to the discretization of input data means we can natively make predictions on higher-resolution inputs and get higher-resolution outputs without retraining the model!

test_samples = test_loaders[32].dataset

fig = plt.figure(figsize=(7, 7))
for index in range(3):
    data = test_samples[index]
    data = data_processor.preprocess(data, batched=False)

    # Input at higher-resolution
    x = data["x"]
    # Ground-truth output at higher-resolution
    y = data["y"]
    # Model prediction at higher-resolution
    out = model(x.unsqueeze(0))

    # Plot input at higher-resolution
    ax = fig.add_subplot(3, 3, index * 3 + 1)
    ax.imshow(x[0], cmap="gray")
    if index == 0:
        ax.set_title("Input at 32x32")
    plt.xticks([], [])
    plt.yticks([], [])

    # Plot ground-truth output at higher-resolution
    ax = fig.add_subplot(3, 3, index * 3 + 2)
    ax.imshow(y.squeeze())
    if index == 0:
        ax.set_title("Ground-truth at 32x32")
    plt.xticks([], [])
    plt.yticks([], [])

    # Plot model prediction at higher-resolution
    ax = fig.add_subplot(3, 3, index * 3 + 3)
    ax.imshow(out.squeeze().detach().numpy())
    if index == 0:
        ax.set_title("FNO prediction at 32x32")
    plt.xticks([], [])
    plt.yticks([], [])

fig.suptitle("Zero-shot super-resolution: 16x16 → 32x32", y=0.98)
plt.tight_layout()
fig.show()
[Figure: Zero-shot super-resolution, 16x16 → 32x32; columns show the input, the ground truth, and the FNO prediction at 32x32]

Understanding zero-shot super-resolution

We only trained the model on data at a resolution of 16x16, yet with no modifications or retraining we were able to perform inference on higher-resolution input data and get higher-resolution predictions! This is a powerful capability of neural operators.

In practice, we often want to evaluate neural operators at multiple resolutions to track a model’s zero-shot super-resolution performance throughout training. That’s why many of our datasets, including the small Darcy-flow we showcased, are parameterized with a list of test_resolutions to choose from.

Note: These predictions may be noisier than we would expect for a model evaluated at the same resolution at which it was trained. This is because the model hasn’t seen the higher-frequency patterns present in the 32x32 data during training. However, this demonstrates the fundamental resolution invariance of neural operators.
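To make this gap concrete, we can recompute the mean relative L2 error per test resolution ourselves. This is a sketch assuming the data_processor's batched preprocess/postprocess API (only the unbatched preprocess form was used above):

model.eval()
for res, loader in test_loaders.items():
    errs = []
    with torch.no_grad():
        for batch in loader:
            batch = data_processor.preprocess(batch)  # batched=True by default
            out = model(batch["x"])
            out, batch = data_processor.postprocess(out, batch)
            errs.append(float(l2loss(out, batch["y"])))
    print(f"resolution {res}: mean relative L2 = {sum(errs) / len(errs):.4f}")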

Total running time of the script: (0 minutes 16.839 seconds)
