Scalability¶

Scalability describes how Exasim's run time and memory use change as the problem size, processor count, GPU count, polynomial order, or solver difficulty changes.

Strong And Weak Scaling¶

Scaling type	Definition	Main bottlenecks
Strong scaling	Fixed global problem size, spread over more ranks/devices	Communication and synchronization, which dominate as local work per rank shrinks.
Weak scaling	Global problem size grows with rank/device count, keeping work per rank roughly constant	Partition quality, communication surface area, and growth in solver iterations.
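
As a rough illustration of how the two scaling types are usually quantified, the sketch below computes speedup and parallel efficiency from wall-clock timings. The function names and the timing values are hypothetical and are not part of Exasim; substitute measurements from your own runs.

```python
def strong_scaling_efficiency(t_base, n_base, t_n, n):
    """Strong scaling: the global problem is fixed.

    Ideal run time on n ranks is t_base * n_base / n, so efficiency is
    the measured speedup divided by the increase in rank count.
    """
    speedup = t_base / t_n
    return speedup / (n / n_base)


def weak_scaling_efficiency(t_base, t_n):
    """Weak scaling: work per rank is fixed, so the ideal time stays t_base."""
    return t_base / t_n


# Hypothetical wall-clock times in seconds, keyed by rank count.
strong_times = {1: 100.0, 2: 52.0, 4: 28.0, 8: 17.0}

for n, t in strong_times.items():
    speedup = strong_times[1] / t
    eff = strong_scaling_efficiency(strong_times[1], 1, t, n)
    print(f"{n:>2} ranks: speedup {speedup:5.2f}, efficiency {eff:5.2f}")
```

An efficiency that falls steadily as ranks are added in the strong-scaling case is the usual sign that communication and synchronization are overtaking local work.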

Complexity Drivers¶

The dominant costs depend on discretization and physics:

number of elements;
polynomial order and quadrature order;
number of state, gradient, auxiliary, and trace variables;
nonlinear iterations;
GMRES iterations and restart length;
preconditioner setup/application cost;
output frequency and filesystem load.
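
To see how strongly polynomial order drives the element-local problem size, the sketch below counts nodes per element with the standard formulas for nodal bases: C(p + d, d) on simplices and (p + 1)^d on tensor-product elements. The helper names are hypothetical, and the sketch assumes one value per node per state variable; it does not query Exasim itself.

```python
from math import comb


def nodes_per_element(p, dim, simplex=True):
    """Number of nodes of a degree-p nodal basis in `dim` dimensions."""
    return comb(p + dim, dim) if simplex else (p + 1) ** dim


def volume_unknowns(num_elements, p, dim, num_state, simplex=True):
    """Total element-interior unknowns for a discontinuous discretization."""
    return num_elements * nodes_per_element(p, dim, simplex) * num_state


# Growth of the per-element node count with order on 3D tetrahedra.
for p in range(1, 6):
    print(f"p={p}: {nodes_per_element(p, 3)} nodes per tetrahedron")
```

Because dense element-local operations grow faster than linearly in the node count, raising the order by one can cost noticeably more than the growth in unknowns alone suggests.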

LDG Scalability¶

LDG avoids storing a full global Jacobian, which helps memory scaling. The trade-off is that each matrix-free Krylov product requires a residual evaluation, so the residual is evaluated repeatedly within every linear solve. LDG scales best when:

element-local residual work dominates communication;
GMRES iteration count is controlled by preconditioning;
matrix-vector products stay resident on the compute backend (for example, in GPU memory), avoiding host-device transfers;
output frequency is modest.
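
A matrix-free Krylov product of the kind described above is commonly approximated with a finite difference of the residual. The sketch below is a generic NumPy illustration of that idea under simple assumptions (a toy `residual` function and a standard step-size heuristic); it is not Exasim's implementation.

```python
import numpy as np


def residual(u):
    """Hypothetical nonlinear residual standing in for a discretized PDE."""
    return u**3 + 2.0 * u - 1.0


def jacobian_vector_product(residual_fn, u, v, r_u=None):
    """Approximate J(u) @ v by a forward difference of the residual.

    Passing a cached r_u = residual_fn(u) means each product costs
    exactly one additional residual evaluation.
    """
    norm_v = np.linalg.norm(v)
    if norm_v == 0.0:
        return np.zeros_like(v)
    if r_u is None:
        r_u = residual_fn(u)
    # Step size balancing truncation error against round-off error.
    eps = np.sqrt(np.finfo(float).eps) * (1.0 + np.linalg.norm(u)) / norm_v
    return (residual_fn(u + eps * v) - r_u) / eps


u = np.array([0.5, 1.0, -2.0])
v = np.array([1.0, -1.0, 0.5])
approx = jacobian_vector_product(residual, u, v)
exact = (3.0 * u**2 + 2.0) * v  # analytic Jacobian of the toy residual is diagonal
print(np.max(np.abs(approx - exact)))
```

Each GMRES iteration needs one such product, which is why the iteration count translates directly into residual evaluations.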

HDG Scalability¶

HDG reduces globally coupled unknowns to trace unknowns. This often improves scalability for high-order diffusion and mixed systems, but local matrix assembly, static condensation, and preconditioner storage can be significant. HDG scales best when:

trace DOF reduction offsets local matrix work;
preconditioners reduce GMRES iterations robustly;
subdomain partitions minimize inter-rank trace faces;
local dense algebra is efficient on the target CPU/GPU.
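
The reduction to trace unknowns rests on static condensation, which is a Schur-complement elimination of the local unknowns. The sketch below shows the algebra on a small dense block system built from random data. It is a generic illustration with hypothetical names, not Exasim's data layout; in an actual HDG solver the local block is block-diagonal and is eliminated element by element.

```python
import numpy as np


def condensed_solve(A, B, C, D, f, g):
    """Solve [[A, B], [C, D]] [u, lam] = [f, g] by eliminating u first."""
    A_inv_B = np.linalg.solve(A, B)
    A_inv_f = np.linalg.solve(A, f)
    S = D - C @ A_inv_B                          # condensed (trace) operator
    lam = np.linalg.solve(S, g - C @ A_inv_f)    # globally coupled solve
    u = A_inv_f - A_inv_B @ lam                  # local back-substitution
    return u, lam


rng = np.random.default_rng(0)
n_local, n_trace = 6, 3
n_total = n_local + n_trace

# Random symmetric positive definite system, so A and S are both invertible.
M = rng.standard_normal((n_total, n_total))
K = M.T @ M + n_total * np.eye(n_total)
rhs = rng.standard_normal(n_total)

A, B = K[:n_local, :n_local], K[:n_local, n_local:]
C, D = K[n_local:, :n_local], K[n_local:, n_local:]
u, lam = condensed_solve(A, B, C, D, rhs[:n_local], rhs[n_local:])

# The condensed solution should match a direct solve of the full system.
print(np.allclose(np.concatenate([u, lam]), np.linalg.solve(K, rhs)))
```

Only the solve with `S` is globally coupled; the two solves with `A` are the local dense algebra whose cost has to be offset by the smaller trace system.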

Memory Footprint¶

Memory is consumed by:

solution and residual arrays;
Krylov vectors, whose storage is roughly proportional to GMRESrestart + 1;
HDG local matrices and condensed trace blocks;
preconditioner storage;
temporary buffers for fluxes, sources, visualization, and QoI;
MPI halo/trace communication buffers.

On GPUs, memory capacity can become the limiting constraint before raw FLOP/s does.
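
For a back-of-the-envelope check of the Krylov contribution, the sketch below multiplies the number of basis vectors (GMRESrestart + 1) by the vector length and the bytes per value. The function name and the unknown count are hypothetical, and double precision is assumed; an actual run will use additional workspace beyond the basis vectors.

```python
def krylov_basis_bytes(num_unknowns, gmres_restart, bytes_per_value=8):
    """Approximate storage for the GMRES basis: (restart + 1) full vectors."""
    return (gmres_restart + 1) * num_unknowns * bytes_per_value


# Hypothetical count of unknowns held by one rank or device.
unknowns_per_device = 10_000_000

for restart in (20, 50, 100):
    gib = krylov_basis_bytes(unknowns_per_device, restart) / 2**30
    print(f"GMRESrestart={restart:>3}: about {gib:.2f} GiB for Krylov vectors")
```

Comparing this estimate with the device memory gives a quick upper bound on the restart length that is worth trying.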

Practical Scaling Workflow¶

Validate the model on one rank/device.
Run a small MPI case and compare its residuals and outputs against the single-rank run.
Increase polynomial order and mesh size separately to identify the main cost.
Measure GMRES iterations before changing hardware scale.
Tune the preconditioner and GMRESrestart.
Scale ranks/devices only after the single-rank algorithm is numerically stable.
Reduce output frequency when filesystem time becomes visible.
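
The comparison between a single-rank run and a small MPI run can be reduced to one number per field. The sketch below is a minimal helper for that check, assuming the two solutions have already been loaded into arrays with matching ordering; the function name, the sample data, and the tolerance are hypothetical.

```python
import numpy as np


def max_relative_difference(reference, candidate):
    """Largest absolute difference, scaled by the largest reference magnitude."""
    reference = np.asarray(reference, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    if reference.shape != candidate.shape:
        raise ValueError("arrays must have the same shape")
    # Guard against division by zero for an all-zero reference field.
    scale = max(np.max(np.abs(reference)), np.finfo(float).tiny)
    return float(np.max(np.abs(candidate - reference)) / scale)


# Placeholder data standing in for a single-rank and a multi-rank solution.
single_rank = np.linspace(0.0, 1.0, 11)
multi_rank = single_rank + 1e-10 * np.sin(np.arange(11))

diff = max_relative_difference(single_rank, multi_rank)
print(f"max relative difference: {diff:.3e}")
print("consistent" if diff < 1e-8 else "investigate before scaling up")
```

Differences near the linear-solver tolerance are expected, since parallel reductions change the order of floating-point operations; larger differences are worth resolving before adding ranks.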

Performance Metrics To Track¶

time per Newton iteration;
time per GMRES solve;
number of GMRES iterations;
preconditioner setup and application time;
matrix-vector product time;
residual assembly time;
communication time, if available;
output time and file count.
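
If the solver log does not already report these timings, a driver script can collect them itself. The sketch below is one minimal way to accumulate wall-clock time and call counts per phase using only the Python standard library; `PhaseTimer` and the phase names are hypothetical and are not an Exasim API.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Accumulates wall-clock seconds and call counts per named phase."""

    def __init__(self):
        self.seconds = defaultdict(float)
        self.calls = defaultdict(int)

    @contextmanager
    def phase(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the timed block raises.
            self.seconds[name] += time.perf_counter() - start
            self.calls[name] += 1

    def report(self):
        return {name: (self.calls[name], self.seconds[name])
                for name in self.seconds}


timer = PhaseTimer()
for _ in range(3):
    with timer.phase("residual_assembly"):
        sum(i * i for i in range(10_000))  # placeholder workload

for name, (calls, seconds) in timer.report().items():
    print(f"{name}: {calls} calls, {seconds:.6f} s total")
```

When the timed work runs asynchronously on a GPU, the device has to be synchronized before the timer stops, otherwise only the launch overhead is measured.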