Generic HPC
For a cluster without a dedicated page, start with the Superbuild for CPU-only work if your site modules provide a compatible compiler, MPI, and BLAS/LAPACK.
Use the shared HPC build chain when the Superbuild is not suitable or when building CUDA/HIP variants that require explicit Kokkos and architecture configuration. Substitute your site's modules, compilers, and scheduler.
Kokkos build per backend
Build Kokkos once per backend you need, into its own tree under deps/kokkos/:
| Backend | Key CMake flags | Notes |
|---|---|---|
| Serial (CPU) | -DKokkos_ENABLE_SERIAL=ON -DCMAKE_POSITION_INDEPENDENT_CODE=ON | host compiler via -DCMAKE_CXX_COMPILER |
| CUDA (NVIDIA) | -DCMAKE_CXX_COMPILER=clang++ -DKokkos_ENABLE_CUDA=ON | cmake ≥ 3.20, clang ≥ 12, CUDA ≥ 11 |
| HIP (AMD) | -DCMAKE_CXX_COMPILER=hipcc -DKokkos_ENABLE_HIP=ON -DKokkos_ENABLE_ROCM=ON -DKokkos_ARCH_<DEV>=ON | set CRAYPE_LINK_TYPE=dynamic on Cray |
Set the Kokkos_ARCH_<DEV> option that matches your device (see the Kokkos arch table). Example HIP build for an MI250X, whose architecture keyword is VEGA90A:

    cd deps/kokkos && mkdir buildhip && cd buildhip
    cmake .. -DCMAKE_CXX_COMPILER=hipcc -DKokkos_ENABLE_HIP=ON -DKokkos_ENABLE_ROCM=ON \
      -DKokkos_ARCH_VEGA90A=ON -DCMAKE_INSTALL_PREFIX=../buildhip
    make install

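
The CUDA row of the table can be turned into a build in the same way as the HIP example. The sketch below only prints the configure line so you can review it before running anything. The build directory name `buildcuda`, the helper name `kokkos_cuda_configure`, and the default architecture `AMPERE80` (the Kokkos keyword for an A100) are assumptions for illustration, not values taken from this page; substitute the keyword for your own GPU.

```shell
# Sketch only: prints the Kokkos CUDA configure line instead of running it.
# Assumptions (not from this page): directory name "buildcuda", default arch AMPERE80.
KOKKOS_ARCH="${KOKKOS_ARCH:-AMPERE80}"      # Kokkos arch keyword, e.g. AMPERE80 for an A100
KOKKOS_BUILD="${KOKKOS_BUILD:-buildcuda}"   # hypothetical build/install directory name

kokkos_cuda_configure() {
  # Same shape as the HIP example, using the CUDA flags from the table.
  echo "cmake .. -DCMAKE_CXX_COMPILER=clang++ -DKokkos_ENABLE_CUDA=ON" \
       "-DKokkos_ARCH_${KOKKOS_ARCH}=ON -DCMAKE_INSTALL_PREFIX=../${KOKKOS_BUILD}"
}

# Print the line for review; nothing is configured or built here.
kokkos_cuda_configure
```

To use it, create and enter the build directory under deps/kokkos/ as in the HIP example, run the printed line, then run `make install`.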
Exasim configure matrix
Select the variants to build with the EXASIM_* options. The options compose,
so a single configure can build several backends. See the full
options table.
| Target variant | Options |
|---|---|
| Serial | -DEXASIM_NOMPI=ON |
| MPI | -DEXASIM_MPI=ON |
| Serial + MPI | -DEXASIM_NOMPI=ON -DEXASIM_MPI=ON |
| Serial + CUDA | -DEXASIM_NOMPI=ON -DEXASIM_CUDA=ON |
| MPI + CUDA | -DEXASIM_MPI=ON -DEXASIM_CUDA=ON |
| Serial + HIP | -DEXASIM_NOMPI=ON -DEXASIM_HIP=ON |
| MPI + HIP | -DEXASIM_MPI=ON -DEXASIM_HIP=ON |
CUDA builds use -DCMAKE_CXX_COMPILER=clang++; HIP builds use hipcc (or
mpiamdclang++ for MPI+HIP on Cray). Point -DKokkos_DIR at the matching Kokkos
tree built above.
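
As a rough illustration of how the matrix composes, the sketch below maps variant names to the EXASIM_* options and compilers listed above and prints the resulting configure line. The helper name `exasim_configure`, the variant keywords, the `build` directory, and the use of `cmake -S . -B build` from the Exasim source root are all assumptions for illustration. The Kokkos path is a placeholder: pass whatever location your Exasim version expects for -DKokkos_DIR. On Cray, MPI+HIP would use mpiamdclang++ instead of the hipcc shown here.

```shell
# Sketch only: composes an Exasim configure line from the tables above and prints it.
# Assumptions: run from the Exasim source root; build directory named "build".
exasim_configure() {
  kokkos_dir="$1"; shift          # first argument: path to pass as -DKokkos_DIR
  opts=""
  for variant in "$@"; do         # remaining arguments: variants to enable
    case "$variant" in
      serial) opts="$opts -DEXASIM_NOMPI=ON" ;;
      mpi)    opts="$opts -DEXASIM_MPI=ON" ;;
      cuda)   opts="$opts -DEXASIM_CUDA=ON -DCMAKE_CXX_COMPILER=clang++" ;;
      hip)    opts="$opts -DEXASIM_HIP=ON -DCMAKE_CXX_COMPILER=hipcc" ;;
      *)      echo "unknown variant: $variant" >&2; return 1 ;;
    esac
  done
  echo "cmake -S . -B build -DKokkos_DIR=${kokkos_dir}${opts}"
}

# Example: MPI + HIP against the HIP Kokkos tree (placeholder path).
exasim_configure deps/kokkos/buildhip mpi hip
```

Printing the command rather than executing it keeps the sketch safe to try on a login node; once the line looks right for your site, run it and follow with your usual build step.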
Running
| Variant | Launch |
|---|---|
| Serial | ./cpuEXASIM 1 datain/ dataout/out |
| CUDA / HIP, one GPU | ./gpuEXASIM 1 datain/ dataout/out |
| MPI | mpirun -np <N> ./cpumpiEXASIM 1 datain/ dataout/out |
| CUDA, many GPUs (LSF) | jsrun --smpiargs="-gpu" -n1 -a4 -c4 -g4 ./gpumpiEXASIM 1 datain/ dataout/out |
| HIP, many GPUs (Flux) | flux run -N2 -n4 -g1 -o gpu-affinity=per-task --exclusive ./gpumpiEXASIM 1 datain/ dataout/out |
For GPU-aware MPI with Cray MPICH, set MPICH_GPU_SUPPORT_ENABLED=1 in the job
environment. Replace <N> with the rank count from your pdeapp.txt.
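
For a site whose scheduler is not listed, a small wrapper can keep the binary names and arguments from the table in one place. The sketch below prints a launch line for a variant and rank count. The helper name `exasim_launch` and its variant keywords are hypothetical, and using plain `mpirun` for the multi-GPU case is an assumption: on LSF or Flux systems, use the jsrun or flux run lines from the table instead.

```shell
# Sketch only: prints a launch line; binary names and arguments come from the table above.
# The helper and its variant keywords are hypothetical.
exasim_launch() {
  variant="$1"
  nranks="${2:-1}"                 # rank count, i.e. the <N> from your pdeapp.txt
  args="1 datain/ dataout/out"
  case "$variant" in
    serial) echo "./cpuEXASIM $args" ;;
    gpu)    echo "./gpuEXASIM $args" ;;
    mpi)    echo "mpirun -np $nranks ./cpumpiEXASIM $args" ;;
    # Assumption: generic mpirun launch with GPU-aware Cray MPICH enabled.
    gpumpi) echo "MPICH_GPU_SUPPORT_ENABLED=1 mpirun -np $nranks ./gpumpiEXASIM $args" ;;
    *)      echo "unknown variant: $variant" >&2; return 1 ;;
  esac
}

# Example: CPU MPI run on 4 ranks.
exasim_launch mpi 4
```

In a batch script you would typically paste the printed line after the scheduler directives and module loads for your site.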
Note
Out-of-tree consumers (the Application Modes) launch
their own executable — e.g. mpirun -np <N> build/consumer_builtin pdeapp.txt
— rather than the in-tree cpumpiEXASIM binaries shown above.