Block-diagonal Jacobian of the LDG residual¶
The exact block-diagonal Jacobian \(J(U)=\partial R(U)/\partial U\) of the LDG residual, one dense block per element, and how it is assembled through the discretization pipeline. The pointwise Jacobian blocks correspond to the HDG Jacobian methods in the model contract.
1. Summary¶
For the LDG path in Exasim, the exact block-diagonal Jacobian of
with \(U\) the global vector of the degrees of freedom of \(u_h\), is most naturally implemented as one dense element block per element:
where each block
has size \((npe \cdot ncu) \times (npe \cdot ncu)\), with \(R_K\) the residual rows belonging to element \(K\).
In the current Exasim backend, the cleanest exact implementation is not to derive and assemble a separate sparse Jacobian by hand, but to reuse the existing residual tangent path:
-
backend/Discretization/residual.cpp -
backend/Discretization/qresidual.cpp -
backend/Discretization/uresidual.cpp -
backend/Discretization/getuhat.cpp -
backend/Discretization/matvec.cpp
Specifically:
-
if Enzyme AD is enabled, assemble each diagonal block column-by-column using
dResidual(...) -
otherwise, assemble each block column-by-column using
MatVec(...)with finite differences
That approach automatically includes all LDG chain-rule contributions through:
-
GetUhat(...) -
GetQ(...) -
GetW(...) -
GetAv(...) -
RuElem(...) -
RuFace(...)
2. Assumptions¶
-
\(U\) contains only the DOFs of \(u_h\), not the auxiliary variables \(q\), \(w\), or face trace
uh. -
The residual \(R(U)\) is exactly the one returned by
Residual(sol, res, app, driver_abi, master, mesh, tmp, common, handle, backend)inbackend/Discretization/residual.cpp. -
Therefore \(q\),
uh,wdg, and optional AV-related fields are implicit functions of \(U\), becauseResidual(...)recomputes them internally before assemblingres.Ru. -
A "block" means one element block of size: $\(n_{\mathrm{loc}} = npe \cdot ncu.\)$
-
The desired matrix is the exact block-diagonal of the full LDG Jacobian, not a simplified "volume-only" or "element-local-only" approximation.
-
I am focusing on the LDG residual path, not the HDG Schur-complement system.
-
I assume the block-diagonal Jacobian may later be inverted batched, for example for a block-Jacobi preconditioner.
3. Design reasoning¶
The key point is that the residual returned by Residual(...) is not just a function of the element-local \(u\). It is a composite map:
So the exact diagonal block must include all chain-rule paths created by the residual pipeline.
A. Residual structure used by Residual(...)¶
In serial, the LDG residual pipeline is:
-
GetUhat(...) -
GetQ(...) -
GetW(...)ifcommon.ncw > 0 -
GetAv(...)if AV is active and unfrozen -
RuElem(...) -
RuFace(...) -
PutFaceNodes(...) -
final scaling by
-1and optionally1/common.dtfactor
This is in backend/Discretization/residual.cpp.
So if I define the raw assembled residual before the final sign/time scaling by \(\widetilde R(U)\), then the residual returned by Residual(...) is
with
Therefore each diagonal block is
That final scalar factor must be included.
B. Element-block definition¶
Let \(U_K \in \mathbb{R}^{n_{\mathrm{loc}}}\) be the \(u_h\) DOFs of element \(K\), with
Let \(R_K(U)\) be the residual entries in res.Ru associated with element \(K\). Then
This is the exact diagonal block. It keeps all dependence of the residual rows of element \(K\) on the DOFs of the same element \(K\), while discarding all derivatives with respect to DOFs from other elements.
C. Chain rule through the LDG pipeline¶
The raw residual for element \(K\) can be viewed schematically as
Hence the exact diagonal block is
The face contribution expands further into:
This is the correct mathematical content of the diagonal block in Exasim.
D. Contribution from GetUhat(...)¶
GetUhat(...) is in backend/Discretization/getuhat.cpp.
For LDG:
-
on interior faces,
sol.uhis copied from one side's \(u\) -
on boundary faces,
sol.uhis produced byUbouDriver(...)
Therefore the derivative
can be nonzero for:
-
the element that owns the copied interior-face trace
-
any boundary face of element \(K\)
This derivative is implemented in the tangent path by:
-
GetdUhat(...) -
dUhatBlock(...)
in the same file.
That means the exact diagonal block must include the way a perturbation in \(U_K\) changes uh first, and then changes face residuals and \(q\)-recovery through that uh.
E. Contribution from GetQ(...)¶
GetQ(...) is in backend/Discretization/residual.cpp.
It computes \(q\) by:
-
RqElem(...) -
RqFace(...) -
PutFaceNodes(...) -
ApplyMinv(...) -
ArrayInsert(...)intosol.udg
From the implementation in backend/Discretization/qresidual.cpp, the elementwise operator is
where:
-
\(M_K\) is the local mass matrix inverse applied through
ApplyMinv(...) -
\(C_K\) is the element gradient operator from
RqElemBlock(...) -
\(E_K\) is the face lifting operator from
RqFaceBlock(...) -
\(S^q_K\) is the wave-only source contribution when
common.wave == 1
Therefore the exact local derivative is
plus any wave-specific term if active.
In the current backend, this derivative is not assembled explicitly as a matrix in the LDG path. Instead it is applied as a tangent operator by:
-
GetdQ(...) -
dRqElem(...) -
dRqFace(...) -
ApplyMinv(...)
in backend/Discretization/residual.cpp.
So GetdQ(...) is exactly the chain-rule implementation of \(dQ/dU\).
F. Contribution from GetW(...)¶
If common.ncw > 0, GetW(...) in backend/Discretization/residual.cpp computes the auxiliary field wdg.
This can happen through:
-
explicit time/update formulas
-
Newton solve of
Eos(w,u)=0 -
sourcew/DAE updates
Therefore the residual also contains a chain rule term
This term is conceptually required in the exact block diagonal.
However, the current AD tangent path in dRuResidual(...) contains:
// TODO: DAE functionality not implemented yet
So for cases with nontrivial \(w\) evolution, the exactness of the AD-based block assembly depends on the specific GetW(...) branch. This is an important implementation limitation.
G. Contribution from RuElem(...)¶
RuElemBlock(...) in backend/Discretization/uresidual.cpp computes the element-volume contribution.
At Gauss points it evaluates:
-
time contribution
-
SourceDriver(...) -
FluxDriver(...)
then applies geometry and basis testing through:
-
ApplyXx4(...) -
Gauss2Node(...)
So the elemental Jacobian contribution is
In the current backend, this total directional derivative is implemented by:
dRuElemBlock(...)
which consumes:
-
sol.dudg, already containingdqinserted byGetdQ(...) -
sol.dwdgif available -
sol.dodgif AV is active
So dRuElemBlock(...) is already the exact elemental Jacobian action needed for each diagonal block column.
H. Contribution from RuFace(...)¶
RuFaceBlock(...) in backend/Discretization/uresidual.cpp computes the face contribution by:
-
gathering
uh,udg,wdgon faces -
interpolating to face Gauss points
-
evaluating
FhatDriver(...)on interior faces -
evaluating
FbouDriver(...)on boundary faces -
applying
ApplyJacFhat(...) -
projecting back by
Gauss2Node(...)
Therefore the face Jacobian contribution for the diagonal block is
This total derivative is implemented by:
dRuFaceBlock(...)
which uses:
-
sol.duh -
sol.dudg -
sol.dwdg
So again, the exact diagonal-block action is already available through the tangent residual path.
I. Why the exact diagonal block should be assembled from dResidual(...)¶
This is the most important implementation decision.
Because LDG in Exasim recomputes:
-
uh -
q -
w -
AV
inside Residual(...), the exact diagonal block is not just the local derivative of RuElemBlock(...) with respect to the local nodal \(u\).
It must include all implicit dependencies created by the preprocessing steps inside Residual(...).
Therefore, if the goal is the exact block diagonal of the Jacobian of the actual residual returned by Residual(...), the safest implementation is:
-
seed a local basis vector in \(u\)
-
run
dResidual(...)orMatVec(...) -
extract only the same-element residual rows
That is exact by construction for the implemented residual.
4. Proposed implementation¶
A. Affected files¶
The implementation should be built around these files:
-
backend/Discretization/residual.cpp -
backend/Discretization/qresidual.cpp -
backend/Discretization/uresidual.cpp -
backend/Discretization/getuhat.cpp -
backend/Discretization/matvec.cpp
B. Data structure for the block-diagonal matrix¶
Let
Store the diagonal blocks as a dense batched array:
where ne1 is the number of owned elements whose residual rows are in res.Ru.
In code terms this should be a flat array of length:
nloc * nloc * common.ne1
stored element-block by element-block, column-major within each block so it is immediately compatible with batched BLAS/LAPACK-style inversion.
C. Exact assembly algorithm using dResidual(...)¶
This is the recommended exact implementation when Enzyme AD is available.
For each owned element \(K\):
-
Define local block size $\(n_{\mathrm{loc}} = npe \cdot ncu.\)$
-
For each local column \(j = 0, \dots, n_{\mathrm{loc}}-1\):
-
zero all tangent fields:
-
sol.dudg -
sol.dwdgif used -
sol.duh -
sol.dodgif AV is active -
res.dRu -
res.dRq -
res.dRh -
place a unit seed in
sol.dudgcorresponding to the \(j\)-thuDOF of element \(K\) -
call:
-
dResidual(sol, res, app, driver_abi, master, mesh, tmp, common, handle, backend) -
extract the element-residual slice of
res.dRucorresponding to element \(K\) -
store that slice as column \(j\) of the dense block \(J_{KK}\)
-
This works because dResidual(...) already performs:
-
GetdUhat(...) -
GetdQ(...) -
GetdAv(...) -
dRuElem(...) -
dRuFace(...)
and applies the final residual scaling.
So each call returns exactly:
and extracting the rows of element \(K\) gives precisely the column \(j\) of \(J_{KK}\).
D. Exact assembly algorithm using MatVec(...)¶
If Enzyme AD is not available, use the same column-by-column procedure but replace dResidual(...) with:
MatVec(...)
from backend/Discretization/matvec.cpp.
For each seed vector \(e^{(K,j)}\):
-
MatVec(...)returns \(J(U)e^{(K,j)}\) using either: -
first-order finite difference
-
second-order finite difference
-
then extract the same-element residual rows and store the column
This is more expensive and approximate, but it differentiates the actual Residual(...) pipeline and therefore includes all implemented couplings.
E. Practical column-seeding detail¶
The seed must be inserted only into the \(u\) components of sol.dudg, not into \(q\) or other fields.
That means:
-
within element \(K\)
-
within the
udglayout -
only the first
ncucomponents per node are seeded
All \(q\)-component tangent entries must start from zero and be generated by GetdQ(...).
This is essential. Otherwise the result is not \(\partial R/\partial U\), but a derivative with independently perturbed \(q\).
F. Optional inversion for block-Jacobi use¶
If the goal is a block-Jacobi preconditioner, after assembling the blocks one can invert them batched:
This can reuse the same batched inversion pattern already used elsewhere in Exasim for dense small blocks.
The natural storage layout is chosen specifically so that a single batched inversion call can process all element blocks.
G. Why this is better than hand-assembling a separate LDG Jacobian first¶
A manual Jacobian assembly would need to reproduce every chain-rule dependency already encoded in Residual(...), including:
-
interior/boundary
uhconstruction -
local \(q\)-recovery
-
local \(w\)-recovery
-
AV differentiation
-
face orientation details
-
sign/time scaling
That is possible, but it is much more invasive and easier to get wrong.
Using dResidual(...) or MatVec(...) avoids duplicating solver logic and guarantees consistency with the actual residual definition.
5. Risks or numerical concerns¶
-
The AD path is not fully general for all \(w\)/DAE branches. In
backend/Discretization/residual.cpp,dRuResidual(...)explicitly notes: -
TODO: DAE functionality not implemented yet -
Therefore:
-
AD-based block assembly is exact only for the branches actually covered by the tangent implementation
-
FD-based
MatVec(...)is more complete, but approximate -
Finite-difference block assembly is expensive:
-
\(n_{\mathrm{loc}}\) residual evaluations per element block
-
or one global
MatVec(...)per local column if done naively -
Finite-difference quality depends on:
-
common.matvecTol -
common.matvecOrder -
For interface elements in MPI, the exact diagonal block still requires the MPI tangent/residual path, because face terms depend on exchanged neighbor states even when only the diagonal rows are kept.
-
If AV is active and unfrozen, the exact block diagonal of the residual must include
GetAv(...)differentiation. Omitting that gives a different Jacobian. -
A hand-derived "element-only" approximation will generally miss orientation-dependent face-trace effects from
GetUhat(...).
6. Verification plan¶
-
Column test against
dResidual(...)-
Pick an element \(K\) and a random local vector \(z_K\).
-
Form a global seed \(z\) supported only on element \(K\).
-
Compute \(y = Jz\) with
dResidual(...). -
Extract the element-\(K\) rows of \(y\).
-
Compare with \(J_{KK} z_K\) from the assembled dense block.
-
-
Finite-difference spot check
- For a few local columns \(j\), compare the assembled column against: $\(\frac{R(U + \varepsilon e^{(K,j)}) - R(U)}{\varepsilon}\)$ restricted to the element-\(K\) residual rows.
-
Boundary and interior element checks
-
Test one boundary element and one interior element.
-
This verifies both:
-
FbouDriver(...) -
FhatDriver(...) -
UbouDriver(...)
-
-
Physics toggles
-
Verify separately for:
-
common.tdep = 0 -
common.tdep = 1 -
common.ncq = 0and> 0 -
common.ncw = 0and a case withncw > 0 -
AV frozen and unfrozen
-
-
Residual-consistency check
- Confirm the assembled block includes the final residual scaling by
-1and1/common.dtfactorexactly as inResidual(...).
- Confirm the assembled block includes the final residual scaling by
-
Performance sanity
-
Measure cost per element block.
-
If explicit block assembly is too expensive, keep the same mathematical definition but consider:
-
assembling blocks only when the nonlinear state changes significantly
-
or using the block assembly only for preconditioner setup, not every Krylov iteration
-