[petsc-users] gamg failure with petsc-dev
Stephan Kramer
2014-03-20 16:39:09 UTC
Hi guys,

We have been having some problems with GAMG on petsc-dev (master) for cases that worked fine on petsc 3.4. We're solving a Stokes equation (just the velocity block) for a simple convection in a square
box (isoviscous). The problem only occurs if we supply a near null space (via MatSetNearNullSpace) consisting of the usual (1,0), (0,1) and (-y,x) (near) null space vectors. If we supply those, the
smoother complains that the diagonal of the A matrix at the first coarsened level contains a zero. If I dump out the prolongator from the finest to the first coarsened level, it indeed contains a zero
column at that same index. We're pretty confident that the fine level A matrix is correct (it solves fine with LU). I've briefly spoken to Matt about this and he suggested trying to run with
-pc_gamg_agg_nsmooths 0 (as the default changed from 3.4 -> dev), but that didn't make any difference; the dumped-out prolongator still has zero columns, and it crashes in the same way. Do you have any
further suggestions on what to try and how to debug this further?
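For reference, attaching such a near null space in PETSc typically looks like the minimal sketch below (assuming a Vec of interlaced x,y vertex coordinates is available and that A uses interlaced (u_x,u_y) unknowns; the function and variable names are illustrative):

#include <petscmat.h>

/* Sketch: attach the (1,0), (0,1), (-y,x) rigid-body modes as a near null
   space for GAMG.  "coords" is assumed to hold interlaced (x,y) vertex
   coordinates laid out like the rows of A. */
static PetscErrorCode AttachElasticNearNullSpace(Mat A, Vec coords)
{
  MatNullSpace   nullsp;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = VecSetBlockSize(coords, 2);CHKERRQ(ierr);                   /* 2D coordinates, interlaced */
  ierr = MatNullSpaceCreateRigidBody(coords, &nullsp);CHKERRQ(ierr); /* builds the three modes */
  ierr = MatSetNearNullSpace(A, nullsp);CHKERRQ(ierr);               /* GAMG picks this up */
  ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}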

Cheers
Stephan
Jed Brown
2014-03-21 04:24:36 UTC
Post by Stephan Kramer
[...]
Do you set the block size? Can you reproduce by modifying
src/ksp/ksp/examples/tutorials/ex49.c (plane strain elasticity)?
Stephan Kramer
2014-03-21 11:34:46 UTC
Post by Jed Brown
[...]
Do you set the block size? Can you reproduce by modifying
src/ksp/ksp/examples/tutorials/ex49.c (plane strain elasticity)?
I don't set a block size, no. About ex49: Ah great, with master (just updated now) I get:

[***@stommel]{/data/stephan/git/petsc/src/ksp/ksp/examples/tutorials}$ ./ex49 -elas_pc_type gamg -mx 100 -my 100 -mat_no_inode
[0]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
[0]PETSC ERROR: Arguments are incompatible
[0]PETSC ERROR: Zero diagonal on row 1
[0]PETSC ERROR: See http://http://www.mcs.anl.gov/petsc/documentation/faq.html for trouble shooting.
[0]PETSC ERROR: Petsc Development GIT revision: v3.4.4-3671-gbb161d1 GIT Date: 2014-03-21 01:14:15 +0000
[0]PETSC ERROR: ./ex49 on a linux-gnu-c-opt named stommel by skramer Fri Mar 21 11:25:55 2014
[0]PETSC ERROR: Configure options --download-fblaslapack=1 --download-blacs=1 --download-scalapack=1 --download-ptscotch=1 --download-mumps=1 --download-hypre=1 --download-suitesparse=1 --download-ml=1
[0]PETSC ERROR: #1 MatInvertDiagonal_SeqAIJ() line 1728 in /data/stephan/git/petsc/src/mat/impls/aij/seq/aij.c
[0]PETSC ERROR: #2 MatSOR_SeqAIJ() line 1760 in /data/stephan/git/petsc/src/mat/impls/aij/seq/aij.c
[0]PETSC ERROR: #3 MatSOR() line 3734 in /data/stephan/git/petsc/src/mat/interface/matrix.c
[0]PETSC ERROR: #4 PCApply_SOR() line 35 in /data/stephan/git/petsc/src/ksp/pc/impls/sor/sor.c
[0]PETSC ERROR: #5 PCApply() line 440 in /data/stephan/git/petsc/src/ksp/pc/interface/precon.c
[0]PETSC ERROR: #6 KSP_PCApply() line 227 in /data/stephan/git/petsc/include/petsc-private/kspimpl.h
[0]PETSC ERROR: #7 KSPSolve_Chebyshev() line 456 in /data/stephan/git/petsc/src/ksp/ksp/impls/cheby/cheby.c
[0]PETSC ERROR: #8 KSPSolve() line 458 in /data/stephan/git/petsc/src/ksp/ksp/interface/itfunc.c
[0]PETSC ERROR: #9 PCMGMCycle_Private() line 19 in /data/stephan/git/petsc/src/ksp/pc/impls/mg/mg.c
[0]PETSC ERROR: #10 PCMGMCycle_Private() line 48 in /data/stephan/git/petsc/src/ksp/pc/impls/mg/mg.c
[0]PETSC ERROR: #11 PCApply_MG() line 330 in /data/stephan/git/petsc/src/ksp/pc/impls/mg/mg.c
[0]PETSC ERROR: #12 PCApply() line 440 in /data/stephan/git/petsc/src/ksp/pc/interface/precon.c
[0]PETSC ERROR: #13 KSP_PCApply() line 227 in /data/stephan/git/petsc/include/petsc-private/kspimpl.h
[0]PETSC ERROR: #14 KSPInitialResidual() line 63 in /data/stephan/git/petsc/src/ksp/ksp/interface/itres.c
[0]PETSC ERROR: #15 KSPSolve_GMRES() line 234 in /data/stephan/git/petsc/src/ksp/ksp/impls/gmres/gmres.c
[0]PETSC ERROR: #16 KSPSolve() line 458 in /data/stephan/git/petsc/src/ksp/ksp/interface/itfunc.c
[0]PETSC ERROR: #17 solve_elasticity_2d() line 1053 in /data/stephan/git/petsc/src/ksp/ksp/examples/tutorials/ex49.c
[0]PETSC ERROR: #18 main() line 1104 in /data/stephan/git/petsc/src/ksp/ksp/examples/tutorials/ex49.c
[0]PETSC ERROR: ----------------End of Error Message -------send entire error message to petsc-***@mcs.anl.gov----------

Which is the same error we were getting on our problem.
Cheers
Stephan
Stephan Kramer
2014-03-24 17:47:20 UTC
Post by Stephan Kramer
[...]
Ok, I found out a bit more. The fact that the prolongator has zero columns appears to arise in petsc 3.4 as well. The only reason it wasn't flagged before is that the default for the smoother (not the
aggregation smoother but the standard pre and post smoothing) changed from jacobi to sor. I can make the example work with the additional option:

$ ./ex49 -elas_pc_type gamg -mx 100 -my 100 -mat_no_inode -elas_mg_levels_1_pc_type jacobi

Vice versa, if in petsc 3.4.4 I change ex49 to include the near nullspace (the /* constrain near-null space bit */) at the end, it works with jacobi (the default in 3.4) but it breaks with sor with
the same error message as above. I'm not entirely sure why jacobi doesn't give an error with a zero on the diagonal, but the zero column also means that the related coarse dof doesn't actually affect
the fine grid solution.
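One way to check for such columns programmatically, instead of inspecting the dumped matrix by eye, is to look at the column norms of the prolongator. A rough sketch (the function name is made up; the PETSc calls are standard):

#include <petscmat.h>

/* Sketch: report zero columns of a prolongator P.  A zero column means the
   corresponding coarse dof never influences the fine grid. */
static PetscErrorCode ReportZeroColumns(Mat P)
{
  PetscInt       m, n, j;
  PetscReal      *norms;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatGetSize(P, &m, &n);CHKERRQ(ierr);
  ierr = PetscMalloc(n*sizeof(PetscReal), &norms);CHKERRQ(ierr);
  ierr = MatGetColumnNorms(P, NORM_INFINITY, norms);CHKERRQ(ierr);  /* max |entry| per column */
  for (j = 0; j < n; j++) {
    if (norms[j] == 0.0) {
      ierr = PetscPrintf(PETSC_COMM_WORLD, "zero column %D in prolongator\n", j);CHKERRQ(ierr);
    }
  }
  ierr = PetscFree(norms);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}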

I think (but I might be barking up the wrong tree here) that the zero columns appear because the aggregation method typically produces a few small aggregates that are not big enough to support the
polynomials of the near null space (i.e. the polynomials restricted to an aggregate are not linearly independent). A solution would be to reduce the number of polynomials for these aggregates (only
take the linearly independent ones). Obviously this has the downside that the number of degrees of freedom per aggregate at the coarse level is no longer constant, making the administration more
complicated. It would be nice to find a solution though, as I've always been taught that jacobi is not a robust smoother for multigrid.

Cheers
Stephan
Mark Adams
2014-03-29 21:52:30 UTC
Sorry for getting to this late. I think you have basically figured it out,
but there are a few things:

1) You must set the block size of A (bs=2) for the null spaces to work and
for aggregation MG to work properly. SA-AMG really does not make sense
unless you work at the vertex level, for which we need the block size
(see the sketch below).

2) You must be right that the zero column is because the aggregation
produced a singleton aggregate, so the coarse grid is low rank. This is
not catastrophic; it is like a fake BC equation. The numerics just have
to work around it. Jacobi does this. I will fix SOR.
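For 1), a minimal illustration (assuming interlaced (u_x,u_y) unknowns on an AIJ matrix; this should be set before preallocating/assembling A):

/* Tell PETSc about the 2x2 vertex blocks so GAMG aggregates per vertex and
   interprets the near null space block-wise. */
ierr = MatSetBlockSize(A, 2);CHKERRQ(ierr);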

Mark
Mark Adams
2014-04-01 15:07:14 UTC
Stephan, I have pushed a pull request to fix this, but for now you can just
use -mg_levels_ksp_type chebyshev -mg_levels_pc_type jacobi. This used to
be the default, but we moved to SOR recently.
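With the elas_ options prefix that ex49 uses, that would presumably look something like (untested):

$ ./ex49 -elas_pc_type gamg -mx 100 -my 100 -mat_no_inode -elas_mg_levels_ksp_type chebyshev -elas_mg_levels_pc_type jacobi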
Mark
Stephan Kramer
2014-04-01 18:10:49 UTC
Post by Mark Adams
Stephan, I have pushed a pull request to fix this, but for now you can just
use -mg_levels_ksp_type chebyshev -mg_levels_pc_type jacobi. This used to
be the default, but we moved to SOR recently.
Mark
Ah, that's great news. Thanks a lot for the effort. You're right: the previous defaults should be fine for us; your fix should hopefully only improve things.
Post by Mark Adams
1) You must set the block size of A (bs=2) for the null spaces to work and
for aggregation MG to work properly. SA-AMG really does not make sense
unless you work at the vertex level, for which we need the block size.
Yes indeed. I've come to realize this now by looking into how smoothed aggregation with a near null space actually works. We currently have our dofs numbered the wrong way around (vertices on the
inside, velocity component on the outside, which made sense for other eqns we solve with the model), so it will take a bit of work, but it might well be worth the effort.

Thanks a lot for looking into this.
Cheers
Stephan
Jed Brown
2014-04-01 18:17:29 UTC
Post by Stephan Kramer
Yes indeed. I've come to realize this now by looking into how smoothed
aggregation with a near null space actually works. We currently have
our dofs numbered the wrong way around (vertices on the inside,
velocity component on the outside - which made sense for other eqns we
solve with the model), so it will take a bit of work, but it might well
be worth the effort.
Memory streaming and cache reuse are much better if you interlace the
degrees of freedom. This is as true now as it was at the time of the
PETSc-FUN3D papers. When evaluating the "physics", it can be useful to
pack the interlaced degrees of freedom into a vector-friendly ordering.

The AMG solve is expensive enough that you can pack/solve/unpack an
interlaced vector at negligible cost without changing the rest of your
code.
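A rough sketch of that pack/solve/unpack pattern, assuming the existing component-wise layout is held in Vecs bx, by (rhs) and ux, uy (solution), and that an interlaced system (ksp, rhs b, solution u with block size 2) has already been set up; all names are illustrative:

ierr = VecStrideScatter(bx, 0, b, INSERT_VALUES);CHKERRQ(ierr); /* pack rhs x-components into even slots */
ierr = VecStrideScatter(by, 1, b, INSERT_VALUES);CHKERRQ(ierr); /* pack rhs y-components into odd slots  */
ierr = KSPSolve(ksp, b, u);CHKERRQ(ierr);                       /* GAMG solve in the interlaced ordering */
ierr = VecStrideGather(u, 0, ux, INSERT_VALUES);CHKERRQ(ierr);  /* unpack solution x-components */
ierr = VecStrideGather(u, 1, uy, INSERT_VALUES);CHKERRQ(ierr);  /* unpack solution y-components */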

Mark, should we provide some more flexible way to label "fields"? It
will be more complicated than the present code and I think packing into
interlaced format is faster anyway.
Mark Adams
2014-08-06 13:11:55 UTC
Post by Jed Brown
Mark, should we provide some more flexible way to label "fields"? It
will be more complicated than the present code and I think packing into
interlaced format is faster anyway.
I'm thinking this would entail reordering the matrix and inverting this
permutation for the fine grid R & P. This would be some work for a small
number of people. It might be nice to be able to detect this and give a
warning but I don't see how this could be done.
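For concreteness, the kind of renumbering involved, as a rough sequential sketch (illustrative only; n is the number of vertices, and the current layout is assumed to be all x-dofs followed by all y-dofs):

PetscInt i, *idx;
IS       perm;

ierr = PetscMalloc(2*n*sizeof(PetscInt), &idx);CHKERRQ(ierr);
for (i = 0; i < n; i++) {
  idx[2*i]   = i;      /* old position of the x-dof of vertex i */
  idx[2*i+1] = n + i;  /* old position of the y-dof of vertex i */
}
ierr = ISCreateGeneral(PETSC_COMM_SELF, 2*n, idx, PETSC_OWN_POINTER, &perm);CHKERRQ(ierr);
ierr = ISSetPermutation(perm);CHKERRQ(ierr);
/* perm (or its inverse via ISInvertPermutation) would then be used to
   renumber the matrix and the fine-grid R and P consistently. */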
